Use your bonce
2009-05-15 09:44

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.
I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.
bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an LD_PRELOAD hack that intercepts system calls, e.g. open(O_RDONLY) is a dependency and open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.
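A minimal sketch of that cache logic in Python, assuming a content-hash helper and leaving the LD_PRELOAD recording step abstract. BonceCache and its method names are made up for illustration; they are not part of Vesta or any existing tool:

```python
import hashlib

def file_hash(path):
    """Content hash of a file - what Vesta uses instead of a timestamp."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

class BonceCache:
    """Primary key: hash of the command line.
    Secondary keys: content hashes of the recorded dependencies."""

    def __init__(self):
        self.entries = {}

    def _pk(self, argv):
        return hashlib.sha1("\0".join(argv).encode()).hexdigest()

    def lookup(self, argv):
        """Return cached outputs on a hit, None on a miss.
        A hit requires every recorded dependency to hash the same as before."""
        entry = self.entries.get(self._pk(argv))
        if entry and all(file_hash(p) == h for p, h in entry["deps"].items()):
            return entry["outputs"]
        return None

    def record(self, argv, deps, outputs):
        """Called after running the command in dependency-recording mode
        (the LD_PRELOAD part, elided here)."""
        self.entries[self._pk(argv)] = {
            "deps": {p: file_hash(p) for p in deps},
            "outputs": outputs,
        }
```

On a miss, bonce would run the command under the recording shim and then call record(); on a hit it restores the cached output files without running anything.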
bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency-checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look inside git repositories to reuse their SHA-1 hashes and avoid recomputing content hashes from scratch.
It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

no subject
Date: 2009-05-15 09:25 (UTC)
Unfortunately it didn't speed matters up much in my case, since the back end of ClearCase is a big database file which, brilliantly, somebody thought it would be a good idea to mount via NFS from the other side of the Atlantic. A wink-in from that database took almost as long as a local compile would have!
I too have often thought that it would be neat to construct a make utility that automatically tracked dependencies via
LD_PRELOAD, but never quite had the energy to try it.

no subject
Date: 2009-05-15 11:09 (UTC)
.d files). I'm inclined to think that if I were writing a dependency-tracking make utility of this nature, I'd do the same trick as the IDE that came with BeOS: every compile generated a
.d, but the IDE then proactively read it and incorporated its contents into the project file. I'm sure it would be possible to set up some sort of database file alongside the makefile such that you could get away without having to think about all the dependencies every time: as soon as you know which files have changed, a dependency database with the right data structure should be able to tell you what recompiles need doing in O(number of changes) time rather than O(size of entire project).
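That dependency database amounts to a reverse index from source files to the targets built from them, queried in time proportional to the number of changed files. A toy illustration in Python (DepDB is a made-up name, not any existing tool's format):

```python
from collections import defaultdict

class DepDB:
    """Reverse-dependency index: source file -> targets that use it."""

    def __init__(self):
        self.rdeps = defaultdict(set)

    def add(self, target, deps):
        """Record that target was built from deps (e.g. parsed from a .d file)."""
        for d in deps:
            self.rdeps[d].add(target)

    def dirty(self, changed_files):
        """Targets needing a rebuild - O(number of changes), not O(project size)."""
        out = set()
        for f in changed_files:
            out |= self.rdeps.get(f, set())
        return out
```

Persisting this alongside the makefile would let the tool answer "what needs recompiling?" without rereading every dependency file on each build.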

no subject
Date: 2009-05-15 14:17 (UTC)
Ultimately we're going to have to make some architectural moats that are difficult to cross and have more than one compile unit, but that's a huge amount of work.
no subject
Date: 2009-05-15 17:21 (UTC)
(This would have been easy enough in pure C, where every function would have had a distinct name and I could have got close enough just by grepping for all the function names. In C++ it's much more of a pain, because ninety-seven classes all have similarly named methods and only an actual compiler can disambiguate which occurrences of DoFoo() in the source code are references to which of them...)
It turned out that gcc was not designed helpfully for such an option – its front end has nothing remotely resembling a centralised point at which refs and defs get matched up. So I resorted to the low-tech approach of renaming everything in the affected region of header file to put a z on the front and then seeing what failed to compile, which was just about feasible given the volumes involved in my case.
But it now occurs to me that that sort of data would also be about right for trimming makefile dependencies as you suggest. Hmm. Perhaps someone ought to at least stick it on gcc's BTS, if it has a suitable one.

ccache
Date: 2010-02-17 03:07 (UTC)
http://ccache.samba.org/