Use your bonce
2009-05-15 09:44

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.
I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.
bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an LD_PRELOAD hack that intercepts system calls, e.g. open(O_RDONLY) is a dependency and open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.
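A minimal sketch of that cache logic in Python, assuming a content-hash helper and leaving the LD_PRELOAD recording step abstract. BonceCache and its method names are made up for illustration; they are not part of Vesta or any existing tool:

```python
import hashlib

def file_hash(path):
    """Content hash of a file - what Vesta uses instead of a timestamp."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

class BonceCache:
    """Primary key: hash of the command line.
    Secondary keys: content hashes of the recorded dependencies."""

    def __init__(self):
        self.entries = {}

    def _pk(self, argv):
        return hashlib.sha1("\0".join(argv).encode()).hexdigest()

    def lookup(self, argv):
        """Return cached outputs on a hit, None on a miss.
        A hit requires every recorded dependency to hash the same as before."""
        entry = self.entries.get(self._pk(argv))
        if entry and all(file_hash(p) == h for p, h in entry["deps"].items()):
            return entry["outputs"]
        return None

    def record(self, argv, deps, outputs):
        """Called after running the command in dependency-recording mode
        (the LD_PRELOAD part, elided here)."""
        self.entries[self._pk(argv)] = {
            "deps": {p: file_hash(p) for p in deps},
            "outputs": outputs,
        }
```

On a miss, bonce would run the command under the recording shim and then call record(); on a hit it restores the cached output files without running anything.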
bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency-checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look inside git repositories to reuse their SHA-1 hashes and avoid recomputing content hashes from scratch.
It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

no subject
Date: 2009-05-15 09:25 (UTC)
Unfortunately it didn't speed matters up much in my case, since the back end of ClearCase is a big database file which, brilliantly, somebody thought it would be a good idea to mount via NFS from the other side of the Atlantic. A wink-in from that database took almost as long as a local compile would have!
I too have often thought that it would be neat to construct a make utility that automatically tracked dependencies via
LD_PRELOAD, but never quite had the energy to try it.

no subject
Date: 2009-05-15 11:09 (UTC)
.d files). I'm inclined to think that if I were writing a dependency-tracking make utility of this nature, I'd do the same trick as the IDE that came with BeOS: every compile generated a
.d, but the IDE then proactively read it and incorporated its contents into the project file. I'm sure it would be possible to set up some sort of database file alongside the makefile such that you could get away without having to think about all the dependencies every time: as soon as you know which files have changed, a dependency database with the right data structure should be able to tell you what recompiles need doing in O(number of changes) time rather than O(size of entire project).
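That dependency database amounts to a reverse index from source files to the targets built from them, queried in time proportional to the number of changed files. A toy illustration in Python (DepDB is a made-up name, not any existing tool's format):

```python
from collections import defaultdict

class DepDB:
    """Reverse-dependency index: source file -> targets that use it."""

    def __init__(self):
        self.rdeps = defaultdict(set)

    def add(self, target, deps):
        """Record that target was built from deps (e.g. parsed from a .d file)."""
        for d in deps:
            self.rdeps[d].add(target)

    def dirty(self, changed_files):
        """Targets needing a rebuild - O(number of changes), not O(project size)."""
        out = set()
        for f in changed_files:
            out |= self.rdeps.get(f, set())
        return out
```

Persisting this alongside the makefile would let the tool answer "what needs recompiling?" without rereading every dependency file on each build.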

no subject
Date: 2009-05-15 14:17 (UTC)
Ultimately we're going to have to make some architectural moats that are difficult to cross and have more than one compile unit, but that's a huge amount of work.
no subject
Date: 2009-05-15 17:21 (UTC)
(This would have been easy enough in pure C, where every function would have had a distinct name and I could have got close enough just by grepping for all the function names. In C++ it's much more of a pain, because ninety-seven classes all have similarly named methods and only an actual compiler can disambiguate which occurrences of DoFoo() in the source code are references to which of them...)
It turned out that gcc was not designed helpfully for such an option – its front end has nothing remotely resembling a centralised point at which refs and defs get matched up. So I resorted to the low-tech approach of renaming everything in the affected region of header file to put a z on the front and then seeing what failed to compile, which was just about feasible given the volumes involved in my case.
But it now occurs to me that that sort of data would also be about right for trimming makefile dependencies as you suggest. Hmm. Perhaps someone ought to at least stick it on gcc's BTS, if it has a suitable one.

ccache
Date: 2010-02-17 03:07 (UTC)
http://ccache.samba.org/