Version support for datastore/Progress

2,803 bytes added, 10:53, 5 June 2009
2009-06-01: report on VCS evaluation progress
would need to remember and pass through the resumed <code>version_id</code> in order
to use the corresponding branch on save.
Started evaluating Version Control Systems (VCS). Of the 27 systems on the
comparison list compiled by
the "Better SCM Initiative", 13 are open source, and 8 of those are shipped by
all of Debian, Fedora and Ubuntu (the distributions currently officially supported
in [[Development Team/Jhbuild|sugar-jhbuild]]).
While writing a benchmark
to help eliminate further candidates, I noticed that our use case is quite distinct from the one targeted by most systems
(storing a small number of projects, each carrying source code, i.e. a large number of '''related''' files):
# we're going to store a large number of '''un'''related entries (i.e. "projects" in traditional VCS nomenclature)
# most of our entries are going to be rather small (compared to entire source trees)
# for space efficiency, we don't want to keep working copies around after the activity using them has finished
Point 3 offers an excellent chance for VCSs that expose their low-level primitives (e.g. git) to be tuned to our
use case, as we might be able to access the repository directly instead of using the working directory as intermediate
storage. This will only affect timing, though, not repository size.
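To illustrate what "directly accessing the repository" could look like, here is a minimal Python sketch that writes and reads a git-style loose blob object without any working copy. It is not the benchmark's code: the function names <code>write_blob</code> and <code>read_blob</code> are made up for this example, it handles blobs only (no trees, commits or pack files), and a real implementation would more likely call git's own plumbing commands such as <code>git hash-object -w</code> and <code>git cat-file</code>.

```python
import hashlib
import os
import zlib


def write_blob(objects_dir, data):
    """Store data (bytes) as a git-style loose blob and return its hex object id."""
    # A loose blob is the header "blob <size>\0" followed by the raw content.
    store = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
    object_id = hashlib.sha1(store).hexdigest()
    # Loose objects live at <objects_dir>/<first 2 hex digits>/<remaining 38>.
    directory = os.path.join(objects_dir, object_id[:2])
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, object_id[2:])
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return object_id


def read_blob(objects_dir, object_id):
    """Return the content of a loose blob previously stored by write_blob."""
    path = os.path.join(objects_dir, object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    # Split at the first NUL byte only; the content itself may contain NULs.
    header, _, data = store.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(data):
        raise ValueError("not a valid blob object: %s" % object_id)
    return data
```

With this approach an entry goes straight from memory into the object store, so no working copy needs to exist before or after the activity runs.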
The sample set chosen for the benchmark (789 text files from Project Gutenberg, 295 MB) occupied
my desktop for about 10 hours. I've only done a single run so far (in multi-user mode), but the numbers should be accurate
enough for an initial impression.
==== Benchmark results ====
[[Image:Op-vs-time.png|thumb|Plot showing the time taken for operations common to our usage scenario for various Version Control Systems]]
The operations still missing from the benchmark are looking up and checking out intermediate versions (prior to branching)
instead of the latest version of the branch. Since many VCSs store the latest version as-is (a "full" copy) and only deltas
for the intermediate versions, this might change the timings quite a bit. I'll also need to do a weighted summary to account
for the prospective usage pattern (more commits than checkouts due to autosave, and only a few branches created).
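One possible shape for such a weighted summary is sketched below in Python. The operation names, timings and weights are hypothetical placeholders chosen for illustration; they are neither measured results nor the weights that will actually be used.

```python
def weighted_runtime(timings, weights):
    """Sum per-operation times, each scaled by its expected relative frequency."""
    missing = set(weights) - set(timings)
    if missing:
        raise KeyError("no timing for: %s" % ", ".join(sorted(missing)))
    return sum(timings[op] * weights[op] for op in weights)


# Placeholder values only: seconds per operation for one imaginary system.
example_timings = {"commit": 2.0, "checkout": 1.0, "branch": 4.0}
# Placeholder weights: many commits (autosave), fewer checkouts, few branches.
example_weights = {"commit": 10, "checkout": 3, "branch": 1}

total = weighted_runtime(example_timings, example_weights)
```

Computing this total per system would allow ranking the candidates by expected real-world cost rather than by the unweighted sum of all operations.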
[[Image:Op-vs-size.png|thumb|Plot showing the space occupied after the named operations have finished]]
While I included my favourite VCS, GNU arch, just out of curiosity
(unfortunately it is not maintained anymore),
it compares very well with the other systems: it is in second place for both total (unweighted) runtime and
final repository size, striking a good balance between those two goals.

