Changes

→‎2009-06-01: report on VCS evaluation progress
Line 76:
would need to remember and pass through the resumed <code>version_id</code> in order
 
to use the corresponding branch on save.
 
 +
 +
Started evaluating Version Control Systems (VCS). Of the 27 systems on the
 +
[http://better-scm.berlios.de/comparison/comparison.html comparison list] done by
 +
the "Better SCM Initiative", 13 are open source, and 8 of those are shipped by
 +
all of Debian, Fedora and Ubuntu (the systems currently officially supported
 +
in [[Development Team/Jhbuild|sugar-jhbuild]]).
 +
 +
While writing a [http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/trees/master/benchmarks benchmark]
 +
to help in further elimination of candidates, I noticed our use case is actually quite distinct from that targeted by most systems
 +
(which is storing a small number of projects each carrying source code, i.e. a large number of '''related''' files):
 +
 +
# we're going to store a large number of '''un'''related entries (i.e. "projects" in traditional VCS nomenclature)
 +
# most of our entries are going to be rather small (compared to entire source trees)
 +
# for space efficiency, we don't want to keep working copies around after the activity using them has finished
 +
 +
Point 3 offers an excellent opportunity to tune VCSs that expose their low-level primitives (e.g. git) to our
 +
use case as we might be able to directly access the repository instead of using the working directory as intermediate
 +
storage. It will only affect timing, not repository size, though.
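To illustrate what "directly accessing the repository" could look like with git, here is a minimal sketch that stores and retrieves versions of a single-file entry in a bare repository using only plumbing commands (<code>hash-object</code>, <code>mktree</code>, <code>commit-tree</code>, <code>update-ref</code>, <code>cat-file</code>), so no working copy is ever created. The helper names (<code>init_repo</code>, <code>save_version</code>, <code>load_version</code>), the one-file-per-entry layout, the fixed "datastore" identity and the <code>refs/heads/main</code> branch are assumptions made for this example; none of this is taken from the benchmark code.

```python
import os
import subprocess

# Hypothetical fixed identity so commit-tree works without any user git config.
_ENV = dict(
    os.environ,
    GIT_AUTHOR_NAME="datastore", GIT_AUTHOR_EMAIL="datastore@example.org",
    GIT_COMMITTER_NAME="datastore", GIT_COMMITTER_EMAIL="datastore@example.org",
)


def git(repo, *args, data=None, check=True):
    """Run one git command against the bare repository `repo`."""
    return subprocess.run(
        ["git", f"--git-dir={repo}", *args],
        input=data, stdout=subprocess.PIPE, check=check, env=_ENV,
    )


def init_repo(repo):
    """Create a bare repository, i.e. one without a working directory."""
    subprocess.run(["git", "init", "--quiet", "--bare", repo], check=True)


def save_version(repo, name, content, branch="refs/heads/main"):
    """Commit `content` (bytes) as file `name` on `branch`; return the commit id."""
    # Write the file content straight into the object database.
    blob = git(repo, "hash-object", "-w", "--stdin", data=content).stdout.decode().strip()
    # Build a tree containing just this one file.
    entry = f"100644 blob {blob}\t{name}\n".encode()
    tree = git(repo, "mktree", data=entry).stdout.decode().strip()
    # Use the current branch head as parent, if the branch already exists.
    head = git(repo, "rev-parse", "--verify", "--quiet", branch, check=False)
    args = ["commit-tree", tree]
    if head.returncode == 0:
        args += ["-p", head.stdout.decode().strip()]
    args += ["-m", "autosave"]
    commit = git(repo, *args).stdout.decode().strip()
    # Move the branch to the new commit.
    git(repo, "update-ref", branch, commit)
    return commit


def load_version(repo, name, rev="refs/heads/main"):
    """Return the bytes of file `name` as stored in revision `rev`."""
    return git(repo, "cat-file", "blob", f"{rev}:{name}").stdout
```

Passing the commit id returned by <code>save_version</code> as <code>rev</code> to <code>load_version</code> retrieves that specific version rather than the branch head.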
 +
 +
The sample set chosen for the benchmark (789 text files from [http://www.gutenberg.org/ Project Gutenberg], 295 MB) occupied
 +
my desktop for about 10 hours, so although I've only done a single run so far (in multi-user mode), the numbers should be accurate
 +
enough for an initial impression.
 +
 +
==== Benchmark results ====
 +
 +
[[Image:Op-vs-time.png|thumb|Plot showing the time taken for operations common to our usage scenario for various Version Control Systems]]
 +
 +
The operations still missing from the benchmark are looking up and checking out intermediate versions (prior to branching off them)
 +
instead of the latest version of the branch. Since many VCSs store the latest version as-is ("full" copy) and only deltas
 +
of the intermediate versions, this might change the timings quite a bit. I'll also need to do a weighted summary to account
 +
for the prospective usage pattern (more commits than checkouts due to autosave and only few branches created).
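One possible shape for such a weighted summary is sketched below: each operation's time is multiplied by a weight reflecting how often it is expected to occur, and the result is normalised by the total weight. The function name <code>weighted_runtime</code>, the operation names, and all numbers are placeholders chosen for illustration; they are neither measurements from the benchmark nor actual usage statistics.

```python
def weighted_runtime(timings, weights):
    """Return the weighted mean time per operation for one VCS.

    timings: seconds taken per operation, e.g. {"commit": 1.0, ...}
    weights: relative frequency of each operation in the expected usage
    """
    missing = set(weights) - set(timings)
    if missing:
        raise KeyError(f"no timing for operations: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(timings[op] * w for op, w in weights.items()) / total_weight


# Placeholder usage pattern: many commits (autosave), fewer checkouts, few branches.
weights = {"commit": 10, "checkout": 3, "branch": 1}

# Placeholder timings for two hypothetical systems (NOT benchmark results).
vcs_a = {"commit": 1.0, "checkout": 4.0, "branch": 4.0}
vcs_b = {"commit": 2.5, "checkout": 1.0, "branch": 1.0}

for name, timings in (("A", vcs_a), ("B", vcs_b)):
    print(name, sum(timings.values()), round(weighted_runtime(timings, weights), 3))
```

With these placeholder values, system B has the lower unweighted total while system A has the lower weighted score, which is the kind of ranking change a commit-heavy usage pattern can cause.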
 +
 +
 +
[[Image:Op-vs-size.png|thumb|Plot showing the space occupied after the named operations have finished]]
 +
 +
While I included my favourite VCS, [http://www.gnu.org/software/gnu-arch/ GNU arch], just out of curiosity
 +
(unfortunately [http://lists.gnu.org/archive/html/gnu-arch-users/2008-11/msg00001.html not maintained anymore]),
 +
it compares very well with the other systems: it ranks second both for total (unweighted) runtime and
 +
final repository size, striking a good balance between those two goals.