Version support for datastore/Progress: Difference between revisions
Sascha silbe (talk | contribs) report on progress |
Sascha silbe (talk | contribs) →2009-06-01: report on VCS evaluation progress |
would need to remember and pass through the resumed <code>version_id</code> in order
to use the corresponding branch on save.

Started evaluating Version Control Systems (VCS). Of the 27 systems on the
[http://better-scm.berlios.de/comparison/comparison.html comparison list] compiled by
the "Better SCM Initiative", 13 are open source, and 8 of those are shipped by
all of Debian, Fedora and Ubuntu (the systems currently officially supported
in [[Development Team/Jhbuild|sugar-jhbuild]]).

While writing a [http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/trees/master/benchmarks benchmark]
to help eliminate further candidates, I noticed that our use case is quite distinct from the one targeted by most systems
(storing a small number of projects, each carrying source code, i.e. a large number of '''related''' files):
# we're going to store a large number of '''un'''related entries (i.e. "projects" in traditional VCS nomenclature)
# most of our entries are going to be rather small (compared to entire source trees)
# for space efficiency, we don't want to keep working copies around after the activity using them has finished

Point 3 offers an excellent opportunity to tune VCSs that expose their low-level primitives (e.g. git) to our
use case, as we might be able to access the repository directly instead of using the working directory as intermediate
storage. This will only affect timing, not repository size, though.
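
To illustrate what bypassing the working directory could look like, here is a minimal sketch that writes and reads a single entry in git's loose-object format using only the Python standard library. It is not the benchmark's code: the function names <code>write_blob</code> and <code>read_blob</code> are made up for this example, and it assumes a SHA-1 based object store.

```python
import hashlib
import os
import zlib


def _object_path(objects_dir, object_id):
    # Loose objects live at <objects_dir>/<first 2 hex digits>/<remaining digits>.
    return os.path.join(objects_dir, object_id[:2], object_id[2:])


def write_blob(objects_dir, data):
    """Store `data` (bytes) as a git-style loose blob and return its object ID."""
    # A blob is hashed and stored as b"blob <size>\0" followed by the content.
    store = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
    object_id = hashlib.sha1(store).hexdigest()
    path = _object_path(objects_dir, object_id)
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return object_id


def read_blob(objects_dir, object_id):
    """Return the content of a loose blob without touching any working copy."""
    with open(_object_path(objects_dir, object_id), "rb") as f:
        store = zlib.decompress(f.read())
    header, _, data = store.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(data):
        raise ValueError("not a valid blob object: %s" % object_id)
    return data
```

This only covers the content itself. A complete save would also have to record a tree, a commit and a ref update, for which git offers plumbing commands such as <code>git mktree</code>, <code>git commit-tree</code> and <code>git update-ref</code>.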

The sample set chosen for the benchmark (789 text files from [http://www.gutenberg.org/ Project Gutenberg], 295 MB) occupied
my desktop for about 10 hours. I have done only a single run so far (in multi-user mode), but the numbers should be accurate
enough for an initial impression.
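
Because the figures come from a single run, repeating each operation and keeping the fastest time would be one way to reduce noise from other processes. The following is a rough sketch of such a timing loop, not the code of the linked benchmark; <code>time_operations</code> and its parameters are hypothetical names.

```python
import time


def time_operations(operations, repeat=1):
    """Return {name: fastest wall-clock time in seconds over `repeat` runs}.

    `operations` is a list of (name, callable) pairs; each callable stands
    in for one VCS operation such as a commit or a checkout.
    """
    results = {}
    for name, func in operations:
        best = None
        for _ in range(repeat):
            start = time.perf_counter()
            func()
            elapsed = time.perf_counter() - start
            if best is None or elapsed < best:
                best = elapsed
        results[name] = best
    return results
```

With <code>repeat=1</code> this corresponds to the single run reported here. Larger values trade benchmark duration for more stable numbers.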

==== Benchmark results ====
[[Image:Op-vs-time.png|thumb|Plot showing the time taken by various Version Control Systems for operations common to our usage scenario]]
The operations still missing from the benchmark are looking up and checking out intermediate versions (prior to branching)
instead of the latest version of the branch. Since many VCSs store the latest version as-is (a "full" copy) and intermediate
versions only as deltas, this might change the timings quite a bit. I'll also need to do a weighted summary to account
for the prospective usage pattern (more commits than checkouts due to autosave, and only a few branches created).
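
A weighted summary of this kind could be computed as sketched below. All numbers and the names <code>vcs_a</code> and <code>vcs_b</code> are invented for illustration and are not benchmark results; the weights merely mirror the expected pattern of many commits, fewer checkouts and very few branches.

```python
def weighted_runtime(timings, weights):
    """Sum per-operation times, each scaled by its expected frequency."""
    missing = set(weights) - set(timings)
    if missing:
        raise KeyError("no timing for: %s" % ", ".join(sorted(missing)))
    return sum(timings[op] * count for op, count in weights.items())


# Hypothetical usage pattern: expected number of each operation.
weights = {"commit": 100, "checkout": 10, "branch": 1}

# Hypothetical per-operation timings in seconds.
timings = {
    "vcs_a": {"commit": 2.0, "checkout": 1.0, "branch": 0.5},
    "vcs_b": {"commit": 1.0, "checkout": 4.0, "branch": 3.0},
}

# Systems ordered from fastest to slowest under the weighted pattern.
ranking = sorted(timings, key=lambda vcs: weighted_runtime(timings[vcs], weights))
```

In this made-up example <code>vcs_a</code> has the lower unweighted total (3.5 s versus 8.0 s), but <code>vcs_b</code> ranks first once commits dominate, which is why the weighting matters for the final choice.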
[[Image:Op-vs-size.png|thumb|Plot showing the space occupied after the named operations have finished]]

While I included my favourite VCS, [http://www.gnu.org/software/gnu-arch/ GNU arch], just out of curiosity
(it is unfortunately [http://lists.gnu.org/archive/html/gnu-arch-users/2008-11/msg00001.html no longer maintained]),
it compares very well with the other systems: it takes second place for both total (unweighted) runtime and
final repository size, striking a good balance between those two goals.