Version support for datastore/Progress

Revision as of 09:53, 5 June 2009

2,803 bytes added, 09:53, 5 June 2009

→‎2009-06-01: report on VCS evaluation progress

Line 76: Line 76:

would need to remember and pass through the resumed <code>version_id</code> in order

to use the corresponding branch on save.

+

Started evaluating Version Control Systems (VCS). Of the 27 systems on the

+

[http://better-scm.berlios.de/comparison/comparison.html comparison list] done by

+

the "Better SCM Initiative", 13 are open source with 8 of them being shipped by

+

all of Debian, Fedora and Ubuntu (the systems currently officially supported

+

in [[Development Team/Jhbuild|sugar-jhbuild]]).

+

While writing a [http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/trees/master/benchmarks benchmark]

+

to help in further elimination of candidates, I noticed our use case is actually quite distinct from that targeted by most systems

+

(which is storing a small number of projects each carrying source code, i.e. a large number of '''related''' files):

+

# we're going to store a large number of '''un'''related entries (i.e. "projects" in traditional VCS nomenclature)

+

# most of our entries are going to be rather small (compared to entire source trees)

+

# for space efficiency, we don't want to keep working copies around after the activity using them has finished

+

Point 3 offers an excellent chance for VCSs which expose their low-level working primitives (e.g. git) to be tuned to our

+

use case as we might be able to directly access the repository instead of using the working directory as intermediate

+

storage. It will only affect timing, not repository size, though.
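As a rough illustration of point 3, the following sketch stores and retrieves versions of a single entry in a bare git repository using only git's low-level ("plumbing") commands, so no working copy is ever created. It is not the code used in the benchmark. The helper names (<code>init_repo</code>, <code>save_version</code>, <code>load_version</code>), the branch name, the commit message and the author identity are all hypothetical placeholders, and a <code>git</code> executable is assumed to be on the path.

```python
import os
import subprocess

# Placeholder identity so commit-tree works without any user configuration.
IDENTITY = {
    "GIT_AUTHOR_NAME": "datastore",
    "GIT_AUTHOR_EMAIL": "datastore@localhost",
    "GIT_COMMITTER_NAME": "datastore",
    "GIT_COMMITTER_EMAIL": "datastore@localhost",
}


def git(repo, *args, stdin=None):
    """Run one git command against the repository and return its stdout."""
    result = subprocess.run(
        ["git", "--git-dir=" + repo, *args],
        input=stdin,
        stdout=subprocess.PIPE,
        check=True,
        env=dict(os.environ, **IDENTITY),
    )
    return result.stdout


def init_repo(repo):
    """Create a bare repository, i.e. one without a working directory."""
    subprocess.run(["git", "init", "--bare", "-q", repo], check=True)


def save_version(repo, branch, name, data, parent=None):
    """Write data as a new commit on branch; return the commit id."""
    # Store the content as a blob object, reading it from stdin.
    blob = git(repo, "hash-object", "-w", "--stdin", stdin=data).decode().strip()
    # Build a tree containing just this one file.
    entry = "100644 blob {}\t{}\n".format(blob, name).encode()
    tree = git(repo, "mktree", stdin=entry).decode().strip()
    # Create the commit, linking it to the previous version if there is one.
    args = ["commit-tree", tree, "-m", "autosave"]
    if parent is not None:
        args += ["-p", parent]
    commit = git(repo, *args).decode().strip()
    # Point the branch at the new commit.
    git(repo, "update-ref", "refs/heads/" + branch, commit)
    return commit


def load_version(repo, commit, name):
    """Read the file contents of a given version straight from the object store."""
    return git(repo, "cat-file", "blob", "{}:{}".format(commit, name))
```

Passing the commit id returned by an earlier <code>save_version</code> call as <code>parent</code> chains the versions together. Each call still spawns several git processes, so whether this is actually faster than going through a working directory would have to be measured.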

+

The sample set chosen for the benchmark (789 text files from [http://www.gutenberg.org/ Project Gutenberg], 295 MB) occupied

+

my desktop for about 10 hours, so I've done only a single run (in multi-user mode) so far; still, the numbers should be accurate

+

enough for an initial impression.
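For readers who want to reproduce this kind of measurement, here is a minimal sketch of how the time taken by one operation and the space occupied afterwards could be recorded. It is not the linked benchmark. The names <code>directory_size</code> and <code>measure</code> are hypothetical, and <code>operation</code> stands for any callable that performs a single VCS operation (e.g. a commit).

```python
import os
import time


def directory_size(path):
    """Sum the sizes in bytes of all files below path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def measure(operation, repo_path):
    """Run operation once; return (elapsed seconds, repository size in bytes)."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    return elapsed, directory_size(repo_path)
```

Wall-clock time from a single run on a machine in multi-user mode is noisy, so repeated runs would be needed for anything beyond a first impression.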

+

==== Benchmark results ====

+

[[Image:Op-vs-time.png|thumb|Plot showing the time taken for operations common to our usage scenario for various Version Control Systems]]

+

The operations still missing from the benchmark are looking up and checking out intermediate versions (prior to branching from them)

+

instead of the latest version of the branch. Since many VCSs store the latest version as-is ("full" copy) and only deltas

+

of the intermediate versions, this might change the timings quite a bit. I'll also need to do a weighted summary to account

+

for the prospective usage pattern (more commits than checkouts due to autosave, and only a few branches created).
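The weighted summary has not been done yet. One simple way to compute it would be a weighted sum of the per-operation timings, sketched below. The function name <code>weighted_total</code>, the operation names and all numbers are made-up placeholders for illustration, not benchmark results; the weights merely mirror the expected pattern of many commits, fewer checkouts and only a few branches.

```python
def weighted_total(timings, weights):
    """Sum the per-operation timings, each scaled by how often the operation is expected."""
    return sum(weights[op] * seconds for op, seconds in timings.items())


# Hypothetical relative frequencies of each operation.
example_weights = {"commit": 10, "checkout": 3, "branch": 1}

# Hypothetical per-operation timings in seconds for one VCS.
example_timings = {"commit": 2.0, "checkout": 1.0, "branch": 4.0}

example_score = weighted_total(example_timings, example_weights)
```

Computing this score for each system and sorting by it would give a ranking that reflects the usage pattern rather than the unweighted total.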

+

[[Image:Op-vs-size.png|thumb|Plot showing the space occupied after the named operations have finished]]

+

While I included my favourite VCS, [http://www.gnu.org/software/gnu-arch/ GNU arch], just out of curiosity

+

(unfortunately [http://lists.gnu.org/archive/html/gnu-arch-users/2008-11/msg00001.html not maintained anymore]),

+

it compares very well with the other systems: it's in second place both for total (unweighted) runtime and

+

final repository size, giving a good balance between those two goals.

Sascha silbe
