Line 76: |
Line 76: |
| would need to remember and pass through the resumed <code>version_id</code> in order | | would need to remember and pass through the resumed <code>version_id</code> in order |
| to use the corresponding branch on save. | | to use the corresponding branch on save. |
| + | |
| + | I have started evaluating Version Control Systems (VCSs). Of the 27 systems on the
| + | [http://better-scm.berlios.de/comparison/comparison.html comparison list] compiled by
| + | the "Better SCM Initiative", 13 are open source, and 8 of those are shipped by
| + | all of Debian, Fedora and Ubuntu (the distributions currently officially supported
| + | in [[Development Team/Jhbuild|sugar-jhbuild]]).
| + | |
| + | While writing a [http://git.sugarlabs.org/projects/versionsupport-project/repos/mainline/trees/master/benchmarks benchmark]
| + | to help eliminate further candidates, I noticed that our use case differs considerably from the one most systems target
| + | (storing a small number of projects, each consisting of source code, i.e. a large number of '''related''' files):
| + | |
| + | # we're going to store a large number of '''un'''related entries (i.e. "projects" in traditional VCS nomenclature) |
| + | # most of our entries are going to be rather small (compared to entire source trees) |
| + | # for space efficiency, we don't want to keep working copies around after the activity using them has finished
| + | |
| + | Point 3 offers an excellent chance for VCSs that expose their low-level primitives (e.g. git) to be tuned to our
| + | use case, as we might be able to access the repository directly instead of using the working directory as intermediate
| + | storage. This would only affect timing, not repository size, though.
| + | |
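As a rough illustration of what "accessing the repository directly" could look like for git, the following sketch stores and retrieves versions of a single entry in a bare repository using only plumbing commands, so no working copy is ever created. It is not taken from the benchmark: the helper names (<code>init_store</code>, <code>save_version</code>, <code>load_version</code>), the fixed commit identity and the commit message are assumptions made for this example.

```python
import os
import subprocess

# Hypothetical fixed identity so that commit-tree works without user-level git configuration.
_IDENTITY = {
    "GIT_AUTHOR_NAME": "benchmark",
    "GIT_AUTHOR_EMAIL": "benchmark@example.invalid",
    "GIT_COMMITTER_NAME": "benchmark",
    "GIT_COMMITTER_EMAIL": "benchmark@example.invalid",
}


def git(repo, *args, data=None):
    """Run a git command against the repository at `repo`; return stdout as bytes."""
    result = subprocess.run(
        ["git", "--git-dir", repo, *args],
        input=data,
        stdout=subprocess.PIPE,
        check=True,
        env={**os.environ, **_IDENTITY},
    )
    return result.stdout


def init_store(repo):
    """Create a bare repository, i.e. one without a working directory."""
    subprocess.run(["git", "init", "--quiet", "--bare", repo], check=True)


def save_version(repo, name, content, parent=None):
    """Store `content` (bytes) as a new commit and return the commit id."""
    # Write the file content straight into the object database.
    blob = git(repo, "hash-object", "-w", "--stdin", data=content).decode().strip()
    # Build a one-entry tree from an ls-tree formatted line.
    entry = f"100644 blob {blob}\t{name}\n".encode()
    tree = git(repo, "mktree", data=entry).decode().strip()
    # Create the commit object, optionally chained to a parent version.
    args = ["commit-tree", "-m", "autosave"]
    if parent:
        args += ["-p", parent]
    return git(repo, *args, tree).decode().strip()


def load_version(repo, commit, name):
    """Read the content of `name` as stored in `commit`, without checking anything out."""
    return git(repo, "cat-file", "blob", f"{commit}:{name}")
```

A real implementation would also have to point a ref at each new commit (e.g. with <code>git update-ref</code>), since unreferenced objects can eventually be garbage collected.
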
| + | The sample set chosen for the benchmark (789 text files from [http://www.gutenberg.org/ Project Gutenberg], 295 MB) occupied
| + | my desktop for about 10 hours, so I have done only a single run so far (in multi-user mode). The numbers should still be
| + | accurate enough for an initial impression.
| + | |
| + | ==== Benchmark results ==== |
| + | |
| + | [[Image:Op-vs-time.png|thumb|Plot showing the time taken for operations common to our usage scenario for various Version Control Systems]] |
| + | |
| + | The operations still missing from the benchmark are looking up and checking out intermediate versions (prior to branching
| + | from them) instead of the latest version of a branch. Since many VCSs store the latest version as-is ("full" copy) and only
| + | deltas for the intermediate versions, this might change the timings quite a bit. I'll also need to do a weighted summary to
| + | account for the prospective usage pattern (more commits than checkouts due to autosave, and only a few branches created).
| + | |
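The weighted summary mentioned above could be computed along these lines. This is only a sketch: the operation names and the weights are placeholders chosen for illustration, not measured usage data, and would have to be replaced by the operations actually benchmarked and an estimate of how often each occurs.

```python
# Hypothetical relative frequencies: autosave makes commits far more common
# than checkouts, and branches are created only rarely.
EXAMPLE_WEIGHTS = {"commit": 10, "checkout": 2, "branch": 1}


def weighted_total(timings, weights):
    """Sum the per-operation timings (seconds), each scaled by its weight."""
    return sum(weights[op] * seconds for op, seconds in timings.items())


def rank(results, weights):
    """Return the system names ordered from lowest to highest weighted total."""
    return sorted(results, key=lambda name: weighted_total(results[name], weights))
```

With weights like these, a system with fast commits but slow checkouts can rank ahead of one that has the lower unweighted total.
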
| + | |
| + | [[Image:Op-vs-size.png|thumb|Plot showing the space occupied after the named operations have finished]] |
| + | |
| + | Although I included my favourite VCS, [http://www.gnu.org/software/gnu-arch/ GNU arch], just out of curiosity
| + | (unfortunately it is [http://lists.gnu.org/archive/html/gnu-arch-users/2008-11/msg00001.html no longer maintained]),
| + | it compares very well with the other systems: it ranks second for both total (unweighted) runtime and
| + | final repository size, striking a good balance between those two goals.