Development Team/Datastore Rewrite: Difference between revisions

Revision as of 07:54, 23 September 2008

Goals

Reliability

A good DataStore doesn't lose data easily.

Performance

Queries should be fast enough for the journal to be very responsive when browsing its contents.

Activities should be able to store their data quickly and present a fast UI to their users.

The shell should be able to query the DS quickly, so that the user can resume entries from views other than the journal.

Custom metadata properties

Activities should be able to store whatever metadata they wish in their entries, and should not be limited to a predefined set of properties.

More efficient file storage

Identical files should be stored just once.

Versioned entries (not fulfilled yet)

Entries may be related in version trees.

Design

Filesystem knows which entries are stored

By examining the directory structure, we know where the data related to each entry is located. We no longer depend on a single binary structure that could become corrupted and unusable as a whole.
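As an illustration, discovering entries from the directory tree alone could look like the sketch below. The layout is an assumption made for this example (one directory per entry, grouped under the first two characters of its uid), and `entry_path` and `list_entries` are hypothetical names, not the DataStore's actual API.

```python
import os


def entry_path(root, uid):
    """Return the directory holding everything for one entry.

    Assumed layout: <root>/<first two characters of uid>/<uid>/
    """
    return os.path.join(root, uid[:2], uid)


def list_entries(root):
    """Discover stored entries purely by walking the directory tree."""
    uids = []
    if not os.path.isdir(root):
        return uids
    for prefix in sorted(os.listdir(root)):
        prefix_dir = os.path.join(root, prefix)
        if not os.path.isdir(prefix_dir):
            continue  # ignore stray files at the top level
        for uid in sorted(os.listdir(prefix_dir)):
            if os.path.isdir(os.path.join(prefix_dir, uid)):
                uids.append(uid)
    return uids
```

Because no central catalogue is consulted, damage to one entry's directory does not prevent the others from being listed.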

Metadata is stored as a single file for each entry

Each entry has its metadata stored in a single file, so that if one of those files becomes corrupted, the rest of the entries are unaffected. The format in which metadata is stored even allows recovery from a malformed property by simply dropping it.
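The text above does not specify the metadata format. As a minimal sketch, assume one property per line, each line a JSON `[key, value]` pair, so that a damaged line can be skipped without losing the others. The function names are hypothetical, and values are assumed to be JSON-serializable.

```python
import json


def write_metadata(path, properties):
    """Write one JSON [key, value] pair per line (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in properties.items():
            f.write(json.dumps([key, value]) + "\n")


def read_metadata(path):
    """Read properties back, dropping any line that fails to parse.

    Returns (properties, number_of_dropped_lines).
    """
    properties = {}
    dropped = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                pair = json.loads(line)
                if not (isinstance(pair, list) and len(pair) == 2
                        and isinstance(pair[0], str)):
                    raise ValueError("not a [key, value] pair")
            except ValueError:
                dropped += 1  # malformed property: skip it, keep the rest
                continue
            properties[pair[0]] = pair[1]
    return properties, dropped
```

A line-oriented format keeps the damage from a partial write local to the property being written.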

Queries are accelerated with a disposable database

The database lets us query the stored entries efficiently. Because we use it only to accelerate queries, we can drop and recreate it after corruption or after an upgrade to an incompatible database format.
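The drop-and-recreate idea can be sketched as follows. SQLite is used here only as a stand-in, since the text does not name the database. The `entries` argument is assumed to be an iterable of `(uid, properties)` pairs read back from the per-entry metadata files, and `rebuild_index` and `find` are hypothetical names.

```python
import os
import sqlite3


def rebuild_index(db_path, entries):
    """Discard any existing index file and rebuild it from scratch."""
    if os.path.exists(db_path):
        # The index is only a cache of the metadata files, so deleting it
        # is safe even when it is corrupted or in an old format.
        os.remove(db_path)
    conn = sqlite3.connect(db_path)
    with conn:  # commit everything as one transaction
        conn.execute("CREATE TABLE properties (uid TEXT, key TEXT, value TEXT)")
        conn.execute("CREATE INDEX idx_key_value ON properties (key, value)")
        for uid, properties in entries:
            conn.executemany(
                "INSERT INTO properties VALUES (?, ?, ?)",
                [(uid, key, str(value)) for key, value in properties.items()],
            )
    return conn


def find(conn, key, value):
    """Return the uids of entries whose property `key` equals `value`."""
    rows = conn.execute(
        "SELECT uid FROM properties WHERE key = ? AND value = ? ORDER BY uid",
        (key, str(value)),
    )
    return [row[0] for row in rows]
```

Since the metadata files remain the authoritative copy, a rebuild costs time but never loses data.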

Detect identical files and hard-link them

Hard-linking identical files improves storage efficiency in general, but it matters more in our case because we want the journal to record several interactions that refer to the same file. For example, "Downloaded lesson3.pdf", "Read lesson3.pdf", "Sent lesson3.pdf to Juan" would all refer to the same file, which we need to store only once.
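One way to detect identical files is to key them by a content checksum and hard-link every entry to a single canonical copy. The sketch below assumes SHA-256 as the checksum, a hypothetical `checksum_dir` holding the canonical copies, and that all paths live on the same filesystem (a requirement for hard links). `store_file` is an illustrative name, not the DataStore's actual function.

```python
import hashlib
import os


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_file(source, dest, checksum_dir):
    """Store `source` at `dest`, sharing disk space with identical files.

    `dest` must not exist yet; `source` is consumed by the call.
    """
    os.makedirs(checksum_dir, exist_ok=True)
    canonical = os.path.join(checksum_dir, file_checksum(source))
    if os.path.exists(canonical):
        os.remove(source)  # identical content is already stored
    else:
        os.replace(source, canonical)  # first copy becomes the canonical one
    os.link(canonical, dest)  # the entry's file is a hard link to it
```

With this scheme, the three journal entries for lesson3.pdf would each hold a hard link to the same data. A stricter variant could compare file contents byte by byte before linking, rather than trusting the checksum alone.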
