Development Team/Datastore Rewrite
Revision as of 08:47, 23 September 2008
Goals
Reliability
A good DataStore doesn't lose data easily.
Performance
Queries should be fast enough for the journal to be very responsive when browsing its contents.
Activities should be able to store their data quickly and present a fast UI to their users.
The shell should be able to query the DS quickly, so that the user can resume entries from views other than the journal.
Maintainability
The original implementation tried to achieve goals that were hard to reach and that proved unnecessary at this stage. This made the code base needlessly complex, and several changes to the requirements added considerable confusion to it. We wish to focus the code on what is really needed and do that well.
Custom metadata properties
Activities should be able to store whatever metadata they wish in their entries, instead of being limited to a predefined set of properties.
More efficient file storage
Identical files should be stored just once.
Versioned entries (not yet fulfilled)
Entries may be related in version trees.
Design
Filesystem knows which entries are stored
By examining the directory structure, we can find where the data belonging to each entry is located. We no longer depend on a binary structure that could become corrupted and unusable as a whole.
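As a minimal sketch of how entries can be discovered from the directory structure alone, the Python function below walks the two-character bucket directories used in the layout example further down. The name `list_entries` and the rule that only UUID-named directories count as entries are illustrative assumptions, not part of the original design.

```python
import os
import re

# Assumption: entry ids are lowercase UUID strings, as in the layout example.
_UID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def list_entries(root):
    """Return the sorted uids of all entries found under the datastore root.

    Assumes the layout <root>/<uid[:2]>/<uid>/ ; anything that does not
    follow it (for example 'checksums' or 'index') is skipped.
    """
    uids = []
    for prefix in os.listdir(root):
        prefix_dir = os.path.join(root, prefix)
        # Only two-character bucket directories hold entries.
        if len(prefix) != 2 or not os.path.isdir(prefix_dir):
            continue
        for name in os.listdir(prefix_dir):
            entry_dir = os.path.join(prefix_dir, name)
            # An entry is a directory named after a uid that starts with its bucket name.
            if (os.path.isdir(entry_dir)
                    and name.startswith(prefix)
                    and _UID_RE.match(name)):
                uids.append(name)
    return sorted(uids)
```

Because only directory names are consulted, a damaged file inside one entry cannot hide the other entries from this scan.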
Metadata is stored as a single file for each entry
Each entry has its metadata stored in a single file, so that if one of those files became corrupted, the rest of the entries would be unaffected. The format in which metadata is stored even allows recovering from a malformed property by simply dropping it.
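The page does not specify the metadata file format. Purely as an assumption, the sketch below uses one `key=<JSON value>` line per property, to show how a line-oriented format lets a reader drop a single malformed property and keep the rest. `dump_metadata` and `load_metadata` are hypothetical names, and keys are assumed never to contain `=` or a newline.

```python
import json


def dump_metadata(props):
    """Serialize a property dict as one 'key=<JSON value>' line per property."""
    lines = []
    for key in sorted(props):
        # json.dumps escapes newlines inside strings, so each value stays on one line.
        lines.append(key + "=" + json.dumps(props[key]))
    return "\n".join(lines) + "\n"


def load_metadata(text):
    """Parse metadata text, silently dropping any line that cannot be decoded.

    A damaged property only loses itself; the other properties of the
    entry are still returned.
    """
    props = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep or not key:
            continue  # no separator or empty key: drop the line
        try:
            props[key] = json.loads(raw)
        except ValueError:
            continue  # malformed value: drop just this property
    return props
```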
Queries are accelerated with a disposable database
This allows us to query the stored entries efficiently, but since the database is only used to accelerate queries, we can drop and recreate it if it becomes corrupted or after an upgrade to an incompatible database format.
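A minimal sketch of such a disposable index, assuming Python's built-in `sqlite3` module and a simple hypothetical (uid, key, value) table; the page does not say which database or schema is used. Everything in the table is derived from the per-entry metadata, so recovering from corruption or a format change amounts to discarding the database and calling `rebuild_index` again.

```python
import sqlite3


def rebuild_index(entries):
    """Build a fresh, throwaway query index from {uid: metadata dict}.

    The index holds nothing that is not also in the per-entry metadata
    files, so it can be deleted and rebuilt at any time.
    """
    conn = sqlite3.connect(":memory:")  # a real store would use a file it may delete
    conn.execute("CREATE TABLE properties (uid TEXT, key TEXT, value TEXT)")
    rows = [
        (uid, key, str(value))
        for uid, props in entries.items()
        for key, value in props.items()
    ]
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX by_key_value ON properties (key, value)")
    return conn


def find(conn, key, value):
    """Return the sorted uids whose property `key` equals `value`."""
    cur = conn.execute(
        "SELECT DISTINCT uid FROM properties"
        " WHERE key = ? AND value = ? ORDER BY uid",
        (key, str(value)),
    )
    return [row[0] for row in cur]
```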
Detect identical files and hard-link them
This improves storage efficiency in general, but in our case it is especially important because we wish to record in the journal several interactions that refer to the same file. For example, "Downloaded lesson3.pdf", "Read lesson3.pdf" and "Sent lesson3.pdf to Juan" would all refer to the same file, which should be stored only once.
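The sketch below assumes MD5 as the checksum (the 32-hex-digit checksum in the layout example is consistent with it, but the page does not name the hash) and a `checksums` directory holding one copy per distinct content, with each entry's data file hard-linked to that copy. `file_checksum` and `store_data` are hypothetical helpers.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_data(checksums_dir, src_path, dest_path):
    """Store src_path at dest_path, sharing storage with identical files.

    The first time a checksum is seen, a copy is kept as
    checksums_dir/<checksum>; dest_path is then a hard link to it, so
    every entry with the same contents points at the same inode.
    """
    checksum = file_checksum(src_path)
    shared_path = os.path.join(checksums_dir, checksum)
    if not os.path.exists(shared_path):
        os.makedirs(checksums_dir, exist_ok=True)
        shutil.copyfile(src_path, shared_path)
    os.link(shared_path, dest_path)  # hard link: no second copy of the data
    return checksum
```

Hard links only work within one filesystem, so the `checksums` directory has to live on the same volume as the entries. Because all links share one inode, a data file must be replaced rather than modified in place, or the change would appear in every entry that shares it. This sketch also trusts the checksum and does not compare contents byte by byte.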
Layout on disk
The proposed implementation relies heavily on the data structures provided by the filesystem, so the way files are laid out on disk is a fundamental part of its design.
Example of a datastore containing five entries, two of which refer to the same file (with checksum 464493d8d929436b6152e868867ed451):
1a
  1ab88287-766a-4d98-a7c0-4233dc48647a
    1ab88287-766a-4d98-a7c0-4233dc48647a
    metadata
2b
  2b90597c-0912-4e7f-8eeb-71a0f004490d
    2b90597c-0912-4e7f-8eeb-71a0f004490d -> checksums/464493d8d929436b6152e868867ed451
    checksum ~> checksums/464493d8d929436b6152e868867ed451
    extra_metadata
    preview
    metadata
3c
  3cdf5f0e-7595-4166-b1f9-cbedfcfe1c4a
    3cdf5f0e-7595-4166-b1f9-cbedfcfe1c4a -> checksums/464493d8d929436b6152e868867ed451
    checksum ~> checksums/464493d8d929436b6152e868867ed451
    extra_metadata
    preview
    metadata
4d
  4db11d29-2f07-4452-bd8e-22a6a483ac19
    4db11d29-2f07-4452-bd8e-22a6a483ac19
    metadata
    extra_metadata
    preview
5e
  5e9f2027-b41e-4015-a848-6b3972193eb8
    5e9f2027-b41e-4015-a848-6b3972193eb8
    metadata
    extra_metadata
    preview
checksums
index
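A small hypothetical helper can derive an entry's paths from its uid under the layout above: entries are bucketed by the first two characters of their uid, and the data file carries the uid as its name. Only names visible in the example are used; the helper name `entry_paths` is an assumption.

```python
import os


def entry_paths(root, uid):
    """Return the paths used for one entry under the example layout.

    Hypothetical helper: the entry directory is <root>/<uid[:2]>/<uid>,
    the data file inside it is named after the uid, and the metadata
    file is called 'metadata'.
    """
    entry_dir = os.path.join(root, uid[:2], uid)
    return {
        "dir": entry_dir,
        "data": os.path.join(entry_dir, uid),
        "metadata": os.path.join(entry_dir, "metadata"),
    }
```

Bucketing by a two-character prefix keeps any single directory from growing with the total number of entries; with hexadecimal uids there are at most 256 buckets.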