<noinclude>{{ GoogleTrans-en | es =show | bg =show | zh-CN =show | zh-TW =show | hr =show | cs =show | da =show | nl =show | fi =show | fr =show | de =show | el =show | hi =show | it =show | ja =show | ko =show | no =show | pl =show | pt =show | ro =show | ru =show | sv =show }}</noinclude>
{{TOCright}}

== Introduction ==
This page describes the design of a new DataStore implementation that first shipped in the [[0.84]] (April 2009) Sugar release.

== Goals ==
By examining the directory structure, we know where the data related to each entry is located. We no longer depend on a binary structure that could become corrupted and unusable as a whole.

=== Metadata is stored in a single file per property ===

Metadata for each entry is stored in several files, one per property. In this way, if one of those property files becomes corrupted, the rest of the entry (and the other entries in the DS) remain unaffected.
: See [[olpc:Low-level_Activity_API#Meta_Data]].
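The following Python sketch illustrates why one file per property limits the damage of corruption: reading an entry simply skips a property file that cannot be read and keeps everything else. It assumes values are stored as raw bytes (text encoded as UTF-8) and that property names are valid file names. The function names and the write-then-rename step are illustrative assumptions, not the actual datastore code.

```python
import os
import tempfile


def write_metadata(entry_dir, metadata):
    """Store every property of `metadata` in its own file under entry_dir/metadata."""
    meta_dir = os.path.join(entry_dir, "metadata")
    os.makedirs(meta_dir, exist_ok=True)
    for name, value in metadata.items():
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        # Assumption: write to a hidden temporary file, then rename it over the
        # property file, so an interrupted write never leaves a half-written value.
        fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=meta_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, os.path.join(meta_dir, name))


def read_metadata(entry_dir):
    """Return {property: bytes}, skipping any property file that cannot be read."""
    meta_dir = os.path.join(entry_dir, "metadata")
    metadata = {}
    for name in os.listdir(meta_dir):
        if name.startswith("."):
            continue  # leftover temporary file from an interrupted write
        try:
            with open(os.path.join(meta_dir, name), "rb") as f:
                metadata[name] = f.read()
        except OSError:
            # Only this property is lost; the other properties of the entry survive.
            continue
    return metadata
```

With this scheme a damaged ''title'' file costs only the title; the entry's other properties and its data stay readable.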
    
=== Queries are accelerated with a disposable database ===
This improves storage efficiency in general, but it matters even more in our case because we want to record in the journal several interactions that refer to the same file. For example, "Downloaded lesson3.pdf", "Read lesson3.pdf" and "Sent lesson3.pdf to Juan" would all refer to the same file, which should be stored only once.
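A minimal Python sketch of storing a file only once, assuming the directory layout described in the next section (entries under ''<first two uid characters>/<uid>'', and a ''checksums/<md5>/<uid>'' marker file per entry). The function names are hypothetical; the sketch only shows the idea of reusing identical content through a hard link instead of a second copy.

```python
import hashlib
import os
import shutil


def md5sum(path, chunk_size=64 * 1024):
    """Hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_data(root, uid, source_path):
    """Store source_path as the data file of entry `uid`, sharing identical content."""
    checksum = md5sum(source_path)
    checksum_dir = os.path.join(root, "checksums", checksum)
    # Assumed layout: <root>/<first two uid characters>/<uid>/data
    target = os.path.join(root, uid[:2], uid, "data")
    os.makedirs(os.path.dirname(target), exist_ok=True)

    owners = sorted(os.listdir(checksum_dir)) if os.path.isdir(checksum_dir) else []
    if owners:
        # Identical content is already stored: hard-link it instead of copying.
        first = owners[0]
        os.link(os.path.join(root, first[:2], first, "data"), target)
    else:
        shutil.copyfile(source_path, target)

    # Record that this entry's data has this checksum.
    os.makedirs(checksum_dir, exist_ok=True)
    open(os.path.join(checksum_dir, uid), "w").close()
    return checksum
```

Storing the same ''lesson3.pdf'' for two journal entries this way leaves a single copy on disk with two directory entries pointing at it. The sketch assumes each uid is stored once; storing the same uid twice would fail because its ''data'' file already exists.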
== Layout on disk ==
The proposed implementation relies heavily on the data structures provided by the filesystem, so the layout of files on disk is a fundamental part of its design.

Example of a datastore containing 5 entries, two of which refer to the same file (with checksum 464493d8d929436b6152e868867ed451):
<pre>
1a
    1ab88287-766a-4d98-a7c0-4233dc48647a
        data
        metadata
            uid
            checksum
            activity_id
            mime_type
            preview
            share-scope
            timestamp
            title
2b
    2b90597c-0912-4e7f-8eeb-71a0f004490d
        data
        metadata
            uid
            checksum
            activity_id
            mime_type
            preview
            share-scope
            timestamp
            title
3c
    3cdf5f0e-7595-4166-b1f9-cbedfcfe1c4a
        data -> 2b/2b90597c-0912-4e7f-8eeb-71a0f004490d/data
        metadata
            uid
            checksum
            activity_id
            mime_type
            preview
            share-scope
            timestamp
            title
4d
    4db11d29-2f07-4452-bd8e-22a6a483ac19
        data
        metadata
            uid
            checksum
            activity_id
            mime_type
            preview
            share-scope
            timestamp
            title
    4d9f2027-b41e-4015-a848-6b3972193eb8
        data
        metadata
            uid
            checksum
            activity_id
            mime_type
            preview
            share-scope
            timestamp
            title
checksums
    464493d8d929436b6152e868867ed451
        2b90597c-0912-4e7f-8eeb-71a0f004490d
        3cdf5f0e-7595-4166-b1f9-cbedfcfe1c4a
index
    flintlock
    iamflint
    postlist.baseA
    postlist.baseB
    postlist.DB
    record.baseA
    record.baseB
    record.DB
    termlist.baseA
    termlist.baseB
    termlist.DB
    value.baseA
    value.baseB
    value.DB
index_updated
version
</pre>
'''1a''': directory holding the entries whose uid starts with ''1a''. Its only function is to avoid having too many directories in a single directory, which is considered especially harmful on jffs2.

'''1a/1ab88287-...-4233dc48647a''': directory holding the files related to one entry.

'''1a/1ab88287-...-4233dc48647a/data''': the file associated with the entry.

'''1a/1ab88287-...-4233dc48647a/metadata''': directory containing one file for each metadata property of the entry.

'''2b/2b90597c-...-71a0f004490d/metadata/activity_id''': file containing the value of the '''activity_id''' property.

'''3c/3cdf5f0e-...-cbedfcfe1c4a/data''': hard link to the same file as in the entry '''2b90597c-...-71a0f004490d'''.

'''checksums''': directory containing one subdirectory for each distinct file stored in the DS, named after that file's md5 checksum.

'''checksums/464493d8d929436b6152e868867ed451''': directory containing one file for each entry whose data has this checksum.

'''checksums/464493d8d929436b6152e868867ed451/2b90597c-...-71a0f004490d''': file named after the uid of an entry whose data has this checksum.

'''index''': directory containing all the files that belong to the search database. It can be deleted and recreated from the rest of the DS if needed, without any data loss.

'''index_updated''': when this file is absent, the Xapian index is being rebuilt because it could not be opened. Until the rebuild finishes, queries should fall back to displaying all the files on disk.

'''version''': file containing the version of the on-disk layout, currently 1. It is updated when a DS using an earlier layout version is migrated to a newer one.
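The layout above can be navigated with plain filesystem calls. The following Python sketch shows how a reader might locate an entry from its uid, list every entry without the index, and fall back to that listing when '''index_updated''' is missing. It assumes the '''version''' file holds a bare integer; all function names are hypothetical, and ''query_index'' stands in for whatever actually queries the Xapian database.

```python
import os

# Top-level names in the DS root that are not two-character entry directories.
RESERVED = {"checksums", "index", "index_updated", "version"}


def entry_dir(root, uid):
    """Directory of an entry: <root>/<first two characters of uid>/<uid>."""
    return os.path.join(root, uid[:2], uid)


def layout_version(root):
    """Layout version from the 'version' file (assumed to hold a bare integer)."""
    with open(os.path.join(root, "version")) as f:
        return int(f.read().strip())


def index_is_ready(root):
    """The search index should only be trusted while 'index_updated' exists."""
    return os.path.exists(os.path.join(root, "index_updated"))


def iter_uids(root):
    """Yield every entry uid by walking the directory structure alone."""
    for prefix in sorted(os.listdir(root)):
        prefix_dir = os.path.join(root, prefix)
        if prefix in RESERVED or not os.path.isdir(prefix_dir):
            continue
        for uid in sorted(os.listdir(prefix_dir)):
            if os.path.isdir(os.path.join(prefix_dir, uid)):
                yield uid


def find_all(root, query_index):
    """Query the index when it is usable; otherwise list every entry on disk."""
    if index_is_ready(root):
        return query_index()
    return list(iter_uids(root))
```

Because the listing needs nothing but the directory tree, the '''index''' directory can be thrown away and rebuilt without losing track of any entry.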
== Source code ==

http://git.sugarlabs.org/projects/sugar-datastore