Features/Semantic Web

Summary
Make use of Semantic Web technologies (such as RDF and SPARQL) to enhance data management in Sugar.

Owner

 * Name: Christophe Gueret
 * Email: 

Current status

 * Targeted release: 0.98
 * Last updated:
 * Percentage of completion: 60%

Detailed Description
The idea is to leverage the flexible RDF data model to encode the data Sugar has to deal with. Combined with SPARQL, a query language for RDF, this gives a universal query system as expressive as SQL but without the constraints imposed by relational databases (tables with a fixed schema).

The data model of RDF is a graph with directed, labelled edges. The nodes of this graph are resources, for instance a file on the disk or a journal entry. Note that these nodes are not the resources themselves but representations of them (i.e. the content of a file is not serialized as the label of a node). Edges between nodes map to relations binding resources together: given a journal entry and a file, the relation would state that the journal entry has stored the file. Each edge can thus be read as a (subject, relation, object) statement, called a triple. This model is flexible, as any kind of relation can be created, and is conceptually very close to key/value stores.
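To make the graph model concrete, here is a minimal Python sketch of a set of (subject, relation, object) triples with simple pattern matching, roughly what a single SPARQL triple pattern does. It is not the SemanticXO code and not a real triple store: the `TinyGraph` class and the `journal:`, `sugar:`, `activity:`, `tag:` and `file:` identifiers are hypothetical, and a real deployment would use full URIs.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object); None in a pattern acts as a wildcard.
Triple = Tuple[str, str, str]


class TinyGraph:
    """Hypothetical in-memory triple set, for illustration only."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # Yield every stored triple that agrees with the non-None parts of the pattern.
        for triple in self.triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


g = TinyGraph()
# Nodes stand for resources (a journal entry, a file); the file content itself is not stored.
g.add("journal:entry42", "sugar:stored", "file:///home/olpc/drawing.png")
g.add("journal:entry42", "sugar:activity", "activity:Paint")
# A new kind of relation needs no schema change, unlike adding a column to a table.
g.add("journal:entry42", "sugar:hasTag", "tag:Car")

for _, _, stored_file in g.match(s="journal:entry42", p="sugar:stored"):
    print(stored_file)
```

The last relation shows the point made about fixed schemas: nothing had to be declared before attaching a tag to the entry.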

In relation to RDF and SPARQL, the principle of "Linked Data" suggests publishing data using URIs for both the resources and the relations. Using URIs removes the risk of ambiguity in data encoding; it also makes it easier to support multiple languages, since labels in several languages can be attached to the same URI.
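As an illustration of the multilingual benefit, a concept such as "Car" can be given a single URI carrying labels tagged with a language, and the interface can pick the label that matches the user's locale. This is a minimal sketch, assuming labels are stored as (text, language tag) pairs attached with `rdfs:label`; the `http://laptop.org/ontology/Car` address is illustrative, and the English fallback label and the `label_for` helper are hypothetical. Locale parsing is deliberately simplified.

```python
from typing import Dict, List, Tuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Illustrative URI; not a published vocabulary.
CAR = "http://laptop.org/ontology/Car"

# Each literal carries a language tag, like "voiture"@fr in RDF.
# The English label is a hypothetical fallback added for this sketch.
triples: List[Tuple[str, str, Tuple[str, str]]] = [
    (CAR, RDFS_LABEL, ("voiture", "fr")),
    (CAR, RDFS_LABEL, ("coche", "es")),
    (CAR, RDFS_LABEL, ("car", "en")),
]


def label_for(resource: str, locale: str, fallback: str = "en") -> str:
    """Return the rdfs:label matching the locale (e.g. 'fr_FR.UTF-8' -> 'fr')."""
    lang = locale.replace("-", "_").split("_")[0].split(".")[0].lower()
    labels: Dict[str, str] = {
        tag: text
        for s, p, (text, tag) in triples
        if s == resource and p == RDFS_LABEL
    }
    # Fall back to the default language, then to the URI itself.
    return labels.get(lang) or labels.get(fallback) or resource


print(label_for(CAR, "fr_FR.UTF-8"))  # voiture
```

Whatever label is shown, the data itself only ever refers to the URI, so a French and a Spanish classroom tag the same resource.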

Benefit to Sugar
Sugar will benefit from the use of RDF, SPARQL, and the Linked Data publication scheme in several ways:
 * RDF is a W3C standard that is gaining adoption among public bodies (UK government, ...) as well as companies (Facebook, Google, ...). Using that data model in Sugar would ease integration with other data sources already using that format. For instance, an activity will be able to query Wikipedia (actually DBpedia, its RDF-enabled counterpart) for the population of a country, or for a list of countries sharing some particular feature. The acquired knowledge could then be stored on the XO and connected to other information.
 * SPARQL, the query language for RDF, can expose the data contained in Sugar. In a classroom, a teacher will be able to query all the XOs for their most used activity and generate statistics with a single SPARQL query sent to all the machines.
 * Implementing Features/Tags_in_Journal becomes simpler: tagging amounts to picking tags from a set of predefined resources and connecting entries to them with a "hasTag" relation. Rather than being hard-coded within Sugar, the tags can be stored on a SPARQL-enabled server on an XS and queried when needed.
 * Enhanced multi-language support. For instance, consider the tag "Car" and assume a French-speaking class and a Spanish-speaking class want to use it to tag some activity usage. French pupils will tag "voiture", their Spanish friends will most likely use "coche". Recognizing afterwards that both terms refer to a car is tricky; such consolidation is better done upfront. Linked Data solves this issue by choosing a resource for "Car", for instance "http://laptop.org/ontology/Car", and relating it to "voiture"@fr and "coche"@es with the relation "rdfs:label". Sugar will then display the French or the Spanish label depending on the locale, but always use "http://laptop.org/ontology/Car" in the backend.
 * As illustrated by the previous example, using these technologies will facilitate data integration and re-use between schools, bringing collaborative learning to a new scale.
 * Similarly to the tags, the metadata an activity uses can be controlled via an external data source and a controlled vocabulary.
 * In general, Sugar can be turned into a Semantic Desktop, a concept that covers better integration of data between desktop applications and across desktops. It would be the first of its kind aimed at a young audience, which is worth a bit of PR :)
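The classroom statistics scenario could look like the following sketch: the same SPARQL query is sent to each XO, each machine answers in the W3C SPARQL 1.1 Query Results JSON format, and the teacher's tool adds up the counts. Only the query text and the merging step are shown; the `sugar:activity` vocabulary, the `example.org` URIs, the `merge_counts` helper and the sample answers are made up for illustration, and no network access is involved.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical query; the "sugar:" vocabulary is invented for this sketch.
MOST_USED_QUERY = """
PREFIX sugar: <http://example.org/sugar#>
SELECT ?activity (COUNT(?entry) AS ?uses)
WHERE { ?entry sugar:activity ?activity . }
GROUP BY ?activity
"""


def merge_counts(responses: Iterable[Dict]) -> Counter:
    """Sum ?uses per ?activity over SPARQL JSON results returned by several XOs."""
    totals: Counter = Counter()
    for response in responses:
        for row in response["results"]["bindings"]:
            totals[row["activity"]["value"]] += int(row["uses"]["value"])
    return totals


def binding(activity: str, uses: int) -> Dict:
    # One result row in the SPARQL 1.1 Query Results JSON Format.
    return {
        "activity": {"type": "uri", "value": activity},
        "uses": {"type": "literal", "value": str(uses),
                 "datatype": "http://www.w3.org/2001/XMLSchema#integer"},
    }


def response(*rows: Dict) -> Dict:
    return {"head": {"vars": ["activity", "uses"]},
            "results": {"bindings": list(rows)}}


# Made-up answers from two XOs.
xo_a = response(binding("http://example.org/activity/Paint", 5),
                binding("http://example.org/activity/Write", 2))
xo_b = response(binding("http://example.org/activity/Write", 7))

totals = merge_counts([xo_a, xo_b])
print(totals.most_common(1))  # [('http://example.org/activity/Write', 9)]
```

Merging on the teacher's side keeps each XO's query purely local; a SPARQL 1.1 federated query would be an alternative design.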

Scope
The scope depends on the degree of adoption considered. It ranges from installing a Python package that lets activities query RDF data to a complete redesign of the Journal data store.

UI Design
No UI should be needed, as most of the changes concern the code of Sugar and of activities.

How To Test
All the code is available on Sugar Labs.

User Experience
Users will eventually see that:
 * Activities can share information
 * They can directly contribute to the translation of parts of the software
 * They can share and link Journal entries from different instances of Sugar, even if remotely connected

Developers of activities will have:
 * An easy way to store graph-shaped data
 * A query system for fetching information published in RDF, for instance the data sets of the Linked Open Data (LOD) Cloud
 * A mechanism to query data from other Sugar instances
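As a sketch of such a query system, assuming the endpoint follows the SPARQL 1.1 Protocol (the query passed as the `query` parameter of an HTTP GET, results requested as `application/sparql-results+json`), Python's standard library is enough to prepare a request for DBpedia's public endpoint. `dbo:populationTotal` is a property DBpedia uses for populated places, but the exact property and resource names depend on DBpedia's current data; the helpers `population_query` and `sparql_get` are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def population_query(country: str) -> str:
    # "country" is the local name of a DBpedia resource, e.g. "Uruguay".
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        f"SELECT ?population WHERE {{ dbr:{country} dbo:populationTotal ?population . }}"
    )


def sparql_get(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get(DBPEDIA_ENDPOINT, population_query("Uruguay"))
# Show the query exactly as the endpoint will decode it.
print(parse_qs(urlparse(req.full_url).query)["query"][0])
# Sending it requires network access, e.g.:
#   import json, urllib.request
#   answer = json.load(urllib.request.urlopen(req))
```

The same helper would apply to another Sugar instance if it exposed a SPARQL endpoint; its address would have to be known in advance.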

Dependencies
Most notably, storing RDF data on Sugar and querying it with SPARQL requires the addition of a triple store. Triple stores are RDF databases optimized for storing and serving data in that format. Most triple stores have been designed for servers and are too big to run on an XO. So far, promising performance has been achieved with RedStore. Another alternative to consider would be the popular 4store. These two, however, seem to be the only ones usable on XO hardware, and currently neither of them is packaged for Fedora.

Contingency Plan
Reverting to the previous release's behaviour should be enough to solve all problems.

Documentation
This feature is being investigated under the contributor project "SemanticXO". It has been explained and demoed at various conferences and demo events. Most of the documentation can be found on the project's blog.

Comments and Discussion

 * See the discussion tab for this feature.