Difference between revisions of "Features/Semantic Web"
Latest revision as of 03:30, 3 October 2012
Summary
Make use of Semantic Web technologies (for instance RDF, SPARQL) to enhance data management in Sugar.
Owner
- Name: Christophe Gueret
- Email: <christophe.gueret@gmail.com>
Current status
- Targeted release: 0.98
- Last updated: 2012-10-03 03:30:33
- Percentage of completion: 60%
Detailed Description
The idea is to leverage the flexible RDF data model to encode the data Sugar has to deal with. In association with SPARQL, a query language for RDF, it will be possible to have a universal query system as expressive as SQL but without the constraints imposed by databases (tables with fixed schema).
The model for the data encoded in RDF is that of a graph with directed, labelled edges. The nodes of this graph are resources, for instance a file on the disk or a journal entry. Note that these nodes are not the resources themselves but a representation of them (i.e. the content of a file is not serialized as the label of a node). Edges between the nodes map to relations binding resources together. Considering a journal entry and a file, the relation would be that the journal entry has stored the file. This model is flexible, as any kind of relation can be created. It is conceptually very close to key/value stores.
In relation to RDF and SPARQL, the principle of "Linked Data" suggests publishing data by using URIs for both the resources and the relations. The use of URIs removes the risk of ambiguity in data encoding and makes it easier to support multiple languages.
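The graph model described above can be sketched in a few lines of plain Python. This is only an illustration of the idea of triples and pattern matching, not the SemanticXO API: the URIs and the `match` helper are hypothetical, and a real implementation would rely on a triple store and SPARQL instead.

```python
# Hypothetical URIs, chosen only for illustration; they are not
# the vocabulary actually used by SemanticXO.
ENTRY = "http://example.org/journal/entry1"
FILE = "http://example.org/files/drawing.png"
HAS_STORED = "http://example.org/vocab/hasStored"
TITLE = "http://example.org/vocab/title"

# A graph is a set of (subject, predicate, object) triples.
# The node for the file is a URI standing for the file, not its content.
graph = {
    (ENTRY, HAS_STORED, FILE),
    (ENTRY, TITLE, "My drawing"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which files has this journal entry stored?
stored = [o for (_, _, o) in match(graph, s=ENTRY, p=HAS_STORED)]
```

Adding a new kind of relation only requires adding a triple with a new predicate URI; no schema has to be changed, which is the flexibility the description refers to.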
Benefit to Sugar
Sugar will benefit from the usage of RDF, SPARQL, and the Linked Data publication scheme at several points:
- RDF is a standard promoted by the W3C that is gaining adoption by public bodies (UK gov, ...) as well as companies (Facebook, Google, ...). Using that data model in Sugar would ease data integration with other data sources already using that format. For instance, an activity will be able to query Wikipedia (actually DBpedia, its RDF-enabled flavor) for the population of a country, or a list of countries sharing some particular feature. The acquired knowledge could then be stored on the XO and connected to other information.
- SPARQL is a query mechanism for RDF data that will expose the data contained in Sugar. In a classroom, a teacher will be able to query all the XOs for their most used activity and generate statistics with a single SPARQL query sent to all the machines.
- The implementation of Features/Tags_in_Journal is facilitated, as it maps to picking tags from a set of pre-defined resources and connecting entries to them with a "hasTag" relation. Rather than being hard-coded within Sugar, the tags can be stored on a SPARQL-enabled server on an XS and queried when needed.
- Enhanced multi-language support. For instance, let's consider the tag "Car" and assume a French-speaking class and a Spanish-speaking class both want to use it to tag some activity usage. French pupils will tag "voiture"; their Spanish friends will most likely use "coche". Working out afterwards that the two terms both refer to a car is tricky, so data consolidation is better done upfront. Linked Data proposes to solve this issue by picking a resource for "Car", for instance "http://laptop.org/ontology/Car", and relating it to "voiture"@fr and "coche"@es with the relation "rdfs:label". Sugar will then display the French label or the Spanish label depending on the locale but always use "http://laptop.org/ontology/Car" in the backend.
- As illustrated by the previous example, using these technologies will facilitate data integration and re-use between schools, bringing collaborative learning to a new scale.
- Similarly to the tags, the metadata an activity uses can be controlled via an external data source and a controlled vocabulary.
- In general, Sugar can be turned into a Semantic Desktop, a concept which relates to a better integration of data between desktop applications and across desktops. It will be the first of this kind reaching a young public; that's something worth a bit of PR :)
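The multi-language tag example in the list above can be sketched as follows. This is a minimal illustration in plain Python, assuming labels are attached with the standard RDF Schema property `rdfs:label`; the `label_for` helper and the way triples are held in a list are hypothetical and are not taken from SemanticXO.

```python
CAR = "http://laptop.org/ontology/Car"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each label is a (text, language tag) pair attached to the same resource.
labels = [
    (CAR, RDFS_LABEL, ("voiture", "fr")),
    (CAR, RDFS_LABEL, ("coche", "es")),
]


def label_for(resource, locale, triples, fallback="en"):
    """Pick the label matching the locale's language (hypothetical helper).

    Falls back to the fallback language, then to the URI itself.
    """
    lang = locale.split("_")[0].lower()
    found = {
        literal[1]: literal[0]
        for subject, predicate, literal in triples
        if subject == resource and predicate == RDFS_LABEL
    }
    return found.get(lang) or found.get(fallback) or resource


french = label_for(CAR, "fr_FR", labels)
spanish = label_for(CAR, "es_ES", labels)
```

Whatever label is displayed, the tag stored in the backend stays the URI of the resource, so entries tagged in a French class and in a Spanish class end up pointing to the same node.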
Scope
The scope will vary with the degree of adoption considered. It ranges from installing a Python package that lets activities query RDF data to a complete redesign of the Journal data store.
UI Design
There should be no need for a UI, as most of the changes concern the code of Sugar and activities.
How To Test
All the code is available on SugarLabs (http://git.sugarlabs.org/semanticxo).
It is possible to test SemanticXO on either an XO-1 or a desktop PC. However, it is recommended to use an XO-1, as the installation is easier there and the XO-1 is the primary target for all the development.
On an XO-1 running the software 12.1.0
- Put the files "setup.sh" and "semanticxo.tar.gz" somewhere on the XO
- Log in as root, make setup.sh executable and type "./setup.sh setup". This will install the API of SemanticXO, the triple store and two demo activities
- Reboot
On a desktop PC
This is a step-by-step instruction guide to test the triple-store-based Journal backend. It assumes that all the packages for Sugar are available and that sugar-emulator can be run.
- Download and install RedStore
Follow the instructions on http://www.aelius.com/njh/redstore/ to download and install RedStore on the test machine. It should also be possible to use other triple stores such as Virtuoso or OWLIM as long as a SPARQL 1.1 compliant service is accessible on the port 8080.
- Install the dependencies for SemanticXO
- RDFLib http://code.google.com/p/rdflib/
- SPARQLWrapper http://sparql-wrapper.sourceforge.net/
- Download and install the code from SemanticXO
Clone the repository:
git clone git://git.sugarlabs.org/semanticxo/main.git semanticxo
Edit lines 4 and 5 of datastore/bin/datastore-service so that they match the directory where you cloned the code (by default ~/Code/SemanticXO).
Locate the datastore-service startup script of the normal datastore:
type datastore-service
Replace that daemon with datastore/bin/datastore-service
- Start redstore in debug mode, so that you will see the queries being executed ;-)
redstore -v
- Start the sugar emulator
sugar-emulator
- Play around with Sugar, creating and updating entries in the Journal
- With a web browser, within the emulator or outside, go to http://localhost:8080/ to see the content of the triple store. There should be one named graph per Journal entry
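The last check can also be scripted. The sketch below lists the named graphs held by the triple store using only the Python standard library. It assumes that the SPARQL endpoint is served at the path /sparql on port 8080 and that it can return results in the standard SPARQL JSON results format; both points should be verified against the triple store actually installed. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: the triple store exposes its SPARQL endpoint at this path.
ENDPOINT = "http://localhost:8080/sparql"

# Standard SPARQL query listing every named graph that contains a triple.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"


def build_query_url(endpoint, query):
    """Build the GET URL defined by the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


def graph_names(results_json):
    """Extract the graph URIs from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [binding["g"]["value"] for binding in data["results"]["bindings"]]


# To run against a live store, fetch build_query_url(ENDPOINT, LIST_GRAPHS)
# with the header "Accept: application/sparql-results+json" and pass the
# response body to graph_names().
```

After creating a few Journal entries, the number of graph names returned should match the number of entries.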
User Experience
Users will eventually see that:
- Activities can share information
- They can directly contribute to the translation of parts of the software
- They can share and link Journal entries from different instances of Sugar, even if remotely connected
Developers of activities will have:
- An easy way to store graph-shaped data
- A query system for fetching information published in RDF, for instance those in the LOD Cloud
- A mechanism to query data from other Sugar instances
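As a rough idea of what querying RDF sources could look like for an activity developer, the sketch below prepares SPARQL protocol requests with the Python standard library. It is not the SemanticXO API. The DBpedia endpoint and the `dbo:populationTotal` property are the public ones; the XO host names and the `build_request` helper are hypothetical, and the requests are only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Population of a country, asked to DBpedia.
POPULATION_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?population WHERE {
  <http://dbpedia.org/resource/France> dbo:populationTotal ?population .
}
"""
dbpedia_request = build_request("http://dbpedia.org/sparql", POPULATION_QUERY)

# The same query can be prepared for several Sugar instances.
# Hypothetical host names; each XO is assumed to expose a SPARQL endpoint.
CLASSROOM_ENDPOINTS = [
    "http://xo-01.local:8080/sparql",
    "http://xo-02.local:8080/sparql",
]
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"
classroom_requests = [build_request(e, COUNT_QUERY) for e in CLASSROOM_ENDPOINTS]
```

Each request can then be sent with `urllib.request.urlopen` and the JSON response parsed with the `json` module. Sending one query to a list of endpoints is the pattern a teacher's tool could use to collect statistics from all the XOs in a classroom.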
Dependencies
Most notably, storing RDF data in Sugar and querying it with SPARQL requires the addition of a triple store. Triple stores are RDF databases that are optimized for storing and serving data in that format. Most triple stores have been designed for servers and are too big to be run on an XO. For now, some interesting performance results have been obtained with RedStore. Another alternative to consider would be the popular 4store. However, these two seem to be the only ones usable on the XO hardware, and currently neither of them is packaged for Fedora.
Contingency Plan
Reverting to the behaviour of the previous release should be enough to solve all problems.
Documentation
This feature is being investigated under the contributor project "SemanticXO". It has been explained and demoed at various conferences and demo events. Most of the documentation can be found on the blog of the project.
Release Notes
Comments and Discussion
Subpages