Revision as of 00:05, 24 January 2014

Harvest Project

The Harvest project aims to make learning visible to educators and decision makers. Within the context of the Sugar Learning Platform, this can be achieved by collecting reliable metadata from the Journal. This project proposes a simple and continuous mechanism to obtain metadata from Journal entries incrementally over time. The metadata can be stored in a central repository for further statistical analysis.

What is it collecting?

Harvest collects most of the non-sensitive Journal entry metadata, along with anonymous information about the user.

Concepts

Activities refers to the Sugar applications that are being used.
Learners refers to the Sugar users.
Instances refers to the different sessions of a particular activity, each owned by one learner.
Launches refers to the different times the same session is started.

Metadata

Data
Concept	Attribute	Description	Type
Learners	serial_number	Hashed laptop identifier	String
	birthdate	Approximate birthdate of the user	Unix time
	gender	Gender of the user	String
Activities	bundle_id	Activity identifier	String
Instances	object_id	Entry identifier	String
	filesize	Size in bytes of the content associated with the entry	Integer
	creation_time	Entry creation time	Unix time
	timestamp	Entry last modification time	Unix time
	buddies	Number of users associated with the entry	Integer
	spent_time	Placeholder for now; not yet supported in Sugar	Integer
	shared_scope	If entry was exposed through the collaboration service	Boolean
	title_set_by_user	If the user has set a custom title for the entry	Boolean
	keep	If the entry has been explicitly kept in the journal	Boolean
	mime_type	Media type associated with the activity instance	String
Launches	timestamp	Launch time for a particular entry	Unix time

Observation: All metadata names match the original names of the Journal metadata.
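
To make the table concrete, here is a minimal sketch of how one learner's report could be modelled and serialized. The attribute names come from the table above; the nesting (instances inside a learner, launches inside an instance), the class names, and the use of JSON are assumptions made for illustration, not the format Harvest actually uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Instance:
    """One Journal entry (a session of an activity owned by one learner)."""
    object_id: str
    bundle_id: str            # identifies the Activity the entry belongs to
    filesize: int             # bytes
    creation_time: int        # Unix time
    timestamp: int            # last modification, Unix time
    buddies: int = 0
    spent_time: int = 0       # placeholder, not yet supported in Sugar
    shared_scope: bool = False
    title_set_by_user: bool = False
    keep: bool = False
    mime_type: str = ""
    launches: List[int] = field(default_factory=list)  # launch timestamps


@dataclass
class Learner:
    """Anonymous learner information plus the entries being reported."""
    serial_number: str        # hashed laptop identifier
    birthdate: int            # approximate, Unix time
    gender: str
    instances: List[Instance] = field(default_factory=list)


def to_json(learner: Learner) -> str:
    """Serialize a report; JSON is an assumed wire format."""
    return json.dumps(asdict(learner), sort_keys=True)
```

A report built this way carries only the attributes listed in the table, which makes it easy to check that nothing else leaves the machine.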

How does it work?

The project comprises two pieces of software: a harvest server that can be located anywhere in the cloud, and a harvest client that runs on the learner's machine. The harvest server exposes a service, accessible from the Internet, for metadata storage. The harvest client collects metadata from the Journal and sends it to the server.
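
As a rough illustration of the client-to-server step, the sketch below builds an authenticated HTTPS request carrying a metadata report. The `/report` path, the `X-Api-Key` header name, and the JSON body are hypothetical placeholders; the real Harvest API may differ in all three.

```python
import json
import urllib.request


def build_report_request(hostname, api_key, payload):
    """Build (but do not send) a POST request carrying one metadata report."""
    # "/report" and "X-Api-Key" are assumptions, not Harvest's documented API.
    url = hostname.rstrip("/") + "/report"
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Api-Key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The request would then be sent with something like `urllib.request.urlopen(request, timeout=30)`, with any network error treated as an unresponsive server.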

When does it collect?

Data is collected when Sugar starts and when Sugar successfully connects to a network.
Once it has successfully collected data, it won't send another report until the next collecting period, weekly or monthly.
In order to avoid service peaks, Harvest applies a random chance for executing the collection process.
Also, if the server is unresponsive, it won't retry for a couple of hours.
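
The rules above can be sketched as a single decision function. This is only an illustration: the constants (a one-week period, a two-hour retry delay, a 50% chance), the function name, and the choice to measure the period as elapsed time since the last successful report are all assumptions, not values taken from the Harvest client.

```python
import random

WEEK = 7 * 24 * 3600       # assumed collecting period, in seconds
RETRY_DELAY = 2 * 3600     # assumed "couple of hours" back-off, in seconds
CHANCE = 0.5               # assumed probability of running on a given trigger


def should_collect(now, last_success, last_failure,
                   period=WEEK, chance=CHANCE, rng=random.random):
    """Decide whether to collect on this trigger (Sugar start or network up)."""
    # Already reported in the current period: wait for the next one.
    if last_success is not None and now - last_success < period:
        return False
    # The server was unresponsive recently: back off before retrying.
    if last_failure is not None and now - last_failure < RETRY_DELAY:
        return False
    # Random chance spreads clients out to avoid service peaks.
    return rng() < chance
```

Passing `rng` as a parameter keeps the random step testable; a deployment would leave it at its default.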

What are the advantages?

Learners' Journal content is never copied or transferred off their machines; only metadata is collected.
Collection happens continuously over time, so the sampling is very fine-grained.
It is very lightweight. It can be deployed on a central server.
Does not require OS customization. The client is based on Sugar's web service framework, and it can be installed on any existing Sugar 0.100+ distribution.

What is implemented so far?

Pretty much everything concerning metadata collection.

Harvest server

Back-end service for storage.
SSL data encryption.
API Key authorization.
Control scripts based on systemd.
DB migrations and continuous integration support.
RPM packaging.

Harvest client

Journal metadata collection.
Web service extension.
Extension controls from the web service control panel.
Random selection.
Exclusive log for debugging.
Hashed serial numbers.
Restricted retry policy.
RPM packaging.
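
For the "Hashed serial numbers" item, a minimal sketch of the idea follows. SHA-256 and the function name are assumptions chosen for illustration; this page does not say which hash function the client uses.

```python
import hashlib


def hash_serial(serial_number: str) -> str:
    """Return a hex digest that stands in for the laptop's serial number."""
    # SHA-256 is an assumed choice; the real client may use another hash.
    return hashlib.sha256(serial_number.encode("utf-8")).hexdigest()
```

Because the digest is deterministic, reports from the same laptop can still be grouped on the server without the raw serial number ever being sent.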

Code

RPMs

Install tch's repo

$sudo vim /etc/yum.repos.d/tch.repo

 [tch]
 name=tch
 baseurl=http://www.sugarlabs.org/~tch/repos/f19/
 enabled=1
 metadata_expire=1d
 gpgcheck=0

Install harvest-server

 $sudo yum install harvest-server
 $sudo service harvest start
 $sudo systemctl enable harvest.service

Observation: the server's RPM installer assumes the MySQL root user has no password. This allows it to do the entire setup for you, including on updates.

Observation: the server's configuration can be found at /opt/harvest/etc/harvest.cfg. It is recommended to modify the api-key.

Install harvest-client

 $sudo yum install harvest-client

Settings

Clients can be set up in the "Web accounts" section of Sugar's control panel, or via the terminal:

 $gconftool-2 --set /desktop/sugar/collaboration/harvest_hostname https://your.hostname --type string
 $gconftool-2 --set /desktop/sugar/collaboration/harvest_api_key your-api-key --type string

Development

If you are interested in contributing to this project, please contact tch at sugarlabs dot org (Martin Abente Lahaye).

TODO

Server-side data visualization
Client-side (Sugar) modifications to collect run-times and other desired data
