Activities/Wikipedia/HowTo

Godiard (talk | contribs)
This page describes how to create the data files needed to build a Wikipedia activity like [http://activities.sugarlabs.org/es-ES/sugar/addon/4401 Wikipedia es] or [http://activities.sugarlabs.org/es-ES/sugar/addon/4411 Wikipedia en].
The general idea is to download an XML file containing a dump (backup) of the current state of the Wikipedia pages, process it to select a number of pages, and compress them for inclusion in an activity. Optionally, you can also download the images used in those pages.
You will need a computer with a lot of free disk space and a working Sugar environment, either from packages provided by your Linux distribution or in a virtual machine. The Wikipedia XML file is big (almost 6 GB for the Spanish Wikipedia, bigger for the English one), and you need more space for the temporary files generated along the way. The process also takes a long time, but it is automatic: you only need to check the result at the end of every stage.
This page is a work in progress. If you have questions or the information provided is not good enough, please contact me at gonzalo at laptop dot org and I will try to improve it.
== Download the Wikipedia base activity ==
You will need to download the Wikipedia base from http://dev.laptop.org/~gonzalo/wikibase.zip. This file includes the activity and the tools to create the data files.
Create a directory in your Activities directory, for example WikipediaEs.activity, and unzip wikibase.zip inside it.
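
The unpacking step can also be scripted. The following is only a minimal sketch, assuming wikibase.zip has already been downloaded and that your Activities directory is ~/Activities; the function name unpack_base is hypothetical and not part of the tools shipped in the archive.

```python
import zipfile
from pathlib import Path


def unpack_base(zip_path, activities_dir, activity_name):
    """Unzip the base archive into <activities_dir>/<activity_name>.

    unpack_base is a hypothetical helper written for this page; it is not
    one of the tools included in wikibase.zip.
    """
    target = Path(activities_dir) / activity_name
    # Create the activity directory (and any missing parents) if needed.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, unpack_base("wikibase.zip", Path.home() / "Activities", "WikipediaEs.activity") would leave the contents of the archive inside WikipediaEs.activity, provided the assumed paths match your setup.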


== Download a dump ==


Wikipedia provides almost daily XML dump files for every language.
This test was done with the Spanish dump. The file used was eswiki-20111112-pages-articles.xml.bz2 from http://dumps.wikimedia.org/eswiki/20110810/
Create a directory inside the activity you just created and download the Wikipedia dump file into it.


The first two letters of your directory name must be the language code, for example es_es or en_us.
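
If you prefer to script this step, the sketch below checks the directory name and streams the dump into it. It is an illustration only: the function names data_dir_language and download_dump are hypothetical, the URL is passed in by the caller rather than guessed, and the check that the language code is two ASCII letters is an assumption based on the examples above.

```python
import shutil
import urllib.request
from pathlib import Path


def data_dir_language(dir_name):
    """Return the language code taken from the first two letters of dir_name."""
    code = dir_name[:2]
    # Assumption: a language code is exactly two ASCII letters, as in es_es or en_us.
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"{dir_name!r} does not start with a two-letter language code")
    return code


def download_dump(url, activity_dir, dir_name):
    """Create <activity_dir>/<dir_name> and download the dump at url into it."""
    data_dir_language(dir_name)  # fail early on a badly named directory
    data_dir = Path(activity_dir) / dir_name
    data_dir.mkdir(parents=True, exist_ok=True)
    # Keep the file name from the last path component of the URL.
    target = data_dir / url.rsplit("/", 1)[-1]
    # Stream to disk in chunks: the dump is several gigabytes.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Streaming with shutil.copyfileobj avoids loading the whole dump into memory, which matters for a file of this size.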
in another directory to accelerate the process.


== Modify your activity to use the data files ==
 
You can edit the file activity_es.py and change the lines:

        self.WIKIDB = 'es_new/eswiki-20111112-pages-articles.xml'
        self.HOME_PAGE = '/static/index_es.html'

so that they point to your new data files, or create a separate file, for example activity_pt.py.

If you create a new file, you will also need to modify the file activity/activity.info to point to it.
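
As a small aid for filling in the WIKIDB line, here is a sketch that builds the value from the data directory and the dump file name. It assumes, based only on the example above, that the value is the data directory followed by the dump name without its .bz2 suffix; wikidb_value is a hypothetical helper, not part of the activity.

```python
from pathlib import PurePosixPath


def wikidb_value(data_dir, dump_file):
    """Build the relative path for self.WIKIDB.

    Assumption (taken from the example on this page): the value is
    <data_dir>/<dump file name without the .bz2 suffix>.
    """
    name = dump_file
    if name.endswith(".bz2"):
        name = name[: -len(".bz2")]
    return str(PurePosixPath(data_dir) / name)
```

With the Spanish example, wikidb_value("es_new", "eswiki-20111112-pages-articles.xml.bz2") gives the same string that appears in activity_es.py.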


You can create a new icon too, or modify the existing activity/activity-wikipedia-es.svg file.