Activities/Wikipedia/HowTo
''"Create one, two, three... a thousand Wikipedias" Comandante Ernesto Wales''
=== Object of this HowTo ===
This HowTo explains how to update the data files in the Wikipedia activities, or how to create new activities for other languages or with different selections of articles.
The procedure is not very difficult if you already have a Sugar environment set up. If you have questions, or if the information provided here is not adequate, please contact me at ''gonzalo at laptop dot org'' or on the sugar-devel mailing list, and I will try to help and improve this page.
If you want to create a Wikipedia activity in your language but do not have the technical resources, and you can help by translating a few files and doing quality control, contact me and I will help you create the activity.
=== How to Create a new wikipedia activity or update an existing activity ===
This page describes how to generate the data files needed to create a Wikipedia activity such as
[http://activities.sugarlabs.org/es-ES/sugar/addon/4401 Wikipedia es] or [http://activities.sugarlabs.org/es-ES/sugar/addon/4411 Wikipedia en].
The general idea is to download an XML dump file (backup) containing the current Wikipedia pages for a given language, process the dump to select certain pages, and compress them into a self-contained Sugar activity. Whether or not you include the images from the articles has a large impact on the size of the activity.
Generating a Wikipedia activity requires a computer with a lot of free disk space, ideally plenty of RAM, and a working Sugar environment. It is probably best to use the Sugar packages provided by your favorite Linux distribution, or to work in a virtual machine. The Wikipedia XML file is very large (almost 6 GB for the Spanish Wikipedia, and even bigger for English), and you will need plenty of space for temporary files. The process takes a long time, but it is mostly automated; you only need to confirm that each stage succeeded before moving on to the next one.
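Because the processing writes several large intermediate copies of the dump, a quick free-space check before starting can save a failed run. This is a minimal sketch; the safety factor of 3 is an assumption (a rough guess, not a figure taken from the tools), and has_room_for is a hypothetical name.

```python
import os
import shutil

# Assumption: rough margin for the processed/expanded copies and the index.
DEFAULT_FACTOR = 3.0


def has_room_for(dump_path: str, work_dir: str, factor: float = DEFAULT_FACTOR) -> bool:
    """Return True if work_dir's filesystem has at least factor x the dump size free."""
    needed = os.path.getsize(dump_path) * factor
    free = shutil.disk_usage(work_dir).free
    return free >= needed
```

Adjust the factor once you know how much space your own run actually used.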
== Download the wikipedia base activity ==
You will need to download the Wikipedia base activity from http://dev.laptop.org/~gonzalo/wikiserver/WikipediaBase-35.xo. This package includes the activity and the tools needed to create the data files.
Unzip it into your Activities directory, or install it if you do not already have another Wikipedia activity installed.
The git repository is at https://github.com/godiard/wikipedia-activity
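A .xo bundle is a zip archive, so unzipping it into the Activities directory can also be done from Python. The sketch below assumes the Activities directory is ~/Activities (the usual per-user location in Sugar; adjust if yours differs), and install_xo is a hypothetical helper. It restores the Unix permission bits stored in the archive, because Python's zipfile does not do that by itself and scripts such as setup.py are run directly as executables.

```python
import os
import zipfile
from typing import List, Optional


def install_xo(xo_path: str, activities_dir: Optional[str] = None) -> List[str]:
    """Extract a .xo bundle and return its top-level directory names."""
    if activities_dir is None:
        # Assumption: the standard per-user Sugar activities directory.
        activities_dir = os.path.expanduser("~/Activities")
    os.makedirs(activities_dir, exist_ok=True)
    top_level = set()
    with zipfile.ZipFile(xo_path) as bundle:
        for info in bundle.infolist():
            top_level.add(info.filename.split("/", 1)[0])
            path = bundle.extract(info, activities_dir)
            # zipfile ignores stored permissions; for archives made on Unix
            # the mode lives in the high 16 bits of external_attr.
            mode = (info.external_attr >> 16) & 0o777
            if mode and not info.is_dir():
                os.chmod(path, mode)
    return sorted(top_level)
```

The command-line unzip tool restores these permissions on its own, so this matters only if you extract from Python.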
== Download a Wikipedia dump file ==
To get faster results, we apply template substitutions to all the pages.
== Optimize the data and download images ==
To expand the templates, you need to go out of the data directory:
 mv eswiki-20111112-pages-articles.xml.processed_expanded eswiki-20111112-pages-articles.xml.processed
 ../tools2/create_index.py --delete_old
The option --delete_old removes the old index.
If you want to include images in your Wikipedia activity, go back to your data directory and run:
== Modify your activity to use the data files ==
To create a Wikipedia activity in a new language, you need to create the following files:
* activity/activity.info.''lang'': the activity.info file for your language. You can copy one from another language and modify the name, the bundle_id, the icon and the exec line.
* activity/activity-wikipedia-''lang''.svg: the activity icon. You can copy the file from another language and use a text editor to change the last text element to your language code. If you need to edit the image with a graphics editor (like Inkscape), afterwards remember to add the entity lines back to the header and to replace the stroke_color and fill_color values with the entities again.
* '''DEPRECATED, SEE BELOW:''' activity_''lang''.py: the startup class, which sets the configuration values and starts the server. You can copy the class from another language and set the parameters. The class name must match the one given in the exec line of the activity/activity.info.''lang'' file.
* static/about_''lang''.html: a static About page. Translate it from the similar page of another language.
* static/index_''lang''.html: the activity home page. It should have links to good pages from which to start exploring.
If you create your favorites list based on a translation of the home page from another language, it is a good idea to translate the home page too.
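Before testing, it can help to confirm that the per-language files from the list above are in place. This is a small sketch; missing_language_files is a hypothetical helper, it leaves out the deprecated activity_''lang''.py file, and it only checks that the files exist, not their contents.

```python
import os
from typing import List


def missing_language_files(activity_root: str, lang: str) -> List[str]:
    """Return the per-language files (relative paths) that do not exist yet."""
    expected = [
        os.path.join("activity", f"activity.info.{lang}"),
        os.path.join("activity", f"activity-wikipedia-{lang}.svg"),
        os.path.join("static", f"about_{lang}.html"),
        os.path.join("static", f"index_{lang}.html"),
    ]
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(activity_root, rel))]
```

For example, running missing_language_files(".", "qu") from the activity directory lists what is still missing for a Quechua activity.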
'''DEPRECATED, SEE BELOW:''' Now you can test your changes by starting the Wikipedia server:
 ./activity_''lang''.py es_lat/eswiki-20111112-pages-articles.xml 8000
The first parameter is your XML data file and the second parameter is a port number.
[[File:Wikipedia_test.png]]
Finally, to create the new .xo file and distribute it, run:
A new .xo file will then be created in the dist directory, and you can distribute it.
=== Notes on updates in the process ===
After version 38, to make the process more standard and allow packaging the activity in distributions, we added a standard setup.py. To use it, you need to add the Wikipedia initialization parameters to the activity.info file, as shown in the file activity.info.en_simple:
https://github.com/godiard/wikipedia-activity/blob/master/activity/activity.info.en_simple
<pre>
[Wikipedia]
path = en_simple/simplewiki-20130724-pages-articles.xml
port = 8011
home_page = /static/index_en_simple.html
templateprefix = Template:
wpheader = From Wikipedia, The Free Encyclopedia
wpfooter = Content available under the
    <a href="/static/es-gfdl.html">GNU Free Documentation License</a>.
    <br/> Wikipedia is a registered trademark of the non-profit
    Wikimedia Foundation, Inc.<br/><a href="/static/about_en.html">
    About Wikipedia</a>
resultstitle = Search results for '%s'.
</pre>
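The [Wikipedia] section uses INI syntax, so its values can be checked with Python's standard configparser before packaging. This is a sketch, not the activity's own loader; read_wikipedia_config is a hypothetical name, and treating all seven keys from the example as required is an assumption that may be stricter than the activity itself. Interpolation must be disabled, otherwise the %s in resultstitle is treated as an interpolation and raises an error; continuation lines of a multi-line value such as wpfooter must be indented.

```python
import configparser
from typing import Dict

# Keys shown in activity.info.en_simple; assumed to be required here.
EXPECTED_KEYS = ("path", "port", "home_page", "templateprefix",
                 "wpheader", "wpfooter", "resultstitle")


def read_wikipedia_config(info_text: str) -> Dict[str, str]:
    """Parse activity.info text and return its [Wikipedia] section as a dict."""
    # interpolation=None keeps the literal '%s' in resultstitle.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(info_text)
    section = parser["Wikipedia"]
    missing = [key for key in EXPECTED_KEYS if key not in section]
    if missing:
        raise KeyError(f"[Wikipedia] section is missing: {', '.join(missing)}")
    return dict(section)
```

All values come back as strings, so the port would be read with int(config["port"]).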
Another important change is that you no longer need to create an activity_''lang''.py file, because the activity starts and reads its configuration from the activity.info file. The "exec" line must be:
 exec = sugar-activity activity.WikipediaActivity
Then, to create the .xo file, run:
 ./setup.py dist_xo es_lat/eswiki-20111112-pages-articles.xml
or, to create the source tar.bz2 file:
 ./setup.py dist_source es_lat/eswiki-20111112-pages-articles.xml
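After packaging, a quick look inside the result can catch an incomplete bundle before it is distributed. The sketch below lists the members of a .xo (a zip archive) or a .tar.bz2 file from the dist directory and checks for an activity/activity.info entry. The helper names are hypothetical, and the assumption that both bundle types contain activity/activity.info under a top-level directory is not confirmed by this page.

```python
import glob
import os
import tarfile
import zipfile
from typing import List, Optional


def bundle_members(path: str) -> List[str]:
    """List the member names of a .xo (zip) or .tar.bz2 bundle."""
    if path.endswith(".xo"):
        with zipfile.ZipFile(path) as bundle:
            return bundle.namelist()
    if path.endswith(".tar.bz2"):
        with tarfile.open(path, "r:bz2") as bundle:
            return bundle.getnames()
    raise ValueError(f"unsupported bundle type: {path}")


def newest_bundle(dist_dir: str = "dist", pattern: str = "*.xo") -> Optional[str]:
    """Return the most recently modified bundle in dist_dir, if any."""
    candidates = glob.glob(os.path.join(dist_dir, pattern))
    return max(candidates, key=os.path.getmtime) if candidates else None


def has_activity_info(path: str) -> bool:
    """True if some member path ends in activity/activity.info."""
    return any(name.split("/")[-2:] == ["activity", "activity.info"]
               for name in bundle_members(path))
```

For example, has_activity_info(newest_bundle()) checks the latest .xo, and newest_bundle(pattern="*.tar.bz2") picks the latest source archive instead.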
With this new version, you can test the wiki from the command line:
 ./test_server.py es_lat/eswiki-20111112-pages-articles.xml 8000
Both parameters are optional; if they are not provided, the values in the activity.info file are used.
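While the test server is running, you can check from a second terminal that the configured home page actually answers. This is a minimal sketch using only the standard library; home_url and is_serving are hypothetical names, and it assumes the server listens on localhost.

```python
import http.client
import urllib.request


def home_url(port: int, home_page: str, host: str = "localhost") -> str:
    """Build the URL of the activity home page served by the test server."""
    return f"http://{host}:{port}{home_page}"


def is_serving(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

With the values from activity.info.en_simple, the check would be is_serving(home_url(8011, "/static/index_en_simple.html")).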
== Other changes needed ==
If, after processing the files, the images are not displayed in the pages, check whether the image identifier is included in the imageKeywords set in the file mwlib/parser.py. For example, in the Quechua Wikipedia the image identifier is "rikcha", and we needed to add it because it was not included.
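To find out which identifier a wiki uses for images (such as "rikcha"), one option is to scan the wikitext of a few articles for links whose target looks like an image file and whose prefix is not yet a known keyword. The sketch below is hypothetical: it does not read mwlib/parser.py (you pass the keywords in yourself), and the list of image extensions is an assumption.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: common image extensions found in wiki articles.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")

# [[Prefix:Target|options...]] -> captures "Prefix" and "Target".
LINK_RE = re.compile(r"\[\[\s*([^:\]|]+?)\s*:\s*([^\]|]+)")


def unknown_image_prefixes(wikitext: str, known: Iterable[str]) -> Counter:
    """Count lowercase link prefixes that point to image files but are not in `known`."""
    known_lower = {k.lower() for k in known}
    counts: Counter = Counter()
    for prefix, target in LINK_RE.findall(wikitext):
        is_image = target.strip().lower().endswith(IMAGE_EXTENSIONS)
        if is_image and prefix.lower() not in known_lower:
            counts[prefix.lower()] += 1
    return counts
```

A prefix that shows up often in the result is a good candidate to add to imageKeywords.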
== More tools ==
=== Big image files ===
Sometimes a small group of images is very big. If you want to remove them to get a smaller activity, you can run:
 mkdir big-images
 find images -size +100k -exec mv {} big-images \;
(This example moves images larger than 100 KB to another directory.)
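The same cleanup can be sketched in Python, for example to see which files would move before changing the threshold. move_big_images is a hypothetical helper. Unlike the find one-liner, which moves every file into one flat big-images directory, this version keeps each file's path relative to images, so files with the same name in different subdirectories do not overwrite each other.

```python
import os
import shutil
from typing import List

# Same threshold as GNU find's "-size +100k" (more than 100 KiB).
LIMIT_BYTES = 100 * 1024


def move_big_images(images_dir: str, target_dir: str, limit: int = LIMIT_BYTES) -> List[str]:
    """Move files larger than `limit` bytes, keeping their relative paths."""
    moved = []
    for root, _dirs, files in os.walk(images_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getsize(src) > limit:
                rel = os.path.relpath(src, images_dir)
                dst = os.path.join(target_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
                moved.append(rel)
    return sorted(moved)
```

Running move_big_images("images", "big-images") from the data directory mirrors the commands above.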
== Old information ==
http://wiki.laptop.org/go/User:Godiard/WkipediaDataRebuild | |||