''"Create one, two, three... a thousand Wikipedias" Comandante Ernesto Wales''

=== Purpose of this HowTo ===

This HowTo explains how to update the data files in the Wikipedia activities, or how to create new activities for other languages or with different selections of articles.

The procedure is not very difficult if you already have a Sugar environment set up. If you have doubts or the information provided is not adequate, please contact me at ''gonzalo at laptop dot org'' or on the sugar-devel mailing list, and I will try to help and improve this page.

If you want to create a Wikipedia activity in your language but do not have the technical resources, and you can help by translating a few files and doing quality control, contact me and I will help you create the activity.

=== How to create a new Wikipedia activity or update an existing one ===

This page describes how to generate the data files needed to create a Wikipedia activity like [http://activities.sugarlabs.org/es-ES/sugar/addon/4401 Wikipedia es] or [http://activities.sugarlabs.org/es-ES/sugar/addon/4411 Wikipedia en].

The general idea is to download an XML dump file (backup) containing the current Wikipedia pages for a given language. You then process the dump to select certain pages and compress those pages into a self-contained Sugar activity. Whether or not you include the images from the articles has a large impact on the size of the activity.
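
The selection step can be pictured with a minimal Python sketch. It streams a MediaWiki XML dump with `xml.etree.ElementTree.iterparse` and keeps only the pages whose titles are in a wanted set. This is not the tool shipped in the Wikipedia base activity. The function name `select_pages` and the idea of selecting by title are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix used by MediaWiki export files."""
    return tag.rsplit("}", 1)[-1]


def select_pages(dump_path, wanted_titles):
    """Yield (title, wikitext) for every page whose title is in wanted_titles.

    iterparse reads the dump incrementally, so the multi-gigabyte file does
    not have to fit in memory; each <page> element is cleared after use.
    """
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title in wanted_titles:
            yield title, text
        elem.clear()  # drop the children of this page to keep memory low
```

For example, `for title, text in select_pages("eswiki-20111112-pages-articles.xml", favorites): ...` visits only the selected pages. Here `favorites` is any set of titles you choose.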

Generating a Wikipedia activity requires a computer with a lot of free disk space, ideally lots of RAM, and a working Sugar environment. It is probably best to use the Sugar packages provided by your favorite Linux distribution, or to work in a virtual machine. The Wikipedia XML file is very large (almost 6 GB for the Spanish Wikipedia, and even bigger for English), and you will need plenty of space for temporary files. The process takes a long time, but it is mostly automated. You only need to confirm that each stage succeeded before moving on to the next one.
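
Because the dump and the temporary files are so large, it can be worth checking the free space before starting. Below is a minimal Python sketch using `shutil.disk_usage`. The helper name is hypothetical, and the 30 GB in the usage line is an arbitrary example, not a measured requirement.

```python
import shutil


def has_free_space(path, needed_gb):
    """Return True if the filesystem holding `path` has at least needed_gb GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1024 ** 3
```

For example, `has_free_space(".", 30)` before downloading the dump into the current directory.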

== Download the Wikipedia base activity ==

You will need to download the Wikipedia base activity from http://dev.laptop.org/~gonzalo/wikiserver/WikipediaBase-35.xo. This package includes the activity and the tools to create the data files.

Unzip it in your Activities directory, or install it if you do not already have another Wikipedia activity installed.
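
An .xo bundle is a zip archive, so you can unzip it with the `unzip` command or, as in this minimal Python sketch, with `zipfile`. The default `~/Activities` location is an assumption; adjust it to your Sugar setup.

```python
import os
import zipfile


def unpack_xo(xo_path, activities_dir=os.path.expanduser("~/Activities")):
    """Extract a Sugar .xo bundle (a zip archive) into activities_dir.

    Returns the sorted names of the top-level entries in the archive,
    normally the bundle's '<Name>.activity' folder.
    """
    os.makedirs(activities_dir, exist_ok=True)
    with zipfile.ZipFile(xo_path) as bundle:
        top_dirs = sorted({name.split("/", 1)[0] for name in bundle.namelist()})
        bundle.extractall(activities_dir)
    return top_dirs
```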

The git repository is at https://github.com/godiard/wikipedia-activity.

== Download a Wikipedia dump file ==
To get faster results, we will apply template substitutions to all the pages.

== Optimize the data and download images ==

To expand the templates, you need to leave the data directory:

 mv eswiki-20111112-pages-articles.xml.processed_expanded eswiki-20111112-pages-articles.xml.processed
 ../tools2/create_index.py --delete_old

The option --delete_old removes the old index.

If you want to include images in your Wikipedia activity, go back to your data directory and run:
== Modify your activity to use the data files ==

To create a Wikipedia activity in a new language, you will need to create the following files:

* activity/activity.info.''lang'': the activity.info file for your language. You can copy one from another language and modify the name, the bundle_id, the icon and the exec line.

* activity/activity-wikipedia-''lang''.svg: the activity icon. You can copy the file from another language and use a text editor to change the last text element to your language code. If you need to edit the image with a graphics editor (like Inkscape), afterwards add the entity lines back to the header and replace the colors with the stroke_color and fill_color entities.

* '''DEPRECATED, SEE BELOW:''' activity_''lang''.py: the startup class. It sets the configuration values and starts the server. You can copy the class from another language and set the parameters. The class name must match the one given in the exec line of the activity/activity.info.''lang'' file.

* static/about_''lang''.html: a static About page. Translate it from the similar page for another language.

* static/index_''lang''.html: the activity home page, with links to good pages to start exploring. If you base your list of favorite articles on a translation of another language's home page, it would be a good idea to translate the home page too.
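
The first two files in the list can be derived mechanically from an existing language. Below is a minimal Python sketch. It assumes the standard `[Activity]` section of a Sugar activity.info file, and an icon whose language code is the text of its last `<text>` element. The function names and any example values are hypothetical.

```python
import configparser
import io
import re


def derive_activity_info(src_text, name, bundle_id, icon, exec_line):
    """Return a new activity.info text based on another language's file.

    Only name, bundle_id, icon and exec in [Activity] are replaced; the other
    keys and sections are kept. Comments in the source are not preserved.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(src_text)
    section = parser["Activity"]
    section["name"] = name
    section["bundle_id"] = bundle_id
    section["icon"] = icon
    section["exec"] = exec_line
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def set_icon_label(svg_text, lang_code):
    """Replace the content of the last <text>...</text> element with lang_code.

    A regex is used instead of an XML library on purpose: re-serializing the
    file with xml.etree.ElementTree would expand the &stroke_color; and
    &fill_color; entities and drop their declarations from the header.
    If the label is wrapped in a <tspan>, the tspan is replaced as well.
    """
    matches = list(re.finditer(r"(<text\b[^>]*>)(.*?)(</text>)", svg_text, re.S))
    if not matches:
        raise ValueError("no <text> element found in the icon")
    last = matches[-1]
    return svg_text[: last.start(2)] + lang_code + svg_text[last.end(2):]
```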

'''DEPRECATED, SEE BELOW:''' Now you can test your changes by starting the Wikipedia server:

 ./activity_''lang''.py es_lat/eswiki-20111112-pages-articles.xml 8000

The first parameter is your XML data file and the second is a port number.
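
Besides opening the pages in a browser, you can check from a script that the server answers on the chosen port. Below is a minimal Python sketch using only the standard library. The helper name is hypothetical, and the check only proves that something answers HTTP on that port.

```python
import urllib.error
import urllib.request


def server_is_up(port, path="/", host="127.0.0.1", timeout=5):
    """Return True if an HTTP server answers on host:port.

    Any HTTP response counts as 'up', even an error status such as 404,
    because it shows that a server process is listening.
    """
    url = "http://%s:%d%s" % (host, port, path)
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server replied, just not with 200
    except (urllib.error.URLError, OSError):
        return False  # refused, unreachable or timed out
```

For example, `server_is_up(8000, "/static/index_es.html")` after starting the server on port 8000.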

[[File:Wikipedia_test.png]]

Finally, to create the new .xo file and distribute it, run:

Now a new .xo file will be created in the dist directory, and you can distribute it.

=== Notes on updates to the process ===

After version 38, we added a standard setup.py to make the process more standard and to allow the activity to be packaged in distributions. To use it, you need to add the Wikipedia initialization parameters to the activity.info file, as shown in the file activity.info.en_simple:

https://github.com/godiard/wikipedia-activity/blob/master/activity/activity.info.en_simple

<pre>
[Wikipedia]
path = en_simple/simplewiki-20130724-pages-articles.xml
port = 8011
home_page = /static/index_en_simple.html
templateprefix = Template:
wpheader = From Wikipedia, The Free Encyclopedia
wpfooter = Content available under the
    <a href="/static/es-gfdl.html">GNU Free Documentation License</a>.
    <br/> Wikipedia is a registered trademark of the non-profit
    Wikimedia Foundation, Inc.<br/><a href="/static/about_en.html">
    About Wikipedia</a>
resultstitle = Search results for '%s'.
</pre>
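
To see how these values come out of the file, here is a minimal Python sketch that reads the `[Wikipedia]` section with `configparser`. It assumes that continuation lines of multi-line values such as wpfooter are indented, which INI parsers require. The helper name is hypothetical, and this is not the code the activity itself uses.

```python
import configparser


def read_wikipedia_config(path):
    """Read the [Wikipedia] section of an activity.info file into a dict.

    interpolation=None matters: with Python 3's default interpolation,
    the '%s' in resultstitle would raise InterpolationSyntaxError.
    """
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    section = parser["Wikipedia"]
    config = dict(section)
    config["port"] = section.getint("port")  # the only numeric value
    return config
```

Multi-line values such as wpfooter come back with their lines joined by newlines and leading indentation stripped.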

Another important change is that an activity_''lang''.py file is no longer needed, because the activity starts and reads its configuration from the activity.info file. The "exec" line must be:

 exec = sugar-activity activity.WikipediaActivity

Then, to create the .xo file, run:

 ./setup.py dist_xo es_lat/eswiki-20111112-pages-articles.xml

or, to create the source tar.bz2 file:

 ./setup.py dist_source es_lat/eswiki-20111112-pages-articles.xml
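
Before distributing the .xo, you can check what went into the bundle and which files dominate its size, for example to decide whether to drop some images. Below is a minimal Python sketch that relies only on the bundle being a zip archive. The helper name and the path in the usage line are hypothetical.

```python
import zipfile


def list_bundle(xo_path, limit=20):
    """Print the largest files in a .xo bundle and return the total uncompressed size."""
    with zipfile.ZipFile(xo_path) as bundle:
        infos = sorted(bundle.infolist(), key=lambda i: i.file_size, reverse=True)
    for info in infos[:limit]:
        print("%10d  %s" % (info.file_size, info.filename))
    return sum(i.file_size for i in infos)
```

For example, `list_bundle("dist/Wikipedia-1.xo")`.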

With this new version, you can test the wiki from the command line with:

 ./test_server.py es_lat/eswiki-20111112-pages-articles.xml 8000

Both parameters are optional. If they are not provided, the values in the activity.info file are used.
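
The fallback from command-line parameters to activity.info can be pictured with this hypothetical Python sketch. It is not the code of test_server.py. It assumes the `path` and `port` keys of the `[Wikipedia]` section shown in activity.info.en_simple, and the default file location is an assumption.

```python
import argparse
import configparser


def resolve_server_args(argv, info_path="activity/activity.info"):
    """Return (xml_path, port), taking missing values from activity.info."""
    parser = argparse.ArgumentParser()
    parser.add_argument("path", nargs="?")             # XML data file, optional
    parser.add_argument("port", nargs="?", type=int)   # port number, optional
    args = parser.parse_args(argv)
    if args.path is None or args.port is None:
        config = configparser.ConfigParser(interpolation=None)
        config.read(info_path, encoding="utf-8")
        section = config["Wikipedia"]  # KeyError if the file or section is missing
        if args.path is None:
            args.path = section["path"]
        if args.port is None:
            args.port = section.getint("port")
    return args.path, args.port
```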

== Other changes needed ==

If, after processing the files, the images are not displayed in the pages, check whether the image identifier is included in the set imageKeywords in the file mwlib/parser.py. For example, in the Quechua Wikipedia the image identifier is "rikcha", and we needed to add it because it was not included.
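
To find out which identifier a given Wikipedia uses for images, you can count the prefixes of links that point to image files in the wikitext. Below is a rough Python sketch. The regular expression and the list of file extensions are assumptions, and it says nothing about how mwlib/parser.py itself matches images. Prefixes are lowercased only for counting.

```python
import collections
import re

# Matches [[Prefix:Some name.jpg| or [[Prefix:Some name.png]] and captures "Prefix".
IMAGE_LINK = re.compile(
    r"\[\[\s*([^\[\]|:]+?)\s*:[^\[\]|]+?\.(?:jpe?g|png|gif|svg)\s*[|\]]",
    re.IGNORECASE,
)


def count_image_prefixes(wikitext):
    """Count the namespace prefixes used in links to image files."""
    return collections.Counter(m.group(1).lower() for m in IMAGE_LINK.finditer(wikitext))
```

Running it over the text of the pages and comparing the most frequent prefixes with imageKeywords shows which identifiers may be missing.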

== More tools ==

=== Big image files ===

Sometimes a small group of images is very big. If you want to remove them to get a smaller activity, you can do:

 mkdir big-images
 find images -size +100k -exec mv {} big-images \;

(This example moves the images larger than 100 KiB to another directory.)
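
The same operation can be written as a minimal Python sketch with the same threshold: files strictly larger than 100 KiB. Unlike the find/mv one-liner, it keeps each file's relative subdirectory, so two images with the same name in different folders do not overwrite each other. The helper name is hypothetical, and the destination should be outside the source directory.

```python
import os
import shutil


def move_big_files(src_dir, dest_dir, limit_bytes=100 * 1024):
    """Move files larger than limit_bytes from src_dir (recursively) to dest_dir.

    Returns the sorted list of moved paths, relative to src_dir.
    dest_dir must not be inside src_dir.
    """
    moved = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit_bytes:
                rel = os.path.relpath(path, src_dir)
                target = os.path.join(dest_dir, rel)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.move(path, target)
                moved.append(rel)
    return sorted(moved)
```

For example, `move_big_files("images", "big-images")`.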

== Old information ==

http://wiki.laptop.org/go/User:Godiard/WkipediaDataRebuild