Summer of Code/2013/Translation Server

From Sugar Labs
Revision as of 10:28, 3 May 2013 by Erik Price


About you

What is your name?

Erik Price

What is your email address?

erik@erikprice.net

What is your Sugar Labs wiki username?

Erik Price

What is your IRC nickname?

boredomist / bdmst on Freenode

What is your primary language? (We have mentors who speak multiple languages and can match you with one of them if you'd prefer.)

English

Where are you located, and what hours do you tend to work? (We also try to match mentors by general time zone if possible.)

New Jersey, United States. UTC-4. Working hours are very flexible since this GSoC project will most likely be my primary focus this summer. I will, however, set aside a time (likely around 6-7 PM or later) when I will be present daily on IRC to check in, answer questions, get help, and update the mentor on progress.

Have you participated in an open-source project before? If so, please send us URLs to your profile pages for those projects, or some other demonstration of the work that you have done in open-source. If not, why do you want to work on an open-source project this summer?

Yes, I maintain a GitHub profile with most of my projects.

There are some other kinds of contributions that don't show up there, like reports and fixes on various bug trackers, documentation fixes, and involvement in community IRC channels.

About your project

What is the name of your project?

Pluggable Translation Server.

Doesn't really roll off the tongue so well, huh?

Describe your project in 10-20 sentences. What are you making? Who are you making it for, and why do they need it? What technologies (programming languages, etc.) will you be using?

Very short version: A server that provides machine translations using multiple backend plugins, with a client API to access it from XOs (or wherever else).

Sugar and OLPC are global projects, so internationalization is a central tenet of both. The aim of this project is to establish a server program and client API that activities can use to reliably access quality machine translations of arbitrary strings.

Since accurate machine translation is expensive in both computation and memory, it is not reasonable to expect good results from running it directly on an XO. A server that supplies these translations to a larger network of XOs is therefore a preferable solution.

Not all translators are created equal for all possible language pairs, and a given translator may not be usable in a given situation (due to hardware, software, monetary, or other limitations). It is therefore advantageous to give our translation server program the ability to access multiple services via a plug-in architecture.

For example, Google Translate will likely offer very high quality translations for many language pairs, but the associated cost of $20 USD per million translated characters through the API means that it would be irresponsible to require it. Likewise, a FOSS project such as Apertium may well provide good es<->en translations, but has no way of translating e.g. de->ru, which limits its global usefulness.

Pooling all available translation sources behind a single server overcomes these obstacles and provides a convenient, consistent means of reliable machine translation for any purpose.
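
A minimal sketch of what such a plug-in architecture could look like is given here. It is not the project's actual design: the `Backend` interface, the `DictionaryBackend` toy class, and `translate_with_fallback` are all hypothetical names chosen for illustration, and the lookup tables stand in for real translators.

```python
class TranslationError(Exception):
    """Raised when a translation cannot be produced."""


class Backend(object):
    """Hypothetical base class that every translation plug-in would extend."""

    name = "base"

    def supports(self, source, target):
        """Return True if this backend can translate source -> target."""
        raise NotImplementedError

    def translate(self, text, source, target):
        """Return the translated string, or raise TranslationError."""
        raise NotImplementedError


class DictionaryBackend(Backend):
    """Toy stand-in for a real translator: looks up whole strings in a table."""

    def __init__(self, name, table):
        self.name = name
        # table maps (source, target) -> {text: translation}
        self.table = table

    def supports(self, source, target):
        return (source, target) in self.table

    def translate(self, text, source, target):
        try:
            return self.table[(source, target)][text]
        except KeyError:
            raise TranslationError("%s cannot translate %r" % (self.name, text))


def translate_with_fallback(backends, text, source, target):
    """Try each backend that supports the pair, in priority order."""
    for backend in backends:
        if not backend.supports(source, target):
            continue
        try:
            return backend.translate(text, source, target)
        except TranslationError:
            continue  # fall through to the next backend
    raise TranslationError("no backend available for %s->%s" % (source, target))


# Two toy backends: the first is tried before the second.
local = DictionaryBackend("local", {("es", "en"): {"hola": "hello"}})
web = DictionaryBackend("web", {("es", "en"): {"gracias": "thank you"}})
backends = [local, web]
```

With this setup, `translate_with_fallback(backends, "gracias", "es", "en")` is answered by the second backend after the first one fails, and a pair that no backend supports raises `TranslationError`.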
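
To make the cost concern concrete, here is a small back-of-the-envelope sketch. Only the $20 per million characters figure comes from the text above; the function name and the example workload (50 laptops, 2,000 characters per laptop per day, 30 days) are made-up assumptions for illustration.

```python
def estimate_cost_usd(characters, usd_per_million=20.0):
    """Estimate API cost for a number of translated characters."""
    return characters / 1000000.0 * usd_per_million


# Hypothetical workload: 50 laptops x 2,000 characters/day x 30 days.
monthly_characters = 50 * 2000 * 30
monthly_cost = estimate_cost_usd(monthly_characters)
```

Under those assumed numbers the deployment would translate 3,000,000 characters a month, which works out to $60 per month for a single small site.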

After creating the server, a pair of translation backend plugins, and the client API, I will create a simple translation Activity using the service to demonstrate the viability of the server/client.

This project will be built entirely in Python, using minimal dependencies on the server (a suitable web library is essentially all that is needed) and no external dependencies on the client. In an effort to be as general and broadly applicable as possible, the server will operate using HTTP requests and JSON, which should allow the server to be usable from essentially any other programming language that a client API could be written for. All components of this project will be developed using copious tests and continuous integration testing to identify issues quickly and ensure quality.
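
As a rough illustration of the HTTP and JSON approach, the sketch below shows one possible shape for a translation endpoint. Everything specific in it is an assumption: the field names `text`, `from`, `to`, `translation`, and `error` are hypothetical, Python 3's standard `http.server` module stands in for whichever web library is eventually chosen, and `fake_translate` only tags the text rather than translating it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_translate(text, source, target):
    """Placeholder for a real backend: tags the text instead of translating."""
    return "[%s->%s] %s" % (source, target, text)


def handle_translate(body):
    """Turn a JSON request body (bytes) into (status_code, response_dict)."""
    try:
        request = json.loads(body.decode("utf-8"))
        text = request["text"]
        source = request["from"]
        target = request["to"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-object payload.
        return 400, {"error": "expected JSON object with text, from, to"}
    return 200, {"translation": fake_translate(text, source, target)}


class TranslateHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_translate."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_translate(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the sketch server until interrupted (not called automatically)."""
    HTTPServer(("", port), TranslateHandler).serve_forever()
```

Keeping the request handling in a plain function (`handle_translate`) separate from the HTTP wrapper makes it easy to test without opening a socket, which fits the test-heavy approach described above.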

The first order of business for this project is to establish a minimal-dependency Python HTTP server application with a plug-in architecture, so that any interested developer can add machine translation backends later on in the project.

Along with this, some initial backends will of course need to be created. I plan to add one that would run on the same server, and one that would use a web service, to ensure the robustness and generality of the server architecture.

The first of these plug-ins would use Apertium, the FOSS project already used by Sugar through the #meeting-es IRC channel on Freenode. Next, Bing Translate will likely be added, since it is one of the major web translators that provides a free API key.

Google Translate is another high priority service due to its quality, but will not be added initially because its API has no free tier for usage.

Other plugins, such as Google Translate, will be considered for any time remaining at the end of the project, but they are far lower priority than the initial two backends and will only be added during GSoC if possible. (If not, I'll likely add some other systems after GSoC has finished.)

The next leg of the project will involve creating a Python client API to request and receive translations from a given server. This will of course be designed before any coding starts on the server, and will be designed to be as generic and straightforward to use as possible, so it can be used easily and efficiently even outside of the Sugar environment.

From the point of view of the client API, the backend the server is using to actually translate the text is unimportant. It will just send a call to the server, specifying the language pair and source text, and receive a resultant string or an appropriate error. The server will handle selecting the appropriate translator and any fallbacks that may be needed.

The API user need only specify the source text and the language pair to translate in order to interact with the server.
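
A client call along these lines might look like the sketch below. It is only an illustration under stated assumptions: the function name `translate`, the `TranslationFailed` exception, the JSON field names (`text`, `from`, `to`, `translation`, `error`), and the `opener` parameter are all hypothetical, and the code uses Python 3's standard `urllib` so that the client has no external dependencies.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class TranslationFailed(Exception):
    """Raised when the server reports an error instead of a translation."""


def translate(text, source, target, server_url, opener=urlopen):
    """Ask the server to translate text and return the resulting string."""
    body = json.dumps({"text": text, "from": source, "to": target})
    request = Request(server_url, data=body.encode("utf-8"),
                      headers={"Content-Type": "application/json"})
    try:
        response = opener(request)
    except HTTPError as err:
        # urlopen raises on 4xx/5xx, but the error object still has a body.
        response = err
    try:
        reply = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    if "error" in reply:
        raise TranslationFailed(reply["error"])
    return reply["translation"]
```

The caller passes only the text, the language pair, and the server address, for example `translate("hola", "es", "en", server_url)`; which backend produced the result never appears in the interface. The `opener` argument exists only so the function can be exercised in tests without a running server.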

In addition, a new Translate activity will be developed. This activity will be very simple: functional, but essentially a demo of how to use the client API and server. It will also let me give the programs some additional real-world testing, so that any potential issues can be caught while there's still time to fix them.

What is the timeline for development of your project? The Summer of Code work period is June 17 - September 23; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (August 2); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then.

Week: Plan
June 17 - June 24: Begin creating the server application. By the end of the week I'll have a skeleton of the server which, while returning fake data, should at least have the basic structure and functionality of the end product.
June 25 - July 1: Finalize the server architecture. This means establishing the plugin system that will be used, as well as completing any outstanding functionality.
July 2 - July 8: This week, I'll begin working on some translation backends. I will create a plugin for Apertium and one for a web-based service (as yet undecided, possibly Bing Translate) to test the server. Any issues that are found with the server should also be fixed this week.
July 9 - July 15: This week I will finish up any work remaining on the first two translation backends or the server in general, and will begin working on the client API.
July 16 - July 22: This week will be dedicated to establishing the primary functionality of the client API, along with relevant tests and fixes to the server, as needed.
July 23 - July 29: Before midterm evaluations, I want to polish the client API by writing more tests, examples, and documentation, as well as fleshing out any functionality that was not finished in the previous week.
Midterm Evaluations
July 30 - August 5: This week I will begin work on the translation Activity for Sugar using the client API and server. Any issues identified during the midterm evaluations should also be addressed this week.
August 6 - August 12: I will continue working on the translation Activity, hopefully completing its functionality by the end of this week and testing it thoroughly.
August 13 - August 19: This week will be dedicated to finishing up any loose ends left in the project: tidying up source code, writing more tests, fixing outstanding bugs, and so on.
August 20 - August 26: Similar to the previous week, I aim to work on refactoring any code that requires it and writing documentation. This will be the last week I will be able to dedicate full time to the project, so I will try to get to a point where the project could be considered done as is, working only to polish and improve functionality from here on.
August 27 - September 2: I will be relocating back to my university during this week, so any changes made this week and later will focus on documentation or bug fixes.
September 3 - September 9: This week I begin my fall semester, so I will not be able to work on the project full time. For the next two weeks, I will write more test cases and documentation and respond to any issues or bugs that are found.
September 10 - September 16: Same as the previous week.
End of Summer of Code

Convince us, in 5-15 sentences, that you will be able to successfully complete your project in the timeline you have described. This is usually where people describe their past experiences, credentials, prior projects, schoolwork, and that sort of thing, but be creative. Link to prior work or other resources as relevant.

So why should you believe that I'll be able to reach my goals within the timeline provided? Well, I believe I have a very solid idea and plan for how to get my work done effectively. There is also some flexibility built into the timeline, so that I can take longer than expected on some segments without throwing off the entire project. In addition, after having worked on scientific computing for the past 9 months, I am incredibly enthusiastic to jump to a higher-level language and work on an interesting project that doesn't involve segmentation faults and checking individual bits of output to find where bugs are being created.

I absolutely love learning new things, experimenting, and creating, and working on a Summer of Code project would give me an opportunity to spend an entire summer doing just that. What could be better than that?

And of course, I think this is a very useful project, one that I can get excited about from both a technical standpoint as well as a practicality standpoint. After working with OLPC this past semester, I've been very excited to get involved.

You and the community

If your project is successfully completed, what will its impact be on the Sugar Labs community? Give 3 answers, each 1-3 paragraphs in length. The first one should be yours. The other two should be answers from members of the Sugar Labs community, at least one of whom should be a Sugar Labs GSoC mentor. Provide email contact information for non-GSoC mentors.

I think the greatest benefit of this project will be its generality. This server could, in theory, be used from any programming language capable of making an HTTP request and parsing JSON. That makes it very broadly applicable. More specific to Sugar Labs though, by the end of this project, there will be a de facto solution to providing high quality machine translations anywhere within Sugar. This makes it trivial to access translations without code duplication or boilerplate. So why is that useful? Because while machine translation cannot and should not replace proper language learning, it can be a great aid to quickly improving understanding or seeing the meaning of a certain word or phrase in context. In my opinion, making the accessibility of this type of translation trivial opens up a whole world of possibilities for various activities in Sugar that deal with language learning and internationalization. That is, at least for me, an exciting concept.

Sugar Labs will be working to set up a small (5-30 unit) Sugar pilot near each student project that is accepted to GSoC so that you can immediately see how your work affects children in a deployment. We will make arrangements to either supply or find all the equipment needed. Do you have any ideas on where you would like your deployment to be, who you would like to be involved, and how we can help you and the community in your area begin it?

My township library runs various summer classes for younger children in order to encourage education, reading, art, and so on. I've volunteered there over previous summers and had a great time working with the kids. If possible, I would like to work with the library to bring the laptops into the existing programs, either to reinforce what's already being done, or just to add a nice change of pace for the students. My experience working with 3rd and 4th grade students this semester (mainly focusing on Turtle Art and Scratch) gave me the idea to apply to this Summer of Code project in the first place, and I feel that it would be a nice way to incorporate my previous experience.

What will you do if you get stuck on your project and your mentor isn't around?

The answer to this of course depends on what exactly I get stuck on, but Sugar has a lovely community on IRC that is very helpful with questions, and of course there is the mailing list. If I were to get stuck on a Python problem, as another example, I'd be in a similar situation: there is a very lively community on #python on Freenode, as well as channels dedicated to some of the larger frameworks/libraries I might be using. And lastly, there is of course Google. With these resources, I very much doubt that I'll ever end up in a situation where I get completely stuck.

How do you propose you will be keeping the community informed of your progress and any problems or questions you might have over the course of the project?

I naturally plan to do this development entirely in the open, which allows anyone interested to browse the commit messages of the repository and get a sense of my project. I will also maintain a daily changelog that will be published somewhere at all times, and then sent to the mailing list on a weekly basis along with some additional comments on the week's progress and any deviation from the intended timeline. In addition to this, I also plan on utilizing the IRC community to the fullest extent possible, and ask questions and probe for feedback there as needed.


Miscellaneous

We want to make sure that you can set up a development environment before the summer starts. Please send us a link to a screenshot of your Sugar development environment with the following modification: when you hover over the XO-person icon in the middle of Home view, the drop-down text should have your email in place of "Restart." See the image on the right for an example. It's normal to need assistance with this, so please visit our IRC channel, #sugar on irc.freenode.net, and ask for help.

ErikPriceDeveloperChallenge.png

What is your t-shirt size? (Yes, we know Google asks for this already; humor us.)

Men's medium.

Describe a great learning experience you had as a child.

When I was in elementary school, I was part of the gifted and talented program, which met once a week, and instead of going to regular class, I spent the day with maybe 15 other students as we did projects and exercises. The most memorable of these was the month or so long project in which we had to design an island nation of our creation. This involved researching different styles of governments, setting up how the politics of the island would be structured, as well as things like making topography and political maps of the island on paper, and then building them out in clay. I still have a poster at home of the language I created for this island (it was English except with a different alphabet and pronunciation for each letter).

Looking back on this project, it was such a wonderful learning experience for me. The instructor of the course took a single project objective (design an island) and managed to teach us about such a wide array of topics in an engaging way. I think projects like Sugar have the capability of creating experiences like this, and are a marvelous resource to educators everywhere.

Is there anything else we should have asked you or anything else that we should know that might make us like you or your project more?

This past semester, I started working with a student-run OLPC chapter at my school, and it's been a wonderful experience for me. In the past, the group had done work in other countries, but this past year, we focused our efforts locally, and worked with a local after-school program for elementary school children. We met with around 10 of these children every week (the first group was third graders, the second was fourth graders) and worked with several XOs to have them learn some basics of programming. Working with Turtle Art and Scratch, we had them do various little games and tasks. It was wonderful to see the whole idea of programming click with some kids, and to push them to find ways to do things on their own. Some needed a lot more help, of course, but a couple of children in each group took off right away, programming little games in Scratch or fun patterns in Turtle Art.

I had heard of Sugar and OLPC before this, of course, but participating in this organization really opened my eyes to the immense potential these little machines and bits of software have for creating a rich, interactive educational experience.


Thanks for considering the project. I hope to be working with you all soon!