Summer of Code/2010/speech-recognition: Difference between revisions

Created page with '{{TOCright}} ====About you==== Q.1: '''What is your name? A: Chirag Jain ---- Q.2: '''What is your email address? A: chiragjain1989{AT}gmail{DOT}com ---- …'
 
No edit summary
 
(10 intermediate revisions by 2 users not shown)
Line 75: Line 75:
Q.8:  '''What is the name of your project?


A:      Speech Recognition
A:      Sugar Voice Control


----
Line 142: Line 142:
    
    
:      A keyboard shortcut like <Alt+S> can also be provided for starting speech recognition. The corresponding hooks for the key shortcut must be made in the Sugar UI source code.
:    '''Gnome Voice Control to Sugar Voice Control
:    Gnome Voice Control is a voice control system for the Gnome Desktop that lets the user control the entire system by speaking commands.
:    The system consists of an application that monitors the audio input (microphone). When a significant audio signal is detected, the software captures, processes and recognizes the signal, and then executes the desired action on the Gnome Desktop.
:    For more details please visit: http://live.gnome.org/GnomeVoiceControl
:    Gnome Voice Control uses Pocket Sphinx. The idea is to sugarize it to implement "Sugar Voice Control".
:    A block view of the above implementation plan is shown below:
[[Image:Svc.jpg|center|Block view of Sugar Voice Control]]
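:    The "significant audio signal" step described above can be illustrated with a simple energy threshold. The following is only a minimal sketch, not how Gnome Voice Control or Pocket Sphinx actually detects speech: the function names, the threshold of 500 and the minimum of 3 frames are all assumptions chosen for illustration, and the frames are assumed to be lists of signed 16-bit PCM samples.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_utterances(
    frames: List[Sequence[int]],
    threshold: float = 500.0,  # hypothetical loudness threshold
    min_frames: int = 3,       # hypothetical minimum utterance length
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, where the
    signal stays at or above the threshold for at least min_frames frames."""
    utterances = []
    start = None
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if loud and start is None:
            start = i  # a loud run begins here
        elif not loud and start is not None:
            if i - start >= min_frames:
                utterances.append((start, i))
            start = None  # the run ended; short runs are ignored as noise
    # Handle a loud run that lasts until the end of the input.
    if start is not None and len(frames) - start >= min_frames:
        utterances.append((start, len(frames)))
    return utterances
```

:    Each detected span would then be handed to the recognizer; short bursts such as a key click are dropped because they do not last for min_frames frames.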


----   
Line 147: Line 163:
Q.10: '''What is the timeline for development of your project? The Summer of Code work period is 7 weeks long, May 23 - August 10; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (July 6-13); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then.


:    [TODO]
:    '''Tasks Division:
:    As I have already mentioned, a lot of features can be implemented around Speech Recognition. I have sub-divided my proposal into the following parts:
 
:    a) My first priority this summer is to enable "Sugar Voice Control". This includes:


:    1. Testing Pocket Sphinx on Sugar.
:    2. Studying Gnome Voice Control in more depth.
:    3. Sugarizing Gnome Voice Control.
:    4. Providing a command-line interface that starts speech recognition in the background and begins accepting "Speech Commands".
:    b) After Sugar Voice Control has been implemented successfully, we can look into providing recognized speech as text to unmodified Sugar activities. Activities like Write could then take their input either from the keyboard or from the microphone. This includes:
:    1. Providing a speech recognition button in the Sugar frame (for example on the top right-hand side) which, when clicked, starts recognizing speech in the background. Clicking the same button again stops the recognition process.
:    2. A keyboard shortcut like Alt+S for starting speech recognition.
:    3. A speech recognition control panel for controlling the various parameters.
:    c) The last part would be creating an API that gives activity developers easy access to Speech Recognition.
:    My aim is to achieve at least part a) this summer, and if time permits I would also like to implement part b). Part c) can be taken care of later.
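:    To make the "Speech Commands" idea in part a) concrete, here is a minimal sketch of how recognized text could be mapped to actions. It is an illustration only: the CommandDispatcher class, its methods and the example phrases are hypothetical and are not part of Sugar, Gnome Voice Control or Pocket Sphinx. The recognizer is assumed to deliver each utterance as a plain text string.

```python
from typing import Callable, Dict


class CommandDispatcher:
    """Maps recognized phrases to handler functions (hypothetical helper).

    A handler receives the words that follow the matched phrase, so that
    "open write" can call the "open" handler with the argument "write".
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], None]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lower-case and collapse whitespace so matching is forgiving.
        return " ".join(text.lower().split())

    def register(self, phrase: str, handler: Callable[[str], None]) -> None:
        self._handlers[self._normalize(phrase)] = handler

    def dispatch(self, recognized_text: str) -> bool:
        """Run the handler for the longest matching leading phrase.

        Returns True if a command was found and executed, else False.
        """
        words = self._normalize(recognized_text).split()
        for n in range(len(words), 0, -1):
            phrase = " ".join(words[:n])
            handler = self._handlers.get(phrase)
            if handler is not None:
                handler(" ".join(words[n:]))
                return True
        return False
```

:    In use, one would register phrases such as "open", "go to home" or "close activity" once at start-up and call dispatch() with every utterance returned by the recognizer; unknown utterances return False and can simply be ignored.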
:    '''Detailed time line:
:    Present to May 24 (before the actual GSoC work period starts): I will be studying Gnome Voice Control and Pocket Sphinx in more depth. By the end of this period I will be confident about how Sugar Voice Control is to be derived from Gnome Voice Control. We also need to test the compatibility of Pocket Sphinx with Sugar.
:  '''Weekdays
:  During this time I also have my studies. I have classes from 9:30 AM to 4:30 PM, so from the present until the end of May I will be working around 2 hours per day, between 8 PM and 11 PM (IST).
:  '''Weekends
:  I have weekends off, so I can spare around 4 hours per day on weekends. During weekends I can communicate with my mentor at any time that suits him/her.
   
:    From the end of May I will be on my summer break, which continues until the end of August. I will then be free of any other distraction and can devote all my energy to development. During this period I can spare around 4-5 hours per day. Again, I can communicate with the mentor at any time, as I am used to working late at night too.
:    May 24 - June 13: Sugarizing Gnome Voice Control to obtain "Sugar Voice Control". Implementing a command-line interface that runs speech recognition in the background and accepts simple speech commands such as open an activity, go to home or desktop, close activity, etc.
:    June 14 - June 25: Testing the implemented Sugar Voice Control framework on resource-limited devices like the XO-1.0. Gathering community feedback on the current implementation. Adding more "Control Commands" to the framework after discussions.
:    Thus by the end of June we should have completed the implementation of part a) as mentioned above.
:    June 26 - July 11: Implementing the Sugar Voice Control button in the GUI. This button will be placed in the Sugar frame (for example on the top right-hand side); clicking it starts recognizing speech in the background, and clicking it again stops the recognition process. Implementing the Sugar Voice Control Panel as mentioned in the GUI considerations part.
:    Thus before the mid-term evaluations we should be done with part a) and part b) as mentioned above.
:    July 12 - July 16: Submitting mid-term evaluations.
:    July 17 - July 30: Creating different language models and datasets so that "Sugar Voice Control" can support multiple languages.
:    Aug 1 - Aug 8: Testing the different language models on XOs. Specifically, I would like to create a language model for recognizing Hindi control commands. I would then like to test the implementation in a primary school in my locality.
:    Aug 9 - Aug 16: Documenting the entire work, especially how to create language models. I have gone through some tutorials on how to create them, but most of them are very complicated. I would like to write simple documentation so that anyone can create simple language models for their favourite languages. In this way Sugar Voice Control will be extensible for multilingual users.
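:    For a small, fixed set of control commands, one simple kind of language model is a JSGF grammar, which Pocket Sphinx can load. The sketch below only generates the grammar text from a list of command phrases; the function name build_jsgf, the grammar name and the rule name <command> are assumptions made for illustration. A pronunciation dictionary and an acoustic model for the target language would still be needed and are not covered here.

```python
from typing import Iterable, List


def build_jsgf(grammar_name: str, commands: Iterable[str]) -> str:
    """Build a one-rule JSGF grammar that accepts any of the given phrases.

    Phrases are lower-cased, whitespace is collapsed, and duplicates are
    dropped while keeping the original order.
    """
    phrases: List[str] = []
    for command in commands:
        phrase = " ".join(command.lower().split())
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    if not phrases:
        raise ValueError("at least one command phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )
```

:    The same function could be called once per language with a translated command list, for example a list of Hindi control phrases, and the resulting text saved to a grammar file.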
 
----


Line 166: Line 229:
Q.12:  '''If your project is successfully completed, what will its impact be on the Sugar Labs community? Give 3 answers, each 1-3 paragraphs in length. The first one should be yours. The other two should be answers from members of the Sugar Labs community, at least one of whom should be a Sugar Labs GSoC mentor. Provide email contact information for non-GSoC mentors.


A:    [TODO]
A:    '''My answer:
:      If Sugar Voice Control is implemented successfully, it will greatly increase the usability of Sugar, because Sugar could then also be controlled by physically challenged children and would thus reach a larger group of users.
----


Q.13:  '''Sugar Labs will be working to set up a small (5-30 unit) Sugar pilot near each student project that is accepted to GSoC so that you can immediately see how your work affects children in a deployment. We will make arrangements to either supply or find all the equipment needed. Do you have any ideas on where you would like your deployment to be, who you would like to be involved, and how we can help you and the community in your area begin it?


A:      [TODO]
A:      As I already mentioned, I would like to implement Hindi language models too, which will help me test the framework in my locality. We have some primary schools where students know Hindi very well but have poor English speaking skills. So testing with the Hindi language and seeing how this affects the children will be a great idea, and I am more than happy to set up the Sugar pilot.
----


Line 214: Line 278:
Q.19:  '''Is there anything else we should have asked you or anything else that we should know that might make us like you or your project more?


A:    [TODO]
A:    Nopes :-)


<noinclude>[[Category:2009 GSoC applications]]</noinclude>
<noinclude>[[Category:2010 GSoC applications]]</noinclude>


[[Category:GSoC]]