Speech-recognition: Difference between revisions
| (3 intermediate revisions by 2 users not shown) | |||
| Line 1: | Line 1: | ||
{{TOCright}} | |||
====About you==== | ====About you==== | ||
| Line 141: | Line 142: | ||
: A keyboard shortcut like <Alt+S> can also be provided for starting speech recognition. The corresponding hooks for the key shortcut must be made in the Sugar UI source code. | : A keyboard shortcut like <Alt+S> can also be provided for starting speech recognition. The corresponding hooks for the key shortcut must be made in the Sugar UI source code. | ||
: '''Gnome Voice Control to Sugar Voice Control | |||
: Gnome Voice Control is a voice control system for the Gnome Desktop that lets the user control the entire system by speaking commands. | |||
: The system consists of an application that monitors the audio input (microphone). When a significant audio signal is detected, the software captures, processes, and recognizes the signal, and then executes the desired action on the Gnome Desktop. | |||
: For more details please visit: http://live.gnome.org/GnomeVoiceControl | |||
: Gnome Voice Control uses Pocket Sphinx. The idea is to sugarize it to implement "Sugar Voice Control". | |||
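: The "significant audio signal" step described above can be illustrated with a minimal sketch. It assumes the microphone input has already been split into frames of signed PCM samples and uses a simple RMS energy threshold. The function names and the threshold value are hypothetical and are not taken from Gnome Voice Control or Pocket Sphinx.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def significant_frames(frames: Iterable[Sequence[int]],
                       threshold: float = 500.0) -> List[int]:
    """Indices of frames loud enough to hand to the recognizer.

    The threshold is an illustrative assumption; a real system would
    calibrate it against background noise.
    """
    return [i for i, frame in enumerate(frames) if rms(frame) > threshold]


if __name__ == "__main__":
    silence = [0, 0, 0, 0]
    speech = [1000, -1000, 1000, -1000]
    hum = [10, -10, 10, -10]
    print(significant_frames([silence, speech, hum]))
```

: Only the frames that pass this check would need to be sent to the recognizer, which matters on limited hardware.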
---- | ---- | ||
| Line 146: | Line 159: | ||
Q.10: '''What is the timeline for development of your project? The Summer of Code work period is 7 weeks long, May 23 - August 10; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (July 6-13); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then. | Q.10: '''What is the timeline for development of your project? The Summer of Code work period is 7 weeks long, May 23 - August 10; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (July 6-13); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then. | ||
: | : '''Tasks Division: | ||
: As I already mentioned, a lot of features can be implemented around Speech Recognition. I have divided my proposal into the following parts: | |||
: a) My first priority this summer is to enable "Sugar Voice Control". This includes: | |||
: 1. Testing Pocket Sphinx on Sugar | |||
: 2. Studying more about Gnome Voice Control. | |||
: 3. Sugarizing the Gnome Voice Control. | |||
: 4. A command line interface that will start speech recognition in the background and will start taking "Speech Commands". | |||
: b) After the successful implementation of Sugar Voice Control, we can then look into providing speech-recognized text to unmodified Sugar activities. Activities like Write could then take their input either from the keyboard or from the microphone. This includes: | |||
: 1. Providing a speech recognition button in the Sugar frame (for example, at the top right) which, when clicked, starts recognizing speech in the background. Clicking the same button again stops the recognition process. | |||
: 2. A keyboard shortcut like Alt+S for starting speech recognition. | |||
: 3. Speech recognition control panel for controlling the various parameters. | |||
: c) The last part can be creating an API for providing easy Speech Recognition access to activity developers. | |||
: My aim is to at least achieve part a) this summer, and if time permits I would also like to implement part b). Part c) can be taken care of later. | |||
: '''Detailed time line: | |||
: Present to May 24 (before actual work for GSoC starts): I will be studying more about Gnome Voice Control and Pocket Sphinx. By then I will be confident about how Sugar Voice Control is to be derived from Gnome Voice Control. We also need to test the compatibility of Pocket Sphinx with Sugar. | |||
: May 24 to June 13: Sugarizing Gnome Voice Control to obtain "Sugar Voice Control". Implementing a command-line interface that runs speech recognition in the background and accepts simple speech commands such as opening an activity, going to home or the desktop, and closing an activity. | |||
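: The mapping from recognized speech to actions could be sketched as a small dispatcher. This is only an illustration: the class name, the phrases, and the action names are hypothetical placeholders, and the handlers return strings instead of calling any real Sugar API.

```python
from typing import Callable, Dict, Optional


class CommandDispatcher:
    """Maps recognized phrases to handler callables (hypothetical design)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lower-case and collapse whitespace so small recognizer
        # variations still match the registered phrase.
        return " ".join(text.lower().split())

    def register(self, phrase: str, handler: Callable[[], str]) -> None:
        self._handlers[self._normalize(phrase)] = handler

    def dispatch(self, recognized_text: str) -> Optional[str]:
        """Run the handler for the phrase, or return None if unknown."""
        handler = self._handlers.get(self._normalize(recognized_text))
        return handler() if handler is not None else None


def build_example_dispatcher() -> CommandDispatcher:
    # Placeholder actions; a real implementation would call into Sugar.
    dispatcher = CommandDispatcher()
    dispatcher.register("go to home", lambda: "show-home")
    dispatcher.register("open write", lambda: "launch-write")
    dispatcher.register("close activity", lambda: "close-current-activity")
    return dispatcher


if __name__ == "__main__":
    d = build_example_dispatcher()
    print(d.dispatch("Go to  Home"))
    print(d.dispatch("make coffee"))
```

: Keeping the phrase table separate from the recognizer makes it easy to add more control commands later without touching the recognition code.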
: June 14 - June 25: Test the implemented framework of Sugar Voice Control on limited-resource devices like the XO-1.0. Gather community feedback on the current implementation. Add more "Control Commands" to the framework after discussions. | |||
: Thus, by the end of June we should have completed the implementation of part a) as mentioned above. | |||
: June 26 - July 11: Implementation of the Sugar Voice Control button in the GUI. The button will sit in the Sugar frame (for example, at the top right); clicking it starts speech recognition in the background, and clicking it again stops the recognition process. Implementation of the Sugar Voice Control panel, as mentioned in the GUI considerations part. | |||
: Thus, before the midterm evaluations we should be done with part a) and part b) as mentioned above. | |||
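: The start/stop behaviour of the frame button can be sketched independently of any GUI toolkit. The class below is a hypothetical illustration: the `start` and `stop` callables stand in for whatever actually launches and halts the recognizer, and no Sugar or GTK call is used.

```python
from typing import Callable, List


class RecognitionToggle:
    """Tracks whether recognition is running and flips it on each click."""

    def __init__(self, start: Callable[[], None],
                 stop: Callable[[], None]) -> None:
        self._start = start
        self._stop = stop
        self.active = False

    def clicked(self) -> bool:
        """Handle one button click and return the new state."""
        if self.active:
            self._stop()
        else:
            self._start()
        self.active = not self.active
        return self.active


if __name__ == "__main__":
    events: List[str] = []
    toggle = RecognitionToggle(lambda: events.append("start"),
                               lambda: events.append("stop"))
    toggle.clicked()
    toggle.clicked()
    print(events)
```

: The same object could be driven by the Alt+S keyboard shortcut, so that the button and the shortcut always agree on the current state.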
: July 12 - July 16: Submitting midterm evaluations. | |||
: July 17 - July 30: Creating different language models and datasets so that "Sugar Voice Control" can support different languages. | |||
: Aug 1 - Aug 8: Testing the different language models on XOs. Specifically, I would like to create a language model for recognizing Hindi control commands. Then I would like to test the implementation in a primary school in my locality. | |||
: Aug 9 - Aug 16: Documenting the entire work, especially how to create language models. I have gone through some tutorials on how to create them, but most of them are very complicated. I would like to write simple documentation so that anyone can create simple language models for their favourite languages. In this way Sugar Voice Control will be extensible for multilingual users. | |||
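: For a small, fixed set of control commands, one simple option is a grammar file rather than a full statistical language model. The sketch below writes a command list in JSGF (Java Speech Grammar Format) syntax. Whether Sugar Voice Control would use JSGF is an assumption made only for illustration, and the function name and phrases are hypothetical; commands in another language would replace the English phrases.

```python
from typing import Sequence


def build_jsgf(grammar_name: str, phrases: Sequence[str]) -> str:
    """Return a JSGF grammar with one public rule listing the phrases."""
    cleaned = [" ".join(p.lower().split()) for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


if __name__ == "__main__":
    print(build_jsgf("commands", ["go to home", "close activity"]))
```

: Documentation aimed at non-experts could start from a phrase list like this one and leave the more complicated statistical models for later.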
---- | ---- | ||
| Line 215: | Line 269: | ||
A: [TODO] | A: [TODO] | ||
<noinclude>[[Category: | <noinclude>[[Category:2010 GSoC applications]]</noinclude> | ||
[[Category:GSoC]] | [[Category:GSoC]] | ||