Changes

Summer of Code/2010/speech-recognition

Revision as of 04:47, 5 April 2010

2,258 bytes added , 04:47, 5 April 2010

no edit summary

Line 41: Line 41:

Q.7: '''Have you participated in an open-source project before? If so, please send us URLs to your profile pages for those projects, or some other demonstration of the work that you have done in open-source. If not, why do you want to work on an open-source project this summer?

−

A: Yes, I have been actively involved in open source projects for the last year. As a Software Engineer, Products and Services at SEETA, New Delhi, India http://seeta.in, I am managing the design and development of speech related projects. Please visit my profile at http://seeta.in/j/team.html

+

A: Yes, I have been actively involved in open-source projects for the last year. As a Software Engineer, Products and Services at SEETA, New Delhi, India (http://seeta.in), I am managing the design and development of speech-related projects.

: Please visit my profile at http://seeta.in/j/team.html

: My major contributions are:

Line 83: Line 83:

A: Sugar has all the potential to become an excellent educational platform. One particular problem I see in the current version of Sugar is the lack of features that would let physically challenged users interact with the system easily. This limits our ability to reach this section of children. But when we have the technology, why restrict ourselves?

−

: My project for this summer aims at integrating speech recognition into Sugar, which will open a whole new set of opportunities for both Activity developers and end users (especially the physically ~~challenged~~.)

+

: My project for this summer aims at integrating speech recognition into Sugar, which will open a whole new set of opportunities for both Activity developers and end users (especially the physically challenged).

Q.10: '''What is Speech Recognition?

Line 91: Line 91:

Q.11: '''How can Speech Recognition help Sugar become better?

−

A: As I mentioned previously, speech recognition can help physically challenged children to interact with a system running Sugar. Imagine a child who is not able to operate a keypad or touchpad: they can now open activities by just speaking "Open Write Activity". They can even type into the Write activity and others by simply speaking the appropriate commands. This is more or less like the Microsoft Speech Recognition system, where you can control all of Windows by just speaking commands.

+

A: As I mentioned previously, speech recognition can help physically challenged children interact with a system running Sugar. Imagine a child who is unable to operate a keypad or touchpad: they can now open activities just by speaking "Open Write Activity" or "Open Turtle Art". They can even type into the Write activity and others simply by speaking the appropriate commands. This is more or less like the Microsoft Speech Recognition system, where you can control all of Windows just by speaking commands.

: Correct pronunciation is the first lesson given in any educational system. With the help of speech recognition, we can develop activities that conduct automatic oral testing. We can create language models for a particular set of words and then check whether the words a child speaks are recognized properly, which indicates whether they were pronounced correctly.

Line 116: Line 116:

: For a speech recognition system, we require a speech recognition engine that can be integrated into Sugar and on which we can build the entire framework. The major requirements of such an engine are:

−

: 1. It should be capable of running on Linux.

+

: 1. It should be capable of running on Linux, on which Sugar runs.

: 2. It should be open source so that we can modify it to suit our needs.

: 3. It should not consume a lot of memory at run time.

Line 131: Line 131:

: 3. Sphinx 4

−

: Sphinx 4 is the latest version and has been developed entirely in Java. Sphinx 3 and Pocket Sphinx are older versions but are still the popular ones. Using Sphinx 4 for integration in Sugar does not seem feasible because it is written in Java. So we are left with two options: Sphinx 3 or Pocket Sphinx. The decision between these two can only be made by experimenting with them on Sugar. It will also depend on the devices currently targeted by Sugar, so the main focus will be on OLPC XO laptops. The XOs have 256 MB of RAM, and the run-time memory requirement of Pocket Sphinx is around 20 MB ~~whereas~~ the requirement of Sphinx 3 is more than 30 MB. Pocket Sphinx is lightweight and is designed primarily for embedded devices like PDAs. Sphinx 3, on the other hand, was developed to run on desktops and consumes a considerable amount of memory. So at least Pocket Sphinx can be implemented in Sugar, and the feasibility of Sphinx 3 will be tested soon.

+

: Sphinx 4 is the latest version and has been developed entirely in Java. Sphinx 3 and Pocket Sphinx are older versions but are still the popular ones. Using Sphinx 4 for integration in Sugar does not seem feasible because it is written in Java. So we are left with two options: Sphinx 3 or Pocket Sphinx. The decision between these two can only be made by experimenting with them on Sugar. It will also depend on the devices currently targeted by Sugar, so the main focus will be on OLPC XO laptops. The XOs have 256 MB of RAM, and the run-time memory requirement of Pocket Sphinx is around 20 MB. At this time I am not sure about the requirements of Sphinx 3, but they should be more than 30 MB. Pocket Sphinx is lightweight and is designed primarily for embedded devices like PDAs. Sphinx 3, on the other hand, was developed to run on desktops and consumes a considerable amount of memory. So at least Pocket Sphinx can be implemented in Sugar, and the feasibility of Sphinx 3 will be tested soon.
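: To put the memory figures above in proportion, here is a small worked calculation. It uses only the numbers quoted in this answer (256 MB of RAM, about 20 MB for Pocket Sphinx, and 30 MB as a lower bound for Sphinx 3); the function name <code>ram_share</code> is made up for illustration, and the Sphinx 3 figure is a guess until it has actually been measured.

```python
# Figures quoted in the proposal; the Sphinx 3 value is only a lower bound.
XO_RAM_MB = 256
POCKET_SPHINX_MB = 20
SPHINX3_MIN_MB = 30


def ram_share(engine_mb, total_mb=XO_RAM_MB):
    """Return the percentage of total RAM that an engine would occupy."""
    return 100.0 * engine_mb / total_mb


for name, mb in (("Pocket Sphinx", POCKET_SPHINX_MB),
                 ("Sphinx 3 (at least)", SPHINX3_MIN_MB)):
    print("%s: %d MB, about %.1f%% of %d MB" % (name, mb, ram_share(mb), XO_RAM_MB))
```

: Under these assumptions Pocket Sphinx would take roughly 8% of an XO's RAM and Sphinx 3 at least 12%, before counting Sugar itself and the running activities.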

: '''Language Support

−

: Sphinx engines require training data sets and language models to recognize speech, so they can be set up to recognize many languages. At present they have been tested successfully on ~~English,~~ Chinese, Spanish, Dutch, German, Hindi, Italian, Icelandic and Russian. Thus we can target a wide range of users in different parts of the world speaking different languages. I collected this information from a discussion with a Sphinx developer on IRC, and I am testing Sphinx 3 and Pocket Sphinx as well.

+

: Sphinx engines require training data sets and language models to recognize speech, so they can be set up to recognize many languages. At present they have been tested successfully on Chinese, Spanish, Dutch, German, Hindi, Italian, Icelandic and Russian. Thus we can target a wide range of users in different parts of the world speaking different languages. I collected this information from a discussion with a Sphinx developer on IRC, and I am testing Sphinx 3 and Pocket Sphinx as well.

: '''GUI considerations

Line 142: Line 142:

: A keyboard shortcut like <Alt+S> can also be provided for starting speech recognition. The corresponding hooks for the shortcut must be added to the Sugar UI source code.
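: A minimal sketch of the start/stop logic that such a shortcut could trigger. The class name <code>RecognitionToggle</code> and its callbacks are hypothetical and are not part of the Sugar code base; wiring it to the actual <Alt+S> key binding would still need the hooks mentioned above.

```python
class RecognitionToggle(object):
    """Tracks whether speech recognition is running and flips it on demand."""

    def __init__(self, on_start, on_stop):
        # on_start / on_stop are caller-supplied callables (hypothetical hooks)
        # that would start or stop the recognition engine.
        self._on_start = on_start
        self._on_stop = on_stop
        self.active = False

    def toggle(self):
        """Start recognition if idle, stop it if running; return the new state."""
        self.active = not self.active
        if self.active:
            self._on_start()
        else:
            self._on_stop()
        return self.active


# Usage: each key press (or button click) calls toggle() once.
events = []
shortcut = RecognitionToggle(lambda: events.append("start"),
                             lambda: events.append("stop"))
shortcut.toggle()  # first press starts recognition
shortcut.toggle()  # second press stops it
```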

+

: '''Gnome Voice Control to Sugar Voice Control

+

: Gnome Voice Control is a voice control system for the Gnome desktop that allows the user to control the entire system by speaking commands.

+

: The system consists of an application that monitors the audio input (microphone); when a significant audio signal is detected, the software captures, processes and recognizes the signal and then executes the desired action on the Gnome desktop.

+

: For more details please visit: http://live.gnome.org/GnomeVoiceControl

+

: Gnome Voice Control uses Pocket Sphinx. The idea is to sugarize it to implement "Sugar Voice Control".
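: The "significant audio signal" step described above can be illustrated with a simple energy threshold. This is only a sketch of the general idea and not how Gnome Voice Control or Pocket Sphinx actually detect speech; the function names, the frame length of 160 samples and the threshold of 500 are all assumptions chosen for the example.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of integer PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / float(len(frame)))


def find_utterance(samples, frame_len=160, threshold=500.0):
    """Return (start, end) sample indices of the first run of loud frames.

    A frame counts as loud when its RMS amplitude reaches the threshold.
    Returns None if no frame is loud.
    """
    start = None
    for i in range(0, len(samples), frame_len):
        loud = rms(samples[i:i + frame_len]) >= threshold
        if loud and start is None:
            start = i              # first loud frame: utterance begins
        elif not loud and start is not None:
            return (start, i)      # first quiet frame after it: utterance ends
    if start is not None:
        return (start, len(samples))
    return None
```

: The span returned here is what would then be handed to the recognition engine; everything outside it is treated as silence and ignored.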

+

----

Line 147: Line 159:

Q.10: '''What is the timeline for development of your project? The Summer of Code work period is 7 weeks long, May 23 - August 10; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (July 6-13); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then.

−

: ~~[TODO]~~

+

: '''Tasks Division:

+

: As I already mentioned, many features can be built around speech recognition. I have divided my proposal into the following parts:

+

: a) My first priority this summer is to enable "Sugar Voice Control". This includes:

+

: 1. Testing Pocket Sphinx on Sugar

+

: 2. Studying more about Gnome Voice Control.

+

: 3. Sugarizing the Gnome Voice Control.

+

: 4. A command-line interface that starts speech recognition in the background and begins accepting "Speech Commands".

+

: b) After the successful implementation of Sugar Voice Control, we can look into providing speech-recognized text to unmodified Sugar activities. Activities like Write could then take their input either from the keyboard or through the microphone. This includes:

+

: 1. Providing a speech recognition button in the Sugar frame (for example, on the top right-hand side) which, when clicked, automatically starts recognizing speech in the background. Clicking the same button again stops the recognition process.

+

: 2. A keyboard shortcut like Alt+S for starting speech recognition.

+

: 3. A speech recognition control panel for controlling the various parameters.

+

: c) The last part would be creating an API that gives activity developers easy access to speech recognition.
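: To give a rough idea of how spoken commands such as "Open Write Activity" could be mapped to actions, and of what a small API for activity developers might look like, here is a sketch. The class <code>VoiceCommands</code> and its methods are hypothetical; no such API exists in Sugar yet, and the recognized text is assumed to arrive as a plain string from the engine.

```python
class VoiceCommands(object):
    """Maps recognized phrases to callables (hypothetical API sketch)."""

    def __init__(self):
        self._handlers = {}

    @staticmethod
    def _normalize(phrase):
        # Ignore case and extra whitespace so "Open  Write Activity"
        # matches "open write activity".
        return " ".join(phrase.lower().split())

    def register(self, phrase, handler):
        """Associate a spoken phrase with a function to call."""
        self._handlers[self._normalize(phrase)] = handler

    def dispatch(self, recognized_text):
        """Run the handler for the recognized text; return True if one was found."""
        handler = self._handlers.get(self._normalize(recognized_text))
        if handler is None:
            return False
        handler()
        return True


# Usage: the lambdas stand in for whatever actually launches an activity.
launched = []
commands = VoiceCommands()
commands.register("open write activity", lambda: launched.append("Write"))
commands.register("open turtle art", lambda: launched.append("Turtle Art"))
commands.dispatch("Open Write Activity")
```

: Exact phrase matching is the simplest possible design; a real implementation would probably restrict the recognizer to the registered phrases so that unrelated speech is not misread as a command.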

+

: My aim is to achieve at least part a) this summer, and if time permits I would also like to implement part b). Part c) can be taken care of later.

+

: I will be presenting a more detailed timeline.

----

Chiragjain1989
