Changes

Summer of Code/2010/speech-recognition (view source)

Revision as of 05:17, 3 April 2010

88 bytes removed, 05:17, 3 April 2010

no edit summary

Line 41: Line 41:

Q.7: '''Have you participated in an open-source project before? If so, please send us URLs to your profile pages for those projects, or some other demonstration of the work that you have done in open-source. If not, why do you want to work on an open-source project this summer?

−

A: Yes, I have been actively involved in open source projects from last one year. As a Software Engineer, Products and : Services at SEETA, New Delhi, India http://seeta.in, I am mangaing the design and development of speech related projects. Please visit my profile at http://seeta.in/j/team.html

+

A: Yes, I have been actively involved in open source projects for the past year. As a Software Engineer, Products and Services at SEETA, New Delhi, India (http://seeta.in), I am managing the design and development of speech-related projects. Please visit my profile at http://seeta.in/j/team.html

: My major contributions are:

Line 83: Line 83:

A: Sugar has all the potential to become an excellent educational platform. One particular problem I see in the current version of Sugar is the lack of features that would let physically challenged users interact with the system easily. This limits our ability to reach these children. But when we have the technology, why restrict ourselves?

−

: My project for this summer, aims at integrating Speech recognition into sugar that will open whole new set of opportunities both for Activity developers and end users (especially for physically ~~challlenged~~.)

+

: My project for this summer aims at integrating speech recognition into Sugar, which will open a whole new set of opportunities for both Activity developers and end users (especially physically challenged users).

Q.10: '''What is Speech Recognition?

Line 91: Line 91:

Q.11: '''How can Speech Recognition help Sugar become better?

−

A: As I mentioned previously, speech recognition can help physcially challenged children to interact with a system running sugar. Imagine a child who is not able to operate keypad and touchpad can now open the activities by just speaking "Open Write Activity" ~~or "Open turtle art" etc~~. They can even type into the write activity and others by simply speaking the appropriate commands. This is more of less like the Microsoft Speech Recognition system, where you can control the entire Windows by just speaking commands.

+

A: As I mentioned previously, speech recognition can help physically challenged children interact with a system running Sugar. Imagine a child who is not able to operate the keypad and touchpad: that child can now open activities just by saying "Open Write Activity". They can even type into the Write activity and others simply by speaking the appropriate commands. This is more or less like the Microsoft Speech Recognition system, where you can control the whole of Windows just by speaking commands.

: Correct pronunciation is the first lesson given in any educational system. With the help of speech recognition, we can develop activities that conduct automatic oral testing. We can create language models for a particular set of words and then check whether those words are properly recognized when a child speaks them.
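: As a minimal sketch of the command idea above, recognized text could be mapped to an action with a small lookup. This assumes the recognizer hands back a plain text transcript; the function name <code>parse_command</code> and the activity table are hypothetical and not part of Sugar or Sphinx.

```python
import re

# Illustrative table of spoken names -> activity names.
# These entries are assumptions for the example, not a Sugar registry.
KNOWN_ACTIVITIES = {
    "write": "Write",
    "paint": "Paint",
}


def parse_command(transcript):
    """Return ("open", activity) for phrases like "Open Write Activity".

    Returns None when the transcript is not a recognized open command.
    """
    # Normalize: lower-case, then drop everything except letters and spaces.
    words = re.sub(r"[^a-z ]", "", transcript.lower()).split()
    if len(words) < 2 or words[0] != "open":
        return None
    # The trailing word "activity" is optional in the spoken phrase.
    if words[-1] == "activity":
        words = words[:-1]
    name = " ".join(words[1:])
    activity = KNOWN_ACTIVITIES.get(name)
    return ("open", activity) if activity else None


if __name__ == "__main__":
    print(parse_command("Open Write Activity"))
```

: Unknown phrases return <code>None</code>, so a misrecognized utterance launches nothing, which seems the safer default here.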

Line 116: Line 116:

: For a speech recognition system, we require a speech recognition engine that can be integrated into Sugar and over which we can develop the entire framework. The major requirements of such an engine are:

−

: 1. It should be capable of running on Linux ~~which is the core of sugar~~.

+

: 1. It should be capable of running on Linux.

: 2. It should be open source so that we can modify it as per our needs.

: 3. It should not consume a lot of memory during run time.

Line 131: Line 131:

: 3. Sphinx 4

−

: Sphinx 4 is the latest version which has been developed entirely in JAVA. Sphinx 3 and pocket sphinx are older versions but still are the famous ones. Using Sphinx 4 for integration in sugar does not seem feasible because it has been written in JAVA. So we are left with two options of either using Sphinx 3 or Pocket Sphinx. Now the decision between these two can only be made by experimenting them with sugar. This will also depend on the devices currently being aimed by sugar and thus the main focus will be on OLPC XO laptops. The XOs have 256 MB of RAM and the run time requirement of Pocket Sphinx is around 20 MB~~. At this time I am not sure about~~ the requirements of Sphinx 3 ~~but this should be~~ more than 30 MB. Pocket Sphinx is light weight and is designed primarily for embedded devices like PDA. Sphinx 3 on the other hand is developed to run on desktops and consumes considerable amount of memory. So at least Pocket Sphinx can be implemented in sugar and the feasibility of Sphinx 3 will be tested soon.

+

: Sphinx 4 is the latest version and has been developed entirely in Java. Sphinx 3 and Pocket Sphinx are older versions but are still well known. Using Sphinx 4 for integration in Sugar does not seem feasible because it is written in Java. So we are left with two options: Sphinx 3 or Pocket Sphinx. The decision between these two can only be made by experimenting with them on Sugar. It will also depend on the devices Sugar currently targets, so the main focus will be on OLPC XO laptops. The XOs have 256 MB of RAM; the run-time requirement of Pocket Sphinx is around 20 MB, whereas that of Sphinx 3 is more than 30 MB. Pocket Sphinx is lightweight and is designed primarily for embedded devices such as PDAs. Sphinx 3, on the other hand, is developed to run on desktops and consumes a considerable amount of memory. So at least Pocket Sphinx can be implemented in Sugar, and the feasibility of Sphinx 3 will be tested soon.
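: To put the memory figures above in proportion, here is a small worked calculation. It uses only the numbers quoted in this section (256 MB of RAM, about 20 MB for Pocket Sphinx, more than 30 MB for Sphinx 3); the 30 MB value is treated as a lower bound, and the helper name <code>ram_share</code> is made up for this example.

```python
# Figures quoted in the proposal; Sphinx 3 is "more than 30 MB",
# so 30 is used here only as a lower bound.
XO_RAM_MB = 256


def ram_share(engine_mb, total_mb=XO_RAM_MB):
    """Return the percentage of total RAM taken by an engine footprint."""
    return 100.0 * engine_mb / total_mb


if __name__ == "__main__":
    for name, mb in (("Pocket Sphinx", 20), ("Sphinx 3 (lower bound)", 30)):
        print("%s: %.1f%% of XO RAM" % (name, ram_share(mb)))
```

: On these figures Pocket Sphinx would take a little under 8% of the XO's RAM and Sphinx 3 at least about 12%, before counting acoustic and language model sizes, which are not stated here.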

: '''Language Support

−

: Sphinx engines require training data sets and language models for recognizing speech. Thus we can set them to recognize many languages. At present they have been tested for recognizing Chinese, Spanish, Dutch, German, Hindi, Italic, Icelandic and Russian successfully. Thus we can target a wide range of users belonging to different parts of world speaking different languages. I have collected all this data after discussion with a Sphinx developer on IRC and I am testing the Sphinx 3 and Pocket sphinx too.

+

: Sphinx engines require training data sets and language models for recognizing speech, so we can set them up to recognize many languages. At present they have been tested successfully for recognizing English, Chinese, Spanish, Dutch, German, Hindi, Italian, Icelandic and Russian. Thus we can target a wide range of users in different parts of the world speaking different languages. I collected this information through discussion with a Sphinx developer on IRC, and I am also testing Sphinx 3 and Pocket Sphinx myself.

: '''GUI considerations

Chiragjain1989 (67 edits)