Revision of 08:20, 28 March 2009 (297 bytes added)
# '''''Writing a system service that will take speech as input, generate the corresponding keystrokes, and then proceed as if the input had been typed on the keyboard.''''' (Benjamin M. Schwartz suggested this as a simpler approach than writing a Python speech library that would use D-Bus to connect the engine to activities, which would require changing the existing activities to use the library.)
# '''''Starting with recognition of the letters of a language's alphabet rather than full-blown speech recognition.''''' This gives an achievable target for the Summer of Code: because most languages have only a small number of letters, the target is feasible in terms of both computational requirements and attainable recognition accuracy.
# '''''Introducing a command mode.''''' This would be based on the system service mentioned in step 2 but would interpret speech differently, treating it as commands instead of as a stream of characters.
# '''''Demonstrating its use by applying it to activities such as Listen and Spell, which can benefit immediately from this feature.''''' (See the benefits section below.)
# '''''Creating acoustic models from a corpus recorded by children, with a dictionary that maps to children's vocabulary, to improve recognition.''''' (I have been working on acoustic models for Indian English and Hindi. This part needs active community participation to add support for more languages; the Qt application can come in handy for anyone interested in contributing.)
# '''''Using the models in activities such as Speak and implementing a dictation activity.'''''
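A minimal sketch of how the service might interpret the same recognized word differently in the letter-recognition mode and in the command mode. It assumes the letter dictionary emits each letter as a one-character word such as "A", and it uses a small, hypothetical command vocabulary (BACKSPACE, ENTER, SPACE); the types and the function `interpret_word` are illustrative names, not part of Julius or Sugar.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interpretation modes of the speech service. */
typedef enum { MODE_CHARACTER, MODE_COMMAND } speech_mode;

/* Hypothetical actions the service could turn into keystrokes. */
typedef enum {
    ACT_NONE,          /* word not understood in this mode */
    ACT_INSERT_CHAR,   /* type the character stored in .ch */
    ACT_BACKSPACE,
    ACT_ENTER,
    ACT_SPACE
} action_kind;

typedef struct {
    action_kind kind;
    char ch;           /* meaningful only for ACT_INSERT_CHAR */
} action;

/* Assumed command vocabulary; a real word list would come from the grammar. */
static const struct {
    const char *word;
    action_kind kind;
} command_table[] = {
    { "BACKSPACE", ACT_BACKSPACE },
    { "ENTER",     ACT_ENTER },
    { "SPACE",     ACT_SPACE },
};

/* Map one recognized word to an action according to the current mode. */
action interpret_word(speech_mode mode, const char *word)
{
    action result = { ACT_NONE, '\0' };

    assert(word != NULL);
    if (mode == MODE_CHARACTER) {
        /* Assumes each letter is output as a one-character word like "B". */
        if (word[0] != '\0' && word[1] == '\0' &&
            isalpha((unsigned char)word[0])) {
            result.kind = ACT_INSERT_CHAR;
            result.ch = (char)tolower((unsigned char)word[0]);
        }
        return result;
    }

    /* Command mode: look the word up instead of typing it. */
    for (size_t i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strcmp(word, command_table[i].word) == 0) {
            result.kind = command_table[i].kind;
            break;
        }
    }
    return result;
}
```

Passing the mode explicitly keeps the mapping a pure function, so it can be tested without a running recognizer.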
 
    
=====Proposal for GSoC 09=====
 
The goals above are long-term, and some of them will need active participation from the community. I have already made progress with steps 1 and 3 (which can proceed concurrently).
 
'''I propose to implement steps 2, 3, 4 and 5 during GSoC. Since the basic speech engine already works, these steps are independent of the later ones and will have immediate benefits.''' That is:
    
# Writing a system service.
 
The speech service will be a daemon running in the background that provides speech input to the Sugar interface. The user can enable the daemon and trigger it with a hotkey. The daemon will pass the audio to the Julius speech engine, process Julius's output into a stream of keystrokes, and pass these keystrokes to other activities as an input method. The generated text can contain any Unicode characters and is not restricted to X11 XKeyEvent data, which helps with languages other than English.
 
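The step in which the daemon turns Julius's output into text might look like the following sketch. It assumes the daemon reads Julius's standard output, where the best hypothesis is reported on a line beginning with `sentence1:` and bracketed by the silence words `<s>` and `</s>`. The function name `julius_line_to_text` is hypothetical, and injecting the resulting text into the focused activity is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True for the separators between words, plus line endings. */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * If line is a Julius "sentence1:" result line, copy its words into out,
 * separated by single spaces and without the <s> and </s> silence words,
 * and return 1. Return 0 for any other line, or if out is too small.
 */
int julius_line_to_text(const char *line, char *out, size_t outsz)
{
    static const char prefix[] = "sentence1:";
    const size_t prefix_len = sizeof prefix - 1;
    const char *p;
    size_t len = 0;

    assert(line != NULL && out != NULL && outsz > 0);
    if (strncmp(line, prefix, prefix_len) != 0)
        return 0;

    out[0] = '\0';
    p = line + prefix_len;
    for (;;) {
        while (is_sep(*p))
            p++;
        if (*p == '\0')
            break;

        const char *start = p;
        while (*p != '\0' && !is_sep(*p))
            p++;
        size_t wlen = (size_t)(p - start);

        /* Skip the sentence-boundary silence words. */
        if ((wlen == 3 && strncmp(start, "<s>", 3) == 0) ||
            (wlen == 4 && strncmp(start, "</s>", 4) == 0))
            continue;

        /* Room for a separator (if not the first word), the word and '\0'. */
        if (len + (len > 0 ? 1 : 0) + wlen + 1 > outsz)
            return 0;
        if (len > 0)
            out[len++] = ' ';
        memcpy(out + len, start, wlen);
        len += wlen;
        out[len] = '\0';
    }
    return 1;
}
```

For example, the line `sentence1: <s> HELLO WORLD </s>` yields the text `HELLO WORLD`.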
      
So our flow is:
 
Beyond this, I would like to implement an activity that clearly demonstrates the use of this service. I plan to implement Speak Spell, a spelling activity in which children spell out the words shown to them. Single-character recognition can achieve very high recognition rates.
 
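As an illustration of the Speak Spell idea, the sketch below compares the letters recognized so far with the word shown to the child, ignoring case. The function name `check_spelling` and its three-way result are assumptions made for this example, not a design taken from an existing activity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical outcomes after each recognized letter. */
typedef enum { SPELL_WRONG, SPELL_SO_FAR_OK, SPELL_DONE } spell_status;

/* Compare the letters spelled so far with the target word, ignoring case. */
spell_status check_spelling(const char *target, const char *spelled)
{
    size_t i;

    assert(target != NULL && spelled != NULL);
    for (i = 0; spelled[i] != '\0'; i++) {
        /* Too many letters, or a letter that does not match. */
        if (target[i] == '\0' ||
            tolower((unsigned char)target[i]) != tolower((unsigned char)spelled[i]))
            return SPELL_WRONG;
    }
    /* Every letter matched; finished only if the whole word was spelled. */
    return target[i] == '\0' ? SPELL_DONE : SPELL_SO_FAR_OK;
}
```
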
'''Technologies used'''

I will be using (and have been using) Julius as the speech recognition tool. Julius is suited both to dictation (continuous speech recognition) and to command and control. It includes a grammar-based recognition parser named "Julian", a version of Julius modified to use a hand-designed DFA grammar as its language model, which makes it well suited to small-vocabulary voice command systems and other spoken dialog tasks.

The code will be written in C and shell scripts. Recording will be done on an external computer, and the compiled model will be stored on the XO. Since I already own an XO from my previous work, I plan to develop natively on it and test performance in real time.
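To illustrate what a hand-designed DFA grammar for Julian means, the sketch below checks a word sequence against a toy two-word command grammar: a verb (OPEN or CLOSE) followed by an activity name (WRITE or SPEAK). The vocabulary, categories, and transition table are hypothetical. In Julius the DFA is generated from grammar and vocabulary files and constrains the search during decoding, whereas this sketch only accepts or rejects a finished word sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Word categories of a toy command grammar (hypothetical). */
enum { CAT_VERB, CAT_OBJECT, N_CATEGORIES };

/* DFA states; ST_REJECT is a dead state with no way out. */
enum { ST_START, ST_AFTER_VERB, ST_ACCEPT, N_STATES, ST_REJECT = -1 };

/* transition[state][category] gives the next state. */
static const int transition[N_STATES][N_CATEGORIES] = {
    /*                  CAT_VERB       CAT_OBJECT */
    /* ST_START      */ { ST_AFTER_VERB, ST_REJECT },
    /* ST_AFTER_VERB */ { ST_REJECT,     ST_ACCEPT },
    /* ST_ACCEPT     */ { ST_REJECT,     ST_REJECT },
};

/* Hypothetical vocabulary: each word belongs to one category. */
static const struct {
    const char *word;
    int category;
} vocabulary[] = {
    { "OPEN",  CAT_VERB },
    { "CLOSE", CAT_VERB },
    { "WRITE", CAT_OBJECT },
    { "SPEAK", CAT_OBJECT },
};

/* Return the category of word, or -1 if it is not in the vocabulary. */
static int category_of(const char *word)
{
    for (size_t i = 0; i < sizeof vocabulary / sizeof vocabulary[0]; i++)
        if (strcmp(word, vocabulary[i].word) == 0)
            return vocabulary[i].category;
    return -1;
}

/* Return 1 if the word sequence is a complete sentence of the grammar. */
int grammar_accepts(const char *const words[], size_t n)
{
    int state = ST_START;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int cat = category_of(words[i]);
        if (cat < 0)
            return 0;              /* out-of-vocabulary word */
        state = transition[state][cat];
        if (state == ST_REJECT)
            return 0;
    }
    return state == ST_ACCEPT;
}
```

With this table, OPEN WRITE is accepted, while WRITE OPEN and a lone CLOSE are rejected.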
    
----
 