Ans: Yes. I was introduced to open source through GSoC last year, when I worked on Bootlimn: Extending Bootchart to use Systemtap for the Fedora Project ( http://code.google.com/p/bootlimn/ , http://code.google.com/p/google-summer-of-code-2008-fedora/ ). I am currently working on the following projects.

#Introducing Speech Recognition in OLPC and making a dictation activity. ( http://wiki.laptop.org/go/Speech_to_Text )
#Introducing Java Profiling in Systemtap (a work-from-home internship for Red Hat Inc.). This project involved extensive research, which took up most of the four months I have been working on it. Coding has just begun.
#A sentiment analysis project for Indian financial markets. (My B. Tech major project, which I plan to release under GPLv2.) I can put up the source code at https://blogs-n-stocks.dev.java.net/ after mid-April, when I am done with my final evaluations at my college.
I have been working towards achieving this goal for the past 6 months. The task can be accomplished by breaking the problem into the following smaller subsets and tackling them one by one:
# '''''Port an existing speech engine to less powerful computers like the XO.''''' This has been part of the work I have been doing so far. I chose Julius as the speech engine, as it is lightweight and written in C. I have been able to compile Julius on the XO and am continuing to optimize it to make it run faster. The XO-1 is also the bare-minimum case on which I'll be testing: if it works on this machine, it will almost certainly work anywhere else.
# '''''Writing a system service that will take speech as input, generate the corresponding keystrokes, and then proceed as if the input had been given through the keyboard.''''' This method was suggested by Benjamin M. Schwartz as a simpler approach compared to writing a speech library in Python (which would use D-Bus to connect the engine to the activities), in which case changes would have to be made to the existing activities to use the library.
# '''''Starting with recognition of the alphabet of a language rather than full-blown speech recognition.''''' This gives an achievable target for the initial stages. As the alphabet is limited to a small set of characters for most languages, this target will be feasible considering both the computational power requirements and the attainable efficiency.
# Writing a system service that has support for recognition of characters, and a demonstration that it works by running it with Listen and Spell.
# Introduce modes in the system service. Dictation mode will process input as a stream of characters and send the corresponding keystrokes, while command mode will process the audio input to recognize a known set of commands.
# Make a recording tool/activity that users can use to build their own models and improve them for their own needs.
Dictation Mode:
In this mode, the user's speech will be recognized and the corresponding keystrokes will be sent as is. This can be done via simple calls to the X11 server. Here is a snippet of how that can be done.
    
  // Get the currently focused window.
The above code will send one character to the window. This can be looped to generate a continuous stream (an even nicer way would be to set a timer delay so that it looks like a typed stream).
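The looping with a timer delay described above can be sketched as follows. This is only a sketch: `send_key` is a hypothetical stand-in for whatever routine actually injects the keystroke into X11 (e.g. a wrapper around XTestFakeKeyEvent).

```python
import time

def type_stream(text, send_key, delay=0.05):
    """Send recognized text one character at a time, pausing between
    keystrokes so the output looks like naturally typed input."""
    for ch in text:
        send_key(ch)       # hypothetical keystroke injector
        time.sleep(delay)

# Stub sender that just collects the characters, for illustration:
sent = []
type_stream("cat", sent.append, delay=0)
print(sent)  # ['c', 'a', 't']
```

The delay parameter controls the typing rhythm; the dictation service would tune it so the receiving activity sees a plausible keystroke rate.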
      
Command Mode:
    
Similarly, a whole host of events can be catered to using the X11 input server. Words like "Close" (which will be defined in the list of commands that the engine recognizes) need not be parsed and broken into letters; they can simply be sent as events like XCloseDisplay().
'''''Note 1: Sayamindu has pointed me to the XTEST extension as well, which seems to be the easier way. I'll do some research on that and write back my findings in this section. It has useful routines like XTestFakeKeyEvent, XTestFakeButtonEvent, etc., which will make this task much easier.'''''
'''''Note 2: Bemasc suggested that I include single-character recognition (voice typing) in command mode, treating letters as commands to type out the corresponding characters, and keep dictation mode exclusively for word dictation. This avoids ambiguity in dictation mode between letters and words like "tee", "bee", etc.'''''
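A rough sketch of how such a command table might look, with letter names handled as typing commands per Note 2. The action names and the letter list here are purely illustrative, not part of any existing API:

```python
# Spoken letter names map to typed characters; other known words map to
# window actions. All of these names are illustrative assumptions.
LETTER_NAMES = {"ay": "a", "bee": "b", "see": "c", "tee": "t"}
COMMANDS = {"close": "ACTION_CLOSE", "scroll down": "ACTION_SCROLL_DOWN"}

def dispatch(utterance):
    """Map a recognized utterance to a command action or a typed letter."""
    word = utterance.strip().lower()
    if word in COMMANDS:
        return COMMANDS[word]
    if word in LETTER_NAMES:
        return ("type", LETTER_NAMES[word])
    return None  # unknown utterance: ignored in command mode
```

Keeping the table as plain data makes it easy for the language-model tool described later to extend the recognized vocabulary.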
    
All of this basically needs to be wrapped in a single service that can run in the background. That service can be implemented as a Sugar Feature that enables starting and stopping of this service.
'''II. Make a recording tool/activity so users can use it to make their own language models and improve them for their own needs:'''
    
This tool will help users create new dictionary-based language models. They can use it to create language models in their own languages and further extend the abilities of the service by training the speech recognition engine.
The tool will have an interface similar to the one shown in the screenshot at http://wiki.laptop.org/go/Speech_to_Text (that one was built in Qt and was a very simple tool). Our tool will of course follow the Sugar UI look and feel, as it will be an activity built in PyGTK. It will have a language model browser/manager and will allow modification of existing models. Users can type in words, define their pronunciations and record the samples, all within the tool itself.
    
Major Components:
# A language model browser that shows all the current samples and the dictionary. Users can create new entries or delete existing ones.
# Ability to edit/record new samples, input new dictionary entries and save changes.
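For illustration, a helper that formats one entry of the word dictionary might look as follows. The exact file syntax Julius expects depends on the grammar kit in use; the `NAME [output] phonemes` layout assumed here is only a sketch:

```python
def dict_entry(word, phonemes):
    """Format a word and its phoneme sequence as one dictionary line,
    assuming a Julius-style 'NAME [output] phonemes' layout."""
    return "%s\t[%s]\t%s" % (word.upper(), word, " ".join(phonemes))

# e.g. dict_entry("cat", ["k", "ae", "t"]) produces one tab-separated line
print(dict_entry("cat", ["k", "ae", "t"]))
```

The browser would write one such line per word when the user saves changes, alongside the recorded audio samples.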
The coding will be done in C, shell scripts and Python. Recording will be done on an external computer, and the compiled model will be stored on the XO. I own an XO because of my previous efforts, so I plan to work natively on it and test the performance in real time.
The recording utility will be implemented using PyGTK for the UI and <code>aplay</code> and <code>arecord</code> for the play and record commands.
    
----
'''Second Week:'''
* Complete writing the wrapper.
* Implement a Sugar UI feature for enabling/disabling the Speech Service.
    
'''Third Week:'''
'''Fourth Week:'''
* Add a few basic commands.
* Implement the mode menu.
* Put the existing functionality into command mode and make provisions for dictation mode.

'''Milestone 1 Completed'''
       
'''Fifth Week:'''
* Complete the interface.
* Start writing code for the language browser and recorder.
    
'''Sixth Week:'''
* Complete the language browser.
* Write the recording and dictionary creation code for the tool.
* Package everything in an activity.
'''Infinity and Beyond:'''
* Continue with the pursuit of perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations and making new speech-oriented Activities.
Q4: '''Convince us, in 5-15 sentences, that you will be able to successfully complete your project in the timeline you have described.'''
Ans: I have been working on speech recognition for the XO since November last year. My research has helped me understand the requirements of this project. I have made some progress, as shown at http://wiki.laptop.org/go/Speech_to_Text , which will help me in this project. I am also familiar with the development environment.
Apart from this, I have worked on a few real-life projects (some open source, as mentioned above, and an internship at HCL Infosystems), including one GSoC project for Fedora, which taught me how to work within a stipulated time frame and accomplish the task.
    
----
-Me
I see the benefits as follows:
# '''''A framework to stay ahead of the curve.''''' It is important to experiment with and implement newer ways of interacting with Sugar. This framework lays the foundation not only for core Sugar itself; it also has the potential to expose a new method of interaction to Activities, benefiting the entire Activity developer community.
# '''''Cool demo-ability.''''' This can be a very cool demo-able feature, if nothing else. During conferences, trade shows, etc., it is essential to have ways to grab the attention of random (but interested) individuals within a very short period of time. This kind of feature tends to attract attention immediately.
# '''''Potential accessibility support component.''''' We (as a community) need to think seriously about Sugar a11y (accessibility). We have had numerous queries in the past about using Sugar for children with various disabilities. While speech is only a part of the puzzle (and speech recognition is a subset of the entire speech problem, synthesis being the other part), this project lays one of the fundamental cornerstones of a11y support, which in the end should increase the appeal of Sugar for a significantly large set of use cases. I'm not sure how many "educational software" projects care about accessibility, but I don't think the number is very large. Sugar will have a distinct competitive advantage if we manage to pull off a11y properly.
[[User:SayaminduDasgupta|SayaminduDasgupta]] 22:28, 29 March 2009 (UTC)
Speech recognition, as described in this proposal, would make Sugar more effective both directly and indirectly. 
#Directly, it provides a capability useful to users learning literacy or language. Children can often say letter names out loud before they can write them, so this proposal helps learners make the name-symbol connection.
#Indirectly, this proposal provides a technically marvelous capability, which will inevitably be a subject of fascination to children. By experimenting with its behaviors in response to various sounds, children will implicitly learn about the phonemic structure of language and about the technology of speech recognition.
[[User:Bemasc|Benjamin M. Schwartz(Bemasc)]]
    
----
If my mentor is not around,
 
# The first thing I will do is try to Google.
# If I cannot find a solution, I will go more specifically through the mailing list archives, wikis and forums of Sugar Labs, Julius or Xorg, depending on where I am stuck.
# If I still cannot find a solution, I will ask on the respective IRC channels and mailing lists.
----
      
===Miscellaneous===
 
[[Image:SatyaScreenshot.png|thumb|right|Screenshot]]
Q1. '''We want to make sure that you can set up a [[Development Team#Development_systems|development environment]] before the summer starts. Please send us a link to a screenshot of your Sugar development environment with the following modification: when you hover over the XO-person icon in the middle of Home view, the drop-down text should have your email in place of "Restart."'''
    
Ans: Screenshot on right.
Q3. '''Describe a great learning experience you had as a child.'''
Ans: My mother tongue is Telugu, and I was born and brought up in New Delhi, where Hindi is the local language. This communication gap made me struggle in school in my Nursery and Prep classes, when kids are not very good with English. My nursery teacher, Mrs. Sengupta, noticed that I used to be alone and would come and play with me often. I used to talk to her in Telugu, and even though she never really understood what I was saying, she always listened and nodded. Gradually, I learnt Hindi too and was able to interact with everyone, but that initial phase, when my teacher had been so sweet, left a lasting impact. All my school life, I was never afraid to go up to my teachers and ask them a lot of questions. In retrospect, I realize how much I must have bothered them with my silly doubts, but thankfully, they were all very good to me and even taught me a lot of stuff beyond the curriculum. This also gave me the aptitude to learn new languages much more quickly.
    
----