Ans: Yes, I was introduced to open source through GSoC last year, where I worked on Bootlimn: Extending Bootchart to use Systemtap for the Fedora Project ( http://code.google.com/p/bootlimn/ , http://code.google.com/p/google-summer-of-code-2008-fedora/ ). I am currently working on the following projects.
 
#Introducing Speech Recognition in OLPC and making a dictation activity. ( http://wiki.laptop.org/go/Speech_to_Text )
#Introducing Java Profiling in Systemtap (a work-from-home internship for Red Hat Inc.). This project involved extensive research, which took up most of the four months I have been working on it; coding has just begun.
 
#A sentiment analysis project for Indian financial markets. (My B. Tech major project, which I plan to release under GPLv2.) I can put up the source code on https://blogs-n-stocks.dev.java.net/ after mid-April, when I am done with my final evaluations at my college.
I have been working towards this goal for the past 6 months. The task can be accomplished by breaking the problem into the following smaller subproblems and tackling them one by one:
# '''''Port an existing speech engine to less powerful computers like the XO.''''' This has been part of my work so far. I chose Julius as the speech engine because it is lightweight and written in C. I have been able to compile Julius on the XO and am continuing to optimize it to make it run faster. The XO-1 is the bare-minimum hardware I will be testing on; if it works there, it will almost certainly work anywhere else.
 
# '''''Writing a system service that will take speech as input, generate the corresponding keystrokes, and then proceed as if the input had been given through the keyboard.''''' This method was suggested by Benjamin M. Schwartz as a simpler approach than writing a speech library in Python (which would use D-Bus to connect the engine to the activities), in which case changes would have to be made to the existing activities to use the library.
 
# '''''Starting with recognition of the letters of a language's alphabet rather than full-blown speech recognition.''''' This gives an achievable target for the initial stages. As the alphabet is limited to a small set of letters for most languages, this target is feasible in terms of both computational power requirements and attainable accuracy.
# Writing a system service that has support for recognition of characters and a demonstration that it works by running it with Listen and Spell.
# Introduce modes in the system service. Dictation mode will process input as a stream of characters and send the corresponding keystrokes, while command mode will process the audio input to recognize a known set of commands.
 
# Make a recording tool/activity so users can build their own models and improve them for their own needs.
Dictation Mode:
In this mode, the user's speech will be recognized and the corresponding keystrokes will be sent as is. This can be done via simple calls to the X11 server. Here is a snippet of how that can be done.
    
  // Get the currently focused window.
The above code will send one character to the window. This can be looped to generate a continuous stream (an even nicer way would be to set a timer delay so the output looks like a typed stream).
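As a rough sketch of that loop in Python, with a hypothetical <code>send_key</code> callback standing in for the actual X11 call, the pacing could look like this:

```python
import time

def type_stream(text, send_key, delay=0.05):
    """Send recognized text one character at a time, sleeping between
    keystrokes so the receiving window sees a typed-looking stream."""
    for ch in text:
        send_key(ch)  # in the real service, this forwards to the X11 server
        time.sleep(delay)
```

Here <code>send_key</code> and the 0.05-second delay are assumptions for illustration; the real service would forward each character via XSendEvent or the XTEST routines.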
   
Command Mode:
    
Similarly, a whole host of events can be catered to using the X11 input server. Words like "Close" (which will be defined in the list of commands the engine recognizes) need not be parsed and broken into letters; they can simply be mapped to calls like XCloseDisplay().
'''''Note 1: Sayamindu has pointed me to the XTEST extension as well, which seems to be the easier way. I'll do some research on that and write back my findings in this section. It has useful routines like XTestFakeKeyEvent, XTestFakeButtonEvent, etc., which will make this task much easier.'''''
'''''Note 2: Bemasc suggested I include single-character recognition (voice typing) in command mode by treating letters as commands to type out the letters, and keep dictation mode exclusively for word dictation, to avoid ambiguity in dictation mode with words like "tee", "bee", etc.'''''
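A sketch of how command-mode dispatch, including the letters-as-commands idea from Note 2, could work in Python; the handler names and return values are illustrative assumptions, not a fixed design:

```python
def make_command_mode(actions, type_letter):
    """Build a dispatcher for command mode.

    `actions` maps known command words (e.g. "close") to callables;
    single letters are treated as commands to type that letter, so
    dictation mode can stay word-only. Unknown words are ignored."""
    def handle(word):
        word = word.lower()
        if word in actions:
            actions[word]()  # e.g. trigger the window-close routine
            return "command"
        if len(word) == 1 and word.isalpha():
            type_letter(word)  # voice typing: a letter is a "type it" command
            return "letter"
        return "ignored"
    return handle
```

With this split, ambiguous dictation words like "tee" or "bee" never collide with letter input, because letters are only typed while in command mode.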
    
All of this basically needs to be wrapped in a single service that can run in the background. That service can be implemented as a Sugar Feature that enables starting and stopping of this service.
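The service wrapper itself might be little more than a start/stop switch plus the current mode; a minimal Python sketch (all names are assumptions for illustration):

```python
class SpeechService:
    """Skeleton of the background speech service: it can be started
    and stopped, and routes recognized words by the current mode."""
    MODES = ("dictation", "command")

    def __init__(self):
        self.running = False
        self.mode = "dictation"

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError("unknown mode: %s" % mode)
        self.mode = mode
```

A Sugar control could then call <code>start()</code>/<code>stop()</code>, while the mode menu calls <code>set_mode()</code>.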
Major Components:
# A language-model browser that shows all the current samples and the dictionary, with the ability to create new ones or delete existing ones.
 
# Ability to edit/record new samples and input new dictionary entries and save changes.
The coding will be done in C, shell scripts, and Python; recording will be done on an external computer, and the compiled model will be stored on the XO. I own an XO thanks to my previous efforts, so I plan to work natively on it and test performance in real time.
The recording utility will be implemented using PyGTK for UI and <code>aplay</code> and <code>arecord</code> for play and record commands.
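The record/play commands could be shelled out via <code>subprocess</code>; here is a sketch of building the <code>arecord</code> invocation (the 16 kHz, 16-bit mono format is my assumption, chosen because it is typical for speech acoustic models):

```python
import subprocess

def arecord_cmd(path, seconds, rate=16000):
    """Build an `arecord` command line for one training sample:
    16-bit little-endian, mono, `rate` Hz, fixed duration."""
    return ["arecord", "-f", "S16_LE", "-c", "1",
            "-r", str(rate), "-d", str(seconds), path]

def record_sample(path, seconds):
    """Capture one sample with arecord (requires ALSA on the host)."""
    subprocess.check_call(arecord_cmd(path, seconds))
```

The PyGTK UI would call <code>record_sample()</code> from its record button, and an analogous <code>aplay</code> wrapper would handle playback.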
    
----
'''Fourth Week:'''
* Add a few basic commands.
 
* Implement the mode menu.
* Put the existing functionality in command mode and make provisions for the dictation mode.
'''Milestone 1 Completed'''
       
'''Fifth Week:'''
 
* Complete the interface
* Start writing code for the language browser and recorder.
    
'''Sixth Week:'''
* Complete the language browser.
 
* Write down the recording and dictionary creation code for the tool.
 
* Package everything in an activity.
'''Infinity and Beyond:'''
* Continue perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations, and making new speech-oriented activities.
If my mentor is not around,
 
# The first thing I will do is try to Google.
# If I cannot find a solution, I will go through the mailing list archives, wikis, and forums of Sugar Labs, Julius, or Xorg, depending on where I am stuck.
 
# If I still cannot find a solution, I will ask on the respective IRC channels and mailing lists.
===Miscellaneous===
 
[[Image:SatyaScreenshot.png|thumb|right| Screenshot]]
Q1. '''We want to make sure that you can set up a [[Development Team#Development_systems|development environment]] before the summer starts. Please send us a link to a screenshot of your Sugar development environment with the following modification: when you hover over the XO-person icon in the middle of Home view, the drop-down text should have your email in place of "Restart."'''
    
Ans: Screenshot on right.