USpeak
# Writing a system service that has support for recognition of characters, and a demonstration that it works by running it with Listen and Spell.
# Introduce modes in the system service. Dictation mode will process input as a stream of characters and send the corresponding keystrokes, while command mode will process the audio input to recognize a known set of commands.
# Make a recording tool/activity so users can build their own models and improve them for their own needs.
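The two modes described above amount to a small dispatcher inside the service. Here is a minimal Python sketch of that split (the class, method, and command names are illustrative, not the actual service API):

```python
# Minimal sketch of the two service modes (illustrative names, not the real API).

class SpeechService:
    DICTATION, COMMAND = "dictation", "command"

    def __init__(self, commands):
        self.mode = self.COMMAND
        self.commands = commands      # known command name -> callback
        self.typed = []               # keystrokes we would send in dictation mode

    def handle(self, recognized):
        """Route a recognized utterance according to the current mode."""
        if self.mode == self.DICTATION:
            # Dictation: treat the input as a stream of characters to type.
            self.typed.extend(recognized)
        else:
            # Command mode: only act on the known set of commands.
            action = self.commands.get(recognized)
            if action:
                action()

# Usage: switch modes and feed recognized text through the dispatcher.
log = []
svc = SpeechService({"close": lambda: log.append("close")})
svc.handle("close")                 # command mode: runs the callback
svc.mode = SpeechService.DICTATION
svc.handle("hi")                    # dictation mode: characters queued as keystrokes
```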
Dictation Mode:
In this mode, the user's speech will be recognized and the corresponding keystrokes will be sent as is. This can be done via simple calls to the X11 server. Here is a snippet of how that can be done.
// Get the currently focused window.
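The X11 calls themselves (querying the focused window and synthesizing key events) belong in C; what can be sketched separately is the character-to-keysym mapping step that dictation mode needs first. In this Python sketch the keysym names (<code>space</code>, <code>Return</code>) are standard X names, but the send function is a stand-in that only records what would be sent, not a real X11 call:

```python
# Map recognized characters to X keysym names (standard names such as
# "space" and "Return"; the mapping table here is deliberately tiny).
SPECIAL = {" ": "space", "\n": "Return", "\t": "Tab"}

def to_keysyms(text):
    """Turn a recognized string into the keysym names to synthesize."""
    return [SPECIAL.get(ch, ch) for ch in text]

# Stand-in for the real X11 call (e.g. an XTEST fake key event in C):
# here we only record what would be sent to the focused window.
sent = []
def send_keysym(name):
    sent.append(name)

for name in to_keysyms("hi there"):
    send_keysym(name)
```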
Major Components:
# A language model browser which shows all the current samples and the dictionary, and can create new ones or delete existing ones.
# Ability to edit/record new samples, input new dictionary entries, and save changes.
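The dictionary such a browser manages is, in PocketSphinx, a plain text file with one word per line followed by its phones (CMU-style). A sketch of loading and editing entries, assuming that layout (the function names are illustrative):

```python
def parse_dict(text):
    """Parse CMU-style dictionary lines of the form: WORD PH1 PH2 ..."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries

def format_dict(entries):
    """Serialize entries back to the same one-word-per-line layout."""
    return "\n".join(w + " " + " ".join(p) for w, p in sorted(entries.items()))

# Usage: load, add a new entry, delete an existing one, save.
d = parse_dict("hello HH AH L OW\nworld W ER L D")
d["sugar"] = ["SH", "UH", "G", "ER"]   # create a new entry
del d["world"]                          # delete an existing one
```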
The coding will be done in C, shell scripts, and Python. Recording will be done on an external computer, and the compiled model will be stored on the XO. I own an XO from my previous efforts, so I plan to work natively on it and test performance in real time.
The recording utility will be implemented using PyGTK for the UI and <code>aplay</code> and <code>arecord</code> for the play and record commands.
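For the play and record commands, the utility can simply shell out to <code>arecord</code> and <code>aplay</code>. A sketch of building the command lines follows; 16 kHz, 16-bit mono is assumed here as the recording format, and the actual subprocess call is left commented out so the sketch stays side-effect free:

```python
import subprocess  # would be used to actually run the commands

def record_cmd(path, seconds, rate=16000):
    """arecord command line: 16-bit little-endian, mono, `rate` Hz, fixed duration."""
    return ["arecord", "-f", "S16_LE", "-c", "1",
            "-r", str(rate), "-d", str(seconds), path]

def play_cmd(path):
    """aplay command line for playing a sample back."""
    return ["aplay", path]

cmd = record_cmd("sample01.wav", 5)
# subprocess.check_call(cmd)   # uncomment on a machine with ALSA installed
```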
----
'''Fourth Week:'''
* Add a few basic commands.
* Implement the mode menu.
* Put the existing functionality in command mode and make provisions for the dictation mode.
'''Milestone 1 Completed'''
'''Fifth Week:'''
* Complete the interface.
* Start writing code for the language browser and recorder.
'''Sixth Week:'''
* Complete the language browser.
* Write the recording and dictionary creation code for the tool.
* Package everything in an activity.
'''Infinity and Beyond:'''
* Continue the pursuit of perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations, and making new speech-oriented activities.