USpeak
# Writing a system service that has support for recognition of characters, and a demonstration that it works by running it with Listen and Spell.
# Introduce modes in the system service. Dictation mode will process input as a stream of characters and send the corresponding keystrokes, while command mode will process the audio input to recognize a known set of commands.
# Make a recording tool/activity so users can build their own models and improve them for their own needs.
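The two modes described above amount to a small dispatcher inside the service. Here is a minimal Python sketch of that split (the class, method, and command names are illustrative, not the actual service API):

```python
# Minimal sketch of the two service modes (illustrative names, not the real API).

class SpeechService:
    DICTATION, COMMAND = "dictation", "command"

    def __init__(self, commands):
        self.mode = self.COMMAND
        self.commands = commands      # known command name -> callback
        self.typed = []               # keystrokes we would send in dictation mode

    def handle(self, recognized):
        """Route a recognized utterance according to the current mode."""
        if self.mode == self.DICTATION:
            # Dictation: treat the input as a stream of characters to type.
            self.typed.extend(recognized)
        else:
            # Command mode: only act on the known set of commands.
            action = self.commands.get(recognized)
            if action:
                action()

# Usage: switch modes and feed recognized text through the dispatcher.
log = []
svc = SpeechService({"close": lambda: log.append("close")})
svc.handle("close")                 # command mode: runs the callback
svc.mode = SpeechService.DICTATION
svc.handle("hi")                    # dictation mode: characters queued as keystrokes
```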
Dictation Mode:
In this mode, the user's speech will be recognized and the corresponding keystrokes will be sent as is. This can be done via simple calls to the X11 server. Here is a snippet of how that can be done.
// Get the currently focused window.
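The X11 calls themselves (querying the focused window and synthesizing key events) belong in C; what can be sketched separately is the character-to-keysym mapping step that dictation mode needs first. In this Python sketch the keysym names (<code>space</code>, <code>Return</code>) are standard X names, but the send function is a stand-in that only records what would be sent, not a real X11 call:

```python
# Map recognized characters to X keysym names (standard names such as
# "space" and "Return"; the mapping table here is deliberately tiny).
SPECIAL = {" ": "space", "\n": "Return", "\t": "Tab"}

def to_keysyms(text):
    """Turn a recognized string into the keysym names to synthesize."""
    return [SPECIAL.get(ch, ch) for ch in text]

# Stand-in for the real X11 call (e.g. an XTEST fake key event in C):
# here we only record what would be sent to the focused window.
sent = []
def send_keysym(name):
    sent.append(name)

for name in to_keysyms("hi there"):
    send_keysym(name)
```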
Major Components:
# A language model browser which shows all the current samples and the dictionary, and can create new ones or delete existing ones.
# Ability to edit/record new samples, input new dictionary entries, and save changes.
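The dictionary such a browser manages is, in PocketSphinx, a plain text file with one word per line followed by its phones (CMU-style). A sketch of loading and editing entries, assuming that layout (the function names are illustrative):

```python
def parse_dict(text):
    """Parse CMU-style dictionary lines of the form: WORD PH1 PH2 ..."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries

def format_dict(entries):
    """Serialize entries back to the same one-word-per-line layout."""
    return "\n".join(w + " " + " ".join(p) for w, p in sorted(entries.items()))

# Usage: load, add a new entry, delete an existing one, save.
d = parse_dict("hello HH AH L OW\nworld W ER L D")
d["sugar"] = ["SH", "UH", "G", "ER"]   # create a new entry
del d["world"]                          # delete an existing one
```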
The coding will be done in C, shell scripts, and Python. Recording will be done on an external computer, and the compiled model will be stored on the XO. I own an XO from my previous efforts, so I plan to work natively on it and test performance in real time.
The recording utility will be implemented using PyGTK for the UI and <code>aplay</code> and <code>arecord</code> for the play and record commands.
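For the play and record commands, the utility can simply shell out to <code>arecord</code> and <code>aplay</code>. A sketch of building the command lines follows; 16 kHz, 16-bit mono is assumed here as the recording format, and the actual subprocess call is left commented out so the sketch stays side-effect free:

```python
import subprocess  # would be used to actually run the commands

def record_cmd(path, seconds, rate=16000):
    """arecord command line: 16-bit little-endian, mono, `rate` Hz, fixed duration."""
    return ["arecord", "-f", "S16_LE", "-c", "1",
            "-r", str(rate), "-d", str(seconds), path]

def play_cmd(path):
    """aplay command line for playing a sample back."""
    return ["aplay", path]

cmd = record_cmd("sample01.wav", 5)
# subprocess.check_call(cmd)   # uncomment on a machine with ALSA installed
```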
----
'''Fourth Week:'''
* Add a few basic commands.
* Implement the mode menu.
* Put the existing functionality in command mode and make provisions for the dictation mode.
'''Milestone 1 Completed'''
'''Fifth Week:'''
* Complete the interface.
* Start writing code for the language browser and recorder.
'''Sixth Week:'''
* Complete the language browser.
* Write the recording and dictionary creation code for the tool.
* Package everything in an activity.
'''Infinity and Beyond:'''
* Continue the pursuit of perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations, and making new speech-oriented activities.