Line 81: |
Line 81: |
| | | |
| # Writing a system service that has support for recognition of characters and a demonstration that it works by running it with Listen and Spell. | | # Writing a system service that has support for recognition of characters and a demonstration that it works by running it with Listen and Spell. |
− | # Introduce modes in the system service. Dictation mode will process input as a stream of characters as described in deliverable 1 and a new mode called command mode will process the audio input to recognize a known set of commands. | + | # Introduce modes in the system service. Dictation mode will process input as a stream of characters and send the corresponding keystrokes, and command mode will process the audio input to recognize a known set of commands. |
| # Make a recording tool/activity so Users can use it to make their own models and improve it for their own needs. | | # Make a recording tool/activity so Users can use it to make their own models and improve it for their own needs. |
| | | |
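The split between the two modes can be sketched in Python as follows. This is only an illustration under stated assumptions, not the proposal's actual design: the names `Mode`, `SpeechService`, `send_key` and `commands` are hypothetical, and starting in command mode is an arbitrary choice made for the example.

```python
from enum import Enum


class Mode(Enum):
    DICTATION = "dictation"
    COMMAND = "command"


class SpeechService:
    """Routes one recognition result according to the active mode."""

    def __init__(self, send_key, commands):
        # Assumption: the service starts in command mode.
        self.mode = Mode.COMMAND
        # Hypothetical callback that emits a single keystroke per call.
        self.send_key = send_key
        # Hypothetical table mapping a known phrase to a callable.
        self.commands = commands

    def handle(self, recognized):
        """Act on recognized text; return True if something was done."""
        if self.mode is Mode.DICTATION:
            # Dictation mode: forward every character as a keystroke.
            for ch in recognized:
                self.send_key(ch)
            return True
        # Command mode: only phrases from the known set are accepted.
        action = self.commands.get(recognized.strip().lower())
        if action is None:
            return False
        action()
        return True
```

Keeping the keystroke sender behind a callback means the mode logic can be exercised without an X server; the real service would pass in whatever function sends the key event.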
Line 113: |
Line 113: |
| Dictation Mode: | | Dictation Mode: |
| | | |
− | This can be done via simple calls to the X11 Server. Here is a snippet of how that can be done. | + | In this mode, the user's speech will be recognized and the corresponding keystrokes will be sent as is. This can be done via simple calls to the X11 Server. Here is a snippet of how that can be done. |
| | | |
| // Get the currently focused window. | | // Get the currently focused window. |
Line 165: |
Line 165: |
| | | |
| Major Components: | | Major Components: |
− | # A language model browser which shows all the current samples and dictionary. Can create new ones or delete exisiting ones. | + | # A language model browser which shows all the current samples and dictionary. Can create new ones or delete existing ones. |
| # Ability to edit/record new samples and input new dictionary entries and save changes. | | # Ability to edit/record new samples and input new dictionary entries and save changes. |
| | | |
Line 177: |
Line 177: |
| The coding will be done in C, shell scripts and Python. Recording will be done on an external computer and the compiled model will be stored on the XO. I own an XO because of my previous efforts, so I plan to work natively on it and test the performance in real time. | | The coding will be done in C, shell scripts and Python. Recording will be done on an external computer and the compiled model will be stored on the XO. I own an XO because of my previous efforts, so I plan to work natively on it and test the performance in real time. |
| | | |
| + | The recording utility will be implemented using PyGTK for UI and <code>aplay</code> and <code>arecord</code> for play and record commands. |
| | | |
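As a rough sketch of how the recording utility might drive <code>arecord</code> and <code>aplay</code> from Python, the helpers below build the command lines and run them with <code>subprocess</code>. The function names are hypothetical, and the 16 kHz, mono, 16-bit sample format is an assumption chosen for illustration; the proposal does not specify one.

```python
import subprocess


def arecord_command(path, seconds, rate=16000):
    """Build an arecord command line for a fixed-length mono recording."""
    return [
        "arecord",
        "-q",                      # suppress progress messages
        "-f", "S16_LE",            # assumed format: signed 16-bit little-endian
        "-r", str(rate),           # assumed sample rate in Hz
        "-c", "1",                 # one channel (mono)
        "-d", str(int(seconds)),   # stop after this many seconds
        path,
    ]


def aplay_command(path):
    """Build an aplay command line that plays back a recorded file."""
    return ["aplay", "-q", path]


def record_sample(path, seconds):
    """Record a sample; raises CalledProcessError if arecord fails."""
    subprocess.run(arecord_command(path, seconds), check=True)


def play_sample(path):
    """Play a sample; raises CalledProcessError if aplay fails."""
    subprocess.run(aplay_command(path), check=True)
```

A PyGTK button handler could then call `record_sample("sample.wav", 3)` followed by `play_sample("sample.wav")` to let the user review a take before saving it.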
| ---- | | ---- |
Line 206: |
Line 207: |
| '''Fourth Week:''' | | '''Fourth Week:''' |
| | | |
| + | * Add a few basic commands. |
| * Implement the mode menu. | | * Implement the mode menu. |
− | * Add command mode. | + | * Put the existing functionality in command mode and make provisions for the dictation mode. |
| | | |
| | | |
− | '''Milstone 1 Completed''' | + | '''Milestone 1 Completed''' |
| | | |
| | | |
| '''Fifth Week:''' | | '''Fifth Week:''' |
| * Complete the interface | | * Complete the interface |
− | * Start writing code for the model browser and recorder. | + | * Start writing code for the language browser and recorder. |
| | | |
| '''Sixth Week:''' | | '''Sixth Week:''' |
− | * Complete the language model browser. | + | * Complete the language browser. |
| * Write down the recording and dictionary creation code for the tool. | | * Write down the recording and dictionary creation code for the tool. |
| * Package everything in an activity. | | * Package everything in an activity. |
Line 230: |
Line 232: |
| | | |
| '''Infinity and Beyond:''' | | '''Infinity and Beyond:''' |
− | * Continue with pursuit of perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations and making new Speech Oriented Activities. :) | + | * Continue with pursuit of perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations and making new Speech Oriented Activities. |
| | | |
| | | |