I have been working on my engineering thesis project, which has a strong focus on speech recognition and voice-enabled user interfaces, for almost a year now. Its title loosely translates to: “Design of Speech Recognition Based User Interfaces”. Some of my early work can be found at: https://github.com/jorgeramirez/step

As part of my thesis, I developed a voice user interface to control TamTam Edit, an existing Sugar Activity for music composition. To provide speech recognition functionality to TamTam Edit, I programmed a daemon process that runs the Pocketsphinx speech recognition engine, producing text output from user-pronounced voice commands.

As input to Pocketsphinx, I used the Voxforge Spanish acoustic model and defined a custom grammar-based language model in JSGF format.

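To make this concrete, the sketch below shows how such a decoder can be configured with an acoustic model and a JSGF grammar using the pocketsphinx Python bindings. The grammar, the command words, and the model paths are hypothetical placeholders, not the actual artifacts from the thesis.

    # Minimal sketch: a Pocketsphinx decoder driven by a JSGF grammar.
    # All paths and vocabulary below are hypothetical placeholders.
    from pocketsphinx import Decoder

    # A tiny grammar-based language model in JSGF format.
    JSGF_GRAMMAR = """#JSGF V1.0;
    grammar comandos;
    public <comando> = (tocar | parar | borrar);
    """

    with open('comandos.gram', 'w') as f:
        f.write(JSGF_GRAMMAR)

    config = Decoder.default_config()
    config.set_string('-hmm', 'voxforge-es/model_parameters')  # acoustic model
    config.set_string('-dict', 'voxforge-es/es.dict')          # pronunciation dictionary
    config.set_string('-jsgf', 'comandos.gram')                # grammar-based language model
    decoder = Decoder(config)

    # Feed raw 16 kHz, 16-bit mono audio to the decoder.
    decoder.start_utt()
    with open('utterance.raw', 'rb') as audio:
        while True:
            buf = audio.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    hyp = decoder.hyp()
    print(hyp.hypstr if hyp else 'no hypothesis')
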
The text output produced by the engine was then parsed to extract the commands in the appropriate format. Recognized commands were published through a D-Bus service, which allowed TamTam Edit to integrate speech recognition capabilities with minimal coupling. The last development step was to modify TamTam Edit so that its graphical interface responds to the commands.

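To give an idea of how the command publishing can look, here is a sketch using dbus-python; the bus name, object path, interface, signal, and command vocabulary are all made up for illustration and are not the actual names from the thesis code.

    # Sketch: parse engine output and publish commands over D-Bus.
    # Bus, path, and interface names are hypothetical.
    import dbus
    import dbus.service
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    KNOWN_COMMANDS = {'tocar': 'play', 'parar': 'stop', 'borrar': 'erase'}

    def parse(hypothesis):
        """Map raw engine output to a normalized command, if any."""
        for word in hypothesis.lower().split():
            if word in KNOWN_COMMANDS:
                return KNOWN_COMMANDS[word]
        return None

    class SpeechService(dbus.service.Object):
        def __init__(self, bus):
            name = dbus.service.BusName('org.example.SpeechDaemon', bus=bus)
            super().__init__(name, '/org/example/SpeechDaemon')

        @dbus.service.signal(dbus_interface='org.example.SpeechDaemon',
                             signature='s')
        def CommandRecognized(self, command):
            pass  # dbus-python emits the signal; the body stays empty

    DBusGMainLoop(set_as_default=True)
    service = SpeechService(dbus.SessionBus())
    # Whenever the decoder yields a hypothesis:
    #     command = parse(hypothesis)
    #     if command is not None:
    #         service.CommandRecognized(command)
    GLib.MainLoop().run()

With this kind of design, the activity only needs to connect a receiver to the signal, which is what keeps the coupling between the daemon and the graphical interface minimal.
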
| + | |
| + | Implementing TamTam Listens (custom name for TamTam Edit + speech recognition) involved solving issues related to software integration |
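One common mitigation for out-of-vocabulary input, sketched below with the pocketsphinx Python bindings purely as an illustration rather than as the approach taken in the thesis, is to reject hypotheses whose posterior confidence falls below a threshold; the threshold value is an arbitrary placeholder.

    # Sketch: reject likely out-of-vocabulary utterances by thresholding
    # the decoder's posterior confidence.
    CONFIDENCE_THRESHOLD = 0.4  # hypothetical value, needs tuning

    def accept_hypothesis(decoder):
        hyp = decoder.hyp()
        if hyp is None:
            return None  # nothing was decoded at all
        confidence = decoder.get_logmath().exp(hyp.prob)  # log posterior -> [0, 1]
        if confidence < CONFIDENCE_THRESHOLD:
            return None  # probably noise or an out-of-vocabulary word
        return hyp.hypstr
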
Once development was over, a usability study was conducted with 12 users in order to draw conclusions about speech-based user interfaces.

The architecture of the developed solution closely resembles the one included in the project description. I used, and am consequently familiar with, Pocketsphinx, Voxforge, and D-Bus. Although some improvements are still needed, such as multi-language support, I believe my experience with the field and the tools would be of great help to the success of the project.