Changes

Summer of Code/2014/Sugar Listens (view source)

Revision as of 14:21, 21 March 2014

448 bytes added , 14:21, 21 March 2014

m

no edit summary

Line 97: Line 97:

I have been working on my engineering thesis project, which has a strong focus on speech recognition and voice-enabled user interfaces, for almost a year now. Its title loosely translates to: “Design of Speech Recognition Based User Interfaces”. Some of my early work can be found at: https://github.com/jorgeramirez/step

−

As a part of my thesis, I developed a voice-user interface to control TamTam ~~Listens~~, an existing Sugar Activity for music composition. ~~For~~ TamTam ~~Listens~~, I programmed a daemon process to run the Pocketsphinx speech recognition engine, which produced text output based on user-pronounced voice commands~~. Text output was later parsed to get the commands in the appropriate format~~.

+

As part of my thesis, I developed a voice user interface to control TamTam Edit, an existing Sugar Activity for music composition. To add speech recognition to TamTam Edit, I programmed a daemon process to run the Pocketsphinx speech recognition engine, which produced text output from the voice commands spoken by the user.

−

Recognized commands were published through a D-Bus service which allowed TamTam ~~Listens~~ to integrate speech recognition with

+

As input to Pocketsphinx, I used the Voxforge Spanish acoustic model and defined a custom grammar-based language model in JSGF format.

−

minimum coupling. The ~~final~~ step was to modify TamTam ~~Listens~~ in order to make the graphical interface respond to the commands. ~~Later on~~, a usability study was conducted with 12 users in order to draw conclusions about speech-based user interfaces.

+

The text output produced by the engine was then parsed into commands in the appropriate format. Recognized commands were published through a D-Bus service, which allowed TamTam Edit to integrate speech recognition with minimal coupling. The last development step was to modify TamTam Edit so that the graphical interface responded to the commands.
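
The parsing and publishing steps described above could look roughly like the following. This is a minimal sketch and not the thesis code: the command table, the function names `parse_hypothesis` and `handle_hypothesis`, and the Spanish phrases are all hypothetical, and the `publish` callable only stands in for emitting the command over D-Bus.

```python
# Hypothetical command table: spoken phrase (as a tuple of words) -> command name.
# The real TamTam Listens vocabulary is not given in the source.
COMMANDS = {
    ("reproducir",): "play",
    ("detener",): "stop",
    ("subir", "volumen"): "volume_up",
    ("bajar", "volumen"): "volume_down",
}


def parse_hypothesis(text):
    """Map the recognizer's text output to a command name, or None if unknown."""
    words = tuple(text.lower().split())
    return COMMANDS.get(words)


def handle_hypothesis(text, publish):
    """Parse one hypothesis and pass the resulting command to `publish`.

    `publish` is an assumption standing in for the D-Bus signal emission;
    any callable that takes the command name works here.
    Returns True if a command was published, False otherwise.
    """
    command = parse_hypothesis(text)
    if command is None:
        return False
    publish(command)
    return True
```

Keeping the publisher behind a plain callable mirrors the loose coupling mentioned above: the parser does not need to know whether the command ends up on D-Bus or in a test list.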

+

Implementing TamTam Listens (custom name for TamTam Edit + speech recognition) involved solving issues related to software integration and to speech recognition itself, such as handling out-of-vocabulary words. After development was over, a usability study was conducted with 12 users in order to draw conclusions about speech-based user interfaces.
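
One common way to guard against out-of-vocabulary input is to discard hypotheses that look unreliable before they are turned into commands. The sketch below illustrates that idea only; the source does not say how the problem was actually solved. The vocabulary, the function name `accept_hypothesis`, the confidence score in the range 0 to 1, and the 0.6 threshold are all assumptions.

```python
# Hypothetical vocabulary covered by the grammar; not taken from the source.
VOCABULARY = {"reproducir", "detener", "subir", "bajar", "volumen"}


def accept_hypothesis(text, confidence, min_confidence=0.6):
    """Return True only if a hypothesis looks like a genuine in-vocabulary command.

    `confidence` is assumed to be a score between 0 and 1 supplied by the
    recognizer; the threshold value is illustrative and would need tuning.
    """
    words = text.lower().split()
    if not words:
        # Empty output: nothing was recognized.
        return False
    if confidence < min_confidence:
        # A low score suggests the engine forced an unknown word onto the grammar.
        return False
    # Reject anything containing a word outside the known vocabulary.
    return all(word in VOCABULARY for word in words)
```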

The architecture of the developed solution closely resembles the one included in the project description. I have used, and am therefore familiar with, Pocketsphinx, Voxforge and D-Bus. Although some improvements are still needed, such as multi-language support, I believe my experience with the field and the tools would be of great help to the success of the project.

Rparra

8

edits
