The above-mentioned goals are very long-term goals, and some of them will need active participation from the community. I have already made progress with Steps 1 and 6 (these are continuous tasks running in the background to help improve the accuracy).

'''I propose to implement steps 2, 3, 4 and 5 in GSoC. As the basic speech engine is working, these steps can be treated as independent of the other tasks and will have immediate benefits.''' i.e.

# Writing a system service that has support for recognition of characters and simple words.
# Enabling recognition of characters.
# Demonstrating its use with activities like Listen and Spell.

'''The rest of this wiki page will refer to the steps proposed for GSoC 09 as the project.'''

'''I. The Speech Service:'''

The speech service will be a daemon running in the background that can be activated to provide input to the Sugar interface using speech. The daemon can be 'initiated' and 'stopped' by the user via a hotkey. It transfers the captured audio to the Julius Speech Engine and processes the engine's output to generate a stream of keystrokes, which are passed as an input method to other activities. The generated text can be any Unicode character or string and is not restricted to the XKeyEvent data of X11 (which helps with languages other than English).
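
As a rough illustration, the daemon's core loop could look like the sketch below. It assumes Julius has been started separately in its module mode (the <code>-module</code> option), which serves recognition results over TCP on port 10500 by default; the hotkey handling and the Sugar integration are left out here.

<source lang="python">
# Minimal sketch: read recognition results from a Julius process running
# in module mode and hand the text to the dispatch step described later.
# Assumes "julius -C <config> -module" is already running on this machine.
import re
import socket

JULIUS_HOST = "localhost"
JULIUS_PORT = 10500          # Julius' default module-mode port


def hypotheses(sock):
    """Yield recognized text as Julius reports it.

    Module-mode messages are XML-like blocks terminated by a line holding
    a single '.'; the recognized words appear in WHYPO elements.
    """
    buf = ""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        buf += data.decode("utf-8", "replace")
        while "\n.\n" in buf:
            block, buf = buf.split("\n.\n", 1)
            words = re.findall(r'WHYPO WORD="([^"]+)"', block)
            if words:
                yield " ".join(words)


def main():
    sock = socket.create_connection((JULIUS_HOST, JULIUS_PORT))
    for text in hypotheses(sock):
        print("recognized:", text)   # here the dispatch step would run


if __name__ == "__main__":
    main()
</source>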
    
So our flow is:

# User activates the Speech Service.
# The service listens for audio when a particular key combo is pressed (a small popup/notification can be shown to indicate that the daemon is listening).
# The service passes the spoken audio to the Speech Engine after the key combo is pressed again to stop listening.
# The output of the speech engine is grabbed and analyzed to determine what the input is.
# If the input is a recognized 'Word Command', the service performs that command; otherwise it generates key events as mentioned above and sends them to the currently focused window (a sketch of this dispatch step follows the list).
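
The last two steps could be implemented roughly as below. This is only a sketch: it assumes python-xlib is available for synthesizing key events through the XTEST extension, the table of spoken commands is an illustrative placeholder, modifiers such as Shift are ignored, and text with no X keysym would go through the separate Unicode input path mentioned above.

<source lang="python">
# Sketch of the dispatch step: treat known words as commands, send
# everything else as key events to the focused window via XTEST.
# Assumes python-xlib; WORD_COMMANDS is an illustrative placeholder.
from Xlib import X, XK, display
from Xlib.ext import xtest

WORD_COMMANDS = {
    "enter": "Return",
    "space": "space",
    "backspace": "BackSpace",
}

_disp = display.Display()


def press_keysym(name):
    """Send a key press/release for the named X keysym; return False if unmapped."""
    keycode = _disp.keysym_to_keycode(XK.string_to_keysym(name))
    if keycode == 0:
        return False
    xtest.fake_input(_disp, X.KeyPress, keycode)
    xtest.fake_input(_disp, X.KeyRelease, keycode)
    _disp.sync()
    return True


def dispatch(text):
    """Run a recognized 'Word Command', otherwise type the text character by character."""
    token = text.strip().lower()
    if token in WORD_COMMANDS:
        press_keysym(WORD_COMMANDS[token])
        return
    for char in text:
        if not press_keysym(char):
            # No direct keysym (e.g. many Unicode characters): this is where
            # the Unicode/input-method path mentioned above would take over.
            print("no keysym for %r" % char)


dispatch("a")        # types the letter 'a'
dispatch("enter")    # runs the 'enter' word command (Return key)
</source>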