Changes

761 bytes added, 07:05, 28 March 2009
no edit summary
Line 89: Line 89:  
I. The Speech Service:
 
I. The Speech Service:
   −
The speech service will be a daemon running in the background that can be activated to provide input to the Sugar Interface using speech. This daemon can be activated by the user and can 'initiate' via a hotkey. This daemon will transfer the audio to Julius Speech Engine and will process its output to generate a stream of keystrokes and are passed as input method to other activities. Also the generated text data can be any Unicode character or text and will not be restricted to XKeyEvent data of X11 (helps in foreign languages).
+
The speech service will be a daemon running in the background that can be activated to provide input to the Sugar Interface using speech. This daemon can be activated by the user and can be 'initiated' via a hotkey. This daemon will transfer the audio to the Julius speech engine and will process its output to generate a stream of keystrokes that are passed as input to other activities. The generated text can also be any Unicode character or string and will not be restricted to the XKeyEvent data of X11 (which helps with foreign languages).
 +
 
 +
I will be using (and have been using) Julius as the speech recognition tool. Julius is suited for both dictation (continuous speech recognition) and command and control. A grammar-based recognition parser named "Julian" is integrated into Julius; it is modified to use a hand-designed DFA grammar as its language model, and hence is suited to small-vocabulary voice command systems and various spoken dialog system tasks.
    
So our flow is:
 
So our flow is:
Line 97: Line 99:  
                                               |
 
                                               |
 
                                               V
 
                                               V
                                   characters/Words/Phrases
+
                                   Characters/Words/Phrases
 +
                                              |
 +
                                              | 
 +
                                              V
 +
                                        [System Service]
 
                                               |
 
                                               |
 
                                               |   
 
                                               |   
Line 110: Line 116:  
This can be done via simple calls to the X11 Server. Here is a snippet of how that can be done.
 
This can be done via simple calls to the X11 Server. Here is a snippet of how that can be done.
   −
                       <code>
+
                        
 
                       XGetInputFocus(...); // find the window that currently has input focus
 
                       XGetInputFocus(...); // find the window that currently has input focus
 
                       // Create the event
 
                       // Create the event
 
                       XKeyEvent event = createKeyEvent(...);
 
                       XKeyEvent event = createKeyEvent(...);
                       // Send the KEYCODE. We can define these using XK_ constasnts
+
                       // Send the KEYCODE. We can define these using XK_ constants
 
                       XSendEvent(...);
 
                       XSendEvent(...);
 
                       // Resend the event to emulate the key release
 
                       // Resend the event to emulate the key release
 
                       event = createKeyEvent(...);
 
                       event = createKeyEvent(...);
 
                       XSendEvent(...);
 
                       XSendEvent(...);
                       </code>
+
                        
 
     −
The above code will send one character to the window. This can be looped to generate a continuous stream (An even nicer way to do this would be set a timer to make it look like a typed stream).
+
The above code will send one character to the window. This can be looped to generate a continuous stream (an even nicer way to do this would be to set a timer delay between characters so that it looks like a typed stream).
   −
Similarly a whole host of events can be catered to using the X11 Input Server. Words like "Close" etc need not be parsed as as words and can just send events like XCloseDisplay().
+
Similarly a whole host of events can be catered to using the X11 Input Server. Words like "Close" etc need not be parsed and broken into letters and can just send events like XCloseDisplay().
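The split between command words and ordinary dictated words can be sketched as a small lookup table. This is only an illustration under stated assumptions: the names `speech_action`, `COMMANDS` and `classify_word` are hypothetical, "close" is the only command taken from the text above, and "enter" is an invented extra entry. What the service does for each action (which X11 request or event it issues) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical action codes, for illustration only. */
typedef enum {
    ACTION_TYPE_TEXT, /* no command matched: type the word letter by letter */
    ACTION_CLOSE,     /* the "Close" command mentioned in the text */
    ACTION_NEWLINE    /* assumed extra command, not from the proposal */
} speech_action;

struct command_entry {
    const char *word;
    speech_action action;
};

/* Words that are handled as a whole instead of being typed out. */
static const struct command_entry COMMANDS[] = {
    { "close", ACTION_CLOSE },
    { "enter", ACTION_NEWLINE },
};

/* Return the action for one recognized word, ignoring letter case. */
speech_action classify_word(const char *word)
{
    assert(word != NULL);
    size_t n = sizeof COMMANDS / sizeof COMMANDS[0];
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(word, COMMANDS[i].word) == 0)
            return COMMANDS[i].action;
    }
    return ACTION_TYPE_TEXT;
}
```

A word that returns `ACTION_TYPE_TEXT` would go through the per-character key event path shown earlier; any other value would be handled as a single action.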
    
All of this basically needs to be wrapped in a single service that can run in the background. That service can be implemented as a Sugar Feature that enables starting and stopping of this service.
 
All of this basically needs to be wrapped in a single service that can run in the background. That service can be implemented as a Sugar Feature that enables starting and stopping of this service.
Line 143: Line 148:  
This approach will simplify quite a few aspects and will be efficient.  
 
This approach will simplify quite a few aspects and will be efficient.  
   −
Firstly, speech recognition is a very CPU consuming process. In the above approach the Speech Engine need not run all the time. Only when required it'll be initiated.  
+
Firstly, speech recognition is a very CPU consuming process. In the above approach the Speech Engine need not run all the time. Only when required it'll be initiated. The Julius speech engine can perform real-time recognition with a vocabulary of up to 60,000 words, so recognition speed should not be a problem.
    
Secondly, the need for D-Bus is eliminated, as all of this can be done by generating X11 events, and communication with Julius can be done simply by executing the process within the program itself and reading off its output.
 
Secondly, the need for D-Bus is eliminated, as all of this can be done by generating X11 events, and communication with Julius can be done simply by executing the process within the program itself and reading off its output.
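Running the recognizer as a child process and reading its output could look roughly like the sketch below, which uses the POSIX `popen()` call. It is not the proposal's actual code: the helper names `extract_sentence` and `read_recognizer` are hypothetical, and the `sentence1:` prefix is an assumption about how final results are marked in the recognizer's text output, so it should be checked against the Julius build and configuration in use. Any sentence boundary markers in the result line are passed through unchanged.

```c
#define _POSIX_C_SOURCE 200809L /* for popen/pclose */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix of a final-result line; verify against real output. */
#define RESULT_PREFIX "sentence1:"

/* Copy the recognized text of one output line into out (out_len bytes).
 * Returns 1 if the line was a result line, 0 otherwise. */
int extract_sentence(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL && out_len > 0);
    size_t plen = strlen(RESULT_PREFIX);
    if (strncmp(line, RESULT_PREFIX, plen) != 0)
        return 0;
    const char *p = line + plen;
    while (*p == ' ' || *p == '\t') /* skip blanks after the prefix */
        p++;
    size_t n = strcspn(p, "\r\n");  /* stop at the end of the line */
    if (n >= out_len)
        n = out_len - 1;            /* truncate to fit the buffer */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Start cmd through the shell, read its standard output line by line and
 * call handle() (if not NULL) for every result line.
 * Returns the number of result lines seen, or -1 if cmd could not be started. */
int read_recognizer(const char *cmd, void (*handle)(const char *text))
{
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;
    char line[1024];
    char text[1024];
    int count = 0;
    while (fgets(line, sizeof line, pipe) != NULL) {
        if (extract_sentence(line, text, sizeof text)) {
            if (handle != NULL)
                handle(text);
            count++;
        }
    }
    pclose(pipe);
    return count;
}
```

In the service, `cmd` would be the Julius command line with its configuration file, and `handle` would be the function that turns the recognized text into X11 events.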