Changes

USpeak (view source)

Revision as of 10:11, 28 March 2009

2,068 bytes added , 10:11, 28 March 2009

no edit summary

Line 80: Line 80:

The goals mentioned above are very long-term, and some of them will need active participation from the community. I have already made progress on Steps 1 and 6 (these are continuous background tasks that help improve accuracy).

−

~~'''I propose to implement steps 2, 3, 4 and 5 in GSoC. As the basic speech engine is working, these steps can be treated as independent of the other tasks and will have immediate benefits.''' i.e.~~

# Writing a system service that has support for recognition of characters and simple words

Line 145: Line 143:

Thirdly, since this is just a service, any activity can use it without changing its code or importing a library. The speech daemon becomes just another keyboard (a virtual one).

+

Once this is done, it can be tested on any activity (say, Listen and Spell) to demonstrate its use.
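A minimal sketch of the "virtual keyboard" idea, assuming the engine reports each final result on a line beginning with <code>sentence1:</code> (the form Julius prints in its console output, as far as I know) and that the daemon turns each recognized word into a key name. The table <code>WORD_TO_KEY</code> and both function names are hypothetical; actually delivering the key press to X11 is left out.

```python
import re

# Hypothetical mapping from recognized words to X11-style key names.
WORD_TO_KEY = {"a": "a", "b": "b", "enter": "Return", "space": "space"}

# Assumed format of a final recognition line, e.g. "sentence1: <s> A ENTER </s>".
_SENTENCE = re.compile(r"^sentence1:\s*(.*)$")


def words_from_line(line):
    """Return the recognized words on one engine output line, or [] if none."""
    match = _SENTENCE.match(line.strip())
    if not match:
        return []
    # Drop bracketed markers such as <s> and </s>, keep real words.
    return [
        word.lower()
        for word in match.group(1).split()
        if not (word.startswith("<") and word.endswith(">"))
    ]


def keys_for_line(line):
    """Translate one output line into the key names the daemon would send."""
    return [WORD_TO_KEY[w] for w in words_from_line(line) if w in WORD_TO_KEY]
```

Words that are not in the table are silently ignored here, so an unknown word never produces a stray key press; a real service might log them instead.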

+

'''II. Make a recording tool/activity so users can make their own language models and improve them for their own needs:'''

−

~~'''II~~. ~~Demonstrate its utility using Listen~~ and ~~Spell:'''~~

+

This tool will help users create new dictionary-based language models. They can use it to create language models in their own language and further extend the abilities of the service by training the speech recognition engine.

−

~~Beyond this I would like~~ to ~~implement an activity that can quite well demonstrate~~ the ~~use of~~ this ~~service~~. ~~I plan to Implement Speak Spell which~~ will ~~be a spelling activity where children~~ can ~~spell out~~ the words ~~show~~ to ~~them~~. ~~Single character recognition can have very high recognition rates~~.

+

The tool will have an interface similar to the one shown in the screenshot (that one was built in Qt and was a very simple tool). It will have a language model browser/manager and will allow modification of existing models. Users can type in words, define their pronunciations, and record samples, all within the tool itself.

+

Major Components:

+

- A language model browser that shows all the current samples and the dictionary. Users can create new ones or delete existing ones.

+

- The ability to edit or record new samples, enter new dictionary entries, and save changes.
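The browser and editor described above could sit on top of a small in-memory model like the following sketch. The class name, its methods, and the tab-separated text layout are all assumptions made for illustration; they are not the on-disk dictionary format of any particular speech engine.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageModel:
    """Hypothetical container for one dictionary-based language model."""

    name: str
    entries: dict = field(default_factory=dict)  # word -> list of phonemes

    def add_word(self, word, phonemes):
        """Add or replace a dictionary entry with its pronunciation."""
        if not phonemes:
            raise ValueError("a pronunciation needs at least one phoneme")
        self.entries[word.lower()] = list(phonemes)

    def delete_word(self, word):
        """Remove an entry; deleting a missing word is not an error."""
        self.entries.pop(word.lower(), None)

    def to_text(self):
        """Serialize as 'word<TAB>phonemes' lines (illustrative layout only)."""
        return "\n".join(
            f"{word}\t{' '.join(phonemes)}"
            for word, phonemes in sorted(self.entries.items())
        )
```

Keeping the model separate from the UI means the same object can back both the browser view and the save step.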

+

Recording and playback will be done via <code>arecord</code> and <code>aplay</code>, which are good enough for speech samples.
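As a rough sketch, the tool could build the <code>arecord</code> and <code>aplay</code> command lines like this. The sample format chosen here (16 kHz, 16-bit, mono WAV) is an assumption that suits many speech engines, not a setting taken from this proposal, and the function names are hypothetical.

```python
import subprocess


def arecord_command(path, seconds=3, rate=16000):
    """Build an arecord command for a short mono speech sample."""
    return [
        "arecord", "-q",      # quiet: no progress messages
        "-t", "wav",          # WAV container
        "-f", "S16_LE",       # signed 16-bit little-endian samples
        "-r", str(rate),      # sample rate in Hz (assumed 16 kHz)
        "-c", "1",            # mono
        "-d", str(seconds),   # stop after this many seconds
        path,
    ]


def aplay_command(path):
    """Build an aplay command to play a recorded sample back."""
    return ["aplay", "-q", path]


def record_and_play(path, seconds=3):
    """Record a sample, then play it back so the user can check it."""
    subprocess.run(arecord_command(path, seconds), check=True)
    subprocess.run(aplay_command(path), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with file names.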

Line 163: Line 171:

Ans:

+

'''Before 'official' coding period begins:'''

+

# Study Sugar, PyGTK, X11

+

# Start gathering a list of all the commands that can be put on the XO.

+

# Decide on a small limited dictionary based on the above

+

# Record samples for the letters of the English alphabet

+

'''First week:'''

+

- Write and complete the scripts for the service interaction with Julius.

+

- Start on the wrapper for simulating events on X11.

+

'''Second Week:'''

+

- Complete writing the wrapper.

+

- Implement a Sugar UI feature for enabling/disabling the speech service.

+

'''Third Week:'''

+

- Hook up the UI, the service, and the speech engine.

+

- Wrap up for mid term evaluations and test the language model for accuracy on letters and spoken commands.
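One simple way to put a number on that accuracy check is the fraction of test utterances recognized exactly. This is only a sketch; the function name is hypothetical and the proposal does not specify which metric will be used.

```python
def accuracy(expected, recognized):
    """Fraction of utterances whose recognized text matches the expected text."""
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for want, got in zip(expected, recognized) if want == got)
    return hits / len(expected)
```

For example, three correct letters out of four test recordings gives an accuracy of 0.75.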

+

'''Fourth Week:'''

+

- Test this tool on Listen Spell and iron out any problems.

+

- Get feedback from the community.

+

- Start writing the interface for the language tool.

+

'''Fifth Week:'''

+

- Complete the interface.

+

- Start writing code for the language model browser and recorder.

+

'''Sixth Week:'''

+

- Complete the language model browser.

+

- Write the recording and dictionary creation code for the tool.

+

- Package everything in an activity.

+

'''Seventh Week:'''

+

- Complete the wrap up for the final evaluations. Write up documentation and user manuals. Update the Sugar Wikis. Clean up code.

+

'''Infinity and Beyond:'''

+

Continue perfecting this system on Sugar by increasing accuracy, performing algorithmic optimizations, and making new speech-oriented activities. :)

----

Mavu

52

edits
