Difference between revisions of "Summer of Code/2014/bliss-sid/Voice-interface"

From Sugar Labs

Latest revision as of 08:53, 21 March 2014

Voice Interface

About me

1. What is your name?

Siddharth Lalwani

2. What is your email address?

siddharth.lalwani@gmail.com

3. What is your Sugar Labs wiki username?

Bliss-sid

4. What is your IRC nickname on irc.freenode.net?

bliss-sid

5. What is your first language? (We have mentors who speak multiple languages and can match you with one of them if you'd prefer.)

My first language is English. I also speak Hindi well.

6. Where are you located, and what hours (UTC) do you tend to work? (We also try to match mentors by general time zone if possible.)

I live in Delhi, India. IST (Indian Standard Time) is UTC+5:30. During the college term, I will usually work from 13:30 to 17:30 UTC on weekdays and make up for it with extra time on weekends. After that, I will be able to work full time, from 06:30 to 10:30 and 13:30 to 18:30 UTC, or at whatever times suit my mentor.

7. Have you participated in an open-source project before? If so, please send us URLs to your profile pages for those projects, or some other demonstration of the work that you have done in open-source. If not, why do you want to work on an open-source project this summer?

I am a novice open-source developer. I used the Facebook Graph API in my Social Reader project. My GitHub profile is https://github.com/bliss-sid. I have been studying the Sugar Labs repository for some time now. I am proficient in Python, C and C++. I am deeply impressed by the collaborative way an open-source community works, and I am keen to work with Sugar Labs this summer.


About your project

1. What is the name of your project?

Voice Interface

2. Describe your project in 10-20 sentences. What are you making? Who are you making it for, and why do they need it? What technologies (programming languages, etc.) will you be using?

With my project, I aim to add another dimension to the Sugar learning experience by building a speech recognition engine with the help of PocketSphinx. I will start with the en_US hidden Markov model (HMM) acoustic model included with PocketSphinx 0.8. To make it more interactive, we will use a text-to-speech engine such as eSpeak (or a better alternative). I will add speech recognition within activities and write documentation for each activity listing the keywords that perform a task.
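The text-to-speech side could start very small. The sketch below assumes the `espeak` command-line program is installed and on the PATH; it uses only eSpeak's `-v` (voice) and `-s` (speed in words per minute) options, and the helper names `espeak_command` and `speak` are hypothetical, not part of any Sugar API.

```python
import shutil
import subprocess
from typing import List


def espeak_command(text: str, voice: str = "en", words_per_minute: int = 150) -> List[str]:
    """Build the eSpeak argument list: -v picks the voice, -s the speed in words per minute."""
    return ["espeak", "-v", voice, "-s", str(words_per_minute), text]


def speak(text: str, voice: str = "en") -> bool:
    """Speak text aloud if the espeak binary is available; return True if it was run."""
    if shutil.which("espeak") is None:
        return False  # eSpeak is not installed, so stay silent
    subprocess.run(espeak_command(text, voice), check=False)
    return True
```

Building the argument list separately from running it keeps the command testable without producing audio, and passing a list rather than a shell string avoids quoting problems with arbitrary text, e.g. `speak("Opening Terminal")`.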

Once this project is complete:

1. Saying “Hi Sugar” will activate the speech recognition system.

2. We can give commands to perform tasks or open activities; for example, saying “Open Terminal” will launch the Terminal activity.

3. There will be special support for the TurtleBlocks, Terminal, Chat and Browse activities. For example, in TurtleBlocks, saying “Forward 20” will move the turtle 20 units forward.


4. We can also add support for multiple languages and dialects by using different acoustic models, with translation support (Apertium, if needed).
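The wake phrase and command behaviour in the list above can be sketched as a small state machine over recognized text. This is a minimal sketch, assuming the recognizer (for example PocketSphinx) has already returned a plain-text transcript with numbers as digits; `VoiceController`, `Command`, the activity names and the turtle verbs are hypothetical placeholders, not Sugar's activity registry or the TurtleBlocks API.

```python
import re
from dataclasses import dataclass
from typing import Optional

WAKE_PHRASE = "hi sugar"  # activation phrase described in the proposal

# Spoken activity names this sketch reacts to (illustrative, not Sugar's real registry).
KNOWN_ACTIVITIES = {"terminal", "browse", "chat", "turtle blocks"}

# TurtleBlocks-style movement commands such as "forward 20" (hypothetical subset).
TURTLE_PATTERN = re.compile(r"^(forward|back|left|right) (\d+)$")


@dataclass
class Command:
    action: str    # "open" or a turtle verb such as "forward"
    argument: str  # activity name, or the number of units/degrees as text


class VoiceController:
    """Ignores transcripts until the wake phrase is heard, then parses one command."""

    def __init__(self) -> None:
        self.active = False

    def handle(self, transcript: str) -> Optional[Command]:
        # Normalize case and collapse repeated whitespace.
        text = " ".join(transcript.lower().split())
        if not self.active:
            # Stay idle until the wake phrase is recognized.
            self.active = text == WAKE_PHRASE
            return None
        self.active = False  # one command per activation
        if text.startswith("open "):
            name = text[len("open "):]
            return Command("open", name) if name in KNOWN_ACTIVITIES else None
        match = TURTLE_PATTERN.match(text)
        if match:
            return Command(match.group(1), match.group(2))
        return None
```

For example, `handle("Open Terminal")` returns `None` until `handle("Hi Sugar")` has been seen; the next `handle("Open Terminal")` then returns `Command(action='open', argument='terminal')`. Requiring the wake phrase before every command is one possible design choice; it trades some convenience for fewer accidental triggers from background speech.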


Voice recognition has been around for over 50 years, but it has not yet made a real impact on education and learning. With a voice interface in Sugar, I want to make learning fun for children. Speech recognition will serve as an assistive technology in many future educational activities. The idea of a voice interface is not only about implementing speech recognition but also about laying the foundation for speech understanding, where the computer can interpret what we say and answer back with its own ideas. It may sound like science fiction, but imagine saying “Sugar, open the Flappy Bird game” and hearing back: “There is a reminder in your Calendar activity saying you have an exam tomorrow. You had better study for that.” I believe this will become a reality within a few years.

To build the voice interface, I will mainly use Python along with open-source resources such as PocketSphinx and SphinxBase. I will also use acoustic models from various open-source sources.
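Because the set of spoken commands is small, the recognizer can be restricted to it instead of doing open dictation, which can improve accuracy for short commands. PocketSphinx accepts grammars in JSGF (JSpeech Grammar Format), for example through its `-jsgf` option. A minimal sketch that generates such a grammar from a list of phrases; the function name `build_jsgf` and the rule name `<command>` are hypothetical, and every word must exist in the pronunciation dictionary, so numbers are written out as words.

```python
from typing import Iterable


def build_jsgf(grammar_name: str, phrases: Iterable[str]) -> str:
    """Return JSGF text whose public <command> rule accepts exactly the given phrases."""
    # Lowercase, collapse whitespace, and drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(" ".join(p.lower().split()) for p in phrases))
    if not unique:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(unique)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )
```

For example, `build_jsgf("sugar", ["Hi Sugar", "Open Terminal", "Forward Twenty"])` produces a grammar whose public rule is `hi sugar | open terminal | forward twenty`.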


3. What is the timeline for development of your project? The Summer of Code work period is from May 19 - August 22; tell us what you will be working on each week. (As the summer goes on, you and your mentor will adjust your schedule, but it's good to have a plan at the beginning so you have an idea of where you're headed.) Note that you should probably plan to have something "working and 90% done" by the midterm evaluation (27 June); the last steps always take longer than you think, and we will consider cancelling projects which are not mostly working by then.

MY TIMELINE


21 March - 9 April: Brush up on prerequisite knowledge (Python, JavaScript, PocketSphinx)
9 April - 21 April: Deepen my understanding of the project's architecture
21 April - 5 May: Discuss the design and features of the project with mentors and the Sugar Labs community
5 May - 12 May: Buffer week 1 (practical examinations in college)
12 May - 19 May: Devise a plan of action for the project (begin coding if possible)
19 May - 26 May: Begin coding the voice interface for the Home view
26 May - 2 June: Buffer week 2 (sessional exams in college)
2 June - 9 June: Add a voice interface to the Turtle Blocks activity
9 June - 16 June: Add a voice interface to the Browse activity
16 June - 23 June: Add a voice interface to the Terminal activity
23 June - 27 June: Polish the code for the midterm evaluation
Midterm milestone: a working voice interface for

>Home view

>TurtleBlocks, Browse, Terminal

27 June - 7 July: Add voice support to the Chat activity
7 July - 19 July: Add multiple-language functionality to the speech recognition engine using various acoustic models; finish previous work
19 July - 1 August: Run the “file and fix bugs” iteration; make the code bug-free
1 August - 11 August: Add tests and documentation
EVALUATION TIME


If time permits, I will work on creating an activity where users can create their own acoustic model. This will help reduce errors in recognizing what the user says.
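Whether a user-built acoustic model actually reduces errors could be checked by comparing word error rate (WER) on the same recordings with the default en_US model and with the user's model. WER is the word-level edit distance (substitutions, deletions and insertions) between the expected phrase and the recognizer's output, divided by the number of words in the expected phrase. A minimal sketch, assuming both inputs are plain-text transcripts; the function name is hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous row of the word-level edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("open the terminal", "open terminal")` is 1/3, one deleted word out of three; a lower average over a test set would indicate the user's model is helping.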



4. Convince us, in 5-15 sentences, that you will be able to successfully complete your project in the timeline you have described. This is usually where people describe their past experiences, credentials, prior projects, schoolwork, and that sort of thing, but be creative. Link to prior work or other resources as relevant.

I began using computers and writing C++ programs in middle school. I have a solid foundation in C, C++ and Python, which allows me to learn new programming languages quickly. Having knowledge is one thing, but I also have the skills to put it to work. Some of my achievements to date are:

1. Linux Professional and OS Developer

I have successfully completed a course in operating system development, in which I compiled the Linux kernel, changed the graphical panel (X window manager) and added a few BackTrack applications. I have a solid knowledge of GRUB, bootloaders and Linux commands.


2. Social Reader Android Application

I worked with a senior student to develop a social RSS reader application for Android that uses the Facebook Graph API to synchronize user subscriptions with the Facebook News Feed, making it possible to receive updates from, and send updates to, Facebook friends through the application. I have uploaded it to GitHub: https://github.com/bliss-sid/Social-RSS-Reader


I have adequate knowledge of PocketSphinx and hidden Markov acoustic models. I am also taking a course on cryptography on www.coursera.org. I am keen to work with Sugar Labs and will put all my effort into completing this project successfully. I know that I lack experience in open source, but I will make up for that before the community bonding period and work even harder to meet your expectations. I have worked hard on this project, and I really feel that I am the right geek for it.


You and the community

1. If your project is successfully completed, what will its impact be on the Sugar Labs community? Give 3 answers, each 1-3 paragraphs in length. The first one should be yours. The other two should be answers from members of the Sugar Labs community, at least one of whom should be a Sugar Labs GSoC mentor. Provide email contact information for non-GSoC mentors.

Me

Speech recognition is a groundbreaking technology, and it will make Sugar more captivating. It has a vital place in education, and over time it is making inroads as an assistive technology. It will bring many benefits to the Sugar Labs community:

> It will make the Sugar interface attractive and learning fun. When we learn something with pleasure, we rarely forget it. Activities such as “Play and Say” games could be built to gauge children's pronunciation skills.

> Its most relevant application will be for users who are unable to type because of a physical disability. For them, speech recognition is one of the few ways, other than dictating to someone else, to get their thoughts into readable form.

> The language barrier can also be reduced with multiple-language support and, in the future, support for multiple dialects. This will help bridge the learning gap between teachers and students around the globe.

> Besides its benefits for learning, speech recognition will also make Sugar quicker and easier to use. Users can perform tasks such as switching between running activities just by saying certain keywords.


Martín Abente Lahaye

Speech-recognition technologies are interaction mechanisms that nowadays have evolved from "alternative" to "extended". Proof of this is the proliferation of such technologies in a wide range of domains, from smartphone assistants, medical-record transcription, smart cars and TV command controls to many others. In this regard, not much has been seen in the education domain.

This could be due to the fact that there is still missing glue between speech-recognition technologies and educational content developers. This project is about filling that gap within the Sugar Learning Platform.

It would be great to have speech control in Sugar, for interaction with users and as an assistive tool. --Godiard (talk) 17:57, 18 March 2014 (EDT)


2. What will you do if you get stuck on your project and your mentor isn't around?

If I cannot find a way out of the problem myself, I will ask about it on IRC and the mailing list. I will also consult my college seniors if the problem is general and related to the programming language. If nothing helps, I will work on something else and return to the problem later.


3. How do you propose you will be keeping the community informed of your progress and any problems or questions you might have over the course of the project?

I will update my blog (http://nowirise.tumblr.com/) weekly with the progress I have made on the project. I will also post updates to the mailing list and IRC to keep my mentor and the community informed.


Miscellaneous


We want to make sure that you can set up a development environment before the summer starts. Please do one of the following:


1. Send us a link to a screenshot of your Sugar development environment with the following modification: when you hover over the XO-person icon in the middle of Home view, the drop-down text should have your email in place of "logout".


Screenshot: Sugar home.png



2. Describe a great learning experience you had as a child.

I remember we had an old Pentium 3 computer at home. It had become complete junk, but I must say it taught me more than any other computer I have used. I often disassembled it, checked whether the RAM was properly seated, tightened the IDE cables, and learned a lot about hardware. Doing all this at the age of 10-12 was inspiring and a great learning experience.


3. Is there anything else we should have asked you or anything else that we should know that might make us like you or your project more?

I read somewhere that Summer of Code is not about how much you know; it is about how quickly you can learn and how eager you are to contribute to your open-source community. I have loved studying the Sugar Labs repository and have learned a lot so far.

I feel it is important to start early on this project. All the code I write before the community bonding period will be uploaded at

https://github.com/bliss-sid/Voice-Interface


Well, I feel that sums up all I have to say.