WikiPort

Revision as of 17:16, 28 March 2018 by Vipulgupta2048 (Changes done)



This page is under heavy development. For the time being, please refer to the rich-formatted version of the proposal here.


About You

Name - Vipul Gupta
Email address - vipulgupta2048@gmail.com
Blog Url - www.mixstersite.wordpress.com
Sugar Labs wiki username - Vipulgupta2048
IRC nickname - vipulgupta2048
LinkedIn - vipulgupta2048
GitHub - vipulgupta2048
Twitter - vipulgupta2048
First language - English (Fluent)
Located at, and what hours (UTC) do you tend to work - I am based in New Delhi, India (UTC+5:30). I am flexible with timings and can work 3-5 hours every day.


Have you participated in an open-source project before? If so, please send us URLs to your profile pages for those projects or some other demonstration of the work that you have done in open-source.
Yes, I have participated in and maintained many open-source projects, and I have also contributed to other community and conference projects. Some of the links, with descriptions, are as follows:

  • First pull request on my college community’s website, ALiAS - Maintainer since 2016.
  • vipulgupta2048/opengisproj_mirror - Involved building a web app for real-time monitoring of pollution levels in rivers of India, to assess the impact on the health of people using water from the river. The working repo is private, hence a public mirror for showcasing.
  • vipulgupta2048/scrape - Scraping major news websites/outlets in India using Scrapy [Python], managing the data flow with PostgreSQL, and finding fake news using machine learning with the help of libraries (Word2vec).
  • Fixed bugs, wrote blogs and added new features using HTML, CSS and JS for the official website of the premier conference on Python in India, PyCon India 2017 (here and here).
  • gitlab/asetalias/Community-Connect [Ongoing]
  • Completed: Hacktoberfest Challenge (here).
  • Building and mentoring website development for Pyladies Delhi Chapter Pyladies-delhi-website
  • Recent contribution: In Sugar Labs

Along with this, I volunteer for the Python Delhi User Group (PyDelhi), a volunteer-driven organization of Pythonistas and enthusiasts whose primary focus is Python, and I help organize its meetups.
I am a committee member of my college’s open source community, ALiAS (Amity Linux Assistance Sapience). We conduct hands-on workshops, seminars, talks and much more to help as many people as we can along the way and to motivate them to make their first PR.
I genuinely love spreading knowledge among peers, juniors and underprivileged children wherever I can, through my voice (talks and workshops) and my words (Tech-Blog).


About your project

What is the name of your project?
WikiPort

Describe your project in 10-20 sentences. What are you making? Who are you making it for, and why do they need it? What technologies (programming languages, etc.) will you be using?

Abstract Ideas:
Documentation tools have become modern, lighter and easier to use with Sphinx, GitHub Pages/wiki, MkDocs and many similar frameworks and applications. Documentation is easily one of the first and most important steps for any open-source project, and almost every project implements it.
If people know more, they will want to get involved and contribute more. This, in turn, helps the entire community to grow and expand. Documentation should be quick to access, easy to understand, and simple to edit and manage, and I think this project will really help Sugar Labs achieve that.

My understanding of the problem and why it needs to be solved?
Documentation, as mentioned before, is important to all open-source projects. Sugar Labs keeps all documentation for its learner applications, called “Activities”, in one place on the wiki at Activities#Sugar_Activities. Some problems (#risks) that I found with this arrangement are as follows:

  1. Difficult to add, edit and manage - With reference to the conversation [archive/sugar-devel]: as a user, I need to create an account to get editing privileges. This is tiresome and discourages contributions.
  2. Size issues - The wiki is hosted on a server. As the number of activities grows, the space they occupy will increase, raising maintenance costs and requiring more storage to be bought.
  3. No clear format/classification - Some activities have a single link while others have multiple links (for example, Turtle Art). Even when the extra links are justified, this wastes the user’s time.

For example, I found 2 wiki pages for the same activity (this page and this too). I did extensive research and found many such problems with the wiki. In my experience with open source, users need information that is fast to reach, easily accessible and neatly organized. A new method of organizing that information should be put in place for the betterment of the whole community. GitHub could be the solution to these problems (check here for reference). WikiPort is a tool that helps migrate wiki pages to GitHub-hosted README.md files.

Objective
To migrate the documentation and information of each activity listed in Activities#Sugar_Activities on the Sugar Labs wiki from MediaWiki format to rich Markdown in the activity’s own git repository on GitHub, using a program/tool. Special attention goes to transferring all information and media associated with the activity with the least redundancy, and to zero duplication: content is migrated only if it is not already present in the repository’s README.md or in the user documentation in help-activity.

Technologies and Skills Used
This list is just a small glimpse of the technologies that I am thinking of using for this project. I am proficient in each of them through my past projects. (Refer)

  • Programming language - Python 3.6, using libraries such as requests, lxml, Scrapy, Beautiful Soup etc. Python will be used widely in this project.
  • Markup languages - Markdown, MediaWiki, reStructuredText
  • For automation - Ansible or Bash (to automate each step of the migration process and the installation of the tool)
  • Tools - git, shell, MediaWiki API (Pywikibot), GitHub API (experimental)
  • Others - YAML (if Ansible is used)

An Overview of the Migration Process


Some activity documentation is stored in help-activity. To prevent duplication of content between the two sources (wiki and help-activity), my checklist already contains a step to check the content difference between them, either manually or with a script, and to settle the differences. This ensures end-to-end migration of the wiki pages and avoids extra maintenance. We can also help keep help-activity updated by using GitHub’s API for pull requests, which can post a comment linking the specific help-activity page whenever anyone submits a PR that changes an activity’s README.md. (Experimental, but can be implemented.)

Detailed Description of the steps mentioned

One-time task - Use a spider to scrape the GitHub link of each activity into a proper JSON format (with the name of each activity). If no GitHub link is found for an activity, the script raises an exception and the link can then be found manually.

Step 0 - Extract the content using whichever of these methods offers the least redundancy. My research yielded these options:
  1. Query the MediaWiki API and acquire the content in wikitext format
  2. Export pages in XML using the Python Wikipedia Robot Framework
  3. GET the page in simple HTML ( Activities/Sugar_Network&printable=yes)

Step 1 - Convert the data sourced in Step 0 to Markdown using Pandoc. Remove syntax, comments and MediaWiki code that might be left after conversion, using regex wherever required.

Step 2 - Fork and clone the activity’s GitHub repository, using the links already scraped or found manually.

Step 3 - A Python script finds all image links, fetches the media (images) associated with the activity from the web page, renames them in the correct format and adds them to the cloned repository under the “images” folder. Fix the image links and test.

Step 4 - Ensure content is not duplicated by already being present either in the README.md file or in the user documentation in help-activity. Make the necessary changes and fix broken links. Test, commit and push to the fork.

Step 5 - Submit a pull request with a detailed description of the changes to the activity documentation, in a clear format containing all details. When the PR is merged, move to Step 6.

Step 6 - Use Pywikibot (check delete.py) with admin privileges to delete the pages/media associated with the wiki page and link back to the repository. This can also be achieved using the MediaWiki API, which saves edit history, as I mentioned before. (Read more here) (End-to-end migration of the wiki page complete.)

Step 7 - Repeat from Step 0 for the next activity.
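Parts of Steps 0, 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the WikiPort script itself: the function names (`wikitext_query_url`, `strip_wiki_leftovers`, `find_image_links`) and the `wiki.example.org` endpoint are hypothetical, and the regexes cover only a few common leftovers (HTML comments, `[[Category:...]]` links and magic words such as `__TOC__`). The query parameters are the standard MediaWiki `action=query` ones for fetching a page's latest revision content. The HTTP request itself and the Pandoc call (for example `pandoc -f mediawiki -t markdown`) are left out so that the sketch runs offline.

```python
import re
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the wiki's real api.php URL.
API_URL = "https://wiki.example.org/api.php"


def wikitext_query_url(title, api_url=API_URL):
    """Step 0, option 1: URL that asks the MediaWiki API for a page's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


# Step 1: markup that may survive the Pandoc conversion (illustrative subset).
LEFTOVER_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),    # HTML comments
    re.compile(r"\[\[Category:[^\]]*\]\]"),  # category links
    re.compile(r"__[A-Z]+__"),               # magic words such as __TOC__
]


def strip_wiki_leftovers(text):
    """Remove leftover wiki markup and tidy the blank lines it leaves behind."""
    for pattern in LEFTOVER_PATTERNS:
        text = pattern.sub("", text)
    # Collapse runs of three or more newlines created by the removals.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


# Step 3: image targets in the converted Markdown, e.g. ![alt](file.png "title").
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def find_image_links(markdown):
    """Return the target of every Markdown image link, in document order."""
    return IMAGE_LINK.findall(markdown)
```

The links returned by `find_image_links` are the ones that would need to be downloaded, renamed and rewritten to point into the repository's “images” folder.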


Important Note
I noticed that pull requests from fellow contributors on the Sugar Labs GitHub have included unnecessarily large activity bundles, which might be an issue, since one always aims to keep the repository size as small as possible. This needs to be taken care of. The fix I propose is to add these bundles to .gitignore, enforced either by a test case or by a Python script that excludes them.
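A check of this kind could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, and the pattern list is an assumption (`*.xo` is the Sugar activity bundle extension; `dist/` is assumed here to be the folder where built bundles end up and should be confirmed per repository).

```python
# Patterns assumed to cover built activity bundles; adjust per repository.
BUNDLE_PATTERNS = ("*.xo", "dist/")


def missing_ignore_patterns(gitignore_text, patterns=BUNDLE_PATTERNS):
    """Return the bundle patterns that the given .gitignore text lacks."""
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [p for p in patterns if p not in present]


def with_bundle_patterns(gitignore_text, patterns=BUNDLE_PATTERNS):
    """Return .gitignore text with any missing bundle patterns appended."""
    missing = missing_ignore_patterns(gitignore_text, patterns)
    if not missing:
        return gitignore_text
    prefix = gitignore_text
    if prefix and not prefix.endswith("\n"):
        prefix += "\n"
    return prefix + "\n".join(missing) + "\n"
```

As a test case, `missing_ignore_patterns` can be asserted to return an empty list for each activity repository before a migration PR is opened.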

Project Deliverables

  • 345 web pages stored in MediaWiki format at Activities#Sugar_Activities on the Sugar Labs wiki converted to Markdown, with assets (images and other files) intact and in place, and with zero duplication.
  • Markdown files converted from the wiki content tested and checked, manually or through scripts, to ensure complete migration without redundancy or loss of information.
  • A detailed report with classified information about the parts of the Sugar_Activities wiki that need further improvement, submitted to Sugar Labs (refer to week #8 in the timeline, #Wikiteam).
  • Files committed and pushed, and PRs made, in each activity’s respective git repository, in an information-specific format for easy reference.
  • Tested, bug-free source code for everyone to use, including other open-source projects whose wikis face the same issues.
  • Complete documentation of the migration method, submitted in time for evaluations with the mentor’s review incorporated.
  • After the complete transfer of all the wiki content, discuss and implement the last actions on the wiki: either delete the pages or link them back to their respective git repositories.
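One way the zero-duplication check could be scripted is a paragraph-level comparison between the converted wiki text and the existing README.md, sketched below. The function names are hypothetical, and the 0.9 similarity threshold is an arbitrary assumption that would need tuning on real pages; the comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def split_paragraphs(text):
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def new_paragraphs(wiki_md, readme_md, threshold=0.9):
    """Return wiki paragraphs that have no near-duplicate in the README.

    A paragraph counts as a duplicate when its best case-insensitive
    similarity ratio against any README paragraph reaches the threshold.
    """
    existing = split_paragraphs(readme_md)
    fresh = []
    for para in split_paragraphs(wiki_md):
        best = max(
            (SequenceMatcher(None, para.lower(), old.lower()).ratio()
             for old in existing),
            default=0.0,
        )
        if best < threshold:
            fresh.append(para)
    return fresh
```

Only the paragraphs returned by `new_paragraphs` would then be appended to README.md; anything flagged as a near-duplicate can be reviewed manually.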

Thus, the deliverables above yield 3 well-defined goals that determine project status and will help in evaluation:

  • Evaluation 1: Migration of 100 wiki pages complete
  • Evaluation 2: Migration of 300 wiki pages complete
  • Final: Migration of 345 wiki pages complete


Other Deliverables (and Future Work)

  • Contribute more - During and after the project, I will help solve bigger challenges and bugs, as my experience lies in DevOps tasks, and I will start reading the code of Sugarizer.
  • Add features - The edit history of wikis being migrated by WikiPort can be exported as commits on GitHub (implementing this).
  • Create new activities - I am well versed in Python (I have tried making a game with pygame) and would enjoy the new learning curve, as I never really explored GTK. I have some new ideas that could later be transformed into fun, interactive learning activities.
  • Complete misc. tasks - Wiki pages contain tasks waiting to be completed. I would like to volunteer and finish them for the welfare of the community, like this one here: IRC#Wishlist.
  • Remain an active contributor to Sugar Labs and take part in discussions about future projects. If the community agrees, I may also mentor someone myself in the foreseeable future.
  • Use automation to our favor, so that many WikiPort tasks such as environment setup, installation and the fine workings of the entire tool are completed by Bash or Ansible scripts. (Aim - one command to work it all out with the least number of user interventions.)

“WikiPort converts all MediaWiki documents to rich markdown format and commit them to GitHub or other hosting services without losing any of your data. A tool created by Sugar Labs with love.”
This seems to me like an excellent repository description of an awesome tool. I always wanted to create an open-source tool that people would use, benefit from and take the time to contribute back to. WikiPort feels like that chance.




What is the timeline for development of your project?
The description follows, in accordance with the official timeline.

Date - Event
April 16 - Exams start (College)
April 23 - Student Proposals announced
-*-*-*-*-*- Community Bonding Period -*-*-*-*-*-
May 14 - Official Coding Day starts (Summer break starts)
June 11 - First Evaluations
July 9 - Second Evaluations
July 19 - Soft Deadline [Pencils Down]
July 23 - Hard Deadline [Final code review/Bug fixing]
August 6 - Submit Code and Final Evaluations

Relative to the official GSoC timeline, my college exams end around the first week of May and my college reopens early in July. During this time, I will remain in contact with my mentor. This gives me ample time to code. I plan to begin coding at the start of May, before the official GSoC coding period begins, which gives me a head start to finish the majority (80%) of the project before the second evaluation and provides a total of 10 weeks to write code for WikiPort (and the other deliverables). I will also write a weekly or bi-weekly blog post on my progress and post it diligently on the website to get the community involved. The weekly timeline is as follows.


Week Number [#] - Task

[#1] April 23 - April 30: Community Bonding Period
  • Community bonding; if possible, I am keen on having a meeting with the Activity and wiki teams.
  • Discussion of the ideas that I have.
  • Research Part 1: MediaWiki [syntax, documentation, API, functionalities] and wikitext.
  • Exploring the problem case defined earlier.

[#2] May 1 - May 7
  • Start documentation and a blog post about Sugar Labs on my blog.
  • Set up the development environment.
  • Research Part 2: finding, testing and implementing the solutions available for migration process steps 0 through 6.
  • Discuss results with the mentor; shortlist the methods that have the highest success rate.
  • Push to GitHub; fix organization problems.

[#3] May 7 - May 14
  • Finalize the migration process and start coding.
  • Spider ready to scrape activities for their GitHub repos.
  • Report on activities missing GitHub repo links.
  • Blog post on Community Bonding @ Sugar Labs.

[#4] May 15 - May 22: Official Coding period starts
  • Start work on the migration script.
  • Test the migration script on 10 activities (aim - 60% migration of content from the wiki to the GitHub README of the activity).
  • Review by mentor; implement suggestions.

[#5] May 23 - May 30
  • Work on the script continues: tweaking, and implementing suggestions from the community.
  • Extensive testing and commenting.
  • Documentation for the script.
  • 40 pages completed (aim - 70% migration of content through the script without manual edits).
  • Write a quality checklist for converted pages to catch bugs, with thorough checking of every converted page, every step of the way.

[#6] June 1 - June 7
  • Work on Other Deliverables.
  • Script improved; commenting; logging implemented; quality checks implemented on the generated Markdown files.
  • Blog post on the research and work done so far.
  • 100 pages completed (minor improvements).

[#7] June 8 - June 15: First Code Evaluations
  • Implement suggestions from evaluations.
  • Start work towards automation of scripts wherever possible (aim - least user input).
  • Start work on making WikiPort available to all as a simple tool.
  • 150 pages completed.
  • Get a review of the migrated pages from the community.

[#8] June 16 - June 23
  • Documentation alpha release; repository made.
  • Start work on the report on Activities#Sugar_Activities.
  • 200 pages migrated.
  • Give a talk on “Sugar and what it stands for” at one of our local meetups in Delhi.
  • Random quality checks; manual fixes wherever required.

[#9] June 24 - July 1
  • 250 pages migrated and checked.
  • Get a review of the migrated pages from the community and mentor(s); discuss and implement changes.
  • Start the pet project of making my own activity.
  • Finish the automation task.

[#10] July 2 - July 10: Second Code Evaluations
  • 300 pages migrated, completed end-to-end.
  • Blog post about GSoC: 10 weeks later.

[#11] July 11 - July 18: Soft Deadline
  • All 345 web pages migrated, completing the primary objective.
  • Documentation completed.
  • Tool ready, with automated tasks and checks wherever they could be implemented, hosted on GitHub under Sugar Labs for everyone to use.
  • Start work on the wiki report (aim - to propose a better format for clear, uniform documentation).

[#12] July 19 - July 26: Hard Deadline
  • Time allocated to fix errors and do any due/extra/unknown work.
  • Submit the wiki report; get feedback from the community.
  • WikiPort reviewed by the community.

[#13] July 27 - August 05
  • Time allocated to fix errors and do any due/extra/unknown work.
  • WikiPort reviewed by the community.
  • Implement further suggestions if possible.

[#14] August 6 - August 13: Final Code Evaluations
  • Various blog posts about the entire experience, GSoC 2018 and the Sugar community.
  • Finish making the activity (tentative).

Convince us, in 5-15 sentences, that you will be able to successfully complete your project in the timeline you have described.
I love writing and reading code. I learned almost everything by reading the docs, and later on I started writing documentation and code for others too. This helps me know exactly where to find the information I am looking for, how to improve on it and how necessary it is. All my past projects showcase a strong dedication to deadlines and the determination to get work done. WikiPort will be no different. As for my skills: I have written and published articles using Pelican and WordPress (Markdown). I use git in every project that I start, and I document and collaborate on GitHub. I am comfortable with both Linux and Windows (preferring Linux all the way). There is no better way to prove that I can execute my vision for WikiPort than by showing that work has already been done (here). There are many points to keep in mind while migration is taking place, so for a more methodical approach I have put together a checklist that summarizes all the steps, both manual and automated.

You and the community

If your project is successfully completed, what will its impact be on the Sugar Labs community?
If WikiPort is completed and deployed as a tool through this project, it would make work easier not just for Sugar Labs developers but for everybody, including experienced users. WikiPort will help users port their wikis to GitHub with minimum user input, which would ultimately benefit everyone involved, as well as the many generations of developers to come who will use that documentation. As previously stated, documentation is an important part of any development process. Bad documentation impacts the entire community. But the documentation can be:

  • Accessible - GitHub is one of the best places to host it.
  • Relevant and easily traversable - The documentation for every activity will be housed with its GitHub repository.
  • Clearly written - A strict format maintains overall uniformity.
  • Easily editable (+ VCS) - The wiki offers edit history of documents too, but the process of editing a document is cumbersome, while GitHub makes it very easy for users to fix issues in it.
  • More views, more stars - GitHub also increases the views and popularity of the activity, ultimately helping Sugar Labs get noticed more.

If our documentation has all these features, then everybody would be more inclined to contribute to Sugar Labs and to get to know more about it. People would contribute more to bug fixing, indirectly helping the Sugar Labs community to grow and expand. This helps students, developers, activity maintainers and members of the community, and produces a new tool by the community, for the community and with the community’s help, for everyone to benefit from.
Answer 2: Walter Bender - walter.bender@gmail.com
This project has solid potential to address several pressing issues for the community: (1) bifurcation -- we are spread across too many platforms which is both a maintenance issue and also an issue for our users and potential developers, who struggle to find the documentation for the Sugar Activities; (2) redundancy -- we are spread too thin to maintain multiple, redundant instances of documentation. Migrating everything to a uniform platform where it is directly tied to the source would impact both of these issues.

Answer 3: James Cameron - quozl@laptop.org (His assessment of project impact, from both GitHub and email)
“Migration is needed because Wiki is not being used now that GitHub is being used. Maintaining the Wiki in addition to GitHub is not sustainable. So we want documentation to move from Wiki to GitHub. So that when an activity is updated in GitHub, the documentation can be updated in the same place at the same time.”
“Originally documentation was separate because we had non-coding developers and tool chains that varied by type of developer. Now we use GitHub the tool chains are combined. With the project as described, documentation will be concentrated in the source code repository for an activity, reducing ongoing maintenance. We have less active Wiki contributors than we ever did, and in the current threat environment a Wiki requires significant monitoring and administration; we recently lost some system administrators and gained new ones; using GitHub allows us to outsource system administration.”

Answer 4: Tony Anderson - quozl@laptop.org (His comments clarifying the need for WikiPort and the problems it solves)
As always, the question is the impact on our users. The traditional source of information for users is [1] and [2] and, especially the wiki pages. So far the effect of gitHub has been to reduce the value of these two sources. In many cases the activities on ASLO have been superseded by ones on gitHub or other git repository but are not available to our users. The documentation of activities on ASLO has never been adequate but now no effort will be made to improve it. This continues the trend toward Sugar being a playground for the technical elite.
Complete conversation available here.
[1] http://www.sugarlabs.org
[2] http://www.laptop.org

What will you do if you get stuck on your project and your mentor isn't around?
I strive to solve a problem using my own skill base, by researching the problem case and breaking it down into smaller parts. Beyond that, I would turn to:

  • The #sugar and #sugar-devel channels on IRC and the mailing lists, where other community members discuss issues. I am familiar with mailing list etiquette for open source communities and learn fairly fast.
  • Google and StackOverflow for the issue at hand, and then the documentation of the tool I am using, for reference.


How do you propose you will be keeping the community informed of your progress and any problems or questions you might have over the course of the project?
I would get regular, constructive feedback on the code I write and the work I do for the community by posting work samples from GitHub on the mailing list and on IRC. [Check the timeline for the same.] I will also post regular updates on my blog and maintain an update tracker for each week on a public notepad (Etherpad, etc.).

Miscellaneous

Send us a link to a pull request or merge request you have made on a Sugar or Sugar activity bug.
I needed to test the WikiPort beta script that I had created, so I took the extra effort and migrated the four wiki pages listed below.

  • lettermatch/pull/3
  • iknowmyabcs/pull/7
  • sugarlabs/AEIOU/pull/9
  • /i-can-read-activity/pull/4


Another issue that I am currently working on is sugar-live-build/issues/

Describe a great learning experience you had as a child.
Here’s a short story titled “Let me Google that for you”, from when I was 14.

Web technologies have piqued my interest for as long as I can remember. I have been writing HTML since I was 1. But I never liked to read from old, dusty books. When I coded, I had to take help from seniors, teachers, my own experience and other sources. Almost everybody was irritated by the constant barrage of long and complex questions that Vipul Gupta had. One day, I walked up to one of the seniors and asked him, if I remember right, how he would determine which browser is better for a particular segment of code I wrote. He took me to the lab and told me to open a link similar to this: http://bfy.tw/HFBs. This, he said, would surely have the solution to all my problems. What happened next? I was embarrassed and couldn’t see straight, but he was not judgemental and was very humble (a trait every one of us should have towards beginners, I feel), and from that day forward the place where I actually learned things was the internet. That day I learned how to Google by letting someone google that for me. Teaching by application at its best. I look back on this incident and still smile. I share this experience whenever I have a chance to interact with my juniors; it serves as a motivation boost and compels them to think that nothing is impossible and that everybody makes mistakes once in a while. This led to me creating hundreds more links from this website.

Is there anything else we should have asked you or anything else that we should know that might make us like you or your project more?
I am well satisfied with the proposal that I have made for WikiPort, the project for Sugar Labs. The template given by Sugar Labs really covers all the corners of my project, and it helped me think more constructively about the project and its finer aspects. I learned a lot through these few weeks of working on the proposal: the MediaWiki API and other new skills, contributing to an open-source organisation of this stature, and how to interact with people. This has been a memorable experience. I am in awe, I really am. All open-source communities may not be the same, but they run on the same principle that I also try to follow in life: be humble and helpful to everybody. The ‘ahaa-moment’ of a beginner, when they truly get the point that I am trying to make, is what I live for.
Some credits - I learned not to contest for issues but to focus on better PRs, and many new things besides, from James Cameron (@Quozl)’s blog. Thanks to Jasikrat too for helping people on IRC, and to all the mentors who review our proposals and go through all our mails. Thank you all for the opportunities and the help provided to me and my fellow contributors. I hope that, down the line, I can follow in your footsteps too.