User:Language portal/Language coverage matrix/GSoC 2013

From mediawiki.org

This is a proposal for a Google Summer of Code (GSoC) 2013 project.

Identity

Name: Harsh Kothari
Email: harshkothari410@gmail.com
Project title: Language Coverage Matrix dashboard

Contact/working info

Timezone: UTC+5:30 (IST - India)
Typical working hours: 12:00 PM to 6:00 PM (IST) and 10:00 PM to 4:00 AM (IST)
IRC or IM networks/handle(s): harshkothari (freenode)
I live in Ahmedabad, a metropolitan city with a 24/7 power supply and a reliable, uninterrupted Internet connection, so my online work will not be hampered.

Project Outline

The Language Coverage Matrix (LCM) dashboard would automate the collection of information about the language support provided by the Language Engineering team, e.g. key maps, web fonts, translation, the language selector, and i18n support for gender, plurals and grammar rules. The LCM would display this information and provide visualization graphs of language coverage, searchable by criteria such as tool or language. I will build this web-based dashboard using JavaScript libraries, with MySQL to manage the data. The project should be very useful for the Language Engineering team, since Wikipedia supports more than 300 languages: the tool will help them analyse which features are available for each individual language. With that overview, the team can efficiently prioritize adding missing features, that is, features that are not currently available for a particular language. Overall, the project should lead to a better user experience on the wikis.

This web-based dashboard will also help other products and communities by giving them access to the same search results and visualization graphs.

Document of Matrix Data

http://hexm.de/LangMatrix

Bug on Bugzilla

https://bugzilla.wikimedia.org/show_bug.cgi?id=46651

Thread on Mailing List

http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/068882.html

Mentors

Runa Bhattacharjee and Alolita Sharma are my mentors.

Deliverables

Mock-up for the dashboard
Mock-up for the form (after clicking or searching, the form is generated on the same page)
Visualization graph demo (clicking on the Gujarati language in the dashboard mock-up produces this visualization graph)
Database structure
  • A Python script - To save all current data into the MySQL database. Currently all the data is in a spreadsheet, so I will create a Python script that saves the spreadsheet's data into the database.
  • A form (HTML + jQuery) - To manually enter new data in the database, so that any new entry can easily be added by hand. Only an admin can do this.
    • The form will contain one search field as well as a button.
    • To update an existing language, the admin searches for it in the search field; once a language is selected, a form pre-filled with its entries is created automatically.
    • To add a new entry, the admin clicks the button named "Add new Language", and a form with all possible fields is generated automatically.
    • Clicking the save button saves the entered data into the database and also updates the appropriate pages.
    • This uses jQuery + HTML + Ajax.
  • PHP - For all the integration work. Since this is a web-based application, all integration will be done in PHP: session management, scripting for Ajax calls and responses, and the database connection.
  • Dashboard - The dashboard will be filled according to the query, i.e. the filtering or searching process. It will be dynamic, smooth and responsive, and will present results in an attractive, easy-to-understand UI. This will use jQuery + Ajax + PHP + MySQL.
  • Data visualization graphs - Generated as per the requirements, using a dynamic parent-child (node) representation implemented in JavaScript/jQuery. They may also include charts, graphs or other interactive visualization methods, which give a fast and simple representation of the data. A mock-up of the data visualization graph is shown above. The graphs will have a polished, professional look and feel.
  • UI design - A professional-looking dashboard design covering all the features mentioned. This will use CSS.
  • An optimized search facility with autocomplete. Users can search for a region, language, language code, etc.; this will be implemented with jQuery and Ajax. I will also try to implement one more search method, "Language processing search", which lets users quickly get exactly the data they need by typing a question. Example use cases:
    • "How many languages support grammar rules?" gives the answer as a number, e.g. 44.
    • "What is jquery.ime?" gives the definition of jquery.ime.
    • "Languages that support jquery.ime" gives the list of languages that support jquery.ime and fills the dashboard with the appropriate data.
    • This will be implemented with jQuery + PHP + Ajax.
  • Creation of a bot - For automatic updates. The bot will run on a fixed schedule; if anything in the spreadsheet has changed and has not yet been written to the database, it will update the database. It will also integrate directly with Names.php, langdb.yaml in jquery.uls, the extra languages supported in translatewiki.net and the Incubator, etc. This keeps all the details up to date.
  • MySQL database - To store all the data, in the following tables:
    • Table 1: Details of all languages (one extra column will also cover Incubator languages)
    • Table 2: IME gap details
    • Table 3: Admin details
    • I will create or update tables as the requirements evolve.
  • APIs - To help other websites that need the data this project collects, e.g. for their own projects, research and analysis.
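The spreadsheet import script and the scheduled sync bot described above can share one routine: read the spreadsheet export, compare each row with what is already stored, and write only new or changed rows. The sketch below is a minimal illustration under several assumptions: the spreadsheet is exported as CSV, the column names (`code`, `name`, `webfonts`, `ime`, `grammar_rules`) are hypothetical placeholders for the real matrix columns, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical columns for "Table 1: Details of all languages".
# The real spreadsheet has more features; these are placeholders.
COLUMNS = ("code", "name", "webfonts", "ime", "grammar_rules")


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the languages table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS languages ("
        "code TEXT PRIMARY KEY, name TEXT, webfonts TEXT, "
        "ime TEXT, grammar_rules TEXT)"
    )


def sync_from_csv(conn: sqlite3.Connection, csv_text: str) -> list[str]:
    """Insert new rows and overwrite changed rows; return the affected codes."""
    create_schema(conn)
    changed = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = tuple(row[col].strip() for col in COLUMNS)
        stored = conn.execute(
            "SELECT " + ", ".join(COLUMNS) + " FROM languages WHERE code = ?",
            (values[0],),
        ).fetchone()
        if stored == values:
            continue  # row already up to date, nothing to write
        # SQLite upsert; MySQL would use REPLACE INTO or
        # INSERT ... ON DUPLICATE KEY UPDATE instead.
        conn.execute(
            "INSERT OR REPLACE INTO languages VALUES (?, ?, ?, ?, ?)", values
        )
        changed.append(values[0])
    conn.commit()
    return changed
```

Running the same export twice writes the rows once and then reports no changes, which is the behaviour a scheduled bot needs so that it only touches rows edited in the spreadsheet since its last run.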
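The autocomplete behaviour of the search facility above boils down to a case-insensitive prefix match over language names and codes. A minimal server-side sketch follows, assuming a simple mapping from language code to language name; the function name `suggest`, the `limit` parameter and the "Name (code)" display format are hypothetical, and region search is not covered.

```python
def suggest(prefix: str, languages: dict[str, str], limit: int = 5) -> list[str]:
    """Return up to `limit` "Name (code)" suggestions for a typed prefix.

    `languages` maps language codes to names; the prefix is matched
    case-insensitively against both the name and the code.
    """
    p = prefix.strip().lower()
    if not p:
        return []  # no suggestions for an empty search box
    hits = [
        f"{name} ({code})"
        # Sort by display name so suggestions appear alphabetically.
        for code, name in sorted(languages.items(), key=lambda kv: kv[1])
        if name.lower().startswith(p) or code.lower().startswith(p)
    ]
    return hits[:limit]
```

An Ajax handler would call something like this on each keystroke and return the list as JSON for the jQuery autocomplete widget to display.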
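The "Language processing search" examples above do not necessarily need real natural-language processing; a first version could match a few fixed question patterns. The sketch below assumes a small in-memory table of languages and feature definitions; the feature names, the definition text and the sample rows are hypothetical placeholders rather than data from the matrix, and a real implementation would query the database instead.

```python
import re

# Hypothetical sample data: language name -> set of supported features.
LANGUAGES = {
    "Gujarati": {"jquery.ime", "webfonts", "grammar rules"},
    "Hindi": {"jquery.ime", "webfonts"},
    "English": {"grammar rules"},
}

# Hypothetical one-line definitions used by the "what is X" pattern.
DEFINITIONS = {
    "jquery.ime": "A jQuery-based input method editor library.",
}


def supporting(feature: str) -> list[str]:
    """Return the sorted names of languages that support a feature."""
    return sorted(lang for lang, feats in LANGUAGES.items() if feature in feats)


def answer(query: str):
    """Match a query against three fixed patterns; return None if none applies."""
    q = query.strip().lower().rstrip("?")
    m = re.match(r"how many languages support (.+)", q)
    if m:
        return len(supporting(m.group(1).strip()))
    m = re.match(r"what is (.+)", q)
    if m:
        return DEFINITIONS.get(m.group(1).strip(), "No definition found")
    m = re.match(r"languages that support (.+)", q)
    if m:
        return supporting(m.group(1).strip())
    return None
```

Returning `None` for unrecognised input lets the caller fall back to the ordinary autocomplete search rather than showing an empty dashboard.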

Use Cases

  • Homepage -> All languages or regions.
  • [ Clicking on ] Region -> All languages of that region
  • [ Clicking on ] Language -> All the details of that particular language (example - key maps, web fonts, translation, language selector, i18n support for gender, plurals, grammar rules) + Visualization Graph of the language.
  • Dropdown menu for searching / checkboxes for filtering the search
  • Search field with autocomplete / suggestions.
  • [ Select from ] Dropdown / [ Select from ] Checkbox -> The dashboard is filled automatically according to the query. (E.g. if a user wants to see the list of languages that have grammar rules, just clicking on "grammar rules" will show all the languages that have grammar rules.)
  • Error handling use case: [ Select from ] Dropdown / [ Select from ] Checkbox -> If no result is found for the query, the dashboard shows "No Result Found" and keeps displaying the data from before the selection/filter was applied.
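The checkbox filtering and its error-handling rule above can be modelled as state on the dashboard: apply the selected features, and if nothing matches, report "No Result Found" while keeping the previous rows. This is a minimal sketch assuming each language is stored as a set of supported feature names; the `Dashboard` class and its fields are hypothetical and only illustrate the rule, not the actual PHP/jQuery code.

```python
from dataclasses import dataclass, field


@dataclass
class Dashboard:
    """Holds every language record and the rows currently on screen."""

    languages: dict[str, set[str]]
    shown: list[str] = field(default_factory=list)
    message: str = ""

    def __post_init__(self) -> None:
        # Homepage state: all languages are shown before any filter.
        self.shown = sorted(self.languages)

    def apply_filter(self, features: set[str]) -> None:
        """Show languages supporting every checked feature, else keep the old view."""
        matches = sorted(
            lang for lang, feats in self.languages.items() if features <= feats
        )
        if matches:
            self.shown, self.message = matches, ""
        else:
            # Error-handling use case: report it and keep the previous results.
            self.message = "No Result Found"
```

Treating several checked boxes as "must support all of them" (a subset test) is one possible design choice; an "any of them" filter would use set intersection instead.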

Some features that would be directly useful to MediaWiki developers and Wikimedia site maintainers

  • Direct integration with existing lists of languages: Names.php, langdb.yaml in jquery.uls, extra languages supported in translatewiki.net and incubator, etc.
  • Integration with a matrix of existing or planned Wikimedia projects, so that the matrix makes it clear: Is there a project in this language? Are the language tools extensions installed in this project? Is there an Incubator project in this language?
  • Understanding variants: does this language support variants in any way?
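Direct integration with the existing language lists amounts to merging several sets of language codes into one presence matrix and then reading off the gaps. The sketch below assumes each source (Names.php, langdb.yaml, the Incubator, etc.) has already been reduced to a set of language codes; how those codes are extracted from each file is not shown, and the sample membership in the test is made up for illustration.

```python
def coverage_matrix(sources: dict[str, set[str]]) -> dict[str, dict[str, bool]]:
    """Build a language-code x source matrix of presence flags."""
    all_codes = sorted(set().union(*sources.values()))
    return {
        code: {name: code in codes for name, codes in sources.items()}
        for code in all_codes
    }


def gaps(matrix: dict[str, dict[str, bool]], source: str) -> list[str]:
    """Codes known to some source but missing from the given one."""
    return [code for code, row in matrix.items() if not row[source]]
```

A row such as `{"Names.php": False, "langdb.yaml": True, "incubator": True}` would answer the questions above at a glance: the language is known to the selector and has an Incubator project, but MediaWiki core does not list it yet.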

If time permits

  • I would add integration with other knowledge bases about languages, such as Ethnologue, CLDR and others, that would provide information such as number of speakers, literacy levels, language contact, etc. This would make it possible to see, in a slightly more structured way than what we have now, how well our projects cover the different languages of the world.
  • I will create a MediaWiki extension for matrix support, visualization graphs and filtering/searching.
  • I will create a browser support matrix for TUX and other products (see https://bugzilla.wikimedia.org/show_bug.cgi?id=45602).

Project Schedule

  • May 27 - June 16: Planning and exploring new ideas and technologies for the basic implementation; creating the database and a Python script to store the current data in it.
  • June 17 - June 30: Basic dashboard integration with PHP, database connection, and a simple design with a login system and manual entry facility.
  • July 1 - July 14: Create JavaScript/jQuery libraries for the filtering facility, which fills the dashboard according to the selected filters.
  • July 15 - July 28: Create JavaScript/jQuery libraries for the searching facility that fills the dashboard, and merge the filter and search libraries.
  • July 29 - August 10: Improved UI and design work.
  • August 11 - August 23: Visualization graph implementation using JavaScript/jQuery.
  • August 24 - August 31: Cross-browser support and smooth searching and filtering. Improvements to the visualization graph, the UI and all other functionality, and integration of all the features mentioned above.
  • September 1 - September 7: Create a bot to automate updating the database on a schedule.
  • September 8 - September 15: Create APIs for the LCM data so that it can be used outside this web tool.
  • September 16 - September 23: Final touches and documentation.

About you

I am Harsh Kothari, a final-year engineering student at L.D. College of Engineering. I am from the Gujarati Wikipedia community and have been a MediaWiki contributor for almost 8 months now. I developed the MediaWiki extension TwitterCards. I am a promoter of the first MediaWiki group in India. I have localized and ported several gadgets, such as HotCat, Reference Tooltip and PopUps, to Gujarati Wikipedia and other Indic wikis.

Apart from MediaWiki, I am also an active contributor to several open source communities such as Mozilla, Fedora, etc. I am also an ambassador of the FOSS program of the Government of India and promote open source technology across Gujarat. Programming is my passion and I enjoy coding across various technologies. I have good knowledge of and experience in C, C++, Java, Python, PHP and JavaScript. I have successfully completed an internship at the Physical Research Laboratory, where I worked on a project named Genetic Algorithm Based Digital Filter Design, which uses artificial intelligence techniques to design digital filters. For the last two years I have been one of the developers and organizers of voidmain(), an online coding competition in the style of Google Code Jam and Facebook Hacker Cup. Also, my talk “We, the people” has been selected for Open Source Bridge 2013; it is a simple talk describing how I entered the MediaWiki community, my experience there, and how to help more people get involved in it.

The proposed project is the Language Coverage Matrix dashboard, an internationalization project. It will develop a web-based dashboard that includes all the details of the languages supported by Wikimedia. This project will be of great help to the Language Engineering team of Wikimedia, as the tool will help them analyze the details of the various features of each individual language.

Participation

In my opinion, IRC is the best way to communicate, so I am available on IRC all the time in channels such as #mediawiki, #mediawiki-i18n, #wikimedia-dev and #wikimedia-lab. I actively participate in discussions on several mailing lists, such as wikitech-l, mediawiki-india and mediawiki-i18n. I would like all discussions related to my project to be carried out on the above-mentioned mailing lists and wiki page.

I have a blog where I will post updates on the progress of my project. I will also push all my work on this project to GitHub. Since my project aims at implementing new features, I will regularly ask the community for feedback on my interface designs, through testing and prototypes as well as through the mailing list.

Past Open Source Experience

I am involved in many open source activities in Ahmedabad. I am an active member of Google Developer Group Ahmedabad. I created the MediaWiki extension TwitterCards, and I have also made small contributions to the MediaWiki extension EtherEditor, jquery.uls and jquery.ime. All my code is open source and available on GitHub. I have also worked on a library to extract metadata from parsed raw description text. I have been invited as a delegate to share my knowledge of open source at various open source events, and I was a speaker at the MediaWiki Gadget Kitchen workshops held at GNUnify and Avenir.

Acknowledgment

I really want to thank my mentors Runa Bhattacharjee and Alolita Sharma for guiding me throughout, and very special thanks to Amir Aharoni for his valuable input on this proposal. Last but not least, special thanks to Sumana and Quim for polishing my proposal and for their valuable feedback.

LCM-dashboard repo on GitHub

https://github.com/wikimedia/lcm-dashboard

Any other info

Monthly Reports

June

I am working on the Language Coverage Matrix dashboard for the Language Engineering team of the Wikimedia Foundation.

  • During the community bonding period, I first explored and planned the new technologies I could use in my project.
  • After that I created the database schema.
  • Since all the data is currently in a spreadsheet, I created a Python script that transfers it from the spreadsheet to the database automatically.
  • The week after, I worked on the new language entry system and the database connection, and completed them.
  • I recently completed the language search system with suggestions.
  • When a user searches for a language and selects it, all the data for that language is shown.
  • I am currently working on letting the admin change the details of a particular language on the same page.

Link to my progress report: Project Updates

July

  • Created and set up the basic environment on Wikimedia Labs
  • Made minor changes to the language search system
  • Created on-the-spot editing for any language detail, with admin privileges
  • Created the filter facility
  • New UI designs - contains
  • Changed the database schema
  • Implemented the new design
  • Created a new, alternative jQuery-based REST architecture
  • Language to font mapping
  • Language to input method mapping
  • Redefined the search implementation

August

  • Created an admin login system with session management
  • Designed a new UI for the langfilter.php page
  • Developed a pie-chart visualization
  • Developed an API for language details
  • Developed an example of API usage
  • Created a preview system for the new language entry system and for the direct editing feature
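The pie-chart visualization above needs, for a chosen feature, the share of languages that do and do not support it. A minimal sketch of that calculation follows, assuming each language is represented as a set of supported feature names; this shape and the function name `pie_slices` are hypothetical, since the report does not describe how the chart's data is computed. Drawing the chart would still happen in JavaScript on the page.

```python
from collections import Counter


def pie_slices(languages: dict[str, set[str]], feature: str) -> dict[str, float]:
    """Percentage of languages that support / lack `feature`, rounded to 0.1."""
    counts = Counter(
        "supported" if feature in feats else "missing"
        for feats in languages.values()
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no languages, nothing to draw
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Serialising the returned dictionary as JSON gives a chart library one label and one value per slice.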

September

  • Developed an API console so that third-party users can use this data on their websites.
For more info
  • Redesigned the entire new UI according to the proposed scheme
For more info
  • Made small corrections to functionality
  • Fixed various bugs


Link to my progress report: Project Updates

See also

Project Progress : See here