Wikidata/2005 proposal
The relaunch of the project is located at Meta:Wikidata.
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
- See also: meta:Wikidata (2)
Wikidata is a proposed wiki-like database for various types of content. This project as proposed here requires significant changes to the software (or possibly completely new software) but has the potential to centrally store and manage data from all Wikimedia projects, and to radically expand the range of content that can be built using wiki principles.
Imagine that you can edit the content of an infobox on Wikipedia (e.g. Germany) with one click, that you get an edit form specific to the infobox you are editing, and that other Wikipedias automatically and immediately use the same content (unless it is specific to your locale).
Imagine that some data in an article can be automatically updated in the background, without any work from you - whether it is the development of a company stock, or the number of lines of code in an open source project.
Imagine that you can easily search wiki-databases on a variety of subjects, without knowing anything about wikis.
This project is separate from the Wikimedia Commons, because a Wikidata database does not necessarily have to be useful for another Wikimedia project, and because it is larger in scope.
Applications
Astronomy - space.wikidata.org (spc.wikidata.org)
- astronomical objects
- constellations
- craters
- observatories and telescopes
- surveys
- space missions
Economy - economy.wikidata.org
- Products
- Corporations, companies
- Governments and local administrative bodies, complete with analysis of statistical/aggregate parameters
- Macroeconomic data (current and historical), indexes
- Currency exchange rates
- Oil prices & other commodities prices
- Stock exchange indices
- Interest rates
Events - time.wikidata.org
- News
- Biographies of people
- Timetables
Languages - language.wikidata.org
- Translation tables (multilingual)
- Dictionaries (multilingual)
Society - society.wikidata.org
- Schools and universities
- Cities, Countries, Subdivisions
- Ethnic groups
- Radio and television stations
Military - military.wikidata.org
- Battles
- Army divisions
- Air Squadrons
Technology - tech.wikidata.org
- Planes
- Rockets
- Ships
- Weapons
- Computer hardware
Nature - nature.wikidata.org
- Plants
- Animals
- Species
- Mountains
- Rivers
- Protected areas
- Weather
Chemistry - chemistry.wikidata.org
- Elements
- Rocks and minerals, compounds
Content - works.wikidata.org
- Books
- Journal articles
- Newspaper articles
- Movies (IMDB is not open content)
- Music
Locations - geo.wikidata.org
- Cities
- Countries
- Regions
- Geo-located pictures
See Wikimaps
Physics - physics.wikidata.org
- Physical constants
- Physics equations
- Tables of wavefunctions etc.
Science (various) - science.wikidata.org
- Pharmacology
Stamps, Coins and bank notes
- Postage stamps
- Coins
- Bank notes
Calendar - calendar.wikidata.org
- Events
- Births
- Deaths
- Holidays and observances
- National
- Religious
- Laic
The project should allow for translations. This would help to include the calendar events of all the different Wikipedias and make the calendars easier to update: existing events would only need to be translated, and only new events would have to be added. This would save a lot of time. OmegaT is a great tool for translation.
Requirements
Wikidata has the following technical requirements to be useful:
- easy setup of data groups, and of new structures within a group
- data structure editor
- tables
- fields
- field types (text, number, textarea, localizable enumerations, ...)
- field constraints (required, unique etc.)
- relationships between fields (parents, brothers)
- edit mechanisms
- modify more than one cell at once (e.g. search/replace)
- export of data in suitable formats (html, xml, csv)
- import from suitable formats
- search mechanisms
- limit the table to the interesting subset
- use nested and/or/not requests
- make use of field types (date ranges, number ranges)
- sort mechanisms
- by one or more fields, up or down
- make use of field types (numbers, user-defined sort orders)
- wiki-style syntax for describing view layouts and edit layouts
- placement of fields in a form
- per-field difference engine to show changes to fields in a more precise manner
- per-field history, recent changes etc.
- transclusion of content from other Wikimedia projects
- default link destination, so that, for example, any link in an entry on movies points to Wikipedia
- easy localization
- flag certain types of data as international (with possible auto-conversion routines) and not in need of localization
- single login and Wikimedia Commons functionality should be in operation before this project goes live
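The search and sort requirements above (nested and/or/not requests, number ranges, sorting by several fields up or down) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`eq`, `between`, `all_of`, `any_of`, `negate`, `search`) and the sample records are hypothetical and are not part of the proposal or of any existing software.

```python
from typing import Any, Callable

Record = dict[str, Any]
Predicate = Callable[[Record], bool]


def eq(field: str, value: Any) -> Predicate:
    """Match records whose field equals value."""
    return lambda r: r.get(field) == value


def between(field: str, low: Any, high: Any) -> Predicate:
    """Match records whose field lies in the inclusive range [low, high]."""
    return lambda r: r.get(field) is not None and low <= r[field] <= high


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of any number of predicates."""
    return lambda r: all(p(r) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of any number of predicates."""
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda r: not pred(r)


def search(records, predicate, sort_by=()):
    """Filter records, then sort by (field, descending) pairs.

    The first pair in sort_by is the most significant sort key.
    """
    result = [r for r in records if predicate(r)]
    # list.sort is stable, so sorting from the least significant key to the
    # most significant one yields a correct multi-field ordering.
    for field, descending in reversed(sort_by):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


# Made-up records for illustration only; not real statistics.
records = [
    {"name": "Alphaland", "region": "north", "population": 5_000_000},
    {"name": "Betaland", "region": "south", "population": 80_000_000},
    {"name": "Gammaland", "region": "north", "population": 40_000_000},
    {"name": "Deltaland", "region": "north", "population": 200_000_000},
]

# name = Betaland OR (region = north AND NOT population in [100M, 1000M])
query = any_of(
    eq("name", "Betaland"),
    all_of(eq("region", "north"),
           negate(between("population", 100_000_000, 1_000_000_000))),
)
hits = search(records, query, sort_by=[("region", False), ("population", True)])
names = [r["name"] for r in hits]
print(names)
```

Because predicates are plain functions, they nest to any depth. A real implementation would more likely translate such a request into a database query than filter records in memory.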
Licensing considerations
Share-alike is not very fair when a much larger work includes only a very small piece of data. Individual pieces of data are not copyrightable. Claiming copyright on the database itself and on the structure we create could help to bolster similar copyright claims by corporations (which in turn could harm Wikimedia), and could be difficult to enforce.
A very simple attribution license or the public domain may be a better option for data-projects.
Graphical mock-ups
m:Image:Wikidata-mockup.png This mock-up illustrates form-based editing. Note that we need easy ways to enter relations - in this illustration, the movie-actor relation must be parsed by the backend after saving. Autolinking means that on viewing, we get a link both to Wikipedia and to Wikidata itself for the autolinked word (e.g. a link to Wikipedia about the United States, and a link to Wikidata showing movies made in the United States).
For an idea of how we'd do this in Kendra Base, see the Wikidata mock-up in Kendra Base.
Implementation strategies within MediaWiki
Fixed set of tables
We distinguish between wiki-pages and data through the namespace. We can define certain namespaces to be pages, and other namespaces to be data. In the following examples, namespace 0 is for articles, and namespace 402 is for data on countries.
We presume that we have a revisions table that is both used for regular wiki-pages and pieces of data:
revision_id  revision_comment   user_id  page_id
----------------------------------------------------
2042         created monkey     52       300
2043         added monkey info  203      300
2044         created country    593      301
...
A pages table:
page_id  page_name  page_namespace  top_revision
----------------------------------------------------
300      Monkey     0               2043
301      Germany    402             2044
302      Poland     402             4893
=> an article on Monkeys, two sets of country data
A relations table:
source_page_id  destination_page_id  relation_type
----------------------------------------------------
301             302                  2
=> Germany is a neighbour of Poland
relation_types: 0=parent, 1=brothers, 2=neighbours, 3=aunt ... whatever is useful
A data-longtext table:
page_id  revision_id  name          value
-------------------------------------------------------------------------------
300      2042         article_text  A monkey is an animal...
A data-shorttext table:
page_id  revision_id  name          value
---------------------------------------------------------------
301      2044         country_flag  [[Image:Germany-flag.png]]
A data-numbers table:
page_id  revision_id  name                value
------------------------------------------------------
301      2044         country_population  80000000
301      2040         country_population  75000000
And so on, for the different types.
Now we can structure our data in arbitrary ways and do smart SELECTs:
SELECT page_id, top_revision FROM pages WHERE page_name = 'Germany' AND page_namespace = 402;
=> 301, 2044
SELECT value FROM data_numbers WHERE page_id = 301 AND revision_id = 2044 AND name = 'country_population';
=> 80000000 - the country population
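The fixed-table layout above can be tried out end to end with a small script. The following is a minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for the real database backend; the table is named `data_numbers` with an underscore so that it needs no quoting, and the helper functions `current_number` and `related_pages` are hypothetical names introduced for illustration. It folds the two lookups into a single join on `top_revision` and also follows a row of the relations table.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for the real backend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (
    page_id        INTEGER PRIMARY KEY,
    page_name      TEXT NOT NULL,
    page_namespace INTEGER NOT NULL,
    top_revision   INTEGER NOT NULL
);
CREATE TABLE relations (
    source_page_id      INTEGER NOT NULL,
    destination_page_id INTEGER NOT NULL,
    relation_type       INTEGER NOT NULL
);
CREATE TABLE data_numbers (
    page_id     INTEGER NOT NULL,
    revision_id INTEGER NOT NULL,
    name        TEXT NOT NULL,
    value       NUMERIC NOT NULL
);
""")

# Sample rows copied from the example tables in the text.
conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
    (300, "Monkey", 0, 2043),
    (301, "Germany", 402, 2044),
    (302, "Poland", 402, 4893),
])
conn.execute("INSERT INTO relations VALUES (301, 302, 2)")
conn.executemany("INSERT INTO data_numbers VALUES (?, ?, ?, ?)", [
    (301, 2044, "country_population", 80000000),
    (301, 2040, "country_population", 75000000),
])


def current_number(conn, page_name, namespace, field):
    """Return a numeric field's value at the page's top revision, or None."""
    row = conn.execute(
        """
        SELECT d.value
        FROM pages AS p
        JOIN data_numbers AS d
          ON d.page_id = p.page_id AND d.revision_id = p.top_revision
        WHERE p.page_name = ? AND p.page_namespace = ? AND d.name = ?
        """,
        (page_name, namespace, field),
    ).fetchone()
    return row[0] if row else None


def related_pages(conn, page_name, namespace, relation_type):
    """Return the names of pages that the given page points to."""
    rows = conn.execute(
        """
        SELECT dst.page_name
        FROM pages AS src
        JOIN relations AS r ON r.source_page_id = src.page_id
        JOIN pages AS dst ON dst.page_id = r.destination_page_id
        WHERE src.page_name = ? AND src.page_namespace = ?
          AND r.relation_type = ?
        ORDER BY dst.page_name
        """,
        (page_name, namespace, relation_type),
    ).fetchall()
    return [name for (name,) in rows]


population = current_number(conn, "Germany", 402, "country_population")
# relation_type 2 is the value used in the example row (Germany -> Poland).
neighbours = related_pages(conn, "Germany", 402, 2)
print(population, neighbours)
```

Joining on `top_revision` means older values, such as the revision 2040 population, stay in the table as history without affecting the current lookup.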
Dynamic table creation
We could create a sophisticated data manager application that allows the creation of tables without much technical know-how. It could automatically manage revision storage and revision associations. Advantage: more efficient, constraints at database level. Disadvantage: less flexible, all code has to be aware of which tables exist.
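One way such a data manager might turn a structure definition into a real table is sketched below. It is an assumption-laden illustration, not a design from the proposal: the `Field` class, the `SQL_TYPES` mapping, and `create_table_sql` are hypothetical names, SQLite stands in for the real backend, and only the "required" constraint is mapped to the database level.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical mapping from Wikidata field types to SQL column types.
SQL_TYPES = {"text": "TEXT", "textarea": "TEXT", "number": "NUMERIC"}
IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    required: bool = False


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement holding one row per (page, revision)."""
    for ident in [table] + [f.name for f in fields]:
        # Identifiers cannot be bound as query parameters, so validate them.
        if not IDENTIFIER.match(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    # Revision bookkeeping columns are added automatically to every table.
    columns = ["page_id INTEGER NOT NULL", "revision_id INTEGER NOT NULL"]
    for f in fields:
        parts = [f.name, SQL_TYPES[f.type]]
        if f.required:
            parts.append("NOT NULL")  # constraint enforced by the database
        columns.append(" ".join(parts))
    columns.append("PRIMARY KEY (page_id, revision_id)")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n)"


conn = sqlite3.connect(":memory:")
ddl = create_table_sql("country", [
    Field("country_name", "text", required=True),
    Field("country_population", "number"),
])
conn.execute(ddl)
conn.execute("INSERT INTO country VALUES (301, 2044, 'Germany', 80000000)")

# A missing required field is rejected by the database itself.
try:
    conn.execute("INSERT INTO country VALUES (302, 4893, NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(ddl)
print("NULL name rejected:", rejected)
```

The sketch also shows the disadvantage noted above: any code that reads country data has to know that a `country` table with these columns exists.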
Notes
- Everything should be Wikidata. Liquid Threads comments, wiki pages, movies, everything. Abstract as much as possible.--Eloquence
- data-workflow table to store workflow properties (publication status/date for Wikinews etc.)?--Eloquence
- I think you want something like graph serialization, but with the concept of hyperlinks. That is, pointers to graph nodes specified in other files, or anywhere on the web, or whatever. Like nLSD. -- LionKimbro
- It would be nice to manage bibliographic references as Wikidata. See also these proposals in the German Wikipedia. --Lambo 15:08, 20 Mar 2005 (UTC)
Related projects
- The Semantic MediaWiki extension to MediaWiki extends wiki link syntax to represent two kinds of properties of articles: relations between articles and attribute values of articles. It supports inline query of these properties and export of them as RDF.
- Kendra Initiative is developing a semantic data publishing/querying system called Kendra Base.
- Currently input is via 2 methods: wiki-style free text and also more structured forms input.
- Also reviewed at m:Kendra evaluation
- w:TWiki is a wiki which features form-based input as well as metadata which adds a structure to the entered data.
- jot.com seems to be doing something similar according to Jimbo, who has seen beta screenshots
- w:Wikipedia:Proposal for intuitive table editor and namespace
- I have yet to see software that does this, but I have always thought that Wikipedia would be an excellent project for an object-oriented database (OODB). Instead of only allowing certain data, have a generic article object, then classify each article as a person or a place. These would be subclasses of the article object and would have fixed fields (start and end of the believed birthdate range, bio, believed birthplace latitude/longitude, etc.). This information could be persisted across all articles, making the different languages simply different chunks of text on an object with a unique ID. The reference potential would change drastically, as you could refer definitively to a person or place regardless of language. It would also allow POVs to be added, as each could just be another block of text associated with the unique ID (I like to call these lenses). Finally, you could have the object inherit from actor, with even more specific fields, or, with multiple inheritance, from both actor and director. I have looked at OODB software and found that the commercial products allow multiple inheritance, though I would assume the performance would be terrible.
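The object model described in the last comment can be sketched with ordinary Python classes. This only illustrates the class hierarchy, not an object database: the class and field names below are hypothetical, and persistence is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Article:
    """Generic object: one unique ID with per-language text attached."""
    uid: int
    texts: dict[str, str] = field(default_factory=dict)  # language -> text

    def text_in(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the text for lang, or the fallback language if missing."""
        return self.texts.get(lang, self.texts.get(fallback))


@dataclass
class Person(Article):
    """Subclass with fixed fields shared by every language edition."""
    birth_earliest: Optional[str] = None
    birth_latest: Optional[str] = None
    birthplace_lat: Optional[float] = None
    birthplace_lon: Optional[float] = None


@dataclass
class Actor(Person):
    roles: list[str] = field(default_factory=list)


@dataclass
class Director(Person):
    films_directed: list[str] = field(default_factory=list)


@dataclass
class ActorDirector(Actor, Director):
    """Multiple inheritance: carries the fields of both Actor and Director."""


# Placeholder values for illustration only.
p = ActorDirector(
    uid=1,
    texts={"en": "An example person.", "de": "Eine Beispielperson."},
    roles=["Example Role"],
    films_directed=["Example Film"],
)
print(p.text_in("de"), "|", p.text_in("fr"))
```

The language-independent fields live once on the object, while each language contributes only its own block of text under the same `uid`.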
See also
- SpecialData.php
- Erik Moeller's introduction of the idea
- Magnus Manske's first thread about Wikidata system
- Later thread begun by Magnus discussing issues with pushing and pulling the data.
- Flexible Fields for MediaWiki, a proposal for adding free-form/untyped key-value data to wikitext articles
- The Semantic Web initiative, especially RDF, could be really beneficial on the data integration/export level (dev(at)xam.de)
- Describing wikidata structures