Amsterdam Hackathon 2014/Topics/Artworks import from Wikimedia Commons
The goal is to import artworks into Wikidata from Wikimedia Commons files that use the Artwork template.
Wikimedia Commons extraction
An extraction of Wikimedia Commons files using the Artwork template was done by a Wikimedian in September 2014.
The file: http://zone47.com/div/artworks0.zip
Result: 190,000 files with 105 different properties
Table of the 105 properties with occurrences: http://framacalc.org/1u7az8lted
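A table of properties with occurrences like the one above could be produced along these lines. This is only a sketch: the record layout (one dictionary per file), the sample values, and the case-insensitive handling of property names are assumptions, not a description of how the extraction was actually done.

```python
from collections import Counter


def count_properties(records):
    """Count how many records carry a non-empty value for each property."""
    counts = Counter()
    for record in records:
        for name, value in record.items():
            if value and value.strip():
                # Assumption: property names are compared case-insensitively.
                counts[name.strip().lower()] += 1
    return counts


# Hypothetical sample records, not taken from the actual extraction.
sample = [
    {"artist": "Rembrandt", "title": "The Night Watch", "date": "1642"},
    {"Artist": "Vermeer", "title": "", "institution": "Mauritshuis"},
]
property_counts = count_properties(sample)
```

Sorting `property_counts.most_common()` would give the properties in order of frequency, which helps decide which ones are worth keeping.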
Preparation for Wikidata
For a Wikidata import:
- remove useless properties (detail, review, information about the file, ...)
- merge redundant properties (artist, creator, ...)
Result: new version with 27 properties.
The file: http://zone47.com/div/artworks.zip
Table of the 27 properties: http://framacalc.org/8pewk3wre2
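The two preparation steps (removing useless properties and merging redundant ones) can be sketched as follows. The `DROP` and `MERGE` tables are hypothetical and only contain the examples named in this page; the real lists would come from reviewing all 105 properties.

```python
# Hypothetical mappings, limited to the examples named in this page.
DROP = {"detail", "review"}
MERGE = {"creator": "artist"}


def prepare_record(record):
    """Drop useless properties and merge redundant ones into a single name."""
    cleaned = {}
    for name, value in record.items():
        key = name.strip().lower()
        if key in DROP or not value.strip():
            continue
        key = MERGE.get(key, key)
        # When several source properties map to the same target,
        # keep the first non-empty value encountered.
        cleaned.setdefault(key, value.strip())
    return cleaned


prepared = prepare_record(
    {"Creator": " Rembrandt ", "detail": "crop", "title": "The Night Watch", "artist": ""}
)
```

Keeping the first non-empty value is one possible merge policy; conflicting values could instead be collected and reviewed by hand.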
Issues
- Field values are heterogeneous
- Some artworks already exist in Wikidata, and others appear two or more times in the table.
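Artworks that appear more than once in the table could be detected by grouping rows on a normalized key. The sketch below assumes that artist, title, and institution together identify an artwork, which is a guess; an inventory number would be a better key where available. It does not check whether an artwork already exists in Wikidata.

```python
from collections import defaultdict


def find_duplicates(records, key_fields=("artist", "title", "institution")):
    """Return {key: [row indexes]} for keys shared by two or more rows."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record.get(field, "").strip().lower() for field in key_fields)
        if all(key):  # skip rows where any key field is missing or empty
            groups[key].append(index)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hypothetical rows for illustration only.
rows = [
    {"artist": "Rembrandt", "title": "The Night Watch", "institution": "Rijksmuseum"},
    {"artist": "rembrandt ", "title": "The Night Watch", "institution": "Rijksmuseum"},
    {"artist": "Vermeer", "title": "The Milkmaid", "institution": "Rijksmuseum"},
    {"artist": "", "title": "Untitled", "institution": "Rijksmuseum"},
]
duplicates = find_duplicates(rows)
```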
Options
At this point there are many options. Some proposals:
- Global processing on fields (example on date)
- Division in lots (by institution?)
- Apply the first option (global processing) to some properties, then the second (division in lots).
- Extract files with well formed metadata first
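Several of these proposals can be combined in one pass: parse the date field globally, keep only rows whose metadata is well formed, and group those rows into lots by institution. The following is a minimal sketch under stated assumptions: only plain four-digit years and `YYYY-MM-DD` dates count as well formed, and the field names `date` and `institution` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: only these two date shapes are treated as well formed.
YEAR_ONLY = re.compile(r"^\s*(\d{4})\s*$")
ISO_DATE = re.compile(r"^\s*(\d{4})-(\d{2})-(\d{2})\s*$")


def parse_year(raw):
    """Return the year as an int for well-formed dates, otherwise None."""
    match = ISO_DATE.match(raw) or YEAR_ONLY.match(raw)
    return int(match.group(1)) if match else None


def split_into_lots(records):
    """Group well-formed rows by institution; return (lots, leftovers)."""
    lots = defaultdict(list)
    leftovers = []
    for record in records:
        year = parse_year(record.get("date", ""))
        institution = record.get("institution", "").strip()
        if year is not None and institution:
            lots[institution].append({**record, "year": year})
        else:
            leftovers.append(record)  # needs manual review or a smarter parser
    return dict(lots), leftovers


# Hypothetical rows for illustration only.
lots, leftovers = split_into_lots([
    {"date": "1642", "institution": "Rijksmuseum"},
    {"date": "circa 1665", "institution": "Mauritshuis"},
    {"date": "1889-06-01", "institution": ""},
])
```

Rows that end up in `leftovers` are not lost; they can be handled later, once patterns such as "circa 1665" have their own parsing rules.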