Reading/Multimedia/Structured Data/Developer Diary

This is a collection of writings from the Multimedia team about their work, challenges, and successes on the Structured Data on Commons (SDC) project.

Q3-Q4 2017-2018

During Q3 and Q4, the Multimedia team focused on two main areas of work: search indexing for SDC and file page integration.

Search indexing for SDC

Cormac added two SDC concepts, multilingual captions and Wikibase statements, to the search index entries for the 'parent' File pages of MediaInfo items, so that media files can be found by their MediaInfo content.
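To make the idea concrete, here is a minimal sketch of flattening a MediaInfo entity into fields a search document could hold. It assumes a simplified entity shape loosely modeled on Wikibase JSON (captions under "labels", statements under "statements"); the output field names `caption_text` and `statement_keywords` are hypothetical, chosen for illustration, and this is not the actual CirrusSearch integration code. P180 ("depicts") and Q144 ("dog") are the Wikidata IDs.

```python
from typing import Any


def build_search_fields(entity: dict[str, Any]) -> dict[str, list[str]]:
    """Flatten a simplified MediaInfo entity into hypothetical search fields."""
    # Captions: one string per language, to be indexed as full text.
    captions = [label["value"] for label in entity.get("labels", {}).values()]

    # Statements: item-valued main snaks become exact-match "P=Q" keywords.
    keywords = []
    for prop, statements in entity.get("statements", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # "somevalue"/"novalue" snaks carry no item to index
            value = snak["datavalue"]["value"]
            if isinstance(value, dict) and "id" in value:
                keywords.append(f"{prop}={value['id']}")

    return {"caption_text": captions, "statement_keywords": keywords}


# Simplified entity: two captions and one "depicts (P180) -> dog (Q144)" statement.
example = {
    "labels": {
        "en": {"language": "en", "value": "A dog on a beach"},
        "de": {"language": "de", "value": "Ein Hund am Strand"},
    },
    "statements": {
        "P180": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P180",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q144"},
                    },
                }
            }
        ]
    },
}

print(build_search_fields(example))
```

Item-valued statements map naturally onto exact-match keywords, which is why they were a good first target; qualifiers are where this simple scheme runs out.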

Implementing search on qualifiers is currently under way and is considerably more complex, because it isn't clear how to represent non-item qualifiers (e.g. text or numeric fields) usefully in a document search engine such as Elasticsearch. For example, an image whose metadata includes "depicts -> dog, quantity -> 3" could be indexed so that a search for images containing exactly three dogs would work, but a search for images containing more than two dogs is not so simple. Cormac is working with the Search team on possible solutions, which will probably involve Blazegraph and the Wikidata Query Service, or something like it.
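A toy in-memory model can illustrate the problem. The sketch below assumes a hypothetical token format that glues a qualifier onto its statement (`P180=Q144|P1114=3`, where P1114 is Wikidata's "quantity" property) and stores tokens in a plain inverted index. It is not how Elasticsearch evaluates queries; it only shows why such an exact-match encoding answers "exactly three dogs" with one lookup but has to fall back to scanning and parsing every token for "more than two dogs".

```python
from collections import defaultdict

# Hypothetical token format: "<property>=<item>|<qualifier>=<value>".
docs = {
    "File:Three_dogs.jpg": ["P180=Q144", "P180=Q144|P1114=3"],
    "File:Two_dogs.jpg": ["P180=Q144", "P180=Q144|P1114=2"],
    "File:Five_dogs.jpg": ["P180=Q144", "P180=Q144|P1114=5"],
}

# Inverted index: token -> set of file titles (like a keyword field).
index: dict[str, set[str]] = defaultdict(set)
for title, tokens in docs.items():
    for token in tokens:
        index[token].add(title)


def exact(token: str) -> set[str]:
    """Exact-match query: a single dictionary lookup."""
    return set(index.get(token, set()))


def more_than(prop: str, item: str, qualifier: str, threshold: float) -> set[str]:
    """Range query: the token index cannot answer this directly, so every
    token is scanned and its numeric value parsed back out of the string."""
    prefix = f"{prop}={item}|{qualifier}="
    hits: set[str] = set()
    for token, titles in index.items():
        if token.startswith(prefix) and float(token[len(prefix):]) > threshold:
            hits |= titles
    return hits


print(sorted(exact("P180=Q144|P1114=3")))
print(sorted(more_than("P180", "Q144", "P1114", 2)))
```

A numeric field would support range queries, but it then has to stay tied to the specific statement it qualifies, which is part of why a graph store such as Blazegraph is being considered.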

File page integration

This work, primarily undertaken by Mark, was largely done in the Wikibase and WikibaseMediaInfo code bases.

Mark began by familiarizing himself with the existing systems. Because much of the system had already been defined, this took some time. Then, building on the API work done in Q1 and Q2, he added a hook to the file page that retrieves the associated MediaInfo entity, using the existing mechanism for mapping a file page to its MediaInfo entity.
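For readers unfamiliar with that mapping: a MediaInfo entity ID is derived from the page ID of its file page, written as "M" followed by the page ID. The sketch below assumes that convention and shows the conversion in both directions. The real hook is PHP code in WikibaseMediaInfo; these Python function names are hypothetical and only illustrate the relationship.

```python
import re

# Assumed convention: MediaInfo ID = "M" + page ID of the file page.
MEDIAINFO_ID = re.compile(r"M([1-9][0-9]*)")


def mediainfo_id_for_page(page_id: int) -> str:
    """Return the MediaInfo entity ID for a file page's page ID."""
    if page_id <= 0:
        raise ValueError(f"invalid page ID: {page_id}")
    return f"M{page_id}"


def page_id_for_mediainfo(entity_id: str) -> int:
    """Return the file page ID encoded in a MediaInfo entity ID."""
    match = MEDIAINFO_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a MediaInfo ID: {entity_id!r}")
    return int(match.group(1))


print(mediainfo_id_for_page(12345))
print(page_id_for_mediainfo("M12345"))
```

Because the ID is computed from the page rather than stored separately, the hook needs no extra lookup table to find the entity for a given file page.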

First attempt

The first attempt was to simply fetch the JSON representing the entity and render it in a helpful way. This approach was complex and would have significantly hindered future work, because it would have split the code paths for rendering MediaInfo objects in two. Especially with the Multi-Content Revisions (MCR) work nearing completion, it made no sense to continue down that path, so it was abandoned.

Second attempt

Next, Mark tried using the ParserOutput obtained from various MediaInfo entity objects to dump the rendered page directly onto the file page. This approach also worked, but had significant shortcomings: the ParserOutput lacked the proper context, and it would have been yet another split in the code paths for very little benefit.

Final version

Finally, the file page prototype was completed by temporarily copying much of Wikibase's rendering code into the file page hook to render the MediaInfo entity. This solution also meant bypassing the usual TermsList placeholders by using a SimpleTermsListView object instead of the usual PlaceholderEmitting version. The same approach should be used to render an entity's TermsList on other non-Wikibase pages.

Q1-Q2 2017-2018

API work

During this time, Mark took on the task of learning the existing MediaInfo code and changing how API requests, especially wbgetentities requests, were handled for file page titles. The work proceeded slowly at first because Mark found it difficult to know which questions to ask the WMDE (Wikimedia Deutschland) developers, but ultimately the API changes were completed and merged, and wbgetentities now does the right thing when asked for the entity related to a file page.
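As an illustration of what "asked for the entity related to a file page" looks like from a client's side, the sketch below builds a wbgetentities request that identifies the entity by site and page title rather than by entity ID. It assumes the public Commons endpoint and the `commonswiki` site ID as they exist today, which may differ from the test setup used during this work; the helper name is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public Wikimedia Commons action API.
API_URL = "https://commons.wikimedia.org/w/api.php"


def wbgetentities_url(file_title: str, site_id: str = "commonswiki") -> str:
    """Build a wbgetentities URL that looks up an entity by site + page title."""
    params = {
        "action": "wbgetentities",
        "sites": site_id,       # site the title belongs to
        "titles": file_title,   # the file page title, e.g. "File:Example.jpg"
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


print(wbgetentities_url("File:Example.jpg"))
```

Sending that URL with any HTTP client should return the MediaInfo entity (with its "M" ID) for the given file page, provided the file exists.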