
Extension:CommonsMetadata

From mediawiki.org
This page is a translated version of the page Extension:CommonsMetadata and the translation is 5% complete.
MediaWiki extensions manual
CommonsMetadata
Release status: stable
Implementation: API
Description: Attempts to extract metadata from Commons pages
Author(s): Brian Wolff (bawolff talk)
Compatibility policy: Snapshots released along with MediaWiki. Master is not backward compatible.
MediaWiki: 1.25+
PHP: 5.4+
Database changes: No
License: GNU General Public License 2.0 or later
Download
Parameters:
  • $wgCommonsMetadataForceRecalculate
  • $wgCommonsMetadataPublicDomainPageUrl
  • $wgCommonsMetadataSetTrackingCategories
Quarterly downloads: 63 (Ranked 76th)
Public wikis using: 1,055 (Ranked 247th)
Translate the CommonsMetadata extension if it is available at translatewiki.net
Issues: Open tasks · Report a bug

The CommonsMetadata extension is an attempt to extract metadata from Wikimedia Commons pages, but it is also available on all other Wikimedia projects. It adds extra information to the imageinfo API, based on the templates and categories in the image description. It is used by several extensions and tools (such as Extension:MultimediaViewer, Extension:VisualEditor, Extension:MobileFrontend and Mobile-Content-Service (MCS)) to provide better lightboxes and image selection dialogs.

The extension in its current form is intended as a temporary solution that will eventually be replaced by Wikidata on Commons.

Installation


  • Download and place the file(s) in a directory called CommonsMetadata in your extensions/ folder.
    Developers and code contributors should instead install the extension from Git, using:
    cd extensions/
    git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CommonsMetadata
  • Add the following code at the bottom of your LocalSettings.php file:
    wfLoadExtension( 'CommonsMetadata' );
  • Done: Navigate to Special:Version on your wiki to verify that the extension was successfully installed.

Motivation & design choices

This extension is built on the following assumptions.

  • At some point in the future, Wikidata will take over handling metadata on Commons. To avoid disruptive changes that would soon need to be changed again, the extension should work with Commons metadata as it currently exists (so no new parser functions are introduced). Hence, screen scraping.
  • The content of many fields on a Commons description page includes rich formatting (in particular links, italics and bold; in some cases, more complex things such as embedded images).
    • As a result, the extension outputs parsed HTML (wikitext is hard for consumers to work with, and plain text doesn't capture the data).
    • Furthermore, the data tends to be formatted for human display rather than in machine-readable form (for example, dates). When the date field says something like "circa 1600s", it's hard to convert that to a precise date (and there are many similar examples).
    • To stay consistent with this, wiki-controlled formatting is also applied to EXIF metadata (for example, Commons links the camera name to a Wikipedia article).
  • If we can't extract information from the description page, but the file has the author tagged in its EXIF/XMP/IPTC metadata, we should use that as a fallback.
  • Ideally, such a system would be as Commons-specific as possible, with the Commons and non-Commons parts separated.
  • Commons description pages have multilingual descriptions. Many users probably want only one language.
    • This implementation applies per-language conventions to dates and similar values. Additionally, for explicitly multilingual fields (description), there is an option to return either all languages or just one. Even in multilingual mode, some things are still language-specific (like the thousands separator in numbers).

Configuration

  • $wgCommonsMetadataSetTrackingCategories (default: false) - Adds the following tracking categories to file pages when the corresponding information is provided neither via templates on the file page nor (for some of these) via EXIF metadata:
    • Files with no machine-readable license (commonsmetadata-trackingcategory-no-license)
    • Files with no machine-readable description (commonsmetadata-trackingcategory-no-description)
    • Files with no machine-readable author (commonsmetadata-trackingcategory-no-author)
    • Files with no machine-readable source (commonsmetadata-trackingcategory-no-source)
    • Files with no machine-readable patent (commonsmetadata-trackingcategory-no-patent) (for 3D files)
  • $wgCommonsMetadataPublicDomainPageUrl (default: https://commons.wikimedia.org/wiki/Help:Public_domain) - Link used for the 'license' attribute in schema.org markup for files in the public domain.
  • $wgCommonsMetadataForceRecalculate (default: false) - Forces calculation of metadata even when the image comes from a foreign repository that would provide it. This is meant for local development.
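The following LocalSettings.php excerpt is a minimal sketch of how these parameters might be set; the values are illustrative only (the URL is simply the documented default, repeated to show the syntax).

```php
<?php
// LocalSettings.php (excerpt). Place after wfLoadExtension( 'CommonsMetadata' );
// Illustrative values, not recommendations.

// Add the "Files with no machine-readable ..." tracking categories to file pages.
$wgCommonsMetadataSetTrackingCategories = true;

// Page linked from the schema.org 'license' attribute for public domain files
// (this is the documented default value).
$wgCommonsMetadataPublicDomainPageUrl = 'https://commons.wikimedia.org/wiki/Help:Public_domain';
```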

Testing

Warning: If you're developing or testing this extension, we do NOT recommend copying the Commons templates for image metadata, as they take an extremely long time to render and have complicated dependencies such as Scribunto. Instead, get an expanded version containing only wikitext/HTML and manually put in the various parameter references (or don't). You can find an example (to be used with "Special:Import") here. Alternatively, use Vagrant, which includes certain templates by default.

When testing with remote images (e.g. Commons images, if you have enabled $wgUseInstantCommons), you can set $wgCommonsMetadataForceRecalculate = true; to force CommonsMetadata to parse the image's description page and extract the metadata itself. Normally, if the remote repository also has CommonsMetadata installed, the extension simply copies the API output from there.
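A minimal LocalSettings.php excerpt for that setup, assuming a local development wiki that uses Commons as a foreign file repository through InstantCommons:

```php
<?php
// LocalSettings.php (excerpt), for local development only.

// Use Wikimedia Commons as a foreign file repository.
$wgUseInstantCommons = true;

// Parse the remote description page locally instead of copying
// the extmetadata output from the remote repository's API.
$wgCommonsMetadataForceRecalculate = true;
```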

Usage

Use the imageinfo API and include extmetadata among the properties requested via the iiprop parameter.

Example

https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg

View this example in the API sandbox:

https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg
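The same request can be built programmatically. The PHP sketch below (written for PHP 7.1+; the helper name buildExtmetadataUrl is hypothetical) assembles the query shown above, minus iilimit. The optional iiextmetadatalanguage parameter belongs to the core imageinfo module and selects the language of the returned values.

```php
<?php
/**
 * Build an imageinfo API URL that requests extmetadata for one file.
 * Hypothetical helper for illustration; not part of CommonsMetadata.
 */
function buildExtmetadataUrl( string $apiBase, string $fileTitle, ?string $language = null ): string {
    $params = [
        'action' => 'query',
        'prop' => 'imageinfo',
        'format' => 'json',
        'iiprop' => 'extmetadata',
        'titles' => $fileTitle,
    ];
    if ( $language !== null ) {
        // Core imageinfo parameter: language to fetch extmetadata in.
        $params['iiextmetadatalanguage'] = $language;
    }
    // PHP_QUERY_RFC3986 encodes spaces as %20 and ':' as %3A,
    // matching the example URL.
    return $apiBase . '?' . http_build_query( $params, '', '&', PHP_QUERY_RFC3986 );
}

// Usage: print the request URL for the kingfisher example.
echo buildExtmetadataUrl(
    'https://commons.wikimedia.org/w/api.php',
    'File:Common Kingfisher Alcedo atthis.jpg'
), "\n";
```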

Returned data

The extension currently provides the following items in the extmetadata field of the response (the field names were chosen, where possible, to follow the IPTC-IIM format used in EXIF headers):

  • ImageDescription - image description
  • Artist - author name (might contain complex HTML, multiple authors, etc.)
  • Credit - source
  • DateTimeOriginal - time of creation (space-separated ISO 8601 timestamp whenever possible, but can be any other textual description of a date, possibly with HTML mixed in)
  • ObjectName - title (for a book/painting; otherwise just the file name)
  • Permission - contents of the Permission field of the template. It can be a lot of things (license template, OTRS ID, details on how to attribute...)
  • AuthorCount - the number of templates with authors (e.g., Book, Photograph...). The number of actual authors might be higher if a template describes multiple authors in a single string.
  • GPSLatitude - latitude
  • GPSLongitude - longitude
  • GPSMapDatum - coordinate type (only WGS-84 supported for now)
  • LicenseShortName - short human-readable license name
  • LicenseUrl
  • UsageTerms
  • Copyrighted - True or False (False for public domain images)

For multi-licensed images, these values are currently unreliable.

  • Attribution - custom attribution that should replace Artist + Credit (can also originate from the Information template)
  • AttributionRequired - booleanish (phab:T86726), tells whether there is a legal requirement to attribute
  • NonFree - booleanish, true means the image is not under a free license. (Used for non-Commons images only.)
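Because these booleanish fields are not guaranteed to be real JSON booleans, a client has to interpret them. A minimal PHP sketch, assuming the field is read from the decoded extmetadata object and that "true"/"false" style spellings (case-insensitive) are what appear in practice; that set of spellings and the helper name extmetadataBool are assumptions, not taken from the extension.

```php
<?php
/**
 * Interpret a booleanish extmetadata field (e.g. AttributionRequired,
 * NonFree, Copyrighted). Hypothetical helper; the accepted spellings
 * below are an assumption. Returns null if missing or unrecognized.
 */
function extmetadataBool( array $extmetadata, string $key ): ?bool {
    if ( !isset( $extmetadata[$key]['value'] ) ) {
        return null;
    }
    $raw = $extmetadata[$key]['value'];
    if ( is_bool( $raw ) ) {
        // Already a JSON boolean after json_decode().
        return $raw;
    }
    $value = strtolower( trim( (string)$raw ) );
    if ( in_array( $value, [ 'true', '1', 'yes' ], true ) ) {
        return true;
    }
    if ( in_array( $value, [ 'false', '0', 'no' ], true ) ) {
        return false;
    }
    return null;
}
```

Returning null rather than false for unknown values lets the caller decide, for example, to show attribution whenever the requirement is not explicitly false.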

Other data:

  • CommonsMetadataExtension - contains the metadata parser version number; mostly for internal use
  • License - a best guess at the license of the image (mostly for internal use by MediaViewer, might change; LicenseShortName is probably more reliable)
  • Categories - a |-separated list of the categories of the image.

This is based on parsing category names, so it probably won't work for images not hosted on Commons.

  • Restrictions - reuse restrictions such as trademarks or personality rights; an array of keywords (the class names from this table, without the restriction- prefix). See also the restrict-* icons in MediaViewer.
  • DeletionReason - if set, the file is being considered for deletion.

(Based on the nuke template, so probably not reliable outside Commons.) It contains a deletion reason, but the reason is phrased as a log entry, so it might be misleading (e.g. it may use the past tense even though it has not yet been decided whether the image will be deleted).
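To consume these fields, a client decodes the JSON response and reads the value key of each extmetadata entry. The sketch below assumes the default format=json output (formatversion=1, where query.pages is keyed by page ID); the helper names extractExtmetadata and splitCategories and the sample response are hypothetical, and Categories is split on | as described above.

```php
<?php
/**
 * Collect the 'value' of each extmetadata field from a decoded imageinfo
 * response. Hypothetical helper for illustration.
 *
 * @return array title => [ field name => value ]
 */
function extractExtmetadata( array $response ): array {
    $result = [];
    foreach ( $response['query']['pages'] ?? [] as $page ) {
        // imageinfo[0] describes the latest file revision.
        $fields = $page['imageinfo'][0]['extmetadata'] ?? [];
        $values = [];
        foreach ( $fields as $name => $field ) {
            // Each field is an object such as {"value": ..., "source": ...}.
            $values[$name] = $field['value'] ?? null;
        }
        $result[$page['title']] = $values;
    }
    return $result;
}

/** Split the |-separated Categories field into a list of names. */
function splitCategories( ?string $categories ): array {
    if ( $categories === null || $categories === '' ) {
        return [];
    }
    return explode( '|', $categories );
}

// Usage with a hypothetical, heavily trimmed response.
$json = '{"query":{"pages":{"1":{"title":"File:Example.jpg","imageinfo":[{"extmetadata":{'
    . '"Artist":{"value":"Someone"},'
    . '"Categories":{"value":"Birds|Kingfishers"}}}]}}}}';
$meta = extractExtmetadata( json_decode( $json, true ) );
print_r( splitCategories( $meta['File:Example.jpg']['Categories'] ) );
```

Many of these values (Artist, Credit, ImageDescription and others) are HTML, so escape or sanitize them before displaying them.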

See also