Topic on Extension talk:MultimediaViewer/Flow

File description as caption

Summary by Till Kraemer

Fetching file descriptions from a pool wiki can be a problem if you run php_fpm in a chroot jail. Make sure DNS works inside the chroot.

Till Kraemer (talkcontribs)

Thanks for this great extension! However, I'd like to change two things:

- Right now, the "more details" button leads to the file description page on pool.domain.com (my file repository) instead of bringing me to the local file description page on en.domain.com.

- I'd like to use the file description of the local file description page as a caption in the lightbox, no matter what. Right now, it uses the caption of the thumbnail or (if the caption is missing) the file name. I really want the output of { { File:Name.jpg } } as a caption, but I don't know which files of the extension I need to alter.

Tgr (WMF) (talkcontribs)

The first is phab:T66491; you'd have to poke at includes/api/ApiQueryImageInfo.php. For the second, see setTitle in resources/mmv/ui/mmv.ui.metadataPanel.js (also the setter in resources/mmv/ui/mmv.ui.description.js for consistency).

Till Kraemer (talkcontribs)

@Tgr (WMF), thanks for your help! I already played around with image.caption in resources/mmv/ui/mmv.ui.metadataPanel.js, but somehow imageData.description doesn't seem to get through. When I remove image.caption and image.filePageTitle.getNameText() no caption shows up. The code on the page looks like this: "<p class="mw-mmv-title-para mw-mmv-ttf-container empty"><span class="mw-mmv-title" original-title=""></span><span class="mw-mmv-ttf-ellipsis" style="display: none;" original-title="">…</span></p>".

I don't really get it, because I'm using Extension:CommonsMetadata and the templates here and my code of the file description page looks like this: "<td id="fileinfotpl_desc" class="fileinfo-paramfield">Description<span class="summary fn" style="display:none">Till Kraemer.jpg</span></td><td class="description"><div class="description mw-content-ltr en" dir="ltr" lang="en" style=""><span class="language en">English: Till Kraemer, 2007</span></div><div class="description mw-content-ltr de" dir="ltr" lang="de" style=""><span class="language de">Deutsch: Till Kraemer, 2007</span></div></td>", so pretty much like the code here. However, I don't have usePageDescriptions in the <script></script> section. Could that be a problem? And do you have any idea how to fix this?

Tgr (WMF) (talkcontribs)

You should debug CommonsMetadata and MediaViewer separately. If the description page is right and the extension is working, api.php?action=query&titles=<title>&prop=imageinfo&iiprop=extmetadata&format=jsonfm&iiextmetadatafilter=Description should show you the description. If that does not work, the information never reaches the JS code.

Till Kraemer (talkcontribs)

Thanks for that hint! The output of pool.domain.com for File:Till Kraemer.jpg looks like this:

{
    "warnings": {
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query-continue": {
        "imageinfo": {
            "iistart": "2008-08-24T19:25:38Z"
        }
    },
    "query": {
        "normalized": [
            {
                "from": "File:Till_Kraemer.jpg",
                "to": "File:Till Kraemer.jpg"
            }
        ],
        "pages": {
            "2": {
                "pageid": 2,
                "ns": 6,
                "title": "File:Till Kraemer.jpg",
                "imagerepository": "local",
                "imageinfo": [
                    {
                        "extmetadata": []
                    }
                ]
            }
        }
    }
}

...and the output for en.domain.com looks like this:

{
    "warnings": {
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query-continue": {
        "imageinfo": {
            "iistart": "2008-08-24T19:25:38Z"
        }
    },
    "query": {
        "normalized": [
            {
                "from": "File:Till_Kraemer.jpg",
                "to": "File:Till Kraemer.jpg"
            }
        ],
        "pages": {
            "79": {
                "pageid": 79,
                "ns": 6,
                "title": "File:Till Kraemer.jpg",
                "imagerepository": "shared",
                "imageinfo": [
                    {
                        "extmetadata": []
                    }
                ]
            }
        }
    }
}

The description on both file description pages is "Till Kraemer, 2007", so it seems like there is some problem.

Somehow, all of a sudden, the description appears in the lightbox of pool.domain.com, but still not on en.domain.com :( I also can't retrieve file descriptions from pool.domain.com on en.domain.com when I deactivate the Multimedia Viewer.

Till Kraemer (talkcontribs)

@Tgr (WMF), to fix my first problem, I checked "includes/api/ApiQueryImageInfo.php" as you suggested. I changed "$vals['descriptionurl'] = wfExpandUrl( $file->getDescriptionUrl(), PROTO_CURRENT );" to "$vals['descriptionurl'] = 'http://en.domain.com/wiki/File:' . ($file->getName());". Now, the "more details" button leads to the local file description page. Yay! :) Thanks!

Till Kraemer (talkcontribs)

If I use "&iiextmetadatafilter=ImageDescription" instead of "&iiextmetadatafilter=Description", I get the following on pool.domain.com:

{
    "warnings": {
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query-continue": {
        "imageinfo": {
            "iistart": "2008-08-24T19:25:38Z"
        }
    },
    "query": {
        "normalized": [
            {
                "from": "File:Till_Kraemer.jpg",
                "to": "File:Till Kraemer.jpg"
            }
        ],
        "pages": {
            "2": {
                "pageid": 2,
                "ns": 6,
                "title": "File:Till Kraemer.jpg",
                "imagerepository": "local",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "ImageDescription": {
                                "value": "Till Kraemer, 2007",
                                "source": "commons-desc-page"
                            }
                        }
                    }
                ]
            }
        }
    }
}

...but en.domain.com still looks like this:

{
    "warnings": {
        "query": {
            "*": "Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query."
        }
    },
    "query-continue": {
        "imageinfo": {
            "iistart": "2008-08-24T19:25:38Z"
        }
    },
    "query": {
        "normalized": [
            {
                "from": "File:Till_Kraemer.jpg",
                "to": "File:Till Kraemer.jpg"
            }
        ],
        "pages": {
            "79": {
                "pageid": 79,
                "ns": 6,
                "title": "File:Till Kraemer.jpg",
                "imagerepository": "poolwiki",
                "imageinfo": [
                    {
                        "extmetadata": []
                    }
                ]
            }
        }
    }
}

Tgr (WMF) (talkcontribs)

Yeah, sorry, it should have been ImageDescription. Are you using the same CommonsMetadata version on both sites? Theoretically, en should just reuse the data it gets from pool without any processing. Or is pool a ForeignDBRepo?
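
The API check discussed above can also be scripted. The following is only a sketch, assuming PHP 7.1 or later; the function name findExtMetadataValue is made up for illustration and is not part of MediaWiki or CommonsMetadata. It looks for one extmetadata field in a decoded imageinfo response and treats the empty "extmetadata": [] case from this thread as "not found".

```php
<?php
// Sketch: pull one extmetadata field out of a decoded imageinfo API response.
// Returns null when the field is absent, including the empty
// "extmetadata": [] case shown earlier in this thread.
// findExtMetadataValue is a hypothetical helper name, not a MediaWiki function.
function findExtMetadataValue( array $response, string $field ): ?string {
    foreach ( $response['query']['pages'] ?? [] as $page ) {
        foreach ( $page['imageinfo'] ?? [] as $info ) {
            $meta = $info['extmetadata'] ?? [];
            if ( isset( $meta[$field]['value'] ) ) {
                return (string)$meta[$field]['value'];
            }
        }
    }
    return null;
}

// Sample input shaped like the pool.domain.com output above (trimmed).
$json = '{"query":{"pages":{"2":{"title":"File:Till Kraemer.jpg","imageinfo":'
    . '[{"extmetadata":{"ImageDescription":{"value":"Till Kraemer, 2007",'
    . '"source":"commons-desc-page"}}}]}}}}';

$value = findExtMetadataValue( json_decode( $json, true ), 'ImageDescription' );
echo $value === null ? "no description\n" : "description: $value\n";
```

Running the same function against the response of each wiki makes it easy to see on which side the description gets lost.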

Till Kraemer (talkcontribs)

@Tgr (WMF), yes, according to the API query result, all sites use CommonsMetadata 1.2.

The pool uses a different database (poolwiki) but the same database user and the LocalSettings.php of the language wikis look like this:

$wgUseSharedUploads = true;
$wgSharedUploadPath = 'https://pool.domain.com/w/images';
$wgSharedUploadDirectory = '/path/to/pool/w/images/';
$wgHashedSharedUploadDirectory = true;
$wgFetchCommonsDescriptions = true;
$wgSharedUploadDBname = 'poolwiki';
$wgRepositoryBaseUrl = "https://pool.domain.com/wiki/Image:";

Thanks and cheers!

Tgr (WMF) (talkcontribs)

What version of MediaWiki and CommonsMetadata are you using? Do you see the poolwiki file description page when you visit the page with the same name on en?

If you want to take a shot at debugging, the problem is probably within DataCollector::getDescriptionText().

Till Kraemer (talkcontribs)

@Tgr (WMF), now I'm using MediaWiki 1.26.2, but I also had these problems with older versions. According to the API query result, I use CommonsMetadata 1.2. On Special:Version it shows just a dash ("–") in the version column. The contents of the version file in the extension directory are:

CommonsMetadata: REL1_26
2015-11-17T01:03:37
160f837

I don't see any text from the original file description page on en :( My debugging settings in LocalSettings.php look like this:

$wgDebugLogFile = "/path/to/mediawiki/debug-{$wgDBname}.log";

I have no DataCollector entries in my log file :( Should I use structured logging for this? Thanks and cheers!

Tgr (WMF) (talkcontribs)

That or something like xdebug. Although if you don't see description pages at all, this is probably not a CommonsMetadata-specific issue.

Do you have fetchDescription enabled in your foreign repo configuration?

Till Kraemer (talkcontribs)

@Tgr (WMF), thanks, fetchDescription is enabled in all language wikis (and I don't have to do that for the pool, right?). I installed MediaWiki 1.27.0 and the problem is still the same. I don't really get it. Shouldn't all local file description pages appear as captions even without fetchDescription enabled? If I understand it correctly, I shouldn't even need the CommonsMetadata extension for this to work. All I want from the pool is the file (which works). Now I just need the MultimediaViewer to get the file description from the local file description page of the language wikis. On the pool, everything works fine: I use the information template and the values appear in the caption. On the language wikis, I also have the information template, but the values don't appear in the caption; all I have is the file :(

Till Kraemer (talkcontribs)

P.S.: Okay, sorry, the CommonsMetadata extension seems to be necessary but I'm still not sure about fetchDescription. fetchDescription pulls the data from the pool description page, right? Not from the local file description page of the language wiki.

Tgr (WMF) (talkcontribs)

CommonsMetadata parses the description page to extract the image description (and author name, license etc), on both local and remote files. It is necessary for MediaViewer to show those things, but if your description page does not exist at all, your problem is at a lower level. The flow of information is like this (assuming all caches are cold):

  1. MediaViewer calls the imageinfo API to get file metadata
  2. the API calls CommonsMetadata via a hook
  3. CommonsMetadata calls File::getDescriptionPageText
  4. (assuming your file is a ForeignDBFile) getDescriptionPageText makes an HTTP request to the pool wiki's description page
  5. the pool wiki parses its description page text, loads template definition etc, returns description page HTML
  6. CommonsMetadata parses the HTML and extracts various DOM nodes

If you do not have a ForeignDB repo configured, this will fail at step 3. Specifically, just setting $wgUseSharedUploads is not a reliable way to share images (although if you have all the required tables shared as well, it should work, via LocalFile::getDescriptionPageText which does completely different things to get the description).
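
Step 4 can be reproduced by hand to see whether the pool wiki answers at all. The sketch below only builds the render URL in the shape that shows up in the debug log later in this thread (index.php with title, action=render and uselang). The script URL https://pool.domain.com/w/index.php is an assumption derived from the paths quoted in this thread, and buildRenderUrl is a made-up helper name, not a MediaWiki function. It assumes PHP 7.0 or later with the default "&" argument separator.

```php
<?php
// Sketch: build the action=render URL that is requested for a shared
// file description page. buildRenderUrl is a hypothetical helper.
function buildRenderUrl( string $scriptUrl, string $title, string $lang ): string {
    return $scriptUrl . '?' . http_build_query( [
        'title'   => $title,
        'action'  => 'render',
        'uselang' => $lang,
    ] );
}

// Assumed script URL of the pool wiki; adjust to the real installation.
echo buildRenderUrl( 'https://pool.domain.com/w/index.php', 'File:Till_Kraemer.jpg', 'en' ) . "\n";
```

Requesting the printed URL from the web server that runs the language wiki (not from a desktop browser) shows whether that machine can reach the pool wiki over HTTP.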

Till Kraemer (talkcontribs)

@Tgr (WMF), thank you so much for all the information! I really appreciate your help!

I took your suggestion and switched from $wgUseSharedUploads to $wgForeignFileRepos.

Files work, but still no file descriptions show up:

$wgForeignFileRepos[] = array(
'class' => 'ForeignDBRepo',
'name' => 'pool',
'url' => "https://pool.domain.com/w/images",
'directory' => '/path/to/pool/images/',
'hashLevels' => 2, 
'dbType' => $wgDBtype,
'dbServer' => $wgDBserver,
'dbUser' => $wgDBuser,
'dbPassword' => $wgDBpassword,
'dbFlags' => DBO_DEFAULT,
'dbName' => 'poolwiki',
#   'tablePrefix' => 'mw_',
'hasSharedCache' => false,
'descBaseUrl' => 'https://pool.domain.com/wiki/Image:',
'fetchDescription' => true,
'descriptionCacheExpiry'  => 0,
'apiThumbCacheExpiry'     => 0,
);

I also tried to use ForeignAPIRepo since it worked perfectly a while ago but now it doesn't work at all (no files show up, let alone file descriptions):

$wgForeignFileRepos[] = array(
'class'                   => 'ForeignAPIRepo',
'name'                    => 'pool',
'apibase'                 => 'https://pool.domain.com/w/api.php',
'fetchDescription'        => true, 
'descriptionCacheExpiry'  => 43200,
'apiThumbCacheExpiry'     => 0, 
);

Out of curiosity, I also tried to pull data from Wikimedia. The following configuration doesn't work for me at all (no files, no file descriptions):

$wgForeignFileRepos[] = array(
'class'                   => 'ForeignAPIRepo',
'name'                    => 'commonswiki',
'apibase'                 => 'https://commons.wikimedia.org/w/api.php',
'hashLevels'              => 2,
'fetchDescription'        => true,
'descriptionCacheExpiry'  => 0,
'apiThumbCacheExpiry'     => 0,
);

However, this configuration here works perfectly (files and image descriptions show up oh so beautifully :)

$wgForeignFileRepos[] = array(
'class'                   => 'ForeignAPIRepo',
'name'                    => 'enwiki',
'apibase'                 => 'https://en.wikipedia.org/w/api.php',
'hashLevels'              => 2,
'fetchDescription'        => true,
'descriptionCacheExpiry'  => 0,
'apiThumbCacheExpiry'     => 0,
);

I so don't get it. The file is from Commons, but somehow I can't pull it directly from Commons, yet I can get it through the English Wikipedia? What the hell? It's so weird and I just want the MultimediaViewer so badly I could cry.

Tgr (WMF) (talkcontribs)

That's very weird. Check what's logged to the http channel while you are making a request to the file description page (you can use $wgDebugLogGroups to split it into its own file).
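
A minimal sketch of that setting for LocalSettings.php follows. $wgDebugLogGroups maps a log channel name to a file; the path here is only a placeholder modeled on the $wgDebugLogFile line quoted earlier in this thread.

```php
// LocalSettings.php
// Send everything logged to the "http" channel to its own file.
// The path is a placeholder; use a location the web server can write to.
$wgDebugLogGroups['http'] = "/path/to/mediawiki/debug-http-{$wgDBname}.log";
```

If php_fpm runs in a chroot, the path has to exist and be writable inside the jail.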

Till Kraemer (talkcontribs)

@Tgr (WMF), unfortunately, I didn't find any HTTP-related entries in the log file :( On the cswiki, I visited the file description page, performed a null edit and viewed the file with MultimediaViewer. My whole log file of the cswiki looks like this ("123..." values are changed by me):

Start command line script /path/to/cs/w/maintenance/runJobs.php
[caches] cluster: MemcachedPhpBagOStuff, WAN: mediawiki-main-default, stash: db-replicated, message: MemcachedPhpBagOStuff, parser: MemcachedPhpBagOStuff
[caches] LocalisationCache: using store LCStoreCDB
[authentication] Overriding AuthManager primary authn because $wgAuth is CentralAuthPlugin
Unstubbing $wgParser on call of $wgParser::setHook from wfBlogger
Parser: using preprocessor: Preprocessor_DOM
Fully initialised
IP: 127.0.0.1
[connect] Connected to database 0 at ::1
LoadBalancer::reuseConnection: this connection was not opened as a foreign connection
LoadBalancer::reuseConnection: this connection was not opened as a foreign connection
LoadBalancer::reuseConnection: this connection was not opened as a foreign connection
[runJobs] refreshLinksPrioritized Soubor:Till_Kraemer.jpg rootJobTimestamp=12345678901234 useRecursiveLinksUpdate= triggeringUser={"userId":123,"userName":"Till Kraemer"} triggeringRevisionId=123 requestId=1234567890 (id=2,timestamp=12345678901234) STARTING
Revision::loadText: got id 123 from cache
[ContentHandler] Created handler for wikitext: WikitextContentHandler
Title::getRestrictionTypes: applicable restrictions to [[Soubor:Till Kraemer.jpg]] are {edit,move,upload}
[MessageCache] MessageCache::load: Loading cs... local cache is empty, got from global cache
Revision::loadText: got id 1234 from cache
Revision::loadText: got id 12345 from cache
[Preprocessor] Loaded preprocessor output from cache (key: cswiki:preprocess-xml:1234567890:1)
Unstubbing $wgLang on call of $wgLang::unstub from Wikibase\Client\Hooks\ParserLimitHookHandlers::newFromGlobalState
MWCryptRand::realGenerate: Generating cryptographic random bytes for MediaWiki\Session\SessionManager->generateSessionId/MWCryptRand::generateHex/MWCryptRand->realGenerateHex/MWCryptRand::generate/MWCryptRand->realGenerate
MWCryptRand::realGenerate: mcrypt_create_iv generated 20 bytes of randomness.
MWCryptRand::realGenerate: 0 bytes of randomness leftover in the buffer.
[session] SessionBackend "123456789012345" is unsaved, marking dirty in constructor
[session] SessionBackend "123456789012345" save: dataDirty=1 metaDirty=1 forcePersist=0
[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache
LoadBalancer::openForeignConnection: opened new connection for 0/datawiki
LoadBalancer::reuseConnection: freed connection 0/datawiki
LoadBalancer::reuseConnection: this connection was not opened as a foreign connection
LoadBalancer::reuseConnection: this connection was not opened as a foreign connection

I can't detect any problematic things in the logfile. I really don't get why values like the complete file history are fetched but not the file description. Should I install Xdebug?

Thanks and cheers!

Till Kraemer (talkcontribs)

I installed the server's OS on a local computer, downloaded the MediaWiki files from the server and imported all databases. The result: no file descriptions from the pool. However, when I deactivated the CentralAuth extension, I was able to fetch file descriptions from the pool. Isn't that weird?

Thanks and cheers!

Till Kraemer (talkcontribs)

@Tgr (WMF), sorry, looks like it wasn't CentralAuth. On the local installation it seems to have something to do with the chroot jail in which php_fpm runs, because when I copy /etc/hosts into the jail, file descriptions are fetched from the pool. But I can't reproduce this behavior on the server. I'll keep digging.

Till Kraemer (talkcontribs)

The log file looks like this:

Without hosts file in chroot jail:

Fetching shared description from http://pool.localdomain.com/w/index.php?title=File:Till_Kraemer.jpg&amp;action=render&amp;uselang=de

HTTP: GET: http://pool.localdomain.com/w/index.php?title=File:Till_Kraemer.jpg&amp;action=render&amp;uselang=de

[http] PhpHttpRequest: error opening connection: {errstr1}

[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache

LocalisationCache::isExpired(en): cache for en expired due to GlobalDependency

LocalisationCache::recache: got localisation for en from source

[http] HTTP request failed due to unknown error.

With hosts file in chroot jail:

Fetching shared description from http://pool.localdomain.com/w/index.php?title=File:Till_Kraemer.jpg&amp;action=render&amp;uselang=de

HTTP: GET: http://pool.localdomain.com/w/index.php?title=File:Till_Kraemer.jpg&amp;action=render&amp;uselang=de

[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache

Article::view using parser cache: yes

Parser cache options found.

ParserOutput cache found.

Article::view: showing parser cache contents

[queries] dewiki: SELECT /* WatchedItemStore::loadWatchedItem Till Kraemer */  wl_notificationtimestamp  FROM `watchlist`   WHERE wl_user = '1' AND wl_namespace = '6' AND wl_title = 'Till_Kraemer.jpg'  LIMIT 1

LoadBalancer::reuseConnection: this connection was not opened as a foreign connection

File::transform: Doing stat for mwstore://pool-backend/pool-thumb/e/e7/Till_Kraemer.jpg/120px-Till_Kraemer.jpg

FileBackendStore::getFileStat: File mwstore://pool-backend/pool-thumb/e/e7/Till_Kraemer.jpg/120px-Till_Kraemer.jpg does not exist.

TransformationalImageHandler::doTransform: creating 120x80 thumbnail at /tmp//transform_1234567.jpg using scaler client

[queries] dewiki: SELECT /* LocalRepo::findBySha1 Till Kraemer */  img_name,img_size,img_width,img_height,img_metadata,img_bits,img_media_type,img_major_mime,img_minor_mime,img_description,img_user,img_user_text,img_timestamp,img_sha1  FROM `image`   WHERE img_sha1 = '123456789'  ORDER BY img_name

[queries] dewiki: SELECT /* Title::getRedirectsHere Till Kraemer */  page_namespace,page_title  FROM `redirect`,`page`   WHERE rd_namespace = '6' AND rd_title = 'Till_Kraemer.jpg' AND (rd_from = page_id) AND (rd_interwiki = '' OR rd_interwiki IS NULL) AND page_namespace = '6'

[queries] dewiki: SELECT /* ImagePage::queryImageLinks Till Kraemer */  page_namespace,page_title,il_to  FROM `imagelinks`,`page`   WHERE il_to = 'Till_Kraemer.jpg' AND (il_from = page_id)  ORDER BY il_from LIMIT 102

[queries] dewiki: SELECT /* GenderCache::doQuery/MediaWikiTitleCodec::getNamespaceName Till Kraemer */  user_name,up_value  FROM `user` LEFT JOIN `user_properties` ON ((user_id = up_user) AND up_property = 'gender')  WHERE user_name = 'Till Kraemer'

MediaWiki::preOutputCommit: all transactions committed

MediaWiki::preOutputCommit: pre-send deferred updates completed

Tgr (WMF) (talkcontribs)

Well, [http] PhpHttpRequest: error opening connection: {errstr1} suggests there is a network issue. No idea why the error message is missing. Can you dump PhpHttpRequest::$fopenErrors in PhpHttpRequest::execute(), just after if ( $this->fopenErrors ) {?

Till Kraemer (talkcontribs)

@Tgr (WMF), thanks for your help! You mean changing HttpFunctions.php like this, right?

if ( $this->fopenErrors ) {
    var_dump( $this->fopenErrors ); // temporary: dump the collected fopen() warnings
    LoggerFactory::getInstance( 'http' )->warning( __CLASS__
        . ': error opening connection: {errstr1}', $this->fopenErrors );
}

...but wait a minute, I somehow got an error message now (without changing HttpFunctions.php):

Fetching shared description from http://pool.localdomain.com/wiki/File:Till_Kraemer.jpg?action=render&uselang=de
HTTP: GET: http://pool.localdomain.com/wiki/File:Till_Kraemer.jpg?action=render&uselang=de
[MessageCache] MessageCache::load: Loading en... local cache is empty, got from global cache
LocalisationCache::isExpired(en): cache for en expired due to GlobalDependency
LocalisationCache::recache: got localisation for en from source
[http] Error fetching URL: Couldn't resolve host 'pool.localdomain.com'

But shouldn't it just connect to the database since I'm using ForeignDBRepo? ForeignAPIRepo doesn't work for me. And if I want to make database connections, I have to use 127.0.0.1 as the database server, everything else fails.

Is it somehow possible to fetch the file descriptions via 127.0.0.1 instead of connecting to pool.localdomain.com?

Till Kraemer (talkcontribs)

It's definitely the chroot jail! As soon as I run php_fpm without chroot, everything works perfectly and file descriptions are fetched from the pool. So the problem has nothing to do with MediaWiki.

I'm gonna find out what options I have to get DNS going inside the chroot. Linux users seem to be able to use nscd for DNS caching; on OpenBSD maybe I can use unbound for that.
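
A quick way to confirm whether name resolution works from PHP's point of view is a tiny test script. This is only a sketch, assuming PHP 7.0 or later; canResolve is a made-up helper name. It relies on the documented behaviour of gethostbyname(), which returns the unmodified hostname when the lookup fails.

```php
<?php
// Sketch: report whether PHP can resolve a hostname in its current environment.
// gethostbyname() returns the hostname unchanged on failure, so anything that
// is not a valid IP address afterwards counts as "not resolved".
// canResolve is a hypothetical helper name.
function canResolve( string $host ): bool {
    $ip = gethostbyname( $host );
    return filter_var( $ip, FILTER_VALIDATE_IP ) !== false;
}

$host = 'pool.localdomain.com'; // hostname taken from the log above
echo $host . ': ' . ( canResolve( $host ) ? 'resolves' : 'does NOT resolve' ) . "\n";
```

The script has to be requested through the web server so that it runs inside the php_fpm chroot; running it from the command line would use the resolver configuration of the host system and hide the problem.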

Anyway, sorry for all the noise I made and thanks again for your help @Tgr (WMF)!