(This is also related to Topic:Search inside uploaded documents.)
I’m running MW 1.28.2, Extension:CirrusSearch REL1_28, and Elasticsearch 2.4.5, and I’m experimenting with integrating the mapper-attachments plugin to index the contents of all files whose file_media_type is OFFICE. Right now I can find them by querying Elasticsearch directly, but they do not show up in the wiki search results. My guess is that it all comes down to the proper mapping, which is tricky to get right.
- Is it possible to use copy_to somehow, or to plug into an existing search filter like “insource:”, or do I have to write a SearchResult class? (I don't have access to the CirrusSearchAddQueryFeatures hook, which requires MW 1.29+.)
- How can I direct CirrusSearch to also read from my custom file_attachment field, or from the subfield file_attachment.content?
- Can anybody point me in the right direction?
Thank you.
So far I have managed to index files with file_media_type OFFICE through the Elasticsearch plugin by using the CirrusSearch hooks, but the data is found only when querying Elasticsearch directly, not by CirrusSearch:
$wgHooks['CirrusSearchMappingConfig'][] = function ( array &$config, $mappingConfigBuilder ) {
    foreach ( $config['page']['properties'] as $key => &$PAGE_PROPERTIES ) {
        if ( $key == 'file_text' ) {
            /* https://stackoverflow.com/questions/36618549/is-it-possible-to-get-contents-of-copy-to-field-in-elasticsearch */
            $PAGE_PROPERTIES['store'] = true; /* add store=1 to defaults, no effect with copy_to */
        }
    }
    // plug in mapper-attachment
    $config['page']['properties']['file_attachment'] = [
        'type' => 'attachment',
        'fields' => [
            'content' => [
                'type' => 'string',
                'copy_to' => [ 'all', 'file_text' ], /* no effect with copy_to */
                'analyzer' => 'text',
                'search_analyzer' => 'text_search',
            ]
        ]
    ];
};
$wgHooks['CirrusSearchBuildDocumentParse'][] = function (
    \Elastica\Document $Doc,
    Title $ThisTitle,
    Content $PageContent,
    ParserOutput $ParserOutput
) {
    global $wgTmpDirectory;
    $log_content = "\nDEBUG \$Doc:\n";
    // Resolve the uploaded file behind this title, if there is one.
    $ThisLocalFile = wfFindFile( $ThisTitle );
    $localFilePath = $ThisLocalFile instanceof File ? $ThisLocalFile->getLocalRefPath() : null;
    if ( $Doc->namespace == NS_FILE
        && $Doc->has( 'file_media_type' )
    ) {
        $isOffice = preg_match( '@OFFICE@i', $Doc->get( 'file_media_type' ) );
        if ( $isOffice && $localFilePath ) {
            // The attachment field takes the raw file as a base64-encoded string.
            $Doc->set( 'file_attachment', base64_encode( file_get_contents( $localFilePath ) ) );
            $log_content .= "\nDEBUG did set file_attachment\n";
        } elseif ( $isOffice ) {
            // Skip the read when no local path is available for the file.
            $log_content .= "\nDEBUG no local file path, file_attachment not set\n";
        } else {
            $log_content .= "\nDEBUG file_media_type: {$Doc->file_media_type}\n";
        }
    }
    if ( $Doc->namespace == NS_FILE ) {
        // Append debug output for every file page to a log in the temp directory.
        $log_content .= "\nDEBUG \$ThisTitle:\n";
        $log_content .= var_export( $ThisTitle, true );
        $log_content .= "\nDEBUG \$ThisLocalFile:\n";
        $log_content .= $ThisLocalFile instanceof File ? $ThisLocalFile->getLocalRefPath() : var_export( $ThisLocalFile, true );
        $log_content .= var_export( $Doc, true );
        file_put_contents( $wgTmpDirectory . "/CirrusSearchBuildDocumentParse.log", $log_content, FILE_APPEND );
    }
    return true;
};
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
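
For comparison, the kind of direct Elasticsearch query that does find the attachment text can be sketched roughly as below. This is only an illustration: the helper name buildAttachmentQuery and the search term are made up for the example and are not part of CirrusSearch, and the exact query may differ. The field name file_attachment.content is the subfield from the mapping hook above.

```php
<?php
// Hypothetical helper (not a CirrusSearch or Elastica API): builds the body
// of a match query against the attachment subfield from the mapping hook.
function buildAttachmentQuery( $term ) {
    return [
        'query' => [
            'match' => [
                'file_attachment.content' => $term,
            ],
        ],
        // Ask only for small identifying fields, not the base64 payload.
        '_source' => [ 'title', 'file_media_type' ],
    ];
}

// Print the JSON body; 'budget' is just a placeholder search term.
echo json_encode( buildAttachmentQuery( 'budget' ), JSON_PRETTY_PRINT ), "\n";
```

The printed JSON can be sent as the request body to the _search endpoint of the wiki's file index. A hit there with no hit in the wiki search would suggest the text is indexed but the field is not among those CirrusSearch queries.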