Currently, if there is a BR tag in the wikitext, it is removed completely, so the last word before the tag and the first word after it get joined together. Is there a way to change this behaviour, e.g. by replacing BR tags with a space?
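One way to get the requested behaviour, assuming you can post-process the HTML before its tags are stripped, is to turn every BR tag into a space first. The PHP sketch below is illustrative only: brToSpace() is a hypothetical helper, not a TextExtracts function or configuration option.

```php
<?php
// Hypothetical helper, not part of TextExtracts: replace every <br>, <br/>
// or <br /> (any letter case) with a space, then strip the remaining tags.
function brToSpace(string $html): string
{
    $spaced = preg_replace('/<br\s*\/?>/i', ' ', $html);
    return strip_tags($spaced);
}

echo brToSpace('first line<br>second line<br />third'), "\n";
// prints "first line second line third"
```

Plain strip_tags('a<br>b') returns "ab", which is the joined-words effect described above; doing the replacement first keeps the words apart.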
Extension talk:TextExtracts
The original page's HTML tags have been stripped but replaced with whitespace. This makes the content hard to read and hard to shorten. Ideally, MathML content would be reduced to a single expression: any tags, and the whitespace between them, would be removed, leaving only the raw expression.
Or rather, the alttext attribute could be used for any HTML tags that provide it.
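A rough sketch of the alttext idea, assuming the extract HTML is available as a string before tags are stripped. mathToAlttext() is a hypothetical post-processing step written with PHP's built-in DOMDocument; it is not something TextExtracts or the Math extension provides.

```php
<?php
// Hypothetical post-processing step, not part of TextExtracts: replace each
// <math> element that carries an alttext attribute with that attribute's
// value, so the MathML markup collapses into one plain-text expression.
function mathToAlttext(string $html): string
{
    $doc = new DOMDocument();
    // libxml's HTML parser does not know MathML tags; silence those warnings.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a single root so LIBXML_HTML_NOIMPLIED keeps it intact.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because the tree is modified in the loop.
    foreach (iterator_to_array($doc->getElementsByTagName('math')) as $math) {
        if ($math->hasAttribute('alttext')) {
            $text = $doc->createTextNode($math->getAttribute('alttext'));
            $math->parentNode->replaceChild($text, $math);
        }
    }
    // Return only the text, with all remaining tags removed.
    return $doc->documentElement->textContent;
}

echo mathToAlttext('<p>Energy <math alttext="E=mc^2"><mi>E</mi> <mo>=</mo></math> here</p>'), "\n";
// prints "Energy E=mc^2 here"
```

The sketch assumes ASCII input; loadHTML() treats input without a charset hint as ISO-8859-1, so UTF-8 text would need an encoding hint added.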
As of now the TextExtracts extension has no knowledge about the Math extension. This is a recurring problem with many extensions, I'm afraid.
May I ask what the use case even is? Where does this text appear? I know the team around @Physikerwelt worked on Popups support for Math. But that skips TextExtracts entirely, as far as I know.
MediaWiki 1.39.6, PHP 7.4.3, MySQL 8.0.36
Before 1.39.6, the page preview (Popups/TextExtracts) either showed some text from the target article or showed "...". The ellipsis always appeared when there was _no text before the first heading_. If there was an associated image, however, the preview showed "..." on the left-hand side and the image on the right-hand side.
Since the upgrade, the preview shows "There was a problem displaying this preview" (German interface message: "Es gab ein Problem bei der Anzeige dieser Vorschau"), and no image is displayed.
How can we regain the previous behaviour?
This topic can be closed; it was posted in the wrong place. See Topic:Y3eq158cl5otcgdt instead.
As the title says, when I use Popups with TextExtracts, Popups often shows "There was an issue displaying this preview".
When I inspect the page with Chrome's developer tools, the console shows a "500 Internal Server Error".
I used the API sandbox to test each part of the API request and found that as soon as prop=extracts is added, the API returns error code 500. When it is removed, no error occurs and the output is normal.
Is there a reason why this happens, and is there any way to solve it?
P.S. I have set up short URLs with Apache 2 following the MediaWiki tutorial; api.php is accessible without any problems.
Is this question about a self-hosted wiki? An error 500 could be anything. You would need to find the responsible error message in your server's log files. Manual:How to debug might help.
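For reference, these are the usual temporary LocalSettings.php additions described in Manual:How to debug; they make a PHP fatal error (like the one reported further down in this thread) visible instead of a bare 500. The log file path is only an example to adapt to your server, and the lines should be removed from a public wiki once debugging is done.

```php
// LocalSettings.php: temporary debugging settings (remove afterwards).
// Report all PHP errors and print them in the response body.
error_reporting( -1 );
ini_set( 'display_errors', 1 );
// Show full exception details instead of a generic error page.
$wgShowExceptionDetails = true;
// Write MediaWiki's debug log to a file (example path, adjust as needed).
$wgDebugLogFile = "/var/log/mediawiki/debug-{$wgDBname}.log";
```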
Yes, the wiki is a self-hosted wiki.
Thank you for your advice; I will try to figure it out from the log files.
After debugging, the log shows the following:
Fatal error: Declaration of TextExtracts\ExtractFormatter::onHtmlReady(string $html): string must be compatible with HtmlFormatter\HtmlFormatter::onHtmlReady($html) in /var/www/<my wiki name>/w/extensions/TextExtracts/includes/ExtractFormatter.php on line 66
Does this mean that the extension's PHP code has an error?
Problem solved after running "composer require wikimedia/html-formatter".
When I run this query https://www.tbpedia.org/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81
it returns the following response:
{ "batchcomplete": "", "warnings": { "extracts": { "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats." } }, "query": { "pages": { "1": { "pageid": 1, "ns": 0, "title": "\u9996\u9801", "extract": "\n" } } } }
It seems that no text was extracted.
When I use the Popups extension, it shows "There was an issue displaying this preview".
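To reproduce the request outside the browser, the same query can be built programmatically. This is a small PHP sketch: extractsQueryUrl() is a hypothetical helper, and format=json plus the optional explaintext flag (plain text instead of HTML) are standard API parameters added here as assumptions; they were not part of the original query.

```php
<?php
// Hypothetical helper: build an action=query&prop=extracts URL like the one above.
function extractsQueryUrl(string $apiBase, string $title, int $chars = 1000, bool $plainText = false): string
{
    $params = [
        'action'  => 'query',
        'prop'    => 'extracts',
        'exchars' => $chars,
        'titles'  => $title,
        'format'  => 'json',
    ];
    if ($plainText) {
        // Boolean API flags are true when present.
        $params['explaintext'] = 1;
    }
    // http_build_query() percent-encodes the UTF-8 title.
    return $apiBase . '?' . http_build_query($params);
}

echo extractsQueryUrl('https://www.tbpedia.org/w/api.php', '首頁'), "\n";
```

Comparing the HTML and explaintext variants of the same page can help tell an empty extract apart from markup that only renders as empty.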
- Did you ever solve this?
- The above link only returns the commented-out NewPP limit report as its extract, which would explain the "issue displaying this preview" error:
{
"batchcomplete": "",
"warnings": {
"extracts": {
"*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats."
}
},
"query": {
"pages": {
"1": {
"pageid": 1,
"ns": 0,
"title": "\u9996\u9801",
"extract": "<!-- \nNewPP limit report\nCached time: 20230324145510\nCache expiry: 3600\nReduced expiry: true\nComplications: []\n[SMW] In\u2010text annotation parser time: 0.002 seconds\nCPU time usage: 0.031 seconds\nReal time usage: 0.031 seconds\nPreprocessor visited node count: 9/1000000\nPost\u2010expand include size: 10/2097152 bytes\nTemplate argument size: 0/2097152 bytes\nHighest expansion depth: 2/100\nExpensive parser function count: 0/100\nUnstrip recursion depth: 0/20\nUnstrip post\u2010expand size: 0/5000000 bytes\n-->\n<!--\nTransclusion expansion time report (%,ms,calls,template)\n100.00% 0.000 1 -total\n-->"
}
}
}
}
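For the commented-out NewPP limit report case above, a quick check is whether the extract contains anything other than HTML comments and whitespace. This is a hypothetical diagnostic sketch (extractHasText() is not part of TextExtracts or Popups); it only decodes a JSON response such as the one above, so no network access is involved.

```php
<?php
// Hypothetical diagnostic: decode an API response like the one above and
// report whether any page's extract has visible text once HTML comments
// (such as the NewPP limit report), other tags and whitespace are removed.
function extractHasText(string $json): bool
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    foreach ($data['query']['pages'] ?? [] as $page) {
        $extract = $page['extract'] ?? '';
        // Drop <!-- ... --> comments, then any remaining tags and whitespace.
        $visible = trim(strip_tags(preg_replace('/<!--.*?-->/s', '', $extract)));
        if ($visible !== '') {
            return true;
        }
    }
    return false;
}

$sample = '{"query":{"pages":{"1":{"pageid":1,"extract":"<!-- \nNewPP limit report\n-->"}}}}';
var_dump(extractHasText($sample)); // bool(false)
```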
Hi Joe,
The issue is not resolved. What does "NewPP limit report commented-out text as an extract" mean?
Hi Joe,
If you open this link https://www.tbpedia.org/w/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81
you will notice that the extract only contains the commented-out NewPP limit report. This causes the Popups extension to show "There was an issue displaying this preview". See this page: https://www.tbpedia.org/wiki/%E7%9B%A7%E5%8B%9D%E5%BD%A5%E6%96%87%E9%9B%86%E7%BF%BB%E8%AD%AF%E7%B6%AD%E5%9F%BA%E9%A4%A8
I reinstalled MediaWiki 1.39.3 with PHP 8.0.28, SMW 4.1.1, TextExtracts (74baaa7, 17:23, 20 March 2023), Previews (010237d, 15:23, 21 March 2023) and PageImages (78537e6, 15:23, 21 March 2023).
I am also using short URLs.
Initially, the previews worked fine. But after I installed more extensions, one of them (I can't figure out which) caused this error. I removed the newly installed extensions, but the error persists.
I suspect that one of the extensions I installed with Composer may have broken a library. Or is there a JavaScript conflict?
Suspecting an extension conflict, I removed the installed extensions one by one (no longer loading them from LocalSettings.php), but that didn't help.
If the preview issue were caused by the Popups extension, the TextExtracts API should still work, yet it does not.
I have enabled the debug toolbar for easier troubleshooting.
I would really appreciate it if anyone could help troubleshoot this issue.
Thanks in advance.
Today, when I checked the page 盧勝彥文集翻譯維基館 - 真佛百科 True Buddha Pedia (tbpedia.org), items 3 and 5 showed the preview but items 2 and 4 did not.
It puzzles me that a day ago none of the 4 links showed a preview, while now two out of four do.
Any clues about what went wrong? Is it caching? The page content?
Five minutes later, none of the links showed a preview. This confused me even more. What is the root cause of the preview not being displayed? I didn't make any changes to the configuration.
Sorry to reopen this old thread, but is there any solution to this problem?
Based on my case, there seems to be nothing wrong with TextExtracts or Popups. I noticed that TextExtracts will not extract any article that begins with a heading. You need to have some text before the heading.
Yes, TextExtracts does not extract any article that begins with a heading. But in my case, even with text before the headings, TextExtracts still outputs nothing, and Popups keeps showing "There was an issue displaying this preview".
That's why I would like to draw on your past experience and see if it helps.
If you don't mind, please share the link and I can test it on my wiki.
Which link would you like me to share?
The page on which Popups shows "There was an issue displaying this preview".
We noticed that thumbnail captions are shown in the extract if an image is in the first paragraph. Is there a way to remove them? We tried adding "figure" and "figcaption" (MW 1.40) to $wgExtractsRemoveClasses without success. Manually adding the "noexcerpt" class to the image did work.
On which wiki does this happen? What version of MediaWiki are you using?
<figure> is already part of the list of elements to remove. Since <figcaption> is inside of <figure> it will be removed as well. Maybe your wiki's configuration modifies $wgExtractsRemoveClasses in an unexpected way? Maybe your $wgParserEnableLegacyMediaDOM configuration changed, but TextExtracts wasn't updated?
I use a page layout on my wiki, which means that almost every page begins with a div tag.
Unfortunately, div is one of the default items in the $wgExtractsRemoveClasses array (defined as ExtractsRemoveClasses in this extension's extension.json). So Extension:Popups displays no text for those pages, because TextExtracts ignores the content inside the div element.
I would like to remove the div item from $wgExtractsRemoveClasses in my LocalSettings.php, but I cannot find the right way to do it. Any ideas, please?
As a workaround, I removed div from extension.json, but I am sure that is bad practice.
Unfortunately, this is the only way to do this at this point... and you need to do it if you use the Citizen skin, as of this writing anyway.
I used to call this API from JavaScript on Wikipedia to get excerpts from Wiktionary, but now (since perhaps a few weeks ago) it returns a "badtoken" error. Other Wiktionary APIs, including Parse, still work from Wikipedia, so this is odd.
It would be great if there were a parameter $wgExtractsIncludeClasses where classes could be defined that should be included in the text extracts. I often wrap the first paragraph in a div with styling information, and that paragraph is then not included in the extract. To make it work, I always have to start with some plain text, which is quite inflexible.
How would I go about this? Most pages on my site are created from template calls (using Extension:Cargo) without any other text or headings. So no summary is extracted.
I see that it's possible to create a new API but to be able to do that I'd mostly need to copy an existing one :-)
Is this about Popups, or about other usages of the TextExtracts API? It might be possible to customize the existing TextExtracts code so it supports your use-case better. Unfortunately, staff (like me) is probably not able to give a lot of support for customizations like this. If you are able to submit patches that make the TextExtracts extension work better with other extensions like Cargo, making it better for everyone, we can have a look at these patches.
Thanks again. It's about TextExtracts, which Popups on my site would use (I understand that WMF sites use something else). Cargo is only part of the background information, as my reason for having template-only pages (though it may be that most Cargo websites use it for infoboxes after introductory text, so mine may be a minority interest). If I can work it out I'll submit patches!
It turned out to be fairly easy: it works fine for me now that I have removed "div" from "ExtractsRemoveClasses" in TextExtracts' extension.json file. Is there any way to make that change in LocalSettings.php instead?
(Initially I had wrongly assumed the extension looked at the raw wikitext but once I saw it used api.php?action=parse it became clearer...)
No easy way, but it's possible:
$wgHooks['MediaWikiServices'][] = function () {
global $wgExtractsRemoveClasses;
$wgExtractsRemoveClasses = array_diff( $wgExtractsRemoveClasses, [ 'div' ] );
};
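A standalone illustration of what the closure above does to the array. The sample list is shortened and illustrative, not the real default from extension.json. As far as we understand extension registration, an array assigned directly in LocalSettings.php is merged with the extension.json default rather than replacing it, which is why removing div needs a late hook like this instead of a plain assignment.

```php
<?php
// Illustrative sample only, not the full default list from extension.json.
$wgExtractsRemoveClasses = ['table', 'div', 'figure', '.noexcerpt'];

// Remove 'div'. array_diff() keeps the original keys, so array_values()
// reindexes the result (an optional tidy-up; the hook above skips it).
$wgExtractsRemoveClasses = array_values(array_diff($wgExtractsRemoveClasses, ['div']));

print_r($wgExtractsRemoveClasses);
```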
That seems to work - thanks!