Manual talk:Parameters to Special:Export

The addcat parameter does not work!

I tried to use a POST request generated by cURL, adding the addcat and catname parameters, so that Wikipedia would export a category of pages in one XML file. But I only got a file that led me to the Special:Export page with all the page names listed in the text box, which was weird. I thought it might want me to download from that page, but I got a "file does not exist" page after clicking download... Any advice/help is welcome. Thanks in advance.
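
For what it's worth, here is one workaround (a rough Python 3 sketch, not a fix for addcat itself): list the category members through the API and then POST that page list to Special:Export. The category name, output file and User-Agent string are placeholders, and curonly is used so that only current revisions are fetched:
# Sketch: export all pages of a category by combining the API with Special:Export.
# Python 3, standard library only; category, file name and User-Agent are placeholders.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
EXPORT = "https://en.wikipedia.org/w/index.php?title=Special:Export"
HEADERS = {"User-Agent": "export-sketch/0.1 (contact: example@example.org)"}

def category_members(category):
    # Yield page titles in one category via the API's list=categorymembers.
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category, "cmlimit": "max", "format": "json"}
    while True:
        req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params),
                                     headers=HEADERS)
        data = json.load(urllib.request.urlopen(req))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            return
        params.update(data["continue"])

# POST the collected titles to Special:Export (current revisions only here).
titles = list(category_members("Category:Phrases"))
body = urllib.parse.urlencode({"pages": "\n".join(titles), "action": "submit",
                               "curonly": "1"}).encode("utf-8")
req = urllib.request.Request(EXPORT, data=body, headers=HEADERS)
with open("category_export.xml", "wb") as f:
    f.write(urllib.request.urlopen(req).read())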

The links do not seem to actually work; they always return only the latest revision of the page in question?

  • I took a look at the source to Special:Export. It forces anything done with a GET to only get the most current version. I've added a couple of boxes to my own version of Special:Export so that the user can set the "limit" and "offset" parameters; I don't know how to change special pages, though. --En.jpgordon 00:20, 19 January 2007 (UTC)Reply

Where is the source? How do I use "my own version"?

  • RFC: when limit=-n, dump the previous n edits, with the default being from the current edit.
  • Where is the source? How do I use "my own version"?

Thanks Doug Saintrain 19:39, 26 January 2007 (UTC)Reply

Heya. The source I'm talking about is the MediaWiki source (I got it from the SourceForge project link over there in the resource box.) There's a comment in includes/SpecialExport.php saying // Default to current-only for GET requests, which is where the damage occurs. I imagine it's trying to throttle requests. So I instead made my version by saving the Special:Export page, tweaking it, and running it on my local machine; I only had to adjust one URL to get it to work right.

More fun, though: I wrote a little Python script to loop through and fetch entire article histories, a block of 100 revisions at a time (that being the hardwired limit), concatenate them into one long XML, run it through another filter, and then look at them with the History Flow Visualization Application from IBM.[1] Pretty.
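
For anyone wanting to try something similar, a minimal sketch of that kind of loop (Python 3, standard library only) might look like the following. The article title, block size and file naming are placeholders, the offset value follows the curl example further down this page, and the step of merging the parts into one long XML is left out:
# Sketch: walk an article's history in blocks of revisions via POSTs to Special:Export.
import re
import urllib.parse
import urllib.request

EXPORT = "https://en.wikipedia.org/w/index.php?title=Special:Export"
HEADERS = {"User-Agent": "history-sketch/0.1 (contact: example@example.org)"}
TITLE = "Train"        # placeholder article
BLOCK = 100            # revisions per request

offset = "1"           # as in the curl example further down, "1" starts from the beginning
part = 0
while True:
    body = urllib.parse.urlencode({"pages": TITLE, "action": "submit",
                                   "offset": offset, "limit": BLOCK}).encode("utf-8")
    req = urllib.request.Request(EXPORT, data=body, headers=HEADERS)
    xml = urllib.request.urlopen(req).read().decode("utf-8")
    with open("%s_part%03d.xml" % (TITLE, part), "w", encoding="utf-8") as f:
        f.write(xml)                  # each response is a complete XML document
    timestamps = re.findall(r"<timestamp>([^<]+)</timestamp>", xml)
    if len(timestamps) < BLOCK:
        break                         # fewer revisions than the block size: done
    offset = timestamps[-1]           # continue from the last timestamp seen; depending
    part += 1                         # on inclusivity, one revision per block may repeat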

Hi, I am trying to do the same thing, 100 at a time and then concatenating them for a full history - any way you could share the fixed export and python script? Thanks. Mloubser 11:52, 13 November 2007 (UTC)Reply
We shouldn't need limit=-n, should we? Isn't that what dir and limit should provide? My only problem, though, has been figuring out what offset to start with for a backward scan. --En.jpgordon 07:43, 27 January 2007 (UTC)Reply


Thanks for responding.

Mea culpa! I didn't even see "dir". Thanks.

The reason I wanted to look at recent history was to find at which edit a particular piece of vandalism happened, to see what got vandalized.

Is there a more straightforward way of looking for a particular word in the history? Thanks, Doug. Saintrain 04:46, 29 January 2007 (UTC)Reply
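
In case it helps, here is a rough sketch (Python 3) that scans one of the exported XML files for the first revision whose text contains a given word; the file name and the word are placeholders:
# Sketch: find the first revision in a Special:Export XML dump that contains a word.
import xml.etree.ElementTree as ET

DUMP = "Train_part000.xml"   # an XML file produced by Special:Export
WORD = "vandalism"           # the word to look for

def localname(tag):
    # Drop the XML namespace, e.g. "{http://...}revision" -> "revision".
    return tag.rsplit("}", 1)[-1]

found = False
for _, elem in ET.iterparse(DUMP):
    if localname(elem.tag) != "revision":
        continue
    fields = {localname(child.tag): (child.text or "") for child in elem}
    if WORD.lower() in fields.get("text", "").lower():
        print("First match: revision %s at %s" % (fields.get("id"), fields.get("timestamp")))
        found = True
        break
    elem.clear()   # keep memory use low on long histories
if not found:
    print("No revision in this dump contains the word.")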

Y'know, we almost have the tools to do that. The aforementioned history flow tool knows that information; I just don't think there's a way to glean it from it. --En.jpgordon 00:08, 2 February 2007 (UTC)Reply

Discussion

Hi, is there a way to get just the total number of edits an article has had over time? Thanks! — Preceding unsigned comment added by 87.196.51.250 (talk • contribs) 20:55, 20 September 2007

As far as I can remember there is no way to get only this number (but I might be wrong). Anyway, this number can probably be easily calculated using the appropriate parameters to the API. Tizio 10:02, 21 September 2007 (UTC)Reply
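
For the record, here is a sketch of the kind of API-based count the previous reply suggests (Python 3; the page title and User-Agent are placeholders): page through prop=revisions and add up the returned revisions.
# Sketch: count an article's revisions with the action API (prop=revisions).
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "revcount-sketch/0.1 (contact: example@example.org)"}

def count_revisions(title):
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "ids", "rvlimit": "max", "format": "json"}
    total = 0
    while True:
        req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params),
                                     headers=HEADERS)
        data = json.load(urllib.request.urlopen(req))
        for page in data["query"]["pages"].values():
            total += len(page.get("revisions", []))
        if "continue" not in data:
            return total
        params.update(data["continue"])   # follow the API's continuation token

print(count_revisions("Train"))           # placeholder title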

Parameters no longer in use?

Using either the links provided in the article, or attempting to add my own parameters, does not yield the desired results. I can only get the most recent version of the article, regardless of how I set the parameters. I've tried it on several computers running Linux or Windows, and at different IPs. Same problem: the parameters seem to be ignored. --Falcorian 06:59, 14 January 2008 (UTC)Reply

It has been suggested that I use curonly=0, but this also has no effect. --Falcorian
I also found that the links given did not work, nor did any experiments with creating my own URLs to get the history. However, submitting the parameters via a Ruby script did work. I don't know enough yet (about HTTP, HTML forms) to understand why this approach worked and the URL approach did not, but anyway, here is some code that successfully retrieves the last 5 changes to the page on Patrick Donner and writes the output to a file:
require 'net/http'
require 'uri'

res = Net::HTTP.post_form(URI.parse("http://en.wikipedia.org/w/index.php"),
  {:title => "Special:Export", :pages => 'Patrick_Donner', :action => "submit", :limit => 5, :dir => "desc"})
f = File.new("donner_output_last_5.txt", "w")
f << res.body
f.close

Hope this helps. I wish I knew enough to provide a more general solution. Andropod 00:44, 17 January 2008 (UTC)Reply

When you use the URL in a browser, you are submitting via GET. In the above Ruby script, you are using POST. This seems to be the solution; for example:
curl -d "" 'http://en.wikipedia.org/w/index.php?title=Special:Export&pages=Main_Page&offset=1&limit=5&action=submit'

worked for me. Before updating this page, I'd like to check this with the source code. Tizio 12:46, 21 January 2008 (UTC)Reply
Works for me as well, which is great! Now to crack open the python... --Falcorian 03:29, 26 January 2008 (UTC)Reply
For future reference, I get an access denied error when I try to use urllib alone in python to request the page. However, if I use urllib2 (which allows you to set a custom header), then we can trick Wikipedia into thinking we're Firefox and it will return the page as expected. --Falcorian 06:57, 26 January 2008 (UTC)Reply
import urllib
import urllib2

headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'} # Needs to fool Wikipedia so it will give us the file
params = urllib.urlencode({'title': 'Special:Export','pages': 'User:Falcorian', 'action': 'submit', 'limit': 2, })
req = urllib2.Request(url='http://en.wikipedia.org/w/index.php',data=params, headers=headers)
f = urllib2.urlopen(req)
print f.read()
This doesn't work for me. It doesn't stop at 2 versions. Neither does dir=desc work. --89.138.43.146 15:15, 25 July 2009 (UTC)Reply
I have tried all of the above with urllib2 and with getwiki.py but it seems to me that the limit parameter has stopped working? Is this the case? --EpicEditor 1:03, 30 September 2009 (UTC)

Other parameters

I found these parameters in the source code:

curonly
appears to override the other parameters and causes only the current version to be exported
listauthors
exports a list of contributors (?) if $wgExportAllowListContributors is true
wpDownload
returns result as a file attachment: http://en.wikipedia.org/w/index.php?title=Special:Export&pages=XXXX&wpDownload
templates
images
(currently commented out in the source code)

I don't know what listauthors does exactly, maybe it's disabled on wikien. Tizio 15:40, 21 January 2008 (UTC)Reply

Also, variable $wgExportMaxHistory is relevant here. Tizio 15:43, 21 January 2008 (UTC)Reply

Also missing: "history", which has a different meaning when used in a POST request (default values are used for dir and offset, and $wgExportMaxHistory for limit). Tizio 15:49, 21 January 2008 (UTC)Reply
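
To tie the above together, here is a hedged example of how these POST-only parameters might be combined (Python 3; the title, file name and User-Agent are placeholders, and whether listauthors or the full history is honoured depends on the target wiki's configuration):
# Sketch: POST to Special:Export using the parameters listed above.
import urllib.parse
import urllib.request

EXPORT = "https://en.wikipedia.org/w/index.php?title=Special:Export"
HEADERS = {"User-Agent": "export-sketch/0.1 (contact: example@example.org)"}

params = {
    "pages": "Train",        # newline-separated list of titles
    "action": "submit",
    "history": "1",          # in a POST, ask for the full history (up to $wgExportMaxHistory)
    "templates": "1",        # also include templates used by the page
    "wpDownload": "1",       # ask for the result as a file attachment
    # "listauthors": "1",    # only honoured if $wgExportAllowListContributors is true
    # "curonly": "1",        # would override the above and return only the current revision
}
body = urllib.parse.urlencode(params).encode("utf-8")
req = urllib.request.Request(EXPORT, data=body, headers=HEADERS)
with open("train_export.xml", "wb") as f:
    f.write(urllib.request.urlopen(req).read())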

Recursive downloading

Hi, is there some way (without writing my own script) to recursively download the subcategories inside of the categories? I don't want to download the whole wikipedia database dump to get the 10,000 or so pages I want. Thanks, JDowning 17:32, 13 March 2008 (UTC)Reply
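
I don't know of a built-in recursive option, but if a script turns out to be acceptable after all, here is a rough sketch (Python 3) that walks subcategories via the API's list=categorymembers and collects page titles; the starting category, depth limit and User-Agent are placeholders, and the resulting titles can then be fed to Special:Export as in the examples above.
# Sketch: recursively collect page titles from a category and its subcategories.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "catwalk-sketch/0.1 (contact: example@example.org)"}

def members(category):
    # Yield (title, namespace) pairs for one category via list=categorymembers.
    params = {"action": "query", "list": "categorymembers", "cmtitle": category,
              "cmlimit": "max", "format": "json"}
    while True:
        req = urllib.request.Request(API + "?" + urllib.parse.urlencode(params),
                                     headers=HEADERS)
        data = json.load(urllib.request.urlopen(req))
        for m in data["query"]["categorymembers"]:
            yield m["title"], m["ns"]
        if "continue" not in data:
            return
        params.update(data["continue"])

def collect(category, depth=3, seen=None):
    # Collect article titles, descending into subcategories up to `depth` levels.
    seen = set() if seen is None else seen
    pages = []
    for title, ns in members(category):
        if ns == 14 and depth > 0 and title not in seen:   # 14 = Category namespace
            seen.add(title)
            pages.extend(collect(title, depth - 1, seen))
        elif ns != 14:
            pages.append(title)
    return pages

print("\n".join(collect("Category:Phrases")))   # placeholder starting category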

Export has changed

All the examples (and my script which has worked for months) return only the newest version now. Anyone have ideas? --Falcorian 05:58, 24 September 2008 (UTC)Reply

disable Special:Export to users non-sysop

Hello !

What's the best way to disable Special:Export for some user groups? Thanks--almaghi 14:40, 27 April 2009 (UTC)

See the main page; you have to change LocalSettings.php. Rumpsenate 16:32, 15 July 2009 (UTC)Reply
Changing it in LocalSettings.php only allows you to disable the export function entirely, including for sysops.
My solution isn't the prettiest, but it works.
Edit includes/specials/SpecialExport.php
In the execute function add
//Only allow sysops to access Export page.
if(!in_array("sysop", $GLOBALS['wgUser']->getGroups())) {
	echo 'Admins only';
	exit;
}
After the globals are declared.
This solution is quite ugly, and will also get overwritten each time you update your MediaWiki installation.
Full modified file can be viewed at https://gist.github.com/Mastergalen/6405904
--Mastergalen (talk) 17:26, 1 September 2013 (UTC)Reply

Bug report on special export

[2]

This might be a misunderstanding. The description says "if the history parameter is true, then all versions of each page are returned." Rumpsenate 16:13, 15 July 2009 (UTC)Reply

How to

I am trying to export en:Train to the Navajo Wikipedia (nv:Special:Import) as a test, but I am not having any luck. I don’t know much about commands or encoding, and I’m not certain that nv:Special:Import is properly enabled. I typed Train in the Export textbox and pressed export, and it opened a complex-looking page in my Firefox, but I can’t figure out what to do next. Nothing appears in nv:Special:Import. What am I missing? Stephen G. Brown 14:38, 17 September 2009 (UTC)

I want to limit how much history I get, but the link in the article doesn't work. 24-megabyte history too long for Wikia to upload

It says it imports the last 1000 edit histories. That results in a 24-megabyte file for Total Annihilation! Wikia cannot import a file that large, so I need something smaller it can handle. I tried the link examples in the article, but they don't work. Click on either of them, and you'll see that neither produces the results it is supposed to. Anyone have any ideas how I can do this? I became the administrator of the Total Annihilation Wikia and am trying to import the edit histories over. Dream Focus 11:11, 25 November 2009 (UTC)Reply

The article does not mention if any request throttling should be implemented.

What kind of request throttling should be used to avoid a block? --Dc987 21:59, 22 March 2010 (UTC)Reply
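
The article doesn't say, but a conservative client-side pattern (a sketch, not official guidance) is to make requests one at a time, pause between them, send a descriptive User-Agent, and back off when the server answers with a Retry-After header:
# Sketch: conservative client-side throttling for repeated Special:Export requests.
# Python 3, standard library only; the delay values and User-Agent are placeholders.
import time
import urllib.error
import urllib.request

HEADERS = {"User-Agent": "export-sketch/0.1 (contact: example@example.org)"}
PAUSE = 2.0          # seconds between requests; pick something polite

def fetch(url, data=None, retries=3):
    # GET/POST with a fixed pause and a Retry-After-aware backoff.
    for attempt in range(retries):
        try:
            req = urllib.request.Request(url, data=data, headers=HEADERS)
            resp = urllib.request.urlopen(req)
            time.sleep(PAUSE)            # fixed pause after every successful request
            return resp.read()
        except urllib.error.HTTPError as e:
            if e.code in (429, 503):     # server asks us to slow down
                # assumes Retry-After is given in seconds rather than as a date
                wait = int(e.headers.get("Retry-After", "30"))
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("giving up after repeated throttling responses")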

Revisionid as a parameter?

I guess there is presently no way to use revisionid as a parameter? Tisane 01:54, 16 May 2010 (UTC)Reply

It should be bundled, I think. --Diego Grez return fire 01:56, 16 May 2010 (UTC)Reply
What do you mean by "bundled"? Tisane 03:00, 16 May 2010 (UTC)Reply

Using History Flow with Special Export page

Hi, I am using History Flow (IBM) to see the history of edits to an article. It is designed to get data from Special:Export. However, the Special:Export page makes "only current revision" the automatic choice.

How can I get the entire history of an article at this stage? Can I change the settings of Special:Export?

Thanks for help! Zeyi 15:44, 24 May 2010 (UTC)Reply

Does anyone have an answer to this old question? 193.190.244.18 16:01, 8 December 2010 (UTC)Reply

New parameters

I did not include either pagelink-depth or images in my recent updates to the article because pagelink-depth appears to be broken (I know little about PHP, but $listdepth does not appear to be initialized in any file), and, as stated in the code itself, images is deliberately disabled for the time being. I had typed up a description for pagelink-depth before realizing it didn't work anywhere, so I've included it below in case it's wanted in future.

pagelink-depth
Includes any linked pages to the depth specified. Hard-coded to a maximum depth of 5 unless overridden by $wgExportMaxLinkDepth.

– RobinHood70 10:22, 22 November 2010 (UTC)Reply

Actually, I've confirmed that this does work, and the wonderful people at bugzilla corrected the misperception of $listdepth not being initialized (it's initialized in the If statement - changed in a later revision to be more clear). Given that, I'm copying this back onto the page. – RobinHood70 talk 20:12, 22 November 2010 (UTC)Reply
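
For reference, a hedged example of what using it might look like in a POST (Python 3; the title and depth are placeholders, and it only has an effect where $wgExportMaxLinkDepth permits a non-zero depth):
# Sketch: ask for a page plus the pages it links to, two levels deep.
import urllib.parse
import urllib.request

EXPORT = "https://en.wikipedia.org/w/index.php?title=Special:Export"
HEADERS = {"User-Agent": "export-sketch/0.1 (contact: example@example.org)"}

body = urllib.parse.urlencode({"pages": "Train", "action": "submit",
                               "pagelink-depth": "2", "curonly": "1"}).encode("utf-8")
req = urllib.request.Request(EXPORT, data=body, headers=HEADERS)
with open("train_linked.xml", "wb") as f:
    f.write(urllib.request.urlopen(req).read())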

curl example did not work with slash in name

The example in the article did not work with a slash in the name (Österreichischer_Fußball-Cup_2013/14), nor with non-Latin characters. Expanding it to curl --data-urlencode "&pages=<PAGE>&offset=1&action=submit" http://de.wikipedia.org/w/index.php?title=Special:Export -o <PAGE>.xml also did not solve the problem. Any ideas?
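
For what it's worth, here is a Python 3 sketch that percent-encodes the whole title, slash and non-Latin characters included, before POSTing it; the page title and output file are placeholders:
# Sketch: export a page whose title contains a slash and non-Latin characters.
# urllib.parse.urlencode percent-encodes the UTF-8 bytes of the title, including "/" and "ß".
import urllib.parse
import urllib.request

EXPORT = "https://de.wikipedia.org/w/index.php?title=Special:Export"
HEADERS = {"User-Agent": "export-sketch/0.1 (contact: example@example.org)"}
PAGE = "Österreichischer_Fußball-Cup_2013/14"

body = urllib.parse.urlencode({"pages": PAGE, "action": "submit",
                               "offset": "1"}).encode("utf-8")
req = urllib.request.Request(EXPORT, data=body, headers=HEADERS)
with open("export.xml", "wb") as f:
    f.write(urllib.request.urlopen(req).read())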

Transfer to HTML

Dear All,
How can I export a complete wiki into a set of static HTML pages?
Thank you in advance!
Have a nice weekend.
Ciciban (talk) 08:04, 5 November 2016 (UTC)Reply

The best technology available is mwoffline, which however requires Parsoid. Otherwise, there are a series of hacks... But this is a topic for Manual:Backing up a wiki. Nemo 08:12, 5 November 2016 (UTC)Reply

History parameter is not working always

I used two links, https://en.wikipedia.org/w/index.php?title=Special:Export&pages=US_Open_(tennis)&history=1&action=submit, which was giving all the history, and https://en.wikipedia.org/w/index.php?title=Special:Export&pages=India&history=1&action=submit, which was not, giving an error instead. Any help please? — Preceding unsigned comment added by Srinadhu (talk • contribs) 05:48, 13 June 2017

custom Special:Export and API

On pl.wiktionary.org, Special:Export (and the relevant API), instead of returning a page with the XML title tag <title>Dyskusja wikipedystki:Zu</title>, returns <title>Dyskusja wikipedysty:Zu</title>. It took me one night!!! to realize that nothing was wrong in my Python script using the APIs. Even if this is a custom change in Special:Export, made by contributors on pl.wiktionary, the API should stay common across all WMF projects. If pl.wiktionary changed something, then the API "should find" another way to get the results. --Xoristzatziki (talk) 08:36, 16 June 2017 (UTC)Reply

Even more amusing, in the API, it goes the other way around and normalizes "wikipedysty" to "wikipedystki". It's obviously a normalization issue of some kind; you might want to report this as a bug in Phabricator. – Robin Hood  (talk) 17:15, 16 June 2017 (UTC)Reply
This is about gender namespaces, by the way. I don't think it has anything to do with the specific wiki, just with the locale. --Nemo 19:23, 16 June 2017 (UTC)Reply
OK. This seems to be a conflict between what the MediaWiki API sees as the "title" in Special:Export and what it sees as the "title" in all other API "query" modules (recentchanges, moves, etc.), independently of the site. Thanks --Xoristzatziki (talk) 18:09, 22 June 2017 (UTC)Reply
By the way, Special:Export returns the real page name, so it does not have to be discussed here. --Xoristzatziki (talk) 18:13, 22 June 2017 (UTC)Reply

"Keep in mind that that exporting is still possible, if you have the API enabled."

How would you disable it, now that $wgEnableAPI has been removed? MW131tester (talk) 10:32, 8 February 2019 (UTC)Reply

The API is separate from Special:Export. Julia2661 (talk) 15:01, 8 February 2022 (UTC)Reply

Reason for HTTP GET restriction?

Why are the three URL parameters, as described in § URL parameter requests do not work, only supported through HTTP POST, and not HTTP GET?

What is the technical reason for that limitation? --17:39, 1 November 2020 (UTC)

Explanation

Would you mind explaining this? If individual pages and revisions are downloaded via the API, that is not Special:Export. --Julia2661 (talk) 14:59, 8 February 2022 (UTC)Reply

  • The text just points out that you can also "export" pages through the API, so you should not feel that limiting the use of Special:Export on your wiki will fully stop people from extracting the contents. While perhaps not precise, "export" is still a valid way of describing the effect. If you want to clarify the terminology, feel free, but it's a useful cautionary note and should not just be entirely erased. --Clump (talk) 15:23, 8 February 2022 (UTC)Reply
    I see. Thanks for the explanation. Julia2661 (talk) 15:50, 8 February 2022 (UTC)Reply