
Extension talk:Data Transfer/Archive 2014 to 2018

From mediawiki.org

Issues with CSV upload

Hello,

I'm running MediaWiki 1.31.1 on Mac OS and installed the "Data Transfer" extension according to the instructions. From the "cheese" example I generated a CSV file and imported it via the special page "Import CSV". The import worked well. Four pages were generated.

After this initial test, I deleted the four newly created 'cheese' pages and uploaded another CSV file. After importing this, the four 'cheese' pages I had just deleted reappeared. I deleted them again, but after uploading another file they showed up once more.

Could you please be so kind and give advice? Thanks!

That's very odd... Data Transfer uses MediaWiki "jobs" to do all the page creation, so my only guess is that these jobs are not getting properly deleted after being run. If you have command-line access, try going to the main wiki directory and calling "php maintenance/runJobs.php", and see what happens. If you run it more than once and the same jobs get run, that's definitely a problem. Yaron Koren (talk) 21:47, 13 November 2018 (UTC)
Thanks for your reply. I needed some time to figure out how to work the Mac OS terminal. As advised, I moved to the root of my wiki and entered "php maintenance/runJobs.php". After calling it repeatedly, I received this:
[73e0b5108ad818949907aa8b] [no req]   JobQueueConnectionError from line 754 of .../Wiki/includes/jobqueue/JobQueueDB.php: DBConnectionError:Cannot access the database: No such file or directory (localhost)
Backtrace:
#0 .../Wiki/includes/jobqueue/JobQueueDB.php(608): JobQueueDB->getReplicaDB()
#1 .../Wiki/includes/jobqueue/JobQueue.php(657): JobQueueDB->doGetSiblingQueuesWithJobs(array)
#2 .../Wiki/includes/jobqueue/JobQueueGroup.php(364): JobQueue->getSiblingQueuesWithJobs(array)
#3 .../Wiki/includes/jobqueue/JobQueueGroup.php(248): JobQueueGroup->getQueuesWithJobs()
#4 .../Wiki/includes/jobqueue/JobRunner.php(167): JobQueueGroup->pop(integer, integer, array)
#5 .../Wiki/maintenance/runJobs.php(89): JobRunner->run(array)
#6 .../Wiki/maintenance/doMaintenance.php(94): RunJobs->execute()
#7 .../Wiki/maintenance/runJobs.php(122): require_once(string)
Any thoughts on this? Thanks!
This is not a Data Transfer issue - I did a web search on that error message and found this, among other things. Does that help? Yaron Koren (talk) 15:20, 21 November 2018 (UTC)
Hello Yaron, I followed your hint and started a Google search as well and found some possible solutions (1, 2, 3, 4, 5, 6, 7, 8). It seems that some adjustments in LocalSettings.php may be of help:
$wgDBserver = "localhost:/Applications/MAMP/tmp/mysql/mysql.sock";,
$wgDBserver = "127.0.0.1"; or
$wgDBserver = "127.0.0.1:8889";.
None of these worked for me. Unfortunately this type of work is beyond my competence. Since I also have problems with some other extensions (see here and here), it might be a specific problem with my system. Thanks!

Fixed problem with processing Utf-8 characters

When I tried to import Unicode characters, I was getting garbage. This problem was fixed by changing the file "./extensions/DataTransfer/specials/DT_ImportCSV.php"

I commented out the code between lines 119 and 129 and changed it to the following:

		} else {
			while ( $line = fgetcsv( $csv_file, 0, '|' ) ) {
				// Convert from UTF-8 to ASCII - htmlentities()
				// fails for UTF-8 if there are non-ASCII
				// characters.
				//$convertedLine = array();
				//foreach ( $line as $value ) {
				//	$convertedLine[] = mb_convert_encoding( $value, 'UTF-8', 'ASCII' );
				//}
				//array_push( $table, $convertedLine );
				array_push( $table, $line );
			}
		}

After the line is read with "fgetcsv", it does not need any additional processing. MediaWiki can handle a line with Unicode text fine. I also changed the delimiter character, because I was having issues with the comma character (it was too common in the data I was importing). That should actually be a GUI option, but that is another issue. Zzmonty (talk) 20:25, 7 August 2018 (UTC)

I can confirm the bug. I have German umlauts in a CSV file, correctly saved in UTF-8, but the 'ä' gets encoded as 'Ã¤'. With the above commented out, it works. --Krabina (talk) 12:22, 4 March 2019 (UTC)
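Note: the symptom described in this thread is what typically happens when UTF-8 bytes are reinterpreted in a single-byte encoding. The Python sketch below only illustrates that effect outside MediaWiki. It assumes the wrong encoding is Latin-1, which the thread does not confirm, and the function names are made up for this example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1.

    Latin-1 is an assumption here; it is one common way to get this kind of garbage.
    """
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_mojibake(garbled: str) -> str:
    """Reverse the misreading above: back to bytes via Latin-1, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


sample = "Käse"
garbled = misread_utf8_as_latin1(sample)
print(garbled)                                    # each non-ASCII character becomes two characters
print(repair_latin1_mojibake(garbled) == sample)  # the round trip restores the original
```

If text on an imported page can be repaired this way, that points to an encoding mix-up during import rather than a badly saved file.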

MWException

Hi,

I have previously tried to solve this problem here, and it was suggested that I contact you to see if I can get it resolved. Many thanks,

--Budstellabecks (talk) 15:03, 3 January 2014 (UTC)


My wiki is currently bugged. I think it may have something to do with the same info being submitted twice as a csv import.

As a side effect I'm unable to save pages as well as a few other things.

MediaWiki 1.20.4 PHP 5.3.3 (cgi-fcgi) MySQL 5.0.92-50-log

With $wgShowExceptionDetails = true; set, I get the error shown below.

Thanks in advance for any info

Internal error

The Title object did not provide an article ID. Perhaps the page doesn't exist?

Backtrace:

#0 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/parser/ParserOutput.php(390): LinksUpdate->__construct(Object(Title), Object(ParserOutput), true)
#1 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/WikiPage.php(1749): ParserOutput->getSecondaryDataUpdates(Object(Title))
#2 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/WikiPage.php(1617): WikiPage->doEditUpdates(Object(Revision), Object(User), Array)
#3 [internal function]: WikiPage->doEdit('{{Individual Mo...', 'Add more months...')
#4 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/Article.php(1820): call_user_func_array(Array, Array)
#5 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/extensions/DataTransfer/includes/DT_ImportJob.php(77): Article->__call('doEdit', Array)
#6 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/extensions/DataTransfer/includes/DT_ImportJob.php(77): Article->doEdit('{{Individual Mo...', 'Add more months...')
#7 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/Wiki.php(594): DTImportJob->run()
#8 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/Wiki.php(556): MediaWiki->doJobs()
#9 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/includes/Wiki.php(447): MediaWiki->restInPeace()
#10 /websites/123reg/LinuxPackage23/yo/ur/si/yoursitehost.co.uk/public_html/partymixers/manager/index.php(59): MediaWiki->run()
#11 {main}
Hi - I would delete the contents of your "job" database table - that should at least take care of these current problems. Then, if you're ready to do the import thing again, it might work better now that you've updated all the software... but if you get the same problems, please let me know. Yaron Koren (talk) 16:07, 3 January 2014 (UTC)

Import CSV error, line break SOLVED

If you use Mac OS X, make sure you save your CSV file with Windows line breaks. I have Excel and TextWrangler (a text editor). I used Excel to edit the CSV file, then reopened the file in TextWrangler and used "Save As", specifying Windows line breaks (CRLF) (as per the CSV specification linked to on the extension page) rather than Classic Mac (CR).

--Tommyheyser (talk) 22:19, 9 March 2014 (UTC)

That's very helpful! It's odd for a Linux open-source system to use the DOS convention, but I suppose they do it deliberately.

In vi you need to type:

%s/^M/^M^J/

Incidentally, this isn't the right place to ask, but, is there a way to have vi instead of the editor mediawiki uses as a standard?

Fustbariclation (talk) 05:13, 10 March 2014 (UTC)

Tommyheyser - thanks for diagnosing that issue; I just added a note about that to the documentation. Fustbariclation - I believe CSV files saved with both Windows and Linux are handled fine; it's only Mac (apparently) that requires special treatment. (It could well be that the Data Transfer code could be modified to handle the Mac style as well; I haven't looked into it.) Yaron Koren (talk) 21:54, 10 March 2014 (UTC)
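Note: for anyone without an editor that can re-save with Windows line breaks, the conversion described in this thread can also be scripted. The following is a minimal Python sketch, not part of Data Transfer. The function name is made up, and it treats every CR or LF as a line break, so line breaks inside quoted CSV fields would be converted too.

```python
def to_crlf(data: bytes) -> bytes:
    """Normalise Classic Mac (CR), Unix (LF) and Windows (CRLF) line breaks to CRLF."""
    # Collapse everything to LF first so that existing CRLF pairs are not doubled,
    # then expand every LF to CRLF.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


classic_mac = b"Title,Free Text\rCheddar,needs more data\r"
print(to_crlf(classic_mac))
```

To convert a file, read it in binary mode, pass the bytes through to_crlf(), and write the result to a new file in binary mode so that Python does not translate the line endings again.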

How to add text at the top of the page while keeping current page content?

Thanks,

Nicolas NALLET (talk)

If you choose the "append" option, it will add content to the bottom of the page; there's no way to add content to the top of the page, as far as I know, but you could probably easily change the "append" behavior in the code. Yaron Koren (talk) 14:40, 25 March 2014 (UTC)
OK, thanks Yaron, it works. In the file DataTransfer/includes/DT_ImportJob.php, line 61,

I have replaced

$text = $existingText . "\n" . $text;

by

$text = $text . "\n" . $existingText;

Nicolas NALLET (talk)

Extension breaking /mw-config/ [SOLVED]

Hi Yaron,

Firstly, I must say that I really admire your work and use several of your extensions. Bravo!

Two years ago I used DataTransfer successfully on an earlier incarnation of my website. I was then using it with the Semantic bundle which is impressive, but the bundle was unsuited to my needs and I no longer use it.

Today I upgraded my site to Mediawiki 1.24 and coincidentally installed the DataTransfer extension soon afterwards. I have been trying to get Special:ImportCSV and Special:ViewXML to work but without success so far. Both just seem to hang. Import CSV responds with "Importing... 2 pages will be created from the CSV file" but it does not create either of the new pages. Two years ago Special:ImportCSV worked okay.

Later, when I visited the config page - eg. http://www.example.com/mw/mw-config/index.php - I found that DataTransfer did cause an error:


Warning: include_once(/languages/DT_LanguageEn.php) [function.include-once]: failed to open stream: No such file or directory in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 98

Warning: include_once() [function.include]: Failed opening '/languages/DT_LanguageEn.php' for inclusion (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 98

Fatal error: Class 'DT_LanguageEn' not found in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 102


I am hopeful that the /mw-config/ related error may in some way be symptomatic of the reason that I cannot get the extension to work.

Thanks in advance for any advice.

Regards, Cruickshanks (talk) 09:20, 26 September 2014 (UTC)

Are you using the latest version of Data Transfer, 0.6? The line numbers make it seem like you're using an older version. Yaron Koren (talk) 12:45, 26 September 2014 (UTC)
Yaron, I have just found that the Download Snapshot, using the Special:ExtensionDistributor for Mediawiki Release 1.23, was downloading Version 0.5. I have now downloaded Version 0.6 using the adjacent (Git master) link in the Infobox. It is still breaking Config however, but with different line numbers now:
Warning: include_once(/languages/DT_LanguageEn.php) [function.include-once]: failed to open stream: No such file or directory in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 106
Warning: include_once() [function.include]: Failed opening '/languages/DT_LanguageEn.php' for inclusion (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 106
Fatal error: Class 'DT_LanguageEn' not found in /home/mysite/public_html/mw/extensions/DataTransfer/DataTransfer.php on line 110
Special:ViewXML is now working with Version 0.6 installed. I have not tried ImportCSV as I have a lengthy BOT running at present, but my 'spider senses' guess that ImportCSV will probably work now.
By the way, I incorrectly said in my first post I had upgraded to 1.24; I meant to say I had upgraded to 1.23.4.
Regards, Cruickshanks (talk) 15:43, 26 September 2014 (UTC)
Oh, okay. I looked through my emails and realized that I had an email discussion with someone about this very same problem a few weeks ago. We weren't able to figure out the underlying cause, but were able to fix it by just commenting out line 77 of DataTransfer.php, which reads "dtfInitContentLanguage($wgLanguageCode);". It's fine to do because the language stuff is only used for a few minor features, like XML grouping, that you're probably not using. Yaron Koren (talk) 16:40, 26 September 2014 (UTC)
Thanks Yaron, that fixed it. Regards, Cruickshanks (talk) 18:49, 26 September 2014 (UTC)

Multi-instance templates not preserved in output

If you import a file and update fields in the base template of a page, it removes any multi-instance templates embedded within and replaces them with the text "Array". This can be solved quite easily in DT_PageStructure.php:

public function toWikitext() {
	if ( $this->mIsTemplate ) {
		$wikitext = '{{' . $this->mTemplateName;
		foreach ( $this->mFields as $fieldName => $fieldValue ) {
			if ( is_numeric( $fieldName ) ) {
				$wikitext .= '|' . $fieldValue;
			} else {
				// ---- fix for internal multi-instance templates
				if ( is_array( $fieldValue ) ) {
					$wikitext .= "\n|$fieldName=";
					foreach ( $fieldValue as $arrayItem ) {
						if ( is_a( $arrayItem, 'DTPageComponent' ) ) {
							$wikitext .= $arrayItem->toWikitext();
						}
					}
				} else {
					$wikitext .= "\n|$fieldName=$fieldValue";
				}
				// -- end fix
			}
		}
		$wikitext .= "\n}}";
		return $wikitext;
	} else {
		return $this->mFreeText;
	}
}

I tried this on my wiki and it didn't work for me. It blanks all of the pages that already exist on my ImportCSV list. --V brooks (talk) 16:48, 17 May 2019 (UTC)

XML Import Only Imports 2 Pages

Hello:

I have an XML file coded to create 80 new pages.

When I try to do the import, I get a message saying:

Importing...
2 pages will be created from the XML file.

How do I create all 80 pages at once?

Thank you,

--Patricia Barden (talk) 20:37, 6 January 2015 (UTC)

I'm guessing that there's some issue with the XML. Could you pastebin it or something? Yaron Koren (talk) 22:47, 6 January 2015 (UTC)
Here is a link to the XML file on Google Drive: https://drive.google.com/file/d/0BxAhvHwTqRzPTVRjbEFaVTVkRTg/view?pli=1
By the way, I'm using MediaWiki 1.23.6 and Data Transfer 0.6
Thank you so much for taking a look.--Patricia Barden (talk) 02:03, 7 January 2015 (UTC)
Ah, great. I'm guessing the problem is the quotes in the page title of the 3rd page - those need to be escaped, like with \" or something. Yaron Koren (talk) 02:10, 7 January 2015 (UTC)
I escaped the quote marks in the Page Title attribute by using &quot;
E.g., <Page Title="Mary P. &quot;Prissy&quot; Hickerson">
After doing that for all Page Title attributes, the pages were created in a flash.
It was not necessary, however, to escape the quote marks in the Field Name attribute values. This worked just fine:
E.g., <Field Name="Name">Mary P. "Prissy" Hickerson</Field>
Thought you'd like to know. Thank you again! --Patricia Barden (talk) 16:49, 7 January 2015 (UTC)
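Note: when the XML is generated by a script, the escaping described above can be done automatically. This is a small Python sketch using the standard library. The helper name page_open_tag is made up for illustration, and only the Page element and Title attribute come from this thread.

```python
from xml.sax.saxutils import escape


def page_open_tag(title: str) -> str:
    """Build an opening Page tag with the title safely escaped for a double-quoted attribute."""
    # escape() handles &, < and >; the extra mapping also turns " into &quot;
    safe_title = escape(title, {'"': "&quot;"})
    return '<Page Title="%s">' % safe_title


print(page_open_tag('Mary P. "Prissy" Hickerson'))
# prints: <Page Title="Mary P. &quot;Prissy&quot; Hickerson">
```

Element text, such as the content of a Field element, only needs plain escape(), which matches the observation above that quotes inside the element content did not have to be escaped.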

Nothing exported

I just get the page with:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

And nothing else:


Product Version MediaWiki 1.24.2 PHP 5.5.22 (cgi-fcgi) MySQL 5.5.42-cll


I'm actually trying to move properties from one SMW to another - 'export pages' doesn't seem to export properties, even when mentioned specifically.

Fustbariclation (talk) 09:25, 11 June 2015 (UTC)

Are you trying to create a "copy" of property pages, pages with properties on them, or maybe both? If so, you should be fine when you use Export pages. This will export the actual page content; it does not matter what is on them. You have to make sure you export everything you need for it to work on the other wiki, including categories and templates. This is why categorising pages is important :) it makes duplicating pages easier. When you import the resulting XML on your destination wiki, it will convert the page content back to properties and values. Regards, --Felipe (talk) 09:50, 11 June 2015 (UTC)
When testing SMW, MW's Import/Export XML is used to import property/value declarations from another wiki. See also swm@org Export content --MWJames (talk) 11:05, 11 June 2015 (UTC)

Nothing imported

Hello,

Each time I try to import pages, it simply does not work. It says "0 pages will be created from the XML file". MediaWiki version 1.24.1.

Here is a pastebin in order to see if you can help me.

http://pastebin.com/kdS3K2Ci

Thanks.

P.S.: Could it be because of the language? Perhaps it simply does not understand Spanish, but that would be really strange, as the extension itself gave that XML to me.

--00Maiser00 (talk)

I think I made a bad mistake by having the XML field names be translatable, instead of just always being in English. In the case of Spanish, I think the issue is that the translation for "ID" is "Id." - I think the "." is causing the problem. It could be that the accents in "Páginas", etc. are also a problem. If it's not too much trouble, I would suggest replacing all the Spanish field names with their English-language equivalents (you can find them here), which will hopefully work. And I probably need to change that internationalization thing. Yaron Koren (talk) 17:11, 3 August 2015 (UTC)

I just made a program to translate all these tags from Spanish to English, and it is still not working. It may be because I have entered some tag incorrectly. Could you provide me with a ViewXML output in English where I can see all the elements, please?

Moreover, this is what is actually not working for me (as an example, trying to import forms).

http://pastebin.com/ZFFULXdi

Could it be because it is trying to read in Spanish, but does not understand it? I have tried with other pieces of XML (simple ones) and it is still not importing, no matter which language I use.


--00Maiser00 (talk)

It should be "Free_Text", not "Free Text" - maybe that's the issue. Yaron Koren (talk) 00:21, 17 August 2015 (UTC)

I changed that and it is still not working. Could you send me some text that is confirmed to work, so I can test whether the problem is the syntax or simply the import failing when the wiki is in another language?

--00Maiser00 (talk)

Any change yet for the language? If it is related to vocabulary, I guess it concerns something that cannot be changed in the generated XML document. Perhaps it is because of "user hidden" vocabulary in the import or something? I am pretty surprised that I am the only non-English user talking about this issue :S.

EDIT: I can confirm the problem is not solved by changing the vocabulary to English. I tried importing elements from your example page: http://discoursedb.org/wiki/Special:ViewXML and it still imported 0 pages.--00Maiser00 (talk)

Error when trying to import categories

Hi,

This time, when I choose categories in ViewXML, I get an error saying:

This page contains the following errors:

error on line 2 at column 1: Extra content at the end of the document Below is a rendering of the page up to the first error.

Nothing more is shown, even with $wgShowExceptionDetails=true;

What could be the cause? --00Maiser00 (talk)

Does that happen for any category? Yaron Koren (talk) 13:32, 2 September 2015 (UTC)

Deprecated : Methods with the same name as their class when used with PHP 7

Hello. During testing under PHP 7 you get a deprecated warning because of 2 methods with the same name as their class. Filed a bug report for that with a patch. I hope it is done correctly. Regards, --Felipe (talk) 13:04, 25 September 2015 (UTC)

Special:ImportCSV No such special page

Hi. Thanks in advance for any suggestions.

Here is what I do:

1) Do a fresh MediaWiki 1.26 install.
2) Install mediawiki/semantic-media-wiki "2.3.1" with Composer.
3) Install mediawiki/semantic-forms "3.5" with Composer.
4) Install mediawiki/semantic-result-formats "2.3" with Composer.
5) Install mediawiki/data-transfer "dev-master" with Composer.
6) Add include_once( "$IP/extensions/DataTransfer/DataTransfer.php" ); at the end of LocalSettings.php.
7) Run php update.php.
8) Resolve an error in DataTransfer.php, namely "undefined variable wgScriptPath", by copying $wgScriptPath = "/mw126"; from LocalSettings.php.

But the extension does not appear in Special:Version which is confirmed by Special:ImportCSV "No such page". Tried Mediawiki 1.23 but same problem. Thanks.

Setup Details XAMPP for Windows 7.0.3

Product Version MediaWiki 1.26.2 PHP 7.0.3 (apache2handler) MariaDB 10.1.10-MariaDB

Composer.json:

"require": {
	"composer/semver": "1.0.0",
	"cssjanus/cssjanus": "1.1.1",
	"ext-iconv": "*",
	"liuggio/statsd-php-client": "1.0.16",
	"oyejorge/less.php": "1.7.0.9",
	"mediawiki/at-ease": "1.1.0",
	"oojs/oojs-ui": "0.12.12",
	"php": ">=5.3.3",
	"psr/log": "1.0.0",
	"wikimedia/assert": "0.2.2",
	"wikimedia/cdb": "1.3.0",
	"wikimedia/composer-merge-plugin": "1.3.0",
	"wikimedia/ip-set": "1.0.1",
	"wikimedia/utfnormal": "1.0.3",
	"wikimedia/wrappedstring": "2.0.0",
	"zordius/lightncandy": "0.21",
	"mediawiki/semantic-media-wiki": "2.3.1",
	"mediawiki/semantic-forms": "3.5",
	"mediawiki/semantic-result-formats": "2.3",
	"mediawiki/data-transfer": "dev-master"
},

Can Data Transfer be installed/included via Composer? I'm not sure it can - maybe that's the issue. I would try removing it from composer.json, and installing/including it manually. Well, you're already (also) including it manually. Yaron Koren (talk) 04:55, 10 March 2016 (UTC)

Yes it does install via Composer. [1]
Yes removing it from composer.json solved the problem.
Thank you.

Possible to edit only parts

I want to use Data Transfer to upload a CSV file to write into my SemanticFormular. Could I create the page with a first CSV upload containing the most important information, and fill the articles with more information later with a second upload, without blanking/overwriting the existing information? --Tester of Datatransfer (talk) 15:06, 27 April 2016 (UTC)

You have a great username! Unfortunately, no - the Data Transfer import is not "smart". You would probably have to do that kind of import with a bot/script. Yaron Koren (talk) 19:34, 27 April 2016 (UTC)

Feature request evaluation level (nested templates problem)

Hi,

using Data Transfer 0.6.1 the problem arises that in the 1st template level the value of a parameter can have further nested templates (2nd level). The 2nd level is also transformed to xml but embedded in partially Wiki-parsed HTML from the 1st level value, that is: plain text becomes <p> (or <ul>, <dl> etc.), the value is then regarded as Wiki code to be parsed. See the XML output examples. Without the nesting templates it would transform to:

<1stLevelTemplateName>
  <1stLevelParameter>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam; semicolon keeps plain text</1stLevelParameter>
</1stLevelTemplateName>

But with nesting templates it parses each 1st-level-value partially that is split by a 2nd-level-template:

<1stLevelTemplateName>
  <1stLevelParameter>
    <Free_Text id="1">&lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam &lt;/p&gt;</Free_Text>
    <2ndLevelTemplateName>
      <2ndLevelParameter>Lorem ipsum</2ndLevelParameter>
    </2ndLevelTemplateName>
    <Free_Text id="1">&lt;dl&gt;&lt;dt&gt; starting with a semicolon becomes a definition list&lt;/dt&gt;&lt;/dl&gt;</Free_Text>
  </1stLevelParameter>
</1stLevelTemplateName>

Why is the value in the 1st level (that has nested templates in it) partially wiki parsed? Could there be a feature to not evaluate that deep? Do you know what I mean? I did not expect that the 1st-level value, that embeds 2nd-level-templates, is parsed partially. Any solution to that?

Kind regards --Andreas P. 11:36, 18 May 2016 (UTC)

That sounds like a bug. Yaron Koren (talk)
Well, possibly not: I saw just now that there are $wgDataTransferViewXMLParseFields and $wgDataTransferViewXMLParseFreeText that prevent parsing. The default setting $wgDataTransferViewXMLParseFreeText = true; causes this (for me unexpected) behaviour. --Andreas P. 14:22, 18 May 2016 (UTC)
Ah - so if you set it to false, it works fine? Yaron Koren (talk) 16:02, 18 May 2016 (UTC)
Works almost fine, but a bug occurs when there are nested templates. See the issue I just filed on https://phabricator.wikimedia.org/T135770 --Andreas P. 21:03, 19 May 2016 (UTC)
And could there be a feature to set the transformation level of templates? --Andreas P. 08:06, 20 May 2016 (UTC)
That sounds like a good idea. Yaron Koren (talk) 13:03, 20 May 2016 (UTC)


Schema file for ViewXML import?

I have hundreds of non-wiki XML files, each containing thousands of entries which I'm trying to import as separate pages in my wiki. This extension seems like the right way to go. I'm trying to create an XSLT (Using the 30 day free trial of Altova MapForce because I have no idea what I'm doing) to convert the XML from the schema it's currently distributed in to the format generated by the Data Transfer ViewXML page. Is there an .xsd schema file available for the import? Thanks. Tcrimsonk (talk) 19:09, 29 June 2016 (UTC)

No, unfortunately. Yaron Koren (talk) 02:25, 30 June 2016 (UTC)

Best way to escape commas and double quotes in creating CSV to import?

Related to my question above, it would probably be easier for me to convert my existing non-wiki XML files into CSV for import rather than trying to match your XML schema. However, the existing XML contains fields with text including both commas and double quotes. How should I indicate which of these are deliberate delimiters in a way that Data Transfer will recognize? Thanks again. Tcrimsonk (talk) 19:09, 29 June 2016 (UTC)

I agree that CSV would probably be easier than XML. You should just look up how to handle commas and quotes in CSV; I think they both require extra double quotes. Yaron Koren (talk) 02:33, 30 June 2016 (UTC)
Thanks; from what I've found, there are various ways to handle commas and quotes in CSV, and the choice depends on what program is going to be interpreting/processing that CSV. So I would still need to know how Data Transfer (or perhaps just MediaWiki in general) handles them. I'll try a CSV import with all fields encased in double quotes and see how that works. I'll reply back here once I know. (Unless someone else has a suggestion.) Tcrimsonk (talk) 03:25, 30 June 2016 (UTC)
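Note: the most common CSV convention wraps a field in double quotes when it contains a comma or a quote, and doubles any quote inside it. The sample "cheese" CSV elsewhere on this page follows that convention. The patch quoted earlier on this page reads lines with PHP's fgetcsv, so this convention is probably what the import expects, but that is an inference and not something confirmed in this thread. Here is a minimal Python sketch that produces such output; the function name rows_to_csv is made up.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes fields containing commas, quotes or line breaks;
    # embedded double quotes are doubled ("" inside a quoted field).
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["Title", "Cheese[Texture]", "Free Text"],
    ["Gorgonzola", "buttery or firm, crumbly", 'salty, with a "bite" from its blue veining'],
]
print(rows_to_csv(rows))
```

Encasing every field in double quotes, as proposed above, would correspond to csv.QUOTE_ALL; embedded quotes are doubled in the same way in both modes.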

Automatic/Bot/Script based Importing?

Is there any way to monitor a given directory, and automatically import any XML/CSV files that appear there? Or is there a command line way to invoke the import? Tcrimsonk (talk) 03:25, 30 June 2016 (UTC)

Unfortunately, no - Data Transfer doesn't provide an API. You would need to create a custom bot/script to do it, I think. Yaron Koren (talk) 14:47, 30 June 2016 (UTC)

If the extension should be extended with automatic importing, what would be the way to go? Would a maintenance script that reads a file and does the import be a simple way to do that?--AdSvS (talk) 12:03, 26 October 2016 (UTC)

A built-in import script does sound like the easiest approach, yes - I hadn't thought of that. Yaron Koren (talk) 17:14, 26 October 2016 (UTC)
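Note: the directory-monitoring half of such a custom script is easy to sketch. The Python below only covers polling a directory for new CSV/XML files. The import step itself is a hypothetical callback (handle_file) that you would have to supply, since, as stated above, Data Transfer does not provide an API or command-line import. All names here are made up for illustration.

```python
import time
from pathlib import Path


def find_new_files(directory, seen, suffixes=(".csv", ".xml")):
    """Return files in directory with a matching suffix that are not yet in seen.

    Names of returned files are added to seen, so each file is reported once.
    """
    new_files = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(directory, handle_file, poll_seconds=5.0, max_polls=None):
    """Poll directory and call handle_file(path) for every new file.

    handle_file is a hypothetical placeholder for whatever performs the import.
    max_polls=None polls forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            handle_file(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

This sketch does not check whether a file is still being written when it is picked up; a real script would need to handle that, for example by having uploads arrive under a temporary name and be renamed when complete.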

Version 0.6.2 doesn't display properly special characters like é

With UTF-8, é is displayed as Ã© after CSV importing.

Previous versions like 0.6.1 display special characters (like é) properly. -- Nicolas NALLET 24. Feb. 2017, 13:13

What if you use UTF-16 instead? Yaron Koren (talk) 14:30, 24 February 2017 (UTC)
Sounds like you ran exactly into the same situation as I did as described in task T151160. So yes, saving as UTF-16 and importing as UTF-16 should work. Also saving as UTF-16, then re-saving as UTF-8 and importing as UTF-8 should work too. The only thing seemingly not to work is saving as UTF-8 and importing as UTF-8 directly without doing the UTF-16 loop. Cheers --[[kgh]] (talk) 14:57, 24 February 2017 (UTC)
Yes, it's working with UTF-16, thanks guys

Nicolas NALLET Wiki-Valley.com, Semantiki.fr (talk) 13:55, 27 February 2017 (UTC)
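Note: the "save as UTF-16" workaround from this thread can also be done with a short script instead of a text editor. This is a minimal Python sketch; the function name is made up. Python's "utf-16" codec writes a byte order mark in the machine's native byte order. Whether Data Transfer requires a BOM or a particular byte order is not stated in this thread, so treat that detail as an assumption.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (with a byte order mark)."""
    # "utf-8-sig" also accepts input that starts with a UTF-8 BOM and drops it.
    text = data.decode("utf-8-sig")
    return text.encode("utf-16")


original = "Title,Free Text\r\nCafé,é\r\n".encode("utf-8")
converted = utf8_to_utf16(original)
print(converted.decode("utf-16") == original.decode("utf-8"))  # content is unchanged
```

Read the CSV file in binary mode, pass its bytes through utf8_to_utf16(), write the result in binary mode, and then select UTF-16 as the encoding on the import page.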

CSV / Spreadsheet - How???

This might be caused by a very stupid mistake, but I am unable to get DataImport to work.

After creating a csv-file:

Title,Cheese[Country],Cheese[Texture],Free Text
Mozarella,Italy,Semi-soft,It's good on pizzas!
Cheddar,England,Hard/semi-hard,"Often sharp, but not always."
Gorgonzola,Italy,"buttery or firm, crumbly","salty, with a ""bite"" from its blue veining"
Stilton,,"",needs more data

and a Template:Cheese:

This cheese is from {{{Country}}} and the texture is generally {{{Texture}}}

... all I see is: "Importing... 4 pages will be created from the CSV file."

My logfile shows these exceptions and errors (shortened):

[exception] [93065002f4979874bc4e15f3] /Special:ImportCSV   DBUnexpectedError from line 2852 of /var/www/html/mediawiki/includes/libs/rdbms/database/Database.php: MWCallableUpdate::doUpdate: Cannot flush snapshot because writes are pending (WikiPage::insertOn, Revision::insertOn, Revision::insertOn, WikiPage::updateRevisionOn, WikiPage::updateRedirectOn, RecentChange::notifyNew).
salty, with a "bite" from its blue veining requestId=93065002f4979874bc4e15f3 (id=224,timestamp=20170225134852) t=121 error=DBUnexpectedError: Invalid atomic section ended (got WikiPage::doCreate).
[exception] [6add1bd85ff9e614981c4c0e] /index.php?action=emailtowiki&prefix=EmailToWiki   DBUnexpectedError from line 2852 of /var/www/html/mediawiki/includes/libs/rdbms/database/Database.php: MWCallableUpdate::doUpdate: Cannot flush snapshot because writes are pending (WikiPage::insertOn, Revision::insertOn, Revision::insertOn, WikiPage::updateRevisionOn, WikiPage::updateRedirectOn, RecentChange::notifyNew).
It's good on pizzas! requestId=6add1bd85ff9e614981c4c0e (id=287,timestamp=20170226025908) t=189 error=DBUnexpectedError: Invalid atomic section ended (got WikiPage::doCreate).

But where are the 4 pages??? No pages have been created. And why is it 4 pages? I would expect 1 page ...

What am I missing? Any help is highly appreciated! TieMichael (talk) 07:49, 26 February 2017 (UTC)

I hadn't seen the error "Cannot flush snapshot because writes are pending" before, so I did a web search on it, and I found some bug reports like this. This looks like a bug in MW 1.28.0, which I assume is what you're using. Hopefully it will get fixed soon.
It's four pages because each row gets turned into a page. Yaron Koren (talk) 14:08, 26 February 2017 (UTC)
Thanks for your feedback! TieMichael (talk) 14:03, 27 February 2017 (UTC)
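Note: the "each row gets turned into a page" point can be checked outside the wiki by parsing the sample file from this thread with any CSV reader. The Python sketch below is a generic parser, not Data Transfer's own code, and the function name is made up. It shows that the header row is followed by four data rows, one per page title.

```python
import csv
import io

# The sample CSV from this thread, copied verbatim.
SAMPLE = '''Title,Cheese[Country],Cheese[Texture],Free Text
Mozarella,Italy,Semi-soft,It's good on pizzas!
Cheddar,England,Hard/semi-hard,"Often sharp, but not always."
Gorgonzola,Italy,"buttery or firm, crumbly","salty, with a ""bite"" from its blue veining"
Stilton,,"",needs more data
'''


def page_titles(csv_text):
    """Return the Title column of every data row (the header row is not a page)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Title"] for row in reader]


titles = page_titles(SAMPLE)
print(len(titles), titles)
# prints: 4 ['Mozarella', 'Cheddar', 'Gorgonzola', 'Stilton']
```

The quoted commas and doubled quotes stay inside their fields, so the Gorgonzola line is still a single row.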

UTF-8 not recognized

In a wiki with Data Transfer 0.6.1 I could correctly import a UTF-8 formatted CSV with special characters. On the same wiki, I upgraded to 0.6.2, and the import now results in the creation of a wrong page name, because the special character is no longer recognized. This seems to be a bug introduced between version 0.6.1 and 0.6.2. UTF-16 works, though. --Krabina (talk) 08:59, 2 March 2017 (UTC)

I can confirm what Krabina reports : utf-8 csv file content is not correctly imported either in wiki page content (accented characters are mangled). utf-16 just works. ( Data Transfer 0.6.2/MW 1.27.3 )

It's the same here with V 1.0 (bf885af) / MW1.31.1 . Carchaias (talk) 12:21, 23 November 2018 (UTC)

"0 pages will be created from the XML file"

Hello !

I've got a problem with DataTransfer import from XML: when I import an XML file with "Special:ImportXML", it shows me this error message:

Expected 'Pages', got 'pkg:package'Expected <Page>,got <pkg:part>Expected <Page>, got <pkg:xmlData>Expected <Page>, got <Relationships>Expected <Page>, got <Relationship><br />
"0 pages will be created from the XML file"

And the import does not work. Does anyone know how to fix it, or have a clue?

The Special:ImportXML page requires that the XML be in a specific format. I'm surprised now to see that the documentation page doesn't actually describe the format, but you can see an example of it here. Though the "import" version of the format is slightly different - you can see the differences explained in the 2nd paragraph here. Yaron Koren (talk) 14:40, 9 May 2017 (UTC)
Thanks for the answer, I'll try it :)

Hi, it's me again :D I created XML files with the right format, but now I get this:

Import in process, 1 page will be created from XML file

But it doesn't work. I tried many imports with different syntax, with no results. I also let it run for an hour, again with no results. I can't find where the problem is :/

Maybe it's still in the job queue? See the first item here. Yaron Koren (talk) 17:40, 11 May 2017 (UTC)

1.28.* change in $wgRunJobsAsync a problem? [SOLVED]

Recently upgraded to 1.28.1 and can no longer get CSV imports to work. The import page seems to work and read the file as it always did, but no pages are added. Would the change in the default of $wgRunJobsAsync to false cause this? Thanks! --DHillBCA (talk) 17:37, 25 May 2017 (UTC)

Could be. Try changing that setting's value in LocalSettings.php? Yaron Koren (talk) 18:14, 25 May 2017 (UTC)
Adding the setting worked. Thanks! --DHillBCA (talk) 14:13, 30 May 2017 (UTC)
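
For reference, a minimal sketch of what that LocalSettings.php change might look like. The poster did not say which value was used; true is an assumption here, based on the question above describing false as the new default in 1.28.

```php
# LocalSettings.php (sketch)
# Assumption: set the value explicitly to the pre-1.28 default.
# The thread does not state the value that was actually used.
$wgRunJobsAsync = true;
```

If imports still produce no pages after this change, running "php maintenance/runJobs.php" from the wiki's root directory shows whether the import jobs are still waiting in the queue.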

Xml import not creating all pages

I'm trying to use this extension to import XML of new pages. I followed the format exactly as exported by "View XML", but no matter how many pages are in my import XML file, it doesn't import all of them. If, for example, I have 30 pages to import, it may import 15, or if it's 15 it may import 12. Running the exact same XML file again will import the missing pages. Any idea what's going on? I've increased the run rate to 50 and also run runJobs.php from the command line, but the remaining pages are still missing until I import the exact same XML file again.

A second question: I need to import about 7000 pages. Any tips or tricks on how to do this efficiently? Is there a rule of thumb for the most pages that should be imported at once? Or a limit on the length of a file to import (number of lines or KB)?

--Anrake (talk) 01:50, 18 June 2017 (UTC)

Correction: it seems the pages show up eventually, but is there anything I can do to speed up the process? runJobs.php is not creating them, and the run rate is set much higher.

--Anrake (talk) 04:05, 18 June 2017 (UTC)

It's good to hear that all the jobs are getting run - otherwise it would be a much bigger problem. I confess that I still don't know much about the inner workings of how jobs are run, but: if you're using MW 1.28 or higher, the section above may be helpful. If not, your high job run rate may ironically be making the problem worse, by overloading the system and leading to jobs that do not get run; I would try reducing the rate to 1 and running runJobs.php. Yaron Koren (talk) 02:13, 19 June 2017 (UTC)
Thanks, I will give that a try this weekend. I did try fiddling with async before writing originally, but that didn't help either. Anrake (talk) 11:42, 22 June 2017 (UTC)
Actually that took care of it. Once I set the run rate to 5, runJobs.php created all the required pages instantly. --Anrake (talk) 13:12, 22 June 2017 (UTC)
Great! And that's good to know. I just added a note about that to the documentation. Yaron Koren (talk) 19:11, 22 June 2017 (UTC)
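
A sketch of the configuration described in this thread, assuming the "run rate" is the one set through $wgJobRunRate in LocalSettings.php (the thread itself only says "run rate"). The value 1 follows the suggestion above; the poster reported success with 5.

```php
# LocalSettings.php (sketch)
# Assumption: "run rate" in this thread refers to $wgJobRunRate.
# Keep the per-request rate low and let the command line work through the queue.
$wgJobRunRate = 1;

# Then, from the wiki's root directory:
#   php maintenance/runJobs.php
```

With a low per-request rate, page views trigger few jobs each, so the bulk of a large import is processed by the explicit runJobs.php call instead.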

Speed

When uploading a spreadsheet of around 500 lines, it takes several minutes for MW to come back. The request itself (i.e. sending the file to the server) finishes reasonably fast, but generating the response takes on the order of 5 to 10 minutes. Does anybody have similar experiences? --F.trott (talk) 08:10, 7 July 2017 (UTC)

Multi templates & ImportCSV

Hi Yaron,

I'm testing out the ImportCSV routine of Data Transfer. One thing I noticed is that multi-templates do not seem to be supported yet (only the last values are stored). I found a similar post reporting this earlier. Have you found a way around this, and if so, what would that be? Or would the best workaround be to follow the suggestion posted at: https://www.mediawiki.org/wiki/Extension_talk:Data_Transfer/Archive_2008_to_2013#Multiple_templates_.26_ImportCSV

Thanks, --Albert Ke (talk) 18:13, 7 July 2017 (UTC)

This still isn't supported, unfortunately. That workaround may be the best way to do it. Yaron Koren (talk) 18:23, 7 July 2017 (UTC)

How to have line breaks inside each article

Let's say I want to have a paragraph break or line break within the article. Using <br /> will do the trick, but it will show when editing the article. --Spiros71 (talk) 17:13, 8 July 2017 (UTC)

You can just have line breaks, in the CSV or XML... Yaron Koren (talk) 13:41, 9 July 2017 (UTC)

I am not sure whether there is a MW setting for that, but for text imported with templates and Data Transfer, when trying a section edit, I only see the template and not the actual text that has been imported into that page. The only way to see the actual text is to use the page edit link. --Spiros71 (talk) 13:41, 16 August 2017 (UTC)

Is this a Data Transfer issue? Or would the same problem exist if the page were created in any other way? I'm not sure I understand the problem. Yaron Koren (talk) 14:38, 16 August 2017 (UTC)
No, it is not, but I thought you may know the answer. Here is an image: https://www.translatum.gr/test/test2.png --Spiros71 (talk) 04:38, 17 August 2017 (UTC)
Alright. I'm guessing that the issue is that the section header is defined in the template itself - meaning that the section can't really be edited on the page. If that's true, the best course of action is to add a "__NOEDITSECTION__" line to the template, so that those "Edit" links don't appear, since they just cause confusion. Yaron Koren (talk) 11:59, 17 August 2017 (UTC)

Hebrew words import problem

Although the import apparently worked and new words are listed in the category, clicking any of the links shows "There is currently no text in this page". Is this some sort of directionality issue? When searching, the word is displayed in the search results, but clicking on it leads to the same page again. --Spiros71 (talk) 05:28, 26 August 2017 (UTC)

That's very strange. What do you see when you go to "action=history" or "action=edit" for those "nonexistent" pages? Do the pages actually have the right text in them? Yaron Koren (talk) 12:44, 27 August 2017 (UTC)
"There is no edit history for this page." and "Creating". It is even stranger that some terms have been imported OK and display OK. It makes me think that MW does not like certain Hebrew characters? For example, these entries:
1, 2, 3, 4 work OK.
I would try running the import again - maybe it's just a matter of some import "jobs" not being run. Yaron Koren (talk) 12:09, 28 August 2017 (UTC)
I had already done so (even with the "Overwrite existing content" option). Although other jobs run just fine, these (if there are Hebrew titles already entered in the past), apart from the initial "imported x successfully" notice, do not even show up in Recent Changes. But it is not that they did not get imported in the first place; apparently they are somehow only "entered" in the search system (and listed in the relevant category) and not in actual pages. I even thought of trying UTF-16, but the flavours I tried did not work at all (LE/BE without signature, using EmEditor). If I manually create any of these pages (i.e. by following a link from the category listing), the text is stored OK. --Spiros71 (talk) 16:00, 28 August 2017 (UTC)

Allow import for all users does not work

Hello,

The instruction $wgGroupPermissions['*']['datatransferimport'] = true;

does not work for me. I still get the message: "You do not have permission to import pages from another wiki, for the following reason:

The action you have requested is limited to users in the group: Administrators."

MediaWiki 1.28.0, Data Transfer 0.6.2

That's unexpected... do you have that line in LocalSettings.php after the inclusion of Data Transfer? Yaron Koren (talk) 16:36, 8 December 2017 (UTC)
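
To illustrate the ordering being asked about, a sketch of the relevant LocalSettings.php lines. The loading line is an assumption: depending on the Data Transfer version, the extension may instead be included with a require_once of its main PHP file.

```php
# LocalSettings.php (sketch)
# Assumption: the extension is loaded with wfLoadExtension(); older versions
# may need a require_once of the extension's PHP file instead.
wfLoadExtension( 'DataTransfer' );

# Permission line from the question, placed after the extension is loaded.
$wgGroupPermissions['*']['datatransferimport'] = true;
```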

PHPExcel - DEPRECATED .. what to do?

The PHPExcel project [0] says that "All users should migrate to its direct successor PhpSpreadsheet, or another alternative." For new wikis wanting to use the "Data Transfer" extension, what should we do? -Rich (revansx)

[0] https://github.com/PHPOffice/PHPExcel

I'm aware that PHPExcel is deprecated, and it would be great if it were updated. Do you know if the current library fails, or if it still works? I haven't tried it in a long time. Yaron Koren (talk) 03:40, 15 March 2018 (UTC)

PHPExcel

Hi Yaron, PHPExcel appears to be deprecated, and they recommend PhpSpreadsheet instead. Is that supported too? I'm not sure how to install either of those on a CentOS/WHM server. --Spiros71 (talk) 10:38, 6 December 2018 (UTC)

Not yet - see the section above this one. Yaron Koren (talk) 19:42, 6 December 2018 (UTC)

Encoding issues

I was testing the latest versions of both MW (1.31.1) and the extension, and it seems that there are some issues with the way the encoding is handled. For example:

Hebrew text

  • UTF-8 with or without BOM has this output: "אֲבִי גִּבְעוֹן"
  • UTF-16 LE with signature imports Hebrew page title, not content
  • UTF-16 LE without signature gives: Warning: Invalid argument supplied for foreach() in DataTransfer\specials\DT_ImportCSV.php on line 160
  • UTF-16 BE with or without signature: nothing imported although it says imported.

Greek text

  • UTF-8 with or without BOM has this output: θέλω‎
  • UTF-16 LE with signature works OK.

All test texts and templates can be supplied. --Spiros71 (talk) 20:03, 16 December 2018 (UTC)