Jump to content

Extension talk:External Data/Archive 2017 to 2018

From mediawiki.org
Latest comment: 5 years ago by Yaron Koren in topic SQLite no longer works

JSON error or choice

Thanks for this extension. It works great and easy. But I discovered an error, or may be you could call it a choice.

Reading json from a webservice, I saw that some data from one records ended up in an other record. The records were mixed up. Looking in the source code I figured out that de extension builds an array containing the external data ($external_values). New values (records) are appended at the end of that array. If a value doesn't exists in a record, there wil be an empty space, that will be filled with data from the next record.

The power of Json is that you don't have to send non-existing data. The plugin unfortunately expects a json array in which the records are all of the same size.

So, the important question is, is this an error or a choice ?

In case you want to see what is happening, use the code below in a wiki page with the extension ExternalData. You have to understand a bit of dutch topography in order to see that the City-name is mixed up at the end of the table.

{{#get_web_data:url=https://www.vaarweginformatie.nl/wfswms/dataservice/1.3/geogeneration
    |format=json|data=geogeneration=GeoGeneration}}
{{#get_web_data:url=https://www.vaarweginformatie.nl/wfswms/dataservice/1.3/{{#external_value:geogeneration}}/lock/?count=10&offset=60
    |format=json|data=naam=Name,plaats=City,aantalkolken=NumberOfChambers,opafstandbediend=IsRemoteControlled,id=Id}}
{| class="wikitable"
! Naam
! Plaats
! Aantal kolken
! Op afstand bediend {{#for_external_table:<nowiki/>
{{!}}-
{{!}}<span class='plainlinks'>[{{fullurl:Details_sluis|id={{{id}}}}} {{{naam}}}]</span>
{{!}}{{{plaats}}}
{{!}}{{{aantalkolken}}}
{{!}}{{{opafstandbediend}}}
}}
|}

(the first call is to get the actual code for geogeneration, that is needed in the second call)

Alfred

It's a bug, though one I don't know how to fix. Yaron Koren (talk) 14:24, 8 August 2018 (UTC)

That is a fair answer. I tried to figure it out. You'll need at least the array $mappings in de parseTree function. --Alfredpel (talk) 16:05, 9 August 2018 (UTC)

Maybe the code will solve the problem:

/*
 	alfred's version of parseTree
 	mappings is needed. therefore two extra changes has to be made in de file ED_Utils.php:
 		getDataFromText: 
 			return self::getJSONData( $contents, $prefixLength, $mappings );
 		getJSONData:
 			static function getJSONData( $json, $prefixLength, $mappings="" ) {
 	all key in mapping are provided
 	if not available in json, it gts the value "no data" (empty string would be better but less easy for debugging
 	Something odd happens with the keys in lower or uppercase. therefore alle keys are translated to lowercase
 	Changes work with my data. I am not sure whether of nog it conflicts with other data.
*/
static function parseTree ($tree,&$retrieved_values, $mappings) {
	if (!is_array($tree)) return ;
	$tree = array_change_key_case ( $tree,CASE_LOWER);	
// loop through all values of $tree, if one is an array then only process that value, $hasArray keeps track of that
	$hasArray=false;
	foreach ($tree as $key => $val) {
		if (is_array($val)) {
			$hasArray = true;
			self::parseTree($val,$retrieved_values, $mappings);
		}
	}
//	if no array was found in tree, then map the values on the output array $retrieved_values
	if (!$hasArray) {
		foreach ($mappings as $key => $mapping) {
			if (isset($tree[$mapping])) {
				if (!is_array($tree[$mapping])) {
					$retrieved_values[$mapping][] = $tree[$mapping];
				}
			} else {
// if the key in mappings does not exist, add a "no data" value (make this a empty string after testing)
				$retrieved_values[$mapping][] = "no data";
			}
		}
	}
	return ;
}

--Alfredpel (talk) 20:22, 9 August 2018 (UTC)

Is the usage of SQL group by possible?

  1. Is the usage of SQL "group by" possible?
  2. if yes how?
  3. if no as it was pointed out in Extension_talk:External_Data/Archive_2013#Why_get_db_data_does_not_contain_GROUP_BY.3F where would a CR be posted?

I would consider "group by" extremly useful since it would be possible to create statistics for database content really easily. --WolfgangFahl (talk) 12:11, 10 January 2017 (UTC)

No, it's still not possible, though I can see how it would be useful. What's a CR? Yaron Koren (talk) 14:35, 10 January 2017 (UTC)

Yaron - thank you for getting back to this - a CR ist a change request. I had used group by successfully with my Mediawiki 1.23 wiki and and old version of SMW and the External Data extension 1.8.2. The markup i used was:

{{#get_db_data:
db=vz
|from=(select channel_id,timestamp,round(value)/1000 as wert from data order by timestamp desc limit 200) as d left join entities e on d.channel_id = e.id left join properties p on e.id=p.entity_id and e.type='electric meter' and p.pkey='title' 
|where=d.channel_id in ('1','2','5','7') 
|limit=100
|order by=channel_id
|group by=channel_id
|data=datum=DATE_FORMAT(from_unixtime(timestamp/1000), '%Y-%m-%d %T'),kanal=d.channel_id,wert=max(wert),titel=p.value
}}
{| class="wikitable sortable"
! Kanal
! Datum
! Titel
! Wert{{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{kanal}}} 
{{!}} {{{datum}}} 
{{!}} {{{titel}}} 
{{!}}  style="text-align:right;font-family:monospace;font-size:125%;" {{!}} {{{wert}}} 
}}
|}

unfortunately this now gives me a regression with invalid SQL.

SELECT DATE_FORMAT(from_unixtime(timestamp/1000), '%Y-%m-%d %T'),d.channel_id,max(wert),p.value FROM `(select channel_id`,`timestamp`,round(value)/1000 as wert from data order by timestamp desc limit 200) as d left join entities e on d.channel_id `e.id left join properties p on e.id=p.entity_id and e.type='electric meter' and p.pkey='title'` WHERE d.channel_id in ('1','2','5','7') GROUP BY channel_id ORDER BY channel_id LIMIT 100 

There seem two be two issues now:

  • the usage of backticks at places where they shouldn't be
  • the stricter interpreteation of the group by clause by mySQL

Going back to version 1.8.2 fixes the problem :-( --Seppl2013 (talk) 15:41, 24 September 2017 (UTC)

Version 1.8.3 completely breaks MySQL JOIN queries. Moved back to version 1.8.2 for now. Hope this gets fixed soon. --Wikimantis (talk) 01:12, 25 May 2018 (UTC)
Are you using "group by", or are you talking about something else? Yaron Koren (talk) 02:39, 25 May 2018 (UTC)

How to pass header value to request?

With some API we need to pass in the headers some value (for example: https://docs.gitlab.com/ee/api/README.html#authentication) I tried

url=--header "PRIVATE-TOKEN: xxxxxxxxxxxxxx" https://myserver/api/v1/list/all
or 
url=https://myserver/api/v1/list/all,array('PRIVATE-TOKEN'=>'xxxxxxxxxxxxxxxx')

But not chance. How can we achieve this? if this is possible, can we have an example of how?

The #get_web_data function only works with RESTful APIs; #get_soap_data might work in this case. Yaron Koren (talk) 15:52, 17 February 2017 (UTC)

Dash character in data

Hello

We found when displaying csv file data with 
a) cells that contain just a dash character, or
b) cells that contain text and start with a dash character
that any attempt to render this data into a table will cause rows to break at that cell.
This is presumably because the parser inserts | for a column break and then the - from the string, resulting in |- which is a special string which means 'new row'.

To resolve this I edited the function GetCSVData in the file ED_Utils.php.
In the foreach loop just before the return statement, I added a check for the dash character which replaces it with a dash wrapped in <nowiki> tag. This resolves the issue.
Note that this issue likely still exists when other parsers like get_db_data are used, since I only resolved it in the GetCSVData function.

Note that I have had to replace the word nowiki with ''n0wiki'' below, because it was impossible to display this code since it was closing nowiki brackets.

<syntaxhighlight lang="php">
		// Now "flip" the data, turning it into a column-by-column
		// array, instead of row-by-row.
		$values = array();
		foreach ( $table as $line ) {
			for ( $i = 0; $i < $num_columns; $i++ ) {
				// This check is needed in case it's an
				// uneven CSV file (see above).
				if ( array_key_exists( $i, $line ) ) {
					$row_val = trim( $line[$i] );
				} else {
					$row_val = '';
				}
								
				//NEW - check for dash values
				if ($row_val == '-')
				{
					$row_val = '<n0wiki>-</n0wiki>';
				}
				else if (substr($row_val,0,1) == '-')
				{
					$row_val = '<n0wiki>-</n0wiki>' . substr($row_val,1);
				}
				//NEW - end check for dash values
				
				if ( $has_header ) {
					$column = strtolower( trim( $header_vals[$i] ) );
				} else {
					// start with an index of 1 instead of 0
					$column = $i + 1;
				}
				if ( array_key_exists( $column, $values ) ) {
					$values[$column][] = $row_val;
				} else {
					$values[$column] = array( $row_val );
				}
			}
		}
		return $values;
	}

</syntaxhighlight>
Great, thanks for this patch! Is that first "if" check necessary? Maybe everything can be handled with just the "if substr()" call? Yaron Koren (talk) 12:19, 14 April 2017 (UTC)

PHP Notice in error logs

Hi I can see theses messages in the apache server error log

PHP Notice:  Second parameter should not be a timeout. [Called from EDUtils::fetchURL in /opt/mediawiki/extensions/ExternalData/ED_Utils.php at line 882]

ExternalData 1.8.3 Mediawiki 1.27.3

Cheers

Ubibene

Daisy Chaining

Hey, is there a way to daisy-chain queries? As far as i can see one can only enter one URL to get data from. My usecase: get some data, create a table with it and then use coordinates from the first query in a secondary query for reverse geocoding (for each individual entry). Is this possible somehow or is further explainantion required?

If I understand the question correctly, you can just cycle through the data twice, by calling #for_external_table twice - the data won't get cleared after the first call. Or am I misunderstanding something? Yaron Koren (talk) 03:20, 31 May 2017 (UTC)
In my specific usecase i'm building a table from an overpass query which returns xml. I then want to geocode every coordinate of that table via nominatim (returns xml too) to a specific country.

Error: Did not find alias for special page 'GetData'.

Hi, i use Version 1.8.3 (6705c56) from 2017-07-17T22:03:53. This Error is already marked as fixed, but i still get these message. see commit on Phabricator/"Add file for special page aliases". My wiki installation is Version 1.29.0 in german language

--TomyLee (talk) 09:28, 28 July 2017 (UTC)

Yes, an "alias" file was created, but no aliases have been defined yet. Where are you seeing that error? I didn't know that languages for which an alias has not been defined produced an error message. Yaron Koren (talk) 15:21, 28 July 2017 (UTC)
I see it only on the page "Special pages", theis message is printet 2 times. In LocalSettings i have set "$wgDevelopmentWarnings = true" for my upgrade process. TomyLee (talk) 06:10, 31 July 2017 (UTC)

detect_encoding() expects parameter 1 to be string (solved)

Hi, if i execute runjobs.php i get lots of notices for this extension. It says:

PHP Warning: mb_detect_encoding() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 520

PHP Warning: utf8_encode() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 523

--TomyLee (talk) 09:55, 28 July 2017 (UTC)

That's very unexpected. If possible, could you add a line above that line that looks like "if (is_object( $dbField ) ) print_r( $dbField );", and let me know what it prints out? Yaron Koren (talk) 15:18, 28 July 2017 (UTC)
i Insert your code and also a "Debug" message that prints without condition. Here is one example output:
2017-07-31 06:19:06 SMW\UpdateJob PAGENAME pm= requestId=c49f317955c8be4610adbbcc (id=262115,timestamp=20170728095100) STARTING
Notice: Undefined index: port in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 65
Notice: Undefined index: UseWindowsAuth in ..\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG 
DateTime Object
(
    [date] => 2011-10-13 00:00:00.000000
    [timezone_type] => 3
    [timezone] => Europe/Berlin
)
Warning: mb_detect_encoding() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 524
Warning: utf8_encode() expects parameter 1 to be string, object given in ...\mw-1.29\extensions\ExternalData\ED_Utils.php on line 527
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG
Notice: Undefined index: port in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 65
Notice: Undefined index: UseWindowsAuth in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66
DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG

The error comes from a Query to Microsoft SQL Server with column datatype "date" with value "2011-10-13" --TomyLee (talk) 06:52, 31 July 2017 (UTC)

I have solved this error by insert a new check bevor the convert line. my ED_Utils.php is now:
$dbField = $row[$i];
//NEW CODE: check for DateTime
if (is_a( $dbField, 'DateTime')) { 
  $dbField = $dbField->format('Y-m-d H:i:s'); :::
} 
//END OF NEW CODE
if ( !function_exists( 'mb_detect_encoding' ) || 
  mb_detect_encoding( $dbField, 'UTF-8', true ) == 'UTF-8' ) {
  $new_row[$column_name] = $dbField;
} else {
  $new_row[$column_name] = utf8_encode( $dbField );
}
Great find! Sorry, I think I missed your previous message here. I just checked in a patch based on this code - hopefully this problem is now solved. Yaron Koren (talk) 15:36, 11 August 2017 (UTC)

Undefined index: UseWindowsAuth and port (solved)

as a third error message if i execute runphp.php i get the notice:

PHP Notice:  Undefined index: port in ...\includes\libs\rdbms\database\DatabaseMssql.php on line 65
PHP Notice:  Undefined index: UseWindowsAuth in ...\mw-1.29\includes\libs\rdbms\database\DatabaseMssql.php on line 66

i know, this is not part of ExternalData but i use Microsoft Server only with External Data.

Maybe you need thes two new(?) parameters for mssql. In "MWLBFactory.php" i found these part of code:

80 elseif ( $server['type'] === 'mssql' ) {
81    $server += [
82         'port' => $mainConfig->get( 'DBport' ),
83         'useWindowsAuth' => $mainConfig->get( 'DBWindowsAuthentication' )
84                ];
This sounds like something recently added to MediaWiki. What's MWLBFactory.php? Maybe I can copy the approach from there. Yaron Koren (talk) 15:22, 28 July 2017 (UTC)
I don't know the file. i search for these variables to see if there is maybe a new config variable for my LocalSettings i missed. But the only appereance i found was in classMWLBFactory
Did you set $wgDBWindowsAuthentication and $wgDBport ? --wargo (talk) 09:05, 31 July 2017 (UTC)
Yes --TomyLee (talk) 09:50, 31 July 2017 (UTC)
I have solved the problem for me. I had to added little bit of code. It seems that the parameters not used by MediaWiki, maybe in a later release? But the warning messages never comes up again:
You have to add these two new Parameters for the connection definition. To get this: In Localsettings add these new variables (DbName ist the Name/Id of your connection):
$edgDBPort['DbName'] = "80";
$edgDBUseWindowsAuth['DbName'] = false;
in ExternalData.php at Line 60 add these two lines:
$edgDBPort = array();
$edgDBUseWindowsAuth = array();
in ED_Utils.php the static function getDBData has to be modified at 3 areas:
Line 180 add
global $edgDBUseWindowsAuth;
global $edgDBPort;
Line 193 add
$db_port = self::getArrayValue( $edgDBPort, $dbID);
$db_useWindowsAuth = self::getArrayValue( $edgDBUseWindowsAuth, $dbID);
Line 244,260 and 273 add to the array of parameters
'port' => $db_port,
'UseWindowsAuth' => $db_useWindowsAuth
--TomyLee (talk) 13:03, 11 August 2017 (UTC)

can't get to xml field from api query call

A query i added to a template looks like this:

{{#get_web_data: url={{SERVER}}{{SCRIPTPATH}}/api.php?action=query&list=categorymembers&cmtitle=Category:{{{OPfamily|}}}&format=xml
  | format=XML
  | data=title=title}}
{{#display_external_table:template=catList
  |data=title=title}}

on the final page though this returns the error: Undefined Index title

running the query in the sandbox returns this xml though with the fields title.

<api batchcomplete="">
<continue cmcontinue="page|53454c454354204d4154|30" continue="-||"/>
<query>
<categorymembers>
<cm pageid="21" ns="0" title="Constant MAT"/>
<cm pageid="47" ns="0" title="ConstantMAT Class"/>
<cm pageid="22" ns="0" title="Depth MAT"/>
<cm pageid="23" ns="0" title="GLSL MAT"/>
<cm pageid="24" ns="0" title="In MAT"/>
<cm pageid="25" ns="0" title="Null MAT"/>
<cm pageid="26" ns="0" title="Out MAT"/>
<cm pageid="27" ns="0" title="PBR MAT"/>
<cm pageid="28" ns="0" title="Phong MAT"/>
<cm pageid="29" ns="0" title="Pointsprite MAT"/>
</categorymembers>
</query>
</api>

What am I missing?

Thanks Markus

That's strange - I was able to get essentially the same thing working, though with a hardcoded category name. Does this happen with all pages that call this template? Could you try calling it with a category whose name doesn't contain spaces or non-ASCII letters? Yaron Koren (talk) 01:12, 1 August 2017 (UTC)
Thanks Yaron! I should have tried this... Using a hardcoded Category name makes it work.

Addressing an Array

Is it possible to pull information out of an array? I need to pull information out of the statuses array.

[
 {
  "id": "id",
  "serviceName": "name",
  "statuses": [
    {
    "id": 0,
    "name": "network",
    "severity": 5,
    "severityDescription": "Severity Description"
    },
    {
     "id": 1,
     "name": "network",
     "severity": 2,
     "severityDescription": "Severity Description"
   }
  ],
 }
]

Is is something like?

{{#get_web_data:url=https://api.com|format=JSON|data=name=serviceName,status=statuses[0].name}}

I understand that this isn't in any way scalable, but the vast majority of the time the array does not contain more than one member, and at this stage I am just trying to learn about ExternalData's capabilities.

Unfortunately, no - if this were stored in XML, you could do it with the "use xpath" parameter, but there's no such parameter for JSON. (There is a JSONpath technology, but it's poorly supported, from what I understand, and it's not supported by External Data.) Yaron Koren (talk) 01:10, 4 September 2017 (UTC)
Thanks, good to know

JSON GET header

Is it possible to send a JSON GET header with ExternalData?

You mean, have an API query string composed of JSON? I don't see why not... Yaron Koren (talk) 17:12, 13 September 2017 (UTC)
Sorry, I am rather new to JSON. The GET query I have in Postman requires the header section with a key value pair. I am not sure how define a header using ExternalData. Does that make sense?
Can you define this query as a single URL, with all the JSON in the query string? If so, you can just pass that in to #get_web_data. Yaron Koren (talk) 13:33, 14 September 2017 (UTC)

Accessing source files

Often enough I cannot access the XML directly but need to open the source file through the browser. In browsers such as Chrome and Firefox, the URL is typically prefixed by "view-source:". Is there any way we can make External Data query the source code? I did some test runs with the prefix but all I got so far is "Error: No contents found at URL view-source:http:..". Cavila 07:25, 5 November 2017 (UTC)

I don't understand - when you do a "view source", you're getting the same content, just displayed in a different way. How would that apply with External Data? Yaron Koren (talk) 01:17, 6 November 2017 (UTC)
Not necessarily. Just because ".xml" is in the URL doesn't necessarily mean that the web page gives you raw, queryable XML. Or you'll get a parse error because any XSL referred to is missing. Cavila 18:57, 6 November 2017 (UTC)
Alright, but, as far as I know, you'll always get the same content when using "view source". Or do you know of a case where doing "view source" gives true XML, but External Data's retrieval does not? Yaron Koren (talk) 20:09, 6 November 2017 (UTC)
Sure, this comes to mind. Cavila 20:33, 6 November 2017 (UTC)

That's a custom browser display, but #get_web_data seems to retrieve the right thing there - the following call works for me:

{{#get_web_data:url=http://tarltonproject.org/TarltonJest_TEI1/content/Tarlton-Cittie_basic.xml|format=xml|data=title=title}}
{{#external_value:title}}

Yaron Koren (talk) 14:41, 7 November 2017 (UTC)

#for_external_table requires "nowiki"

My 1.30 Mediawiki didn't recognize the example provided for #for_external_table. The result was one "-" per each line of the file.

The solution was to change:

! Area {{#for_external_table:

for:

! Area {{#for_external_table:<nowiki/>

And then it worked. Spent some hours debugging it. Roger Krolow (talk) 14:37, 29 December 2017 (UTC)

Thanks for pointing that out, and sorry for the long delay - and the initial problem. I just fixed this in the documentation. Yaron Koren (talk) 03:02, 18 January 2018 (UTC)

Don't know why. Flow bug? Maybe because topic above uses:

nowiki><nowiki/></nowiki
Yeah... I just fixed it. Yaron Koren (talk) 12:33, 23 May 2018 (UTC)

Hello. i'm using mediawiki 1.30.0. Have a test mysql database set on oracle virtualbox ubuntu 16.04. I'm creating a documentation system for a company that would fetch data on a template from their own mysql database by name (or number) of client company (this also would be the pagename). Idea is to create a template that fetches data from the database by the pagename (as the client amount might increase, automation is the goal) The code below does not work but to get the idea...

{{#get_db_data:
|db=test
|from={{BASEPAGENAME}}
|data=Id=id,Company=company,Device=device,Ip4_address=ip4_adress,User=user,Password=password,Install_date=install_date
}}

Problem is that mysql can't have variables as a search value (and I think the way commands are processed the BASEPAGENAME template wont even open inside the #get_db_data) So i'm forking the Extension/Externaldata/ED_parserfunctions.php with something like this...

if ( array_key_exists( 'from', $args ) ) {
			if ( $args['from'] == 'word' ) {
				$args['from'] = -something?-
				$from = $args['from'];
			}
			else {
			$from = $args['from'];
			}

Any ideas, solutions, code locations to fork more, are all welcome!

Actually, "BASEPAGENAME" in the above example will get parsed correctly - I just tried it myself to be sure. And more generally, nested parser functions, templates, variables etc. always get parsed from the innermost one out, as far as I know. Could it be that there's just some issue with your #get_db_data call? Yaron Koren (talk) 12:27, 18 April 2018 (UTC)
It worked! i went back to trying the "BASEPAGENAME" template call and found out that i had a problem with mysql tables being written in all lower case letters. As the page comes with the first letter being capital, i just had to redo my tables for this to work. Thanks for the explanation, it saved me a ton of time ^^ best regards:OP, to be :nappula3 8:31 19 April 2018 (UTC)

Form to Create Record?

Is there an extension or native way to create/edit a record in a db using a form?

This doesn't sound like an External Data extension question, but the Cargo and Semantic MediaWiki extensions both let you do that. Yaron Koren (talk) 13:31, 11 May 2018 (UTC)
Thx!

Johnywhy (talk) 14:42, 23 May 2018 (UTC)

Use Logged-In User Credentials?

When i execute the following, data returned looks like this extension is NOT running with logged-in user credentials:

{{#get_web_data: 
url={{SERVER}}/api.php?action=query&meta=tokens&format=json
|format=JSON
|data=csrftoken=csrftoken
}}

That url gives, for example: {"batchcomplete":"","query":{"tokens":{"csrftoken":"992b1efd8912345dcec879106eb64b615b054636+\\"}}}

But

{{#external_value:csrftoken}}

returns +\, which is the token for non-logged-in users.

"For unregistered users, the token is always +\" https://www.mediawiki.org/wiki/API:Edit#Unregistered_users

$edgAllowSSL = false; had no effect.

Is there something wrong with my syntax?

Do i need to use https://www.mediawiki.org/wiki/Extension:NetworkAuth?

Johnywhy (talk) 10:48, 23 May 2018 (UTC)

I don't fully know the mechanics of tokens, but I would think it's a very good thing that External Data doesn't let you use your own MediaWiki token - otherwise, a malicious user could potentially put calls in some page to get admins who visit that page to unwittingly do all sorts of bad things. Yaron Koren (talk) 13:02, 23 May 2018 (UTC)
Only if the malicious user has access to the page containing the call. But, that can't happen if the page is protected. Seems this is not the only potential security risk that is completely under control of the admins. An admin could also post their password on a public page, and then a malicious user could gain admin access-- admins ought know not to do that. To troubleshoot this issue, can you plz comment whether my syntax above is correct? -thx Johnywhy (talk) 14:48, 23 May 2018 (UTC)
Okay, but what if the malicious user puts the call on an unprotected page? Or on every unprotected page? Yaron Koren (talk) 15:09, 23 May 2018 (UTC)
Plz see title of this topic: "Logged-In User Credentials"
Why aren't i receiving alerts about your comments here?

If extension respects logged in user credentials (as it should), then there's no risk. Johnywhy (talk) 11:36, 25 May 2018 (UTC)

On the contrary, I see tremendous risk: (1) malicious user goes to page "ABC", puts in a call to #get_web_data that calls some harmful MediaWiki API action which uses the logged-in user's credentials; (2) user with admin rights goes to page "ABC", the page calls that API URL and is successful because that user has the correct privileges. Do you see what I'm saying? Yaron Koren (talk) 13:33, 25 May 2018 (UTC)
Ok, i can see your point there. So, does that mean that all actions performed by this extension are done with anonymous user-rights? Thx
I suppose you could describe it that way, yes. Yaron Koren (talk) 23:47, 25 May 2018 (UTC)

From this page, it sounds like it IS possible to log into the API as a specific user. Correct? However, it sounds like it requires javascript, right? - Johnywhy (talk) 19:22, 2 July 2018 (UTC)

Yes, it's possible. I don't know if it would require JavaScript. Yaron Koren (talk) 22:31, 2 July 2018 (UTC)

Request: Adding Records

Awesome extension! Would also be awesome if the extension could add/edit records. Maybe could use built-in MW edit-token, from the API.

I'm not sure what this means, but External Data can be used in conjunction with either Cargo or Semantic MediaWiki to store the information it retrieves. Yaron Koren (talk) 18:33, 27 May 2018 (UTC)
Thx for reply. It's my understanding that Cargo can only do a store-once action for a page-- it can't add more records without overwriting the existing records for that page. https://www.mediawiki.org/wiki/Extension_talk:Cargo#How_to_Remove_Built-In_Fields?
Johnywhy (talk) 00:34, 2 June 2018 (UTC)
That's right. Yaron Koren (talk) 16:41, 3 June 2018 (UTC)

1.31 API Issue: #get_web_data Suddenly Failing

This code was returning correct section number of a named section


{{#get_web_data: 
url=https://gunretort.xyz/api.php?format=json&action=parse&page=Portal:TagDescriptions&prop=sections
|format=JSON
|data=SecID=index
|filters=line=CivilRights
}}

{{#external_value:SecID}}

Now (on MW 1.31) returning:

Error: no local variable "SecID" was set.

The url alone returns correct json: https://gunretort.xyz/api.php?format=json&action=parse&page=Portal:TagDescriptions&prop=sections returns

{"parse":{"title":"Portal:TagDescriptions","pageid":433,"sections":[{"toclevel":1,"level":"2","line":"Terrorism","number":"1","index":"1","fromtitle":"Portal:TagDescriptions","byteoffset":0,"anchor":"Terrorism"},{"toclevel":1,"level":"2","line":"SecondAmendment","number":"2","index":"2","fromtitle":"Portal:TagDescriptions","byteoffset":747,"anchor":"SecondAmendment"},{"toclevel":1,"level":"2","line":"Timeliness","number":"3","index":"3","fromtitle":"Portal:TagDescriptions","byteoffset":1275,"anchor":"Timeliness"},{"toclevel":1,"level":"2","line":"Money","number":"4","index":"4","fromtitle":"Portal:TagDescriptions","byteoffset":2317,"anchor":"Money"},{"toclevel":1,"level":"2","line":"CivilRights","number":"5","index":"5","fromtitle":"Portal:TagDescriptions","byteoffset":3038,"anchor":"CivilRights"},{"toclevel":1,"level":"2","line":"CommonGround","number":"6","index":"6","fromtitle":"Portal:TagDescriptions","byteoffset":3800,"anchor":"CommonGround"}]}}

Update: Exact same query works with a static json file (instead of API call). Json file contains same json as returned by API:

{{#get_web_data: 
url=https://gunretort.xyz/sections.json
|format=JSON
|data=SecID=index
|filters=line=CivilRights
}}

{{#external_value:SecID}}

Something about the API has changed. https://www.mediawiki.org/wiki/Topic:Ug391jaupfj0mvnm

- Johnywhy (talk) 15:26, 1 July 2018 (UTC)

I think the issue is that the API now requires the user to be logged in. When I go to that API URL, I get a "You need read permission" error - and I assume the External Data code is getting the same thing. Yaron Koren (talk) 01:19, 2 July 2018 (UTC)

That sounds related. I found a way to cause the problem: Disabling anonymous reads with:

$wgGroupPermissions['*']['read'] = false;

This happens while logged in as admin. Disabling anonymous read when logged in as admin shouldn't break anything. - Johnywhy (talk) 17:59, 2 July 2018 (UTC)

You may be logged in, but the External Data code accessing the API is not. Yaron Koren (talk) 18:54, 2 July 2018 (UTC)
"the API now requires the user to be logged in"

Since 1.31? @Anomie says the API didn't change. https://phabricator.wikimedia.org/T198577#4391434 - Johnywhy (talk) 19:26, 2 July 2018 (UTC)

Right, it's not due to a change in MediaWiki - I assume you changed the permissions on the wiki. Yaron Koren (talk) 22:33, 2 July 2018 (UTC)

Updating to 1.31 no longer access database tables

Hi, I have some tables in a MYSQL database which I can access perfectly fine under 1.30 with this extension but as soon as I update to 1.31 I can no longer access the database.

The error I get is as shown below:

Error: 1146 Table 'mraths48_meta. Metadata' doesn't exist (localhost)

Taken from a debug log file.

If I look in phpMyAdmin everything looks fine the tables are there and contain all the data.

Let me know how I can help to resolve this.

Regards

RogerA

Hi - that's unexpected. I don't know if this has to do with any change to the MediaWiki code in 1.31, or if this is just a result of something that happened during the upgrade process, or maybe it's just a coincidence that it happened at the same time as the upgrade... this page lists some things you can try when that error happens, like restarting MySQL. I would recommend trying one or more of those. Yaron Koren (talk) 11:12, 9 July 2018 (UTC)

Hi, I can access the tables from phpMyadmin and also form the MYSQL command line if I SSH onto the server so I am pretty sure the tables are all OK. I used the same settings that are in the Localsettings.php to access via MYSQL command line so they are all OK also. I also know that if I restore 1.30 from a backup everything will be fine. Is there a way to check if the connection to the database is being made? It is unlikely to be a MYSQL problem as I am using a shared server. I am happy to work through debugging steps if you let me know how to proceed?

It is as you say possible that some coincidence has occurred as a result of the update but if I so I need to track it down and I am not sure how to do that. Regards

RogerA

Ah - if changing back to MW 1.30 lets you access the table again, then I was wrong - it is something in the code. I don't know what, though. Is "Metadata" some kind of system table, or is it just a regular table with a clever name? Yaron Koren (talk) 14:07, 9 July 2018 (UTC)

Hi,

Metadata is nothing special and contains information about images e.g. dates, subject, copyright, photographer. It has 13 columns and 140000 rows and runs under InnoDB size 1.5MiB. I can also confirm that if I restore this table to an earlier version before the update to 1.31 I still get the same error so to me it seems unlikely that its the table. Let me know how I can help to debug.

Regards

RogerA

NB The database that contains the table Metadata is also on Localhost in case it is important

Hi, Just checking the error log for this exception and I can see a leading space in front of Metadata which might be relevant:


Query: SELECT UID,Origin_Date,Origin_Ref,Type,Image_Author,Subject,Copyright FROM ` Metadata` WHERE UID="002" Function: EDUtils::searchDB Error: 1146 Table 'mraths48_meta. Metadata' doesn't exist (localhost)


There is a leading space reported in the error. I can confirm this is not present in my call which simply has |from=Meatdata.

Maybe this is relevant.

Regards

RogerA

Hi, I tracked this down to line 491 in Ed_Utils.php where a space was being added to the table name. Once I removed this it worked fine. I also noticed that the latest version available for download which has the same version number 1.8.3 has this error corrected.

All resolved now.

Regards

RogerA

That's great! I'm very glad you were able to solve the problem, because I had no idea. And it's also good to hear that there's no bug in the current code. It's another reminder that a release of a new version is well-overdue - I hope to do that this week. Yaron Koren (talk) 15:51, 15 July 2018 (UTC)

SQLite no longer works

$wgSQLiteDataDir was removed from includes/db/DatabaseSqlite.php 4 years ago https://github.com/wikimedia/mediawiki/commit/46cf4f1e3010010e531340b99cbd59469529be0f and ExternalData Database::factory() never sets a file path or directory in constructor instead.

Actually, it looks like support for that global variable was removed two years ago - which is still a long time. Sorry about the problem, and thanks for diagnosing it. I just checked in what I hope is a fix, using the constructor parameter - I can't test it myself, unfortunately. If you get the latest code and try it out, please let me know if it works for you. Yaron Koren (talk) 02:42, 3 September 2018 (UTC)
Yes, works now Yaron Koren thanks.
Great! Yaron Koren (talk) 00:05, 6 September 2018 (UTC)
Haha, I just encountered this error since the current stable version doesn't include the fix. EliteMasterEric
Yes, I need to release a new version of External Data - it's on my to-do list. Yaron Koren (talk) 21:17, 11 January 2019 (UTC)

no offset clause?

There doesn't appear to be an offset parameter. I tried passing 'offset, count' yo the limit parameter but I think it's expecting just one number for limit. Have I missed a trick to sort out pagination? JamesDriscoll (talk) 16:00, 2 October 2018 (UTC)

join type

I think Cargo is doing LEFT OUTER JOINs on tables and External Data is doing JOIN, which will result in different results for the same query on the two systems. Is there a way to specify the join types between the tables? Many thanks JamesDriscoll (talk) 16:06, 2 October 2018 (UTC)

Are you querying Cargo data using External Data? Because if so, it would probably easier - and lead to more consistent results - to use Special:CargoExport to get the results, i.e. use #get_web_data instead of #get_db_data. Yaron Koren (talk) 20:43, 3 October 2018 (UTC)
I'm sometimes joining my tables to cargo tables. I was fudging it by putting some of my tables into the cargo world. I definitely need to specify join types though. (And would need offset in the query as I'll have to paginate the results). I was hoping this extension would allow me to combine both worlds. At first glance I suspect the the getwebdata/fullurl parse isn't going to be quick enough. JamesDriscoll (talk) 21:20, 3 October 2018 (UTC)

Handle JSON with nested duplicate key names?

I've a situation where the same key appears both in the outer dictionary and inner dictionaries. Following the examples on the page, I cannot determine how to only retrieve the outer key. Specifically, in the following example, the key name is problematic, appearing at the outermost level, and then as a key in both "venue": { ... } and "group": { ... } dictionaries.

{{#get_web_data:
url=https://api.meetup.com/hac-dc/events?scroll=next_upcoming&page=10
|format=JSON
|data=local_date=local_date,local_time=local_time,name=name,link=link,duration=duration
|cache seconds=0
}} 

{| class="wikitable"
! Date
! Time
! Event {{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{local_date}}}
{{!}} {{{local_time}}}
{{!}} [{{{link}}} {{{name}}}]
}}
|}

Is there a way to remove the inner keys. (In my case, getting rid of venue and group in their entirety would work, but that's very specific to my needs, and not a general solution. On the other hand, if there is no good general solution, well... I'll take what I can get.) --KevinCole (talk) 15:36, 11 October 2018 (UTC)

I don't think so, unfortunately - for that, I think you would need to use JSONPath (the equivalent of XML's XPath), and External Data still doesn't support it. The only solution I can think of is to create an online script (like a PHP file) that goes to that URL, parses its contents, retrieves the relevant data, and then displays it in some format like CSV (or a much simpler JSON) - and then have #get_web_data access that, instead of the original URL. Yaron Koren (talk) 19:49, 11 October 2018 (UTC)

Big Data

A question to users: how big are the largest datasets that you have managed to process with ExternalData and Semantic MediaWiki on one page?
Alex Mashin (talk) 03:23, 12 November 2018 (UTC)

Use variable and pass to template in display_external_table

I would like to use a variable that does not come from the data and pass it to the template provided by display_external_table.

I tried {{#display_external_table:template=MyTemplate|data=foo=bar,foo2={{{somevariable}}} }} but unfortunately the content of {{{somevariable}}} is not being passed to MyTemplate.

Any suggestions?

The only way I can think to do that is to use the Variables extension - set a variable in the outer template for that template parameter using #vardefine, and then hopefully it will be accessible in the inner template via #var. I'm not sure if that will work, but it's worth a try. Yaron Koren (talk) 13:35, 19 November 2018 (UTC)

Dynamic categories

Situation

{{#get_web_data:url=http://example.com/names.txt|format=csv|data=name=1}}
{{#display_external_table:template=AddCat|data=1=name }}

The file names.txt simply contains names, one per row.

The template AddCat simply adds the first parameter as category: [[Category:{{{1}}}]]

Problem

The page I use this template on actually shows the name-categories on its bottom but the page itself is not visible on the category-page (I ran the jobs of course).

I assume that this has something to do with the fact that the category-name is not present in the wiki-text but is fetched from an external source.

Any suggestions how I can really add the categories?

That's strange... it seems like that should work. Did you try calling #for_external_table instead, without the use of a template? I don't know if that would work any better, but it's worth trying. Yaron Koren (talk) 04:17, 10 December 2018 (UTC)
Yes, I tried that but no luck. :-( --Dktue (talk) 08:23, 11 December 2018 (UTC)

Problem with empty fields in XML import

I'm trying to import API-data in XML-format and display it in an wikitable (using {{#for_external_table:..}} ), but it seems the rows get "out of sync" because some XML-fields do not contain any data.

The result is that columns in the table are filled with data from the top downwards, skipping the empty fields and leaving empty cells at the bottom of the column.

Is there a way to just display an empty cell in the wikitable for each empty field in the XML? Bmulckhu (talk) 19:34, 20 December 2018 (UTC)

Do those empty fields have XML tags for them, or is the tag itself missing as well? Yaron Koren (talk) 05:20, 21 December 2018 (UTC)
Well, yes; not an opening tag and a closing tag, but they do have a single tag: <tag/>. Bmulckhu (talk) 10:36, 21 December 2018 (UTC)