Jump to content

Extension:Cargo/Querying data

From mediawiki.org

Querying of data within Cargo can be done in a number of ways:

  • Via two parser functions, #cargo_query and #cargo_compound_query
  • At the page Special:ViewData, which provides a simple interface that lets you set the parameters for a query
  • Via a query written in Lua, using the Scribunto extension
  • Simply by SQL calls, if you have direct access to the Cargo database.

The #cargo_query and #cargo_compound_query functions are explained on this page; #cargo_compound_query function essentially calls two or more queries at the same time, then displays their results together. The interface at Special:ViewTable provides the same parameters that #cargo_query does.

For help querying Cargo data within Lua/Scribunto, see Other features#Lua support.

There is no documentation here for how to do direct SQL querying of Cargo data; hopefully it is self-explanatory.

#cargo_query

[edit]

The #cargo_query function is essentially a wrapper around SQL, with a few modifications. It is called with the following syntax:

{{#cargo_query:
tables=table1=tableAlias1, table2=tablesAlias2, etc.
|join on=table1.fieldA = table2.fieldB,table2.fieldC=table3.fieldD, etc.
|fields=field1=fieldAlias1,field2=Alias2, etc.
|where=table1.fieldE='some value' AND/OR etc.
|group by=table1.fieldG
|having=table1.fieldG='some value', etc.
|order by=table2.fieldF, etc.
|limit=some number
|offset=some number
|intro=some text
|outro=some text
|default=some text
|more results text=some text
|no html
|max display chars=some number
|format=format
...additional format-based parameters
}}

The first eight parameters should look fairly familiar to anyone experienced with SQL SELECT queries:

  • tables= - the set of Cargo tables to be queried; "table=" can be used instead. Corresponds to the SQL FROM clause (with the optional "alias" values corresponding to AS clauses).
  • join on= - specifies how to connect each pair of tables if there is more than one table. Corresponds to the SQL LEFT OUTER JOIN...ON clauses, but with clauses separated by commas instead of "AND".
  • fields= - the field or set of fields to be displayed; the default value is "_pageName". Corresponds to the SQL SELECT clause (with the optional "alias" values corresponding to AS clauses).
  • where= -the conditions for this query, if any; can include operators like AND, OR and NOT. Corresponds to the SQL WHERE clause.
  • group by= - one or more fields whose values should be grouped together in one row each. Corresponds to the SQL GROUP BY clause. Note that DISTINCT does not work in Cargo (except within a function like COUNT()), so in most cases you must use "group by" to eliminate duplicates.
  • having= - similar to the "where" parameter, but applies to values computed for "groups". Corresponds to the SQL HAVING clause.
  • order by= - one or more fields by which the result set should be sorted; the default value is "_pageName ASC", which does an ascending sort on page name. Setting it to "_rowID" should sort rows by the same order they were entered. Corresponds to the SQL ORDER BY clause.
  • limit= - the maximum number of rows to display. Corresponds to the SQL LIMIT clause.
  • offset= - the number of initial rows to skip. Corresponds to the SQL OFFSET clause.

The next three parameters are conditional, being applied depending on whether or not there are results:

  • intro= - sets text that goes right before the query results; applied only if there are results.
  • outro= - sets text that goes right after the query results; applied only if there are results.
  • default= - sets text that goes in place of the query results, if there are no results. The default value is "No results", in the language of the user. To not have any text appear, just set "default=".

The last set of parameters is:

  • more results text= - sets text that goes after the query display, to link to additional results. The default value is "More...", in the language of the user. To not have any text appear (suppress the link altogether), just set "more results text=".
  • no html - specifies that this query does not include any HTML in its results.
This is important for embedded queries (a #cargo_query call contained within another one, or within a different parser function). Such embedded calls often fail due to parsing. For example, if your query results show 'UNIQ-item1-QINU' instead of the expected values, placing 'no html' within the inner queries tends to fix the problem.
  • max display chars= - applies only to fields of type "Text" and "Wikitext". Sets the maximum number of characters to display for any field value; if a value is larger than this number of characters, it is replaced with a link reading "Show" that, when clicked, displays the entire text.
  • format= - sets the format of the display (see Display formats).
  • There can also be additional allowed parameters, specific to the chosen display format. Again, see Display formats for the possible values of both the "format" parameter and these additional parameters.

Of all these parameters, "tables=" is the only required one; although if "tables=" holds multiple tables, "join on=" is required as well.

Querying Boolean fields

[edit]

When querying fields of Boolean type, the values stored will be either 1, 0, or NULL. To write a where condition to ensure a value is true, write FieldName="1". To ensure a value is both defined and false, write FieldName="0". In the case that false and undefined (NULL) should be treated the same, write (FieldName="0" OR FieldName IS NULL).

The values of "1" and "0" hold despite the fact that "Yes" and "No" are displayed in the Special:CargoTables interface; values at that interface are interpreted to be more human readable, but the stored values are still "1" and "0".

Examples

[edit]

The following query gets all the cities on the current (fictional) wiki and their populations, and displays them in a table:

{{#cargo_query:
tables=Cities
|fields=_pageName=City,Population
|format=table
}}

The following query gets only cities in Asia, and displays the city name, the country name and the population, all in a dynamic, JavaScript-based table:

{{#cargo_query:
tables=Cities,Countries
|join on=Cities.Country=Countries._pageName
|fields=Cities._pageName=City,Countries._pageName=Country,Cities.Population
|where=Countries.Continent="Asia"
|format=dynamic table
}}

The following query gets all countries in the wiki, and the number of cities in each one that have a page in the wiki, for each country that has more than two cities; it then displays that set of numbers with a bar chart, with the highest number of cities at the beginning:

{{#cargo_query:
tables=Cities,Countries
|join on=Cities.Country=Countries._pageName
|fields=Countries._pageName=Country,COUNT(*)
|group by=Countries._pageName
|having=COUNT(*) > 2
|order by=COUNT(*) DESC
|format=bar chart
}}

#cargo_compound_query

[edit]

You may want to have the results of more than one query within the same display; this is possible using the #cargo_compound_query function. The main usage for such "compound queries" is to display multiple sets of points on a map or calendar, with a different icon (in maps) or a different color (in calendars) for each set; though it's also possible to use it to show multiple result sets within simpler formats like tables and lists.

#cargo_compound_query is called by passing in the sets of parameters of one or more calls to #cargo_query, with the overall sets separated by pipes, and the internal parameters now separated by semicolons.

Here is an example call, that would show two groups of pages in a map, with a different icon for each:

{{#cargo_compound_query:
tables=Restaurants;where=City='Lisbon';fields=_pageName,Address,Rating,Coords;icon=Fork.jpg
|tables=Bars;where=City='Lisbon';fields=_pageName,Address,Hours,Coords;icon=Martini.png
|format=openlayers
}}

Essentially, each subquery functions as its own "parameter". These subqueries have their own sub-parameters that are a subset of the allowed parameters of #cargo_query: only the parameters directly related to querying - "tables", "join on", etc. - are allowed, plus two more parameters related to display in calendars and maps: "color" and "icon".

Besides the subqueries, the only allowed other parameters for #cargo_compound_query are "format", plus whatever additional parameters there are for the chosen format.

Display formats

[edit]

Data returned from a Cargo query can be displayed in many different ways, including lists, tables, charts, calendars, maps and other visualizations. See Display formats for a full listing.

The "HOLDS" command

[edit]

SQL's own support for fields that contain lists/arrays is unfortunately rather poor. For this reason, #cargo_declare creates an additional, helper table for each field that holds a list of values. Additionally, #cargo_query supports a custom, SQL-like command, "HOLDS", within the "join on=" and "where=" parameters, that makes querying on such data easier. Instead of having to manually include the helper table in each such call, you can use "HOLDS" to simplify the syntax; it is what is known as "syntactic sugar" for the true, more complex, SQL.

You can use "HOLDS" within the "where=" parameter to find all rows whose list field contains a certain value. To use our earlier example, if we have a table called "Books" that contains a field, "Authors", holding a list of authors, we can use the following #cargo_query call to get all books written or co-written by Leo Tolstoy:

{{#cargo_query:
tables=Books
|fields=_pageName=Book,Authors
|where=Authors HOLDS 'Leo Tolstoy'
}}

Internally, the implementation of "HOLDS" in "where=" uses SQL subqueries (i.e., SELECTs within the main SELECT). This cannot be directly done in #cargo_query. But you can get a similar result with a #cargo_query call that looks like this:

{{#cargo_query:
tables=Books,Books__Authors
|join on=Books._ID=Books__Authors._rowID
|fields=_pageName=Book,Books.Authors__full=Authors
|where=Books__Authors._value = 'Leo Tolstoy'
}}

Similarly, you can use "HOLDS" within "join on=" to join two tables together based on values within list fields. For example, if information about authors is stored within its own database table, "Authors", and you wanted to display a table of books, their authors, and those authors' dates of birth, you could have the following call:

{{#cargo_query:
tables=Books,Authors
|join on=Books.Authors HOLDS Authors._pageName
|fields=Books._pageName,Books.Authors,Authors.Date_of_birth
}}

This is directly equivalent (unlike with HOLDS in "where=") to the explicit call of:

{{#cargo_query:
tables=Books,Books__Authors,Authors
|join on=Books._ID=Books__Authors._rowID,Books__Authors._value=Authors._pageName
|fields=Books._pageName,Books__Authors._value,Authors.Date_of_birth
}}

To exclude empty rows where there's no matching HOLDS value, a |where= clause such as the following can be used:

{{#cargo_query:
tables=Books
|fields=Books._pageName,Books__Authors._value,Authors.Date_of_birth
|where=Books__Authors._value IS NOT NULL
}}

"HOLDS LIKE"

[edit]

There is an additional command, "HOLDS LIKE", that maps the SQL "LIKE" command onto all of a list of values; it works just like "HOLDS". For instance, to get all books written or co-written by anyone with "Leo" in their name, you could call:

{{#cargo_query:
tables=Books
|fields=_pageName=Book,Authors
|where=Authors HOLDS LIKE '%Leo%'
}}

The "WITHIN" command

[edit]

The "WITHIN" command can be applied to hierarchy fields which hold a single value. This command let you match against not just a value, but all of its children values, as defined in the hierarchy. The "WITHIN" command will result in an error for hierarchy values that contain an apostrophe .

For example, let’s say there is a table called "Recipes", and among its fields is one called "Main_ingredient", which is defined as a hierarchy. You can do the following query to look for recipes having a main ingredient that is a root vegetable:

{{#cargo_query:
tables=Recipes
|fields=_pageName=Recipe,Main_ingredient
|where=Main_ingredient WITHIN 'Root vegetables'
}}

"HOLDS WITHIN"

[edit]

The "HOLDS WITHIN" command can be applied to hierarchy fields which hold a list of values. As you might imagine from the name, it is a combination of "HOLDS" and "WITHIN" - it finds all entries that have at least one match for a value and/or one of its children values.

For example, let’s say there is a table called "Movies", which contains a field called "Genres", which is defined as a hierarchy. To get all the movies that are defined as being, at least in part, a "Drama" or one of its sub-genres, you could run the following query:

{{#cargo_query:
tables=Movies
|fields=_pageName=Movie
|where=Genres HOLDS WITHIN 'Drama'
}}

The "NEAR" command

[edit]

Like arrays, coordinates are not well-supported overall by relational databases. For that reason, similarly to arrays, coordinates have special handling for both storage and querying. For the case of coordinates, if you want to query on coordinates, the recommended approach is to use the "NEAR" command, which like "HOLDS" is a virtual command, defined by Cargo.

"NEAR" finds all the points near a specified set of coordinates, within a specified distance. The coordinates and distance must be placed in parentheses, separated by commas; and the distance must be in either kilometers (specified as "kilometers" or "km") or miles (specified as "miles" or "mi").

For instance, if there is a table called "Restaurants", holding a list of restaurants, and it contains a field called "Coords" holding the coordinates of each field, you could call the following query to display all restaurants (and some relevant information about them) within 10 kilometers of the Piazza San Marco in Italy:

{{#cargo_query:
tables=Restaurants
|fields=_pageName=Restaurant,Address,Rating,Coords
|where=Coords NEAR (45.434, 12.338, 10 km)
}}

The "MATCHES" command

[edit]

For fields of type "Searchtext", you can do a standard text search on their contents using the "MATCHES" command. This is equivalent to a MATCH ... AGAINST call in MySQL; this command is currently available only for MySQL.

For most wikis, the only field of type "Searchtext" will be the automatically-generated field _pageData._fullText (see Storing page data). A simple query, to get all the pages in the wiki containing the word "meeting", might look like:

{{#cargo_query:
table=_pageData
|fields=_pageName=Page,_fullText=Search results
|where=_fullText MATCHES 'meeting'
}}

You can see more complex demonstrations of the MATCHES functionality here.

Using SQL functions

[edit]

You can include native functions from whichever database system you're using within #cargo_query, in the "fields", "join on" and "where" parameters. For the sake of security, the default set of allowed SQL functions is defined in a global variable, $wgCargoAllowedSQLFunctions.

$wgCargoAllowedSQLFunctions Defaults
Math functions COUNT, FLOOR, CEIL, ROUND, MAX, MIN, AVG, SUM, POWER, LN, LOG
String functions CONCAT, GROUP_CONCAT, IF, LOWER, LCASE, UPPER, UCASE, SUBSTRING, TRIM, FORMAT, COALESCE
Date functions NOW, DATE, YEAR, MONTH, DAYOFMONTH, DATE_FORMAT, DATE_ADD, DATE_SUB, DATEDIFF

If you want to use any additional functions, you can enable them by adding lines to LocalSettings.php like this one, after the inclusion of Cargo:

$wgCargoAllowedSQLFunctions[] = 'CURDATE';

You can find documentations on all of these functions online. For MySQL, for example, you can see the string functions explained here, and date functions explained here. Below are some examples of how SQL functions can be used, all assuming a MySQL system.

[edit]

You can use CONCAT() to create custom link text for both internal and external links. Example:

{{#cargo_query:table=Newspapers
|fields=CONCAT( '[[', _pageName, '|View page]]' ) = Newspaper, Circulation, CONCAT( '[', URL, ' View URL]' ) = URL
}}
[edit]

Conversely, you can use CONCAT() to remove links to values, for fields of type "Page". By default, such values are displayed as links, but you may want to instead display them as just strings. The CONCAT() function is probably the easiest way to do that. If the "Author" field here is of type "Page", then to display author values as just strings, you could call the following:

{{#cargo_query:tables=Blog_posts
|fields=_pageName,CONCAT(Author)
}}

This works because, as long as what is being displayed is not simply the field name, #cargo_query will not apply any of that field's special handling.

If the field holds a list of values, you should instead call "CONCAT(fieldName__full)". So the call could look like this:

{{#cargo_query:tables=Blog_posts
|fields=_pageName,CONCAT(Topics__full)
}}

Disabling wikitext parsing

[edit]

By default, every field of type "Wikitext string" and "Wikitext" - and every field calling a string function, like CONCAT() - is parsed as wikitext, so that a value containing wikitext ''like this'' will be displayed in italics. If you want to disable that, and actually show the wikitext on the screen, there's a hack you can use for that: wrap the field name or function inside the SQL TRIM() function. So if the field looked like "TRIM(MyWikitextField)", the syntax in the values would not get parsed but just displayed as-is.

Date filtering

[edit]

You can use date functions like DATEDIFF() to get items with a date within a certain range. Example:

{{#cargo_query:tables=Blog_posts
|fields=_pageName,Author,Date
|where=DATEDIFF(Date,NOW()) >= -7
|order by=Date DESC
}}

Another example:

{{#cargo_query:tables=Blog_posts
|fields=_pageName,Author,Date
|where=YEAR(Date) = "2000"
}}

Truncating strings

[edit]

You can use string functions like LEFT() or SUBSTRING() to trim strings (note that these functions are not allowed by default). The following example also uses CONCAT() and IF() to append an ellipsis, only if the string value (a quote) has been truncated. Example:

{{#cargo_query:tables=Authors
|fields=_pageName=Author, CONCAT( LEFT( Quote, 200 ), IF( LENGTH( Quote ) > 200, '...', '' ) )=Quote
}}

Displaying hard-coded (non-Cargo) values

[edit]

You can use nested IF() statements to display custom data that does not come from Cargo within the query results. Example:

{{#cargo_query:tables=Drinks
|fields=_pageName=Drink, Ingredients, IF( _pageName = 'Lemonade', 'One dollar', IF ( _pageName = 'Fruit punch', 'Two dollars', 'Three dollars' ) )=Price
}}

Configuration

[edit]

There are several settings you can add to LocalSettings.php to change the display of query results:

  • $wgCargoDefaultQueryLimit - the number of results to show for #cargo_query if no limit is set (default is 100)
  • $wgCargoMaxQueryLimit - the maximum allowed number of results for #cargo_query (default is 5000)
  • $wgCargoAllowedSQLFunctions - the set of allowed SQL functions (see "Using SQL functions", above)
  • $wgCargoHideNamespaceName - an array of namespaces that, if a page from that namespace is displayed among the query results, the namespace part of the page name is hidden (by default, the array holds just NS_FILE)
  • $wgCargoDecimalMark - the character that separates between the main number and the decimal part for numbers on this wiki (depends on the wiki's country of origin) (default is ".")
  • $wgCargoDigitGroupingCharacter - the character that separates digits (usually thousands) for numbers on this wiki (default is ",")

If any of these are added to LocalSettings.php, they should be placed below the inclusion of Cargo.