Unlocking MediaWiki's Potential with the Cargo Extension for Database-Like Functionality

Why Cargo feels like the missing piece in the MediaWiki puzzle

If you’ve ever hacked a wiki into a makeshift CRM, a film catalog, or even a public‑records portal, you know the pain of forcing MediaWiki’s flat‑page model to behave like a spreadsheet. That’s where the Cargo extension slides in. It doesn’t try to replace the core—rather, it augments it with a lightweight, SQL‑flavoured layer that lives inside the wiki.

Quick glance: Cargo in a nutshell

  • Defines tables directly in wiki markup using simple parser functions.
  • Stores the data as real tables in the wiki's own MySQL/MariaDB database, complete with real indexes.
  • Offers parser functions – #cargo_declare, #cargo_store and #cargo_query – to define tables, write rows and retrieve them.
  • Handles bulk loading via companion tools such as the Data Transfer extension, or scripted template edits.
  • Exposes query results to external applications through MediaWiki's action API (action=cargoquery).

Sound familiar? It’s essentially a “database in a wiki” that drops the need for an external DBMS while keeping queries fast.

Getting Cargo on your wiki – the practical bits

First, a short reality check: Cargo works best on a MySQL/MariaDB backend. PostgreSQL support exists, but you may hit a few bumps; the core still functions. Check the extension's page on MediaWiki.org for the minimum MediaWiki version your Cargo release requires (older releases supported 1.31+).

Installation steps

  1. Open a terminal on your wiki host.
  2. Download the extension from MediaWiki.org (Special:ExtensionDistributor offers a tarball matched to your MediaWiki version), or clone it with Git.
  3. Place the resulting Cargo folder in your MediaWiki extensions directory.
  4. Add wfLoadExtension( 'Cargo' ); to LocalSettings.php just after your other extensions.
  5. Run the update script: php maintenance/update.php. This creates the required Cargo tables in your database.

That’s it. A quick reload of Special:Version should now show Cargo with a version number.

Declaring a table – the first line of code

Unlike external DB tools, Cargo lets you declare a table right in wiki markup. The declaration must live on a template page, conventionally inside its <noinclude> section. A typical declaration looks like this:

{{#cargo_declare:
_table=Film
|Title=String
|Year=Integer
|Director=String
|Rating=Float
}}

Drop that block on a template page such as Template:FilmRow. The database table isn't created on save, though: you create it via the "Create data" action link that Cargo adds to the template page, or by running Cargo's cargoRecreateData.php maintenance script. By default the underlying table gets a cargo__ prefix (here, cargo__Film), and Cargo automatically adds bookkeeping columns such as _pageName.

Storing data with #cargo_store

If you’re a fan of template‑driven forms, this is the sweet spot. Imagine a template FilmRow that each film's page calls; inside its <includeonly> section you add a store call:

{{#cargo_store:
_table=Film
|Title={{{title|}}}
|Year={{{year|}}}
|Director={{{director|}}}
|Rating={{{rating|}}}
}}

Whenever a page using {{FilmRow|title=Inception|year=2010|director=Christopher Nolan|rating=8.8}} is saved, Cargo silently inserts a row into the Film table. No extra admin work, just the wiki’s normal edit workflow.

Querying the data – turning tables into readable content

Now for the fun part. You can fetch rows with a single parser function, or you can build a table view with a handful of parameters. Here’s a classic example:

{{#cargo_query:
tables=Film
|fields=Title, Year, Director, Rating
|where=Rating > 8
|order by=Rating DESC
|limit=10
|format=table
}}

That snippet drops a nicely formatted HTML table into the page, showing the top‑10 films by rating. The same function supports a range of other display formats (lists, calendars, charts and more), including rendering each row through a template of your own. For instance, to render each row using a FilmCard template:

{{#cargo_query:
tables=Film
|fields=Title, Year, Director, Rating
|format=template
|template=FilmCard
}}

Behind the scenes, Cargo translates the request into a plain SQL SELECT against the underlying table, so MySQL's optimizer and any indexes on that table do the heavy lifting. Indexes can be added at the database level just as for any other table; see the performance tips below.

Complex conditions and joins

A lot of people assume Cargo is limited to single‑table lookups. Not true. You can query several tables at once by listing them in the tables parameter, separated by commas. Suppose you have a second table called Actor:

{{#cargo_declare:
_table=Actor
|Name=String
|Film=String
|Role=String
}}

Now pull a list of actors for each film:

{{#cargo_query:
tables=Film, Actor
|join on=Film.Title=Actor.Film
|fields=Film.Title, Actor.Name, Actor.Role
|order by=Film.Title ASC, Actor.Name ASC
|format=table
}}

Note the join on parameter, which takes the place of a SQL join condition; Cargo expects it whenever you query more than one table. This works fine for modest data sets. Once you start hitting tens of thousands of rows, make sure the joined fields are indexed at the database level to keep performance snappy.
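Under the hood, such a two-table query becomes an ordinary SQL join. Here is a minimal sketch using Python's sqlite3 module standing in for MySQL; the table and column names follow the examples above, while the real schema Cargo generates carries extra metadata columns:

```python
import sqlite3

# Simplified stand-ins for the tables Cargo would create.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Film (Title TEXT, Year INTEGER)")
con.execute("CREATE TABLE Actor (Name TEXT, Film TEXT, Role TEXT)")
con.execute("INSERT INTO Film VALUES ('Inception', 2010)")
con.executemany(
    "INSERT INTO Actor VALUES (?, ?, ?)",
    [("Leonardo DiCaprio", "Inception", "Cobb"),
     ("Elliot Page", "Inception", "Ariadne")],
)

# Roughly the SELECT that tables=, join on=, fields= and order by= produce:
rows = con.execute(
    "SELECT Film.Title, Actor.Name, Actor.Role "
    "FROM Film JOIN Actor ON Film.Title = Actor.Film "
    "ORDER BY Film.Title ASC, Actor.Name ASC"
).fetchall()
# rows == [("Inception", "Elliot Page", "Ariadne"),
#          ("Inception", "Leonardo DiCaprio", "Cobb")]
```

The point is simply that Cargo's query parameters map one-to-one onto SQL clauses, which is why DB-tuning intuition transfers directly.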

Importing bulk data – CSV, JSON and beyond

Manual entry is great for a few dozen rows, but real‑world deployments often involve hundreds or thousands of items. Cargo itself doesn't bundle a CSV importer; the usual route is the companion Data Transfer extension, which turns spreadsheet rows into template calls:

  • Install the Data Transfer extension alongside Cargo.
  • Prepare a CSV whose column headers match your template's parameter names.
  • Upload the file through Data Transfer's import pages (e.g. Special:ImportCSV).
  • Each row becomes a page that calls your template, and the template's #cargo_store populates the Cargo table as those pages are saved.

The import runs through the job queue, so you can keep editing while it processes. For JSON or other formats, a small script that converts each record into a template call and pushes it through the MediaWiki API works just as well (more on the API later).
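The convert-to-template-calls step is easy to script. A minimal sketch in Python, using only the standard library; FilmRow and its parameters are the template from this article, and actually pushing the pages to the wiki (via the API or a bot framework) is left out:

```python
import csv
import io

def rows_to_wikitext(csv_text):
    """Yield (page_title, wikitext) pairs, one {{FilmRow}} call per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each CSV column becomes a template parameter of the same name.
        params = "|".join(f"{k}={v}" for k, v in row.items())
        yield row["title"], "{{FilmRow|%s}}" % params

sample = "title,year,director,rating\nInception,2010,Christopher Nolan,8.8\n"
pages = list(rows_to_wikitext(sample))
# pages[0][1] == "{{FilmRow|title=Inception|year=2010|director=Christopher Nolan|rating=8.8}}"
```

Saving each generated page triggers the template's #cargo_store, so the Cargo table fills itself through the normal edit pipeline.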

Performance tips – making Cargo scale gracefully

Because Cargo stores data in regular MySQL tables, the usual DB‑tuning tricks apply. Here are a few that have saved me from a slow‑query nightmare:

  • Index fields used in where clauses. Cargo queries run against ordinary database tables, so an index added at the database level speeds them up like any other SQL workload.
  • Avoid select-everything queries: explicitly list the fields you need. MySQL fetches fewer columns, and Cargo’s result parsing is quicker.
  • Cache frequent queries. Query output is cached along with the page in MediaWiki’s parser cache, and Cargo offers additional query-result caching options; check the extension’s documentation on MediaWiki.org before tuning.
  • Chunk large imports. Splitting a 50k‑row CSV into batches of a few thousand rows prevents the job queue from choking.
  • Watch out for recursive templates. If a template containing #cargo_query ends up transcluding itself (directly or via another template), you’ll hit MediaWiki’s template recursion limit. Break the loop with a guard parameter.
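The indexing advice is plain SQL underneath. A rough sketch with Python's sqlite3 module standing in for MySQL; the cargo__Film name mirrors Cargo's default table prefix, and the column layout is simplified:

```python
import sqlite3

# Stand-in for the table Cargo would create (simplified schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cargo__Film (Title TEXT, Year INTEGER, Rating REAL)")
con.executemany(
    "INSERT INTO cargo__Film VALUES (?, ?, ?)",
    [("Inception", 2010, 8.8), ("Tenet", 2020, 7.3), ("Dunkirk", 2017, 7.8)],
)

# A composite index matching a common filter-and-sort pattern.
con.execute("CREATE INDEX idx_rating_year ON cargo__Film (Rating, Year)")

# The kind of SELECT that where=Rating>8 with order by=Rating DESC produces:
rows = con.execute(
    "SELECT Title, Year, Rating FROM cargo__Film "
    "WHERE Rating > 8 ORDER BY Rating DESC"
).fetchall()
# rows == [("Inception", 2010, 8.8)]
```

On a few thousand rows the index is invisible; on hundreds of thousands it is the difference between a scan and a seek.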

Cargo vs. Semantic MediaWiki – a quick comparison

Both extensions aim to give MediaWiki a data layer, yet they take divergent approaches. Semantic MediaWiki (SMW) stores property-value annotations in its own abstract storage layer (a sizeable set of SQL tables by default, optionally an external triplestore), while Cargo maps each declared table directly onto a relational table. The practical upshots:

  • Query language. SMW uses #ask with its own syntax; Cargo uses something that looks a lot like SQL, so DB‑savvy users feel at home.
  • Performance. Cargo’s queries are generally faster for large numeric datasets because MySQL’s optimizer does the heavy lifting.
  • Flexibility. SMW shines when you need inferencing or complex ontologies. Cargo is better for flat, tabular data—think inventory lists, event schedules, or bibliographies.
  • Installation footprint. Cargo adds one database table per declared table plus a little bookkeeping; SMW’s own schema is considerably larger, which can be more demanding on backup and migration processes.

It’s not a matter of “either/or”; many wikis run both, using SMW for semantic relationships and Cargo for high‑volume numeric reporting.

Using the Cargo API from outside the wiki

One of the neat side‑effects of Cargo’s design is that the data is exposed via MediaWiki’s standard API. A GET request like:

https://example.org/w/api.php?action=cargoquery&tables=Film&fields=Title,Year,Rating&format=json

returns a tidy JSON array of rows. You can feed that into a JavaScript dashboard, a Python data‑analysis script, or a mobile app. Authentication works the same way as any other API call—use a bot password or OAuth token for write operations.
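A minimal read-only client in Python, using only the standard library. The endpoint URL and field names follow the examples above; the sample response mirrors the shape action=cargoquery returns, with each row wrapped under a "title" key:

```python
import json
from urllib.parse import urlencode

API = "https://example.org/w/api.php"  # your wiki's endpoint

def cargo_query_url(tables, fields, where=None, limit=50):
    """Build a read-only action=cargoquery URL."""
    params = {"action": "cargoquery", "tables": tables,
              "fields": fields, "limit": limit, "format": "json"}
    if where:
        params["where"] = where
    return API + "?" + urlencode(params)

url = cargo_query_url("Film", "Title,Year,Rating", where="Rating>8")
# Fetch with urllib.request.urlopen(url) or any HTTP client.

# action=cargoquery wraps each result row under a "title" key:
sample = json.loads(
    '{"cargoquery": [{"title": '
    '{"Title": "Inception", "Year": "2010", "Rating": "8.8"}}]}'
)
rows = [r["title"] for r in sample["cargoquery"]]
# rows[0]["Title"] == "Inception"
```

Note that numeric values come back as strings in the JSON, so cast them client-side if you plan to do arithmetic.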

Real‑world story: turning a community site into a public‑records archive

I once helped a municipal open‑data group repurpose their MediaWiki into a searchable registry of building permits. The original wiki held the PDFs as file uploads, but the metadata (address, date, applicant) was just free‑form text. By adding Cargo, they declared a Permit table, used #cargo_store in a simple “record” template, and then built a public‑facing query page that let citizens filter permits by year or zip code. Performance stayed snappy even after a few thousand rows, thanks to a composite index on Address+Year. The whole setup lives inside the same wiki they already maintain—no separate database, no extra hosting costs.

That anecdote illustrates the “low barrier, high payoff” vibe of Cargo. You get relational power without leaving the familiar MediaWiki editing workflow.

Things to watch out for

Nothing is perfect. A few quirks have tripped me up:

  • Field names are case‑sensitive in queries. Declaring Year but querying year returns nothing.
  • The database table is named after the _table= value, not the page hosting the #cargo_declare block, so renaming that page changes nothing. Changing the _table= value itself, however, creates a new table rather than renaming the old one; recreate the data and drop the orphaned table via Special:CargoTables.
  • Rows can go stale when jobs fail or declarations change. The reliable cleanup is to regenerate the table from the current pages, via the template page’s “Recreate data” action or the cargoRecreateData.php maintenance script.

Being aware of these gotchas keeps the wiki tidy and prevents orphaned data.

Wrapping up the Cargo journey

MediaWiki’s core shines as a collaborative editing platform; Cargo extends that shine into the realm of structured, query‑able data. Whether you’re cataloguing movies, tracking inventory, or publishing public records, Cargo gives you the relational toolbox without a separate database to manage.

Installation is a handful of commands, declaration a single markup block, and querying as simple as a #cargo_query call. Add indexes where you need them, leverage the API for external consumption, and you’ve got a lean, performant data solution baked right into your wiki.

So next time you stare at a giant table of wiki pages and think “there has to be a better way,” remember that the “better way” might already be waiting in the Cargo extension.
