Mastering the Cargo Extension for Structured Data in MediaWiki

Why Structured Data Matters in a Wiki

Ever opened a wiki page and thought, “This could be a spreadsheet, but I’m stuck with plain text”? Yeah, I’ve been there. MediaWiki, at first glance, feels like a giant notebook – you scribble, you link, you hope the reader gets the gist. But when you start dealing with hundreds of items – species lists, election results, or a catalog of community‑run events – a flat page turns into a maze.

Enter Cargo, the unsung hero that lets you treat MediaWiki almost like a small relational database. Behind the scenes it keeps your data in real database tables, but you declare, fill, and query them entirely from wiki pages, with no SQL console required. It’s like giving your wiki a stash of spreadsheets that you can query on the fly. The results? Clean tables, simple charts, and pages that update themselves as the data grows.

Getting Cargo on Board

First things first: the extension isn’t baked into a vanilla MediaWiki install. You’ll need to pull it from the extension repository. A quick git clone, run from inside your wiki’s extensions/ directory, does the trick:


git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Cargo.git
cd Cargo
composer install

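But don’t just drop the folder into extensions/ and call it a day. You have to register it in LocalSettings.php, and then run MediaWiki’s update script (php maintenance/update.php) so that Cargo’s own bookkeeping tables are created. A typical snippet looks like this: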


wfLoadExtension( 'Cargo' );
// $wgCargoEnableSearchAPI = true; // unverified: confirm that your Cargo version documents this setting before enabling it

Now, a little note about versions. Cargo tracks MediaWiki’s LTS releases fairly closely, but if you’re on a bleeding‑edge version (say, the latest 1.40 development build), double‑check the compatibility notes on the extension’s official page, because a mismatched version can cause cryptic errors. An error like “unknown table ‘cargo_pages’” usually has a more mundane cause: MediaWiki’s update script (maintenance/update.php) hasn’t been run since Cargo was enabled. Trust me, that’s a missing table, not an alien invasion.

Defining Your First Cargo Table

Okay, you’ve got Cargo humming. Time to lay down a table. Cargo uses a special #cargo_declare parser function that lives inside a wiki page, normally a template. Think of that page as the table’s “schema definition”. Here’s a bare‑bones example for a simple book catalog:


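{{#cargo_declare:
  _table = Books
| Title = String
| Author = String
| Year = Integer
| ISBN = String
}}

Notice that the declaration says nothing about display: it only names the table (_table) and lists each field with its type, separated by pipes. How the data is rendered (format=table, a chart, and so on) is decided later, in each query. Once the page is saved, the table itself still has to be created, typically through the “Create data table” action that Cargo adds to the declaring page.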

Now, you might wonder, “Do I have to write every field manually?” Yes: Cargo only stores fields that appear in the declaration, and it does not guess types from the data. Being explicit pays off when you plan to sort or run numeric calculations. If Year is declared as String, “2023” is stored as text, so values sort character by character (“868” lands after “2011”) and numeric comparisons can give surprising results.
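The effect of the declared type is easy to see outside the wiki. The following Python sketch is only an analogy and does not call Cargo: it sorts the same year values once as text, the way a text column would order them, and once as integers. The sample values are made up for illustration.

```python
# Analogy only: how the same values order as text versus as integers.
# The years are illustrative and not taken from any real Cargo table.
years_as_text = ["1949", "1994", "2011", "868"]

# Text comparison goes character by character, so "868" lands last
# because "8" sorts after "1" and "2".
sorted_as_text = sorted(years_as_text)

# Integer comparison uses the numeric value, so 868 comes first.
sorted_as_int = sorted(int(year) for year in years_as_text)

print(sorted_as_text)
print(sorted_as_int)
```

The same reasoning applies to range filters: comparing text against a number depends on how the database coerces it, which is one more reason to declare numeric fields as Integer or Float.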

Populating Data – The Real Work

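There are three main ways to get data into a Cargo table:

  • Direct #cargo_store calls on a page – one call per row, written straight into the page text.
  • Template calls – the usual setup, where a template contains both #cargo_declare and #cargo_store, so every page that uses the template contributes a row.
  • Bulk imports – handy for large uploads, usually by generating pages from a CSV file with a separate tool such as the Data Transfer extension.

If you already run Semantic MediaWiki, note that it is a separate system: Cargo does not read SMW properties, so that data has to be migrated rather than shared.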

Let’s stick to the first method for now. Below is a snippet that adds three rows to the Books table we declared earlier:


{{#cargo_store:
  _table = Books
| Title = The Wind-Up Bird Chronicle
| Author = Haruki Murakami
| Year = 1994
| ISBN = 9780679743381
}}
{{#cargo_store:
  _table = Books
| Title = 1984
| Author = George Orwell
| Year = 1949
| ISBN = 9780451524935
}}
{{#cargo_store:
  _table = Books
| Title = Sapiens
| Author = Yuval Noah Harari
| Year = 2011
| ISBN = 9780062316097
}}

Two things to notice. The _table parameter tells Cargo which table each row belongs to. And the pipe at the start of each line is ordinary MediaWiki syntax: it separates one parameter from the next. If you forget a pipe, that line is absorbed into the previous parameter’s value instead of being stored as its own field, which can be maddening when you’re debugging later.
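Typing many of these calls by hand invites exactly that kind of slip. As a rough sketch, a few lines of Python can generate the wikitext from ordinary dictionaries. The helper name cargo_store_call is hypothetical, and the sketch assumes that the _table parameter names the target table; it only produces text for you to paste or upload and does not talk to the wiki.

```python
def cargo_store_call(table, row):
    """Render one #cargo_store call as wikitext for the given field values."""
    lines = ["{{#cargo_store:", f"  _table = {table}"]
    for field, value in row.items():
        text = str(value)
        # A raw pipe would be read as a parameter separator, so refuse it
        # here instead of silently producing a broken call.
        if "|" in text:
            raise ValueError(f"value for {field} contains a pipe: {text!r}")
        lines.append(f"| {field} = {text}")
    lines.append("}}")
    return "\n".join(lines)


print(cargo_store_call("Books", {"Title": "1984", "Author": "George Orwell", "Year": 1949}))
```

Rejecting values that contain a pipe is a deliberately blunt choice; a fuller tool could escape them instead.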

Querying with Cargo – Getting the Good Stuff Out

Now for the fun part: pulling data back onto a wiki page. Cargo’s query syntax is reminiscent of SQL, but stripped down to keep it wiki‑friendly. Here’s a simple “list all books” query:


{{#cargo_query:
  tables = Books
| fields = Title, Author, Year
| format = table
}}

That will spit out a nice HTML table. Don’t count on any particular row order unless you ask for one. Want it sorted by year, descending? Add an order by clause:


{{#cargo_query:
  tables = Books
| fields = Title, Author, Year
| order by = Year DESC
| format = table
}}

Notice the DESC – you can also use ASC for ascending. Cargo supports basic aggregate functions, too. Say you want to count how many books per author:


{{#cargo_query:
  tables = Books
| fields = Author, COUNT(Title)=NumBooks
| group by = Author
| order by = NumBooks DESC
| format = table
}}

That little COUNT(Title)=NumBooks syntax can feel odd at first, but it’s just Cargo’s way of naming the output column. You can also use SUM, AVG, MIN, MAX – any typical aggregation you’d expect from a relational engine.
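Because Cargo’s query parameters are modelled on SQL, it can help to see the plain SQL counterpart of the grouping query. This sketch uses Python’s built‑in sqlite3 module with an in‑memory table; it never touches Cargo or your wiki. The table mirrors the Books example without the ISBN column, and Animal Farm is an extra row added here only so that one author has more than one book. SQL’s AS plays the role of Cargo’s =NumBooks alias.

```python
import sqlite3

# In-memory stand-in for the Books table; nothing here talks to a wiki.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (Title TEXT, Author TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("The Wind-Up Bird Chronicle", "Haruki Murakami", 1994),
        ("1984", "George Orwell", 1949),
        ("Sapiens", "Yuval Noah Harari", 2011),
        ("Animal Farm", "George Orwell", 1945),  # extra row for illustration
    ],
)

# Count books per author, most prolific first; ties are broken by name.
rows = conn.execute(
    "SELECT Author, COUNT(Title) AS NumBooks FROM Books "
    "GROUP BY Author ORDER BY NumBooks DESC, Author ASC"
).fetchall()

for author, num_books in rows:
    print(author, num_books)
```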

Displaying Results – Beyond Plain Tables

Tables are great, but sometimes you need something flashier. Cargo ships with a handful of display formats beyond table, including simple charts. For a quick bar chart of books per year, try:


{{#cargo_query:
  tables = Books
| fields = Year, COUNT(Title)=NumBooks
| group by = Year
| format = bar chart
}}

That renders a bar chart right in the page, with the first field (Year) as the label and the count as the bar length. The built‑in charts are deliberately basic; the format’s documentation lists the options it accepts. If you want full control over colors, legends, and tooltips, export the query results instead (Cargo offers json and csv formats) and feed them into the charting tool of your choice, Vega‑Lite included.
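When the chart is drawn by an external tool, that tool needs the data. One route is MediaWiki’s web API, where Cargo registers a cargoquery action. The sketch below only builds the request URL and sends nothing; the wiki address is hypothetical, the helper name cargoquery_url is made up, and the parameter names should be checked against your own wiki’s API help page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def cargoquery_url(api_base, tables, fields, **clauses):
    """Build a request URL for the cargoquery API action."""
    params = {
        "action": "cargoquery",
        "format": "json",
        "tables": tables,
        "fields": fields,
    }
    # Optional clauses (for example group_by, order_by, where, limit) are
    # passed through as given; None values are skipped.
    for name, value in clauses.items():
        if value is not None:
            params[name] = str(value)
    return f"{api_base}?{urlencode(params)}"


url = cargoquery_url(
    "https://wiki.example.org/w/api.php",  # hypothetical wiki address
    tables="Books",
    fields="Year,COUNT(Title)=NumBooks",
    group_by="Year",
    limit=50,
)

# Decode the query string again to show what will be sent.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded)
```

Building the URL with urlencode rather than by string concatenation keeps characters such as parentheses, commas, and equals signs in the fields value safely escaped.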

Performance Tips – Keep It Snappy

When you start scaling up – think thousands of rows, multiple tables – you’ll notice Cargo queries can become sluggish. Here are a few pragmatic tricks:

  1. Keep filtered fields indexable. Declare the fields you filter on (Author, Year, and so on) with short, specific types such as String, Integer, or Date rather than Text, which is meant for long, unindexed values. Check the #cargo_declare documentation for the indexing behavior of your Cargo version.
  2. Limit result sets. Use limit=100 or similar to avoid pulling the whole universe into a single page.
  3. Mind the cache. Query output is stored in MediaWiki’s parser cache along with the rest of the page, which keeps repeat views fast; the flip side is that a page may show stale results until it is purged or re‑parsed.
  4. Keep joins lean. Cargo supports joins through the join on parameter of #cargo_query, but every extra table adds work, so it can be better to denormalize where performance matters.
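The limit advice applies to scripts that read the data as well. Here is a minimal sketch of paging through a large result set with limit and offset, two parameters that Cargo queries accept. The fetch_page callable is a stand‑in that you would replace with a real request; the park names are generated placeholders, not real data.

```python
def iter_rows(fetch_page, page_size=100):
    """Yield rows page by page; fetch_page(limit, offset) returns a list."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in data source so the sketch runs without a wiki.
fake_rows = [f"Park {n}" for n in range(250)]


def fake_fetch(limit, offset):
    return fake_rows[offset:offset + limit]


collected = list(iter_rows(fake_fetch, page_size=100))
print(len(collected))
```

Fetching in fixed‑size pages keeps each request cheap, at the cost of a few more round trips.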

One anecdote: a community wiki tried to list every public park in a large city, roughly 12,000 entries. They initially stored everything in a single Parks table with no index on the fields they filtered by. Page loads took more than 30 seconds. Indexing the City field and adding limit=500 to the front‑page summary cut the load time to under 3 seconds. A simple tweak, but a huge difference for the end user.

Real‑World Use Cases – Inspiration Corner

Below are a few ways folks have harnessed Cargo beyond the textbook examples:

  • Election result dashboards. Each constituency’s vote tallies go into a Results table; a query aggregates totals and feeds a live bar chart that updates as precincts report.
  • Open‑source project trackers. A Projects table holds repo URLs, license types, and contributor counts. A query spits out a sortable table that community members embed on the wiki’s “Stats” page.
  • Historical artifact catalogs. Museums can store artifact metadata – accession numbers, provenance, dating – and then generate printable tables for exhibition labels directly from the wiki.

What’s neat is that anyone with edit rights can add a row via the familiar wiki editor. No separate admin portal needed. It democratizes data entry, which is both a blessing (more contributors) and a curse (potential for messy data). That’s why schema enforcement via #cargo_declare is crucial – it keeps the field types consistent.
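If rows arrive from outside the wiki (a spreadsheet export, say), a small pre‑flight check can catch messy values before they are turned into wiki pages. The sketch below is an assumption‑laden illustration, not part of Cargo: DECLARED_FIELDS mirrors the Books declaration by hand, and check_row is a hypothetical helper that only checks for unknown fields and non‑integer years.

```python
# Hand-written mirror of the Books declaration; Cargo itself is not consulted.
DECLARED_FIELDS = {"Title": str, "Author": str, "Year": int, "ISBN": str}


def check_row(row, declared=DECLARED_FIELDS):
    """Return a list of human-readable problems found in one row."""
    problems = []
    # Fields that the declaration does not know about would not be stored.
    for field in row:
        if field not in declared:
            problems.append(f"unknown field: {field}")
    # Integer fields must parse as whole numbers.
    for field, expected in declared.items():
        if field in row and expected is int:
            try:
                int(str(row[field]).strip())
            except ValueError:
                problems.append(f"{field} is not an integer: {row[field]!r}")
    return problems


print(check_row({"Title": "1984", "Year": "1949"}))
print(check_row({"Title": "Sapiens", "Year": "around 2011", "Publisher": "x"}))
```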

Debugging Quirks – When Cargo Says “Nope”

Every extension has its temperamental moments. Here’s a quick cheat‑sheet for common pitfalls:

Symptom | Likely cause | Fix
“Table not found” error | Missing or misspelled #cargo_declare, or the table was never created | Check the table name in the declaration, save the page, then create or recreate the table.
Empty query results | Incorrect field names or mismatched case | Verify that field names are spelled identically in #cargo_declare, #cargo_store, and the query.
Slow page load | Query filters on an unindexed field or returns too many rows | Filter on indexed field types and add a limit.
Data type mismatch (e.g., “Year” treated as text) | Field declared as String or Text | Declare Year=Integer, then recreate the table.

Pro tip: the Special:CargoTables page lists all Cargo tables and lets you browse their contents. It’s like a quick health‑check dashboard. If something looks off, follow the link to the declaring page, fix the declaration there, and recreate the table.

Peeking Ahead – What’s Next for Cargo?

There’s chatter on the developer mailing list about tighter integration with the DataMaps extension. The idea is to let Cargo power geographic visualizations without a separate GIS layer. Cargo can already output query results as JSON (format=json) and has a Coordinates field type with map display formats, so treat the rumored additions, such as a geo= field option, as unconfirmed until they appear in a release.

Meanwhile, the community is building a suite of #cargo_template helpers that let you embed query results inside transcluded templates – perfect for “list of the day” sidebars that auto‑rotate based on query criteria.

Wrapping Thoughts – Not a Formal Wrap‑Up, Just a Note

So, you’ve seen how Cargo can turn a static MediaWiki into a living data hub. The learning curve isn’t steep, but the payoff is real: less copy‑pasting, fewer stale tables, and pages that actually evolve as your data does. If you ever feel stuck, the official documentation is surprisingly thorough, and the extension’s talk page is full of folks who love to help.

At the end of the day, structured data in a wiki is about empowerment – giving editors the tools to manage information the way a spreadsheet would, but without leaving the comfort of the wiki’s collaborative environment. That’s the magic Cargo brings to the table, and perhaps, to your next project.
