How to Use the Semantic MediaWiki Extension for Structured Data
Overview
Semantic MediaWiki (SMW) turns a regular MediaWiki installation into a semantic wiki. By adding lightweight [[property::value]] markup, pages become a source of structured data that can be queried, visualised and exported. The extension is stable (release 6.0.1) and works with MediaWiki 1.43+ and PHP 8.1+ [[Extension:Semantic MediaWiki]].
Installation Quick‑Start
The recommended installation method is Composer. After cloning your MediaWiki source, run:
composer require mediawiki/semantic-media-wiki
Then add the following lines to LocalSettings.php:
wfLoadExtension( 'SemanticMediaWiki' );
enableSemantics( 'example.org' ); // replace with your wiki's domain
// Optional: load popular companion extensions
wfLoadExtension( 'PageForms' );             // form support
wfLoadExtension( 'SemanticResultFormats' ); // charts, timelines, etc.
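Run MediaWiki's update script from the wiki's root directory; it also creates the SMW tables:
php maintenance/update.php
(Alternatively, run SMW's own script: php extensions/SemanticMediaWiki/maintenance/setupStore.php)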
For a production wiki, also tighten the wiki's permissions once setup is done; for example, $wgGroupPermissions['*']['createaccount'] = false; prevents anonymous visitors from creating accounts. The installation page on semantic-mediawiki.org details the required database rights (CREATE, ALTER, CREATE TEMPORARY TABLES) and optional PHP extensions (mbstring, intl, curl) [[Installation - semantic-mediawiki.org]].
Core Concepts
- Property – the predicate of a triple, e.g. Has capital.
- Subject – the page where the property is placed.
- Object – the value, which can be a page, a number, a date, a coordinate, etc.
SMW stores these triples in its own tables, enabling efficient #ask queries and specialised result formats.
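To make the subject–property–object model concrete, here is a minimal Python sketch that pulls triples out of inline annotations with a regular expression. It is a toy illustration under simplifying assumptions, not SMW's parser. It handles [[Property::Value]] and [[Property::Value|caption]] but not SMW's [[A::B::value]] shorthand, and extract_triples is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches [[Property::Value]] and [[Property::Value|caption]].
# Simplified: the [[A::B::value]] shorthand (several properties at once)
# is not handled, and plain links such as [[Germany]] are ignored.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(subject: str, wikitext: str) -> List[Tuple[str, str, str]]:
    """Return one (subject, property, value) triple per inline annotation.

    The subject is the page the text lives on, as in SMW.
    """
    return [
        (subject, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]
```

Calling extract_triples("Germany", "[[Has capital::Berlin]]") yields the single triple ("Germany", "Has capital", "Berlin").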
Adding Structured Data
The simplest way is inline markup. The subject is the page that contains the annotation, so on the page Germany you would write:
The capital of Germany is [[Has capital::Berlin]]. Its population is [[Population::83020000]].
For larger data sets, the recommended practice is to keep the markup inside a template. This keeps the page readable and lets you reuse the same property definitions.
{{Country | name = Germany | capital = Berlin | population = 83020000 }}
The Country template would contain the SMW markup:
{{#if:{{{name|}}}|[[Has name::{{{name}}}]]}} {{#if:{{{capital|}}}|[[Has capital::{{{capital}}}]]}} {{#if:{{{population|}}}|[[Population::{{{population}}}]]}}
All data is now stored as semantic triples, ready for queries.
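The Country template's logic can be mirrored in a few lines of Python to show which annotations a given call produces. This is a hypothetical re-implementation for illustration only. It assumes ParserFunctions' #if semantics, where an empty or whitespace-only parameter counts as false and emits nothing.

```python
from typing import Mapping

# Template parameter -> SMW property, as used in the Country template.
FIELD_TO_PROPERTY = {
    "name": "Has name",
    "capital": "Has capital",
    "population": "Population",
}


def expand_country(params: Mapping[str, str]) -> str:
    """Hypothetical mirror of the Country template's {{#if:...}} chain."""
    parts = []
    for field, prop in FIELD_TO_PROPERTY.items():
        value = params.get(field, "").strip()
        if value:  # #if treats empty or whitespace-only input as false
            parts.append(f"[[{prop}::{value}]]")
    return " ".join(parts)
```

A call that omits capital or population therefore stores only the triples it has data for, rather than storing empty values.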
Creating Forms with Page Forms
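Manual data entry can be error‑prone. The Page Forms extension (formerly Semantic Forms) provides HTML forms that map directly to template parameters. A form is defined on a page in the Form namespace, and editors reach it through an input box that can be placed on any page:
{{#forminput:form=CountryForm}}
The form definition itself lives on the page Form:CountryForm: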
{{{for template|Country}}}
{| class="formtable"
! Name
| {{{field|name|input type=text|mandatory}}}
|-
! Capital
| {{{field|capital|input type=text|mandatory}}}
|-
! Population
| {{{field|population|input type=number|size=10}}}
|}
{{{end template}}}
When a user fills in and saves the form, Page Forms writes a call to the Country template into the page, and the semantic properties are stored automatically. This workflow is described on the Page Forms documentation page [[Extension:Page Forms]].
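The round trip from form fields to saved wikitext can be sketched in Python. This is an illustration under stated assumptions, not Page Forms code. build_template_call is a hypothetical helper. It assumes one parameter per line, which matches the usual look of form-generated pages, and it chooses to skip blank optional fields.

```python
from typing import Mapping, Sequence


def build_template_call(template: str, values: Mapping[str, str],
                        mandatory: Sequence[str] = ()) -> str:
    """Hypothetical sketch: turn submitted form values into a template call.

    Raises ValueError if a mandatory field is empty, mimicking the
    'mandatory' flag in the form definition.
    """
    missing = [f for f in mandatory if not values.get(f, "").strip()]
    if missing:
        raise ValueError(f"missing mandatory field(s): {', '.join(missing)}")
    lines = [f"{{{{{template}"]  # renders as "{{Country"
    for field, value in values.items():
        if value.strip():  # design choice: omit blank optional fields
            lines.append(f"|{field}={value.strip()}")
    lines.append("}}")
    return "\n".join(lines)
```

Because the page ends up containing only the template call, the annotations stay centralised in the template, which is the template-centric workflow described above.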
Querying Data with #ask
The heart of SMW is the inline query language. The basic syntax is:
{{#ask: [[Has capital::Berlin]] | ?Population | format=table | limit=20 }}
This query returns all pages whose Has capital property is Berlin and displays their Population values in a table. The table format is built into SMW. For other output, swap in a result format from the Semantic Result Formats extension, e.g. format=timeline or format=jqplotchart; format=map is supplied by the Maps extension.
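The same query can also be run from outside the wiki through SMW's action=ask API module, which takes the conditions and printouts as one "|"-separated query string. The Python sketch below builds such a request and parses the JSON reply. The endpoint URL is a placeholder. The response layout (query → results → page title → printouts) is an assumption based on SMW's API help pages and should be checked against your version.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ask_url(api_endpoint: str, query: str) -> str:
    """Build an action=ask request URL; query uses #ask syntax joined by '|'."""
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_endpoint}?{urlencode(params)}"


def parse_ask_results(payload: Dict) -> Dict[str, Dict[str, List]]:
    """Map each result page title to its printouts (assumed response shape)."""
    results = payload.get("query", {}).get("results", {})
    # An empty result set may be serialised as [] rather than {}.
    if not isinstance(results, dict):
        return {}
    return {title: data.get("printouts", {}) for title, data in results.items()}


def run_ask(api_endpoint: str, query: str) -> Dict[str, Dict[str, List]]:
    """Perform the HTTP request and parse the JSON body."""
    with urlopen(build_ask_url(api_endpoint, query)) as response:
        return parse_ask_results(json.load(response))
```

For example, run_ask("https://wiki.example.org/w/api.php", "[[Has capital::Berlin]]|?Population|limit=20") would return the Population printouts keyed by page title.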
Visualising Data
- Maps – the Maps extension works with SMW to plot geographic coordinates. Example:
{{#ask: [[Category:City]] | ?Coordinates | format=map | link=title | zoom=5 }}
- Charts & Timelines – use format=jqplotchart or format=timeline from Semantic Result Formats. The query can be refined with filters, sorting (e.g. sort=Population | order=descending), and aggregation.
- Drill‑down Browsers – the Semantic Drilldown extension provides a faceted navigation page (Special:BrowseData) that lets readers filter the pages of a category by their property values.
All visualisations are generated on the fly and respect the wiki’s CSS, making them look native.
Importing External Data
When you need to pull data from CSV, JSON, XML or a relational database, the External Data extension works together with SMW. A typical CSV import that fetches the file and renders each row through the Country template:
{{#get_external_data: url=http://example.org/data.csv
 |format=CSV with header
 |data=name=Country,capital=Capital,population=Population
}}
{{#display_external_table: template=Country
 |data=name=name,capital=capital,population=population
}}
Each row can be fed into a template that creates the appropriate semantic triples, allowing you to keep the wiki in sync with external sources.
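If you prefer a one-off import over a live External Data call, the rows can be converted into template calls offline and then pasted in or uploaded with a bot. The Python sketch below does this for a CSV with Country, Capital and Population headers. COLUMN_TO_PARAM and rows_to_template_calls are hypothetical names, and this is an alternative workflow rather than how External Data works internally.

```python
import csv
import io
from typing import List

# Hypothetical mapping from CSV column headers to Country template parameters.
COLUMN_TO_PARAM = {"Country": "name", "Capital": "capital", "Population": "population"}


def escape_pipe(value: str) -> str:
    # A literal "|" would split the template argument; {{!}} renders as "|".
    return value.replace("|", "{{!}}")


def rows_to_template_calls(csv_text: str) -> List[str]:
    """Turn each CSV row into one {{Country|...}} call."""
    calls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        params = "|".join(
            f"{param}={escape_pipe((row.get(column) or '').strip())}"
            for column, param in COLUMN_TO_PARAM.items()
        )
        calls.append(f"{{{{Country|{params}}}}}")
    return calls
```

Each generated call goes through the same template as hand-written or form-created pages, so the stored triples look identical regardless of how the page was created.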
Advanced Topics
- Concepts – reusable query definitions that act like dynamic categories. Example: the page Concept:LargeCountries containing {{#concept: [[Population::>50000000]] }}, which can then be queried with [[Concept:LargeCountries]].
- SPARQL & RDF stores – the SPARQLStore back‑end replicates SMW data to an external RDF triplestore (e.g. Apache Jena Fuseki or Virtuoso) and answers queries via SPARQL, while the ElasticStore back‑end uses Elasticsearch. Both are useful for very large knowledge bases.
- Semantic Watchlist – the Semantic Watchlist extension lets users subscribe to changes of specific properties, receiving email or wiki notifications.
These features are covered in the SMW manual pages [[Help:Semantic MediaWiki extensions]] and the developer documentation on the SMW website.
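The concept condition [[Population::>50000000]] depends on SMW's comparators. In the default configuration SMW reads > as "greater than or equal to" and >> as "strictly greater than", and the $smwgQStrictComparators setting can make > strict. Under the default, a country with exactly 50,000,000 inhabitants would therefore belong to the concept. The Python sketch below evaluates such conditions over a hypothetical in-memory dictionary of values. It only illustrates the comparator semantics and is not how SMW executes concepts.

```python
import operator
from typing import Callable, Dict, List

# Comparator prefixes, assuming SMW's default (non-strict) configuration:
# '>' and '<' include equality, '>>' and '<<' are strict, '!' means "not".
# Two-character prefixes are listed first so they win over '>' and '<'.
COMPARATORS = [
    (">>", operator.gt),
    ("<<", operator.lt),
    ("!", operator.ne),
    (">", operator.ge),
    ("<", operator.le),
]


def parse_condition(condition: str) -> Callable[[float], bool]:
    """Turn a numeric value condition such as '>50000000' into a predicate."""
    for prefix, op in COMPARATORS:
        if condition.startswith(prefix):
            threshold = float(condition[len(prefix):])
            return lambda value: op(value, threshold)
    threshold = float(condition)  # no prefix: exact match
    return lambda value: value == threshold


def members(values: Dict[str, float], condition: str) -> List[str]:
    """Pages (sorted by title) whose value satisfies the condition."""
    test = parse_condition(condition)
    return sorted(page for page, value in values.items() if test(value))
```

If the boundary should be excluded, write the concept as [[Population::>>50000000]].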
Best Practices
- Keep all SMW markup inside templates; pages should only call templates.
- Declare property types on the property page (e.g. [[Has type::Number]] on Property:Population) to get proper validation and sorting.
- Use [[Category:…]] together with SMW properties for hierarchical browsing.
- Cache queries by keeping MediaWiki's parser cache enabled and turning on SMW's query result cache ($smwgQueryResultCacheType in LocalSettings.php).
- Document concepts and queries on dedicated wiki pages so non‑technical editors can reuse them.
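A few lines of Python show why declaring a type matters for sorting. Values compared as plain strings sort character by character, so "100" comes before "9". A numeric type compares by value and can reject input that is not a number. This is a generic illustration with hypothetical helper names, not SMW's own sorting or validation code. Treating commas as thousands separators is an English-locale assumption.

```python
from typing import List, Tuple


def sort_as_text(values: List[str]) -> List[str]:
    # Untyped string values: lexicographic, character-by-character order.
    return sorted(values)


def sort_as_number(values: List[str]) -> List[str]:
    # Numeric order, as a Number-typed property allows; input that does not
    # parse is rejected (commas are assumed to be thousands separators).
    parsed: List[Tuple[float, str]] = []
    for v in values:
        try:
            parsed.append((float(v.replace(",", "")), v))
        except ValueError:
            raise ValueError(f"not a number: {v!r}") from None
    return [original for _, original in sorted(parsed)]
```

sort_as_text(["9", "100", "10"]) gives ["10", "100", "9"], while sort_as_number gives ["9", "10", "100"].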
Conclusion
Semantic MediaWiki provides a powerful, low‑barrier way to turn a MediaWiki site into a structured‑data platform. By installing the core extension, optionally adding Page Forms, Result Formats and Maps, and by following the template‑centric workflow, you can capture, query, visualise and export data with just a few lines of wikitext. The extension’s tight integration with MediaWiki’s permission system, versioning and localisation makes it a natural choice for knowledge‑base projects, research portals, or any site that needs reliable, queryable data.