Building Custom Reports in MediaWiki Using Lua Modules

Why Lua modules matter for MediaWiki reporting

At first glance a MediaWiki installation looks like a giant collection of wikitext pages, nothing more than a glorified blog. In practice, though, the platform hides a surprisingly powerful scripting layer called Scribunto. With Scribunto you can write Lua code that lives inside Module: pages and is callable from any other page via {{#invoke:…}}. This dynamic bridge is what turns a static encyclopedia into a live data‑driven reporting engine.

In my own wiki‑farm we needed a way to pull together page‑view stats, recent edit summaries and even external JSON feeds—all without touching the PHP core. The answer? A handful of well‑structured Lua modules. The rest of this post walks through the thought process, the essential building blocks, and a concrete example that you can adapt to any domain.

Getting the basics right

First things first: make sure Scribunto is installed and enabled. In LocalSettings.php you should see something like:

wfLoadExtension( 'Scribunto' );
$wgScribuntoDefaultEngine = 'luastandalone';

If you’re on MediaWiki 1.35 or newer the extension is bundled, so the only real requirement is that your host allows execution of the Lua binary. Once that’s confirmed, you can create a Module:ReportHelper page and start coding.

The data model – what you’ll be crunching

When we talk about “reports” in the wiki context we usually mean one of three things:

  • Page‑metadata collections – titles, categories, timestamps.
  • Statistics from the core API – page‑view counters, edit counts.
  • External data sources – JSON from a public API, CSV files stored on the wiki.

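Stock Scribunto does not ship with an HTTP client or a general-purpose API query function. The examples in this post assume two helpers that your installation would have to provide through an extension: mw.http, which fetches JSON over HTTP, and mw.site:query, a wrapper over the core wiki API. The standard mw.title and mw.site libraries cover page metadata and site-wide statistics.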

Setting up a reusable helper module

Below is a skeleton that most reporters end up customizing. It abstracts away the boilerplate for fetching a list of pages in a given category and for pulling raw JSON from a URL.

local p = {}
local mw = mw

--- Return a table of page titles belonging to category.
--- Pages through the categorymembers list with mw.site:query, following cmcontinue.
function p.getPagesInCategory( category )
    local result = {}
    local cont = nil
    repeat
        local params = {
            action = "query",
            list = "categorymembers",
            cmtitle = "Category:" .. category,
            cmlimit = "500",
            cmcontinue = cont,
        }
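        local response = mw.site:query( params )
        -- Fall back to an empty list if the query failed or returned no members.
        local members = response and response.query and response.query.categorymembers or {}
        for _, page in ipairs( members ) do
            table.insert( result, page.title )
        end
        cont = response and response.continue and response.continue.cmcontinue or nil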
    until not cont
    return result
end

--- Fetch JSON from url and decode it. Errors are caught
--- and reported as a simple string, keeping the report robust.
function p.fetchJson( url )
    local ok, data = pcall( mw.http.request, url )
    if not ok or not data then
        return { error = "Unable to fetch " .. url }
    end
    local success, decoded = pcall( mw.text.jsonDecode, data )
    if not success then
        return { error = "Invalid JSON from " .. url }
    end
    return decoded
end

return p

Notice the liberal use of pcall. Error‑handling in Lua is a bit like walking a tightrope over a construction site—one slip and the whole page collapses. By catching errors early we keep the final report tidy, even if an external service hiccups.
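
The same pattern can be factored into a small wrapper. The sketch below is plain Lua with no MediaWiki dependencies, so it can be tried in any Lua interpreter; safeCall and flakyFetch are hypothetical names used only for illustration and are not part of Scribunto.

```lua
-- Hypothetical helper: run fn(...) and turn any raised error into a table
-- with an `error` field, the same shape fetchJson returns on failure.
local function safeCall( label, fn, ... )
    local ok, result = pcall( fn, ... )
    if not ok then
        return { error = label .. " failed: " .. tostring( result ) }
    end
    return result
end

-- Stand-in for an unreliable data source (an assumption for this demo).
local function flakyFetch( shouldFail )
    if shouldFail then
        error( "service unavailable" )
    end
    return { items = { "A", "B" } }
end

local good = safeCall( "fetch", flakyFetch, false )
local bad = safeCall( "fetch", flakyFetch, true )
```

Callers then only need to check for an error field before rendering, instead of wrapping every call site in its own pcall.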

Putting it together – a sample “Monthly Edit Summary” report

Imagine you need a monthly snapshot of all edits made to pages in Category:Published‑Articles. The report should include:

  1. The total number of edits.
  2. The top five editors by edit count.
  3. A list of pages that have not been edited in the last 30 days.

The following module does exactly that. It calls the helper above, runs a second API query for revisions, and finally renders the results as HTML headings and lists using mw.html.

local report = {}
local helper = require( "Module:ReportHelper" )
local html = mw.html.create

--- Main entry point used by {{#invoke:MonthlyEditReport|generate}}
function report.generate( frame )
    local category = "Published-Articles"
    local pages = helper.getPagesInCategory( category )
    local editCounts = {}
    local lastEdit = {}
    local editorFreq = {}

    -- Data gathering: a single revisions query per page feeds both the
    -- per-page counts and the per-editor tallies.
    for _, title in ipairs( pages ) do
        local revs = mw.site:query{
            action = "query",
            prop = "revisions",
            titles = title,
            rvprop = "timestamp|user",
            rvlimit = "max",
        }
        -- next() works whether pages is an array or keyed by page ID.
        local _, pageInfo = next( revs.query.pages )
        local revisions = pageInfo and pageInfo.revisions
        if revisions and #revisions > 0 then
            editCounts[title] = #revisions
            lastEdit[title] = revisions[1].timestamp -- newest revision comes first
            for _, rev in ipairs( revisions ) do
                local user = rev.user or "(hidden)"
                editorFreq[user] = ( editorFreq[user] or 0 ) + 1
            end
        else
            editCounts[title] = 0
            lastEdit[title] = "Never"
        end
    end

    -- 1. total edits
    local total = 0
    for _, count in pairs( editCounts ) do total = total + count end

    -- 2. top editors (editorFreq was filled during data gathering)

    local topEditors = {}
    for user, cnt in pairs( editorFreq ) do
        table.insert( topEditors, { user = user, cnt = cnt } )
    end
    table.sort( topEditors, function(a,b) return a.cnt > b.cnt end )
    topEditors = { unpack( topEditors, 1, 5 ) }

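    -- 3. stale pages
    local stale = {}
    local lang = mw.getContentLanguage()
    -- os.time() already returns Unix seconds, so subtract 30 days directly.
    local thirtyDaysAgo = os.time() - 30 * 24 * 60 * 60
    for title, ts in pairs( lastEdit ) do
        -- formatDate( "U", ts ) converts the timestamp to Unix seconds.
        local tsTime = ts ~= "Never" and tonumber( lang:formatDate( "U", ts ) ) or nil
        if not tsTime or tsTime < thirtyDaysAgo then
            table.insert( stale, title )
        end
    end
    table.sort( stale ) -- pairs() order is undefined; sort for stable output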

    -- Render HTML
    local out = html()
    out:tag( "h3" ):wikitext( "Monthly Edit Summary for " .. category )
    out:tag( "p" ):wikitext( "Total edits: " .. total )

    out:tag( "h4" ):wikitext( "Top 5 editors" )
    local editorList = out:tag( "ul" ) -- keep the <ul> node so items nest inside it
    for _, e in ipairs( topEditors ) do
        editorList:tag( "li" ):wikitext( string.format( "%s (%d edits)", e.user, e.cnt ) )
    end

    out:tag( "h4" ):wikitext( "Pages not edited in the last 30 days" )
    local staleList = out:tag( "ul" )
    for _, t in ipairs( stale ) do
        staleList:tag( "li" ):wikitext( "[[" .. t .. "]]" )
    end

    -- mw.html nodes serialize to a string through tostring().
    return tostring( out )
end

return report

Take a minute to skim the code. You’ll see three distinct sections: data gathering, aggregation, and rendering. That separation is intentional; it makes the module easier to test (the debug console on the module's edit page is handy for this) and simple to extend. For example, you could add a chart that visualizes edit spikes, provided your wiki has a charting extension installed; Scribunto itself does not include a graphing library.
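
To make the testing point concrete, here is a minimal sketch of the aggregation step pulled out into a pure function. It uses no mw.* calls, so it can be fed hand-written input; the function name topEditors and the shape of the sample records are assumptions for illustration, not part of the module above.

```lua
-- Count edits per user from a flat list of revision records, then return
-- the n most active editors.
local function topEditors( revisions, n )
    local freq = {}
    for _, rev in ipairs( revisions ) do
        freq[rev.user] = ( freq[rev.user] or 0 ) + 1
    end

    local ranked = {}
    for user, cnt in pairs( freq ) do
        ranked[#ranked + 1] = { user = user, cnt = cnt }
    end

    -- Sort by count, descending; break ties by name so output is stable.
    table.sort( ranked, function( a, b )
        if a.cnt ~= b.cnt then
            return a.cnt > b.cnt
        end
        return a.user < b.user
    end )

    local top = {}
    for i = 1, math.min( n, #ranked ) do
        top[i] = ranked[i]
    end
    return top
end

-- Hand-written sample data standing in for API results.
local sample = {
    { user = "Alice" }, { user = "Bob" }, { user = "Alice" },
    { user = "Carol" }, { user = "Alice" }, { user = "Bob" },
}
local top = topEditors( sample, 2 )
```

Because the function never touches the wiki, a change to the ranking logic can be checked without saving a page or waiting for a re-parse.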

Calling the module from a wiki page

With the module in place, the actual report page is just a one‑liner:

{{#invoke:MonthlyEditReport|generate}}

When the page is parsed, MediaWiki runs the Lua code server‑side and injects the generated HTML. Parsed output is cached, so readers see the report as of the last parse; purge the page when you need fresh numbers. No JavaScript, no extra CSS, just plain wikitext plus a dash of Lua magic.
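
If the category should not be hard-coded, one option is to read it from the arguments passed to #invoke, which Scribunto exposes as frame.args. The sketch below is an assumption about how that could look; getCategory and the category parameter name are hypothetical, and fakeFrame only imitates the part of the frame object the function reads.

```lua
local DEFAULT_CATEGORY = "Published-Articles"

-- Return the category named in the #invoke arguments, or the default when
-- the argument is missing or empty.
local function getCategory( frame )
    local arg = frame and frame.args and frame.args.category
    if arg == nil or arg == "" then
        return DEFAULT_CATEGORY
    end
    return arg
end

-- Minimal stand-ins for the frame object, for testing outside the wiki.
local fakeFrame = { args = { category = "Drafts" } }
local emptyFrame = { args = {} }

local chosen = getCategory( fakeFrame )
local fallback = getCategory( emptyFrame )
```

On the wiki side the call would then look like {{#invoke:MonthlyEditReport|generate|category=Drafts}}, with the plain one-liner still falling back to the default.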

Performance tips you won’t find in the official docs

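  • Cache aggressively. Keep static lookup tables in a data module and load them with mw.loadData, which evaluates the module once per page render instead of once per {{#invoke:…}}. If you have Semantic MediaWiki installed, you can also store computed values as semantic data through the Semantic Scribunto extension's mw.smw library and query them later. Either approach cuts repeated work and API traffic.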
  • Chunk large result sets. The MediaWiki API limits cmlimit to 500 for regular users; if you need more, paginate with cmcontinue, as shown in the helper.
  • Avoid string concatenation in loops. Lua’s table.concat is far faster than the naive result = result .. piece pattern.
  • Watch out for timezone quirks. API timestamps are in UTC. If you compare one against a local “30‑day ago” value, first convert both sides to the same timezone.
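
Two of these tips are easy to sketch in plain Lua. The block below joins list items with table.concat and compares timestamps as UTC Unix seconds, which sidesteps the local-time interpretation of os.time{...}. The function names are hypothetical, and the parser only accepts the "YYYY-MM-DDTHH:MM:SSZ" form, which is an assumption about the input.

```lua
-- Collect pieces in a table and join them once at the end.
local function renderList( titles )
    local lines = {}
    for i, title in ipairs( titles ) do
        lines[i] = "* [[" .. title .. "]]"
    end
    return table.concat( lines, "\n" )
end

-- Convert "YYYY-MM-DDTHH:MM:SSZ" (UTC) to Unix seconds without going through
-- os.time{...}, which would interpret the fields as local time.
local function isoToEpoch( ts )
    local y, m, d, h, mi, s = ts:match( "^(%d+)-(%d+)-(%d+)T(%d+):(%d+):(%d+)Z$" )
    if not y then
        return nil
    end
    y, m, d = tonumber( y ), tonumber( m ), tonumber( d )
    -- Days since 1970-01-01, counting years from March so that the leap
    -- day falls at the end of the year.
    if m <= 2 then
        y = y - 1
    end
    local era = math.floor( y / 400 )
    local yoe = y - era * 400
    local mp = ( m + 9 ) % 12
    local doy = math.floor( ( 153 * mp + 2 ) / 5 ) + d - 1
    local doe = yoe * 365 + math.floor( yoe / 4 ) - math.floor( yoe / 100 ) + doy
    local days = era * 146097 + doe - 719468
    return days * 86400 + tonumber( h ) * 3600 + tonumber( mi ) * 60 + tonumber( s )
end

-- A page is stale if its last edit is unparseable or older than `days`
-- days before `now` (both in Unix seconds).
local function isStale( ts, now, days )
    local edited = isoToEpoch( ts )
    return edited == nil or edited < now - days * 86400
end
```

Passing now in as a parameter, rather than calling os.time() inside isStale, keeps the check deterministic and easy to test.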

Real‑world use cases that sparked the idea

On a community‑run documentation wiki we built a “License compliance” report that scanned every page for a {{License}} template, aggregated the license types, and presented a colored pie chart. The same pattern works for “Open issues per project”, “Top contributors in the last sprint”, or “List of pages missing a ‘References’ section”. All you need is a way to tag the pages (usually via categories or hidden templates) and a module that pulls that tag, tallies the results, and spits out HTML.

Common pitfalls (and how to sidestep them)

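1. Assumed that $wgAllowExternalImagesFrom covers remote JSON. It does not: that setting only controls which external images may be embedded. Outbound requests are governed by whichever extension provides your HTTP access, so add the domain to that extension's allow-list instead.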
2. Ran into “module load error: recursive require”. That usually means two modules call each other. Break the cycle by moving shared logic into a third “utils” module.
3. Performance spikes after a big edit war. The API query for revisions can explode. Limit rvlimit to a reasonable number (e.g., 100) and cache the totals instead of recomputing each time.
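
For the third pitfall, a small memoizing wrapper avoids asking for the same page twice within one report. This is a plain-Lua sketch; memoize and countRevisions are hypothetical names, and countRevisions is a stand-in that fakes a result instead of querying anything. The cache lives only as long as the current module invocation, so it reduces duplicate work inside a report rather than across page views.

```lua
-- Wrap a one-argument function so each distinct key is computed only once.
local function memoize( fn )
    local cache = {}
    return function( key )
        local value = cache[key]
        if value == nil then
            value = fn( key )
            cache[key] = value
        end
        return value
    end
end

-- Stand-in for an expensive revision query (an assumption for the demo):
-- it records how often it was called and returns a made-up total.
local calls = 0
local function countRevisions( title )
    calls = calls + 1
    return #title * 10
end

local cachedCount = memoize( countRevisions )
local first = cachedCount( "Main Page" )
local second = cachedCount( "Main Page" )
```

After both calls, the underlying function has run once and both results are identical, which is the behavior you want when several sections of a report need the same total.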

Where to go from here

If you’re already comfortable with the basics, consider blending Lua modules with the Data Transfer extension to export CSV files, or pair them with VisualEditor to let non‑technical users tweak report parameters without touching raw wikitext.

In short, Lua modules give you a programmable backbone inside MediaWiki—something that feels half‑baked at first, but once the first report works, you’ll start seeing every page as a data source. That’s the magic: a wiki that not only stores knowledge, but also analyses it on the fly.
