Wikimedia Technical Documentation Team/Documentation tool study

From mediawiki.org

This page contains the final report of the documentation tool study conducted in August-October 2024 by the Wikimedia Foundation Technical Documentation Team (tracked in T371670, research and report by KBach-WMF). The goal of the study was to better understand the tools and processes used by the Wikimedia Movement when creating and maintaining technical documentation. This report contains the main takeaways from the study and recommendations for the next steps.

Methodology

This project consisted of three phases, each with its own assumptions, methods, and tools:

  1. Data collection - collecting the information about documentation tools and processes used in repositories hosted on Wikimedia GitLab and Gerrit. This was a manual process that involved investigating repository content, checking the linked resources outside the repository (wiki pages, websites, etc.), and checking the common documentation locations for resources connected to the repository.
  2. Data analysis - loading the data about all repositories and items into a database, and then processing and analyzing it using SQL, Python, and Pandas.
  3. Data report - creating the final presentation and report using Jupyter Lab, and then presenting the study outcomes to the Technical Documentation Team and the general audience.

Repositories

The repositories from Wikimedia GitLab and Gerrit included in the study are as follows.

GitLab:

  • A large selection of toolforge-repos projects
  • All affiliate projects
  • A selection of popular (based on star count) personal and general projects
  • A small selection of random repositories

Gerrit:

  • A random selection of non-archived repositories

Items

Each identified documentation tool, process, or product was added to the list of documentation items. For empty repositories and repositories without any documentation, a single item marked Empty repo or No docs was added to the same list.

As a result, an item can represent:

  • an empty repository
  • a repository with no documentation
  • a section on a wiki page, a wiki page, or a collection of wiki pages clustered together
  • a file, a folder with many files, or separate files spread throughout a repository
  • an entire website
  • a tool used to process or produce any of the former

Data set

The data set contains two data structures in the form of tables, one representing repositories, and one representing items. The total number of repositories included in the survey is 611. The total number of items generated based on these repositories is 952.

In the data set, 80.2% of repositories are not empty and 19.8% are empty. Wikimedia Foundation staff members contributed to 50.74% of the repositories, volunteers to 48.28%, affiliates to 9%, commercial entities to 2.45%, and others to 0.16%. These shares add up to more than 100% because a repository can have more than one contributor type.

Most (94%) of the identified 121 empty repositories belonged to tools. Empty repositories were generally excluded from the analysis.
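
As a rough illustration of the two-table structure described above, the following sketch loads repositories and items into an in-memory SQLite database and counts items per tool. The table names, column names, and sample rows are assumptions made for this example; they are not the study's actual schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the study's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repositories (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        host TEXT NOT NULL,           -- 'gitlab' or 'gerrit'
        project_type TEXT NOT NULL
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        repository_id INTEGER NOT NULL REFERENCES repositories(id),
        tool TEXT NOT NULL            -- e.g. 'Wiki', 'In-repo readme', 'No docs'
    );
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO repositories VALUES (?, ?, ?, ?)",
    [
        (1, "example-tool", "gitlab", "Tool"),
        (2, "example-extension", "gerrit", "Extension"),
        (3, "example-empty", "gitlab", "Tool"),
    ],
)
conn.executemany(
    "INSERT INTO items (repository_id, tool) VALUES (?, ?)",
    [
        (1, "In-repo readme"),
        (2, "Wiki"),
        (2, "In-repo readme"),
        (3, "Empty repo"),
    ],
)


def items_per_tool(connection):
    """Count items per tool, skipping placeholders for empty or undocumented repos."""
    rows = connection.execute(
        """
        SELECT tool, COUNT(*) AS n
        FROM items
        WHERE tool NOT IN ('Empty repo', 'No docs')
        GROUP BY tool
        ORDER BY n DESC, tool
        """
    )
    return rows.fetchall()


print(items_per_tool(conn))
```

Keeping one row per item, rather than one row per repository, lets a single repository contribute several documentation tools to the counts.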

Main takeaways

State of documentation

Over 85% of non-empty repositories have some sort of documentation.

Count Docs %
417 With docs 85.1
73 Without docs 14.9
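
The percentages in the table above follow directly from the repository counts reported in this study. A minimal sketch of that arithmetic, where the helper name `share` is chosen only for this example:

```python
def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(count * 100 / total, 1)


with_docs = 417
without_docs = 73
empty = 121

# Empty repositories are excluded, so the base is the non-empty repositories.
non_empty = with_docs + without_docs

print(non_empty + empty)               # all surveyed repositories
print(share(with_docs, non_empty))     # share with documentation
print(share(without_docs, non_empty))  # share without documentation
```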

READMEs and wiki pages are the most common documentation tools, used by 39.25% and 35.75% of repositories respectively. All other tools add up to almost 16%, only seven percentage points above the share of repositories without documentation.

Tool Count %
README 314 39.25
Wiki 286 35.75
Other 127 15.88
None 73 9.12

Tools per project type

Looking at which documentation tools are most popular in specific project types:

Wikis are the most popular documentation tool for:

  • affiliate projects
  • extensions
  • projects related to gadgets (tied with READMEs)
  • projects related to mobile apps
  • skins
  • SRE projects (tied with READMEs)

READMEs are the most popular documentation tool for:

  • personal projects
  • projects related to analytics
  • projects related to gadgets (tied with wikis)
  • RelEng projects
  • SRE projects (tied with wikis)
  • tools
  • other projects

In summary, the top 3 tools per project type are:

  • In-repo README
    • 7 times in 1st spot
    • 4 times in 2nd spot
    • 0 times in 3rd spot
    • 0 times outside top 3
  • Wiki
    • 6 times in 1st spot
    • 3 times in 2nd spot
    • 2 times in 3rd spot
    • 0 times outside top 3
  • None (that is, no documentation is present)
    • 2 times in 2nd spot
    • 1 time in 3rd spot
  • 3rd party website
    • 2 times in 3rd spot
  • JSDoc
    • 2 times in 3rd spot
  • doc.wikimedia.org
    • 2 times in 3rd spot
  • In-repo docs
    • 1 time in 3rd spot

Tools per documentation type

Wikis are the most popular tool for:

  • administrator documentation
  • project docs (though numbers related to project docs aren't fully reliable in this study)
  • research docs (tied with READMEs)
  • user docs

READMEs are the most popular tool for:

  • developer documentation
  • research docs (tied with wikis)
  • other documentation

Documentation tools variety

While READMEs and wikis are the dominant tools used in documentation, there is still considerable variety in documentation tooling, with 35 other tools used in different projects.

Looking at the number of items per documentation tool produces the following complete breakdown. Note that this record set excludes items representing repositories without documentation (None in the previous section).

Tool Count %
Wiki 315 41.61
In-repo readme 315 41.61
In-repo docs 26 3.43
3rd-party website 22 2.91
doc.wikimedia.org 15 1.98
Doxygen 8 1.06
JSDoc 6 0.79
In-app docs 5 0.66
In-code documentation 4 0.53
OpenAPI spec 3 0.40
Sphinx 3 0.40
php-code-coverage 3 0.40
Empty repo 3 0.40
Vitepress 2 0.26
HTML docs 2 0.26
Docpub 2 0.26
Istanbul 2 0.26
Custom documentation generator 1 0.13
Unknown tool 1 0.13
MediaWiki API page 1 0.13
Toolforge 1 0.13
PDF 1 0.13
Manpages 1 0.13
Docs in tool 1 0.13
rustdoc 1 0.13
Redoc 1 0.13
Phabricator 1 0.13
LaTeX 1 0.13
1 0.13
Docbook 1 0.13
Custom wiki 1 0.13
Jupyter notebooks 1 0.13
Video gif 1 0.13
Demo site 1 0.13
In-repo example 1 0.13
JSDuck 1 0.13
Google Docs 1 0.13
GroovyDoc 1 0.13

Developer documentation in particular uses a large variety of tools.

Documentation type Tool Count % of type total
Developer docs In-repo readme 141 54.02
Developer docs Wiki 60 22.99
Developer docs In-repo docs 14 5.36
Developer docs Doxygen 8 3.07
Developer docs JSDoc 6 2.30
Developer docs 3rd-party website 5 1.92
Developer docs In-code documentation 3 1.15
Developer docs Sphinx 3 1.15
Developer docs OpenAPI spec 3 1.15
Developer docs Vitepress 2 0.77
Developer docs GroovyDoc 1 0.38
Developer docs In-repo example 1 0.38
Developer docs Docpub 1 0.38
Developer docs In-app docs 1 0.38
Developer docs Custom documentation generator 1 0.38
Developer docs Docbook 1 0.38
Developer docs JSDuck 1 0.38
Developer docs Docs in tool 1 0.38
Developer docs MediaWiki API page 1 0.38
Developer docs Demo site 1 0.38
Developer docs Custom wiki 1 0.38
Developer docs PDF 1 0.38
Developer docs Jupyter notebooks 1 0.38
Developer docs LaTeX 1 0.38
Developer docs Redoc 1 0.38
Developer docs rustdoc 1 0.38

Least documented projects

Personal projects and tools are the only project types with a significant number of repositories without documentation: about 34% of personal projects and 31% of tools had no documentation.

Project type/Docs With docs Without docs % without docs
Extension 153 3 1.92
Gadget 3 0 0
Other 61 2 3.17
Other - RelEng 13 0 0
Other - SRE 4 0 0
Other - affiliate 22 0 0
Other - analytics 14 3 17.65
Other - mobile 2 0 0
Other - personal 53 27 33.75
Skin 9 0 0
Tool 83 38 31.4
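
The "% without docs" column can be reproduced from the two count columns in the table above. The sketch below recomputes it and ranks project types by that share; the function and variable names are illustrative only.

```python
# (with docs, without docs) per project type, copied from the table above.
counts = {
    "Extension": (153, 3),
    "Gadget": (3, 0),
    "Other": (61, 2),
    "Other - RelEng": (13, 0),
    "Other - SRE": (4, 0),
    "Other - affiliate": (22, 0),
    "Other - analytics": (14, 3),
    "Other - mobile": (2, 0),
    "Other - personal": (53, 27),
    "Skin": (9, 0),
    "Tool": (83, 38),
}


def pct_without_docs(with_docs, without_docs):
    """Share of repositories without docs, as a percentage with two decimals."""
    total = with_docs + without_docs
    return round(without_docs * 100 / total, 2)


# Rank project types from the highest to the lowest share without docs.
ranked = sorted(
    ((name, pct_without_docs(*pair)) for name, pair in counts.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, pct in ranked[:3]:
    print(f"{name}: {pct}%")
```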

Even in situations where a given repository technically has a README or other documentation, there are instances where that documentation doesn't contain any valuable information. I refer to these items as token documentation.

Token documentation was most common among items related to tool repositories, where 37 out of 161 items (almost 23%) contained token docs.

Instances of token documentation appear more often in READMEs than other tools.

Project type/Tool In-repo docs In-repo readme Wiki
Extension 2 9 6
Gadget 0 2 0
Other 0 2 0
Other - affiliate 0 2 0
Other - analytics 0 4 0
Other - personal 0 9 0
Tool 0 37 3

Documentation presence vs contributor types

While this wasn't the primary purpose of the research, I think it's worth mentioning that, based on the available data, the correlation between contributor types and documentation presence is weak. The presence of documentation varies by the contributor types involved in a project, which fall into two broad groups:

  • for affiliates and commercial entities, documentation is present in 98% and 93% of repositories respectively
  • for Wikimedia Foundation staff and volunteers, documentation is present in 88% and 87% of repositories respectively

Documentation absent from repositories by contributor type:

Contributor Without docs Contributor total % without docs
Affiliates 1 55 1.82
Commercial entities 1 15 6.67
Other 0 1 0
Staff 36 307 11.73
Volunteers 36 287 12.54

Notably, documentation absence is less of a problem in repositories with a higher number of contributor types:

Repository/Contributor type count 1 2 3
With documentation 269 122 26
Without documentation 68 3 0
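
To make the trend in the table above easier to see, the counts can be converted into the share of repositories without documentation for each number of contributor types. This is a small sketch based only on the counts shown above; the names used are illustrative.

```python
# Repository counts keyed by the number of contributor types (from the table above).
with_docs = {1: 269, 2: 122, 3: 26}
without_docs = {1: 68, 2: 3, 3: 0}


def pct_without_by_type_count(with_counts, without_counts):
    """Percentage of repositories without docs for each contributor type count."""
    result = {}
    for n in sorted(with_counts):
        total = with_counts[n] + without_counts[n]
        result[n] = round(without_counts[n] * 100 / total, 2)
    return result


shares = pct_without_by_type_count(with_docs, without_docs)
for n, pct in shares.items():
    print(f"{n} contributor type(s): {pct}% without documentation")
```

With these counts, roughly one in five repositories with a single contributor type has no documentation, compared with about one in forty for repositories with two contributor types and none for those with three.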

Recommendations

Based on the results of this study, I formulated the following documentation strategy recommendations for the Technical Documentation Team:

  1. Ensure high-quality tooling and templates for README files and wiki documentation. This is already available for wiki documentation with style guides and templates at Technical style guides and templates and Toolkit. READMEs could use a bit more attention.
  2. Ensure adequate platform support for less popular documentation tools. Support other Wikimedia Foundation teams in building solutions that work with a broad variety of documentation tools.
  3. Keep monitoring the state of documentation tooling and find ways to facilitate, streamline, and standardize usage of popular tools and processes. This has already been the case since the formation of the Technical Documentation Team as evidenced by the team's ownership of doc.wikimedia.org and the JSDoc migration project.
  4. Seek out opportunities for introducing documentation early in personal repositories and tool projects. This could be done, for example, by improving mechanisms of generating repositories.
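
As one possible illustration of the last recommendation, a repository-generation mechanism could write a starter README when a project is created. The sketch below is hypothetical: the template sections, the `scaffold_readme` helper, and the example project name are assumptions for this example and do not describe an existing Wikimedia tool or standard.

```python
import tempfile
from pathlib import Path

# Hypothetical starter template; the section names are assumptions.
README_TEMPLATE = """\
# {name}

{description}

## Usage

TODO: describe how to install and run {name}.

## Contributing

TODO: describe how to report issues and submit changes.

## Documentation

TODO: link to wiki pages or other documentation for {name}.
"""


def scaffold_readme(repo_dir, name, description):
    """Write a starter README.md into repo_dir unless one already exists."""
    readme = Path(repo_dir) / "README.md"
    if readme.exists():
        return False  # never overwrite existing documentation
    readme.write_text(
        README_TEMPLATE.format(name=name, description=description),
        encoding="utf-8",
    )
    return True


# Usage: scaffold a README in a temporary directory standing in for a new repository.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_readme(tmp, "example-tool", "One-line summary of the tool.")
    created_again = scaffold_readme(tmp, "example-tool", "One-line summary of the tool.")
    first_line = (Path(tmp) / "README.md").read_text(encoding="utf-8").splitlines()[0]

print(created, created_again, first_line)
```

A template alone does not prevent token documentation, since the placeholder sections still need to be filled in, but it gives new personal and tool repositories a starting point.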