Wikimedia Technical Documentation Team/Documentation tool study
This page contains the final report of the documentation tool study conducted in August-October 2024 by the Wikimedia Foundation Technical Documentation Team (tracked in task T371670, research and report by KBach-WMF). The goal of the study was to better understand the tools and processes used by the Wikimedia Movement when creating and maintaining technical documentation. This report contains the main takeaways from the study and recommendations for next steps.
Methodology
This project consisted of three phases, each with its own assumptions, methods, and tools:
- Data collection - collecting information about the documentation tools and processes used in repositories hosted on Wikimedia GitLab and Gerrit. This was a manual process that involved investigating repository content, checking linked resources outside the repository (wiki pages, websites, etc.), and searching common documentation locations for resources connected to the repository.
- Data analysis - loading the data about all repositories and items into a database, and then processing and analyzing it using SQL, Python, and Pandas.
- Data report - creating the final presentation and report using Jupyter Lab, and then presenting the study outcomes to the Technical Documentation Team and the general audience.
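The data analysis phase works with two related tables queried with SQL. The following is a minimal sketch of that setup using Python's built-in `sqlite3` module and an in-memory database; the table and column names (`repositories`, `items`, `project_type`, `is_empty`, `tool`, `doc_type`) and the sample rows are hypothetical, not the study's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per repository, one row per documentation item.
SCHEMA = """
CREATE TABLE repositories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    platform TEXT NOT NULL,        -- 'GitLab' or 'Gerrit'
    project_type TEXT NOT NULL,    -- e.g. 'Extension', 'Tool'
    is_empty INTEGER NOT NULL      -- 1 if the repository has no content
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repositories(id),
    tool TEXT NOT NULL,            -- e.g. 'Wiki', 'In-repo readme', 'No docs'
    doc_type TEXT                  -- e.g. 'Developer docs'
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO repositories VALUES (?, ?, ?, ?, ?)",
        [
            (1, "example-extension", "Gerrit", "Extension", 0),
            (2, "example-tool", "GitLab", "Tool", 0),
            (3, "empty-tool", "GitLab", "Tool", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO items (repository_id, tool, doc_type) VALUES (?, ?, ?)",
        [
            (1, "Wiki", "User docs"),
            (1, "In-repo readme", "Developer docs"),
            (2, "No docs", None),
            (3, "Empty repo", None),
        ],
    )
    return conn


def items_per_tool(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count documentation items per tool, skipping empty repositories."""
    query = """
        SELECT i.tool, COUNT(*) AS n
        FROM items AS i
        JOIN repositories AS r ON r.id = i.repository_id
        WHERE r.is_empty = 0
        GROUP BY i.tool
        ORDER BY n DESC, i.tool
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
print(items_per_tool(conn))
# [('In-repo readme', 1), ('No docs', 1), ('Wiki', 1)]
```

Keeping repositories and items in separate tables lets one repository produce any number of items, which matches how the study counts items rather than repositories in several of the breakdowns below.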
Repositories
The study included the following repositories from Wikimedia GitLab and Gerrit.
GitLab:
- A large selection of toolforge-repos projects
- All affiliate projects
- A selection of popular (based on star count) personal and general projects
- A small selection of random repositories
Gerrit:
- A random selection of non-archived repositories
Items
Each identified documentation tool, process, or product was added to the list of documentation items. For empty repositories and repositories without any documentation, a single item labeled "Empty repo" or "No docs" was added to the same list instead.
As a result, an item can represent:
- an empty repository
- a repository with no documentation
- a section on a wiki page, a wiki page, a collection of wiki pages clustered together
- a file, a folder with many files, separate files spread throughout a repository
- an entire website
- a tool used to process or produce any of the above
Data set
The data set consists of two tables: one representing repositories and one representing items. It covers 611 repositories, from which 952 items were generated.
In the data set, 80.2% of repositories are not empty and 19.8% are empty. Wikimedia Foundation staff members contributed to 50.74% of the repositories, volunteers to 48.28%, affiliates to 9%, commercial entities to 2.45%, and others to 0.16%. These shares add up to more than 100% because a repository can have contributors of more than one type.
Most (94%) of the identified 121 empty repositories belonged to tools. Empty repositories were generally excluded from the analysis.
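Per-type contributor shares can be computed by counting each repository once for every contributor type involved in it, then dividing by the total number of repositories. A minimal sketch with hypothetical sample data (the repository names and the `contributor_shares` helper are illustrative only):

```python
from collections import Counter

# Hypothetical sample: each repository maps to the set of contributor
# types involved in it. A repository can have more than one type.
repo_contributors = {
    "repo-a": {"Staff"},
    "repo-b": {"Staff", "Volunteers"},
    "repo-c": {"Volunteers"},
    "repo-d": {"Affiliates", "Staff"},
}


def contributor_shares(repos: dict[str, set[str]]) -> dict[str, float]:
    """Percentage of all repositories that each contributor type contributed to."""
    counts = Counter(t for types in repos.values() for t in types)
    total = len(repos)
    return {t: round(100 * n / total, 2) for t, n in counts.items()}


shares = contributor_shares(repo_contributors)
print(shares)
print(sum(shares.values()))  # can exceed 100 because types overlap
```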
Main takeaways
State of documentation
Over 85% of non-empty repositories have some sort of documentation.
Documentation | Repositories | % |
---|---|---|
With docs | 417 | 85.1 |
Without docs | 73 | 14.9 |
Most popular documentation tools
READMEs and wiki pages are the most common documentation tools, accounting for 39.25% and 35.75% of all repository-tool combinations respectively (a repository that uses several tools is counted once for each). All other tools together account for almost 16%, only about seven percentage points more than the share of repositories without documentation.
Tool | Count | % |
---|---|---|
README | 314 | 39.25 |
Wiki | 286 | 35.75 |
Other | 127 | 15.88 |
None | 73 | 9.12 |
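The percentages in this table can be reproduced from its counts by dividing each count by their sum (800). The sketch below is only arithmetic over the published counts, not the study's own code; the `shares_of_total` helper name is hypothetical.

```python
# Counts taken from the table above: number of repositories using each
# tool category ("None" = repositories without documentation).
tool_counts = {"README": 314, "Wiki": 286, "Other": 127, "None": 73}


def shares_of_total(counts: dict[str, int]) -> dict[str, float]:
    """Express each count as a percentage of the sum of all counts."""
    total = sum(counts.values())  # 800 repository-tool pairs, not 490 repositories
    return {name: round(count * 100 / total, 2) for name, count in counts.items()}


print(shares_of_total(tool_counts))
# {'README': 39.25, 'Wiki': 35.75, 'Other': 15.88, 'None': 9.12}
```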
Tools per project type
The most popular documentation tool differs between project types.
Wikis are the most popular documentation tool for:
- affiliate projects
- extensions
- projects related to gadgets (tied with READMEs)
- projects related to mobile apps
- skins
- SRE projects (tied with READMEs)
READMEs are the most popular documentation tool for:
- personal projects
- projects related to analytics
- projects related to gadgets (tied with wikis)
- RelEng projects
- SRE projects (tied with wikis)
- tools
- other projects
In summary, this is how often each tool appears among the top 3 tools for a project type:
- In-repo README
  - 7 times in 1st spot
  - 4 times in 2nd spot
  - 0 times in 3rd spot
  - 0 times outside top 3
- Wiki
  - 6 times in 1st spot
  - 3 times in 2nd spot
  - 2 times in 3rd spot
  - 0 times outside top 3
- None (that is, no documentation is present)
  - 2 times in 2nd spot
  - 1 time in 3rd spot
- 3rd-party website
  - 2 times in 3rd spot
- JSDoc
  - 2 times in 3rd spot
- doc.wikimedia.org
  - 2 times in 3rd spot
- In-repo docs
  - 1 time in 3rd spot
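A tally like this one can be produced by ranking the tools within each project type by count and recording how often each tool lands in each of the top three positions. The sketch below assumes standard competition ranking (1, 1, 3, ...), in which tied tools share a position, as with the gadget and SRE ties above; the study does not state its exact tie-handling rule, and the counts and helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical counts per project type and tool.
counts = {
    "Extension": {"Wiki": 120, "In-repo readme": 40, "doc.wikimedia.org": 10},
    "Gadget": {"Wiki": 2, "In-repo readme": 2, "None": 1},
    "Tool": {"In-repo readme": 70, "None": 38, "Wiki": 20, "JSDoc": 1},
}


def competition_ranks(tool_counts: dict[str, int]) -> dict[str, int]:
    """Rank tools by count, highest first; ties share a rank (1, 1, 3, ...)."""
    ordered = sorted(tool_counts.items(), key=lambda kv: kv[1], reverse=True)
    ranks: dict[str, int] = {}
    for position, (tool, n) in enumerate(ordered, start=1):
        previous_tool, previous_n = ordered[position - 2] if position > 1 else (None, None)
        # A tool tied with the previous one inherits that tool's rank.
        ranks[tool] = ranks[previous_tool] if n == previous_n else position
    return ranks


def top3_tally(all_counts: dict[str, dict[str, int]]) -> dict[str, Counter]:
    """For each tool, count how many project types place it 1st, 2nd, or 3rd."""
    tally: dict[str, Counter] = defaultdict(Counter)
    for tool_counts in all_counts.values():
        for tool, rank in competition_ranks(tool_counts).items():
            if rank <= 3:
                tally[tool][rank] += 1
    return dict(tally)


for tool, spots in sorted(top3_tally(counts).items()):
    print(tool, dict(sorted(spots.items())))
```

Dense ranking (1, 1, 2, ...) would be an equally reasonable choice; it changes which tools reach the 2nd and 3rd spots after a tie.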
Tools per documentation type
Wikis are the most popular tool for:
- administrator documentation
- project docs (though numbers related to project docs aren't fully reliable in this study)
- research docs (tied with READMEs)
- user docs
READMEs are the most popular tool for:
- developer documentation
- research docs (tied with wikis)
- other documentation
Documentation tools variety
While READMEs and wikis are the dominant tools used in documentation, there is still considerable variety in documentation tooling, with 35 other tools used in different projects.
Counting items per documentation tool gives the following complete breakdown. Note that this breakdown excludes items representing repositories without documentation (None in the previous section).
Tool | Count | % |
---|---|---|
Wiki | 315 | 41.61 |
In-repo readme | 315 | 41.61 |
In-repo docs | 26 | 3.43 |
3rd-party website | 22 | 2.91 |
doc.wikimedia.org | 15 | 1.98 |
Doxygen | 8 | 1.06 |
JSDoc | 6 | 0.79 |
In-app docs | 5 | 0.66 |
In-code documentation | 4 | 0.53 |
OpenAPI spec | 3 | 0.40 |
Sphinx | 3 | 0.40 |
php-code-coverage | 3 | 0.40 |
Empty repo | 3 | 0.40 |
Vitepress | 2 | 0.26 |
HTML docs | 2 | 0.26 |
Docpub | 2 | 0.26 |
Istanbul | 2 | 0.26 |
Custom documentation generator | 1 | 0.13 |
Unknown tool | 1 | 0.13 |
MediaWiki API page | 1 | 0.13 |
Toolforge | 1 | 0.13 |
(name missing) | 1 | 0.13 |
Manpages | 1 | 0.13 |
Docs in tool | 1 | 0.13 |
rustdoc | 1 | 0.13 |
Redoc | 1 | 0.13 |
Phabricator | 1 | 0.13 |
LaTeX | 1 | 0.13 |
(name missing) | 1 | 0.13 |
Docbook | 1 | 0.13 |
Custom wiki | 1 | 0.13 |
Jupyter notebooks | 1 | 0.13 |
Video gif | 1 | 0.13 |
Demo site | 1 | 0.13 |
In-repo example | 1 | 0.13 |
JSDuck | 1 | 0.13 |
Google Docs | 1 | 0.13 |
GroovyDoc | 1 | 0.13 |
Developer documentation in particular uses a wide variety of tools: 26 different tools across 261 items.
Documentation type | Tool | Count | % of type total |
---|---|---|---|
Developer docs | In-repo readme | 141 | 54.02 |
Developer docs | Wiki | 60 | 22.99 |
Developer docs | In-repo docs | 14 | 5.36 |
Developer docs | Doxygen | 8 | 3.07 |
Developer docs | JSDoc | 6 | 2.30 |
Developer docs | 3rd-party website | 5 | 1.92 |
Developer docs | In-code documentation | 3 | 1.15 |
Developer docs | Sphinx | 3 | 1.15 |
Developer docs | OpenAPI spec | 3 | 1.15 |
Developer docs | Vitepress | 2 | 0.77 |
Developer docs | GroovyDoc | 1 | 0.38 |
Developer docs | In-repo example | 1 | 0.38 |
Developer docs | Docpub | 1 | 0.38 |
Developer docs | In-app docs | 1 | 0.38 |
Developer docs | Custom documentation generator | 1 | 0.38 |
Developer docs | Docbook | 1 | 0.38 |
Developer docs | JSDuck | 1 | 0.38 |
Developer docs | Docs in tool | 1 | 0.38 |
Developer docs | MediaWiki API page | 1 | 0.38 |
Developer docs | Demo site | 1 | 0.38 |
Developer docs | Custom wiki | 1 | 0.38 |
Developer docs | (name missing) | 1 | 0.38 |
Developer docs | Jupyter notebooks | 1 | 0.38 |
Developer docs | LaTeX | 1 | 0.38 |
Developer docs | Redoc | 1 | 0.38 |
Developer docs | rustdoc | 1 | 0.38 |
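The last column divides each count by the number of items of the same documentation type, not by all items in the study. A minimal sketch of that within-group normalization, using hypothetical `(documentation type, tool)` records and a hypothetical `percent_of_type_total` helper:

```python
from collections import Counter, defaultdict

# Hypothetical items as (documentation type, tool) pairs.
items = [
    ("Developer docs", "In-repo readme"),
    ("Developer docs", "In-repo readme"),
    ("Developer docs", "Wiki"),
    ("Developer docs", "Doxygen"),
    ("User docs", "Wiki"),
    ("User docs", "Wiki"),
]


def percent_of_type_total(pairs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Return {doc_type: {tool: % of that documentation type's items}}."""
    by_type = defaultdict(Counter)
    for doc_type, tool in pairs:
        by_type[doc_type][tool] += 1
    result = {}
    for doc_type, tool_counts in by_type.items():
        type_total = sum(tool_counts.values())  # denominator is per type
        result[doc_type] = {
            tool: round(100 * n / type_total, 2)
            for tool, n in tool_counts.most_common()
        }
    return result


print(percent_of_type_total(items))
```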
Least documented projects
Personal projects and tools are the only project types with a significant number of repositories without documentation: about 34% of personal projects and 31% of tools have none.
Project type | With docs | Without docs | % without docs |
---|---|---|---|
Extension | 153 | 3 | 1.92 |
Gadget | 3 | 0 | 0 |
Other | 61 | 2 | 3.17 |
Other - RelEng | 13 | 0 | 0 |
Other - SRE | 4 | 0 | 0 |
Other - affiliate | 22 | 0 | 0 |
Other - analytics | 14 | 3 | 17.65 |
Other - mobile | 2 | 0 | 0 |
Other - personal | 53 | 27 | 33.75 |
Skin | 9 | 0 | 0 |
Tool | 83 | 38 | 31.4 |
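The "% without docs" column divides the number of repositories without documentation by all non-empty repositories of that type; for tools, 38 / (83 + 38) is about 31.4%. A minimal sketch that builds such a table from per-repository records; the records and the `docs_by_project_type` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-repository records: (project type, has documentation).
repos = [
    ("Tool", True), ("Tool", False), ("Tool", True), ("Tool", False),
    ("Extension", True), ("Extension", True), ("Extension", True),
    ("Other - personal", True), ("Other - personal", False),
]


def docs_by_project_type(
    records: list[tuple[str, bool]],
) -> dict[str, tuple[int, int, float]]:
    """Return {project_type: (with_docs, without_docs, pct_without)}."""
    tallies = defaultdict(lambda: [0, 0])  # [with docs, without docs]
    for project_type, has_docs in records:
        tallies[project_type][0 if has_docs else 1] += 1
    table = {}
    for project_type, (with_docs, without_docs) in sorted(tallies.items()):
        total = with_docs + without_docs
        table[project_type] = (
            with_docs,
            without_docs,
            round(100 * without_docs / total, 2),
        )
    return table


for row in docs_by_project_type(repos).items():
    print(row)
```

Feeding it 83 documented and 38 undocumented tool repositories reproduces the 31.4% in the table.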
Even when a repository technically has a README or other documentation, that documentation sometimes contains no useful information. I refer to such items as token documentation.
Token documentation was most common among items related to tool repositories, where 37 out of 161 items (almost 23%) contained token docs.
Token documentation appears more often in READMEs than in other tools. The following table shows the number of token documentation items per project type and tool:
Project type | In-repo docs | In-repo readme | Wiki |
---|---|---|---|
Extension | 2 | 9 | 6 |
Gadget | 0 | 2 | 0 |
Other | 0 | 2 | 0 |
Other - affiliate | 0 | 2 | 0 |
Other - analytics | 0 | 4 | 0 |
Other - personal | 0 | 9 | 0 |
Tool | 0 | 37 | 3 |
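Data collection in this study was manual, so token documentation was judged by reading each item. For anyone who wants to pre-screen repositories automatically, one possible heuristic is to flag READMEs that contain very few words once headings, blank lines, and bare URLs are removed. The sketch below is hypothetical: the 30-word threshold, the filtering rules, and the function names are arbitrary assumptions, not criteria used in the study, and flagged files would still need human review.

```python
import re

# Hypothetical threshold: READMEs with fewer meaningful words than this
# are flagged for manual review. The value is an arbitrary assumption.
MIN_WORDS = 30


def meaningful_words(readme_text: str) -> list[str]:
    """Return words from lines that are not blank, headings, or bare URLs."""
    words: list[str] = []
    for line in readme_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and Markdown headings
        if re.fullmatch(r"https?://\S+", stripped):
            continue  # skip lines that are only a URL
        words.extend(re.findall(r"[A-Za-z][A-Za-z'-]*", stripped))
    return words


def is_token_readme(readme_text: str, min_words: int = MIN_WORDS) -> bool:
    """Flag a README as possible token documentation."""
    return len(meaningful_words(readme_text)) < min_words


print(is_token_readme("# my-tool\n\nTODO\n"))  # True
```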
Documentation presence vs contributor types
While this wasn't the primary purpose of the research, I think it's worth mentioning that, based on the available data, the correlation between contributor type and documentation presence is weak. Documentation presence varies by the contributor types involved in a project, forming two broad groups:
- for affiliates and commercial entities, documentation is present in 98% and 93% of repositories respectively
- for Wikimedia Foundation staff and volunteers, documentation is present in 88% and 87% of repositories respectively
Repositories without documentation, by contributor type (a repository with several contributor types is counted once under each):
Contributor | Count | Contributor total | % without docs |
---|---|---|---|
Affiliates | 1 | 55 | 1.81 |
Commercial entities | 1 | 15 | 6.66 |
Other | 0 | 1 | 0 |
Staff | 36 | 307 | 11.72 |
Volunteers | 36 | 287 | 12.54 |
Notably, documentation absence is less of a problem in repositories with a higher number of contributor types:
Contributor types per repository | 1 | 2 | 3 |
---|---|---|---|
With documentation | 269 | 122 | 26 |
Without documentation | 68 | 3 | 0 |
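This comparison groups repositories by how many distinct contributor types they involve. The sketch below shows one way to compute the share without documentation per group; the records and both helper names are hypothetical. Applied to the published counts, `pct_without` gives about 20% for repositories with one contributor type (68 of 337) and 2.4% for those with two (3 of 125).

```python
from collections import defaultdict

# Hypothetical records: (set of contributor types, has documentation).
repos = [
    ({"Volunteers"}, False),
    ({"Volunteers"}, True),
    ({"Staff"}, True),
    ({"Staff", "Volunteers"}, True),
    ({"Staff", "Affiliates", "Volunteers"}, True),
]


def pct_without(with_docs: int, without_docs: int) -> float:
    """Percentage of repositories in a group that have no documentation."""
    return round(100 * without_docs / (with_docs + without_docs), 2)


def by_contributor_type_count(
    records: list[tuple[set[str], bool]],
) -> dict[int, tuple[int, int, float]]:
    """Return {number of contributor types: (with, without, % without)}."""
    groups = defaultdict(lambda: [0, 0])  # [with docs, without docs]
    for types, has_docs in records:
        groups[len(types)][0 if has_docs else 1] += 1
    return {n: (w, wo, pct_without(w, wo)) for n, (w, wo) in sorted(groups.items())}


print(by_contributor_type_count(repos))
# {1: (2, 1, 33.33), 2: (1, 0, 0.0), 3: (1, 0, 0.0)}
print(pct_without(269, 68), pct_without(122, 3))
# 20.18 2.4
```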
Recommendations
Based on the results of this study, I formulated the following documentation strategy recommendations for the Technical Documentation Team:
- Ensure high-quality tooling and templates for README files and wiki documentation. Wiki documentation already has style guides and templates, available on the Technical style guides and templates and Toolkit pages. READMEs could use a bit more attention.
- Ensure adequate platform support for less popular documentation tools. Support other Wikimedia Foundation teams in building solutions that work with a broad variety of documentation tools.
- Keep monitoring the state of documentation tooling and find ways to facilitate, streamline, and standardize the use of popular tools and processes. The team has been doing this since its formation, as evidenced by its ownership of doc.wikimedia.org and the JSDoc migration project.
- Seek out opportunities for introducing documentation early in personal repositories and tool projects. This could be done, for example, by improving the mechanisms for generating repositories.
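As an illustration of the last recommendation, a repository generator could write a README skeleton with prompts for the sections a project most needs. The sketch below is hypothetical: the section list, `README_TEMPLATE`, and `render_readme` are assumptions chosen for illustration, not an existing Wikimedia template or tool.

```python
from string import Template

# Hypothetical README skeleton a repository generator could write on creation.
README_TEMPLATE = Template("""\
# $name

$summary

## Usage
TODO: Explain how to install, run, or use $name.

## Development
TODO: Describe how to set up a development environment and run tests.

## Contact
Maintainers: $maintainers
""")


def render_readme(name: str, summary: str, maintainers: list[str]) -> str:
    """Fill the skeleton; an empty summary is replaced by a visible TODO."""
    return README_TEMPLATE.substitute(
        name=name,
        summary=summary.strip() or "TODO: Describe what this project does.",
        maintainers=", ".join(maintainers) or "TODO",
    )


print(render_readme("example-tool", "", ["Jane Doe"]))
```

An unfilled skeleton like this is itself token documentation, so a generator helps only if maintainers are also prompted to complete the TODO sections.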