MediaWiki Product Insights/Artifacts/KR 5.2: Simplify feature development/5.2.1:Hook Survey

From mediawiki.org

Hypothesis 5.2.1: classification of the types of hooks and extension registry properties

In the first quarter of the fiscal year 2024/25, we surveyed 76 (out of 464) hook definitions and 165 (of 1408) hook handler implementations used by MediaWiki extensions deployed on Wikimedia sites. Based on the survey data we identified several opportunities for exploration and evolution, and recommend three explorations to be scheduled over the course of the current fiscal year:

First, we recommend exploring the use of the listener pattern to replace hooks that represent domain events, to provide a semantically cleaner way to notify extensions of such events.

Second, we recommend exploring the use of the provider and handler patterns as an alternative to hooks that are primarily used to register and declare (rather than implement) business logic.

Third, we recommend exploring options for being more deliberate about extensions’ access to core service objects and configuration by enforcing component boundaries between extensions and core.

The proposed explorations will benefit from synergy with work for the WE5.2.2 hypothesis (Notification redesign) and WE2.4.1 (Wikifunctions).

Context

Hypothesis WE5.2.1 (Q1 24/25): If we make a classification of the types of hooks and extension registry properties used to influence the behavior of MediaWiki core, we will be able to focus further research and interventions on the most impactful.

Motivation: Hooks are the primary mechanism for extensions to add to (or modify) MediaWiki's behavior. Hooks are essentially callbacks, and there are few restrictions on what they can do. This makes them very powerful and flexible, but it also means they can be hard to reason about. Extensions sometimes interact in surprising ways, or rely on assumptions that may be broken even by subtle changes to the MediaWiki platform or other extensions.

To improve this situation, the interface between extensions and MediaWiki needs to become more expressive and less brittle. This can be done by changing the signature and contract of hooks, or by replacing hooks with alternative mechanisms like the registry/provider pattern. A survey of current hook usage in various extensions will help us ground improvements in knowledge about real-world needs and help with discovering potential difficulties early.

Approach: Research patterns in how hooks are used by the extensions deployed on WMF sites. To that end, we will analyze a sample of the ~1400 hook handlers along with other extension mechanisms.

Once the raw data has been collected, we look for common patterns and, based on these, propose changes that would reduce toil and improve maintainability.

Investigation

Hook Survey

Data: Hook Usage by Extension, July 2024 as well as Hook Survey 1 (WE5.2.1 FY24/25) and Hook Survey 2 (WE5.2.1 FY24/25)

The goal of the hook survey was to analyze the behavior of a sample of hook handlers implemented by extensions. We did so in two batches, the first one informing the choice of hooks for the second.

For the first batch, we wanted to look at the most used hooks, as well as the most critical extensions. To find the most used hooks, we analyzed all the extension.json files of the extensions we use in production.

The ten most-used hooks turned out to be:

  1. BeforePageDisplay
  2. LoadExtensionSchemaUpdates
  3. GetPreferences
  4. ParserFirstCallInit
  5. SkinTemplateNavigation::Universal
  6. ListDefinedTags
  7. ChangeTagsListActive
  8. PageSaveComplete
  9. ResourceLoaderRegisterModules
  10. SidebarBeforeOutput
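
A ranking like the one above can be reproduced with a short script. The following is a minimal sketch in Python, not the tooling actually used for the survey. It assumes that each extension lives in its own directory containing an extension.json file, and that each hook is counted once per extension regardless of how many handlers are registered for it. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_hook_usage(extension_dirs):
    """Count how many extensions register at least one handler per hook."""
    usage = Counter()
    for ext_dir in extension_dirs:
        manifest = Path(ext_dir) / "extension.json"
        if not manifest.is_file():
            continue  # skip directories without a manifest
        data = json.loads(manifest.read_text(encoding="utf-8"))
        # "Hooks" maps hook names to one or more handlers. Iterating over
        # the keys counts each hook once per extension (an assumption about
        # how the ranking was computed).
        usage.update(data.get("Hooks", {}).keys())
    return usage


def top_hooks(extension_dirs, n=10):
    """Return the n most-used hooks as (hook name, extension count) pairs."""
    return count_hook_usage(extension_dirs).most_common(n)
```

Calling top_hooks() with the list of deployed extension directories would yield a ranked list of (hook name, count) pairs.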

For each of these ten hooks, we picked five extensions that implement handlers for them, and analyzed the behavior of these handlers. In addition, we looked at all hook handlers implemented by three extensions that were identified as representative examples of how critical functionality is implemented using hooks:

  • AbuseFilter because it integrates deeply with the editing process, defines a persistence layer and provides a complex user interface.
  • CategoryTree because it extends the wikitext syntax and exposes an API that it calls from its own client-side code.
  • Echo (aka Notifications) to support the work on WE5.2.2 that aims to replace the use of hooks by the Echo extension with a more expressive interface. It also serves as an example of an extension that defines its own hooks that are then used by other extensions (e.g. by AbuseFilter).

To gain insights on the usage of hooks in extensions, we surveyed the hook definitions and hook handlers according to the survey instructions and recorded the results in the respective spreadsheet. The data was then analyzed to identify patterns.

From the usage patterns we derived the following main classes of hooks:

  • filters: Hooks designed to provide the caller with information. Handlers of filter hooks communicate information to the caller by updating a data structure provided to them as a parameter. Handlers of filter hooks should not write to the database. The majority of hooks we surveyed fall into this class (39 out of 54 hooks).
  • provider: Hooks designed to allow extensions to register code components. For simple cases, this is functionally equivalent to the “components” class of registration properties. Of the hooks we analyzed, 9 fell into this class, including 5 of the top ten hooks.
  • events: Hooks designed to notify extensions about a change to the site’s persistent state. Handlers for event hooks typically write to the database. 13 of the hooks we analyzed fell into that category.
  • pre-fetch: Hooks called before displaying a listing, to give extensions a chance to perform a bulk-query for information that is later used by another hook that gets called for each entry in the list.
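
The interplay between a pre-fetch hook and a per-entry filter hook can be sketched as follows. This is an illustration in Python under stated assumptions, not MediaWiki code: the hook runner, the hook names ListPrefetch and ListEntry, and the WatchCountExtension class are all hypothetical.

```python
class HookRunner:
    """Toy hook runner: calls every registered handler in order."""

    def __init__(self):
        self._handlers = {}

    def register(self, hook, handler):
        self._handlers.setdefault(hook, []).append(handler)

    def run(self, hook, *args):
        for handler in self._handlers.get(hook, []):
            handler(*args)


class WatchCountExtension:
    """Hypothetical extension that annotates list entries with a count."""

    def __init__(self, backend):
        self._backend = backend  # stands in for a database table
        self._cache = {}
        self.queries = 0

    def on_list_prefetch(self, titles):
        # Pre-fetch hook: one bulk lookup for the whole listing.
        self.queries += 1
        self._cache = {t: self._backend.get(t, 0) for t in titles}

    def on_list_entry(self, title, attributes):
        # Filter hook: communicate information by updating the data
        # structure passed in by the caller; no storage access here.
        attributes["watchers"] = self._cache.get(title, 0)


def render_listing(hooks, titles):
    """Caller side: run the pre-fetch hook once, then the filter hook per entry."""
    hooks.run("ListPrefetch", titles)
    rows = []
    for title in titles:
        attributes = {}
        hooks.run("ListEntry", title, attributes)
        rows.append((title, attributes))
    return rows
```

In this sketch the backend is consulted once per listing rather than once per entry, which is the purpose of the pre-fetch class.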

After the analysis of the first batch was complete, we selected a second batch of hooks to analyze, to validate this classification and to collect additional insights to support the work on hypothesis WE5.2.2. The second batch includes all remaining hooks that seem to belong to the “event” class as well as all hooks defined by (rather than implemented by) the Echo extension.

The analysis of the second batch of hooks didn’t provide any new insights into the general use of hooks by extensions, but it validated the classification we derived in the first batch, and it did provide some additional information about the way other extensions hook into the Echo extension. This will be useful for designing notification infrastructure for MediaWiki core as part of the work on WE5.2.2.

Extension Registration Properties

Data: Extension Registration Survey, July 2024

This survey provides an overview of the use of different registration properties in the extension.json files of the 197 extensions used on Wikimedia sites. We found 101 different properties in use, which fall into four categories (plus the attributes property):

  • code components: 28 properties allow extensions to register business logic, mostly using the ObjectFactory mechanism.
  • configuration: 41 properties allow extensions to modify the core configuration, e.g. to declare permissions or namespaces.
  • resources: 16 properties allow extensions to register resources (files), e.g. for localization messages or style sheets.
  • meta: 15 properties provide meta-information about the extension

In addition, the attributes property can be used by extensions to provide information to other extensions - attributes may in turn again define components, configuration, or resources. This mechanism is used by 40 extensions.

In the context of this hypothesis, the most relevant aspect of extension registration is the injection of business logic through code components. Hooks are the mechanism used most for this purpose, through the Hooks and HookHandlers properties (used 179 and 174 times, respectively). This validates our assumption that we should focus on hooks for identifying opportunities to improve the integration interface between extensions and MediaWiki core.

During our investigation, we also looked at which logic provided by MediaWiki core is used from within extensions. We did this by looking at the use of service objects referenced in component registrations in extension.json files. We surveyed which service objects are used most, to get an idea of what core functionality extensions most rely on. Unsurprisingly, the most-used service object by far is MainConfig, which provides access to the site’s configuration: it is used 125 times (followed by DBLoadBalancerFactory, which is used 49 times). The fact that MainConfig is used so much indicates that it may be worth investigating a more convenient mechanism to provide extensions access to configuration.
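
A count of injected service objects can be approximated with a recursive walk over a parsed extension.json file. The sketch below is a heuristic written in Python, not the method used for the survey: it assumes that any object containing a "services" list next to a "class" or "factory" key is a component specification, which may over- or under-count in practice. The function name is hypothetical.

```python
from collections import Counter


def count_injected_services(manifest, usage=None):
    """Count service names requested by component specs in a parsed manifest."""
    usage = Counter() if usage is None else usage
    if isinstance(manifest, dict):
        services = manifest.get("services")
        # Heuristic (assumption): treat a dict as a component spec when it
        # has a "services" list alongside a "class" or "factory" key.
        if isinstance(services, list) and ("class" in manifest or "factory" in manifest):
            usage.update(s for s in services if isinstance(s, str))
        for value in manifest.values():
            count_injected_services(value, usage)
    elif isinstance(manifest, list):
        for item in manifest:
            count_injected_services(item, usage)
    return usage
```

Passing the same Counter to successive calls accumulates totals across all extensions, and usage.most_common() then gives the ranking.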

Recommendations

Based on the observations made during the survey described above, we recommend the following experiments to define new patterns that will allow extensions to integrate with MediaWiki core in a more expressive and sustainable way:

Event Listeners

The goal of the Event Listeners exploration is to define events that are emitted by MediaWiki core, which can later be handled by listeners in other components of core, in extensions, or by logic running in a stand-alone service. Following the idea of domain events as defined in domain driven design, these events represent changes to the observable state maintained by a given component (or bounded context). This matches the behavior that had evolved organically for the hooks that we classified as “events” in the survey.

We expect the application of the listener pattern to have the following benefits for the MediaWiki platform:

  • Improve component boundaries between core components by applying the listener pattern. Listeners remove the need for the code that effects a change to know about all code that needs to be informed about it.
  • Clarify the semantics of the notification received by the extension, particularly with respect to transactional context.
  • Make the extension interface more future-proof by avoiding the rigidity imposed by using PHP interfaces to define hook parameters. Due to limitations of PHP, the method signatures of interfaces implemented by extensions can’t be modified in a backwards-compatible way.
  • Make it easier to publish state changes on an event bus (Kafka) by implementing a generic relay mechanism.

To introduce the listeners into MediaWiki core we will define interfaces for emitting events and for registering listeners. We will implement a mechanism for dispatching events to the relevant listeners if and when the change in state has been successfully committed to the storage layer.
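
The idea of dispatching events only after a successful commit can be sketched as follows. This is a minimal illustration in Python, not the proposed MediaWiki interface: the class and method names (EventDispatcher, register_listener, emit, on_commit, on_rollback) and the PageUpdatedEvent type are hypothetical, and the storage layer is assumed to call on_commit or on_rollback at the end of each transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageUpdatedEvent:
    """Hypothetical domain event describing a completed page update."""
    page_title: str
    revision_id: int


class EventDispatcher:
    """Queues emitted events and delivers them once the change is committed."""

    def __init__(self):
        self._listeners = {}
        self._pending = []

    def register_listener(self, event_type, listener):
        self._listeners.setdefault(event_type, []).append(listener)

    def emit(self, event):
        # The emitter does not know who is listening; the event is only
        # queued until the surrounding transaction finishes.
        self._pending.append(event)

    def on_commit(self):
        # Called by the (assumed) storage layer after a successful commit.
        pending, self._pending = self._pending, []
        for event in pending:
            for listener in self._listeners.get(type(event), []):
                listener(event)

    def on_rollback(self):
        # The state change never happened, so listeners are not notified.
        self._pending.clear()
```

Because events are immutable value objects rather than method signatures, fields could be added later without breaking existing listeners, which is one way to address the rigidity concern raised above.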

The proposed system can be implemented as a self-contained system, built on top of existing infrastructure such as HookContainer, DeferredUpdates, and EventRelayer in a straightforward way. It will be backwards compatible so existing extensions remain functional without any change, and implemented in such a way that allows us flexibility to change the implementation later, without affecting listeners. This allows us to first use the new system on a small selection of extensions and core components, and to adjust the interfaces and implementation as we expand usage.

Introducing the new system and testing it on just one of the 13 event hooks and a single extension is expected to take a small team of engineers one quarter. We intend to start with an event that represents page updates, because this would benefit a variety of different use cases both in core and in extensions.

Replacing all relevant hooks and converting more extensions will take longer. To avoid maintaining multiple competing systems for the same purpose for an extended period of time, a concerted effort should be made to migrate the majority of extensions by the end of the fiscal year. Migration should be straightforward and generally lead to a reduction in boilerplate code.

Providers and Handlers

The goal of the Providers and Handlers intervention is to use declarations instead of executable code when registering logic components. This will make the registration process less ad-hoc and easier to reason about. It will also allow us to create tooling that analyzes which extension registers what, just like we did for the hook survey above.

We already have 28 registries supported by the extension.json system; adding a few more to cover the functionality of the most prominent “provider” hooks should be straightforward. The most impactful approach seems to be to look at the top 5 provider hooks, all of which are in the top ten of the most used hooks overall. Specifically:

  • LoadExtensionSchemaUpdates (used by 51 extensions): this registers schema updates that are run when the extension is installed or updated.
  • ParserFirstCallInit (used by 42 extensions): used to define extensions of the wikitext syntax using implementations of MagicWord, ParserHook, and ParserFunction. This has already been proposed and investigated as part of the Parsoid Parser Unification project (see T299528).
  • ListDefinedTags and ChangeTagsListActive (used by 23 extensions): used to define custom edit tags.
  • ResourceLoaderRegisterModules (used by 20 extensions): used to register files that contain client side code.

A detailed survey of existing hook handler implementations will need to be part of this investigation to surface any edge cases that cannot be covered by a purely declarative approach, such as provider-style hooks depending on configuration. In some cases it may turn out that the hook cannot be replaced entirely, but perhaps most uses of it can be replaced with declarations, while a few remain to cover for edge cases.
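
The combination of declarations with a residual hook for edge cases could look roughly like this. The sketch is in Python and purely illustrative: the "ChangeTags" manifest key is a hypothetical registration property, not an existing extension.json property, and the function names are invented for this example.

```python
def collect_change_tags(manifests, config, fallback_handlers=()):
    """Gather tags from declarations first, then from remaining callbacks."""
    tags = []
    for manifest in manifests.values():
        # Declarative part: plain data, no extension code is executed.
        tags.extend(manifest.get("ChangeTags", []))
    for handler in fallback_handlers:
        # Edge cases (e.g. tags that depend on configuration) can keep
        # using a provider-style callback that appends to the list.
        handler(tags, config)
    return tags


def who_registers(manifests, key):
    """Tooling example: declarations can be inspected without running code."""
    return {name: m[key] for name, m in manifests.items() if key in m}
```

The second function illustrates the tooling benefit: a question such as "which extension registers what" becomes a lookup over data rather than an analysis of handler code.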

It is expected to take a small team of engineers one quarter to explore the use of a declarative registration mechanism for at least one of the four cases listed above.

Component Boundaries

The goal of the Component Boundaries exploration is to provide extensions with easier access to service objects and configuration they need while discouraging access to service objects and configuration that crosses domain boundaries. There are two aspects to this:

Firstly, extensions should have easy access to their own configuration, but should avoid accessing configuration of MediaWiki core or other extensions. From the survey of core service objects in extensions, we know that the MainConfig object is the most used service object. If we can confirm the suspicion that it is used primarily to access configuration defined by the extension, it could be replaced by a config object limited to accessing configuration defined by that extension. This way, it would be easier for extensions to access their own configuration, and access to configuration owned by other components would be discouraged.
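
A config object limited to an extension's own settings could be as simple as a wrapper around the main configuration. The following Python sketch shows the idea under stated assumptions: ScopedConfig is a hypothetical name, the main configuration is modelled as a plain dictionary, and the set of owned keys is assumed to come from the extension's own declarations.

```python
class ScopedConfig:
    """Read-only view on the main config, limited to keys an extension owns."""

    def __init__(self, main_config, owned_keys):
        self._main = main_config
        self._owned = frozenset(owned_keys)

    def has(self, key):
        return key in self._owned and key in self._main

    def get(self, key):
        if key not in self._owned:
            # Settings owned by core or other extensions are not reachable.
            raise KeyError(f"{key!r} is not owned by this extension")
        return self._main[key]
```

An extension that only receives such an object cannot accidentally come to depend on settings owned by core or by another extension.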

Secondly, the set of service objects that can be injected into hook handlers should be limited by implementing a system of component-specific service containers: Hooks could specify which component they belong to, so we could limit the set of service objects available to the hook handler based on that. That way, a handler for a hook in one component would be discouraged from depending on logic in another component, improving separation of concerns. For example, this would prevent a hook handler in the “media” component from interfering with user accounts.
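
One possible shape for component-specific service containers is sketched below in Python. It is an illustration only: the class, the create_handler function, and the mapping from components to allowed services are hypothetical, and the service names are used as labels rather than as a statement about which services belong to which component.

```python
class ComponentServiceContainer:
    """Hands out only the services allowed for a given component."""

    def __init__(self, services, allowed_by_component):
        self._services = dict(services)
        self._allowed = {
            component: frozenset(names)
            for component, names in allowed_by_component.items()
        }

    def get(self, component, name):
        if name not in self._allowed.get(component, frozenset()):
            raise LookupError(
                f"service {name!r} is not available to component {component!r}"
            )
        return self._services[name]


def create_handler(container, component, handler_class, requested_services):
    """Instantiate a hook handler with services limited by the hook's component."""
    deps = [container.get(component, name) for name in requested_services]
    return handler_class(*deps)
```

In this sketch, a handler for a hook in the “media” component that requests a user-related service fails at construction time, which surfaces the boundary violation early.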

These two aspects, configuration and service containers, should be considered and designed in tandem, but could be explored and implemented separately. We may benefit from investigations of similar ideas that have already been conducted as part of the Parsoid integration effort (see Parsoid/Extension API).

Further Explorations

Observations and ideas that arose during the hook survey, and may be worth investigating at a later time:

  • Use the hook classification system in documentation and establish best practices around it.
  • Disable write access to the database during the execution of hooks that do not explicitly allow it. Only handlers for “event” style hooks should be able to write to the database.
  • Investigate if it would be beneficial for extensions to replace or wrap core service objects (decorator pattern) instead of using hooks to modify their behavior. Redefining service wiring is already possible, but rarely done. We could make this easier for desirable use cases, while discouraging uncontrolled use.
  • Add tracing to track hook usage live. Using a system like OpenTelemetry we could track which hooks are used how often, but also which hook handler interacts with the database, how long it takes, etc. To avoid overloading the stats system, this could be done on a small sample of requests only, e.g. 1/1000. Alternatively, it could be enabled only in CI.
  • Use static analysis of the call tree (e.g. using Huma) to determine hook behavior.
  • Reduce the unnecessary exposure of mutable objects as hook parameters (e.g. WikiPage, Parser) by replacing them with more narrow interfaces that match the intent of the hook.
  • Explore replacing the “attributes” mechanism in the ExtensionRegistry with two more specific mechanisms: a generic component registry, and a way for extensions to declare and set configuration.
  • Investigate using the middleware pattern to replace filter hooks. A “middleware style” hook handler would be responsible for calling the “next” handler, and the last handler would be the default implementation of the business logic. This pattern can be used to manipulate inputs as well as outputs of the default implementation, as well as replacing it entirely. Note that this could also be implemented as wrappers around service objects, instead of using the hook system (decorator pattern).
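
The middleware idea from the last item can be sketched as follows. This is a generic illustration of the pattern in Python, not a proposal for a concrete MediaWiki interface; build_pipeline and the three example handlers are hypothetical.

```python
def build_pipeline(default_impl, middlewares):
    """Chain middleware-style handlers around a default implementation."""
    handler = default_impl
    # Wrap from the inside out so that the first middleware runs first.
    for middleware in reversed(middlewares):
        handler = _bind(middleware, handler)
    return handler


def _bind(middleware, next_handler):
    return lambda value: middleware(value, next_handler)


def block_empty(title, next_handler):
    # Replaces the default implementation entirely for some inputs.
    if not title.strip():
        return "(untitled)"
    return next_handler(title)


def underscores_to_spaces(title, next_handler):
    # Manipulates the input before passing it on.
    return next_handler(title.replace("_", " "))


def capitalize_result(title, next_handler):
    # Manipulates the output of the remaining chain.
    result = next_handler(title)
    return result[:1].upper() + result[1:]
```

Here the last element of the chain is the default business logic, for example build_pipeline(str.strip, [block_empty, underscores_to_spaces, capitalize_result]). Each handler decides whether and how to call the next one.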
