User:Roan Kattouw (WMF)/ResourceLoader submodules

From mediawiki.org

This is an idea for how we could support submodules for large modules that expose many small things with dependencies between them. This is commonly the case for libraries (like OOUI). Right now, our only option is to subdivide these modules into smaller modules, but that also increases the size of the startup module by increasing the number of modules. We'd like to be able to do fine-grained tree-shaking of large libraries, but in the current system that would require creating too many modules.

The basic idea of this proposal is as follows:

  • Let modules depend on individual files from another module (if the depended-on module is a package module)
  • Allow files in package modules to express dependencies on each other, and on other modules
  • Simplify/consolidate this information in the manifest, so that only module-level information is exposed to the client
  • The client doesn't know exactly what parts of modules it's asking for, but it passes enough information so that the server can figure it out

Per-file dependencies for package modules

Let each file in a package module define its own dependencies. These dependencies could be other files in the same module (internal), or other modules (external). This information would not be exposed directly in the module manifest in the startup module: internal dependencies are omitted completely, and external dependencies are consolidated at the module level.

{
    "foo": {
        "packageFiles": [
            {
                "file": "one.js",
                "dependencies": [
                    "foo/two.js",
                    "bar"
                ]
            },
            {
                "file": "two.js",
                "dependencies": [
                    "bar"
                ]
            },
            {
                "file": "three.js",
                "dependencies": [
                    "baz"
                ]
            }
        ]
    }
}

The module definition above expresses an internal dependency (one.js depends on two.js) and several external dependencies. In the startup manifest, this will be simplified to say that foo depends on bar and baz.
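
As a rough illustration of that consolidation step, the sketch below collapses per-file dependencies into a module-level list. The function name consolidateDependencies and the in-memory shape of the definition are assumptions made for this example, not existing ResourceLoader code (the real manifest is generated server-side). It also assumes module names never contain a slash.

```javascript
// Hypothetical sketch: collapse per-file dependencies into the module-level
// list that the startup manifest would expose.
const definitions = {
    foo: {
        packageFiles: [
            { file: 'one.js', dependencies: [ 'foo/two.js', 'bar' ] },
            { file: 'two.js', dependencies: [ 'bar' ] },
            { file: 'three.js', dependencies: [ 'baz' ] }
        ]
    }
};

function consolidateDependencies( moduleName, definition ) {
    const external = new Set();
    for ( const packageFile of definition.packageFiles ) {
        for ( const dep of packageFile.dependencies || [] ) {
            // "foo/two.js" inside foo is internal, so it is omitted entirely.
            if ( dep.startsWith( moduleName + '/' ) ) {
                continue;
            }
            // A dependency on a file in another module ("other/x.js") is
            // reduced to the module name; plain module names pass through.
            external.add( dep.split( '/' )[ 0 ] );
        }
    }
    return [ ...external ];
}

console.log( consolidateDependencies( 'foo', definitions.foo ) );
// [ 'bar', 'baz' ]
```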

Allow modules to depend on files from other modules

Using the moduleName/filename.js syntax, also used above for internal dependencies, modules can depend on files from other modules, and then load these using require( 'moduleName/filename.js' ).

{
    "quux": {
        "dependencies": [
            "foo/one.js",
            "blah"
        ]
    }
}

In the startup manifest, this is simplified to say that quux depends on foo and blah. It will also say that the dependency on blah is a full dependency and the dependency on foo is a partial dependency, without saying exactly which file(s) it depends on[1].
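
One way to picture this simplification is the sketch below, which reduces a dependency list to module names flagged as full or partial. The function name summarizeDependencies is hypothetical, and the rule that a full dependency overrides a partial one on the same module is an assumption that the proposal does not spell out.

```javascript
// Hypothetical sketch: reduce a module's declared dependencies to
// module-level entries flagged as "full" or "partial".
function summarizeDependencies( dependencies ) {
    const summary = new Map();
    for ( const dep of dependencies ) {
        const slash = dep.indexOf( '/' );
        const isFileDep = slash !== -1;
        const moduleName = isFileDep ? dep.slice( 0, slash ) : dep;
        // Assumption: once a module is needed in full, it stays "full" even
        // if another entry only asks for one of its files.
        const alreadyFull = summary.get( moduleName ) === 'full';
        summary.set( moduleName, isFileDep && !alreadyFull ? 'partial' : 'full' );
    }
    return Object.fromEntries( summary );
}

console.log( summarizeDependencies( [ 'foo/one.js', 'blah' ] ) );
// { foo: 'partial', blah: 'full' }
```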

How the client deals with partial dependencies

When the client is asked to load quux, it sees that it has a full dependency on blah and a partial dependency on foo, and that foo in turn depends on bar and baz. Assuming none of these modules have been loaded yet, the client sends a request to the server indicating it wants all of quux, blah, bar and baz, and part of foo. It doesn't know what part it needs, but it indicates that foo needs to be loaded partially[2]. The server can figure out which parts are needed by looking at which files within foo are depended on by the modules that are being requested.

Note that there is an inefficiency here: baz is requested even though we won't need it, because we aren't going to load the part of foo that depends on it (three.js), but the client doesn't know that.
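
The client-side resolution described above might look roughly like the following. The manifest shape, the buildRequest function, and the "full"/"partial" labels are assumptions for illustration only; the query string follows the format suggested in footnote 2.

```javascript
// Hypothetical client-side view of the startup manifest: only module-level
// information, with each dependency flagged as full or partial.
const manifest = {
    quux: { blah: 'full', foo: 'partial' },
    foo: { bar: 'full', baz: 'full' },
    blah: {},
    bar: {},
    baz: {}
};

function buildRequest( moduleName ) {
    const full = new Set();
    const partial = new Set();
    const visit = ( name, kind ) => {
        if ( full.has( name ) || ( kind === 'partial' && partial.has( name ) ) ) {
            return; // Already requested at this level or a stronger one.
        }
        if ( kind === 'full' ) {
            partial.delete( name );
            full.add( name );
        } else {
            partial.add( name );
        }
        // The client cannot tell which external dependencies a partial load
        // really needs, so it follows all of them (hence baz is included).
        for ( const [ dep, depKind ] of Object.entries( manifest[ name ] ) ) {
            visit( dep, depKind );
        }
    };
    visit( moduleName, 'full' );
    return { modules: [ ...full ], partialModules: [ ...partial ] };
}

const request = buildRequest( 'quux' );
console.log(
    '&modules=' + request.modules.join( '|' ) +
    '&partialModules=' + request.partialModules.join( '|' )
);
// &modules=quux|blah|bar|baz&partialModules=foo
```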

How the server resolves partial dependencies

The server gets a request asking for all of quux, blah, bar and baz, and part of foo. The server determines which parts of foo are needed by looking at which files in foo the fully-loaded modules depend on, then resolving internal dependencies. It finds that quux depends on foo/one.js, which in turn depends on foo/two.js. It responds with the full contents of the fully-requested modules, and the partial contents of foo (only one.js and two.js, but not three.js).
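
A minimal sketch of this server-side resolution follows. It is written in JavaScript for consistency with the other examples here, even though the real server side is not JavaScript. The registry object and resolvePartialFiles are hypothetical names, and the sketch only considers dependencies declared by fully-requested modules, as the paragraph above describes.

```javascript
// Hypothetical server-side registry with full per-file dependency data.
const registry = {
    quux: { dependencies: [ 'foo/one.js', 'blah' ], packageFiles: [] },
    blah: { dependencies: [], packageFiles: [] },
    bar: { dependencies: [], packageFiles: [] },
    baz: { dependencies: [], packageFiles: [] },
    foo: {
        dependencies: [],
        packageFiles: [
            { file: 'one.js', dependencies: [ 'foo/two.js', 'bar' ] },
            { file: 'two.js', dependencies: [ 'bar' ] },
            { file: 'three.js', dependencies: [ 'baz' ] }
        ]
    }
};

function resolvePartialFiles( partialModule, fullModules ) {
    const prefix = partialModule + '/';
    const fileDeps = new Map(
        registry[ partialModule ].packageFiles.map( ( f ) => [ f.file, f.dependencies ] )
    );
    const needed = new Set();
    const addFile = ( file ) => {
        if ( needed.has( file ) ) {
            return;
        }
        needed.add( file );
        // Follow internal dependencies (files in the same module).
        for ( const dep of fileDeps.get( file ) || [] ) {
            if ( dep.startsWith( prefix ) ) {
                addFile( dep.slice( prefix.length ) );
            }
        }
    };
    for ( const name of fullModules ) {
        const def = registry[ name ];
        const deps = def.dependencies.concat(
            def.packageFiles.flatMap( ( f ) => f.dependencies )
        );
        for ( const dep of deps ) {
            if ( dep.startsWith( prefix ) ) {
                addFile( dep.slice( prefix.length ) );
            }
        }
    }
    return [ ...needed ];
}

console.log( resolvePartialFiles( 'foo', [ 'quux', 'blah', 'bar', 'baz' ] ) );
// [ 'one.js', 'two.js' ]
```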

How the client manages state for partially-loaded modules

The client receives a partial response for foo, which is flagged as such[3]. It makes these files available for loading with require(), but it doesn't mark the module as fully loaded. If a module that also has a partial dependency on foo is loaded later, the client follows the same protocol and lets the server figure out which files to send, which might duplicate files it already has. In that case, the client simply ignores the files in the response that it has already loaded. If foo is later requested in its entirety (or a module is loaded that has a full dependency on foo), the client asks the server for the entire module, and again ignores the files it already has.
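
The bookkeeping this implies could be sketched as below. The ModuleState class and its applyResponse method are hypothetical and do not correspond to anything in mw.loader; they only show the "keep what is new, ignore duplicates, mark as loaded only on a full response" behavior.

```javascript
// Hypothetical client-side bookkeeping for a partially loaded package module.
class ModuleState {
    constructor() {
        this.files = new Map(); // file name -> module function
        this.fullyLoaded = false;
    }

    // Apply a server response. Files the client already has are ignored.
    // Returns the names of the files that were actually added.
    applyResponse( files, isPartial ) {
        const added = [];
        for ( const [ name, fn ] of Object.entries( files ) ) {
            if ( !this.files.has( name ) ) {
                this.files.set( name, fn );
                added.push( name );
            }
        }
        // Only a full response lets the client mark the module as loaded.
        if ( !isPartial ) {
            this.fullyLoaded = true;
        }
        return added;
    }
}

const foo = new ModuleState();
// First partial load: one.js and two.js arrive.
console.log( foo.applyResponse( { 'one.js': () => 1, 'two.js': () => 2 }, true ) );
// Second partial load overlaps on two.js, which is ignored.
console.log( foo.applyResponse( { 'two.js': () => 2, 'three.js': () => 3 }, true ) );
// Full load: nothing new to add, but the module is now marked as loaded.
console.log( foo.applyResponse( { 'one.js': () => 1, 'two.js': () => 2, 'three.js': () => 3 }, false ) );
console.log( foo.fullyLoaded );
```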

Inefficiencies

This proposal has two main inefficiencies. First, and more important, when a module is loaded partially, all of its external dependencies are loaded too, even the ones that aren't needed for the files that are being loaded. This is difficult to avoid with this architecture, and it might be an issue if there are many unnecessary dependencies that are loaded this way or if the unnecessary dependencies are large. I don't think there's a good way of dealing with this other than splitting the module.

Second, if the same module is partially loaded twice, to satisfy different dependencies in different requests, some of its files could be downloaded twice. I don't think this will be much of an issue, because this is likely to be infrequent (the same module being partially loaded twice, on separate occasions, on the same page won't happen often) and the impact is likely to be low (few files double-loaded each time). If there is a large "core" part of the module that almost all files depend on, breaking that out into a separate module could address that.

Examples for where this could be used

  • OOUI icon packs: these consist of fully independent parts (individual icons) with no internal or external dependencies. All OOUI icons could be put in one big module, with each module using them specifying which exact icons it needs
  • OOUI itself: each widget could, in principle, be exposed separately
  • mediawiki.widgets.*: There is a mediawiki.widgets module with relatively unrelated widgets, and there are 16 more mediawiki.widgets.SomethingWidget modules (and 5 mediawiki.widgets.SomethingWidget.styles modules) that contain individual widgets. These could potentially be consolidated into one omnibus mediawiki.widgets module.

Open questions

  • Should modules have to opt into letting other modules depend on their files, or should it be allowed for all package modules? If a module allows other modules to load its files, should all its files be exposed, or only a limited list of files that it specifies?
  • How do we support CSS files? We'd need this for OOUI (widgets come with styles) and for icons (which are only CSS). In the code for .vue file support, we do have an internal content type script+style that allows CSS to be bundled with a (JS) package file, but we don't allow this to be used in the module definition (yet). This may be as simple as allowing .css (and .less) as a package file extension that maps to a script+style with an empty script part, then having JS files express internal dependencies on the CSS files they need.
  • Would we need to allow direct loading of individual files? Icon packs are often loaded directly through addModuleStyles(), as are the mediawiki.widgets.*.styles modules, and in some cases we may want to consolidate many modules into a single module but still be able to load parts of it directly through addModules().

Footnotes

  1. This means having to send another boolean flag along with every dependency. Note that, in the startup manifest, a module's dependencies are expressed as an array of numbers, where each number is an index into the modules array. One hacky way of encoding this boolean for each dependency could be to use the sign of this number: encode full dependencies as positive numbers (as they currently are) and partial dependencies as negative numbers. (This may require using 1-indexing, because -0 is difficult to work with in JS.)
  2. This could be done, for example, by listing the partially-requested modules in a separate parameter: &modules=quux|blah|bar|baz&partialModules=foo
  3. We'd need some way in the mw.implement() call to distinguish. Our current format for full package modules is mw.implement('foo', {main: "mainFile.js", "files": {"one.js":function...}}), so perhaps the format for a partial response could drop the "main" field, or replace it with partial: true.
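
The sign-based encoding suggested in footnote 1 could be sketched like this. The helper names encodeDependency and decodeDependency are hypothetical, and the module list is made up for the example; the only idea taken from the footnote is "1-based index, negative means partial".

```javascript
// Hypothetical encoding for footnote 1: store each dependency as a 1-based
// index into the modules array, negated when the dependency is partial.
function encodeDependency( index, isPartial ) {
    const oneBased = index + 1; // 1-indexing avoids having to represent -0
    return isPartial ? -oneBased : oneBased;
}

function decodeDependency( encoded ) {
    return { index: Math.abs( encoded ) - 1, isPartial: encoded < 0 };
}

// Example module list (positions are arbitrary for this sketch).
const moduleNames = [ 'foo', 'blah', 'quux' ];

// quux depends partially on foo (index 0) and fully on blah (index 1).
const quuxDependencies = [
    encodeDependency( 0, true ),
    encodeDependency( 1, false )
];
console.log( quuxDependencies );
// [ -1, 2 ]

for ( const encoded of quuxDependencies ) {
    const { index, isPartial } = decodeDependency( encoded );
    console.log( moduleNames[ index ], isPartial ? 'partial' : 'full' );
}
```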