Talk:Wikimedia Engineering Architecture Principles

It makes me sad that the architecture principles codify a business model that is not backed up by any sort of research or strategy. I think the most pressing question that needs to be answered is: What problem(s) does MediaWiki solve, and for whom does it solve them? A lot of the principles attempt to answer this question, which is not a question that engineers could possibly have the answer to, and business people shouldn't attempt to answer it without a significant amount of customer development.
I thought I read the whole thing but could you point out where it codifies a business model for me? I'm dense I guess.
Mostly the points under: "To provide a web application that can be freely used to collaboratively collect and share knowledge," which codifies who our "customer" is. It attempts to answer the question I asked above without actually having performed the customer development needed in order to answer it.
Obviously, providing MediaWiki to "be freely used to collaboratively collect and share knowledge" is a noble purpose that is consistent with our mission, and is a valuable business strategy; however, I think it's important to know what problem(s) MediaWiki solves and for whom it solves them. Without knowing that, how could we know whether MediaWiki is actually solving our "customers'" problem(s)? If it does, is it the complete solution? What other products and services do they need to make it work? Are we going to provide these products and services (for a fee?) or are we going to partner with others in order to provide them?
Releasing MediaWiki as a software product for use by third parties has been part of WMF's "business plan" for as long as it has existed. Re-assessing the value of this for our mission would certainly be good, but as long as there is no decision to drop the goal of releasing MediaWiki for third parties, it is a goal, and it should be present in the principles.
When the product goals and requirements change, the architecture principles change. The principles apply the product goals to the technical realm. As such, they indeed codify product goals. But they do not dictate them.
Anyway, let's please continue this on phab:T220657#5135741
"Extensible and Sustainable" declares the rights, but not the obligations. Technical Collaboration Guidance/Principles doesn't say anything about that either. Forgive me if it is out of scope, but think is an important theme that is not (properly) mentioned.
These are not principles of architecture, so they are out of scope. The closest we have here is the FAST/CHANGE point. I agree we should have guidelines for planning processes and strategic decision making, but that's not what this document is about.
Let's please keep the discussion on phabricator, though: phab:T220657. If you want to discuss this further, please copy our conversation there.
What's the relationship with Architecture guidelines? Should that be archived now? Or interlinked with this document?
You are right that the Architecture guidelines should be mentioned and the relationship clarified. The guidelines are more concrete, and mostly about how to make changes to MediaWiki core. They are less about what we are trying to achieve. So I think they are much closer to what I have been writing on User:DKinzler (WMF)/Software Design Practices than to the principles. I'll look into consolidating these (as well as Manual:Coding_conventions/PHP).
I have added a link from the guidelines here. I'll link back to them from the non-normative section of the principles.
Perhaps more goals should be made into strong requirements (MUST), e.g. the principle of "data austerity", to collect and retain only the data we actually need.
While we should be careful not to create hard requirements that we cannot always meet, which would lead to such requirements not being taken seriously, we shouldn't be "doubly soft": we can, for instance, say that horizontal scalability MUST be a design goal for services with high load; it being a goal does not mean the software cannot be deployed if the goal is not fully met.
Currently the document uses:
- SHOULD - 49 times
- MUST - 8 times
- MAY - 1 time
I just read the whole thing for the first time, and honestly it felt like a passive aggressive attack from someone giving drive-by code review.
Even the document itself only has a SHOULD endorsement from the Wikimedia Technical Committee.
Agreed. This struck me right away when reading the document. As a reflection of current reality, it seems pretty accurate, but if it's meant to be prescriptive (as I assume is the case), I'd like to see stronger stands, or at least some written justification of why SHOULDs are SHOULDs and not MUSTs.
Personally, I consider at least the following current SHOULDs to be MUSTs:
- All points under the heading "To ensure the data integrity of the content on WMF systems, and protect the privacy of our users"
- software that interacts with users MUST be designed to make key functionality available on devices with a variety of capability and restrictions [I'd also add form factors], as well as potentially limited connectivity
- software that interacts with users MUST follow accessibility guidelines
- data formats and APIs that provide access to user generated content MUST be designed to ensure verifiability through the integration of provenance information
- data formats and APIs that provide access to user generated content MUST be designed to provide easy access to all necessary licensing information
It would be nice to change "comprehensive documentation SHOULD be maintained along with the code" to MUST. I don't think that would be controversial. I don't agree with MHolloway about making "follow accessibility guidelines" a MUST. There are rare cases where other considerations (including accessibility itself) override accessibility guidelines.
What does 'MUST' mean in this case? Is there teeth to it?
> What does 'MUST' mean in this case? Is there teeth to it?
The "teeth" depend on the people enforcing this. At the very minimum, RFCs that violate a MUST will not be approved. Ideally, no code that violates a MUST is deployed.
If we are serious about the MUST, any code that is currently live but violates a MUST would have to be pulled. If we made everything suggested in this thread a MUST and pulled everything that doesn't comply, we'd have to shut down the site tomorrow. That's actually the reason for having a lot of SHOULD and not that many MUSTs.
Maybe it makes more sense to go with a softer interpretation on MUST, that essentially only applies it to new code and major changes and rewrites. If we interpret it that way, we can have a lot more MUSTs. Does that sound good?
I think it would be better to write the standard that the working group wants even if there are parts that are aspirational. A list of "grandfathered" applications with known violations could be offered as an appendix if needed and be footnoted into the standard when appropriate.
Uppercase MUST invokes RFC 2119 (MUST: This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.) to anyone familiar with it. I'd rather use it less often than water it down. We can make allowances for legacy code that's already in production, but for new code or changes to existing code, MUST should really mean that anything violating it will not be merged or deployed, ever, even if there's a deadline or a grant or a mob of editors with pitchforks at the WMF office entrance or whatever.
Also, maybe it would be a worthwhile exercise to list what existing practice (ie. not legacy code but development practices we follow today) violate any of the MUSTs? E.g. "data we offer for re-use MUST use clearly specified data schemas" is probably not true for most things (wikitext? Action API response formats? file metadata? ...I guess it comes down to what exactly is meant by data schema).
I'm happy with more MUST if we are really serious about enforcing it, and nobody comes crying once we do...
I changed a number of MUSTs to SHOULDs now. The document now has 43 SHOULDs and 21 MUSTs.
The document now says "When existing code is discovered to violate a MUST or SHOULD principle, steps for making the code compliant with the architecture principles need to be planned." which is ambitious, but maybe that's a good thing :) Any plans on how this should work in practice? If I find such code, where do I report it, who is responsible for planning the steps, who is responsible for actually making it happen?
The concrete steps necessary will be very different from case to case. Some such changes only take a 20 minute patch, some may need major refactoring or changes in infrastructure.
The only general answer I can give is "track it on phabricator, so it becomes visible".
As to who is responsible - I'd say either the person or group who wrote the offending code (mostly for newer code), or the group who owns it (mostly for older code).
Also comprehensive documentation is now a MUST. While most other things can be enforced in the planning or code review phase, documentation normally only happens when the code is live (at which point code authors often lose interest). How do we ensure that it does not get forgotten?
Documentation of architecture and information flow should ideally be written before the code (as specifications, plans, RFCs, etc). They should be required to be merged into the repo along with the code, just like tests, and just like method-level and class-level documentation. Documentation (and testing) should not be an afterthought.
Documentation for end-users and wiki-owners will generally not live in the same repo as the code, since it's less "bound" to the code, but it should, ideally, also be written *before* the code, as user stories, UI designs, etc. Turning the plan into proper documentation may happen after the fact, but then it's the responsibility of the team who deployed the feature to make it happen.
In my mind though, the architecture principles don't really apply to documentation for end-users and wiki-owners. Combined with the lead sentence of the section, the principle reads: "To maintain a code base that can be modified with confidence and readily understood, comprehensive documentation MUST be maintained along with the code." I think it's pretty clear that this refers to documentation of the code. Do you think this should be made more explicit?
We’ve consulted with the engineering units in the Audiences department at the Wikimedia Foundation, and the following are our recommendations. We generally agree with the sentiment of the document, although we want to express our strong support for a heightened emphasis on security and user privacy, as well as our consensus view on re-use and contemporary deployability.
-Audiences Engineering leads: Runa Bhattacharjee, Ryan Kaldari, Adam Baso
Before we dig into other items, first, the phrase “MediaWiki Platform Architecture Principles” should be changed to “Wikimedia Engineering Architecture Principles”.
Major Considerations
The framing of the document should make clear that the goal is not to stop all software development, but instead these principles describe the sort of architecture we’d like in the future. We suggest that the “Application” section be amended to note that investment in improving engineering sustainability should be in a healthy balance with investment in feature development.
The following three items should be changed to MUSTs:
- our software and infrastructure SHOULD be designed in such a way to prevent unauthorized access to sensitive information, and to minimize the impact of individual components getting compromised.
- resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
- our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.
The requirement “software components SHOULD be designed to be reusable, and be published for re-use” should be changed to “software components with broad applicability MUST be designed and published for re-use; software components limited to Wikimedia project-specific use MAY be designed without the need for re-use but MUST be published for auditability”.
The requirements “the MediaWiki stack SHOULD be easy to deploy on standard hosting platforms” and “small MediaWiki instances SHOULD function in low-budget hosting environments” should be amended to reflect the Wikimedia Technical Conference 2018 decision about shared hosting. Additionally, they should be amended to ensure that for each component the target audience and platform should be specified.
We’re unsure how this should be worded, but we believe that observability and analytic instrumentation should always be considered for Wikimedia project components. Not all new or changing components will require observability and analytic instrumentation, but there ought to be a pause to consider this.
The phrase “as well as potentially limited connectivity” should be changed to “with tradeoffs explicitly considered in design for mobile form factors and connectivity.”
Terminology Updates
“scripting languages” should be changed to “programming languages”.
The term “domain model” should be clarified where used.
“(annotated HTML)” should be changed to “(e.g., annotated HTML)”.
Follow Up Actions
As a follow-on after ratification of the principles: there is a desire for more concrete examples, such as standards on “high granularity” and versioning of web APIs, defined test coverage targets, and guidance on handling existing/discovered technical debt. There is some consideration of these specific examples in Foundation planning, although more concrete examples and some uniformity would be welcome.
I have implemented several changes according to the feedback above:
Before we dig into other items, first, the phrase “MediaWiki Platform Architecture Principles” should be changed to “Wikimedia Engineering Architecture Principles”.
Done.
- our software and infrastructure SHOULD be designed in such a way to prevent unauthorized access to sensitive information, and to minimize the impact of individual components getting compromised.
- resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
- our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.
Done
The requirement “software components SHOULD be designed to be reusable, and be published for re-use” should be changed to “software components with broad applicability MUST be designed and published for re-use; software components limited to Wikimedia project-specific use MAY be designed without the need for re-use but MUST be published for auditability”.
Done somewhat differently
The phrase “as well as potentially limited connectivity” should be changed to “with tradeoffs explicitly considered in design for mobile form factors and connectivity.”
Done
“scripting languages” should be changed to “programming languages”.
Done. "community with ways to develop workflows using scripting languages" was written with an eye to Scribunto and Gadgets, but I think there is no harm in a broader phrasing. The original wording contained the phrase "on-wiki", but that is gone now anyway.
The term “domain model” should be clarified where used.
Done. Are there any other terms that need clarification, or should be linked to a wikipedia article?
“(annotated HTML)” should be changed to “(e.g., annotated HTML)”.
Done as well.
Thanks. Seeing as you asked about terms needing clarification, here are some more:
- "APIs and libraries" might at present read incorrectly to not include "services". This is always a difficult nomenclature problem, as an API often means the API interface at the class level as well as network exposed API, but services may or may not be network exposed. I usually throw my hands up at this point and use the term "component".
- "through the integration of provenance information" could use an illuminating for-example.
- the term "standard hosting platform" should be disambiguated.
"APIs and libraries" might at present read incorrectly to not include "services".
Well, it includes the APIs of services. The kind of community maintained code we are talking about here includes Gadgets, Lua modules, extensions, and bots. All of these use APIs, and it would be nice if we could supply them with libraries. I will add "services", but it seems redundant - and may be taken to include service objects, as opposed to web-exposed services.
"through the integration of provenance information"
What this really means is "when exposing parts of a wiki page via an API, also expose the relevant citations". But that seems too concrete to include in the policy. Also, re-reading this, it seems like MUST is too strong here. This is rather hard to do. A MUST would block any new feature that doesn't do this.
"standard hosting platform"
This was intended to be future-compatible. It currently means "vanilla LAMP stack with no shell access and no admin rights". But if node.js support becomes standard in such environments, the policy should allow us to make use of that without having to amend it.
Re this:
We’re unsure how this should be worded, but we believe that observability and analytic instrumentation should always be considered for Wikimedia project components. Not all new or changing components will require observability and analytic instrumentation, but there ought to be a pause to consider this.
I now added:
observability and analytic instrumentation SHOULD be explicitly considered in the design of new components and services.
Does that sound good?
That works. I think the translation here is that for new things there should be a solid reason for not considering it.
Thanks for the feedback! This all sounds pretty reasonable. I'll probably get around to incorporating this and some of the other feedback next week. I'll let you know once that is done, and we can discuss whether the changes I made seem sufficient to you.
Thanks!
You wrote:
The requirements “the MediaWiki stack SHOULD be easy to deploy on standard hosting platforms” and “small MediaWiki instances SHOULD function in low-budget hosting environments” should be amended to reflect the Wikimedia Technical Conference 2018 decision about shared hosting. Additionally, they should be amended to ensure that for each component the target audience and platform should be specified.
I now added:
for every component and feature, the intended target audience and supported target platform MUST be clearly defined.
However, I'm unsure how to incorporate the decision made at TechConf. It reads:
If we commit to an easy-to-use tool for MW platform installation, configuration, and maintenance, then we can drop support of "one-click installs" on shared hosting environments; A special interest group is necessary to further these goals and facilitate implementation (see Wikimedia Technical Conference/2018/Session notes/Choosing installation methods and environments for 3rd party users#Decisions)
The "if" part has not happened, there is no such commitment, and no such special interest group exists. I would be very happy to see this happening, but until then, the policy should document the status quo: MediaWiki has to run on shared hosting.
IMO the current wording is generic enough to incorporate that - if we provided, say, easy-to-use docker containers with a long-term support commitment, that would be a stack that's easy to deploy on standard low-budget hosting platforms (cloud providers being reasonably standard and low-budget these days).
The one thing I'd maybe change is "SHOULD be easy to deploy and maintain" as containers often tend to be easier to deploy than to operate over an extended period of time and sufficient thought is not always given to how they can be kept up-to-date with OS security updates etc.
Ease of maintenance is, I think, covered by the subsequent bullet points:
- it SHOULD be possible to install and upgrade MediaWiki without much technical knowledge.
- it MUST be possible to upgrade MediaWiki without the risk of losing content or disrupting operation.
Some things maybe worth mentioning:
- interfaces should be written with ease of extensibility without B/C breaks in mind. (E.g. use an options array instead of a list of arguments, return an associative array instead of a scalar.)
- APIs (or published data, more generally) should support the use case of history reconstruction, to the extent it is feasible. (E.g. think of all the reasons creating the edit dataset was hard.) The same generic principle is relevant to other use cases as well (e.g. page_props being hard to match to revisions), not sure how to articulate it. Data that gets exposed should always be versioned? (Not in the "schema version" sense, but individually.)
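The options-array idea from the first bullet could be sketched like this (a minimal illustration with hypothetical names, in Python rather than PHP for brevity): because callers pass options by name, new options can be added later without breaking existing call sites.

```python
# Hypothetical sketch: accepting an options dict lets new options be added
# later without breaking existing callers, unlike a growing positional
# argument list.

def render_page(title, options=None):
    """Render a page. New rendering options (e.g. a future "dark_mode")
    can be added to `options` without changing the function signature."""
    opts = {"skin": "vector", "lang": "en"}  # defaults for all known options
    if options:
        opts.update(options)                 # caller overrides only what it needs
    return f"<html data-skin='{opts['skin']}' lang='{opts['lang']}'>{title}</html>"

# Old callers keep working unchanged when new options are introduced:
print(render_page("Main Page"))
print(render_page("Main Page", {"lang": "de"}))
```

The same applies to return values: returning an associative structure rather than a scalar means new fields can be added without breaking consumers.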
Re "ease of extensibility": We already have "our software architecture SHOULD be modular, with components exposing narrow interfaces, to allow components to be replaced and refactored while maintaining a stable interface towards other components as well as 3rd party extensions." Maybe we can add "Extension points MUST be clearly documented as such, and SHOULD be designed in a way that allows them to remain stable over time".
Re "history construction": I know what you mean, but I see no good way of phrasing this as a principle. Maybe "APIs SHOULD provide mechanisms that allow consistent data sets to be retrieved with multiple requests"? This would certainly be nice, but hard to do in general. This seems really aspirational...
It is not completely clear to me what scope of software engineering activities this document is intended to apply to. From the title I can infer that MediaWiki is in scope, but "other Wikimedia engineering endeavors" is really, really vague.
Does it apply to:
- Puppet code used to automate Wikimedia servers?
- Non-MediaWiki user facing web applications (for example https://scholarships.wikimedia.org)?
- Indirectly user facing services (for example the dynamicproxy deployments used by Cloud Services)?
- Non-MediaWiki cli tools developed by Wikimedia (for example scap)?
- Gadget code deployed to Wikimedia wikis?
- Lua code deployed to Wikimedia wikis?
Thanks Bryan -- that was one of my concerns as well. Similarly, it talks about "software that interacts with users", but it's unclear to me what that is envisioned to be. Is it wiki users, third-party MediaWiki users (so e.g. sysadmins), users of third-party MediaWiki installs, API consumers, development communities, Cloud Services users, etc.?
"software that interacts with users" should perhaps be clarified by adding "though a graphical user interface".
The idea is that this document should govern all engineering decisions, including all the areas mentioned by Bryan - but of course, not all principles are applicable in all contexts. dynamicproxy doesn't need internationalization, and puppet code doesn't need to be re-usable as a library.
Are developers considered users in this context? For example, should API sandboxes support i18n? Or things like Quarry? If the answer is yes I'm not sure a MUST requirement is realistic (although we should certainly strive for it more than we do now).
Yes, developers should be considered users, and the UIs of developer tools should be localized. The Special:ApiSandbox is already fully localized (yay!). Quarry should be, at least if it's an official WMF tool. For tools written by volunteers, the principles can of course not be enforced, but following them should still be encouraged.
@DKinzler (WMF) What is an "abstract domain model"? Would it be possible to use less jargony language here?
I came here to say something similar; in general, one useful addition to many of these principles would be specific examples where SHOULD/MUST have been followed correctly and places where we could improve.
I think I understand what is meant by "APIs geared towards a specific user interface MUST be considered part of the component that implements that user interface, and MAY be considered private to that component" but I'm not sure what specific instances/examples (if any) this statement was written in response to.
> What is an "abstract domain model"?
For example, "title", "page", and "revision" are entities in mediawiki's abstract domain model. APIs should be built around such concepts. In contrast, and API that exposes internals, like the database schema, should be avoided. APIs that cater to a specific client are OK (e.g. CategoryTree has an API foe returning rendered sub-trees), but should not be used by anything else (they are "private to that component").
I'm not sure how to rephrase these points to make them clearer. We can add examples, I just fear that it will clutter the page too much. Suggestions?
So should that say "SHOULD be considered private to that component" instead?
> I'm not sure how to rephrase these points to make them clearer.
Link to Wikipedia articles on the first use of each "term of art" in the doc? If you are using terms of art that are so obscure to not be covered at least as a sub-topic on an enwiki article then that might be a good guide on rewording. Unless of course the term of art is purely of local origin; in that case I would hope there is a mw.o or wikitech page you could link to for clarification.
Ok, I'll add links.
For the case at hand, see domain model.
A separation into three "layers" seems to be industry standard: storage, process, and presentation (note that this is conceptually analogous to, but different from, the MVC pattern used in user interfaces).
This separation of concerns would help to allow the implementation of different interactions/flows and different representations for different users, use cases, and devices. It's not the only way to achieve this, but it seems to be an obvious win. Should it thus be part of the architecture principles?
Suggested wording: The software stack SHOULD be separated into three layers: storage/persistence, processing/application, and presentation/interaction.
I don't think this is useful without further clarification. For example, MediaWiki has a database abstraction layer; is that the same as the storage layer, or is a storage layer expected to encompass all knowledge of DB structure? What about caching and cookies (technically all forms of persistence but often awkward to separate from processing logic)? What is "interaction", does that include processing request data? How do these apply to an application where a significant part of the logic is frontend code and "interaction" has very different meaning from the request/response based web applications?
In my mind, the storage layer should abstract all knowledge of what technology is used for storage, and what schemas we use in these technologies. Ideally, application logic is completely isolated from that. Cookies should be handled by the Request/Response layer ("presentation" layer is misleading, since it also deals with request parameters, input validation, and user sessions). Caching can happen in all layers. The storage infrastructure behind caching and session should be encapsulated in the storage layer.
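A toy sketch of that split (all names hypothetical, Python for illustration), with storage details isolated from the application logic, and request handling kept out of both:

```python
# Hypothetical sketch of the three-layer split discussed above.

class PageStore:                      # storage layer: hides the persistence technology
    def __init__(self):
        self._rows = {}               # stand-in for a real database
    def save(self, title, text):
        self._rows[title] = text
    def load(self, title):
        return self._rows.get(title)

class PageService:                    # application layer: domain logic only
    def __init__(self, store):
        self._store = store
    def edit(self, title, text):
        if not title:
            raise ValueError("title required")
        self._store.save(title, text)
    def view(self, title):
        return self._store.load(title) or "(page does not exist)"

def handle_request(service, params):  # request/response layer: I/O concerns
    return {"html": f"<p>{service.view(params['title'])}</p>"}

store = PageStore()
service = PageService(store)
service.edit("Main Page", "Hello")
print(handle_request(service, {"title": "Main Page"}))
```

Swapping `PageStore` for a different backend would touch neither the service nor the request handler, which is the point of the isolation described above.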
So yea... this seems too specific for "architecture principles". These are more "architecture techniques" or something...
Maybe you could articulate the reasons for having separate storage etc. layers, and try to turn those reasons into principles. Is it to minimize the amount of code that depends on technology choices (such as storage engine)? To minimize the amount of code that deals with data in an uncertain format (user input, old data from storage)? To separate code that is expected to change often from that which isn't? To enable certain types of site customizations?
yes ;)
Are the MUST/MAY/SHOULD keywords intended to be interpreted according to https://www.ietf.org/rfc/rfc2119.txt? If so, this should be explicitly stated in the document. If not, the intended meanings MUST be defined in the document or a prominently linked definition document.
Ideally yes, but see my comment about the interpretation of MUST in the "More MUST, less SHOULD" thread.