Technical decision making/Decision records/Authority
Decision made in February of 2021.
What are your constraints?
[edit]Many times we have implicit constraints, based on time, resources, performance, security and other aspects. This is a place to make them explicit and share with your team and stakeholders.
General Assumptions and Requirements
Use a new line for each assumption or requirement which you are using to constrain your proposed solutions. |
Source
The person or document that this requirement comes from. |
Decoupling code from global state is valuable | Established best practice |
Encapsulation and separation of concerns is valuable | Established best practice |
Migration to the new system needs to follow the Stable Interface Policy | Stable Interface Policy |
Disruption of existing code should be minimized, migration should be straight forward | PET |
Security Requirements
Describe any security limitations or constraints for the proposed solutions. |
|
Permission checks performed via the new system should be functionality equivalent to checks made through the old system. The semantics of individual permissions remains unchanged. | PET |
The chance of mistakenly reducing rigor of permission checks should be minimized. | PET |
Privacy Requirements
Describe potential privacy limitations or constraints for the proposed solutions and how they will be mitigated. |
n/a |
Important Questions
[edit]Fill out any questions you should answer or follow up on before making a decision. Record those answers here.
Question | Who can answer? | Resolution, answer or action |
Does the proposed design cover existing and anticipated use cases that stakeholders have? | Engineers writing code that performs permission checks in PHP. | All use cases covered. See the consultation section for details |
Is the proposed design easy to use in the context of stakeholder’s use cases? Or does it add complexity? | Engineers writing code that performs permission checks in PHP. | No concerns raised. Use cases are expected to benefit from reduced complexity. |
Decision
[edit]The table captures your decision as well as important information documentation, context, and support materials. Fill this in after you list out options on the following pages.
Selected Option | Option 2 (see below) |
Rationale | Previously considered options did not provide the desired encapsulation:
|
Data | |
Informing |
|
Who | Daniel Kinzler (PET) |
Date | February 18 2021 |
Technical Forum Chair review
[edit]This is to be filled out by the Technical Forum Chairs after the decision is made. These questions are designed to help the chairs give feedback to the decision team to help them make an informed decision.
Are the options detailed enough to make an informed decision? | |
Were all parties identified in the RACI consulted? | |
Does this decision require C-level review? Why or why not? |
Product Chair | The name of the Product Department Chair reviewing |
Technology Chair | The name of the Technology Department Chair reviewing |
Date | When was this presented in the Technical Forum |
What are your options?
[edit]On the following pages, list at least three options to address your problem. Try to keep the description clear and use language that can be easily understood. The first option should always be the “nothing” option. This helps you properly evaluate the pros/cons/costs of the status quo as you evaluate potential solutions. If you need to dig deeper into options, technology, etc… link to them in the references.
Option 1: Do Nothing | |
Description | Keep using PermissionManager together with User to perform permission checks. Keep rate limiting separate. |
Benefits | No direct engineering cost. |
Risks | Global state makes the code hard to test. Use of stateful “monster” objects make the code unpredictable and hard to understand. |
Effort | none |
Costs | none |
Testing | Test coverage of existing code is incomplete. |
Performance & Scaling | The current solution has been optimized to avoid performance impact on high traffic code paths. It is in production now. |
Deployment | n/a |
Rollback and reversibility | n/a |
Operations & Monitoring | n/a |
Additional References | n/a |
Consultations | |
Timo | In 2019, Timo said that not only should we not rely on global state to access the current web request: information from the web request should not be available in service wiring either (see T218555#5438529). Instead, they should generally be passed from the entry point as a parameter. This led us to dismiss an earlier proposal to keep building on PermissionManager, and further to the proposal of ActingUser (option 3) and later Authority (option 2). |
Product Infra | In 2018, Gergö identified a number of problems with the existing system in T212639. This greatly influenced the design of Authority (option 2). |
AHT | When AHT was working on PermissionManager in 2018, David Barratt identified problems regarding the way the block system interacts with the permission system in T212311. This influenced the design of Authority (option 2). |
Option 2: Introduce Authority | |
Description |
|
Benefits |
|
Risks | As with any refactoring, there is a risk of oversights and misunderstandings. We tried to reduce this risk by keeping the new interface as close to the existing interface as possible, allowing it to act as a drop-in replacement under most circumstances. |
Effort | The cost for putting the new system into place has already been covered in the experimental spike. It turned out to be about four weeks of work for two FTE.
The cost for converting most of core and extensions maintained by WMF to the new approach will likely be roughly that much again. The cost of converting the long tail of third party extensions is hard to estimate. It should be mentioned that these conversions are typically trivial changes. |
Costs | There is no direct cost beyond the engineering effort. |
Testing | All new code added will have full test coverage.
Test coverage of code affected by this test varies greatly. The most crucial functionality is covered by end-to-end tests. |
Performance & Scaling | The refactoring is designed to retain the scaling characteristics of the existing implementation. |
Deployment | The code changes will be deployed iteratively with the weekly deployment “train”. |
Rollback and reversibility | As with any refactoring, this is reversible, but the cost of reversing is relative to the number of places in the code that bind to the new interface. For Authority, that number is rather high.
To mitigate the risk of having to roll back, we implemented the new functionality and changed a limited scope of code to use it, to verify the design and gather experience. Converting more parts of the code base will be done in an iterative manner, allowing us to roll back each step individually. |
Operations & Monitoring | There are no additional operational requirements.
The monitoring in place for the existing system will cover the new system as well. |
Additional References | T231930
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/655438 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/654965 |
Consultations | |
Wikidata | One concern addressed: MCR slots are not modeled as separate “targets” of actions, targets are always pages; instead, operations on some slots may be modeled as special actions requiring additional permissions. |
Performance | No concerns raised regarding performance. Timo expressed a preference for this option over the others presented here (PermissionManager and ActingUser) |
Security | Conceptually, the Security Team does not have any issues with the proposed Authority system and believes it should make many of the convoluted permission and security checks within MediaWiki more sane and sustainable. One potential issue that the Security Team sees somewhat often is developers simply not knowing how or when to check various rights, etc. within MediaWiki but that is more of a social/documentation issue unrelated to Authority. |
Multimedia | Same question about MCR slots as from the Wikidata team. |
Tech Engagement | Birgit asked whether this would have any impact on people writing user scripts, gadgets, or bots. It does not.
It does have an impact on third party extension developers. |
Abstract Wikipedia | This change is a good one, and we support it. We have a minor concern of potentially wanting more fine-grained targets for actions in the future (more than at the page or MCR slot level), but this can be a later iteration of the new system. |
Option 3: Introduce ActingUser | |
Description |
|
Benefits |
|
Risks |
|
Effort |
|
Costs | There is no direct cost beyond the engineering effort. |
Testing | All new code added will have full test coverage.
Test coverage of code affected by this test varies greatly. The most crucial functionality is covered by end-to-end tests. |
Performance & Scaling | The refactoring is designed to retain the scaling characteristics of the existing implementation. |
Deployment | The code changes will be deployed iteratively with the weekly deployment “train”. |
Rollback and reversibility | Since only a limited number of classes are affected, this change is easy to undo. |
Operations & Monitoring | There are no additional operational requirements.
The monitoring in place for the existing system will cover the new system as well. |
Additional References | T231930 has a section titled “Original Proposal (ActingUser)”, and some discussion about this proposal from 2019. |
Consultations | |
Phab discussion | Starting with the comment at T231930#5462972 in 2019, several collaborators analyzed and discussed the proposed solution (ActingUser), identified weaknesses, and identified improvements. This discussion led to the proposal of Authority (option 2). The collaborators involved were Brad, Daniel, and Simetrical (PET); David Barratt (AHT), Gergö (Product Infra), and Máté (Fandom). |
Resource:
https://www.atlassian.com/blog/inside-atlassian/make-team-decisions-without-killing-momentum