Core Platform Team/Initiatives/Image Suggestion API
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. The Core Platform Team and its initiatives no longer exist. Since 2023, see MediaWiki Engineering Group instead.
Image Suggestion API
Epics, User Stories, and Requirements
Image Recommendation API (Proof of Concept)
List Unillustrated Articles and their image suggestions
- As a developer, when I make a request to the Image Recommendation API,
- I expect to see a list of unillustrated articles and their image suggestions
- The list should contain at most 10 images per page requested
- Of those 10 images, at most 3 should come from ImageMatchingAlgorithm and the remaining 7-10 should come from MediaSearch
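The composition rule in this story (at most 10 images per page, at most 3 of them from ImageMatchingAlgorithm, the rest from MediaSearch) could be sketched as follows. This is a minimal illustration, not the team's implementation: the `Suggestion` type, the `composeSuggestions` function and the constant names are hypothetical, both input lists are assumed to be ranked best-first, and skipping images returned by both sources is an added assumption.

```typescript
// Hypothetical types and names, for illustration only.
type Source = "ImageMatchingAlgorithm" | "MediaSearch";

interface Suggestion {
  filename: string;
  source: Source;
}

const MAX_PER_PAGE = 10;      // at most 10 images per page
const MAX_FROM_ALGORITHM = 3; // at most 3 from ImageMatchingAlgorithm

// Takes up to 3 algorithm results, then fills the remaining slots from
// MediaSearch. Both inputs are assumed to be ranked best-first.
function composeSuggestions(
  fromAlgorithm: string[],
  fromMediaSearch: string[],
): Suggestion[] {
  const picked: Suggestion[] = fromAlgorithm
    .slice(0, MAX_FROM_ALGORITHM)
    .map((filename): Suggestion => ({ filename, source: "ImageMatchingAlgorithm" }));

  for (const filename of fromMediaSearch) {
    if (picked.length >= MAX_PER_PAGE) break;
    // Assumption: an image returned by both sources is listed only once.
    if (picked.some((s) => s.filename === filename)) continue;
    picked.push({ filename, source: "MediaSearch" });
  }
  return picked;
}
```

When MediaSearch returns fewer images than there are free slots, the result is simply shorter than 10, which is consistent with "at most 10".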
List Image Recommendations for all Wikipedia languages
- As a developer, when I make a request to the Image Recommendation API with a page title,
- I expect to be able to make requests for all Wikipedia projects in any language
- e.g. Arabic, Cebuano, English and Vietnamese Wikipedia
Provide the Image Source and Confidence Rating of an Image
- As a developer, when I receive a list of images
- I expect to know the source of each recommendation
- e.g. I see the image recommendation for the Frog page is from "Commons"
- I expect to know the confidence rating for each image recommended per page requested
- e.g. I see that the image for "Amazonian Tree Frog.jpg" has a confidence rating of "high"
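One possible shape for a response that carries the source and confidence rating of each image is sketched below. The field names, the `atLeast` helper and the "low" and "medium" rating levels are assumptions made for illustration (this page only mentions "high"), and the second sample image is invented.

```typescript
// Hypothetical response shape; not a published API spec.
type ConfidenceRating = "low" | "medium" | "high";

interface ImageSuggestion {
  filename: string;                   // e.g. "Amazonian Tree Frog.jpg"
  source: string;                     // e.g. "Commons"
  confidenceRating: ConfidenceRating; // e.g. "high"
}

interface PageSuggestions {
  page: string;                       // article title, e.g. "Frog"
  suggestions: ImageSuggestion[];
}

// Ratings ordered from weakest to strongest, used for comparisons.
const RATING_ORDER: ConfidenceRating[] = ["low", "medium", "high"];

// Returns the suggestions whose rating is at or above `minimum`.
function atLeast(
  page: PageSuggestions,
  minimum: ConfidenceRating,
): ImageSuggestion[] {
  const floor = RATING_ORDER.indexOf(minimum);
  return page.suggestions.filter(
    (s) => RATING_ORDER.indexOf(s.confidenceRating) >= floor,
  );
}

// Sample data: the first entry mirrors the example in the story,
// the second is invented to show the filter at work.
const frog: PageSuggestions = {
  page: "Frog",
  suggestions: [
    { filename: "Amazonian Tree Frog.jpg", source: "Commons", confidenceRating: "high" },
    { filename: "Example pond.jpg", source: "Commons", confidenceRating: "low" },
  ],
};

const highConfidence = atLeast(frog, "high");
```

A client that only wants strong matches could call `atLeast(frog, "high")`, while a review tool might accept every rating.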
Filter # of Image Recommendations Per Article Request
- As a developer, when I provide a parameter to limit the number of image recommendations per page
- I expect to get somewhere between 1 and 10 images recommended per page requested
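Handling of the limit parameter could look like the sketch below. The function name `parseLimit`, the default of 10 and the choice to clamp out-of-range values (rather than reject the request) are assumptions; the page only states that the result should be between 1 and 10 images per page.

```typescript
const MIN_LIMIT = 1;  // lower bound from the story above
const MAX_LIMIT = 10; // upper bound from the story above

// Parses a raw query-string value into a limit between 1 and 10.
// Missing or non-numeric input falls back to `fallback`.
function parseLimit(raw: string | undefined, fallback: number = MAX_LIMIT): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();
  // Accept plain non-negative integers only.
  if (!/^\d+$/.test(trimmed)) return fallback;
  const value = parseInt(trimmed, 10);
  return Math.min(MAX_LIMIT, Math.max(MIN_LIMIT, value));
}
```

Under these assumptions, `parseLimit("3")` yields 3, `parseLimit("50")` is clamped to 10, and `parseLimit("0")` is raised to 1. A stricter service might instead answer out-of-range values with an error response.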
Non-Functional Requirements
- Authorization/Authentication
- Performance Metrics
- As a member of the Performance Team, I want the Image Recommendation API Response time to be less than or equal to 250ms RTT (not including network latency)
- Uptime
- Average and Max Latency
- Errors Per Minute
- API Product Metrics
- API Usage
- Unique API Customers
- Data metrics
- As a member of the Platform Team, I want the Image Recommendation data pipeline to respect system and data quality SLOs.
- System
- Spark sinks (in/out records, CPU usage, memory usage, executor counts)
- Datasets
- Summary of population statistics (purpose: identify regressions, population/model drift, anomaly detection)
- Size and counts of intermediate and final datasets (purpose: identify regressions)
- ML Metrics
- Accuracy by
- Method (ImageMatchingAlgorithm, MediaSearch)
- Sources (Wikidata, Commons, etc.)
- Recommendations resulting in
- Rejections
- Applied Edits
- Skips
- Documentation
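The average and max latency metrics, together with the 250 ms response-time target, could be checked with a small helper like the one below. It is a sketch under assumptions: `summarizeLatency` and `LatencySummary` are hypothetical names, samples are server-side response times in milliseconds, and the target is applied to the slowest sample because the requirement does not say whether it refers to an average, a percentile or a maximum.

```typescript
const MAX_RESPONSE_MS = 250; // target from the performance requirement above

interface LatencySummary {
  count: number;
  averageMs: number;
  maxMs: number;
  withinTarget: boolean; // assumption: true when no sample exceeds the target
}

// Summarizes server-side response times (network latency excluded).
function summarizeLatency(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("at least one latency sample is required");
  }
  let total = 0;
  let max = -Infinity;
  for (const ms of samplesMs) {
    total += ms;
    if (ms > max) max = ms;
  }
  return {
    count: samplesMs.length,
    averageMs: total / samplesMs.length,
    maxMs: max,
    withinTarget: max <= MAX_RESPONSE_MS,
  };
}
```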
Time and Resource Estimates
- Estimated Start Date
None given
- Actual Start Date
None given
- Estimated Completion Date
March 3, 2021
- Actual Completion Date
None given
- Resource Estimates
None given
- Collaborators
None given
Open Questions
Project Organisation
- What are our definitions of done and success criteria? How are these broken down per component and aligned across teams?
- Are we one project team, or two (backend/API) that work together?
- 1 team with 2 concerns: Image Suggestion API & Data Pipeline
- How would we like to communicate with other teams?
- Do we have points of contact for the support teams?
- Do we need a RACI for the project?
- Are we missing any resources?
- No
Timelines and Scope
- Are there critical intermediate deadlines for other teams that we should be aware of?
- What is the timeline for the various parts of the project?
- Android: MVP Release by March 3
- Are there any teams we can decouple dependency from?
- What can we, the Platform Team, stop caring about? (out of scope)
- Are the expectations clear and realistic?
- Can we deliver within the timeline?
- How do we bound this project if it is also going to be iterative?
- What are the risks?
- What constitutes scope creep?
- What internal deadlines can we set for ourselves?
- Proof of Concept target delivery date is March 3
Requirements
- Are there any eventual requirements whose deferral jeopardizes the architecture?
- What prereqs must we satisfy before we can start a POC Task API implementation?
- Who approves the API spec?
- The Client Team(s)
- How often do we expect to re-train the model? The best we can do is currently once a month.
- What system / team will be responsible for tracking recommendations state?
- Can we alter the image rec. algo to make it perform better?
- Is it proven that the image rec. algo provides "better" results than MediaSearch?
- Does the ranking system need to be part of the first iteration? (Where does it fall if the SD is no longer going to use the Task API?)
- Confidence Rating will be included as part of the Image Recommendation API proof of concept
API Service
- What language or framework should we build the API in?
- The proof of concept will be built with Node.js.
- Is the API going to be an extension or a service?
- The API will be a service.
- Is the Task API storing the data from the image rec algo and MediaSearch somewhere, or querying both in real time and then smashing the results together?
- The API will "smash" together the results of the image rec algo and MediaSearch if the results from the image rec algo are not "sufficient", which may mean there are not enough results to satisfy the number requested. The API will likely query MediaSearch in real time and use intermediate storage between the image rec algo Hadoop cluster and the Task API.
- How do we update tasks to reflect a user's actions (accepting or rejecting a task)?
- What’s meant by the Image Recommendation bot as an end user? I was under the impression the API would be used through human interaction only.
- The API will serve both end users (e.g. Android app users) and MediaWiki bots that will automatically select images (with a high enough confidence score) to be added to articles.
- What happens if a user rejects images for not being relevant? Do we update the options for the next user or remove the recommendation for improvement? Also how are we capturing this information for the algorithm so that it doesn’t offer the same image recommendations the following month (assuming an image hasn’t been added to the page in the last month)?
- Does the POC include the requirement of "List Image Recommendations for a Given Article"?
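The "smash the results together" answer above could be sketched as follows: serve stored image rec algo results first, and query MediaSearch only when they cannot satisfy the requested number. The names `suggestWithFallback` and `queryMediaSearch` are hypothetical, "sufficient" is interpreted purely as a count, and the lookup is synchronous for brevity, whereas a real service would most likely make an asynchronous call.

```typescript
// Returns up to `requested` filenames. `stored` holds precomputed image rec
// algo results read from intermediate storage; `queryMediaSearch` stands in
// for a real-time MediaSearch lookup and is only invoked on a shortfall.
function suggestWithFallback(
  stored: string[],
  requested: number,
  queryMediaSearch: (needed: number) => string[],
): string[] {
  const result = stored.slice(0, requested);
  if (result.length >= requested) {
    return result; // stored results are sufficient, no real-time query
  }
  const extra = queryMediaSearch(requested - result.length);
  for (const filename of extra) {
    if (result.length >= requested) break;
    // Assumption: the same image is never listed twice.
    if (result.indexOf(filename) === -1) result.push(filename);
  }
  return result;
}
```

Keeping the MediaSearch lookup behind a function parameter means the real-time query is skipped entirely when the stored results already cover the request.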
Storage
- Will the Task API use Elasticsearch as a backend, or other storage (MySQL, Cassandra, etc.)?
- What storage are we using for the ETL pipeline?
- What are the performance requirements?
Documentation Links
- Phabricator
https://phabricator.wikimedia.org/T260832
- Plans/RFCs
None given
- Other Documents
https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_an_image