Wikimedia Search Platform/Decision Records/Recommendation Flags in Search
Status: In progress
Responsible | What work/what role is this team/group playing? | Point of Contact for this team/group (a person) |
---|---|---|
Search Platform | Search | Zbyszko Papierski |
Accountable | Why is this person accountable? | |
Consulted | Context | Point of contact |
Informed | Context | Point of contact |
What?
What is the problem or opportunity?
For the Structured Tasks project, we need to be able to filter for articles that have recommendations assigned to them. Currently we allow two types of recommendations, link and image, which are calculated beforehand and assigned to articles. In both cases, results are recalculated periodically.
What does the future look like if this is achieved?
The ability to filter results will allow a flexible display of articles, because the recommendation filter can be combined with other criteria such as full-text search or category.
What happens if we do nothing?
A static list of articles can still be displayed without help from search, but that will greatly reduce the usability of the solution.
Why?
Value | Objective or Value it Supports and how |
---|---|
Streamlined Search Platform | |
Dynamic recommendation lists | |
Current background
Assumptions and requirements |
---|
The actual recommendations are not necessary for the search process |
Articles with recommendations should be marked as such and there should be a way to clear this mark |
It should be possible to run a full-text or filtered search that also includes a "recommendations available" filter |
Search has a preexisting feature for handling articletopics and drafttopics |
Options
Only one option is written down, because the decision was made during a meeting.
Weighted flag for external input | |
---|---|
Description | There will be a field named weighted_tags that will contain the externally provided tags for articles |
Pros |
|
Cons |
|
Risks |
|
Effort | Implementing tags in search is a simple task, but we also need to migrate the existing field to a new name and format. There is no estimate yet, but it will take some time. |
Costs | No additional costs |
Reference | Any links to additional materials or more detailed plans? |
Decision Type | The decision is reversible, but since it is a matter of internal implementation (externally, search features are not affected by the change), there is a low chance of it being reversed. |
Important Questions
Question | Who can answer? | Resolution, answer or action |
---|---|---|
Delete API: what should it look like? | | |
How should we model the process of adding new tags in the future? | | |
Decision
Option | Weighted flag for external input |
---|---|
Rationale | Leaving the situation unattended generates additional technical debt, which we didn't want. The chosen solution isn't complete, but it will allow the next steps. |
Data | No data yet |
Who | Search Platform team (internal design decision) |
Date | 2021-01-12 |
Informing | This decision is an internal design decision |
Details
Current solution
Currently, we support article topics and draft topics, both coming from ORES. An example of the current structure:
"ores_articletopic": [
"Culture.Media.Radio|475",
"STEM.STEM*|741"
]
In the first tag, "Culture.Media.Radio" is the tag value and 475 is a term frequency value used as a weight to sort the values in search. We currently also put draft data there, which isn't ideal. This is technical debt that we need to resolve before streamlining the platform solution.
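To make the "value|weight" layout concrete, here is a minimal Python sketch that splits the entries from the example above and orders them by weight. The helper name `parse_topic` and the `WeightedTopic` type are hypothetical and are not part of CirrusSearch; the sorting only illustrates that the weight is an integer attached to each value.

```python
from typing import NamedTuple


class WeightedTopic(NamedTuple):
    value: str
    weight: int


def parse_topic(entry: str) -> WeightedTopic:
    """Split a 'value|weight' entry into its parts (hypothetical helper)."""
    # Split on the last '|' so the weight is always the trailing segment.
    value, _, weight = entry.rpartition("|")
    return WeightedTopic(value, int(weight))


# Document fragment copied from the example above.
doc = {"ores_articletopic": ["Culture.Media.Radio|475", "STEM.STEM*|741"]}

# Order topics from the highest weight to the lowest.
topics = sorted(
    (parse_topic(entry) for entry in doc["ores_articletopic"]),
    key=lambda topic: topic.weight,
    reverse=True,
)
print(topics)
```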
Desired solution
We want to have a field that behaves the same way, but is designed for more general data. The proposed format is "<tag_source>/<tag_value>|<tag_weight>". Here's a more detailed example:
"weighted_tags": [
"classification.ores.articletopic/Culture.Media.Radio|475",
"recommendation.image/exists|1",
"recommendation.link/exists|1"
]
This structure allows us to reuse preexisting features.
Existing tags from ORES will be migrated to this field.
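The proposed format can be sketched in Python as below. This is an illustration under assumptions, not the actual implementation: the function names `format_tag`, `parse_tag`, and `migrate_ores` are hypothetical, and the sketch assumes that a tag source never contains "/" and that the weight is always the segment after the last "|".

```python
from typing import List, Tuple


def format_tag(source: str, value: str, weight: int) -> str:
    """Build '<tag_source>/<tag_value>|<tag_weight>' (hypothetical helper)."""
    return f"{source}/{value}|{weight}"


def parse_tag(entry: str) -> Tuple[str, str, int]:
    """Split a weighted tag into (source, value, weight)."""
    # The weight follows the last '|'; the source precedes the first '/'.
    prefix, _, weight = entry.rpartition("|")
    source, _, value = prefix.partition("/")
    return source, value, int(weight)


def migrate_ores(
    entries: List[str], source: str = "classification.ores.articletopic"
) -> List[str]:
    """Prefix old 'value|weight' entries with a tag source."""
    return [f"{source}/{entry}" for entry in entries]


old = ["Culture.Media.Radio|475", "STEM.STEM*|741"]
weighted_tags = migrate_ores(old) + [
    format_tag("recommendation.image", "exists", 1),
    format_tag("recommendation.link", "exists", 1),
]
for tag in weighted_tags:
    print(parse_tag(tag))
```

Because the old entries already use the "value|weight" layout, migrating them in this sketch only requires prepending the tag source.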
Migration
Recommendation features are currently under development and can leverage the new structure immediately, but ORES classifications need to be migrated. Steps required:
- Implement recommendation features with the new structure in mind
- Develop handling of the new structure alongside the old one, with backward-compatibility (BC) code in CirrusSearch (search both old and new fields)
- Reindex articles to add the new field to the Elasticsearch mapping
- Repopulate ORES articletopics and drafttopics for all the articles (see wikimedia/discovery/analytics:spark/ores_bulk_ingest.py)
- Remove the BC code from CirrusSearch
- Reindex the Elasticsearch indices to remove the old fields (using the --fieldsToDelete option)
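The backward-compatibility step (searching both the old and the new field during the migration) could look roughly like the following Elasticsearch query body, built here as a plain Python dictionary. This is only a sketch under assumptions: CirrusSearch itself is written in PHP, the function `bc_topic_query` is hypothetical, and the sketch assumes both fields are analyzed so that a `match` on the tag without its weight finds the document.

```python
import json
from typing import Any, Dict

OLD_FIELD = "ores_articletopic"  # field name from the current structure
NEW_FIELD = "weighted_tags"      # field name from the desired structure
NEW_PREFIX = "classification.ores.articletopic"


def bc_topic_query(topic: str) -> Dict[str, Any]:
    """Match a topic in either the old or the new field (hypothetical)."""
    return {
        "bool": {
            "should": [
                # Old layout: the bare topic value.
                {"match": {OLD_FIELD: {"query": topic}}},
                # New layout: the same value prefixed with its tag source.
                {"match": {NEW_FIELD: {"query": f"{NEW_PREFIX}/{topic}"}}},
            ],
            # A document qualifies if at least one of the two clauses matches.
            "minimum_should_match": 1,
        }
    }


print(json.dumps(bc_topic_query("Culture.Media.Radio"), indent=2))
```

Once every article has been repopulated, the first clause becomes redundant, which corresponds to the "remove the BC code" step in the list above.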