Reading/Web/Release Manager updates


This page records the current state of chores as defined in the chore list. View team norms

Chore Status Notes Last updated
Monday
Browser tests All green

Previously: Popups failing due to https://phabricator.wikimedia.org/T370460#10057377

KSarabia-WMF (talk) 15:38, 16 December 2024 (UTC)
Visual regression

Some expected visual changes from the appearance menu, and some false errors from noise. DO NOT EDIT HERE. PLEASE UPDATE Reading/Web/Release timeline/Visual Changes with any expected or unexpected changes.[1]

NOTE: If doing this chore late, please take note of the dates of the reports on the Pixel instance. You will need to run the regression tests locally. This check is for the release that goes out this week, not next week.


Note: Old jobs are available at https://pixel.wmcloud.org/archive/

KSarabia-WMF (talk) 20:22, 03 December 2024 (UTC)
Check our core vitals and performance dashboards Check our core vitals on Google Search Console and Grafana and make sure there are no performance problems.
Wednesday
Review dashboard Run through the list in "Needs triage" and tag tasks with the relevant team. If in doubt, add the tag #readers-web-backlog to every task here. KSarabia-WMF (talk) 22:55, 04 December 2024 (UTC)
Logstash (client errors) Check group 0 and 1 wikis via the "Top 10 errors (group 1)" widget on the dashboard. Flag any significant spikes as potential deploy blockers. Anything that is not a gadget issue and has an error rate over 1000 should have a corresponding bug and is worth flagging to the team. KSarabia-WMF (talk) 22:55, 04 December 2024 (UTC)
Handover If a new sprint is starting, hand over to the next release manager.
Thursday/Friday
Logstash (client errors) Previous: Filed https://phabricator.wikimedia.org/T375417

Do not remove these instructions: Bugs with > 1400 occurrences in 7 days:

1) Create a ticket (unless it is a gadget, in which case use the Slack channel or gadget talk page).

2) Create a filter for the bug, labeling the filter with the open ticket number it belongs to.

3) If the bug does not belong to us, enable the filter by default and exclude it from the view. If the bug belongs to us, disable the filter.

For banner errors see meta:CentralNotice/FAQ

Investigating bugs with < 100 occurrences is optional, if you have time and are interested.

When investigating bugs, filter those errors out, ideally using a stack trace wildcard. If a bug belongs to Reading Web, change the label to the Phabricator ticket number. If not, add it to one of the existing team or gadget filters.
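The triage rules above can be sketched as a small decision function. This is purely illustrative: the thresholds (1400 and 100 occurrences in 7 days) come from these notes, but the function name and its arguments are hypothetical and not part of any Logstash or Kibana API.

```python
# Sketch of the weekly client-error triage rules (hypothetical helper, not a real API).

def triage(occurrences_7d: int, is_gadget: bool, belongs_to_us: bool) -> str:
    """Return the action to take for one client-error bucket in Logstash."""
    if occurrences_7d > 1400:
        if is_gadget:
            # Gadget bugs do not get a Phabricator ticket from us.
            return "report via Slack channel or gadget talk page"
        if belongs_to_us:
            # Our bug: file it, label the filter with the ticket, keep filter disabled.
            return "file ticket; add a filter labeled with the ticket number, leave it disabled"
        # Someone else's bug: filter it out of our view by default.
        return "file ticket; add a labeled filter, enable it by default and exclude from view"
    if occurrences_7d < 100:
        return "optional: investigate if you have time and are interested"
    return "monitor"

print(triage(2000, is_gadget=False, belongs_to_us=False))
```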

As part of chores we'll check and remove existing pins where the task has been resolved. The fewer pins at the top, the cleaner the dashboard.

Please read.

KSarabia-WMF (talk) 17:43, 03 December 2024 (UTC)
Logstash (PHP errors) <Create tasks for anything notable that you are sure relates to our work> KSarabia-WMF (talk) 17:43, 03 December 2024 (UTC)
EventLogging validation errors
All good!

Previously (open tasks): None

KSarabia-WMF (talk) 17:43, 03 December 2024 (UTC)
Friday
Review dashboard Run through the list in "Needs triage" and tag tasks with the relevant team. If in doubt, add the tag #Web-Team to every task here.

In particular, identify any bug reports that may relate to work we shipped in the current sprint.

Additional roles

  • The release manager is accountable for updating the team on the state of the current deploy (whether the train is blocked, whether any new bugs have been flagged that relate to the team, or whether everything is going to plan)
  • The release manager is accountable for any e-mail alerts with subject [FIRING]
    • Emails relating to client errors:
      • Make sure the alert is understood and bug(s) are filed
      • If a gadget is related, make sure the gadget author is notified or the gadget is fixed
      • Update the alerting threshold to stop the e-mail alert
    • Emails relating to QuickSurveys
      • Make sure we understand which quicksurvey got enabled and where
      • Review the Grafana graphs to see if this is impacting performance in any way (QuickSurveys pulls down the entirety of Codex on the page)
      • Find out when the QuickSurvey is going to be disabled
      • Update the alerting threshold to stop the e-mail alert
      • File a ticket associated with the deployment task with #Web-Team-Backlog to remove the alert
    • Emails relating to performance
      • Review the Grafana graphs
      • Open bug(s) if necessary documenting any regressions
      • Update the alerting threshold if necessary
  • The release manager is accountable for tagging any deployment blockers and making sure UBNs get handled (they do not necessarily need to deploy the patch themselves, but they must make sure the deploy happens where needed).
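The [FIRING] e-mail categories above each map to a checklist. The sketch below is a minimal illustration of that mapping; the keyword matching, names, and data structure are all assumptions for clarity, not a real alerting integration.

```python
# Hypothetical router from a [FIRING] alert subject to the matching checklist.
# The checklists mirror the bullets above; nothing here is a real alerting API.

CHECKLISTS = {
    "client-errors": [
        "make sure the alert is understood and bug(s) are filed",
        "notify the gadget author (or fix the gadget) if a gadget is involved",
        "update the alerting threshold to stop the e-mail alert",
    ],
    "quicksurveys": [
        "identify which QuickSurvey was enabled and where",
        "review the Grafana graphs for performance impact",
        "find out when the QuickSurvey will be disabled",
        "update the alerting threshold to stop the e-mail alert",
        "file a #Web-Team-Backlog ticket on the deployment task to remove the alert",
    ],
    "performance": [
        "review the Grafana graphs",
        "open bug(s) documenting any regressions",
        "update the alerting threshold if necessary",
    ],
}

def steps_for(subject: str) -> list:
    """Pick a checklist from an alert e-mail subject (crude keyword match)."""
    s = subject.lower()
    if "quicksurvey" in s:
        return CHECKLISTS["quicksurveys"]
    if "client" in s or "error" in s:
        return CHECKLISTS["client-errors"]
    return CHECKLISTS["performance"]

print(steps_for("[FIRING] QuickSurveysHighRequestRate"))
```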

Archived chores

Other
Check talk pages All Archived. Will be done outside chore wheel from now on.
Say something nice to a team member! All 🎉 Archived but please still do this :)
Document notable events All 🎉 Archived. We will do this as part of sign off steps on tasks from now on
A11y tests Moving temporarily into archived until flakiness is fixed

Previously

  • 6 April 2023: Tests are very flaky, need to be investigated and fixed
  • 2/23 ignored color contrast and "This link points to a named anchor "[link target]" within the document, but no anchor exists with that name." errors
  • Around 02/22 logged-in saw a sudden increase in errors saying "This link points to a named anchor "Oil_and_gas_development" within the document, but no anchor exists with that name." Unsure why this is suddenly being reported; it could be a change to the a11y testing library. Will update the config to ignore this rule
  • Around 02/17 decrease in errors, due to fixing T328584
  • 02/09 Logged-in spike. Looks like this is primarily caused by new color contrast errors of the page tools links (which matches the TOC).

Visit the a11y grafana dashboard to see the metrics over time. If there are any changes, look at the daily reports in Jenkins to investigate.

Note: The "logged in" test will sometimes fluctuate due to different notices/gadgets being enabled in beta cluster. Changes can be made to the config to hide problematic elements and improve consistency.

BWang (WMF) (talk) 15:34, 18 May 2023 (UTC)
Review the Reading Web Backlog Will be done as part of grooming meeting Jdlrobson (talk) 21:27, 9 February 2023 (UTC)
Check volunteer patches on Phab We now have a grooming meeting. This will be covered there. Jdlrobson (talk) 21:27, 9 February 2023 (UTC)
Check patches on Gerrit have attention or hashtags We now have a grooming meeting. This will be covered there.


Jdlrobson (talk) 21:27, 9 February 2023 (UTC)
Check Slack for bug reports for code in production without Phabricator ticket We now have a grooming meeting. This will be covered there.

____

Often we might report bugs initially on Slack and submit patches. For issues that have hit production, our team norm is to make sure a bug is created so that the issue can be tracked. Check the back scroll for the last week and create Phabricator tasks where needed.

Jdlrobson (talk) 21:27, 9 February 2023 (UTC)
Ensure goals are current Asked in standup today. Jdlrobson (talk) 21:32, 5 January 2023 (UTC)
Page previews dashboard Archived in favor of alerting.

Previously:

  • Median Time to Preview seems significantly reduced in past few weeks
  • API request failures seems to be higher since 02/01. It might be noise, but worth monitoring
  • Note well that the most recent HighTimeToPreview alert was triggered by a backend error.
  • We've been getting alerts for HighTimeToPreview. Looks like the p95 TTP chart started going a bit haywire (up and down between 780ms and 860ms) about a week ago. Is this something to be concerned about? Jon adjusted the threshold (every 5 mins for 2 mins) and we haven't seen an HTTP alert since
NRay (WMF) (talk) 19:36, 17 February 2023 (UTC)
Generic dashboard Archived in favor of alerting.

Previously:

  • Not sure if this is a known issue - web_ui_scroll events dropped starting 4/7/22
  • QuickSurveyInitiation events dropped due to T305498 - again hopefully resolved if/when backported or with train rollout.
  • Some big spikes in the "Mobile beta opt in/outs" graph (note: this is likely from non-Wikimedia sites and should be ignored)
  • MobileWebUIActionsTracking looks right around ~200 events/second
  • The spike in MobileWebUIActionsTracking is from the newly added init logging. We should make sure it doesn't come close to 1000 events per second and ideally is around ~200 events/second though
  • Schema:DesktopWebUIActionsTracking errors have been steadily increasing the past month
  • There was an issue with some gadgets relating to the jQuery migration (T280944). More client errors than normal.
  • EventLogging Schema errors have been down on average for the last week!
  • VirtualPageViews went offline. Now fixed. Updated Reading/Web/Notable_incidents.
NRay (WMF) (talk) 19:36, 17 February 2023 (UTC)
performance dashboard Archived in favor of alerting.

Previously:

  • Nov 16th: Since October 12th we've been shipping an additional 1k of CSS to mobile on beta cluster only. What happened (gadgets)? This does seem consistent with production, so probably nothing to worry about, but let's keep an eye on it in case there is a new feature enabled there. I've synced MediaWiki:Common.css with enwiki today, which might change that.
  • PLEASE NOTE (Nov 3): Peter from the performance team is in the process of changing this dashboard to use data from Browsertime instead of WebPageTest. If all goes well, you shouldn't notice a thing. However, if something looks funky, this might be the cause!
  • A banner seems to be triggering a performance regression.
  • Timing metrics look much better after T305572 was resolved on 4/7/22
  • In Timing Metrics, desktop graph has significant jumps on "Fully loaded" graph starting 3/30/22 on both production and beta cluster. Logged T305572 and assigned to performance team.

Please note anything significant on Reading/Web/Notable_incidents

NRay (WMF) (talk) 19:36, 17 February 2023 (UTC)
Search performance dashboard Looks good

Previously:

  • Search data for Vue showing up now
  • Search data should be fixed now after 832577
  • What happened to the Vue search data? (jon to follow up with Nick)
  • NR: I updated the graphs to show 30 days rather than 7
  • Interesting drop between 2/1/22 - 2/8/22 for the legacy search graph - what would account for this?
    • NR: I think that is when vector skin split stuff happened and no data was being collected for legacy since `useskinversion=1` no longer worked
  • Search Query to Render is lowering, looks good
  • Should we be concerned that Search Query to Render has been creeping up? - hard to say at this point as past 30 days has shown similar spike, will be interested to see next week's data


Note this is a new dashboard that displays the results of synthetic tests (not measuring real users) and is based on the work from T251544. These metrics currently measure Vue and Legacy search performance on frwiki.

BWang (WMF) (talk) 18:09, 6 April 2023 (UTC)

Helpers


Status fields:

✅ is fine
⚠️ needs attention
🚨 is on fire
🎉 For your attention but positive

Last updated (Use four tildes to autosign)

  1. Reminder: Ensure you have run Pixel against the upcoming release number to avoid missing changes over the weekend
    Visit https://pixel.wmcloud.org/ to see the latest visual diff between the latest release and master. This report is updated every hour. Update Reading/Web/Release timeline/Visual Changes with any expected or unexpected changes. If needed, you can also run the following locally on the latest pixel version to compare the latest release to master:
    ./pixel.js reference -b latest-release
    ./pixel.js test