Data Platform Engineering/Data Platform SRE/Status Update/2024-11-29

From mediawiki.org

No update was sent last week, so this update covers 2 weeks!

Airflow migration to k8s

All Airflow webservers have been migrated to k8s. This brings some quality-of-life improvements for Airflow users, who can now:

  • reach the Airflow UI via a public domain (no need for SSH tunnels)
  • manage roles and permissions via LDAP group management
  • get working links in alert emails

We've migrated the scheduler of our test instance to k8s. We'll need to replicate this work for all production instances, but at this point we are confident that it will work with only minor surprises.

A new automated DAG deployment process (T368033) has been discussed, implemented, documented, and communicated. Merge requests to Airflow DAGs now require formal approval by a peer before being deployed.

Spark version upgrade (in support of Dumps 2.0)

Replace Archiva with Gitlab artifact repositories

Migration of the Search clusters to OpenSearch

Operations

We've had issues with disk space and the number of folders, related to changes in how we deploy Refine. The immediate issue has been resolved (big thanks to DC-Ops for a quick reaction on adding disks!). The underlying problem still needs to be addressed and has been communicated to Data Engineering.

Hardware

Access Requests
