User:AKlapper (WMF)/Bitergia data quality queries
The data behind wikimedia.biterg.io regularly needs updates to make our metrics reliable. The database can be queried and edited via the Sortinghat Identities API; it can also be edited via the web interface.
For convenience this page lists GraphQL queries and bash scripts that User:AKlapper (WMF) may occasionally run.
Find accounts which likely should have an affiliation / enrollment
- By potential email address:
query { individuals(filters:{term: "@wikimedia.org", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "@wikimedia.de", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "@wikimedia.se", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "hallowelt", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "speedandfunction", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "thisdot.co", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
- By potential username:
query { individuals(filters:{term: "(WMF)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "-WMF", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "(WMDE)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
query { individuals(filters:{term: "-WMDE", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
- Look at GitLab accounts and check whether they should get merged into existing accounts (very cumbersome, see phab:T306770; one could manually check email addresses and/or group membership on https://ldap.toolforge.org/user/someusername but that does not scale):
query { individuals(filters:{isEnrolled:false, isBot:false, source:"gitlab"}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
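The term-based queries above differ only in their `term` filter, and each returns a single page of at most 100 results. A minimal Python sketch of generating them and following `pageInfo.hasNext` might look as follows. The transport is left to a caller-supplied function (`run_query` is a hypothetical name) because the endpoint URL and the authentication of the Sortinghat instance are not stated on this page. All function and variable names are illustrative assumptions.

```python
import json

# Search terms taken from the queries listed above.
TERMS = ["@wikimedia.org", "@wikimedia.de", "@wikimedia.se", "hallowelt",
         "speedandfunction", "thisdot.co", "(WMF)", "-WMF", "(WMDE)", "-WMDE"]

# Same query text as above; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = (
    "query {{ individuals(filters:{{term: {term}, isEnrolled:false, isBot:false}}, "
    "page: {page}, pageSize: {page_size}) {{ "
    "pageInfo {{ page pageSize numPages hasNext totalResults }} "
    "entities {{ mk profile {{ name email isBot }} }} }} }}"
)


def build_query(term, page=1, page_size=100):
    """Return the query string for one search term and one result page."""
    # json.dumps yields a double-quoted, escaped string literal.
    return QUERY_TEMPLATE.format(term=json.dumps(term), page=page,
                                 page_size=page_size)


def fetch_all(run_query, term, page_size=100):
    """Collect the entities from all result pages for one term.

    run_query is a hypothetical callable: it takes a query string and
    returns the "data" part of the GraphQL response as a dict.
    """
    entities = []
    page = 1
    while True:
        data = run_query(build_query(term, page, page_size))
        result = data["individuals"]
        entities.extend(result["entities"])
        if not result["pageInfo"]["hasNext"]:
            return entities
        page += 1
```

With a suitable `run_query`, looping over `TERMS` and calling `fetch_all(run_query, term)` would gather the candidates for every term in one run rather than pasting each query separately.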
Queries not possible due to GraphQL limitations
- To identify folks that should have an affiliation set, use hostnames of email addresses of user accounts in the Phabricator database, then re-use those usernames as a condition in a GraphQL query on the Bitergia database.
- To find duplicate Phabricator accounts which only changed their "Also Known As" (as long as phab:T305230 remains unresolved): Query for mks which share the very same name and both have source:"phabricator" but have different mks.
  - Same applies to any other source which allows renaming accounts.
- To find accounts with same email addresses to merge: Query for mks which share the very same email but have different mks.
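Although a single GraphQL query cannot express these conditions, the grouping could be done client-side once the individuals have been fetched. The sketch below assumes each entity was requested with an `identities { source name email }` sub-selection in addition to `mk`. That field selection and the helper name `find_shared_values` are assumptions for illustration and should be checked against the actual schema.

```python
from collections import defaultdict


def find_shared_values(entities, field, source=None):
    """Return {value: sorted list of mks} for every value of `field`
    (e.g. "email" or "name") that occurs under more than one mk.

    If `source` is given, only identities from that source are considered.
    Values are compared exactly (case-sensitive).
    """
    mks_by_value = defaultdict(set)
    for entity in entities:
        for identity in entity.get("identities", []):
            if source is not None and identity.get("source") != source:
                continue
            value = identity.get(field)
            if value:  # skip missing or empty values
                mks_by_value[value].add(entity["mk"])
    # Keep only values that appear under at least two different mks.
    return {value: sorted(mks)
            for value, mks in mks_by_value.items() if len(mks) > 1}
```

For example, `find_shared_values(entities, "name", source="phabricator")` would list candidate renamed Phabricator accounts, and `find_shared_values(entities, "email")` would list merge candidates by email address. Both results still need manual review before merging.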
Check whether detached accounts with the same mw and phab usernames are connected, in order to merge them
Expensive / time-intense. See the script and DB commands.
Query all existing Phab accounts about their connected MediaWiki.org accounts
Expensive / time-intense because there are >10000 accounts. See the script and DB commands. (For a cheaper version that requires more manual checking, see phab:T170091.)