Editor campaigns/Notes

Here are some notes about Editor campaigns. :)

Initial features and the road beyond

The project page sets out initial goals and features for Editor Campaigns. What may follow is a bit fuzzier. Still, it could be useful to start thinking about the next steps; we may want to consider (1) what will come next, as part of deciding (2) how to implement what comes first. This document touches on both points.

Initial use cases

Here's a use case diagram based on the user stories described by the project.

This diagram describes an unrestricted campaign configuration. Some campaigns will probably want to set tighter limits on who can do what. Also, there are some actions (like following campaign activity) that will probably be open to everyone, but that will be mainly of interest to participants and organizers.

Question: is it appropriate for organizers to be able to add or remove people at will? Shouldn't some action be required on the receiving end?

Possible future user stories

Some ideas for more stories:

Campaign organizers, participants and others want to study and compare campaigns using a variety of metrics.
Campaign organizers, participants and others want to link data about campaigns to data from other sources, including Wikidata.
Campaign organizers and participants want to organize their work in steps (i.e., as workflows) according to the requirements of a specific campaign or type of campaign.
Campaign organizers and participants want different UXs and data points for different types of campaign.
Users want flexible, productive, fun and easy-to-use tools for working and hanging out on Wikipedia in groups.

Structured data store

Quite tentatively: we might consider the possibility of using a structured data store like Wikibase instead of MySQL tables to persist campaign data.

The possible future user stories mentioned above would benefit more from such a design than the ones we're tackling now. (See below for more specific use case ideas.)

We could call it “WikiCommunityData”. The database tables described in the RFC could still be created as a first step while we consider the details and usefulness of a structured data store.

One option would be to create an internal repository using Wikibase, but with no external Wikidata UI, only an internal read-write API and a public read-only API.

Advantages

Flexibility. It would be easy for extensions that build on Editor Campaigns to augment the base campaigns data structure according to their requirements.
Encapsulation. The base Editor Campaigns component wouldn't need to know anything about the data structure elements used by the components that build on it.
Expressiveness. With hierarchies of types, multilingual labels, complex datatypes and “no-value” statements, Wikibase is much more expressive than a relational database (as are other similar models, like the Semantic Web).
Exportability. It's possible to export Wikibase data to RDF and other formats. This would make it easy to link campaigns data to structured data from multiple sources.
Clean separation of private data. Database tables contain private information. Putting public campaigns-related data in a separate repository would make it possible to offer query functions without worrying about unauthorized access.
It's hot...! This is the type of technology behind Siri and Google's Knowledge Graph. Seems to have some potential.
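To make the flexibility and encapsulation points a bit more concrete, here is a loose sketch of a statement-style store. It is not the Wikibase data model or API; the class, the item id "campaign:1" and the property names are all made up for illustration. The point is only that a component building on the base can attach its own properties without the base code changing.

```python
class StatementStore:
    """Toy item/statement store (hypothetical; not Wikibase itself)."""

    def __init__(self):
        # item id -> list of (property, value) statements
        self._statements = {}

    def add(self, item, prop, value):
        self._statements.setdefault(item, []).append((prop, value))

    def values(self, item, prop):
        return [v for p, v in self._statements.get(item, []) if p == prop]


store = StatementStore()

# Statements the base Editor Campaigns component might write.
store.add("campaign:1", "organizer", "Alice")
store.add("campaign:1", "participant", "Bob")

# A hypothetical course extension adds a property of its own.
# The base component never needs to know that it exists.
store.add("campaign:1", "course term", "spring")
```

With database tables, the same extension would typically need a new column or a new table joined to the base one.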

Disadvantages

Additional Ops work would be required; feasibility is unknown.
Everyone's more familiar with SQL, and a lot of this could be done with database tables, too (though probably not as elegantly).

Specific use cases

Find campaigns by the Wikidata categories or Geotags of the articles they work on.
Place a campaign's activities on a timeline together with events described in Wikidata. For example, visualize edits by a project about Ukraine alongside major events in the recent protest movement there.
Create automatic text analysis of articles and link it to campaigns to compare their results. A text analysis component could add structured data to the repository. Campaign results could be compared in terms of content persistence, for example. Or one could compare the types of discourse on the Talk (or Flow) pages used by different campaigns.
Describe campaign participant networks. See, for example, this study of Twitter networks.
Create a system for modeling workflows, hook it up to UI features, and let campaigns tweak their own workflows and link to their own data structures as needed.

Why a separate Wikibase instance?

Why use a separate, isolated Wikibase instance?

We would probably be able to re-use Wikibase code for dealing with geolocation, time and other basic stuff.
We would be able to leverage Wikibase property definitions (such as participant and organizer) and contribute to Wikidata ontologies.
Our data would differ from that in Wikidata in that it wouldn't need sources, and would not be expected to contain contradictory statements.
There might be different performance and caching requirements.

Architecture

The code produced during the first few sprints will probably be pretty straightforward. Still, it would be nice to have an architectural plan to aim for. Here's a rough proposal:

The persistence layer would encapsulate DB or Wikibase access. The business logic layer would handle basic operations in isolation from the UI. The base Campaigns UI and reusable UI logic layer would provide a simple, general UI for working with campaigns of any type (course, edit-a-thon, etc.), and would provide reusable UI components for use by the top layers. Finally, the top layers would access the Campaigns business logic layer and use said UI components to create interfaces and additional features specific to one type of campaign.

To integrate Workflows, two middle layers could be added next to the Campaigns business and UI layers, with similar functions. Other components with general functionality to be used by the upper layers (for example, a component for edit feeds) might insert middle layers in a similar manner.
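The layering described above could look roughly like the following sketch. It is written in Python only to keep the illustration short; the real code would be PHP, and every class and function name here is hypothetical.

```python
from abc import ABC, abstractmethod


class CampaignStore(ABC):
    """Persistence layer: hides whether data lives in a DB or in Wikibase."""

    @abstractmethod
    def save(self, name, kind): ...

    @abstractmethod
    def all(self): ...


class InMemoryStore(CampaignStore):
    """Stand-in storage used only for this illustration."""

    def __init__(self):
        self._rows = []

    def save(self, name, kind):
        self._rows.append((name, kind))

    def all(self):
        return list(self._rows)


class CampaignService:
    """Business logic layer: no UI concerns, no storage details."""

    def __init__(self, store):
        self._store = store

    def create(self, name, kind, may_create):
        if not may_create:
            raise PermissionError("user may not create campaigns")
        self._store.save(name, kind)

    def list_campaigns(self):
        return self._store.all()


def render_campaign_list(service):
    """Base UI layer: one generic listing for campaigns of any type."""
    return [f"{name} ({kind})" for name, kind in service.list_campaigns()]


def render_course_banner(service):
    """Top layer for one campaign type, built on the layers below."""
    courses = [n for n, k in service.list_campaigns() if k == "course"]
    return "Courses: " + ", ".join(courses)


service = CampaignService(InMemoryStore())
service.create("Intro to Editing", "course", may_create=True)
service.create("Spring edit-a-thon", "edit-a-thon", may_create=True)
```

Each layer only talks to the one directly below it, so a Workflows layer could be slotted in beside CampaignService without touching the store or the top layers.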

Question: Do we want to use existing frameworks at all? Composer? Anything from Symfony?

Initial UI

Here's an unoriginal proposal for the UI:

A Special page for listing and searching for campaigns. Each campaign name would be a link to the page for that campaign. Authorized users would see an add button and/or delete buttons.
Campaigns would have a page in their own namespace. Pages would be implemented using ContentHandler, and would have tabs for viewing, editing, viewing history, moving and deleting (again, depending on user rights). The edit page would let users modify general campaign information and (for admins) remove users. The view page would also list the participants/organizers and provide a means of inviting more users.

This general layout is close to the one used by the Education Program extension, though there's a lot of room for UX and design improvement within that framework. Consultation with EP users will be fundamental for ideas on possible improvements.

This basic UI should display all types of campaign, regardless of whether or not they have other, more specialized UIs.

Initial API

Two API functions will be provided initially for use by Wikimetrics: (1) for listing and searching for campaigns, and (2) for retrieving lists of participants. Both will be list query modules.

list=allcampaigns

Parameters:

allcprefix: Search for campaigns whose name begins with this value. Optional; if omitted, get a list of all campaigns.

allclimit: Maximum number of results to return.

Example:

api.php?action=query&list=allcampaigns&allclimit=10
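A client such as Wikimetrics might build this request along the following lines. This is a sketch only: the base URL is a placeholder, the helper name is made up, and format=json is the standard MediaWiki API output parameter rather than anything specific to this module.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def allcampaigns_url(base, prefix=None, limit=10):
    """Build a list=allcampaigns query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "allcampaigns",
        "allclimit": limit,
        "format": "json",
    }
    # allcprefix is optional; leaving it out lists all campaigns.
    if prefix is not None:
        params["allcprefix"] = prefix
    return base + "?" + urlencode(params)


# Placeholder wiki URL, for illustration only.
url = allcampaigns_url("https://example.org/w/api.php", prefix="Edit")
```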

list=campaignparticipants

Parameters:

camppid: The id of the campaign to get a list of participants for.

campplimit: Maximum number of results to return.

Example:

api.php?action=query&list=campaignparticipants&camppid=1&campplimit=100
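The response format isn't settled yet. Assuming the module follows the usual list-module convention of returning its results under query.campaignparticipants, a client could read it as sketched below. The "id" and "name" fields and the sample data are assumptions made for this example.

```python
import json

# Hypothetical response body; the field names are assumptions.
SAMPLE_RESPONSE = """
{"query": {"campaignparticipants": [
    {"id": 11, "name": "Alice"},
    {"id": 12, "name": "Bob"}
]}}
"""


def participant_names(response_text):
    """Extract participant names from a campaignparticipants response."""
    data = json.loads(response_text)
    entries = data.get("query", {}).get("campaignparticipants", [])
    return [entry["name"] for entry in entries]


names = participant_names(SAMPLE_RESPONSE)
```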

Considerations:

It seems the term cohort is most appropriate for groups with set, rather than shifting, membership. Even if we don't implement queries of campaign membership at different times, the terms we choose might reflect the fact that the sets of users in a campaign at different times or over different time ranges are technically different cohorts.
Wikimetrics will be the only known user of this API. We should tailor it to their needs.
- I'm not sure that standardizing the API functions to pull cohorts from different providers makes sense; the idea may need more fleshing out. I think it's Wikimetrics's call, though.
- What I think does make sense, and is easy, is for certain API calls to have a standardized output format that Wikimetrics understands as providing a cohort.

As data points are added (articles worked on, time frame, etc.), additional campaign-specific API functions might be added.
Top-level components for specific campaign types might also provide specialized APIs.
If Wikibase is used for data persistence, it will also be possible to access the data via Wikibase APIs.
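One way to picture a standardized cohort output is a snapshot record that carries its own timestamp, so that the membership of a campaign at two different times shows up as two different cohorts. The sketch below is purely illustrative; the field names and the provider label are invented, and the actual format would be for Wikimetrics to define.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class Cohort:
    """A fixed set of users, as seen at one point in time (hypothetical)."""
    provider: str
    campaign_id: int
    as_of: str  # ISO 8601 timestamp of the snapshot
    user_ids: Tuple[int, ...]


def snapshot(campaign_id, user_ids, as_of):
    """Freeze the current membership of a campaign into a cohort."""
    return Cohort("editor-campaigns", campaign_id, as_of,
                  tuple(sorted(set(user_ids))))


january = snapshot(1, [12, 11], "2014-01-01T00:00:00Z")
february = snapshot(1, [12, 11, 13], "2014-02-01T00:00:00Z")
```

Here january and february describe the same campaign but compare as different cohorts, which matches the terminology concern above.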

Privacy notice and opt-in

Tentatively (?): When a user signs up via a campaign, they'll see a notice on the Create account form explaining that information about their participation in the campaign will be publicly available. There will also be a link to more information about the campaign, and a checkbox that lets them opt out if they wish.

Opting out will clear any campaign assignments and refresh the workflow.

Legacy operation

We shouldn't disturb legacy operation. Here's how we might handle it:

If a URL campaign identifier isn't recognized, we'll just log via EventLogging, as we do now.
If a current campaign organizer wishes to use the new system, they should create a Campaign page using the new interface, and set whatever URL identifier they want to use. (It can be the same one they're already using.) From then on, new participations from Create account URLs will be stored in the database and will be available on Wikimetrics. New participations will still be logged with EventLogging, too, as before.
Users may also create a campaign page but keep only legacy operation (i.e., only event logging) by setting an option on the campaign page.
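The three cases above boil down to a small decision, sketched here under the assumption that campaign pages are looked up by their URL identifier. The names are hypothetical, and the lists and dicts merely stand in for EventLogging and the database.

```python
from dataclasses import dataclass


@dataclass
class CampaignPage:
    url_identifier: str
    legacy_only: bool = False  # the "keep only legacy operation" option


def handle_signup(identifier, user, pages, event_log, stored):
    """Record one sign-up; return True if it was stored in the new system."""
    # Logging happens in every case, exactly as before.
    event_log.append((identifier, user))

    page = pages.get(identifier)
    if page is None or page.legacy_only:
        # Unrecognized identifier, or a page that opted for legacy only.
        return False

    stored.setdefault(identifier, []).append(user)
    return True


pages = {
    "spring-course": CampaignPage("spring-course"),
    "old-campaign": CampaignPage("old-campaign", legacy_only=True),
}
event_log, stored = [], {}
results = [
    handle_signup("spring-course", "Alice", pages, event_log, stored),
    handle_signup("old-campaign", "Bob", pages, event_log, stored),
    handle_signup("unknown", "Carol", pages, event_log, stored),
]
```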

Deleted users

Only users with the hideuser right can view deleted users. (?)
The API we're creating doesn't require authentication.
The presence of a significant number of deleted users in a campaign might be relevant, so we shouldn't just pretend they don't exist.
Alternative solutions:
- When providing information about the users in a campaign (via the API, or to unauthorized users via the UI), show "fake" user IDs or user names, or placeholder text such as "deleted", clearly indicating that they refer to deleted users.
- Unenroll the deleted users from their campaigns. Historical enrollment is available through another auditing API.
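The first alternative might be sketched as follows. The function name, the placeholder text and the shape of the records are assumptions. Deleted users stay in the list, so their number remains visible, but nothing identifying is exposed.

```python
def public_participants(participants, deleted_ids):
    """Return participant records that are safe to show without authentication.

    participants: iterable of (user_id, user_name) pairs
    deleted_ids: set of ids of deleted users
    """
    records = []
    for user_id, name in participants:
        if user_id in deleted_ids:
            # Keep the entry so counts stay accurate, but hide identity.
            records.append({"id": None, "name": "(deleted user)", "deleted": True})
        else:
            records.append({"id": user_id, "name": name, "deleted": False})
    return records


visible = public_participants([(11, "Alice"), (12, "Bob")], deleted_ids={12})
```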

Persistence layer

There are two patches, still WIP, for the persistence layer:

As required, the new code may be merged independently of schema update application.

It's quite a few lines of code, but there are many advantages to the approach taken:

Full encapsulation of all DB access.
- All access to the data from outside this layer takes place only through an opaque PHP API.
- No dependency on the details of the database implementation outside the layer.
- Unlike with ORMTable and friends, no direct setting of query conditions or options from outside the persistence layer.
- Unlike much access to data in MediaWiki, no dependency on database column names outside the layer.
- Within the persistence layer and in the outward-facing PHP API, class constants are used instead of hardcoded strings.
Only persistence logic in this layer.
- No mixing with more general business logic (such as permissions for actions, logging or page revisions).
- No mixing with UI elements (such as messages related to input verification).
Keeps open the possibility of switching out the database implementation for another form of storage without affecting other layers.
Loose coupling and explicitly described coupling within the persistence layer, and between it and other layers, via dependency injection.
A minimalist dependency injection mechanism that may be switched for an external library once one is approved for use.
Unit testing: full coverage of all code paths.
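The patches themselves are PHP and aren't reproduced here. As a language-neutral illustration of the same ideas (constants instead of hardcoded strings, column names kept private, storage injected from outside), here is a minimal Python sketch using an in-memory SQLite database. All names, including the table and column names, are invented for the example.

```python
import sqlite3
from enum import Enum


class CampaignField(Enum):
    """Field constants exposed to callers in place of column names."""
    NAME = "name"
    URL_KEY = "url_key"


class SqlCampaignStore:
    # Column names never leave this class.
    _COLUMNS = {CampaignField.NAME: "c_name", CampaignField.URL_KEY: "c_url_key"}

    def __init__(self, connection):
        self._db = connection  # injected, so tests can pass an in-memory DB
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS campaigns (c_name TEXT, c_url_key TEXT)"
        )

    def add(self, values):
        columns = [self._COLUMNS[field] for field in values]
        placeholders = ", ".join("?" for _ in columns)
        sql = "INSERT INTO campaigns ({}) VALUES ({})".format(
            ", ".join(columns), placeholders
        )
        self._db.execute(sql, list(values.values()))

    def find_by(self, field, value):
        # The column comes from the private map, never from caller input.
        column = self._COLUMNS[field]
        rows = self._db.execute(
            f"SELECT c_name, c_url_key FROM campaigns WHERE {column} = ?",
            (value,),
        ).fetchall()
        return [
            {CampaignField.NAME: row[0], CampaignField.URL_KEY: row[1]}
            for row in rows
        ]


store = SqlCampaignStore(sqlite3.connect(":memory:"))
store.add({CampaignField.NAME: "Spring course", CampaignField.URL_KEY: "spring"})
found = store.find_by(CampaignField.URL_KEY, "spring")
```

Callers only ever see CampaignField values, so swapping SQLite for another form of storage would mean writing another class with the same two methods.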