Jump to content

Git/Gerrit evaluation

From mediawiki.org
< Git

This is the homepage for the Gerrit review taskforce created at the 2012 Berlin Hackathon to evaluate alternatives to gerrit; see the outcome at #Where we stand.

Timeline

[edit]

Brion Vibber lead this evaluation, with help from Chad Horohoe and David Schoonover.

Where we stand

[edit]

Gerrit

[edit]

For the time being, we're sticking with gerrit for ops, MediaWiki core, and extensions:

  • it works for what we need
  • we have existing deployment workflows built on it
  • it's being actively developed, and various improvements are coming in

That said there are known negatives; the Java+Google Web Toolkit front-end is intimidating to people who might want to help improve the UI; even Gerrit devs don't love it. :)

Improvements to the UI and to the git-review CLI tool are welcome...

The good news for those who aren't so fond of Gerrit is that we do have some nice alternatives on the horizon, which we can start working with and improve... we'll be re-evaluating things across the board for core & extensions next year. (Ops can keep Gerrit forever if they like it, that's not my concern. ;)

Phabricator

[edit]

We've been pretty impressed with Phabricator and I encourage continued experimentation with it. At present it's not capable of fully taking on all our stuff -- actual repository hosting isn't finished, and I get the impression there's some UI work needed for some workflows, eg large numbers of repos (all our extensions).

Being PHP-based, it should be easier for our engineers and volunteers to jump in and modify or contribute back on Phabricator; this is something that shouldn't be underestimated. Phabricator devs seem very open to talking about improvements, and that makes me happy.

Note that projects hosted separately on GitHub can make use of Phabricator in parallel with GitHub's own pull request review system -- the test instance at https://phabricator.wmflabs.org/ includes the Wiki Loves Monuments mobile app and I plan to do more testing with it myself.

We may also be able to devise a Gerrit+Phabricator hybrid setup using the gerrit repository hosting, but that'll be something we'll have to experiment with in the future.

Github strategy

[edit]

I very strongly recommend having official mirrors on github that can be easily forked for experimentation. I understand this is currently waiting on the Gerrit 2.5 update, which makes things easier for the mirroring setup -- that should be ready in a month or so, unless Chad can figure out how make it work on 2.4 sooner.

Even if we don't have an automated way of accepting pull requests, a pull request is easier to work with than a patch posted to Bugzilla -- you can do the entire pull/test workflow within git, then squash (if necessary) and send up to gerrit for final review.

Note also that git-review should be able to push up to Gerrit even if you cloned from a GitHub mirror -- so a mirror may also relieve some stress on the main Gerrit server for large clones etc.

As a potential primary repository, Github is pretty good but loses on a couple fronts:

  • not open source -> limited in what we can customize
  • third-party hosting -> potential availability and customization issues
    • their "GitHub enterprise" self-hosted service is not inconceivable, but still isn't open source and it's unclear what we could modify etc

Note that we are still using GitHub for projects like the Wikimedia mobile applications, which mostly predate the gerrit switch for MW core:

as well as for patched forks of other projects using GitHub, such as Apache Cordova (PhoneGap) which hosts on Apache git servers but takes a lot of contributions through forks on their GitHub mirror.

Having an official GitHub presence just makes sense.

Reference Materials

[edit]
  • Our Gerrit, in all its glory.
  • In order to properly understand Gerrit, you might want to try performing a few common tasks (please do add more here):
    • Attempt to find a link to the source for a project
    • Search for a username
    • Do a fulltext search for a commit, knowing the project name or text from the commit message
  • Video review of Gerrit, GitHub & GitLab

Criteria by which to judge a code review tool

[edit]

The following criteria are what we will judge the system by. Note, it is almost certainly the case that no system currently fully meets these criteria, so every proposal should include a credible plan to meet the missing criteria.

MUSTs

[edit]

The following are required features of any alternative plan:

  • It must not be ugly
  • It must be reasonably intuitive and user-friendly
  • It must support pre-commit review well
  • It must support post-commit review reasonably
  • It must have easy and clear diffs between commits
  • It must make source code easily discoverable
  • It must be fast and responsive
  • It must be extensible
  • It must support automated actions from Wikimedia-approved bots
  • It must have robust ACLs (grant/deny permissions on repository level and branch level)
  • It must support private repositories (Labs puppet)
  • It must support automatic merge on approve
  • It must have command line access
  • It must have a license the Wikimedia Foundation can agree to (or we must have a plan to negotiate it)
  • We must have a plan for on-site-only, private repositories in some cases
  • We must have a plan for automatic repository mirroring

SHOULDs

[edit]

The following are strongly suggested features:

  • It should be beautiful
  • It should be completely intuitive and user-friendly (user interface should be translatable!)
  • It should support pre-commit and post-commit review workflows equally (un-opinionated / does not impose a particular workflow)
  • It should support for pull requests
  • It should enable deep integration with bugzilla
  • It should have LDAP integration
  • It should be open source
  • It should be under active development, and not rely too heavily on Wikimedia involvement
  • It should have built-in SSH support
  • It should be something Ops will migrate to as well
  • It should offer self-serve account creation

Overview of Alternatives

[edit]

This section is a light overview of all of the options we're considering or should consider.

Integrated options

[edit]

Integrated option means it manages the repositories as well as the review process.

Standalone code review tools

[edit]

These tools profess to do the job of code review completely independently of repository management

Standalone repo management tools

[edit]

These tools don't provide code review, but only provide access control for the repository.

  • Gitosis
  • Gitolite

Git repo viewers

[edit]
  • gitweb (default with Git and Gerrit)
  • cgit

Serious Alternatives

[edit]

This section is to enumerate the set of complete alternatives, with serious advocates and a credible plan to address deficiencies. If there is an option you believe deserves serious consideration, please add a new section for it.

Staying with Gerrit

[edit]

The case for Gerrit

[edit]
Primary editor: Chad Horohoe. Others should feel free to contribute.
  • LDAP integration. LDAP is a requirement for tie-ins with Labs and other developer/operations projects. Gerrit supports this out of the box with minimal configuration.
  • Robust ACL. Gerrit has the ability to grant/deny permissions on the repository level and even down to the branch level. This is important because different teams have different workflows for their branches.
    • Additionally, Labs requires a private repository for puppet.
  • Automatic merge on approve, unlike Phabricator that only works with patches (which then must be applied to the repository).
  • Open source, under active development. Using free and open source software is very important to our community, and Gerrit is licensed under the Apache License. Additionally, the Gerrit project has a very active development community that is highly responsive to feedback, questions, and contributions. Google develops Gerrit to support the Android Open Source Project (AOSP), but there are many other contributors, including Qt and OpenStack.
  • Quick release cycle. 2.2.x, 2.3.x and 2.4.x have all been released in the last year (with 2.3 and 2.4 both in the last several months). The branching of 2.5 is currently being discussed.
    • I pushed the Gerrit repository to GitHub as a way of getting some stats about commit activity. The data does indeed show that Gerrit is under active development, with the number of commits per month increasing slightly but steadily over the course of the past year. Ori.livneh (talk) 23:00, 13 July 2012 (UTC)[reply]
  • Handles 'pre-commit' style review. One of the major motivations for switching from Subversion was for the ability to review changes before they landed in trunk/master. This has stabilized our codebase and enabled the rolling release cycle to WMF wikis (currently every 2 weeks, working towards every week).
  • Command line access. It's possible to interface with Gerrit from the command line, which makes scripting Gerrit actions relatively easy, and provides a less hideous alternative to the Gerrit UI.
  • SSH handled by gerrit From a security perspective, it's nice that Gerrit doesn't use the system SSH, which means users don't have shell access to the gerrit host, and we don't need hacks like sillyshell.
    You could simply use a second ssh instance for that. Platonides (talk) 20:44, 13 July 2012 (UTC)[reply]
    • A second ssh daemon would still do system authentication, and would require something like sillyshell. It's nice to be able to separate system authentication from daemon authentication. It means we don't need to have a production system that allows authentication against Labs credentials. --Ryan lane (talk) 23:34, 29 July 2012 (UTC)[reply]
  • Abandoning Gerrit would mean changing a system users just got used to a little bit, and introduce another one that will have its own issues, and scare away beginners who are overwhelmed already by the number of tools and UIs they are supposed to use for labs. Mutante (talk) 09:15, 12 July 2012 (UTC)[reply]
    Part of the migration pains were due to the move from svn to git, which is more complex, so it's not an entirely fair comparison. Besides, those who are already used to Gerrit will have little trouble understanding a more intuitive system, and the future developers will have an easier learning curve to climb. --Waldir (talk) 17:01, 20 July 2012 (UTC)[reply]
  • Git, Gerrit and Bugzilla are all supported by eclipse.
    • This hints that some of the criticism stated is not be 100% relevant.
    • Every time there is a major change in the critical toolsets (bugzilla/git/gerrit/jenkins/labs) this is a major setback in time -- I need to keep abreast of changes in the Java ecosystem (maven/lucene/solr/haddoop/open relavance/carrot/uima/apertium/gate/etc...)
  • Gerrit has a large community behind it, including some organizations we work fairly closely with.

The case against Gerrit

[edit]
Primary editor: David Schoonover. Others should feel free to contribute.
Gerrit flotsam
Gerrit flotsam -- which we will be able to delete in v2.5
  • Gerrit has one key architectural assumption: It uses git commits and git repository to store proposed changes. While this is a relatively simple and nice idea, it has a far reaching consequences. Everything that is proposed (at various stages of readyness) is already committed to the repository. Some other code review approaches (like Linus' email workflow or Mercurial's patch queue) treat changes for review as freely floating patches, not commits (in git this means snapshots of directory trees), thus focusing on differences. Therefore it should be argued that Gerrit does not support pre-commit review, as suggested above; it's post-commit and before eventual merge to the deployment branch. One unit of review, a changeset is just one commit (patchsets are just different incarnations, or revisions, of a single commit). This assumption forces a one commit at a time workflow on developers and forces the use of git commit --amend as the only way to update patches. As a consequence, it is impractical to work on local branches with multiple commits to be merged - this is one of the reasons to use git at all (convenience single commit branches created by git review do not count). Even if multiple dependent commits are pushed to the repository, a need to update one of them breaks a whole chain of dependencies between them, since the original commit is left in the void and the subsequent changes depend on that change in git terms. Commands like git cherry-pick (without -n), git merge (other than --ff-only) cannot be used by unpriviledged developers; this is a consequence of the workflow imposed by the single-commit-based submission. Therefore integration with GitHub (as wished by some) will be almost impossible to implement - at least until Gerrit breaks away with the one change - one commit paradigm.
  • Gerrit is ugly, intimidating, and inhumane. More than anything else, we need people to actually use the system we choose: success is measured in activity, in contributions. If the tool is always in the way, or hard to figure out, or unpleasant to interact with, people will give up rather than use it. Gerrit is all of those things (with a healthy dollop of "doesn't actually work" thrown in at irregular-but-frequent intervals). I would never choose to subject my coworkers to such a tool, to say nothing for people freely volunteering their time and energy. Usability should be the first and most important criterion above all others, and for that, Gerrit has committed a cardinal sin. Dsc (talk)
    • Though this is a very impassioned critique, it lacks any actionable issues. Can you please expand this with exactly how the usability sucks? If we don't know your issues with usability, we may end up with the exact same issues you hate in a new system.--Ryan lane (talk) 18:58, 25 July 2012 (UTC)[reply]
  • Gerrit accounts must be requested. This slows down all contributions, and is particularly egregious in that it slows down onboarding of new WMF staff. Weeks have been lost waiting for all necessary rights to be given to new staff.
  • Gerrit does not have an official API.[1] Existing tools work by reverse-engineering Gerrit's internal APIs[2], which are largely undocumented.
    • Gerrit certainly has the SSH based API. Also, 2.6 has introduced an official REST API.
  • It is not possible to alter the sort order in which changes are listed.[3] Changes are listed by last update date, so the changes that have been neglected the longest are also the hardest to see. A feature request for parametrizing the sort order was accepted in 2010, but no progress has been made as of July 2012.[4].
    • I think this is a rather serious issue. Not being able to sort columns in Gerrit coupled with the fact that recently added/commented on patchsets jump to the top of the list makes it easy for older/unreviewed commits to languish in the backlog. I've made it practice when doing code review to start from the bottom of the stack to help combat this, but it's a total PITA and it is not likely common practice. Arthur Richards (talk) 22:01, 13 July 2012 (UTC)[reply]
  • Gerrit's developers want to move away from using a relational database as a data store. As a result, they are reluctant to invest time in improving Gerrit's query interface.[5] Such a change would entail a costly migration for us.
  • Gerrit's codebase is written in Java and Prolog(!)[6] and uses Google Web Toolkit. Few (if any) engineers at the foundation are proficient with this stack. Large swaths of code in the Gerrit codebase are undocumented, making it hard to understand or modify.
    • I dont see this as a problem necessarily unless we expect to modify the codebase. As for Prolog and Java, I have coded in both languages (it has been a while using Prolog, but it is fairly easy to pick up if ever this became necessary). Ssastry (talk) 15:58, 11 July 2012 (UTC)[reply]
      • I disagree, its a very very minor point to be sure, but things are better when most people can read/edit the code, since its easier to fix issues when they arise, and customize to our needs. (I also know prolog, and like that language, but it is an obscure language). Bawolff (talk) 19:48, 11 July 2012 (UTC)[reply]
  • The ability to add a repository requires administrator privileges on Gerrit. However, administrator privileges also grant unlimited access to gsql, a raw SQL interface to the Gerrit database.[7] As a result, administrator rights cannot be extended beyond a narrow group of people (currently five).[8] This creates a frustrating bottleneck for developers, who must compete for the attention of the overburdened administrator group for routine Git requests.
    • Creating a gerrit project does not require admin access, it requires Project Creation access. Which was granted to a larger group of users. [9] Additionally, there is currently discussions going on to implement a "request queue" type of system so users can say "I want a repo called foo" and the approver can just come along and press a button.
  • Gerrit is slow.[10][11][12] The "Groups" listing takes almost a minute to render.
  • There is no way to delete projects. A request for this feature was opened in 2009 and has been starred by over a hundred people.[13]
    • In Gerrit 2.5, there will be a new plugin that makes this possible. [14]
  • As of 19:59, 10 July 2012 (UTC) There are 164 issues on the Gerrit bug tracker that are status:accepted and have no owner.[15] Of those, two are set as "critical" and twelve are set as "major".[16]
    • I don't neccesarily see that as a bad thing. Have you seen our bug database? Better they track issues that they don't have resources to or otherwise aren't working on now, then to just simply sweep them under the rug. Bawolff (talk) 19:48, 11 July 2012 (UTC)[reply]
  • Post-merge review is impossible. Comments just get lost and bugs are not effective.
    • Lost how? Do they not save, or are there error messages? What do you mean "bugs are not effective"? --brion (talk) 17:49, 11 July 2012 (UTC)[reply]
      • I think that may be the same thing I was going to post. Once a change is merged, it's sacred. You can't mark it -2 even if it's a php error. You can provide a comment, but nothing marks it as requiring attention (how many mails reminding fixmes were sent by hexmode?).
      With a "magic MW VCS" it would be "unmerged", and leave pending for a new merge after it gets fixed. As git doesn't support it (without producing merge conflicts to git users), the most sensible approach would probably be to "freeze" master until a patchset which solves it gets merged. Or even without such strict ruling, if it keeped a track of broken ranges that would be useful (obviously show a big warning when master or a branch is inside such range...).
      Platonides (talk) 20:36, 13 July 2012 (UTC)[reply]
    • post-merge workflows are almost impossible; self-review is a hacky and tedious workaround, see the recent discussion about the operations/puppet repository and the percentage of self-reviews.
  • It is difficult to find or track projects unless someone walks you through it. For example, there is only one very small, hidden link to view the list of projects. This makes the state of any particular branch pretty opaque and potentially creates an issue for managers or others who want to have a window into our code.
  • There is little to no ability for non-technical staff or volunteers to meaningfully contribute (e.g. to processes like code review or simply commenting). Compare to GitHub or even Bugzilla, where FOSS projects see a large amount of comments or other contributions from designers, product managers, or users whether they have familiarity with git or commit access.
    • Can you articulate more clearly what the issue is here? Anyone with an account can comment or even -1/+1 commits. And we have a very large and complex codebase (MediaWiki + extensions + ops stuff + tools and misc), no matter where we host it, which of course makes it harder for people to jump in and comment. Does GitHub offer specific tools that would help engage non-technical folks, or are you just saying "Gerrit's UI sucks" one more time?.--Eloquence (talk) 17:41, 11 July 2012 (UTC)[reply]
      • I suspect the main problem comes down to 'accounts must be requested' up above -- you can't make comments until you've been placed in the system, whereas github, bugzilla etc allow open registration. --brion (talk) 17:46, 11 July 2012 (UTC)[reply]
        • It's not just having to request an account. Unlike tools such as GitHub, Gitorious, or Bugzilla, it is really really hard for anyone who's not a developer to find the relevant projects, relevant commits, and then contribute comments that are not strictly part of the code review process. For example, I had to go ask RobLa in IRC about how to find a simple list of projects. In GitHub, a repository is not just easier to navigate through the GUI (AKA the issue Erik is talking about) but is very tightly integrated with tools that anyone can use to track what is happening in a project (i.e. watching users and repos) and then contact someone when they have a comment. I find it was actually better when we were on SVN and the only Web interface was the comment-less ViewVC, because it forced conversation either on to Bugzilla or on-wiki. One of the largest weaknesses about the MediaWiki development community is the fact that we have historically not done much to include designers, product people, and users in the collaboration on code. This is how we ended up with the MediaWIki we have today, which everyone agrees has deep issues with usability. We need to strike at the heart of this problem by choosing tools that work for everyone who has a stake in the project, not just people committing or reviewing code from a purely technical standpoint. Steven Walling (WMF) • talk 20:38, 17 July 2012 (UTC)[reply]
          • A more general note: from my interaction with the Gerrit community, it is usually embraced by the enterprise deployments. It seems that tighter control on access, permissions and workflow appeals to those kinds of users. For this reason it might be also more suitable for Wikimedia operations than to the development community at large.  « Saper // talk »  23:46, 7 August 2012 (UTC)[reply]
  • UI sucks profoundly: too wide, clumsy, no visual comparison for image changes (ol' good CodeReview did this!), no "view this file" link, just "download as zip" (WTF????). Almost no flexibility when it comes to searching for anything other than the most recent activity overall.
    • This was a configuration issue, we've had image diffs available for a couple of weeks now.
  • Operaphobic
  • Per the UI sucks, It's not desinged to encurage volenteer development, its geered towards a single project, single team proccess for deployable code. Look at the main "landing page" ... there is little concept of what projects are represented, no per project pages with descriptions, no agragate views of which projects are more active than others, no person icons, no sense who is working on what, too focused on individual commits rather then the final change state represented by a pull request for a particualr feature or refactor. There is a false sense of linear pipeline rather than a distributed collection of interconected activity. There is no big "fork me" button! ( there is no per-project interface views, so not clear where you would put that ). Aside from getting deployable code the tools for feature develop should also encurage expeirmentations and be fun. Github is good at doing this, makes coding and sharing "fun". Gerrit is not fun. --Mdale (talk) 23:28, 11 July 2012 (UTC)[reply]
  • Sometimes, development requires a number of commits before a feature/bug/change/refactoring can be fully worked out. So, code review requires a review of a set of commits rather than individual commits. I may not be fully aware of Gerrit's abilities, but a commit-based review model gets in the way of such review flows. Alternatively, it prevents developers from committing something and amending commits till all work on that phase is complete. This may also encourage work in smaller reviewable chunks, so, I am not going to make this a strong case against gerrit. I prefer frequent and regular commits to let others work on it or take over partially done work -- and the in-progress commits may not necessarily be reviewable or need to be reviewed. Ssastry (talk) 03:32, 12 July 2012 (UTC)[reply]
    • This is not a hypothetical situation. I am right now reworking some code and I dont see a way to get everything reviewed at once. I have to either rebase all commits to squash them into one giant commit (which is totally undesirable), or I have to submit all commits as a chain of dependent commits, and hope that I dont have to amend any of the earlier commits as part of a review, or work out an arrangement with the reviewer to accept my commits en masse and that I'll made any fixes as part of a new set of commits which could be reviewed. Am I missing some Gerrit feature which will make this less of an issue? There was an email recently on wikitext-l where Mark Bergsma was asked to squash his commits (http://osdir.com/ml/general/2012-07/msg20847.html) -- I personally think this is bad practice. -- Ssastry (talk) 21:54, 13 July 2012 (UTC)[reply]
    • Related comment: dependent commits are problematic because amending an earlier commit breaks the commit sequence and I have to either amend all later commits or break up the commits into independent branches by rebasing (which I used recently when I forgot to use topic branches). -- Ssastry (talk) 21:54, 13 July 2012 (UTC)[reply]
  • Can't search for file names / changes that touch a file. We no longer have a database table that enumerates every file modified in a change, which makes it impossible to query for the changes that touch a file. [1] Mutante (talk) 09:12, 12 July 2012 (UTC)[reply]
  • It's related to Ssastry's comment a bit above this one, but simplified and generalized: Gerrit suggests using lots of amends. The more natural Git way of improving on an idea is adding another commit in the branch. That's how Github works. It allows to see how the idea developed. As far as I understand it, amends are supposed to be used only for fixing commit messages or embarrassingly mistaken commits, but Gerrit takes amends to the extreme. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)[reply]
  • In some cases Gerrit shows unrelated files as files that were changed in this commit. I never quite understood which cases are these exactly, but it has something to do with rebasing. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)[reply]
    • This has to do with rebasing between changes, and work is being done in master right now to resolve this issue.
  • A changeset can't be applied to several branches. If the fix landed on master has to be committed to REL1_19, REL1_20, wmf/1.20wmf7 you need to submit other three independent changesets. Expected: you could request a merge to several branches on one commit, and perform the merge on that same page (assuming it merges fine, which it probably does). Platonides (talk) 20:42, 13 July 2012 (UTC)[reply]
  • UI: I wanted to look up one of my older reviews on gerrit so I could paste a link here for documenting the dependent commits amend problem, but my dashboard only shows me the most recent X actions and I cannot go back in history. -- Ssastry (talk) 21:54, 13 July 2012 (UTC)[reply]
  • LDAP support in Gerrit is kind of afterthought. HTTP, SSH and other interfaces are highly decoupled inside of Gerrit and there is hardly a common, application-wide, notion of the user and user authentication. That's why SSH supports only keys stored in some SSH-specific store; that's why issue 1124: Use LDAP for ssh keys is not easy to solve.
  • Baroque permissions. It is difficult to find out why something is not working and which permissions are exactly needed to achieve something without detailed analysis of global, group and per-project permissions.
  • Gerrit defeats git capability to support off-line and detached work. While, with some effort (git fetch remote refs/changes/*:refs/changes/*) it is possible to download refs for all changes pending in gerrit, it is impossible to review and comment on the code off-line and send review results upstream. So, no off-line work. Also difficult to work from command-line only (only some operations are possible via SSH). See also Git/Conversion#Rationale.
  • Confusing naming of git refs: You have to use refs/for/master (or refs/publish/master is supposed to take it over in future - TBC) to submit a change (git-review seems to be born to simplify this for end user). Then the patchset gets named refs/changes/14/3714/1 but you can still push changes to refs/changs/3714. Looks like the need to hash refs/changes directory and ability to clone the Gerrit-specific ref tree (git fetch remote refs/changes/*:refs/changes/*) forced Gerrit developers to expose that confusing naming to the end users. Pity that 14/3714/1 label gets exposed in gitweb... Looks like a case number in some Kafkesque trial.
  • Gerrit produces a messy git history, impossible to read e.g. with gitk or bisect, due to the significant amount of (useless) merges.
  • Gerrit's email notifications are close to useless. There's a lot of repetition between mails, little context about the changes and no way to actually get the unified diff. Extending it to do it (something that could be a three-line shell script with git) seems non-trivial.
  • There's no single page diff or side-by-side. There's a "diff all" which spawns a new tab for every file touched in a commit. This makes it difficult to review changes that spawn multiple files without going back and forth between your browser's tabs.
  • Gerrit's urls are not very human-readable and their syntax is not intuitive ("r", "#", "q", ",n,z", ";a=", etc.)
  • Gerrit's project/repository listing is in the admin panel. While the blokes from the project claim it can handle over 9,000 repos, I am not terribly excited about scrolling through that list, or explaining it to new users.

Requirements filled by Gerrit

[edit]
MUSTs
[edit]
  • It must be fast and responsive
  • It must not be ugly
    • Recent stylesheet changes have addressed the problems with the most egregious problems with the default stylesheet (the main background colors and the abrupt boundaries)
  • It must be extensible
    • Gerrit 2.5 adds support for plugins (stable but shallow API) and extensions (unstable but deep API). This should make it easier for us to add MediaWiki/WMF-specific features we want or need (in fact, "Project Deletion" is already being written as a plugin!)
  • It must support pre-commit review well
  • It must support automated actions from Wikimedia-approved bots
  • It must have robust ACLs (grant/deny permissions on repository level and branch level)
  • It must support private repositories (Labs puppet)
  • It must support automatic merge on approve
  • It must have command line access
  • It must have a license the Wikimedia Foundation can agree to (or we must have a plan to negotiate it)
  • We must have a plan for on-site-only, private repositories in some cases
  • We must have a plan for automatic repository mirroring
SHOULDs
[edit]
  • It should be something Ops will migrate to as well
  • It should have LDAP integration
  • It should be open source
  • It should be under active development, and not rely too heavily on Wikimedia involvement
  • It should have built-in SSH support

Plan to make Gerrit conform to requirements

[edit]
MUSTs
[edit]
  • It must be reasonably intuitive and user-friendly
    • Submitting inline comments requires submitting a "cover message" -- this confuses users
    • General usability work
    • Better searching - Lucene support
    • Different sort orders on searches
    • See other bullet points below
  • It must make source code easily discoverable
    • Publically accessible (and findable) project list
    • Well-integrated source code browsing
  • It must support post-commit review reasonably
    • Even for MediaWiki where we prefer pre-commit review, post-commit review on branches would be wonderful
    • Post-commit commenting after merge
    • Need a way to query for FIXMEs in merged code
  • It must have easy and clear diffs between commits
    • May just be a matter of training - perhaps intro video?
    • Merge commits have confusing diffs -- work upstream ongoing to fix this
SHOULDs
[edit]

The following are strongly suggested features:

  • It should be beautiful
    • Beauty is in the eye of the beholder, but most would argue Gerrit needs improvement in this area. Currently, the hardest aspect of improving the appearance is the way CSS is used. (XXX-insert link to Gerrit request).
    • Need better default skin
  • It should be completely intuitive and user-friendly
    • See "reasonably intuitive" section above
  • It should support pre-commit and post-commit review workflows equally (un-opinionated / does not impose a particular workflow)
    • Gerrit is just going to be opinionated, but see the
  • It should support pull requests
    • GitHub integration would be wonderful
  • It should enable deep integration with bugzilla
    • Awaiting plugin support, and someone with the time to do this. Use JIRA plugin as example.
  • It should offer self-serve account creation
    • Ryan Lane has this on his TODO list. (XXX-link to ticket)

See also: Gerrit bugs that matter

GitHub

[edit]

How GitHub would be configured for Wikimedia use

[edit]

Deploy and development repositories:

  • Maintain 2 repositories: one "deploy" repository that has very few admins, and another "development" repository that has a lot more trusted developers.
    • Why not simply use "deploy" and "development" branches? No one should be committing directly to the main repository anyway -- access should be mediated via pull request from user repos. This is the point (and power) of the "fork". Dsc (talk)
  • Code gets into the deploy repository only via pull requests from the development repository -- this can only be done when "trusted" developers/admins do necessary code reviews and issue pull requests to the deploy repository.
  • Periodic downstream rebasing/merging from the deploy repo into the development repo, if necessary (to account for any direct commits into the deploy repo).

To ensure bad code does not get into the deploy repository, development repository might have to be managed as follows:

  • Topic branches are used for code that requires review.
    • There's no need for topic branches if all merges are pull requests from user repositories. Dsc (talk)
  • Commits on the master branch that are not reviewed, and cannot be part of a pull request are moved to a branch.
  • Problematic commits into the development repo either get fixed or reverted.
    • (I know I'm a broken record, but...) There's no need for reverts if all merges are pull requests from user repositories. Dsc (talk)
  • Protocols required for who can issue pull requests into the deploy repository.
    • There's no need for a protocol on issuing a pull request -- anyone should be able to do that. Since each pull request opens an issue, we simply close the bad ones with an explanation of why it's not being merged. No need for a new workflow, and we always have an archive of both successful and unsuccessful requests. Dsc (talk)

Code gets into the development repository in 2 ways:

  • trusted developers (staff, volunteers, aliens) get commit rights and can directly push code.
    • I'm of the opinion that no one should commit code directly to the repository. Trusted contributors should be able to moderate pull requests and accept them for merge, but without going through that process, there's essentially no review. To me, the pull request approval process is the review process.
  • anyone else in the world (who dont have commit access to the development repo) forks the development repo and issues a pull request into the development repo.
  • pull requests are used to do code review.

Reviewing development repo commits:

  • Via github comments -- drawback is that there is no way to track reviews. Given current review requirements via Gerrit, unlikely to be an acceptable model.
  • Via pull requests on topic branches -> master of the deployment repository.
  • Via an external standalone review tool like barkeep.

The case for GitHub

[edit]
  • Better usability all around
  • Anyone can participate in development -- creating user accounts, forking, commenting, submitting issues, etc. are all easy and open to anyone and removes WMF from the loop as far as these areas are concerned.
  • Commit rights determine who gets to push code to the repo.
  • Open source projects on GitHub are accessible to thousands of the most skilled, professional Web developers around,[17] making it the best location to recruit new participants without spending donor funds on outreach
  • Already includes outstanding materials for introducing developers to git,[18][19] providing better documentation than we currently have
  • Strong cross-platform support (at the desktop and browser level)
  • Allows access using SVN[20] - which means people used to either SVN or git can both contribute without extra learning curve.

The case against Github

[edit]

Drawbacks:

  • Reviews are done through pull-request (a bit like Gerrit merge requests). However contrary to Gerrit, those with the rights to merge to master are also able to push to master directly (bypassing review).
    • This is not really a drawback, in Gerrit it is also possible to practically push to master by self-reviewing. This is a matter of setting rules, it is not an issue of trust or technical requirements. GitHub doesn't force people with merge access to push, it just means they are technically able to. It's no different from Gerrit in practice. The difference is that anyone (whether someone with merge access, a known volunteer or a brand new contributor) can submit a pull request.
  • Depending on review requirements, could require a standalone review tool like barkeep to track reviews.
  • Depends on developers acting trustworthily and following protocols (ex: use topic branches, use pull requests to merge into master, etc.).
  • Requires MW contributors to get an account at a third party site which is time consuming
    • (discussion) Really? How is that time consuming compared to having to manually request an LDAP account and waiting for someone to review and create it. At GitHub you create an account in under a minute and off you go without needing dedicated user rights. Krinkle (talk) 05:08, 18 July 2012 (UTC)[reply]
    • It would also decouple getting a developer account from getting a Labs account, which makes the barrier to entry to Labs higher.
  • Unnecessarily bringing in a further foreign authority is against hacker ethics, as hacker ethics call for decentralization.
  • Github's terms of service contain items that allow to disconnect users at GitHub's sole discretion: “If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.”[21]
  • github.com's web ui itself is not free software nor open source. We want to use free tools to develop free software, so use alternatives like gitlabs hosted on our own infrastructure. (independent from the gerrit evaluation). There is an Enterprise product to have github on your own servers. [2] but we should not pay if there are free alternatives. Mutante (talk) 09:37, 12 July 2012 (UTC)[reply]
    • While I understand and appreciate the impetus behind this, 'free tools to develop free software' principle also means all mediawiki developers would have to stop using macs, ibooks, iphones, ipads during development. That means, Wikipedia could not be supported on iphones. No using Amazon EC2 either. I am being facetious, but only to make a point. You have to draw the line somewhere. To me, more pertinent is the ownership of code, data, and interactions on the site. There is something to be the said for the collaborative and social nature of Github. -- Ssastry (talk) 12:16, 12 July 2012 (UTC)[reply]
      • It's one thing to allow people to work with/for us using non-free tools (e.g. many Wikimedia Foundation employees prefer using some non-free stuff because they feel more productive and happier using those tools). It's another thing to make use of non-free software (and agreement to a corporate EULA) a requirement to work with us. The latter is much dicier, especially for volunteers. -- RobLa-WMF (talk) 18:54, 18 July 2012 (UTC)[reply]
    • We can't use github enterprise. Totally ignoring the high cost, it's terms disallow public repos, which is a non-starter.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)[reply]
  • This is not inevitably a minus, but at least a flag: GitHub recently received $100M in VC funding. Companies that receive large amounts of funding often pivot in order to meet investor objectives, so there may be large and sudden changes in GitHub's strategy affecting open source developers. While one can't strictly speak of vendor lock-in given that Git itself is open, moving our entire tooling to GitHub and then moving away from it if they do something we don't like is likely to be very difficult (given how difficult it is today to move away from Gerrit, which is entirely open source). While we can at least fix Gerrit if it bugs us, we will have limited recourse with GitHub. IMO it would be strategically unwise and inconsistent with our values to support a semi-open ecosystem and put our eggs in the GitHub basket.--Eloquence (talk) 17:37, 17 July 2012 (UTC)[reply]
    • Note though that because of the nature of Git, we don't truly rely on GitHub for anything. The repository can be pushed anywhere else, so in case of an emergency decision to move away from GitHub there is nothing that keeps one from doing that. "Everybody" has a local mirror of all public history. I agree, though, that it would be inconsistent with our values. Krinkle (talk) 05:13, 18 July 2012 (UTC)[reply]
  • It appears that github does not like giving access to bots (TOS: “You must be a human.”). We'd have to work around this item in a legally sound way to get things like L10n-bot.
    • This is false: GitHub has a fantastic and well-documented API[22]. Bots merely must be operated under a user account, via an API key (like all respectable websites). Dsc (talk)
      • Correct me if I am wrong, but it seems to me that this API is pretty useless for our L10n-bot, as the API does not allow to push. Besides, the current API v3 is rate limited. While I do not know if the 5000 requests/hour are a currently a problem for our users (as the limit is per user, not per bot), we have no authority in bumping this limit.
  • No true pseudonyms on github. Github at least knows your real name, even if it might not show it (TOS: “You must provide your legal full name, [...]”)
    • This is a fairly major point. We actively encourage users to use pseudonyms if they want to. In some countries (Japan for instance), people don't use their real names online, and prefer to be totally anonymous if possible. Enforcing a real-name policy ensures that we will alienate a number of potential developers.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)[reply]
  • Operations would very likely not move their repos to github, as we can't trust our repos to a third party for security reasons. If we used github we'd split our workflow between two places, and that's a bad idea.--Ryan lane (talk) 18:55, 25 July 2012 (UTC)[reply]

Phabricator

[edit]

How Phabricator would be configured for Wikimedia use

[edit]

Phabricator consists of:

  1. Code review (Differential) -- This is the part we need
    Note: since 2018 a volunteer proposed to dismiss Differential and we are doing it (phabricator:T191182)
  2. Command Line Support (Arcanist) -- Nice to have, Gerrit also has
  3. API (Conduit) -- see below
  4. Source browser (Diffusion) -- Nice to have, needs further examination
  5. Bug/issue tracker (Maniphest) -- Not necessary for us (part of another discussion)
  6. Wiki (Phriction) -- Useless to us
  7. Support for different versioning systems -- not needed
  8. Notification and flagging system (Herald, Flags) -- Nice to have, needs further examination

What Phabricator is missing for our workflow

  • robust ACLs? see [3].
  • ...

Resources:

The case for Phabricator

[edit]
  • FOSS and PHP-based. Even babies and dogs can contribute.
  • Active developer community: ~1200 commits/year from 40 authors (Phabricator) compared to 500 commits/year from 50 authors (Gerrit) [4]
  • Has an official API, unlike Gerrit
  • Good command line support through Arcanist
  • Includes project activity feeds that normal humans can use
  • Search works very well, unlike Gerrit
  • Supports open registration with a variety of methods
  • Pre-commit and post-commit review workflows

The case against Phabricator

[edit]
  • Has stuff we don't need
    • Removing options from a software package being somewhat less than ideal, this might be problematic, but since the main bits of the software are rather good, it might be worth it to strip away what we don't need, or simply ignore it. --MarkTraceur (talk) 18:35, 26 July 2012 (UTC)[reply]
  • Doesn't manage repositories - would have to be interfaced with another repository management tool, or such a tool would have to be built
  • The permissions system doesn't seem to be very fine-grained, e.g. not designed with large public projects with varying levels of access involved (volunteers, staff, etc.)
  • The test instance seems to require logging in to view anything?

Online Demo

The case for GitLab

[edit]
  • Well maintained (active, monthly release schedule)
  • Inspired by GitHub, looks a lot like it
  • Self-hosted, not an external service
  • Easy to use & pretty
  • Manages repositories (using a custom-written system) and users; not just a code review tool
  • Sane repository listing
  • Supports LDAP authentication
  • Granular permission scheme
  • Pre-commit and post-commit review workflows
  • Has an external API
  • Somewhere in the middle between Gerrit & GitHub

The case against GitLab

[edit]
  • Impossible to find anything with the user interface.
  • [No longer true] No way to insert inline comments like with Gerrit, which is far supperior in usability btw!.
  • All change diffs in a single page, which is completely dumb. Now way to view the changes file by file.
  • [No longer true] Comments are published immediately which is extremely counter productive. With Gerrit you review the code, place your comments, review your comments and publish them when you're done. Much better.
  • No way to amend changelists once it's pushed. Creates ridiculous situations where you basically have to create new CLs to address code review comments!... rather than amending the existing change list.
  • All in all, a very very poor code review tool!... Gerrit is far more usable albeit a little harder to setup.
  • [No longer true] "does not allow branch-level access control" [5] (note: doesn't this conflict with the "Granular permission scheme" item above?)
  • [No longer true] "All projects are system level, you can’t create project under your account. As a developer, you can only create branches in the repository."[23] (note: how is Gerrit different?)

Other points to consider

[edit]
  • Written in Ruby on Rails
  • Uses a "pull/merge request" workflow (that some people love and others hate)
  • Includes an issue tracker and a... wiki; probably easy to disable, at least in the HTML from showing
    • The bug tracker might be an advantage: it could replace Bugzilla, people seem to want a replacement for that as well and a separate account would no longer be necessary
      • [no longer true][24] "Issues are not related to commits; in order to fix an issue, you need to submit a merge request, and then close that issue."[23]
  • [No longer true] All projects are system level; users can't create projects under their account. Developers can only create branches in the repository.[23]
  • [No longer true] "All projects are private; the project owner needs to assign other developers to his/her project."[23]

Extended evaluation

[edit]

How this would work

[edit]

This alternative is the "keep looking" alternative. This assumes that dissatisfaction is very high with Gerrit, but that we won't have the tools to make a decision by August 10. This proposal suggests that instead of spending time and money improving Gerrit, the Wikimedia Foundation invests in a more serious and disciplined investigation of the serious alternatives above.

The case for extending the deadline

[edit]
  • The people arguing against Gerrit don't have enough time to fully stage alternative setups, and that is what is needed to make an informed decision
  • See "the case against Gerrit" above.

The case against extending the deadline

[edit]
  • This takes away from our ability to invest in making Gerrit better, without actually making life easier for developers for the foreseeable future.

2015 comparison by iOS team

[edit]

See https://phabricator.wikimedia.org/T95749


Features Qualities

CI integration task integration notifications lock-in git workflow viewing diffs/changes ease of use iteration during code-review community engagement community support
GitHub yes, has hooks for many CI providers (jenkins, hubot, travis, etc.) need to add GH integration/webhook to phabricator, has integrations for other task managers though supported, needs setup easy to push git repo to another remote nothing on top of git except for pull-requests (as flexible as git is) state of the art simple as pushing to a branch (forced/rebase/squashed whatever you want) iOS/ObjC/Swift goes to GitHub as the norm, already getting PR's (though they get ignored) most tools that come out look to integrate as seamlessly as possible, much more likely to get support if GH integration is broken
Gerrit only what we set up (existing zuul/jenkins hooks) already integrated with Phab, unclear how to integrate with other tools already setup to notify IRC channels depending on repo easy to push git repo to another remote forced to use branching scheme and have change-id hook, severly lacking, many usability issues forced to amend commits, losing history and comment continuity can't think of many public iOS projects except ours using Gerrit for code review. there seems to be some community support for gerrit, but our lack of internal enthusiasm makes it challenging to get support for making gerrit better
Differential + Arcanist currently unsupported part of phab, so integration should be seamless unclear support, but can be done through CI if not through VCS easy to push git repo to another remote seems more analogous to GitHub, but unclear what the flow is actually like seems better than gerrit, but unfamiliar seems better than gerrit, but unfamiliar unknown unknown, maybe get support from other Phab users or Phab maintainers?

Notes

[edit]
  1. https://groups.google.com/forum/?fromgroups#!topic/repo-discuss/RC4xNQVCMRs
  2. https://github.com/bcwaldon/python-gerrit/blob/master/README
  3. http://gerrit.googlecode.com/svn-history/r3021/documentation/2.1.4/user-search.html
  4. http://code.google.com/p/gerrit/issues/detail?id=536
  5. http://code.google.com/p/gerrit/issues/detail?id=536
  6. https://gerrit.googlesource.com/gerrit/+/master/gerrit-server/src/main/prolog/gerrit_common.pl
  7. http://gerrit.googlecode.com/svn/documentation/2.1.2/cmd-gsql.html
  8. https://gerrit.wikimedia.org/r/#/admin/groups/1,members
  9. https://gerrit.wikimedia.org/r/#/admin/groups/119,members
  10. http://code.google.com/p/gerrit/issues/detail?id=950&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  11. http://code.google.com/p/gerrit/issues/detail?id=954&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  12. http://code.google.com/p/gerrit/issues/detail?id=1370&sort=priority&colspec=ID%20Type%20Stars%20Milestone%20Status%20Priority%20Owner%20Summary
  13. http://code.google.com/p/gerrit/issues/detail?id=349
  14. https://gerrit-review.googlesource.com/#/admin/projects/plugins/delete-project
  15. http://code.google.com/p/gerrit/issues/list?can=2&q=status%3Aaccepted+-has%3Aowner&sort=owner&colspec=ID+Type+Stars+Milestone+Status+Priority+Owner+Summary&cells=tiles
  16. http://code.google.com/p/gerrit/issues/list?can=2&q=status%3Aaccepted+-has%3Aowner&sort=owner&colspec=ID+Type+Stars+Milestone+Status+Priority+Owner+Summary&cells=tiles
  17. https://github.com/popular/watched
  18. https://github.com/training/free
  19. http://learn.github.com/p/intro.html
  20. https://github.com/blog/966-improved-subversion-client-support
  21. https://help.github.com/terms-of-service
  22. http://developer.github.com/
  23. 23.0 23.1 23.2 23.3 http://blogs.perficient.com/multi-shoring/blog/2012/05/02/opinionate-review-on-gitlab/
  24. Issues, Merge Requests and Integrations in GitLab