Jump to content

Manual talk:Coding conventions/PHP/Archive

From mediawiki.org

Assignment expressions

This is neither an error nor is it surprising, and in 5.3x allows direct access to referenced element:

$this->assertInstanceOf( 'ExpectedClass', $obj = new $testClass );

- Amgine (talk) 05:10, 25 August 2012 (UTC)

Same to me. It s neither surprising nor is space cheap (as my displays are always having less lines than desirable for a good code structure overview) and it is much much less readable. You do not forbid $x++ either, do you? --Purodha Blissenbach (talk) 09:16, 2 October 2015 (UTC)

"comment should be put on its own line." ?

This rule can be found in the Spaces section of these conventions. Apparently it was moved here in r547162. I can't really find the related text in the old rules in the Coding conventions diff between before and after the move so this has been added during the move, please correct me if I am wrong. So does this rule make any sense? Comments are used in the same line with code all over the core code. --Danwe (talk) 12:58, 14 September 2012 (UTC)

Since nobody seems to know about this, I have just removed the rule again as described above. --Danwe (talk) 10:01, 27 September 2012 (UTC)

Spacing within brackets

Judging by the examples, I guess we don't use spacing within brackets; so use, e.g., $options['q'] rather than $options[ 'q' ], eh? Leucosticte (talk) 17:39, 10 November 2012 (UTC)

Correct. I've noticed several extensions incorrectly leave space around $wgHooks keys, e.g. $wgHooks[ 'BeforePageDisplay' ][] = ... -- S Page (WMF) (talk) 06:29, 6 July 2013 (UTC)

Ternary contradiction

It says we don't allow it because it only works in PHP 5.3, before noting we now require PHP 5.3. So the only reason to forbid it would be for reasons of readability.

I think it's a useful occasional shortcut, so we should allow it. But if we want to forbid it, it should just say that's a convention we're setting. Superm401 - Talk 06:52, 11 January 2013 (UTC)

@param format

There are two ways given to use @param:

  1. @param type $varname: description
  2. @param datatype1|datatype2 $paramname description

The first one specifically says, "Multiple types can be listed by separating with a pipe character." so, the only real difference is the colon (ignoring the space for now).

The second is correct. To show examples, isUtf8 (generated doc, code) does it that way, and it is correct in the HTML. The type is italicized on the left (if you inspect it, you'll see the HTML class is "paramtype". There is no extra comma for each parameter in the generated docs.

For an example with colons, you can look at a method called outputFileHeaders (generated doc, code). It has two parameters, one strictly matching the first, and the other with an extra space before the colon.

But both the ways with colon are wrong. You see both colons and in the HTML, and there's a spurious comma after one. There's a lot more inconsistency in other methods. Bottom line, I propose we use:

@param datatype1 $paramName description

It's the same as #2 above, except I made the parameter name match our conventions, and removed the second type. To be complete, we can also show:

@param datatype1|datatype2 $paramName description

I'm not proposing a big fix up commit, just that we do this for new code and when we're already touching existing code. Superm401 - Talk 07:30, 11 January 2013 (UTC)

+1 But what about @param $paramName datatype1|datatype2 description I think it is also used and seems to work fine. --Danwe (talk) 22:19, 11 January 2013 (UTC)
It is used, but it actually does not work (maybe it once did, e.g. with PHPDoc) right. Example, Article::_construct (generated doc, code). Unlike isUtf8], the types are not in the right place, nor do they use the paramtype HTML class. They're just crammed into the description. Superm401 - Talk 14:20, 12 January 2013 (UTC)

Control structures

This page currently says this:

Opinions differ as to whether control structures if, while, for, foreach etc. should be followed by a space; the following two styles are acceptable:

// Spacey
if ( isFoo() ) {
	$a = 'foo';
}

// Not so spacey
if( isFoo() ) {
	$a = 'foo';
}

Is it time to pick one, at least for new code?

Some statistics: in mediawiki/core, it seems about 84.7% uses the spacy style. Broken down by keyword:

Keyword Spacey Not-so-spacey % spacey
elseif 1394 250 84.7932
for 231 67 77.5168
foreach 1987 413 82.7917
if 14797 2476 85.6655
switch 138 116 54.3307
while 297 74 80.0539
TOTAL 18844 3396 84.7302

It looks to me like we should standardize on "spacey". Anomie (talk) 18:58, 22 January 2013 (UTC)

+1. --Amir E. Aharoni (talk) 19:06, 22 January 2013 (UTC)
Support. --Krenair (talk • contribs) 19:26, 22 January 2013 (UTC)
Support I think we already use spaces where most people don't (inside parens). If we now decided not to use them where most people do (after control structures) I would really call our coding conventions perverted. So +1 for spacey here. --JGonera (WMF) (talk) 19:37, 22 January 2013 (UTC)
Oppose Oppose A space before the first opening parenthesis should match all other function calls, instead of having a separate one. Think of the new developers - we break away from most coding practices of other projects, shouldn't we try to minimize the differences rather than introduce more of them? There is no need to have a unique one for control structures - they are already highlighted by the IDE and followed by the indented block. --Yurik (talk) 21:58, 22 January 2013 (UTC)
Can you provide examples of popular open source projects which don't use spaces after control structures? I just checked the first that came to my mind (Linux kernel) and it does have spaces. On the PHP front, the most popular web frameworks (Symfony, Kohana, lithium) also use spaces. --JGonera (WMF) (talk) 22:40, 22 January 2013 (UTC)
Funny - I just checked linux kernel and gnu specs too, and they have a space after if (), BUT they do not have spaces inside the (), which balances the visual of the parens. A quick google code search shows over 4 million hits for various projects, but this number is significantly lower than the one with the space. Again, it's not a fair comparison because they don't have spaces inside. What I meant initially was that I don't know any major projects that puts spaces both inside and outside, so we should try to get rid of at least one to match. --Yurik (talk) 01:56, 23 January 2013 (UTC)
Support I'm fine with the status quo, which is also consistent with our JS coding style. Let's not change the standard away from what 84% of the code already uses. --Catrope (talk) 22:08, 22 January 2013 (UTC)
Stylize.php also converts forcefully to spacey format. --Nikerabbit (talk) 07:43, 23 January 2013 (UTC)
Spacey is a bit closer to PSR-2, which the PHP world seems to be moving towards. --Tgr (talk) 11:38, 26 January 2013 (UTC)
Support Platonides (talk) 16:52, 27 January 2013 (UTC)
Support I really don't care as long as we pick one. Parent5446 (talk) 05:27, 27 February 2013 (UTC)

When I wrote the document, I left the question open out of respect for Brion, who wrote code using the not-so-spacey style. I was blind to the difference until I wrote stylize.php, which demonstrated differences between my style (which stylize.php represented) and code written by Brion. -- Tim Starling (talk) 06:53, 13 May 2013 (UTC)

Recommending some minimum documentation practices

I would like to propose adding the following paragraph to the beginning of the "Comments and Documentation" section:

It is essential that your code be well commented so that other developers and bug fixers can easily navigate the logic of your code. This is especially important for an open source project like MediaWiki where technical complexity can be a deterrent to volunteer collaboration. To make your code easier to navigate, it is recommended that you add comments providing brief descriptions of new classes, methods, and member variables (unless the functionality is obvious). It is also recommended that new methods include parameter and return definitions when applicable.

Kaldari (talk) 23:57, 5 July 2013 (UTC)

(ec) this seems like something that pretty much went without saying. The only part i would remove is deterent to volunteer contribution. Decent docs benefit everyone. I wouldnt want it to come across like we are only documenting things to appease the pesky volunteers Bawolff (talk) 00:25, 6 July 2013 (UTC)
Honestly, this is supposed to be an enforced document, so I'd prefer if we worded things a bit more strongly. For example, something like:

New methods must have their return type and parameters' types and values documented using Doxygen conventions. New methods should also have a description of what the function does or, if an abstract function, what the function is expected to do. Additionally, all code should be commented as necessary to allow easy understanding of the code's intentions.

My motivation for this is because I'm getting sick of all these abstract classes without any clear documentation as to how to inherit them. Parent5446 (talk) 00:20, 6 July 2013 (UTC)
Code should be clear and easy to understand. Comments can help with this. They are however best used as a last resort, when your efforts to make the code be obvious on its own, failed. To quote Fowler "When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous." I recommend keeping comments to a minimum, as they can easily end up hurting readability. "Gets the user name" is a useless comment for method "getUserName", and ends up just cluttering the code, teaching the reader to ignore the comments. Comments also tend to go out of date, get moved about, and often end up being disinformation. So lets please not have some absolutist policy stating that every method should have a comment. That is going to hurt more then help. --Jeroen De Dauw (talk) 01:25, 6 July 2013 (UTC)
The main types of comments are prologue and inline. The first should always be used on public methods, non-trivial protected methods or ones meant to be overriden. Classes themselves should have prologue comments too. That second should always be used for clever/tricky things or things that are done for very non-obvious reasons, remote the code in question (e.g. some subtle aspect of another system). They can also be used for non-obtrusive and quick local variable docs (e.g. " list of (a => b)") and various other, more-optional, reasons. All this IMO of course, so I tend to agree with having some minimum standard. And codifying it is nice since it is obviously not common sense for MediaWiki. Aaron (talk) 03:21, 6 July 2013 (UTC)

Agreed. Personally I think we should build tools to support this. For instance if there was a script that auto-compiled documentations for all active extensions this would encourage adding documentation by highlighting where it is missing. I'd suggest breaking out the conversation around documentation of methods from 'commenting'. Commenting in my opinion should be used as little as possible. Documentation by standard. Jdlrobson (talk) 20:27, 8 July 2013 (UTC)

I've added a new intro paragraph to the section based on the feedback here. It emphasizes documentation over inline comments and uses slightly stronger language than my original wording. Kaldari (talk) 00:36, 10 July 2013 (UTC)

Interview, narrative, lessons-learned approaches

I read in the research in Making Software that the main thing people often wish they could find in documentation, but can't, is the rationale behind a particular decision. Why was it architected that way? What other approaches did the author consider? Was this a workaround, a bugfix, an experiment? They want the backstory, the narrative, to understand how this fits in with the larger norms of the developer community, and important events. And if the author doesn't happen to explain that in a commit message, a Gerrit code review comment, or a Bugzilla thread, then the reader can't get at it without finding the author and asking, months later.

I don't know where the best place is to put stuff like that. A commit summary might be best, as long as git blame works well. Some approaches:

  • Interview. Ask another developer to pretend to interview you about what you chose and why, and write down the answers.
  • Narrative. Some recommendations say "every commit message should be a user story readable by a nontechnical executive" but in most cases that goes too far. Instead it might work to have a developer walk through the narrative of their development process in their commit summary: "I started by thinking foo, then I ran across problem bar, so I modified this to account for baz."
  • Lessons learned. In a commit summary, include a lesson you learned about this bit of the codebase.

This is a bit weird but might be helpful. Sharihareswara (WMF) (talk) 06:24, 27 July 2013 (UTC)

Triple-equal operator

I was under the assumption that === is preferred to == unless there's a reason to use ==. Now that I checked I don't see that written here. Should it be written? --Amir E. Aharoni (talk) 13:35, 2 December 2013 (UTC)

Yes. --siebrand (talk) 13:50, 2 December 2013 (UTC)

Braces

Is the use of braces mandated within the MW coding standard? For instance, it seems like this:

 foreach ( $res as $row ) {
     showRow( $row );
 }

is more common than this:

 foreach ( $res as $row )
 {
     showRow( $row );
 }

and this:

 if ( $i == 1 ) {
     foo();
 } else {
     bar();
 }

is preferred to this:

 if ( $i == 1 ) {
     foo();
 }
 else {
     bar();
 }

Can someone comment? Should it be part of this document? @User:Sharihareswara_(WMF) Rs561 (talk) 19:10, 21 May 2014 (UTC)

@Rs561: It is mandated, yes (personally I dislike the style we've adopted, but such is life). Not sure how best to document. Jdforrester (WMF) (talk) 19:57, 21 May 2014 (UTC)
I see someone went and added this information to this page. I replaced it with a reference to Manual:Coding conventions#Indenting and alignment, since there's no need to have two copies of the information that would get out of sync. Anomie (talk) 13:18, 23 May 2014 (UTC)

Static Binding

In PHP 5.3, a new feature called "Late Static Binding" (LSB) was added to help work around this perceived lack of functionality in static functions. However, the usefulness of LSB is debatable among MediaWiki developers and should be avoided for the time being.

I believe we should use static in new code written to avoid issues with binding while leaving self on old code to avoid possible bugs. static has the binding behaviour on php that you would expect if you come to it from other languages NRuiz (WMF) (talk) 16:51, 22 May 2014 (UTC)
I see several uses of "new static" or "static::" in core. I wouldn't necessarily go recommending using static over self when people don't know why they're using static instead of self though. Anomie (talk) 13:13, 23 May 2014 (UTC)

Namespaces vs. prefixes for classes

I've seen a few instances of namespaced class names like "WikiGrok\Api\ApiWikiGrokResponse" or "OAuth\Frontend\SpecialPages\ SpecialOAuthRegistrationFrontend" that combine both prefixing/suffixing and namespacing. This seems totally redundant. Would it be worth adding something to the Classes section to discourage this redundancy? Kaldari (talk) 18:47, 23 October 2014 (UTC)

I'm not seeing "OAuth\Frontend\SpecialPages\SpecialOAuthRegistrationFrontend" anywhere in mediawiki/core or mediawiki/extensions. Nor the ApiWikiGrokResponse class, but that may be because WikiGrok doesn't seem to be in either of those repos yet either. Anomie (talk) 13:30, 24 October 2014 (UTC)

Recent edits

Krinkle recently made two major changes to the coding conventions:

  • Forbidding use of empty(), based on the fact that he doesn't trust people to use it correctly. When used when !isset( $foo ) || !$foo would otherwise be used, I don't see any reason to avoid it since that's its explicit purpose.
  • Forbidding casting of arrays to booleans, apparently for the same reason. I doubt the claimed advantage to using count() is worth the tiny cost of the additional function call.

Personally, I'd rather not mess around with the extra verbosity. But let's discuss. Anomie (talk) 22:32, 15 December 2014 (UTC)

Do interfaces start with an I?

Going forward, should interfaces start with an I (e.g. IDatabase), or not (e.g TitleFormatter)? Mattflaschen-WMF (talk) 15:21, 8 September 2016 (UTC)

In general, I'd say no. Sometimes it makes sense if the interface is being split from an existing base class, but often enough it's as unnecessary as prefixing member fields with "m". Anomie (talk) 13:20, 9 September 2016 (UTC)

Can we enforce no-spaces-after-cast?

The guidelines here say that we "do not use a space within or after the cast operator", but this is not currently enforced by the codesniffer rules. There seems to be a bit of a long history of discussing this, and the Generic.Formatting.NoSpaceAfterCast and Generic.Formatting.SpaceAfterCast rules have been added and removed at various times (see here and here for some of the discussion).

I can't see the advantage to having this not tested one way or the other by phpcs, so would like to submit a patch that re-adds Generic.Formatting.NoSpaceAfterCast to the MediaWiki CS ruleset. I thought I'd bring it up here first though, because obviously there's more to this than meets the eye! :-)

Thanks! Sam Wilson 03:23, 28 October 2016 (UTC)

I don't see any reason not to. Glancing through the links, it seems only Krinkle has stated objection to the current text of this page and has unilaterally blocked enforcing the style in phpcs. If there has been other discussion on this issue where others have supported Krinkle's position, links would be appreciated.
For the record, "no spaces after a cast" was added in February 2013, removed (by Krinkle) in June 2013, and readded in August 2013. It seems to have stood since then. Anomie (talk) 13:48, 28 October 2016 (UTC)
Okay cool! I've created task T149544 and submitted Gerrit change 318880. Sam Wilson 04:36, 31 October 2016 (UTC)

Spaces inside declare()

I think declare statements should be mentioned in the CC. Should they be formatted like normal function calls?

Example 1:

declare( strict_types = 1 );

I prefer this option as it would be more in alignment with the "space-heavy" style of the rest of the convention.

Alternatively, we could adopt a more terse style, to make the declarations kind of stand out:

declare(strict_types=1);

What do you think?

--Gabriel Birke (WMDE) (talk) 14:56, 12 April 2017 (UTC)

At the moment there's probably little reason for us to use declare since we haven't dropped support for PHP5 and PHP5-compatible versions of HHVM. But if we do use it, as we do here, it should not be a strange exception to the spacey style we use everywhere. The only question would be whether it should be treated as a control structure or not, i.e. declare( ... ); and declare( ... ) { ... } versus declare ( ... ); and declare ( ... ) { ... }. Anomie (talk) 13:24, 13 April 2017 (UTC)
Apparently, declare goes back to PHP 4, though I've never seen it before. It is a "flow control construct[]", per docs, so it should be spacy, exactly like if. encoding was added in 5.3 (though I don't see why we would want to use it), and strict_types is added in PHP 7 (and thus won't be available for a while). Mattflaschen-WMF (talk) 05:06, 18 April 2017 (UTC)

Declaring multiple properties with same public/private/protected statement

PHPCS is currently OK with lines like:

private $mBlacklist = null, $mWhitelist = null;

What about the following (taken from TitleBlacklist):

	private
		$mRaw,           ///< Raw line
		$mRegex,         ///< Regular expression to match
		$mParams,        ///< Parameters for this entry
		$mFormatVersion, ///< Entry format version
		$mSource;        ///< Source of this entry

Should the latter be allowed? This was originally reported as phab:T166381. Legoktm (talk) 19:17, 6 June 2017 (UTC)

I'd say not. It makes more sense to format these in the normal way:
	/** @var string Raw line. */
	private $mRaw;

	/** @var string Regular expression to match. */
	private $mRegex;

	/** @var string Parameters for this entry. */
	private $mParams;

	/** @var string Entry format version. */
	private $mFormatVersion;

	/** @var string Source of this entry. */
	private $mSource;

Sam Wilson 23:47, 6 June 2017 (UTC)

@Legoktm and Samwilson: It should be formatted as given at Manual:Coding_conventions/PHP#.40var:_documenting_class_members, or another way with equivalent Doxygen formatted (HTML) output. I am certain the first two examples do not have proper output (since they have no types). I have tested similar examples to Sam's, and believe that also won't render correctly at doc.wikimedia.org (correct me if I'm wrong, though). Mattflaschen-WMF (talk) 22:50, 7 June 2017 (UTC)
@Mattflaschen-WMF: hmm yes good point, I always forget how different Doxygen is to the rest of the PHP world! I now remember having read about this problem before. Sam Wilson 22:56, 7 June 2017 (UTC)
Thanks, Sam. That bug links to a StackOverflow answer, which might be helpful to avoid the redundancy (repeating the variable name). We already have an input filter; that could be added if it's reliable enough to handle our code base. However, we should make sure the format the filter accepts matches what IDEs expect (for type-hinting, etc.) Mattflaschen-WMF (talk) 23:48, 7 June 2017 (UTC)
BTW, I can't reproduce the bug (needing to repeat $fieldName) as of today. Not sure the exact situation or what changed. Mattflaschen-WMF (talk) 20:36, 28 June 2017 (UTC)

Why not use boolean conversion to test for empty arrays, if you know it's an array?

This edit added a restriction to use === [] to test for an empty array. That makes sense if the variable might be a string, but if you know it's an array that seems like completely pointless verbosity. Is there some reason for this that I'm missing? Anomie (talk) 12:52, 2 July 2018 (UTC)

In a generic style guide like the one we are talking about here I don't want to encourage users to replace if ( $var === [] ) with if ( $var ) without fully understanding the consequences. Sure, if a variable is guaranteed to be an array, and this guarantee is easily visible in the source code, there is nothing wrong with if ( $var ). But more often than not there either is no guarantee, or it's not easily visible. Reviewing === [] is almost a no-brainer – compared to a scarily vague if ( $var ) that enforces the reviewer to read the rest of the code to make sure no unintentional boolean conversion can happen. Example:
function ( array $var ) {
    if ( !$var ) {
        // Still guaranteed to be an array at this point
    }
}

// Unspecific boolean casts can easily introduce bugs the moment the surrounding changes:
function ( array $var = null ) {
    if ( !$var ) {
        // $var is not guaranteed to be an array any more
    }
}
--Thiemo Kreuz (WMDE) 13:29, 2 July 2018 (UTC)
That doesn't really answer the question, though. Even in your second example, boolean conversion is less wordy than if ( $var === null || $var === [] ), and the more-wordy option would still have "$var is not guaranteed to be an array" as a potential issue. Anomie (talk) 13:24, 3 July 2018 (UTC)
You might need to rephrase your question then. --Thiemo Kreuz (WMDE) 17:24, 9 July 2018 (UTC)

Directory Structure

Can we briefly document the recommended directory structure for extensions and/or composer libraries (if they are different). For the Parsoid/PHP port, we are proposing to put the main PHP source code in src/, which we believe is the "new" convention for composer libraries and most PHP code. But a lot of the MediaWiki code base uses a different convention -- for example, core uses includes/ and small extensions often put their source code directly in the top-level directory. I'm proposing that we standardize and recommend src/ for new code. cscott (talk) 21:58, 8 January 2019 (UTC)

@Cscott: Sounds like a good idea; src/ is definitely the norm in the PHP world these days. Best practices for extensions currently says "MUST: Using the following standard directory layout: includes/ or src/: Contains all (and only) PHP classes". I think deprecating includes/ is a good way to go. Sam Wilson 22:40, 8 January 2019 (UTC)

Declaring of return types

PHP 7 adds the ability to declare the return type of a function or method (docs), and this is supported in HHVM 3.18 too even in PHP5 mode. Thus, use of the feature has crept into our code base (search) since we've stopped linting against PHP 5.

There seem to be two ways that it has been written:

function withASpace() : ClassName {
}

function withoutASpace(): ClassName {
}

Looking at just MediaWiki core and extensions in Gerrit, I find:

With space Without space
core 81 0
ArticlePlaceholder 0 1
CirrusSearch 0 42
JADE 4 0
OrphanedTalkPages 1 0
Translate 1 0
Wikibase 6 23
WikibaseLexeme 22 65
WikibaseMediaInfo 0 1
WikibaseQualityConstraints 0 14
WikibaseSchema 0 7

Core seems to have standardized on the with-space version, CirrusSearch on without-space, and Wikibase mostly on without-space but with a significant number of with-space instances too. Other extensions have very few uses, mostly with-space.

We should probably standardize on one, as we do with other constructs, and have phpcs enforce it. IMO the version with a space fits more naturally with the rest of the guidance at #Spaces. Does anyone disagree? Anomie (talk) 16:00, 13 February 2019 (UTC)

Personally I like the spaced version better. As you said, it seems more consistent with our spacey style. Of note, a quick search on the internet showed that the non-spaced version is preferred (e.g. by the PHP manual itself), but I guess that doesn't really mean much. --Daimona Eaytoy (talk) 16:08, 13 February 2019 (UTC)
TypeScript seems also to not use a space, at least in the doc. But having a space seems more coherent with MediaWiki coding style. Tpt (talk) 16:13, 13 February 2019 (UTC)
PSR-12, which is currently under review, standardizes on without-space. --DBarratt (WMF) (talk) 16:19, 13 February 2019 (UTC)
I agree that the having a space is more consistent with our established style. Our style is already inconsistent with multiple external sources (php.net, TypeScript, PSR-12), so I don't feel there's anything to lose by preferring our style over theirs in this instance as well. If we were ever to choose to adopt a different style, I would prefer to do it wholesale instead of piecemeal. BPirkle (WMF) (talk) 16:39, 13 February 2019 (UTC)
I don’t think that with-space is wholly inconsistent with our style – we don’t write getFoo ( $b ) or $a [ 0 ], for example. I don’t think the : needs to be emphasized like a binary operator. --Lucas Werkmeister (WMDE) (talk) 16:43, 13 February 2019 (UTC)
I vote for without-space:
  • I find the pattern …( … ) : … to easy to confuse with a ? … : … operator. The ) : character sequence makes me constantly trip, like I need to check if it is a misplaced ternary operator.
  • I would argue an …(): int is a little bit like an (int)… cast operator where we also disallow extra spaces.
  • I believe it's a good idea to follow PSR.
--Thiemo Kreuz (WMDE) 17:14, 13 February 2019 (UTC)
I'm with the no space gang, we don't always use spaces anyway, like in (casts) and $array[$indexes]. The colon here is not a separate operator that we customarily surround with spaces, it's just a part of overall syntax. Making it more prominent will not make anything more readable, I personally rather stumble on it. Max Semenik (talk) 20:24, 13 February 2019 (UTC)
Same here I prefer with no space DCausse (WMF) (talk) 09:29, 14 February 2019 (UTC)
I find the no-space version easier to read and I don't think it is inconsistent with our current style. DMaza (WMF) (talk)

I'm not a fan of spaces in general but find (): hard to read (too many special characters in one place) and liked the version with space better. In the end it is a pretty insignificant bikeshed though. IMO the more interesting question is, should we mandate declaring return types (where possible)? Or rather, is there a reason not to? Also (although this is not really a coding convention question) where do we want to be in the long term wrt. strict typing? --Tgr (WMF) (talk) 23:36, 13 February 2019 (UTC)

I see there's already T178136 about encouraging type hints. --Tgr (WMF) (talk) 01:35, 14 February 2019 (UTC)
I think requiring them is a great idea (for new code). I thought that'd already been decided actually (seems not), but the phpcs rules hadn't caught up. Sam Wilson 01:48, 14 February 2019 (UTC)

For me, without space seems more aesthetically pleasing - the space version just has : dangling around not attached to anything, and that's not a natural place for : in C-like language. The sight trips over it each time - "something wrong is going on here - bare : is weird!". Also, since new PSR seems to be no-space, I think that's what we should adopt. This is probably more high-weight than aesthetics - if PSR suggested space version, I'd probably begrudgingly vote for it despite it looking weird to me. Smalyshev (WMF) (talk) 00:56, 14 February 2019 (UTC)

On the other hand, PSR-12 also calls for a lot of other things we don't do, starting with indenting with spaces rather than tabs. Anomie (talk) 14:20, 14 February 2019 (UTC)

I am in the without-space gang here. :) SSastry (WMF) (talk) 19:14, 20 February 2019 (UTC)

On way or other but it should be decided. I've added the preference for having return type hints to the coding standards (that seemed long overdue) so it would be nice to have a clear recommendation on that. Filed T220719 about having a sniff for it. --Tgr (WMF) (talk) 17:06, 11 April 2019 (UTC)

At WMDE we have been using PHP7 since 2016 and have always used the "without space" variant. Our FundraisingFrontent and used repositories is very likely the biggest codebase in WMF-verse using PHP7. See https://github.com/wmde/FundraisingFrontend --Jeroen De Dauw (talk) 15:03, 4 May 2019 (UTC)

PHP 7 nullable type declarations

Perhaps we should discuss all of this with 7.1 in mind and nullable types as some of us would change their mind? Here are all the possibilities I can think of (I haven't included the () : ? Type { possibility which seems odd):

function fooA( Type $bar ) : ?ClassName {
}

function fooB( Type $bar ): ?ClassName {
}

Personally I still prefer the fooB option but no strong opinions on this. DCausse (WMF) (talk) 17:41, 18 February 2019 (UTC)

I think your fooA looks more like our existing style. I'd attach the ? to the type name, as shown, as putting a space seems wrong to me in that case. Anomie (talk) 14:42, 19 February 2019 (UTC)

TypeScript

TypeScript uses the same syntax for return types, and the two TypeScript-using repositories I could find in the wikimedia GitHub organization (wikibase-termbox and resource-modules) both seem to use the no-space version for TypeScript return types. Since our JavaScript style is fairly close to our PHP style, I think we should synchronize the TypeScript style here as well (i. e. use no-space in PHP too). --Lucas Werkmeister (WMDE) (talk) 11:11, 20 June 2019 (UTC)

Since our JavaScript style is fairly close to our PHP style There are several differences in use of spaces, for example in JS we prefer foo[ 0 ] while in PHP we use $foo[0]. Anomie (talk) 13:24, 8 July 2019 (UTC)

Static methods and properties

The advice on static methods seems to be fairly outdated (it was added in 2012), it focusses a lot of keystroke count. Methods should be static if they have nothing to do with a single instance, and that should be the only consideration.

It states "However, they make sub-classing and reuse more difficult for other developers" - I don't think that is true anymore with late static binding. LSB is mentioned as an afterthought that "the usefulness of LSB is debated among MediaWiki developers ... and should be avoided for the time being", although there is no evidence for that.

Regardless of LSB, most static methods are never overridden, so this doesn't seem like a particularly good reason to avoid static methods altogether. Using static has lots of other benefits, such a signalling to the developer the intended purpose of the method, as well as avoiding a performance hit by running an unnecessarily complex constructor. Proposed change below. ESanders (WMF) (talk) 14:26, 8 March 2019 (UTC)

Static methods and properties can be used in PHP, but care should be taken when inheriting to distinguish between the self and static keywords. self will always refer to the class in which it was defined, whereas static will refer to the particular sub-class invoking it. See the PHP documentation on Late Static Bindings for more details.

The real reason static methods should not be used is that they create tight coupling which goes against the SOLID principles (classes should only access classes from a different component through specific narrow interfaces). Very simple utility classes and classes which are internal to the same component are exceptions but generally it's better to avoid static methods. --Tgr (WMF) (talk) 22:32, 11 April 2019 (UTC)

IIRC static are pretty hard to mock, so this should be considered too. PHPUnit removed support for mocking statics a while ago, and in general from what I remember if static method needs to be mocked for some reason, it's a PITA. So this needs to be taken into account when writing service classes. Smalyshev (WMF) (talk) 00:57, 12 April 2019 (UTC)

Another reason to avoid static methods is that they often depend on global state. This makes it hard to isolate any code that uses these methods for testing, and it makes it next to impossible to write cross-wiki code. Static methods that are pure functions are fine in my book, if we are absolutely sure they will always stay pure. E.g. a method for finding the smallest common denominator of two numerbs can be static, but the methods splitting a string at word boundaries cannot, because the concept of word boundaries is language specific (at least potentially). -- DKinzler (WMF) (talk) 09:29, 12 April 2019 (UTC)

Trailing commas in multi-line arrays

I was under the impression that trailing commas in multi-line arrays are encouraged, though not required, because they result in cleaner diffs (adding or removing elements at the end of the list doesn’t have to touch the preceding line to add or remove a comma there). I can’t find any reference to it on this page, though. Should it be added, or does anyone disagree with this suggestion? --Lucas Werkmeister (WMDE) (talk) 18:27, 25 March 2019 (UTC)

We encourage them in the config repo, certainly, as that repo mostly has changes that should only change one line, and it is subject to a lot of changes that need rapid review. Not sure about in other code; it makes keeping PHP and JS code to do the same in sync very slightly more tedious, but is probably a good idea. Also, is there a code sniff we could add to encourage this? Jdforrester (WMF) (talk) 18:33, 25 March 2019 (UTC)
Yes please. NoCommaAfterLast / NoComma checks for it but PHP_CodeSniffer#582 suggests it's not really usable without a lot of additional baggage, so we'd probably have to write our own or use some third-party sniff (that issue and #629 list some). --Tgr (WMF) (talk) 22:43, 11 April 2019 (UTC)
Filed as T222042. --Tgr (WMF) (talk) 18:23, 28 April 2019 (UTC)

Add a rule about not using PHP serialization

Per and RFC approved on May 7 2017, PHP serialization should be avoided wherever possible, since it is brittle and insecure. The coding convention should state that. I propose to add the following rule:

PHP serialization
PHP's build in serialization mechanism (per the serialize() and unserialize() functions) should be avoided if at all possible. The reason is twofold: 1) data serialized with this mechanism cannot reliably be unserialized with a later version of the same class. And 2) crafted serialized data can be used to execute malicious code, posing a serious security risk.
The first issue can be mitigated by converting any data to arrays or plain anonymous objects before serialization. The second issue can perhaps be mitigated using the whitelisting feature PHP 7 introduces for unserialization. But in general, it's best to avoid serialize() and unserialize() entirely, and use JSON based serialization instead.

Not quite sure where this section should go, though. Perhaps we need a new heading for things to avoid. It should probably also mention things like eval(). -- DKinzler (WMF) (talk) 20:58, 11 April 2019 (UTC)

Better not to take up space mentioning workarounds we don't want the reader to use, this doc is long enough and all superfluous documentation has a mental toll. The outcome of the RfC was that the whitelist is not reliable mitigation due to PHP's history of low-level serialization security issues which cannot be prevented in userland code. And if you are already converting to/from arrays it's trivial to use JSON serialization instead, so we should just recommend that. --Tgr (WMF) (talk) 22:48, 11 April 2019 (UTC)
I think it's useful to mention mitigation strategies to use when the serialization mechanism cannot be controlled, because it's baked into some internal function or library. -- DKinzler (WMF) (talk) 08:41, 18 April 2019 (UTC)
Some points: 1) whitelisting does not fix all security problems. For now, any unserialization of third-party data (i.e. data we do not control) is insecure, period (and likely to remain so at least for the lifetime of PHP 7). Talk to me if you'd like the whole long and sad story, but basically it can be used only for internal things. 2) I *think* serialization may still be OK for data we fully control (if there are any), such as things we store in the database, sessions, etc. My reasoning is if somebody manages to write to the database, they probably can take over this instance anyway (e.g. by assigning themselves admin privileges, etc.) - am I wrong? and 3) I am not sure JSON serialization is as powerful as PHP one. Of course, the advice can be "if you need more powerful serialization, rethink you code so you don't" but we should make it explicit that JSON one is weaker than PHP one. Smalyshev (WMF) (talk) 21:01, 17 April 2019 (UTC)
Re (1) whitelisting is a mitigation strategy, and discouraged. It's better than nothing. (2) An intruder who gained access to the network can write to memcached. If we unserialize things from memcached, the intruder can now run code. That's a much more serious threat. (3) not sure what you mean by "powerful". I'll add a warning about JSON's issues with number precision, though. -- DKinzler (WMF) (talk) 08:41, 18 April 2019 (UTC)
Presumably "less powerful" is referring to the fact that PHP's serialization can handle many PHP classes, differentiates associative arrays from stdClass objects, and so on. Anomie (talk) 13:16, 18 April 2019 (UTC)

I'll be bold and update the page. Please revert, amend and discuss as needed. -- DKinzler (WMF) (talk) 08:41, 18 April 2019 (UTC)

Ternary operator example update

Ternary operator chapter uses this as an example:

$wiki = isset( $this->mParams['wiki'] ) ? $this->mParams['wiki'] : false;

However, in modern PHP this should be written as:

$wiki = $this->mParams['wiki'] ?? false;

We should probably replace the example and add a note to that effect. Smalyshev (WMF) (talk) 00:53, 12 April 2019 (UTC)

+1 KHarlan (WMF) (talk) 15:01, 12 April 2019 (UTC)

Replaced. --Tgr (WMF) (talk) 18:02, 28 April 2019 (UTC)

Operator placement on line break

The manual used to say "The operator separating the two lines should be placed at the end of the preceding line." (now relaxed to not prescribe either way, per discussion). I suggest requiring operators to be on the left. The left edge is straight, the right is ragged, making it hard to scan. This is especially problematic when there are multiple possible operators, like in boolean expresssions. Compare

return $a > 0 &&
    $a < 100 ||
    $a = 0 &&
    $b > 0;

with

return $a > 0
    && $a < 100
    || $a = 0
    && $b > 0;

for readability. --Tgr (WMF) (talk) 00:16, 7 May 2019 (UTC)

Agreed with having it on the left as well. It makes it much easier to figure out which operators are being used. Legoktm (talk) 19:33, 20 June 2019 (UTC)

Importing global classes

It would be helpful if we had a policy for importing global classes. I've created task T223646 to track this. --DBarratt (WMF) (talk) 21:50, 20 May 2019 (UTC)

PSR-2 rule: There MUST NOT be a space after the opening parenthesis

I was wondering why don't we respect this PSR-2 rule here? It's the first important project I've seen which uses this custom syntax in PHP (but in JS it's more common). JackPotte (talk) 09:10, 6 October 2019 (UTC)

Probably because our rules were established long before PSR-2. And unlike some other PSRs that actually do impact interoperability, that one is just style. Anomie (talk) 13:52, 7 October 2019 (UTC)

Where to put declare()?

In gerrit:546787, I've noticed that now we're allowed to use declare(strict_types=1). So, to put it shortly, where do we want to put it? The obvious alternatives are:

  • After the <?php tag (i.e. <?php declare(...))
  • Before the namespace declaration, on its own line (see e.g. the gerrit patch linked above)

Whatever we agree on, in the meanwhile the directive can be placed wherever the patch author wants to place it; then, a PHPCS rule will enforce the agreed position and it should also autofix the issue. --Daimona Eaytoy (talk) 12:52, 31 October 2019 (UTC)

I'd say it should go as shown in that patch: after any file-level comment, and with blank lines before and after. I'd probably give a -1 in code review if someone tried <?php declare(...). Spacing should most likely also be as shown in that patch, although it might be debated whether "declare" counts as a control structure for CC/PHP#Spaces.

BTW, note that discussion at phab:T178136 seems cautious about using strict_types as it can result in having to add explicit casts to string, int, or bool. Anomie (talk) 13:28, 31 October 2019 (UTC)

Documenting generator methods

Is there an established way to document generator functions? It looks like there are two approaches:

  1. @return Generator|Foo[] which looks to be used in a few places in MediaWiki code, is mentioned here, and works in most IDEs; and
  2. @return Generator<Foo> which maybe is only done in Phan code.

Can we add something to the docs here?
—Sam Wilson 01:46, 20 February 2020 (UTC)

Max line length 120?

I'd like to suggest we change phpcs's soft line length limit from 100 to 120. This would be in line with PSR12. See gerrit:579119 for more. —Sam Wilson 02:44, 12 March 2020 (UTC)

Nullable types in parameter documentation

Is there an official preference for @param ?Foo vs @param Foo|null in PHP docs? I know I've had reviewers change the former to the latter in patches, and the former is used only 17 times in core while the latter is used 2018 times -- but maybe this is just due to the fact that PHP 7.2 hasn't been around as long? A formal statement of preference would be good. cscott (talk) 19:08, 14 July 2020 (UTC)

The (draft) phpdoc standard's ABNF does not have a ? character. See also https://wiki.php.net/rfc/union_types#nullable_types (the RfC did not pass, but that specific section did, although the eventually accepted https://wiki.php.net/rfc/union_types_v2 is less strict). --Tgr (WMF) (talk) 12:49, 29 July 2020 (UTC)

Exception conventions

The current text says Exceptions that indicate programming errors should be one of the exceptions that ship with PHP or a more specific subclass, while exceptions that indicate errors that are relevant to the end user should be an ErrorPageError or one of its subclasses. I recommend expanding this a bit to clarify the following:

  • Exceptions generally belong into one of three types:
    • Exceptions that indicate programming errors - these should be a subclass or RuntimeException or LogicException (this is already said in the current text).
    • Exceptions which indicate user errors should be subclasses of ErrorPageError (which is confusingly named now that PHP has an Error hierarchy, but that's another discussion) - that's a bit more concrete than relevant to the end user which is true for all kinds of internal errors.
    • Exceptions which do not indicate programming errors nor user errors (for example, exceptions that indicate the failure of some infrastructure MediaWiki depends on) should have their own hierarchy under Exception.
  • Exceptions that belong in the second or third group (in Java these would be called checked exceptions) should be tracked via @throws tags (ie. methods which have a throw clause or call a method which has an @throws tag should themselves have an @throws tag, unless they handle that exception).
  • Exceptions in the first group might have an @throws tag with text which explains when they are thrown, if that's something non-obvious; otherwise an @throws tag is generally not useful.

This aligns with how tools like PHPStorm treat exceptions (raising warnings about checked exceptions without an @throws tag), with common sense (runtime exceptions are thrown by all kinds of native methods, making it infeasible to track them) and programming best practice (callers should not be concerned about unchecked exceptions, those should only happen due to programming errors and the best way to handle them is to log a stack trace and die, so the code can be fixed promptly; other error conditions should be handled by the caller or it should pass that responsibility explicitly to its caller).

Thoughts? --Tgr (WMF) (talk) 12:37, 29 July 2020 (UTC)

@Tgr (WMF): That sounds reasonable, and accords with how we actually write code in practice, yes. Jdforrester (WMF) (talk) 12:44, 29 July 2020 (UTC)

Switch to StructredDiscussions?

The format of this page makes it very hard to see which discussions are still open, and how old they are, which in turn makes it hard to ever close them off. What do you think about replacing it with SD? --Tgr (WMF) (talk) 12:56, 29 July 2020 (UTC)

WFM, though we now have three active conversations in the past day on this page… ;-) Jdforrester (WMF) (talk) 14:01, 29 July 2020 (UTC)
imo structured discussions are harder to use. We can archive resolved discussions DannyS712 (talk) 19:57, 29 July 2020 (UTC)
+1 --MGChecker (talk) 13:37, 30 July 2020 (UTC)
I think we should aim at converting those SD pages back to normal wiki pages, instead of creating more. Ammarpad (talk) 10:10, 31 July 2020 (UTC)

Where should @coversDefaultClass test annotation be place?

There is another test annotation that is widely used in some of our projects: @coversDefaultClass

Per CS, we have: https://codesearch.wmcloud.org/deployed/?q=%5C%40coversDefaultClass&i=nope&files=&repos=, it's used in 185 files including test suites in MediaWiki core.

Based on our consistency pattern written down in Test annotations, where should it be place? I'm proposing it should be placed below @covers if they're written together? Cc @DannyS712: any thoughts?