
GlobalBlocking needs to be able to import a list of bad IPs from a text file or another wiki site
Closed, DeclinedPublic

Description

Author: carlb613

Description:
Currently, the only means provided to add spambot IPs to the globalblocking table is to enter each one individually from the Special:GlobalBlock page.

This is awkward for small wiki sites with few administrators.

There needs to be a means to import downloadable lists of known spambots from blacklist sites such as stopforumspam.com and fspamlist.com, instead of having a local admin ban each problematic IP manually. There are hundreds of thousands of known "comment spammers" (a term which covers wiki, forum and blog spambots); even the list of IPs globally blocked on the WMF sites looks to be a few thousand entries long.

Some ability to import a CSV file (comma-separated values, one long line) or a plain text file (one IP address per line) would be a quicker means to block spambots.


Version: unspecified
Severity: enhancement

Details

Reference
bz44630

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:32 AM
bzimport added a project: GlobalBlocking.
bzimport set Reference to bz44630.
bzimport added a subscriber: Unknown Object (MLST).

I wonder if distributed (decentralized) lists make sense here, though. The spam blacklist, for example, relies on a centralized list.

carlb613 wrote:

The spam blacklist is a "centralised list" in the sense that it is used by multiple WMF projects.

It is not uncommon, though, for non-WMF projects to keep their own local spam blacklists. In some cases, sites are blacklisted on a non-WMF project for solely political reasons - like Wikia's banning all mention of http://absurdopedia.net (the Russian-language parody of Wikipedia) - so importing another site's blacklist carries some risk of misuse. The most common issue is that a site's operators will blacklist a valid but directly-competing site as "spam" (much as Wikitravel is triggering AbuseFilter to block every mention of Wikivoyage).

Spambot IPs are a more clear-cut issue (always disruptive), and there are certainly centralised lists used across multiple sites with different owners; DNSBLs for e-mail spam are a prime example.

It is, however, preferable not to query servers like 'stopforumspam' repeatedly for addresses which are already known and listed as bad, as these sites impose restrictive limits on the number of API queries per day.

There are means to query a remote MediaWiki installation to ask whether an IP is globally blocked there; for instance, a non-WMF server could retrieve meta.wikimedia.org/w/api.php?action=query&list=globalblocks&bgip=1.2.3.4 to see if 1.2.3.4 is already on the naughty list on WMF's wikis. I don't know of anyone using this as a centralised blacklist across independent sites, though, and would hesitate to advocate it in any form that generates huge numbers of unsolicited enquiries.
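As a rough illustration of such a lookup (a minimal sketch, not part of any extension; the function names are hypothetical, and the actual HTTP fetch is left to the caller so that rate limits can be respected):

```python
import json
from urllib.parse import urlencode

def globalblocks_query_url(ip, endpoint="https://meta.wikimedia.org/w/api.php"):
    """Build the api.php URL asking whether `ip` is globally blocked."""
    params = {
        "action": "query",
        "list": "globalblocks",
        "bgip": ip,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)

def is_globally_blocked(api_response_text):
    """True if the JSON response lists at least one matching global block."""
    data = json.loads(api_response_text)
    return len(data.get("query", {}).get("globalblocks", [])) > 0
```

The caller would fetch the URL (with urllib.request or similar) and pass the response body to the second function; caching the answer locally avoids repeating the same unsolicited enquiry.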

One possibility would be to use API calls to request the list of known-spambot IPs from one wiki host and import the entire list into the globalblocking table on the local server. The MediaWiki API for the GlobalBlocking extension does support requesting the last ten IPs globally blocked since a given date, which could be used to synchronise blacklists across multiple, unrelated servers. Attempts at automated registration of "new users" could then be checked against the local copy of the remotely-originated list and the spambots blocked automatically.

Certainly "local sysop enters IPs individually from a web interface" should not be the only means of getting bad IPs into the globalblocking table. This may be adequate for WMF projects (the table is "global" enough that most spammers are already known to one or more WMF wikis), but small, independent wikis need to be able to import lists of known-bad addresses from existing sources: either a text file or an API on some other site.

(In reply to comment #2)

The spam blacklist is a "centralised list" in the sense that it is used by
multiple WMF projects.

My point was that it's used by non-Wikimedia Foundation projects as well. The Wikimedia Foundation wikis subscribe to the list at Meta-Wiki in the same way any other wiki would (cf. https://noc.wikimedia.org/conf/CommonSettings.php.txt):


if ( $wmgUseSpamBlacklist ) {
	include( $IP . '/extensions/SpamBlacklist/SpamBlacklist.php' );
	$wgBlacklistSettings = array(
		'spam' => array(
			'files' => array(
				'http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1'
			),
		),
	);
}

From my reading of this bug, it seems like you want a (G)UI for importing arbitrary lists of IP addresses (basically _batch importing_ inside the GlobalBlocking extension). Is that correct? Do you care what kind of interface is implemented (api.php v. index.php)?

Tangentially: it seems like what you really want is a subscription service, not unlike what the spam blacklist uses. I think the TorBlock extension may do something similar with Project Honeypot. I don't think one-time updates make sense here. It doesn't feel sustainable. While an import feature might be nice to have generally, I don't think it will resolve the spam issues you're seeing in a sane and sustainable way.

Checking IP addresses individually is, of course, insane. But that doesn't mean that you couldn't have a subscription that pulls in the entire list (or large portions of the list) in a few queries. It depends on how many IP addresses we're talking about and how important it is that the list be completely up to date.

carlb613 wrote:

There was a PHP wrapper [[mw:Extension:Check Spambots]] which could install MysteryFCM's Spambot Search Tool to automatically query fspamlist.com, stopforumspam.com, sorbs.net, spamhaus.org, spamcop.net, projecthoneypot.org, botscout.com, dronebl.org, ahbl.org, undisposable.net and torproject.org but it had some significant limitations.

There was no explicit means to "whitelist" an address or remove just one false-positive IP, as there was no local list. The only way to resolve issues like
http://desciclopedia.org/wiki/Forum:Problemas_com_a_minha_conta or http://desciclopedia.org/wiki/Forum:Franklin_estranhamente_bloqueado#erro_de_spamcop.net.3F was to turn off whichever blacklist was generating false positives, which let more spam in.

A script checking external servers in realtime is problematic if one server is down (or just running incredibly slowly). It also triggers provider-imposed limits (like stopforumspam only allowing a few hundred API calls a day).

With no centralised list mirrored locally to flag addresses already known to be good or bad, every wiki on the server must make its own separate (and often duplicative) enquiries to an outside master source.

These scripts also provided no means to pull in a list of IPs already blocked with [[mw:Extension:GlobalBlocking]] on Wikipedia or some other wiki, even though these were the IPs most likely to cause trouble on multiple wikis.

As such, I'm hesitant to expect a "subscription service" to eliminate the need to download lists of bad IPs to be stored for use on multiple local wikis.

Certainly there are a few means by which an import feature could be implemented.

One would be a PHP maintenance script that just takes a text file with a list of bad IP addresses (separated by commas or newlines) and imports them into the globalblocking table. That would handle downloadable .csv lists from sites like fspamlist and stopforumspam. This should be the minimum level of functionality, as there are hundreds of thousands of known bad IPs, too many to add manually.
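The parsing step of such a script might look like the following sketch (a hypothetical helper, shown in Python rather than PHP for brevity); it accepts both comma-separated and one-per-line input, validates each entry, and drops duplicates rather than aborting on a malformed line:

```python
import ipaddress
import re

def parse_ip_list(text):
    """Split a blob on commas/whitespace; keep only valid IPs, deduplicated."""
    seen = set()
    result = []
    for token in re.split(r"[,\s]+", text):
        token = token.strip()
        if not token:
            continue
        try:
            ip = str(ipaddress.ip_address(token))
        except ValueError:
            continue  # skip malformed entries rather than abort the whole import
        if ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result
```

The validated list would then be written to the globalblocking table in one batch insert.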

Another marginal improvement would be to change the existing GUI at [[Special:GlobalBlock]] to allow multiple-line input in the edit box, so that a short list of IPs could be pasted in instead of a single IP.

A third possibility would be a maintenance script which makes an API call to some other wiki, gets the list of IPs already blocked there, then imports them in bulk into the local site's globalblocking table. That could run as a 'cron' job (maybe once a day or so) and would give similar results to your spam blacklist example, which subscribes to 'http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1' via $wgBlacklistSettings.
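A sketch of that sync step (assumptions: Python rather than PHP, hypothetical function names, the `bgstart` timestamp parameter and `address` response field following the GlobalBlocking API's usual conventions, and the local table stood in for by a dict):

```python
import json
from urllib.parse import urlencode

def sync_request_url(endpoint, since=None, limit=500):
    """Build a list=globalblocks request; `since` restricts to newer blocks."""
    params = {
        "action": "query",
        "list": "globalblocks",
        "bglimit": limit,
        "format": "json",
    }
    if since is not None:
        params["bgstart"] = since  # e.g. "2013-02-01T00:00:00Z"
    return endpoint + "?" + urlencode(params)

def merge_blocks(local_table, api_response_text):
    """Merge fetched blocks into local_table (keyed by address); return count added."""
    added = 0
    for block in json.loads(api_response_text)["query"]["globalblocks"]:
        addr = block["address"]
        if addr not in local_table:
            local_table[addr] = block
            added += 1
    return added
```

Recording the timestamp of the newest block seen lets the next cron run pass it as `since`, so each sync only fetches what changed.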

I'd think that the ability to import a list of bad IPs from a file on the local server, via a small maintenance script, would be the easiest to implement.

Once that exists, the question of where to get more lists of bad IPs could be addressed later.

This task is about implementing a way to import a list of IPs to mass-block. This can be done through a script using the API, so I'm closing this. Feel free to reopen if I'm missing something.

Glaisher claimed this task.