How can we get User:Citation bot whitelisted? It does high volume and exceeds policy. They have their own Citoid install, but it is not as good as the Wikipedia install, which gives better results.
Topic on Talk:Citoid
To reduce load, we do not query URLs for citations that are already "complete". Since historically we are used in the sciences, we do not consider a citation complete without a volume number. We have added a short list of popular websites that do not have volumes and such (CNN.com and the like; it is purely my own writing, based upon a few wiki pages) to flag more citations as "complete". If we switch to Citoid, it would be nice to get some feedback on which websites are the most common, so we can add more to that list.
Hm, there are no limits on the citation API end points AFAIK. Could you provide a sample response?
en.wikipedia.org Citoid blocks us, so we run our own on the tool server.
Is there more than one endpoint?
We tried https://en.wikipedia.org/api/rest_v1/#!/Citation/getCitation
No, that's the correct one... are you using the correct query pattern? Requests to RESTBase installs look like
https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/http%3A%2F%2Fwww.example.com
But if you're using a native citoid install they look like
http://localhost:1970/api?format=mediawiki&search=http%3A%2F%2Fwww.example.com
The former puts the params in the URL path, the latter uses query params.
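For illustration, here is a minimal Python sketch of building both request patterns. The function names and constants are made up for this example; only the two URL shapes come from the examples above.

```python
from urllib.parse import quote, urlencode

# Base URLs taken from the two examples above.
RESTBASE_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"
NATIVE_BASE = "http://localhost:1970/api"


def restbase_citation_url(target, fmt="mediawiki"):
    """RESTBase style: format and target are path segments (hypothetical helper)."""
    # safe="" makes quote() percent-encode ":" and "/" in the target as well.
    return f"{RESTBASE_BASE}/{fmt}/{quote(target, safe='')}"


def native_citation_url(target, fmt="mediawiki"):
    """Native Citoid style: format and target are query params (hypothetical helper)."""
    return f"{NATIVE_BASE}?{urlencode({'format': fmt, 'search': target})}"
```

Calling either helper with `http://www.example.com` should reproduce the corresponding example URL above.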
We got blocked so we now use our own on the tool servers. It does seem to time out at times also.
So I've asked operations, and we don't block IPs. However, we do have a rate limit of 1000 requests per 10 seconds (100 requests per second), which applies to all the MediaWiki and RESTBase APIs. If you are receiving a 429 response, you just need to make sure you are adding a delay in between requests so as not to exceed the limit.
Is 429 the response code you are getting?
(Also, it does take really long sometimes. Depending on the timeout set in your request package, a response may take longer than that, so you may want to set a longer timeout on your end.)
I will investigate. If it is 429, then we can sleep a little while and try again. The bot can be run by lots of people at once, so we might hit the limit. How long of a timeout are we talking about?
The docs say "1000/10s (100/s long term, with 1000 burst)" - so I think if you exceed 1000 and wait 10s it should reset, but this is just from reading the docs.
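As a rough sketch of the "sleep a little while and try again" idea, in Python: `fetch_with_retry` and its `send` callable are hypothetical names, and the 10 second wait is only my reading of the docs quoted above, not a verified reset interval.

```python
import time


def fetch_with_retry(send, max_attempts=3, wait_seconds=10, sleep=time.sleep):
    """Call send() until it returns something other than HTTP 429.

    send is any zero-argument callable returning (status_code, body);
    it stands in for whatever HTTP client the bot actually uses.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        # Rate limited: wait before the next attempt, but not after the last one.
        if attempt < max_attempts:
            sleep(wait_seconds)
    # Still rate limited after max_attempts; hand the 429 back to the caller.
    return status, body
```

Passing `sleep` as a parameter is just a convenience so the wait can be stubbed out in tests.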
For the timeout, I tried to find an outside value for you. In tests we allow a request to take up to 40 seconds, but there have even been cases of something taking 75 seconds to return in the wild: https://phabricator.wikimedia.org/T165105#4666586
And the caching layer sets a timeout of 360 seconds, so you will not get any responses that take longer than that.
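A minimal Python sketch of setting a longer client-side timeout, using the standard library's `urllib.request`. The 90 second value is an assumption picked to sit above the 75 second case and well below the 360 second cap mentioned above; `fetch_citation` and the User-Agent string are placeholders, not anything the bot actually uses.

```python
import urllib.request

# Assumed value: longer than the slowest observed response (75 s),
# shorter than the caching layer's 360 s cutoff.
CLIENT_TIMEOUT = 90


def fetch_citation(request_url, timeout=CLIENT_TIMEOUT, opener=urllib.request.urlopen):
    """Fetch a citation response body as text (hypothetical helper).

    opener defaults to urllib.request.urlopen and can be swapped for a stub.
    """
    request = urllib.request.Request(
        request_url,
        # Placeholder User-Agent; replace with the bot's real identification.
        headers={"User-Agent": "ExampleBot/0.1 (placeholder contact)"},
    )
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If the server takes longer than `timeout`, `urlopen` raises an exception, which the caller would need to catch and treat as a failed lookup.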