This project would be a really big win for the Wikimedia ecosystem. I would also recommend considering remote dumps. I've been building a wiki that monitors other wikis, called WikiApiary. I recently added the ability to back up any wiki monitored by WikiApiary. It works great, and I'm using the dumpgenerator.py code from wikiteam. The biggest problem with this, especially since it works remotely via the API, is that it does not allow incrementals. Adding support for incrementals would be a huge win and would allow remote backup of thousands of wikis without undue load.
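To make the incremental idea concrete, here is a minimal sketch of how a remote tool could find out what changed since its last run, using only the standard MediaWiki API (list=recentchanges). This is not part of dumpgenerator.py; the api_url and last_run parameters, and the idea of re-exporting only the returned titles afterwards, are assumptions for illustration.

<syntaxhighlight lang="python">
# Minimal sketch: list pages edited since the previous backup run,
# by walking list=recentchanges from "now" back to the stored timestamp.
import requests

def pages_changed_since(api_url, last_run):
    """Yield titles of pages edited since last_run (ISO 8601 timestamp)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcend": last_run,          # stop once we reach the previous run
        "rcdir": "older",           # newest first, back towards rcend
        "rcprop": "title|timestamp",
        "rclimit": "500",
        "continue": "",             # opt into the current continuation format
        "format": "json",
    }
    seen = set()
    while True:
        data = requests.get(api_url, params=params).json()
        for change in data["query"]["recentchanges"]:
            if change["title"] not in seen:
                seen.add(change["title"])
                yield change["title"]
        if "continue" in data:
            params.update(data["continue"])   # follow API continuation
        else:
            break
</syntaxhighlight>

Since WikiApiary already knows each monitored wiki's api.php URL, a loop like this could feed the changed titles into a per-page export instead of re-dumping the whole wiki on every run.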
I'm not sure I will have the time to do this during the summer. I will certainly keep it in mind and try to make the code modular, so that adding support for dumping external sites through the API later is relatively simple (see the sketch below).
I might look into this after the GSoC ends.
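For what it's worth, such a modular split might look roughly like the sketch below: the dump writer only depends on an abstract revision source, so an API-backed source for external wikis could be slotted in later. All class and method names here are hypothetical and are not taken from the actual project code.

<syntaxhighlight lang="python">
# Hypothetical sketch of separating "where revisions come from"
# from "how the dump is written".
from abc import ABC, abstractmethod

class RevisionSource(ABC):
    """Anything that can enumerate revisions newer than a given timestamp."""
    @abstractmethod
    def revisions_since(self, timestamp):
        ...

class DatabaseSource(RevisionSource):
    """Reads revisions directly from a local wiki database (WMF-style dumps)."""
    def revisions_since(self, timestamp):
        raise NotImplementedError  # would query the revision table

class ApiSource(RevisionSource):
    """Reads revisions over the MediaWiki web API (remote/external wikis)."""
    def revisions_since(self, timestamp):
        raise NotImplementedError  # would page through list=recentchanges

def write_incremental_dump(source: RevisionSource, last_run, writer):
    """The dump writer only sees the RevisionSource interface."""
    for revision in source.revisions_since(last_run):
        writer.write(revision)
</syntaxhighlight>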
Any news or developments on the issue of incremental dumps?
I don't know the world of archives and dumps, but I wonder whether there is an existing standard (either a de facto standard or a formal one) for our dumps. Perhaps some of the following companies/institutions have interesting standards that could fit our needs: MySQL, communities dealing with SQL dialects, archive.org, digital archives (Europeana, American archives, etc.), archive software (Bacula, etc.), Kiwix/OpenZIM, Drupal, etc.
Also, it is probably simpler to create a special dump/archive type, but a quick overview of existing archive types could guide some choices if we want to move to a standard archive type in a few years. I am thinking about that because I have heard that the longevity of the archive format is a real problem (it can be difficult to read archives created dozens of years ago), although here you are only dealing with dumps (short-lived archives).
That's not a bad idea, but I had a look and I didn't find anything that would provide what is needed.
And I think longevity is only a problem if there is no specification and no open-source implementation. In this case, both will be created.