User talk:Wywin

Hello Wywin!

This proposal goes part of the way, but it's not quite there yet. If you have a look at how the dumps of full content are produced, you'll see that we only request the text entry from the db if it's not in the most recent complete dump (used for prefetch), in effect re-using the previous contents. Have a look at worker.py and dumpTextPass.php for that.
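As a rough illustration of the prefetch idea described above, here is a minimal sketch. It is not the actual code in worker.py or dumpTextPass.php; the names `text_pass`, `prefetch` and `fetch_from_db` are hypothetical, and the previous dump is modelled as a plain in-memory mapping from revision id to text.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def text_pass(
    rev_ids: Iterable[int],
    prefetch: Dict[int, str],
    fetch_from_db: Callable[[int], str],
) -> Iterator[Tuple[int, str]]:
    """Yield (rev_id, text) pairs, reusing text from the previous dump.

    `prefetch` stands in for the most recent complete dump (assumption:
    a simple dict); `fetch_from_db` stands in for a database lookup.
    """
    for rev_id in rev_ids:
        text = prefetch.get(rev_id)
        if text is None:
            # Only revisions missing from the previous dump hit the db.
            text = fetch_from_db(rev_id)
        yield rev_id, text
```

Note that even in this sketch every old revision is still read and written out again; only the database request is avoided.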

But even with this approach, and with several processes running at once, it takes several days to read, uncompress, check integrity, recompress and write out all those old revisions, and to request the comparatively few new revisions from the db.

For this reason, work on a new format that lets us leave most of the content untouched in any merge of an incremental with a previous full dump is key. I'd like to see the proposal restructured with this in mind and a couple of your thoughts on how you might go about this.

Thanks! -- ArielGlenn (talk) 04:43, 24 April 2013 (UTC)