I don’t know if there would be a market for it, but I imagine some customers might want to receive certain subsets of data in their feed/dump, for example:
- just the lead paragraphs, i.e. everything before the first heading
- only the structured but unparsed wikitext from infoboxes (cf. DBpedia)
- a delayed but cleaner feed, e.g. only revisions that have not been reverted or undone for x days
- only particular types of pages, e.g. English Wikipedia has articles, lists, disambiguation pages, and redirects all within the same namespace
- just the TOCs and hatnotes that describe structure rather than content
Or maybe nobody wants this, especially if our existing downloaders are already geared toward receiving big dumps and running their own parse-and-filter processes?
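For a sense of what that parse-and-filter step looks like on the consumer side, here is a rough sketch in Python covering two of the subsets above (lead paragraphs and infobox wikitext). It assumes the page text arrives as raw wikitext and uses mwparserfromhell as one commonly available parser; the function names and the "split on the first heading" heuristic are just illustrative, not anything we ship today.

```python
# Rough sketch of the kind of parse-and-filter pass downstream consumers
# presumably run over a full dump themselves today.
import re

import mwparserfromhell  # one commonly used wikitext parser


def lead_paragraphs(wikitext: str) -> str:
    """Return only the text before the first heading (the 'lead' section)."""
    # Headings in wikitext start a line with '==', so split on the first one.
    # This is a heuristic; templates or comments containing '==' could confuse it.
    lead = re.split(r"^==", wikitext, maxsplit=1, flags=re.MULTILINE)[0]
    return lead.strip()


def infobox_wikitext(wikitext: str) -> list[str]:
    """Return the raw wikitext of templates whose name starts with 'Infobox'."""
    code = mwparserfromhell.parse(wikitext)
    return [
        str(tpl)
        for tpl in code.filter_templates()
        if str(tpl.name).strip().lower().startswith("infobox")
    ]
```

Running either function over every page in a dump is exactly the sort of work a filtered feed would save consumers from doing, if anyone actually wants that.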