User:Mr.Z-man/cite

Ideas for a refurbishing of the current ref system:

The edit page

The current <ref></ref> system will continue to work, but will be deprecated in favor of a separate "reference manager".
- The reference manager will be separate from the editbox and list all the references in the article in their own textareas.
- Each reference can be assigned a name to use in the article and will have its own unique ID.
The current <ref name="Foo"/> system will be deprecated in favor of a simpler {{#ref:Foo}}.
- The new parser function will also have the option to add additional info, for example: {{#ref:Foo|page 50}}
The reference manager will work best with JavaScript, but will still be functional without it.
<references /> will continue to work as now.
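
A minimal sketch of how the proposed {{#ref:...}} calls might be picked out of article text. The exact grammar is an assumption (a name, then an optional "|" followed by free-form additional info), and RefCall and find_ref_calls are hypothetical names, not part of any existing parser:

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: {{#ref:Name}} or {{#ref:Name|additional info}}.
# Nested braces and pipes inside the additional info are not handled.
REF_CALL = re.compile(r"\{\{#ref:([^|{}]+?)(?:\|([^{}]*))?\}\}")


class RefCall(NamedTuple):
    name: str             # name assigned in the reference manager
    extra: Optional[str]  # e.g. "page 50", or None if absent


def find_ref_calls(wikitext: str) -> list[RefCall]:
    """Return every {{#ref:...}} call in order of appearance."""
    calls = []
    for match in REF_CALL.finditer(wikitext):
        name = match.group(1).strip()
        extra = match.group(2)
        # Treat an empty or whitespace-only second argument as "no extra info".
        extra = extra.strip() if extra and extra.strip() else None
        calls.append(RefCall(name, extra))
    return calls
```

In this sketch, find_ref_calls("A claim.{{#ref:Foo}} Another.{{#ref:Foo|page 50}}") yields one call with no additional info and one with "page 50".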

Rendered page

The rendered output will be essentially the same, with a few minor changes.
Refs that use the same text will no longer all be lumped onto the same line: any reference that has additional info will be on a separate line, indented. For example:

^ a b Doe, John. The Foo Book
    ^ c page 55
    ^ d pages 75-80

Any references included with the page but not used in the text will still be listed.
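
The layout described above could be produced roughly as follows. This is only a sketch: render_reference_list is a hypothetical helper, the back-link labels (a, b, c, ...) are assumed to be assigned per reference in order of use, and the plain-text "^" output stands in for whatever HTML the real renderer would emit:

```python
from string import ascii_lowercase


def render_reference_list(refs, calls):
    """Build the lines of the reference list.

    refs:  dict of reference name -> reference text, in display order.
    calls: list of (name, extra) pairs in order of appearance in the
           article; extra is None when no additional info was given.
    """
    plain = {name: [] for name in refs}     # labels of uses without extra info
    detailed = {name: [] for name in refs}  # (label, extra) for uses with it
    counts = {name: 0 for name in refs}
    for name, extra in calls:
        if name not in refs:
            continue  # undefined names are simply skipped in this sketch
        # Labels wrap after "z"; a real implementation would need more labels.
        label = ascii_lowercase[counts[name] % 26]
        counts[name] += 1
        if extra is None:
            plain[name].append(label)
        else:
            detailed[name].append((label, extra))

    lines = []
    for name, text in refs.items():
        backlinks = " ".join(plain[name])
        # References never used in the text are still listed, without back-links.
        lines.append(f"^ {backlinks} {text}" if backlinks else text)
        for label, extra in detailed[name]:
            lines.append(f"    ^ {label} {extra}")
    return lines
```

Called with one reference used twice plainly and twice with page numbers, plus one unused reference, this returns the main line, two indented lines, and the unused reference's text on its own line.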

Database

One table will store the actual wikitext of the reference, and a unique ID number for the text.
Another will provide a mapping of rev_id -> ref_id.
This is done in two separate tables to save space, as most edits won't change every reference on the page.
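
A rough sketch of the two-table layout, using SQLite through Python purely for illustration. The table and column names (ref_text, rev_ref, ref_wikitext, ref_name) are assumptions, as are storing the reference name in the mapping table and reusing a ref_text row whenever the wikitext is identical:

```python
import sqlite3

# Hypothetical schema: one row per distinct reference text, plus a mapping
# from each revision to the reference texts it uses.
SCHEMA = """
CREATE TABLE ref_text (
    ref_id       INTEGER PRIMARY KEY,
    ref_wikitext TEXT NOT NULL UNIQUE
);
CREATE TABLE rev_ref (
    rev_id   INTEGER NOT NULL,
    ref_name TEXT NOT NULL,
    ref_id   INTEGER NOT NULL REFERENCES ref_text(ref_id),
    PRIMARY KEY (rev_id, ref_name)
);
"""


def save_revision_refs(conn, rev_id, refs):
    """Store the references (name -> wikitext) used by one revision."""
    for name, wikitext in refs.items():
        # Unchanged text reuses the existing row, so only edited
        # references add anything to ref_text.
        conn.execute(
            "INSERT OR IGNORE INTO ref_text (ref_wikitext) VALUES (?)",
            (wikitext,),
        )
        (ref_id,) = conn.execute(
            "SELECT ref_id FROM ref_text WHERE ref_wikitext = ?", (wikitext,)
        ).fetchone()
        conn.execute(
            "INSERT INTO rev_ref (rev_id, ref_name, ref_id) VALUES (?, ?, ?)",
            (rev_id, name, ref_id),
        )
    conn.commit()


def load_revision_refs(conn, rev_id):
    """Return the references of one revision as a name -> wikitext dict."""
    rows = conn.execute(
        "SELECT rev_ref.ref_name, ref_text.ref_wikitext "
        "FROM rev_ref JOIN ref_text USING (ref_id) "
        "WHERE rev_ref.rev_id = ? ORDER BY rev_ref.ref_name",
        (rev_id,),
    )
    return dict(rows)
```

With this layout, an edit that changes one of two references adds one row to ref_text and two small rows to rev_ref, rather than a second copy of both reference texts.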

Other requirements

enwiki citation templates are hard on the parser, so the system needs good caching. This should be easier with this system than with the current one.
Need to make sure links tables are properly updated - parser should take care of this?
Reference changes need to be included in diffs somewhere
Need to be able to revert reference changes when reverting an article, either with rollback, undo, or manual reversion
Need to be able to retrieve and edit refs with the API.
Needs to be integrated with anti-abuse systems - spam checks, abuse filter
Reference text needs to be included with database dumps
FlaggedRevs - if text is tied to revids, this may not be an issue
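
One possible shape for the diff and revert requirements above, treating each revision's references as a name -> wikitext mapping. diff_refs and undo_refs are hypothetical helpers, and refusing to undo when a later edit has touched the same reference is an assumption about the desired behavior:

```python
def diff_refs(old, new):
    """Compare two revisions' references (dicts of name -> wikitext).

    Returns a list of (kind, name, before, after) tuples, where kind is
    "added", "removed", or "changed".
    """
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        if before is None:
            changes.append(("added", name, None, after))
        elif after is None:
            changes.append(("removed", name, before, None))
        else:
            changes.append(("changed", name, before, after))
    return changes


def undo_refs(current, changes):
    """Reverse the given changes on top of the current references.

    Raises ValueError if a reference no longer matches the state the
    change left it in, i.e. a later edit conflicts with the undo.
    """
    result = dict(current)
    for _kind, name, before, after in changes:
        if result.get(name) != after:
            raise ValueError(f"conflicting later edit to reference {name!r}")
        if before is None:
            del result[name]       # undo an addition
        else:
            result[name] = before  # undo a removal or a change
    return result
```

The output of diff_refs could feed the diff view, and undo_refs covers the undo case. Rollback and manual reversion could simply point the new revision at the old revision's set of references.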

Possibly

Allow importing of refs from other articles
Automatic conversion of the current syntax with an option while editing

Searching

A plain-text search would likely be too slow to be practical. InnoDB doesn't support fulltext indexes, but support for concurrent writes is probably more important than the ability to search.
The options for searching are:
1. Instead of using a single textarea for the references stored in a blob, use small text fields stored in a varchar. Fulltext searching still couldn't be used, but searching for exact matches or prefixes on individual fields would be possible - e.g. author_last_name = 'Doe' AND author_first_name LIKE 'J%'
  - Downsides: puts a lot more limitations on input/output options (templates would have to be defined in system messages), less backward compatible, could make the edit page more cluttered with tons of input fields rather than a few textareas.
  - Benefits: Probably the easiest to use, not too hard to code.
2. Integrate with the current search engine, or create a new search engine with a MyISAM index table just for this.
  - Downsides: A lot more complex, extra tables/indexes would take up a lot of space for a non-critical feature, output might be harder to use.
  - Benefits: Would be fast and versatile, would work with any reference format.
3. Don't allow searching of citation text, just lookups by page or revid
  - Downsides: Not very useful.
  - Benefits: Easiest to code, performance would not be an issue.
4. Create a toolserver tool to search.
  - Downsides: Would be an external tool, so harder to integrate into normal editing, would be less useful if reference text needs to be put in external storage - text would need to be retrieved from dumps.
  - Benefits: Performance isn't as critical for the toolserver, so a slower query isn't as much of a problem, allows for more site-specific customization.
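
To make option 1 more concrete, here is a small sketch of an exact-match plus prefix search over per-field columns, again using SQLite through Python only for illustration. The ref_fields table, its title column, the index, and the helper names are all assumptions; only author_last_name and author_first_name come from the example above:

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with one varchar column per citation field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ref_fields ("
        " ref_id INTEGER PRIMARY KEY,"
        " author_last_name VARCHAR(255),"
        " author_first_name VARCHAR(255),"
        " title VARCHAR(255))"
    )
    # An ordinary index is the kind of structure that can serve exact
    # matches and prefix matches without fulltext search.
    conn.execute(
        "CREATE INDEX ref_author"
        " ON ref_fields (author_last_name, author_first_name)"
    )
    conn.executemany(
        "INSERT INTO ref_fields (author_last_name, author_first_name, title)"
        " VALUES (?, ?, ?)",
        [
            ("Doe", "John", "The Foo Book"),
            ("Doe", "Jane", "Bar Studies"),
            ("Doe", "Alice", "Baz"),
            ("Roe", "Jack", "Qux"),
        ],
    )
    return conn


def search_by_author(conn, last_name, first_name_prefix):
    """Exact match on last name, prefix match on first name."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        first_name_prefix.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM ref_fields"
        " WHERE author_last_name = ?"
        " AND author_first_name LIKE ? ESCAPE '\\'"
        " ORDER BY title",
        (last_name, escaped + "%"),
    )
    return [title for (title,) in rows]
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, so behavior on another database may differ from this sketch.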