Page title size limitations
Page titles are currently limited to 255 bytes. The choice of using 255 bytes for title length wasn't an arbitrary choice. 255 bytes – not characters – was picked because 255 bytes is the maximum string length that can be represented using 1 byte. ie: 1 byte (8 bits -> 2^8=256 values not including 0) can represent the range 0-255 so a varchar uses 1 byte to declare the length of the varchar's contents. If you use any length larger than 255 the varchar will always require 2 bytes to store the length of the column.
Changing the maximum length of a title will require a huge number of changes and is not easy to make configurable so we should do this only one time changing the maximum length of title to the maximum we can feasibly make it.
- It's hardcoded in
class MediaWikiTitleCodec
- User::getCanonicalName will need some extra code to restrict usernames to 255 bytes as it currently depends on Title to do that. (And increasing the user name length is far more problematic than the title length)
- docs/title.txt and TitleTest.php will need an update.
- The canonical page.page_title needs an ALTER COLUMN.
- A number of other core tables will need column alters; archive.ar_title, pagelinks.pl_title, templatelinks.tl_title, image.il_title, categorylinks.cl_links, category.cat_title, langlinks.ll_title, iwlinks.il_title, recentchanges.rc_title, and so on.
- All extensions that store titles in the database will need to do similar ALTER COLUMN updates or else they will have unexpected errors. It might be worth doing this first. Since a larger varchar length in the database won't cause any bugs while the software still uses 255 bytes.
The next (and tbh final) title max length shouldn't be something arbitrary, we should pick the maximum we can theoretically store (max we can store with 2 bytes/a 16bit uint and the max we can possibly store are actually the same thing, at least on MySQL in regards to VARCHAR).
The next step up from 1 byte representing 0-255 is 2 bytes representing a length of 0-65535 bytes. However we actually cannot use VARCHAR(65535) for the title. There are a number of other limits we hit which cap varchar column lengths below what can be represented with a 16bit uint length.
- MySQL Has a maximum row length of 65,535 bytes[1]; This includes the storage of whatever all the columns on the table require in the row storage.
- PostgreSQL doesn't seem to have the same max row size issues as MySQL because its row max – depending on whether you ask the about page[2] or wiki FAQ[3] – is either 1.6TB or 400GB. And the column max size is 1GB.
- However PostgreSQL seems to say "indexes can not be created on columns longer than about 2,000 characters".[3] ((I don't know the precise details but it might make our limit 2000 bytes. We'll need some more input on someone who knows PostgreSQL.))
- The varchar WP page says Oracle's limit is 4000 bytes.
- Before MySQL 5.0.3 VARCHAR colums could only be declared a maximum of 255.[4] ((This means we'll have to drop support for 5.0.2 and change our "5.0.2 or later" MySQL requirements to "5.0.3 or later"))
- MyISAM's index prefix maximum is 1000 bytes[5][6] and InnoDB is 767[5] (unless you use a dynamic/compressed field and innodb_large_prefix[7]). This means that [[A{1000 bytes}A]] and [[A{1000 bytes}B]] cannot both exist as the index prefix is used to ensure uniqueness. This limit will be database dependent. And the only fix would be to add a new column containing a hash of the title text and drop the uniqueness constraint on page_title.
Deciding the title max we should use will probably need some more information than what has been gathered so far.
((Side topic: We use varchar(n) binary
for the title now. However anyone that feels like changing this to varchar(n) CHARACTER SET utf8
needs to be wary that MySQL triples the (n) so it can store "<n> utf8 chars" instead of "<n> bytes that happen to be utf8" so they may need to divide whatever we use as the new max by 3 (unless MySQL does additional enforcing of char lengths in which case anyone doing that is stuck doing significant changes to how MW determines the maxlength of a title).[1]))
External link
[edit]- http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/73437 – The original wiktech-l email by Daniel Friesen this page was based on.
References
[edit]- ↑ 1.0 1.1 https://dev.mysql.com/doc/refman/5.6/en/column-count-limit.html
- ↑ https://www.postgresql.org/about/
- ↑ 3.0 3.1 https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F
- ↑ https://dev.mysql.com/doc/refman/5.0/en/char.html
- ↑ 5.0 5.1 https://dev.mysql.com/doc/refman/5.0/en/create-index.html
- ↑ https://dev.mysql.com/doc/refman/5.0/en/myisam-storage-engine.html
- ↑ https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html