If the value of a record of comment_text column is set manually by editing the database, is it possible to recalculate the comment_hash value afterwards for deduplication purposes of course?
Topic on Manual talk:Comment table
Appearance
Here's what I did. Obviously exercise caution when messing with the database directly; make a backup beforehand, et cetera. The following assumes you have a MySQL database:
- Modify the
comment_text
as desired. - In the statement below, replace <yourMWprefix> with your MediaWiki database table prefix.
- In the statement below, replace <yourCommentID> with the
comment_id
value of the record you altered in step #1. - Execute the statement below:
UPDATE <yourMWprefix>_comment SET comment_hash = (CASE WHEN (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) > 2147483647 THEN CAST(18446744069414584320 | (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) AS SIGNED INTEGER) ELSE CAST(CRC32(comment_text) ^ CRC32(IFNULL(comment_data, '')) AS SIGNED INTEGER) END) WHERE comment_id = <yourCommentID>
MediaWiki stores comment_hash
as a signed 64-bit integer, but to my great irritation, MySQL's CRC32()
function returns an unsigned integer of size: smallest-bit-set-that-will-fit-the-number (in other words UNSIGNED 32-BIT INT about half the time, UNSIGNED 64-BIT INT other times), thus the CASE..WHEN
segment with all the goofy bitwise math. I'm sure there's a more correct way to do the conversion (a pox upon anyone who suggests a simple CAST()
!), but I ran out of patience and this works fine. You can review the expected results before you make any changes with the following statement:
SELECT comment_id , comment_hash , CASE WHEN (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) > 2147483647 THEN CAST(18446744069414584320 | (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) AS SIGNED INTEGER) ELSE CAST(CRC32(comment_text) ^ CRC32(IFNULL(comment_data, '')) AS SIGNED INTEGER) END AS predicted_hash , comment_text , comment_data FROM <yourMWprefix>_comment;