Jump to content

Manual talk:Comment table/Flow

About this board

Is it possible to recalculate comment_hash?

2
Costas Athan (talkcontribs)

If the value of a record of comment_text column is set manually by editing the database, is it possible to recalculate the comment_hash value afterwards for deduplication purposes of course?

Medicinestorm (talkcontribs)

Here's what I did. Obviously exercise caution when messing with the database directly; make a backup beforehand, et cetera. The following assumes you have a MySQL database:

  1. Modify the comment_text as desired.
  2. In the statement below, replace <yourMWprefix> with your MediaWiki database table prefix.
  3. In the statement below, replace <yourCommentID> with the comment_id value of the record you altered in step #1.
  4. Execute the statement below:
UPDATE 
	<yourMWprefix>_comment
SET
	comment_hash = 
		(CASE
			WHEN (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) > 2147483647 THEN CAST(18446744069414584320 | (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) AS SIGNED INTEGER)
			ELSE CAST(CRC32(comment_text) ^ CRC32(IFNULL(comment_data, '')) AS SIGNED INTEGER)
		END)
WHERE
	comment_id = <yourCommentID>

MediaWiki stores comment_hash as a signed 64-bit integer, but to my great irritation, MySQL's CRC32() function returns an unsigned integer of size: smallest-bit-set-that-will-fit-the-number (in other words UNSIGNED 32-BIT INT about half the time, UNSIGNED 64-BIT INT other times), thus the CASE..WHEN segment with all the goofy bitwise math. I'm sure there's a more correct way to do the conversion (a pox upon anyone who suggests a simple CAST()!), but I ran out of patience and this works fine. You can review the expected results before you make any changes with the following statement:

SELECT 
	comment_id
,	comment_hash
,	CASE
    	WHEN (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) > 2147483647 THEN CAST(18446744069414584320 | (CRC32(comment_text) ^ CRC32(IFNULL(comment_data, ''))) AS SIGNED INTEGER)
    	ELSE CAST(CRC32(comment_text) ^ CRC32(IFNULL(comment_data, '')) AS SIGNED INTEGER)
	END AS predicted_hash
,	comment_text
,	comment_data
FROM <yourMWprefix>_comment;
Reply to "Is it possible to recalculate comment_hash?"
There are no older topics