Jump to content

Topic on Project:Support desk/Flow

Missing pages, but data appear to be intact

26
Jonahgreenthal (talkcontribs)

My wiki is missing a bunch of pages. I think it's specifically the pages that only had one revision. Possibly only if that revision was very old.

Here's an example page: https://www.qbwiki.com/wiki/St._Anne%27s . It says there's no text in this page, but the page, revision, and text tables all seem to have the right data.

Other observations:

The revision #0 of the page named "St. Anne's" does not exist.
This is usually caused by following an outdated history link to a page that has been deleted. Details can be found in the deletion log.

(but the deletion log doesn't show anything)

  • When I try to save an edit, I am told there is an edit conflict ("Someone else has changed this page since you started editing it.")
  • I can delete the page and then re-create it
  • I fiddled around with the PHP code to see where the error text was coming from. It's Article.php line 436, in the fetchRevisionRecord() method:
// $this->mRevision might already be fetched by getOldIDFromRequest()
if ( !$this->mRevision ) {
	if ( !$oldid ) {
		$this->mRevision = $this->mPage->getRevision();

		if ( !$this->mRevision ) {
			wfDebug( __METHOD__ . " failed to find page data for title " .
				$this->getTitle()->getPrefixedText() . "\n" );

			// Just for sanity, output for this case is done by showMissingArticle().
			$this->fetchResult = Status::newFatal( 'noarticletext' );
			$this->applyContentOverride( $this->makeFetchErrorContent() );
			return null;
		}
	…

But I don't know enough about how MediaWiki works (or PHP in general, really) to figure out what's going wrong.

This problem likely started after an upgrade that I struggled with. Unfortunately, I didn't notice this problem until long after the upgrade, and when I finished the upgrade I had thought everything turned out okay, so I don't remember exactly what went wrong.

The upshot is that there are a bunch of pages (this was just one example) whose contents exist in the database but can't be accessed through the website. I'm not sure even how to systematically identify such pages. I suspect some rows or values are just missing from some table(s), but I have no clue which or how to find out.

Thoughts?

Nikerabbit (talkcontribs)

Do you have backups from before the upgrade?

It is most likely that either the actor or comments migration has gone wrong. MediaWiki does a LEFT JOIN on those tables so missing entries in there will cause those revisions/pages to appear as missing.

If it is about comments, see https://phabricator.wikimedia.org/T249904.

If it is about actors, I had the following trick:

  1. identify the user names for those revisions
  2. Create proper users for them, e.g. `User::newSystemUser( '...', [ 'steal' => true ] );`
  3. Run database queries UPDATE revision SET rev_user = 0 where rev_user_name = '...'; (and similar for all affected tables, mostly logging, archive and recentchanges)
  4. Run php maintenance/cleanupUsersWithNoId.php


But you need backups to do this in case the rev_user and equivalent fields are already dropped. I guess it's possible to do it afterwards by updating the rev_actor and equivalent fields too, but I have not done that myself.

Jonahgreenthal (talkcontribs)

I do have backups from before the upgrade, but the upgrade was about a year ago so restoring from the backup isn't viable.

Thanks for pointing me at revision_comment_temp and revision_actor_temp. It looks like the problem is the latter—this query returns 164 rows:

SELECT * FROM revision WHERE rev_id NOT IN (SELECT revactor_rev FROM revision_actor_temp)

Do you agree with that reasoning? (Some of the corresponding pages do exist, but the revisions seem to be missing when I view the history through the web interface.)

The rev_user_text column contains the username, so that should address your step 1, right? Are you able to elaborate on step 2 (how do I do that? what's the steal thing?) and 3 (which tables are affected?)? Thanks so much!

Nikerabbit (talkcontribs)

I'd suggest running `php maintenance/migrateActors.php --force` to observe if there are errors. If there is, you should get list of usernames that match the rev_user_text of those rows. You could try running cleanupUsersWithNoId.php first or maybe even findMissingActors.php (if you have it) to see if is sufficient.

But if they don't work, my step 2 basically creates and user and actor for the name. The issue may be that there is no used account for the name, so actor cannot be created. Step 3 removes broken references to user ids which do not exist, so that cleanupUsersWithNoId can process it. The relevant tables and names should be printed out by the migrateActors script.

Jonahgreenthal (talkcontribs)

Thanks. migrateActors produced a bunch of messages like this:

User name "X" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

cleanupUsersWithNoId produced a bunch of output but didn't seem to actually do anything.

I don't know how to actually do your step 2. It looks like PHP code I'm supposed to run, but I don't know how to run custom code within the MediaWiki environment.

I don't have findMissingActors.

Nikerabbit (talkcontribs)

There is shell.php and eval.php under maintenance, both allow you to run that code interactively.

Jonahgreenthal (talkcontribs)

Thanks. I had to fight with shell.php pretty hard to get PsySH to work, but I think everything works now, including the solution of my original problem. I appreciate your help.

81.174.133.236 (talkcontribs)

I had a very similar problem and this thread was very helpful, thank you. I was able to resolve with this SQL:

INSERT INTO revision_actor_temp (revactor_rev, revactor_actor, revactor_timestamp, revactor_page) SELECT rev_id, 1 as actor, rev_timestamp, rev_page FROM revision WHERE rev_id NOT IN (SELECT revactor_rev FROM revision_actor_temp);

I was not concerned with correctly matching up the actors (and indeed think this data is lost), so replaced all of them with actor 1 (Administrator on my wiki). All of the page revisions are now accessible again. The missing actors were old now non-existent users that must have been lost in the 1.31 to 1.35 migration somewhere.

81.174.133.236 (talkcontribs)

I should add I needed to run php maintenance/update.php afterwards as well.

Krabina (talkcontribs)

thank you for this! the sql statement saved my wiki :-)

YOUR1 (talkcontribs)

The SQL statement above also fixed issues on our wiki's.

Kghbln (talkcontribs)

The solution posted by 81.174.133.236 only works, however, on file pages the connection of the files to their pages is not reinstated. This means the files are not shown on the wiki. Reuploading with the same name is not possible either.

Note that I went from 1.25 to 1.31, to 1.32, to 1.33, 1.34 and now to 1.35. Needless to say that migrateActorswithNoId.php was useless to run. It detected the actors but did not clean them.

Going the rebuildImages.php path does not help, and neither the cleanupUsersWithNoId.php/migrateActors.php path.

I am clueless as to how to mitigate this. Looks to me like I may have run into this which I will still have to investigate. Edit: No, I do not think this is the issue.

Ciencia Al Poder (talkcontribs)
Kghbln (talkcontribs)

Thanks for the pointer to your patched version of the script "cleanupUsersWithNoId.php." You are a hero! In the case of the current wiki I worked on, it was a lifesaver. Let me share my experience:

Going directly from MW 1.31 to MW 1.35 and applying the script on MW 1.35 after running "update.php" did not work. The wiki was in a disastrous condition after the upgrade and is no longer usable. This outcome is expected since the script had nothing to work on in the MW 1.35 database.

Going from MW 1.31 to MW 1.32, then to MW 1.33, using the patched script you linked to on 1.33 after running "update.php," did work. For some reason, the wiki was still unusable in version MW 1.33; however, upgrading to MW 1.34 with "update.php" and from there to MW 1.35 with "update.php" mitigated the issues emerging on the wiki. As a result, the wiki is working fine as it appears to me (still pending user feedback). This way, I could prevent massive issues, including this one, from occurring if I used core software. I am still determining why the wiki is broken in version 1.33, but it is probably another story.

Directly going from MW 1.31 to MW 1.35 is not recommended for wikis, with issues surfacing after the upgrade. Do it branch by branch and apply the patched script to MW 1.33. Going from MW 1.31 via MW 1.32, MW 1.33, and MW 1.34 to MW 1.35 using the script "cleanupUsersWithNoId.php" provided by core will also not work. The result will be a disaster. On the way, "update.php" will complain for MW 1.33 to MW 1.35 that you need to run "cleanupUsersWithNoId.php," however, it will ultimately do nothing.

185.104.138.31 (talkcontribs)

Hi, I would like to share my experience here since I've been struggling with updating my MW from 1.31 onwards. Originally, I wanted to go from MW 1.31 to 1.32 and then 1.33 some time ago. As 1.33 broke my wiki due to the known actor nightmare (cleanupUsersWithNoId.php does not help), I decided to postpone the update.

Now, I gave everything another shot and followed the instructions posted by @Kghbln.

1) MW 1.31 to MW 1.32 using update.php

2) MW 1.32 to MW 1.33 using the enhanced version of cleanupUsersWithNoId.php by @Ciencia Al Poder first and only then executing update.php

3) MW 1.33 to MW 1.34 using update.php

4) MW 1.34 to MW 1.35 using update.php

Afterwards, my wiki was working fine on MW 1.35. As 1.39 is already out, I decided to continue the update process. To be on the safe side, I performed the update for each version separately (1.36 -> 1.37 -> 1.38 -> 1.39). Everything works fine now under PHP 7.4, my next step is going to push the PHP version to 8.1 but that's not related to this issue.

Btw, the query posted by 81.174.133.236 was not necessary as the enhanced script took care of everything.

Another issue I had facing this update was the removal of Manual:$wgDBmysql5. My database was still using latin1 collations and everything was working fine with $wgDBmysql5 = true; until this setting was removed in MW 1.33. Therefore, I had to adjust my DB accordingly to avoid encoding issues with special characters (some hints are given on the talk page of the setting, also see Topic:Wqktznc6b8nyc29g).

I really would like to thank @Ciencia Al Poder and @Kghbln and everyone involved for sharing the script and their paths for the update!

Want (talkcontribs)

No! I recommend wait to next stable version MW, because upgrade scipt must accept upgrade from PHP 7.x to 8.2 and some distributions as Debian for example, 8.0 and 8.1 skiped. But MW 1.42 with support PHP 8.2 not released. I know that is complication, but another choice isn't for now.

Medwards98020 (talkcontribs)

Say, I appear to have this issue in my wiki. However, I'm unable to use the above solution as my wiki has already been upgraded a few times (currently on 1.41), and I don't think I have access to the older versions anymore.

migrateActors.php --force shows 0 errors currently.

I also note that if I look at the history of a problematic page, I can see, for example, three users with edits. All appear in the list of users, and have other unaffected pages.

Any suggestions on how I might dig myself out of this?

YOUR1 (talkcontribs)

Did you try to run the above SQL statement? That worked for us.

Medwards98020 (talkcontribs)

Yes, it errors out as there is no "revision_actor_temp"

Ciencia Al Poder (talkcontribs)

There's no fix once your wiki database has been upgraded to later versions. The information of those old users is gone. Forever. There are only actor ids now. You'll have to manually select those actor ids and replace them by whatever other actor id that may be the best replacement for them.

Medwards98020 (talkcontribs)

Hmm, well, unfortunately I'm not even sure how to find the old actor IDs. Currently despite being able to see old revisions, I can't find the page in the database, but I may be not searching correctly. I'm certainly no expert on how the database is structured.

Medwards98020 (talkcontribs)

OK, so I think I have a better idea how to fix things now.

I did some reading up on the mediawiki manual that describes the structure of the database. It now made sense that I didn't have a "revision_actor_temp", as that was only around v1.3.1 – v1.3.8, and I've already upgraded past that.

I have been able to identify a few problem pages just by coming across them browsing my wiki. I'm using phpMyAdmn tool at part of the cPanel set up for my instance. Selecting my database and searching for an exmaple page title let me find the page record, and in that, find the page ID number.

The, searching the revision field only (by selecting it and using the search tool), I was able to search for all revision entries for that page ID number. The very last/latest revision had an rev_actor ID of "0". When I had been looking at the list of revisions in the wiki, it shows them up to that point.

After backing up my database (in case I mess things up), I edited the rev_actor ID for that one revision from "0" to "1" (the ID of the main admin account). Now that revision appears with the name of the main admin account as having done the revision, and the page appears normally in the wiki.

Now I certainly can just have folks report when they run into one of these pages, and fix them as I come across them. However, my question is: Is there a legitimate use of the rev_actor ID being "0", or is that most assuredly an indication of a problem with the page? I could easily find/replace them, but I see about 2.5K entries that have a rev_actor ID of "0" currently. The wiki doesn't have anonymous edits, by the way - or at least I though so, I can historically see some edits just have IP addresses, as I look at entries that contain the null ID.

Medwards98020 (talkcontribs)

Further searching (and some exporting and deduping search results) has given me a list of about 1000 pages/categories etc to check.

I'm finding in many cases there are pages that work, but have one or more rev_actor IDs in their edit history. These don't show in the history unless I modify it from "0", then I can see them. I'm assuming that aside from skipping steps in the view of the history, they would only be an issue if one were to attempt to revert to them (or possibly around them), but I'm probably going to fix them in any case.

Medwards98020 (talkcontribs)

So, would doing the following work?

UPDATE `revision`

SET `rev_actor` = 1

WHERE `rev_actor` = 0

Ciencia Al Poder (talkcontribs)

Yes, running that UPDATE would fix those page, but will attribute those edits to the actor "1", which may or may not be the same as the user id "1". The actor table defines which user account (or anon/external) relates to any given actor id.

Medwards98020 (talkcontribs)

So, running that does appear to have fixed the pages. I did create a specific user for the lost actors, and used that instead.

Also, there were some images that had a img_actor of "0" that also needed to be updated, and then have the update.php run, to get the images to work again.

Reply to "Missing pages, but data appear to be intact"