User:Tegnosis/Broken Links

Page about finding broken links used in the wiki and fixing them.

Finding broken links
Using the table from the RottenLinks extension:
 * 1) Update the RottenLinks table:
 * * (Prod)
 * * (Dev)
 * Note: This can take days to complete.
 * 2) Copy the RottenLinks table:
 * Alternatively, just get the codes that aren't 200 (OK)
 * 3) Clean up the table in Regex101:
 * * Search:
 * * Replace:
 * or
 * * Replace:
 * * Alternatively:
 * Search:
 * Replace:  (no   if using Notepad++)
 * 4) Copy and paste the new table into Excel
 * 5) To view a page by its page ID:
 * 6) The header is (grab the source wikitext):
 * 7) Filter by header and remove the Code 200 rows
 * 8) Add a Count (how many pages this bad link appears on), where   is the page IDs separated by commas:
 * 9) Add a hyperlink (Excel), where   is the page IDs (ignore multiple for now) and   is the base URL
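
The spreadsheet steps above (drop the Code 200 rows, count the pages per link, build a page hyperlink) can also be sketched in Python. This is only an illustration under stated assumptions: the input is assumed to be tab-separated rows of URL, status code, and comma-separated page IDs, the base URL is a hypothetical placeholder, and the helper name summarize_bad_links is made up for this example. It relies on MediaWiki's index.php?curid= parameter to open a page by its ID.

```python
import csv
import io

BASE_URL = "https://wiki.example.org/index.php"  # hypothetical base URL


def summarize_bad_links(tsv_text, base_url=BASE_URL):
    """Return one dict per link whose status code is not 200.

    Assumed input: tab-separated rows of URL, status code, and a
    comma-separated list of page IDs (this column layout is an assumption).
    """
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        url, code, page_ids = row
        if code.strip() == "200":
            continue  # keep only the links that did not return 200 (OK)
        ids = [p.strip() for p in page_ids.split(",") if p.strip()]
        rows.append({
            "url": url,
            "code": int(code),
            "count": len(ids),  # number of pages the link appears on
            # Link to the first page only, ignoring multiples for now.
            "page_link": f"{base_url}?curid={ids[0]}" if ids else "",
        })
    return rows


sample = "http://a.example/x\t404\t12,34\nhttp://b.example/y\t200\t56\n"
for entry in summarize_bad_links(sample):
    print(entry)
```

Sorting the result by the count field puts the links that affect the most pages first, which may be a reasonable order in which to fix them.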

Tips

 * 1) Check whether the RottenLinks script has finished (run this more than once and see whether the result is still incrementing):
 * 2) Find all the distinct status codes:
 * 3) Write the file (note, not working yet):
 * 4) Copy files out of a K8s container:
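
If the table has already been exported, the distinct status codes can also be tallied offline. The following is a minimal sketch that assumes the codes have been pulled into a list of strings; the helper name status_code_counts is hypothetical.

```python
from collections import Counter


def status_code_counts(codes):
    """Count how often each status code occurs in an iterable of codes."""
    return Counter(int(code) for code in codes)


counts = status_code_counts(["200", "404", "301", "404", "200", "200"])
for code, occurrences in sorted(counts.items()):
    print(code, occurrences)
```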

Fixing links
Some methods for fixing the broken links.

http vs. https
It was suspected that some of the 3XX status codes would be fixed by switching from http to https. The following Python code reads a file of URLs, one per line, each already changed from http to https, captures each URL's status code, and writes the URL, a tab, and the status code to a new file. Any URL that returned 200 was fixed by the change to https.

Code examples
Example of using import requests:
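
A minimal sketch of such a call, assuming the third-party requests package is installed. The helper name get_status_code is hypothetical, and not following redirects is a choice made here so that 3XX codes stay visible rather than being replaced by the final code.

```python
def get_status_code(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    # Imported here so the rest of a script still loads without requests.
    import requests  # third-party: pip install requests

    try:
        # allow_redirects=False reports a 3XX code as-is instead of following it.
        response = requests.get(url, allow_redirects=False, timeout=timeout)
    except requests.RequestException:
        return None  # DNS failure, timeout, refused connection, etc.
    return response.status_code
```

Calling get_status_code("https://example.org") returns an integer such as 200 or 301, or None when no response was received.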

Example of reading a file "URL_List.txt" (see notes above):
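
A minimal sketch of the reading step, assuming the file holds one URL per line and is UTF-8 encoded; the helper name read_urls is hypothetical.

```python
from pathlib import Path


def read_urls(path="URL_List.txt"):
    """Return the non-empty lines of the file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```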

Example of writing to a file "URL_List(Status_Codes).txt" (see notes above):
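
A minimal sketch of the writing step, assuming the results are available as (URL, status code) pairs; the helper name write_status_codes is hypothetical.

```python
def write_status_codes(results, path="URL_List(Status_Codes).txt"):
    """Write (url, status_code) pairs as 'URL<TAB>code', one pair per line."""
    with open(path, "w", encoding="utf-8") as out:
        for url, code in results:
            out.write(f"{url}\t{code}\n")
```

For example, write_status_codes([("https://example.org", 200)]) produces a one-line file that pastes cleanly into two spreadsheet columns.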

Redirect URLs
We can probably capture the redirect URLs by using requests to check whether they point to a working URL. The following should get the redirect; it still needs to be updated to handle errors, read from a file, and write to a different file:
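
One possible way to capture the redirect target, sketched with the requests package under the assumption that following the whole redirect chain is acceptable. The helper name resolve_redirect is hypothetical; response.history and response.url are the attributes requests provides for the intermediate and final responses.

```python
def resolve_redirect(url, timeout=10):
    """Follow redirects for url and return (final_url, final_status, hops).

    hops lists the URLs that answered with a redirect along the way.
    Returns (None, None, []) if the request fails.
    """
    import requests  # third-party: pip install requests

    try:
        response = requests.get(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return None, None, []
    hops = [step.url for step in response.history]
    return response.url, response.status_code, hops
```

A link whose final status is 200 and whose hops list is non-empty could then be replaced on the wiki by the final URL.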

Broken revisions
There are also some bad revisions (example) that show "MediaWiki internal error." These all seemed to date from around the same period (June-July 2008). The maintenance script DeleteOrphanedRevisions should remove them from the wiki without the need to find and manually delete each revision ID from the table. This was done, and 516 revisions were deleted.