Knowledge

User:GreenC/WaybackMedic 2.5

Wayback Medic 2.5 is a bot that adds and maintains links from the list of known web archive services in use on the English Knowledge. Edits made after 2018-12-04 are by version 2.5.

The bot operator is User:GreenC. The bot account is User:GreenC bot. The bot (software) is "WaybackMedic".

WaybackMedic Fixes

Fix 1: fixthespuriousone (added August 2016)
Remove spurious |1= in cite templates.

Fix 2: fixmissingprotocol (added August 2016). Notes: HTTPS per RFC.
1. Add https if the protocol is missing from the archive.org URL.
2. Convert existing protocol http to https.
3. Add the web subdomain if missing (archive.org/web/ → web.archive.org/web/).
4. Add the /web/ path (web.archive.org/2016/ → web.archive.org/web/2016/). In some URLs adding /web/ breaks the link; test for those.

Fix 3: fixemptyarchive (added August 2016)
1. If |archiveurl= is empty or missing but |archivedate= has content, attempt to find a working archive URL based on the archive date, otherwise add {{dead link}} if appropriate.
2. If |archivedate= is empty or missing but |archiveurl= has content, generate the date value based on the timestamp in the archive URL.
3. If |archiveurl= and |archivedate= are empty, remove both and leave a {{dead link}} if appropriate.

Fix 4: fixbadstatus (added August 2016)
Check all Wayback Machine URLs for response code errors (anything other than 200). On an error code, try for a better URL via the Wayback API, first using accessdate, then using the earliest date available. If none is found there, check the WebCite API, then try the Memento API, which checks a few dozen other archives. Other techniques are undocumented. If still none is found, remove |archiveurl= and |archivedate= and add {{dead link}}.

Fix 5: Retired

Fix 6: fixemptywayback (added August 2016)
The wayback template is mangled in a certain way. Action: re-assemble. It won't delete multiple instances if they exist in the same ref (as in the Example).

Fix 7: fixencodedurl (added August 2016)
The URL was incorrectly encoded. Fully decode the URL and re-encode.

Fix 8: fixdatemismatch (added August 2016)
1. Ensure |archivedate= matches the snapshot date in the URL.
2. Ensure the date format matches dmy or mdy if set (retain ymd if in use).

Fix 9: fixwebcitlong (added January 2017). Notes: WebCite Usage.
Convert WebCite URLs from short-form to long-form.
Convert Freezepage.com URLs from short-form to long-form.

Fix 10: fixstraydt (added January 2017)
Remove a stray {{dead link}} template when an archive exists for the link.

Fix 11: fixwam (added January 2017). Notes: Webarchive TfM. Merge completed February 5, 2017.
Merge {{wayback}} and {{webcite}} --> {{webarchive}}.

Fix 12: fixiats (added January 2017)
Fix parameter name: archive url -> |archive-url

Fix 13: fixswitchurl (added January 2017)
Move an archive.org URL from |url= to |archiveurl= and add |archivedate= if missing.

Fix 14: Retired

Fix 15: fixembway (added January 2017)
1. A {{wayback}} is embedded in a CS template.
2. A {{dead link}} is embedded in a CS template.

Fix 16: <various> (added January 2017)
Timestamp and/or |archivedate= is 19700101 and/or out-of-bounds.

Fix 17: fixdoubleurl (added January 2017)
archive.org URLs are doubled, tripled, etc.

Fix 18: fixemptywebarchive (added January 2017)
{{webarchive}} |date= is missing or has an empty value.

Fix 19: fixdoublewebarchive (added January 2017)
Remove duplicate {{webarchive}} instances.

Fix 20: fixembwebarchive (added January 2017)
A {{cite web}} is embedded in a {{webarchive}}.

Fix 21: fixarchiveis (added January 2017). Notes: Archive.today Usage.
1. Convert Archive.today URLs from short-form to long-form.
2. Fix URL encoding of broken links.
3. Normalize as "archive.today" (see note).

Fix 22: fixitems (added January 2017). Notes: BRFA.
Change "/items/" URLs that are using machine IDs.

Fix 23: encodemag (added January 2017). Notes: RFC3986.
Convert MediaWiki encoding to url encoding in URLs (i.e. {{!}} and {{=}}).

Fix 24: decodespace (added June 2017). Notes: See also.
Convert %20 to +, + to %20, etc. in URLs that can be repaired this way.

Fix 25: waytree_trailgarb (added February 2018)
Remove typical garbage characters found at the end of URLs: .,;:-"l(%XX)('')

Fix 26: fixcommentarchive (added February 2018)
Open up commented-out archives and add a |deadurl= "yes" or "no".

Fix 27: waytree_x2encoding (added February 2018)
Repair double URL-encoding, e.g. %3A -> %253A.

Fix 28: fixencodebug (added February 2018). Notes: T186417.
Repair missed URL-encoding of square brackets.

Fix 29: fixiats (added February 2018)
Restore truncated Wayback URL.

Fix 30: fixiats (added September 2018). Notes: T203865.
Convert |title={title} -> |title=Archived copy

Fix 31: urlchanger (added November 2018). Notes: BOTREQ.
Move a broken URL to a new working URL and undo previous archives.

Fix 32: cosmetic (added January 2019). Notes: WP:PRURL, T214855, Archive.today.
Edits that might be cosmetic. Made only together with other edits.
1. Delete trailing # in URLs.
2. Delete empty archive fields.
3. archive.is --> archive.today
4. Fix double fragments.
5. Convert protocol-relative URLs.

Technical details

- Changes to URLs are checked against the remote site to ensure they are working.
- Real-time link checks, no link database. However, links are checked over a 24-hour period before final upload of the diff.
- Supports many APIs including the Internet Archive, Memento, WebCite and "Timemap" APIs at individual services.
- Multiple HTTP header status code checks at the application (WaybackMedic) layer.
- Additional time-outs and retries built in to the web transfer libraries.
- Additional operating-procedure level checks against network and other errors. The bot is semi-supervised in known trouble areas.
- Multiple redundant checks of the APIs using multiple dates to ensure a page really is unavailable.
- Accepts API results but then verifies them by looking at page headers and/or contents.
- The bot is primarily written in Nim (compiles to C source) with support utilities in Awk. Libraries were custom made, including a string primitives library for regex, a wiki template parsing library, an OAuth library (in awk), a MediaWiki API interface library, and a soft404 detector.
- Due to the nature of the task, running the bot includes a fair amount of supervisory overhead, so it requires operator training, though the steps are documented in the source package.

Running

The bot takes requests at WP:URLREQ on a per-domain basis. You can request a domain name for the bot to process.

Paid editor

GreenC, in accordance with the Wikimedia Foundation's Terms of Use, discloses that he has been paid by the Internet Archive for his contributions to Knowledge. This funding is for the ongoing development of WaybackMedic and a module of InternetArchiveBot related to books.

General sources

GitHub is an old public repo. The most current version is not public. The bot is written in Nim and GNU awk.

Links

- WaybackMedic 2.1
- WaybackMedic 2.0
- WaybackMedic 1.0
- Bot Approval
- Trial runs
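
The four steps listed for fixmissingprotocol (fix 2) can be illustrated with a short sketch. The bot itself is written in Nim and awk and its current source is not public, so the Python below is only an illustration under assumptions: the function name normalize_wayback_url and the regular expressions are hypothetical, and the sketch does not perform the "test for those" check that the bot applies when adding /web/ might break a link.

```python
import re


def normalize_wayback_url(url: str) -> str:
    """Hypothetical sketch of the four fixmissingprotocol repairs."""
    url = url.strip()

    # 1. Add https if no scheme is present at the start of the URL.
    if not re.match(r"^[a-z][a-z0-9+.-]*://", url, re.IGNORECASE):
        url = "https://" + url

    # 2. Convert an existing http scheme to https.
    url = re.sub(r"^http://", "https://", url, flags=re.IGNORECASE)

    # 3. Add the "web" subdomain: archive.org/web/ -> web.archive.org/web/
    url = re.sub(r"^https://archive\.org/web/",
                 "https://web.archive.org/web/", url)

    # 4. Add the /web/ path when a timestamp directly follows the host:
    #    web.archive.org/2016/ -> web.archive.org/web/2016/
    #    (Assumption: a real implementation would verify the result,
    #    since adding /web/ breaks some links.)
    url = re.sub(r"^https://web\.archive\.org/(\d{4,14}/)",
                 r"https://web.archive.org/web/\1", url)
    return url


if __name__ == "__main__":
    print(normalize_wayback_url("archive.org/web/2016/http://example.com"))
```

Each step only rewrites the prefix of the archive URL, so the original target URL that follows the timestamp is left untouched.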

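Several fixes rely on the timestamp embedded in a Wayback URL: fixemptyarchive generates |archivedate= from it, fixdatemismatch compares it with the existing date and keeps the dmy, mdy or ymd format, and fix 16 rejects values such as 19700101. The sketch below shows one way this could work. The names snapshot_date and format_date are hypothetical, and the lower bound of 1996 for a plausible snapshot year is an assumption of this example, not something the page states.

```python
import re
from datetime import date
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Wayback snapshot URLs carry a YYYYMMDDhhmmss timestamp after /web/.
_TIMESTAMP = re.compile(r"web\.archive\.org/web/(\d{8,14})")


def snapshot_date(archive_url: str) -> Optional[date]:
    """Return the snapshot date encoded in a Wayback URL, or None."""
    match = _TIMESTAMP.search(archive_url)
    if not match:
        return None
    ts = match.group(1)
    try:
        parsed = date(int(ts[0:4]), int(ts[4:6]), int(ts[6:8]))
    except ValueError:
        return None  # e.g. month 13 or day 40
    # Assumption: treat anything before 1996 (such as 19700101) as
    # out-of-bounds for a Wayback snapshot.
    if parsed.year < 1996:
        return None
    return parsed


def format_date(d: date, style: str) -> str:
    """Format a date as dmy, mdy or ymd."""
    if style == "dmy":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if style == "mdy":
        return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
    return d.isoformat()  # ymd


if __name__ == "__main__":
    url = "https://web.archive.org/web/20160305123456/http://example.com"
    print(format_date(snapshot_date(url), "dmy"))
```

Month names are taken from a fixed list rather than from the locale so that the output does not depend on the machine's language settings.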
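
The double URL-encoding described for waytree_x2encoding (fix 27) arises when an already-encoded sequence such as %3A is encoded again, so that its percent sign becomes %25 and the result is %253A. A minimal sketch of the reverse operation follows. The function name repair_double_encoding is hypothetical and this is not the bot's actual code. Note that a legitimately encoded percent sign followed by two hex digits would also be rewritten, which is one reason a check against the remote site, as described under Technical details, matters.

```python
import re

# "%25" followed by two hex digits looks like a percent-encoded "%XX".
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def repair_double_encoding(url: str) -> str:
    """Collapse %25XX sequences to %XX until nothing changes."""
    previous = None
    while previous != url:
        previous = url
        # Repeating the substitution also unwinds triple encoding,
        # e.g. %25253A -> %253A -> %3A.
        url = _DOUBLE_ENCODED.sub(r"%\1", url)
    return url


if __name__ == "__main__":
    broken = "https://web.archive.org/web/2016/http%253A%252F%252Fexample.com"
    print(repair_double_encoding(broken))
```

A URL that contains only single encoding, such as one with %20 for a space, passes through unchanged.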

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.