Check all Wayback Machine URLs for response code errors (anything other than 200). If an error code is returned, try for a better URL via the Wayback API, first using the accessdate and then using the earliest date available. If none is found there, check the WebCite API, then try the Memento API, which checks a few dozen other archives. Other
153:
Libraries were custom made, including a string primitives library for regex, a wiki template parsing library, an OAuth library (in awk), a MediaWiki API interface library, and a soft404 detector.
1110:
Because of the nature of the task, running the bot involves a fair amount of supervisory overhead, so it requires operator training; the steps are documented in the source package.
The wayback template is mangled in a certain way. Action: reassemble it. Multiple instances are not deleted if they exist in the same ref (as in the Example).
Additional operating-procedure-level checks guard against network and other errors; the bot is semi-supervised in known trouble areas.
Link checks are done in real time; there is no link database. However, links are checked over a 24-hour period before the final upload of the diff.
Supports many APIs, including the Internet Archive, Memento and WebCite APIs, and "Timemap" APIs at individual services
is an old public repo. The most current version is not public. The bot is written in Nim and GNU awk.
793:
1152:
1093:
Multiple redundant checks of the APIs using multiple dates to ensure a page really is unavailable
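The redundant, multi-date checking described here could be sketched roughly as follows. This is an illustration only, not the bot's code (the bot is written in Nim and GNU awk): `find_snapshot` and its `lookup` parameter are hypothetical names, and the only real service referenced is the public Wayback Machine availability endpoint.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str) -> str:
    """Build an availability request URL for a page and a YYYYMMDD date."""
    return WAYBACK_AVAILABLE + "?" + urlencode({"url": url, "timestamp": timestamp})


def find_snapshot(
    url: str,
    dates: Iterable[str],
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Try each candidate date in order and return the first snapshot found.

    `lookup(url, date)` is a hypothetical callable that queries one archive
    API and returns a snapshot URL, or None when nothing is available.
    """
    for date in dates:
        snapshot = lookup(url, date)
        if snapshot:
            return snapshot
    # Every date failed, so only now is the page treated as unavailable.
    return None
```

Passing the lookup in as a callable keeps the retry logic separate from any one archive service, so the same loop can be pointed at a different API.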
1148:
182:
has content, attempt to find a working archive URL based on the archive date, otherwise add
for his contributions to
Knowledge. This funding is for the ongoing development of
Accepts API results but then verifies by looking at page headers and/or contents
1084:
Multiple HTTP header status code checks at the application (WaybackMedic) layer
1075:
Changes to URLs are checked against the remote site to ensure they are working
145:
path (web.archive.org/2016/ → web.archive.org/web/2016/). In some URLs adding
1211:
1123:
on a per-domain basis. You can request a domain name for the bot to process.
Remove typical garbage characters found at the end of URLs: .,;:-"l(%XX)(
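A minimal sketch of this kind of trailing-character cleanup is shown below. The character set is an assumption made for illustration; the bot's actual list and logic may differ, and a real run would still verify the changed URL against the remote site.

```python
# Assumed set of trailing characters to strip; the bot's real list may differ.
TRAILING_GARBAGE = ".,;:-\"'("


def strip_trailing_garbage(url: str) -> str:
    """Remove the assumed garbage characters from the end of a URL."""
    return url.rstrip(TRAILING_GARBAGE)
```

Because some URLs legitimately end in one of these characters, the result should be treated as a candidate to test rather than a guaranteed fix.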
54:
17:
798:
Convert %20 to +, + to %20, etc., in URLs that can be repaired this way
1087:
Additional time-outs and retries are built into the web transfer libraries.
337:
2. Ensure date format matches dmy or mdy if set (retain ymd if in use)
198:
has content, generate date value based on timestamp in the archive URL.
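Deriving a date from the snapshot timestamp could look roughly like the sketch below. It assumes the usual 14-digit YYYYMMDDhhmmss timestamp after /web/ and returns an ISO (ymd) date; the function name and the output format are choices made for this example, not taken from the bot.

```python
import re
from datetime import datetime
from typing import Optional

# Wayback snapshot URLs carry a 14-digit YYYYMMDDhhmmss timestamp after /web/.
_TIMESTAMP = re.compile(r"archive\.org/web/(\d{14})")


def archive_date_from_url(archive_url: str) -> Optional[str]:
    """Return the snapshot date as YYYY-MM-DD, or None if no valid timestamp."""
    match = _TIMESTAMP.search(archive_url)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # fourteen digits were present but do not form a real date
    return stamp.strftime("%Y-%m-%d")
```

The resulting value would still need to be rendered in the article's own date format (dmy, mdy or ymd).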
777:
45:
list of known web archive services in use on the
English Knowledge
984:
Move broken URL to a new working URL and undo previous archives.
310:
The URL was incorrectly encoded. Fully decode URL and re-encode.
Convert Freezepage.com URLs from short form to long form
1104:
236:
techniques undocumented. If still none found, remove
1025:
Edits that might be cosmetic; made only alongside other edits.
139:
if missing (archive.org/web/ → web.archive.org/web/)
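The four fixmissingprotocol steps (add https when the protocol is missing, convert http to https, add the web second-level domain, add the /web/ path) could be sketched as below. This is an illustrative reading of those steps with a hypothetical function name. It assumes the input is already known to be an archive.org URL, and it does not perform the follow-up test for links that break when /web/ is added.

```python
import re


def fix_wayback_url(url: str) -> str:
    """Apply the four normalization steps to an archive.org URL (sketch)."""
    # Steps 1 and 2: drop any leading scheme (or bare "//"), then force https.
    rest = re.sub(r"^(?:https?:)?//", "", url, flags=re.IGNORECASE)
    # Step 3: add the "web" second-level domain when the host is bare archive.org.
    if rest.lower().startswith("archive.org/web/"):
        rest = "web." + rest
    # Step 4: insert /web/ when a timestamp directly follows the host.
    rest = re.sub(
        r"^(web\.archive\.org)/(\d{4,14}/)", r"\1/web/\2", rest, flags=re.IGNORECASE
    )
    return "https://" + rest
```

Both patterns are anchored at the start of the string, so the original URL embedded after the timestamp is left untouched.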
Convert WebCite URLs from short form to long form
1103:(compiles to C source) with support utilities in
746:Change "/items/" URLs that are using machine IDs
1209:
1139:, in accordance with the Wikimedia Foundation's
43:is a bot that adds and maintains links from the
Repair double URL-encoding, e.g. where %3A has become %253A
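One way to collapse a single level of double encoding is sketched below. It is an assumption about how such a repair might work, not the bot's implementation, and the function name is hypothetical.

```python
import re

# "%25" followed by two hex digits is a percent sign that was itself encoded,
# for example "%253A" where the original URL had "%3A".
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def repair_double_encoding(url: str) -> str:
    """Collapse one level of double percent-encoding."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)
```

A literal percent sign followed by two hex digits would also match, so the repaired URL should be checked against the remote site before it is used.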
50:Edits made after 2018-12-04 are by version 2.5
904:Repair missed URL-encoding of square brackets
131:if protocol missing from the archive.org URL.
archive.org URLs are doubled, tripled, etc.
394:template when an archive exists for the link
133:2. Convert existing protocol http to https.
1143:, discloses that he has been paid by the
Open up commented-out archives and add a
61:. The bot (software) is "WaybackMedic".
1056:
14:
1210:
719:3. Normalize as "archive.today" see
335:matches the snapshot date in the URL
717:2. Fix URL encoding of broken links
208:are empty, remove both and leave a
23:
1158:
1035:5. Convert protocol-relative URLs
1031:3. archive.is --> archive.today
URLs from short form to long form
684:{{webarchive}}
651:{{webarchive}}
621:{{webarchive}}
577:is 19700101 and/or out-of-bounds.
25:
24:
1234:
549:{{dead link}}
440:Merge completed February 5, 2017
391:{{dead link}}
211:{{dead link}}
185:{{dead link}}
149:breaks the link, test for those.
1131:
1099:The bot is primarily written in
678:{{cite web}}
509:
464:archive url -> |archive-url)
264:
541:{{wayback}}
1126:
932:Restore truncated Wayback URL
in URLs (i.e. {{!}} and {{=}})
770:Convert MediaWiki encoding to
552:is embedded in a CS template.
13:
1:
959:|title=Archived copy
544:is embedded in a CS template.
485:Move an archive.org URL from
1171:
64:
7:
2. Delete empty archive fields
627:is missing or empty value.
135:3. Add second-level domain
10:
1239:
1119:The bot takes requests at
1114:
1. Delete trailing # in URLs
1176:
194:is empty or missing but
178:is empty or missing but
1033:4. Fix double fragments
1064:
955:|title={title
575:|archivedate=
495:|archivedate=
333:|archivedate=
242:|archivedate=
206:|archivedate=
192:|archivedate=
180:|archivedate=
30:
1223:Active Knowledge bots
1060:
491:|archiveurl=
238:|archiveurl=
202:|archiveurl=
196:|archiveurl=
176:|archiveurl=
57:. The bot account is
29:
640:fixdoublewebarchive
53:The bot operator is
875:waytree_x2encoding
726:Archive.today Usage
611:fixemptywebarchive
119:fixmissingprotocol
106:in cite templates.
67:
66:WaybackMedic Fixes
1218:All Knowledge bots
1155:related to books.
1153:InternetArchiveBot
1065:
860:|deadurl=
850:fixcommentarchive
814:waytree_trailgarb
94:fixthespuriousone
65:
31:
1068:Technical details
1055:
1054:
681:is embedded in a
667:fixembwebarchive
648:Remove duplicate
573:Timestamp and/or
41:Wayback Medic 2.5
1230:
1193:WaybackMedic 1.0
1188:WaybackMedic 2.0
1183:WaybackMedic 2.1
1151:and a module of
1145:Internet Archive
565:<various>
323:fixdatemismatch
281:fixemptywayback
251:
243:
239:
214:if appropriate.
166:fixemptyarchive
105:
102:Remove spurious
1159:General sources
968:September 2018
625:|date=
188:if appropriate.
59:User:GreenC bot
992:November 2018
937:February 2018
912:February 2018
888:February 2018
867:February 2018
865:
863:
862:"yes" or "no"
842:February 2018
487:|url=
444:Webarchive TfM
350:fixwebcitlong
302:fixencodedurl
74:Function name
1051:January 2019
1050:
1048:
1047:Archive.today
896:fixencodebug
782:January 2017
754:January 2017
730:January 2017
713:Archive.today
699:fixarchiveis
698:
695:
694:
691:January 2017
659:January 2017
632:January 2017
603:January 2017
590:fixdoubleurl
589:
586:
585:
582:January 2017
557:January 2017
502:January 2017
477:fixswitchurl
476:
473:
472:
469:January 2017
448:January 2017
399:January 2017
398:
396:
388:Remove stray
372:January 2017
371:
369:
368:WebCite Usage
227:fixbadstatus
77:Example edit
1198:Bot Approval
1149:WaybackMedic
1141:Terms of Use
1130:
1118:
790:decodespace
772:url encoding
497:if missing.
437:}}
433:{{
429:}}
425:{{
421:}}
417:{{
342:August 2016
315:August 2016
294:August 2016
257:August 2016
250:}}
246:{{
219:August 2016
158:August 2016
111:August 2016
104:|1=
80:Description
34:WaybackMedic
1127:Paid editor
976:urlchanger
711:1. Convert
654:instances.
380:fixstraydt
86:Date added
71:Fix number
55:User:GreenC
18:User:GreenC
1212:Categories
1203:Trial runs
1062:BotWikiAwk
806:June 2017
762:encodemag
526:fixembway
435:webarchive
331:1. Ensure
1172:Citations
1121:WP:URLREQ
1000:cosmetic
738:fixitems
248:dead link
37:by GreenC
1039:WP:PRURL
957:} ->
953:Convert
945:fixiats
920:fixiats
802:See also
510:Retired
493:and add
456:fixiats
265:Retired
244:and add
1115:Running
1043:T214855
1021:Example
1017:Example
1013:Example
1009:Example
1005:Example
980:Example
964:T203865
949:Example
928:Example
924:Example
908:T186417
900:Example
879:Example
854:Example
826:Example
822:Example
818:Example
794:Example
778:RFC3986
766:Example
742:Example
707:Example
703:Example
671:Example
644:Example
615:Example
594:Example
569:Example
534:Example
530:Example
481:Example
460:Example
431:-->
427:webcite
419:wayback
411:Example
407:fixwam
384:Example
358:Example
354:Example
327:Example
306:Example
285:Example
231:Example
170:Example
154:per RFC
141:4. Add
127:1. Add
123:Example
98:Example
1165:GitHub
1137:GreenC
988:BOTREQ
415:Merge
200:3. If
190:2. If
174:1. If
152:HTTPS
83:Notes
1177:Links
835:'
832:'
546:2. A
538:1. A
147:/web/
143:/web/
129:https
16:<
750:BRFA
721:note
423:and
240:and
204:and
1105:Awk
1101:Nim
997:32
973:31
942:30
917:29
893:28
872:27
847:26
811:25
787:24
759:23
735:22
696:21
664:20
637:19
608:18
587:17
562:16
523:15
507:14
489:to
474:13
453:12
404:11
377:10
137:web
1214::
1045:,
1041:,
837:)
675:A
347:9
320:8
299:7
278:6
262:5
252:.
224:4
163:3
116:2
91:1
47:.