156:
In 2017, the AGWA and the PANDORA archive were amalgamated with the other web archive collections to form the Trove web archive collection. After further development and the creation of the
Australian Web Archive, government websites archived via AGWA and now included in AWA can still be searched
235:
With many of the earlier websites from the 1990s now lost, mainly because of frequent changes of web platforms, the
Australian Web Archive is a significant initiative that will help to save current and future web pages, especially Australian content. Material will continue to be added to the
165:
A web archive is described by the NLA as a "collection of snapshots of websites captured while they are accessible on the web, and then preserved in a static copy". The collection archived in the AWA is "relevant to the cultural, social, political, research and commercial life and activities of
89:
The PANDORA infrastructure, which works well for selective, small-scale archiving, does not adapt to large-scale "bulk harvesting" of web content, so a new technical system had to be developed: a web archiving service that would integrate the delivery of archived websites within a live
112:
websites. The NLA began regular harvests of the websites in June 2011, after a significant obstacle had been overcome with an administrative agreement made in May 2010 allowing the NLA to collect, preserve and make accessible government websites without having to seek prior permission for each
231:
There is a "Limit to the gov.au web domain" option before searching, and government websites archived via AGWA can still be searched separately using the "Advanced Search" option. Other options in
Advanced Search are to limit by timespan of the snapshots, domain and file type.
121:
for storage and Open
Wayback for delivery of the service. The government publishes a huge amount of material, but preserving that content poses many challenges, such as its sudden disappearance. In March 2014, the AGWA was made publicly accessible.
148:. It only included Commonwealth Government websites collected through bulk harvests of nearly 1000 seed URLs. The scheduling of the harvests was not yet routinely established, but harvests were being conducted roughly three times per year.
567:
351:
55:
collections. Access is through a single interface in Trove, which is publicly available. The
Australian Web Archive was created in March 2019, and is one of the biggest
600:
197:
is envisaged in the future, as content grows. Usability by a wide range of users, and in particular the search functionality, were major focuses during development.
387:
420:
193:
built in-house. The developers also devised techniques to filter out unwanted "noise". The data remains on the
Library servers, although a move to the
208:’s page ranking algorithm (based on the frequency of clicks on a page), modified to lead to better, high-quality resources. Other technologies include a
531:
166:
Australia and
Australians". It collects web material through scheduled archiving of selected websites and publications, as well as some
575:
359:
200:
The archive is fully searchable, based on a combination of techniques used by the developers. Each team created a unique and complex
125:
The AGWA meets the preservation and retention requirements for websites as "retain as national archives" (RNA) material under the
251:
794:
468:
789:
90:
website interface delivering the archived websites seamlessly to the user, which is difficult to achieve technically.
144:
As of early 2015, the AGWA included content dating from 2005, which amounted to about 144 million files occupying 15
44:
568:"The Australian Government Web Archive: Collecting the government's online documentary heritage goes large scale"
784:
270:
to collect and preserve "selected Asia/Pacific websites related to specific events or socio-political groups".
48:
32:
59:
in the world. Its purpose is to provide a resource for historians and researchers, now and into the future.
660:
101:
websites are
Commonwealth records, and are therefore publications to be managed in accordance with the
799:
82:. Later, the earliest websites from the .au web domain, dating back to 1996, were obtained from the
539:
109:
388:"The Australian Web Archive is a momentous achievement – but things will get harder from here"
98:
736:
625:
324:
8:
263:
70:
In 2005, the NLA started archiving annual snapshots of the entire
Australian web domain (
246:
445:
279:
217:
267:
225:
201:
186:
182:
83:
209:
178:
40:
28:
469:"Preserving Australia's Web History: The beginning of the Australian Web Archive"
711:
778:
241:
194:
190:
138:
39:
platform, an online library database aggregator. It comprises the NLA's own
108:
The
Australian Government Web Archive (AGWA) consists of bulk archiving of
213:
177:
of data, with 9 billion records. It contains more functionality than the
86:. In 2019 this content was first made publicly accessible through Trove.
79:
56:
300:"Preserving and Accessing Networked DOcumentary Resources of Australia"
113:
website or document, as had previously been required. The service uses the
52:
421:"National Library launches 'enormous' archive of Australia's Internet"
118:
236:
Archive, and other online material collected in accordance with the
767:
686:
493:
174:
145:
114:
729:
173:
As of its launch in March 2019, the AWA already contained around 600
67:
The PANDORA service started archiving websites in October 1996.
205:
167:
75:
299:
221:
130:
36:
141:) are not always captured, so must be managed separately.
134:
71:
266:
are not included in the AWA, but the NLA partners with the
16:
Open online database of archived Australian websites
650:NOTE: the AWA help page says 400 TB, 8 billion records
62:
776:
157:separately using the "Advanced Search" option.
31:of archived Australian websites, hosted by the
561:
559:
557:
554:
526:
524:
522:
520:
518:
516:
514:
170:harvesting relating to significant events.
712:"Australian Web Archive - Advanced Search"
601:"Archiving Australian Government websites"
679:
511:
466:
257:
565:
460:
381:
379:
377:
352:"The Australian Government Web Archive"
349:
777:
595:
593:
414:
412:
410:
408:
653:
385:
160:
418:
374:
252:digital collections selection policy
661:"Check Out Australia's Web Archive"
590:
438:
405:
13:
566:Koerbin, Paul (11 February 2015).
532:"About the Australian Web Archive"
486:
467:McKenzie, Amelia (12 March 2019).
350:Koerbin, Paul (11 February 2015).
14:
811:
759:
45:Australian Government Web Archive
494:"Archived websites (1996 – now)"
704:
644:
618:
151:
63:History of the three components
605:National Archives of Australia
419:Nott, George (11 March 2019).
343:
317:
292:
78:. ".au"), collected via large
1:
741:National Library of Australia
630:National Library of Australia
572:National Library of Australia
473:National Library of Australia
386:Bruns, Axel (14 March 2019).
356:National Library of Australia
329:National Library of Australia
285:
133:and document files (such as
49:National Library of Australia
33:National Library of Australia
795:Australian digital libraries
117:web crawler for harvesting,
7:
448:. PANDORA. 18 February 2009
273:
204:, by adapting a version of
27:) is a publicly available
10:
816:
446:"History and Achievements"
790:Web archiving initiatives
238:National Library Act 1960
687:"Australian Web Archive"
110:Commonwealth Government
93:
21:Australian Web Archive
785:Archives in Australia
258:Asia/Pacific websites
99:Australian Government
737:"Archived websites"
626:"Archived websites"
325:"Archived websites"
264:Asia Pacific region
187:full-text searching
247:Copyright Act 1968
244:provisions of the
161:Description of AWA
632:. 7 December 2018
536:Trove Help Centre
280:National edeposit
218:Not Safe For Work
103:Archives Act 1983
807:
800:Online databases
771:
770:
768:Official website
574:. Archived from
563:
552:
551:
549:
547:
542:on 17 March 2020
538:. Archived from
392:The Conversation
383:
372:
371:
369:
367:
362:on 30 April 2020
358:. Archived from
268:Internet Archive
262:Websites in the
226:machine learning
220:classifier from
202:search algorithm
183:Internet Archive
181:, hosted by the
84:Internet Archive
743:. 23 March 2020
667:. 11 April 2019
331:. 23 March 2020
323:
322:
318:
308:
306:
304:Pandora Archive
298:
297:
293:
288:
276:
260:
212:(effectively a
210:Bayesian filter
179:Wayback Machine
163:
154:
96:
65:
47:(AGWA) and the
41:PANDORA archive
29:online database
760:External links
758:
755:
754:
728:
703:
678:
665:Southern Phone
250:and the NLA's
162:
159:
153:
150:
139:Word documents
95:
92:
80:crawl harvests
578:on 1 May 2020
425:Computerworld
242:legal deposit
191:search engine
35:(NLA) on its
34:
30:
26:
22:
745:. Retrieved
740:
731:
719:. Retrieved
715:
706:
694:. Retrieved
690:
681:
669:. Retrieved
664:
655:
646:
634:. Retrieved
629:
620:
608:. Retrieved
604:
580:. Retrieved
576:the original
571:
544:. Retrieved
540:the original
535:
501:. Retrieved
497:
488:
476:. Retrieved
472:
462:
450:. Retrieved
440:
428:. Retrieved
424:
395:. Retrieved
391:
364:. Retrieved
360:the original
355:
345:
333:. Retrieved
328:
319:
307:. Retrieved
303:
152:Amalgamation
143:
127:Archives Act
126:
124:
107:
102:
97:
88:
69:
66:
57:web archives
24:
20:
18:
214:spam filter
185:, allowing
779:Categories
286:References
129:; however
119:WARC files
175:terabytes
146:terabytes
74:with the
51:'s ".au"
397:30 April
366:30 April
335:30 April
309:30 April
274:See also
189:using a
115:Heritrix
240:, the
224:, and
206:Google
168:ad hoc
131:videos
76:suffix
53:domain
43:, the
747:8 May
721:8 May
716:Trove
696:8 May
691:Trove
671:8 May
636:6 May
610:8 May
582:6 May
546:8 May
503:6 May
498:Trove
478:6 May
452:6 May
430:6 May
222:Yahoo
216:), a
195:cloud
37:Trove
749:2020
723:2020
698:2020
673:2020
638:2020
612:2020
584:2020
548:2020
505:2020
480:2020
454:2020
432:2020
399:2020
368:2020
337:2020
311:2020
135:PDFs
94:AGWA
72:URLs
19:The
137:or
25:AWA
781::
739:.
714:.
689:.
663:.
628:.
603:.
592:^
570:.
556:^
534:.
513:^
496:.
471:.
423:.
407:^
390:.
376:^
354:.
327:.
302:.
254:.
228:.
105:.
751:.
725:.
700:.
675:.
640:.
614:.
586:.
550:.
507:.
482:.
456:.
434:.
401:.
370:.
339:.
313:.
23:(