et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce job and "emits" values to tables. The Reduce phase, which the script writer does not need to be concerned with, then aggregates the tables from multiple runs into a single set of tables.
programs in C++ or Java. MapReduce programs need to be compiled and may be more verbose than necessary, so writing a program to analyze the logs can be time-consuming. To make it easier to write quick scripts,
table aggregators have not been released, the open-sourced runtime is not useful for large-scale data analysis of multiple log files off the shelf. Sawzall has been replaced by Lingo (logs in
Currently, only the language runtime (which runs a Sawzall script once over a single input) has been open-sourced; the supporting program built on MapReduce has not been released.
count: table sum of int;
total: table sum of float;
sum_of_squares: table sum of float;
x: float = input;
emit count <- 1;
emit total <- x;
emit sum_of_squares <- x * x;
S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in: 19th ACM Symposium on Operating Systems Principles, Proceedings, 17 ACM Press, 2003, pp. 29–43.
This complete Sawzall program reads the input and produces three results: the number of records, the sum of the values, and the sum of the squares of the values.
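The flow described here can be approximated outside Sawzall. The following is a minimal Python sketch, not the szl runtime: the names `script`, `run`, and `merge` are hypothetical, and each dictionary key stands in for a `table sum`. It assumes every record is a string holding one number.

```python
from collections import defaultdict


def script(record, emit):
    """Per-record logic: one input, and output only through emit()."""
    x = float(record)
    emit("count", 1)
    emit("total", x)
    emit("sum_of_squares", x * x)


def run(records):
    """Run the script once per record; every table aggregates by summation."""
    tables = defaultdict(int)  # each key plays the role of a `table sum`

    def emit(table, value):
        tables[table] += value

    for record in records:
        script(record, emit)
    return dict(tables)


def merge(*partials):
    """Combine the tables of several independent runs into one set of tables."""
    merged = defaultdict(int)
    for part in partials:
        for name, value in part.items():
            merged[name] += value
    return dict(merged)


if __name__ == "__main__":
    first = run(["1.0", "2.0"])   # one run over part of the input
    second = run(["3.0"])         # another run over the rest
    print(merge(first, second))
```

Because summation is associative and commutative, merging the partial tables gives the same totals as a single run over all records, which is what lets the per-record work be spread across many machines.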
In addition, there are several statistical table types that give inexact results. The higher the parameter n, the more accurate the estimates are.
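The article does not say how these statistical tables are implemented. As an illustration only, a fixed-size random sample such as sample(n) can be kept with reservoir sampling (Algorithm R); the class `SampleTable` and its methods below are hypothetical names, and this is a sketch of that general technique rather than Sawzall's own code.

```python
import random


class SampleTable:
    """Keeps a uniform random sample of at most n emitted values."""

    def __init__(self, n, seed=None):
        self.n = n
        self.seen = 0          # how many values have been emitted so far
        self.values = []       # the current sample, never longer than n
        self._rng = random.Random(seed)

    def emit(self, value):
        self.seen += 1
        if len(self.values) < self.n:
            # The first n values always enter the sample.
            self.values.append(value)
        else:
            # Later values replace a random slot with probability n / seen.
            j = self._rng.randrange(self.seen)
            if j < self.n:
                self.values[j] = value


if __name__ == "__main__":
    table = SampleTable(3, seed=0)
    for v in range(100):
        table.emit(v)
    print(table.values)
```

Memory use is bounded by n regardless of how many values are emitted, and a larger n retains more of the input, which matches the trade-off between the parameter and the quality of the result.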
lists, maps, and structs. However, there are no references or pointers. All assignments and function arguments create copies. This means that
A Sawzall script has a single input (a log record) and can produce output only by emitting to tables. The script can have no other side effects.
records. Sawzall was first described in 2003, and the szl runtime was open-sourced in August 2010. However, since the
In order to perform calculations involving the logs, engineers can write
gives n values that are probably the most frequent of the emitted values.
Google Code Archive - Long-term storage for Google Code Project Hosting.
calculates a cumulative probability distribution of the given numbers.
A script can define any number of output tables. Table types include:
Sawzall's design favors efficiency and engine simplicity over power:
Sawzall is statically typed, and the engine compiles the script to
Google's server logs are stored as large collections of records (
gives a random sample of n values from all the emitted values
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. Interpreting the Data: Parallel Analysis with Sawzall
Discussion on which parts of Sawzall are open-source
saves only the highest n values on a given weight.
estimates the number of unique values emitted.
) that are partitioned over many disks within
Sawzall's open source project at Google Code
– similar tool and language for use with
to process large numbers of individual
saves the sum of every emitted value
Some interesting features include:
) for most purposes within Google.
Like C, functions can modify
and cycles are impossible.
recursive data structures
saves every value emitted
but are not closures.
Sawzall supports the
programming language
"Replacing Sawzall"
compound data types
First appeared
Sawmill (software)
before running it.
Apache License 2.0
global variables
Protocol Buffers
is a procedural
Further reading
local variables
domain-specific
External links
Apache Hadoop
. Retrieved
. 2015-12-04
Sawzall code
quantile(n)
2018-06-18
References
maximum(n)
collection
Motivation
, used by
unique(n)
sample(n)
MapReduce
MapReduce
Developer
See also
Features
Rob Pike
/archive
Sawzall
.google
Website
License
top(n)
Google
Google
and
/szl
.com
code
2003
Pig
x86
sum
GFS
log
Go
/p