863:, the Count–min sketch is a linear sketch. That is, given two streams, constructing a sketch on each stream and summing the sketches yields the same result as concatenating the streams and constructing a sketch on the concatenated streams. This makes the sketch mergeable and appropriate for use in distributed settings in addition to streaming ones.
890:(MLE) was derived in Ting. By using the MLE, the estimator is always able to match or better the min estimator and works well even if the distribution is not skewed. This paper also showed the hCount* debiasing operation is a bootstrapping procedure that can be efficiently computed without random sampling and can be generalized to any estimator.
893:
Since errors arise from hash collisions with unknown items from the universe, several approaches correct for the collisions when multiple elements of the universe are known or queried for simultaneously . For each of these, a large proportion of the universe must be known to observe a significant
875:
of the true frequency of events: they may overestimate, but never underestimate the true count in a point query. Furthermore, while the min estimator works well when the distribution is highly skewed, other sketches such as the Count sketch based on means are more accurate when the distribution is
218:. However, they are used differently and therefore sized differently: a count–min sketch typically has a sublinear number of cells, related to the desired approximation quality of the sketch, while a counting Bloom filter is more typically sized to match the number of elements in the set.
226:
The goal of the basic version of the count–min sketch is to consume a stream of events, one at a time, and count the frequency of the different types of events in the stream. At any time, the sketch can be queried for the frequency of a particular event type
1200:
The following discussion assumes that only "positive" events occur, i.e., the frequency of the various types cannot decrease over time. Modifications of the following algorithms exist for the more general case where frequencies are allowed to
1156:
1003:
494:
677:
756:
585:
850:
809:
528:
253:
703:
612:
883:
estimator repeatedly randomly selects d random entries in the sketch and takes the minimum to obtain an unbiased estimate of the bias and subtracts it off.
1008:
215:
274:
are fixed when the sketch is created, and determine the time and space needs and the probability of error when the sketch is queried for a frequency or
192:
207:
1248:
255:, and will return an estimate of this frequency that is within a certain distance of the true frequency, with a certain probability.
139:
1547:
23:
911:
402:
624:
876:
not sufficiently skewed. Several variations on the sketch have been proposed to reduce error and reduce or eliminate bias.
1552:
708:
1404:
1444:
Lu, Yi; Montanari, Andrea; Prabhakar, Balaji; Dharmapurikar, Sarang; Kabbani, Abdul (2008). "Counter braids".
541:
887:
92:
814:
773:
132:
1176:
499:
1512:
1493:
1430:
1387:
Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining
1368:
234:
1310:
856:
Small modifications to the data structure can be used to sketch other different stream statistics.
1504:
Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks
1507:
1488:
1425:
1363:
1305:
682:
196:
125:
283:
211:
1271:
1226:
1162:. While this update procedure makes the sketch not a linear sketch, it is still mergeable.
872:
871:
One potential problem with the usual min estimator for count–min sketches is that they are
590:
165:
108:
31:
8:
1483:
Dwork, Cynthia; Naor, Moni; Pitassi, Toniann; Rothblum, Guy N.; Yekhanin, Sergey (2010).
172:
1527:
1323:
1289:
82:
1461:
1400:
338:
1151:{\displaystyle \mathrm {count} \leftarrow \max\{\mathrm {count} ,{\hat {a_{i}}}+c\}}
1542:
1453:
1390:
1327:
1315:
1263:
184:
1267:
1171:
59:
188:
168:
35:
1536:
1465:
1293:
767:
352:
of the table, apply the corresponding hash function to obtain a column index
275:
176:
1457:
1395:
1249:"An Improved Data Stream Summary: The Count-Min Sketch and its Applications"
1342:
860:
203:
49:
44:
1296:(2000), "Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol",
113:
73:
758:
is the stream size, i.e. the total number of items seen by the sketch.
320:, where the error in answering a query is within an additive factor of
180:
1319:
153:
64:
1422:
New estimation algorithms for streaming data: Count-min can do more
395:. The estimated count is given by the least value in the table for
1502:
Schechter, Stuart; Herley, Cormac; Mitzenmacher, Michael (2010).
1443:
1181:
258:
The actual sketch data structure is a two-dimensional array of
770:
between the histograms represented by two count–min sketches,
282:
rows is a separate hash function; the hash functions must be
87:
1501:
900:
changes the update, but not the query algorithms. To count
1360:
Dynamically maintaining frequent items over a data stream
1482:
998:{\displaystyle {\hat {a}}_{i}=\min _{j}\mathrm {count} }
489:{\displaystyle {\hat {a}}_{i}=\min _{j}\mathrm {count} }
1246:
672:{\displaystyle {\hat {a}}_{i}\leq a_{i}+\varepsilon N}
1344:
Sketch algorithms for estimating point queries in NLP
1341:
Goyal, Amit; Daumé, Hal III; Cormode, Graham (2012).
1011:
914:
817:
776:
711:
685:
627:
593:
544:
502:
405:
383:
Several types of queries are possible on the stream.
237:
187:, at the expense of overcounting some events due to
1287:
621:Additionally, this estimate has the guarantee that
1150:
997:
844:
803:
751:{\displaystyle N=\sum _{i\in {\mathcal {U}}}a_{i}}
750:
697:
671:
606:
579:
522:
488:
247:
1340:
1534:
1331:. A preliminary version appeared at SIGCOMM '98.
1063:
938:
429:
171:that serves as a frequency table of events in a
16:Probabilistic data structure in computer science
191:. The count–min sketch was invented in 2003 by
1437:
1506:. USENIX Workshop on Hot Topics in Security.
1357:
210:and can be considered an implementation of a
133:
1446:ACM SIGMETRICS Performance Evaluation Review
1358:Jin, C.; Qian, W.; Xu, X.; Zhou, A. (2003),
1145:
1066:
391:, which asks for the count of an event type
348:arrives we update as follows: for each row
179:to map events to frequencies, but unlike a
1419:
1380:
1378:
1247:Cormode, Graham; S. Muthukrishnan (2005).
866:
140:
126:
1511:
1492:
1429:
1394:
1367:
1309:
580:{\displaystyle a_{i}\leq {\hat {a}}_{i}}
1375:
1224:
199:and described by them in a 2005 paper.
1535:
1220:
1218:
202:Count–min sketch is an alternative to
1384:
1334:
845:{\displaystyle \mathrm {count} _{b}}
804:{\displaystyle \mathrm {count} _{a}}
1298:IEEE/ACM Transactions on Networking
1215:
13:
1476:
1420:Deng, Fan; Rafiei, Davood (2007),
1385:Ting, Daniel (2018). "Count-Min".
1082:
1079:
1076:
1073:
1070:
1025:
1022:
1019:
1016:
1013:
960:
957:
954:
951:
948:
832:
829:
826:
823:
820:
791:
788:
785:
782:
779:
731:
516:
513:
510:
507:
504:
451:
448:
445:
442:
439:
372:. Then increment the value in row
240:
14:
1564:
1521:
908:, one first computes an estimate
614:is the true frequency with which
221:
1485:Pan-private streaming algorithms
1234:Encyclopedia of Database Systems
523:{\displaystyle \mathrm {count} }
231:from a universe of event types
1413:
1351:
1281:
1240:
1194:
1133:
1114:
1111:
1105:
1086:
1060:
1057:
1054:
1048:
1029:
992:
989:
983:
964:
922:
635:
565:
483:
480:
474:
455:
413:
278:. Associated with each of the
248:{\displaystyle {\mathcal {U}}}
1:
1548:Probabilistic data structures
1236:. Springer. pp. 511–516.
1208:
93:Rapidly exploring random tree
1268:10.1016/j.jalgor.2003.12.001
888:maximum likelihood estimator
7:
1165:
10:
1569:
1553:Hash-based data structures
1177:Locality-sensitive hashing
880:
698:{\displaystyle 1-\delta }
344:When a new event of type
294:can be chosen by setting
1225:Cormode, Graham (2009).
1187:
904:instances of event type
618:occurred in the stream.
1458:10.1145/1384529.1375472
1396:10.1145/3219819.3219975
867:Reducing bias and error
1389:. pp. 2319–2328.
1152:
999:
846:
805:
752:
699:
673:
608:
581:
524:
490:
249:
214:(Fan et al., 1998) or
197:S. Muthu Muthukrishnan
1153:
1000:
898:Conservative updating
847:
806:
753:
700:
674:
609:
607:{\displaystyle a_{i}}
582:
525:
491:
266:rows. The parameters
250:
212:counting Bloom filter
1347:. Proc. EMNLP/CoNLL.
1009:
912:
879:To remove bias, the
815:
774:
709:
683:
625:
591:
542:
534:Obviously, for each
500:
403:
387:The simplest is the
284:pairwise independent
235:
109:Randomized algorithm
1288:Fan, Li; Cao, Pei;
764:inner product query
1227:"Count-min sketch"
1148:
995:
946:
842:
801:
748:
737:
695:
669:
604:
577:
520:
486:
437:
245:
83:Random binary tree
1320:10.1109/90.851975
1136:
937:
925:
873:biased estimators
718:
679:with probability
638:
568:
428:
416:
331:(see below), and
324:with probability
286:. The parameters
216:multistage-filter
150:
149:
1560:
1517:
1515:
1498:
1496:
1470:
1469:
1441:
1435:
1434:
1433:
1417:
1411:
1410:
1398:
1382:
1373:
1372:
1371:
1355:
1349:
1348:
1338:
1332:
1330:
1313:
1290:Almeida, Jussara
1285:
1279:
1278:
1276:
1270:. Archived from
1253:
1244:
1238:
1237:
1231:
1222:
1202:
1198:
1161:
1157:
1155:
1154:
1149:
1138:
1137:
1132:
1131:
1122:
1104:
1103:
1085:
1047:
1046:
1028:
1004:
1002:
1001:
996:
982:
981:
963:
945:
933:
932:
927:
926:
918:
907:
903:
882:
851:
849:
848:
843:
841:
840:
835:
810:
808:
807:
802:
800:
799:
794:
757:
755:
754:
749:
747:
746:
736:
735:
734:
704:
702:
701:
696:
678:
676:
675:
670:
659:
658:
646:
645:
640:
639:
631:
617:
613:
611:
610:
605:
603:
602:
586:
584:
583:
578:
576:
575:
570:
569:
561:
554:
553:
537:
529:
527:
526:
521:
519:
495:
493:
492:
487:
473:
472:
454:
436:
424:
423:
418:
417:
409:
398:
394:
379:
375:
371:
351:
347:
336:
330:
323:
319:
308:
293:
289:
281:
273:
269:
265:
261:
254:
252:
251:
246:
244:
243:
230:
185:sub-linear space
158:count–min sketch
142:
135:
128:
55:Count–min sketch
19:
18:
1568:
1567:
1563:
1562:
1561:
1559:
1558:
1557:
1533:
1532:
1524:
1513:10.1.1.170.9356
1494:10.1.1.165.5923
1479:
1477:Further reading
1474:
1473:
1442:
1438:
1431:10.1.1.552.1283
1418:
1414:
1407:
1383:
1376:
1369:10.1.1.151.5909
1356:
1352:
1339:
1335:
1286:
1282:
1274:
1251:
1245:
1241:
1229:
1223:
1216:
1211:
1206:
1205:
1199:
1195:
1190:
1172:Feature hashing
1168:
1159:
1127:
1123:
1121:
1120:
1099:
1095:
1069:
1042:
1038:
1012:
1010:
1007:
1006:
1005:, then updates
977:
973:
947:
941:
928:
917:
916:
915:
913:
910:
909:
905:
901:
869:
836:
819:
818:
816:
813:
812:
795:
778:
777:
775:
772:
771:
742:
738:
730:
729:
722:
710:
707:
706:
684:
681:
680:
654:
650:
641:
630:
629:
628:
626:
623:
622:
615:
598:
594:
592:
589:
588:
571:
560:
559:
558:
549:
545:
543:
540:
539:
535:
503:
501:
498:
497:
468:
464:
438:
432:
419:
408:
407:
406:
404:
401:
400:
396:
392:
377:
373:
365:
353:
349:
345:
332:
325:
321:
310:
295:
291:
287:
279:
271:
267:
263:
259:
239:
238:
236:
233:
232:
228:
224:
146:
60:Quotient filter
36:data structures
34:
17:
12:
11:
5:
1566:
1556:
1555:
1550:
1545:
1531:
1530:
1523:
1522:External links
1520:
1519:
1518:
1499:
1478:
1475:
1472:
1471:
1452:(1): 121–132.
1436:
1412:
1405:
1374:
1350:
1333:
1311:10.1.1.41.1487
1304:(3): 281–293,
1294:Broder, Andrei
1280:
1277:on 2023-05-25.
1239:
1213:
1212:
1210:
1207:
1204:
1203:
1192:
1191:
1189:
1186:
1185:
1184:
1179:
1174:
1167:
1164:
1147:
1144:
1141:
1135:
1130:
1126:
1119:
1116:
1113:
1110:
1107:
1102:
1098:
1094:
1091:
1088:
1084:
1081:
1078:
1075:
1072:
1068:
1065:
1062:
1059:
1056:
1053:
1050:
1045:
1041:
1037:
1034:
1031:
1027:
1024:
1021:
1018:
1015:
994:
991:
988:
985:
980:
976:
972:
969:
966:
962:
959:
956:
953:
950:
944:
940:
936:
931:
924:
921:
868:
865:
854:
853:
839:
834:
831:
828:
825:
822:
798:
793:
790:
787:
784:
781:
745:
741:
733:
728:
725:
721:
717:
714:
694:
691:
688:
668:
665:
662:
657:
653:
649:
644:
637:
634:
601:
597:
574:
567:
564:
557:
552:
548:
532:
531:
518:
515:
512:
509:
506:
485:
482:
479:
476:
471:
467:
463:
460:
457:
453:
450:
447:
444:
441:
435:
431:
427:
422:
415:
412:
361:
339:Euler's number
242:
223:
222:Data structure
220:
193:Graham Cormode
177:hash functions
173:stream of data
169:data structure
148:
147:
145:
144:
137:
130:
122:
119:
118:
117:
116:
111:
103:
102:
98:
97:
96:
95:
90:
85:
77:
76:
70:
69:
68:
67:
62:
57:
52:
47:
39:
38:
28:
27:
15:
9:
6:
4:
3:
2:
1565:
1554:
1551:
1549:
1546:
1544:
1541:
1540:
1538:
1529:
1528:Count–min FAQ
1526:
1525:
1514:
1509:
1505:
1500:
1495:
1490:
1487:. Proc. ICS.
1486:
1481:
1480:
1467:
1463:
1459:
1455:
1451:
1447:
1440:
1432:
1427:
1423:
1416:
1408:
1406:9781450355520
1402:
1397:
1392:
1388:
1381:
1379:
1370:
1365:
1361:
1354:
1346:
1345:
1337:
1329:
1325:
1321:
1317:
1312:
1307:
1303:
1299:
1295:
1291:
1284:
1273:
1269:
1265:
1261:
1257:
1256:J. Algorithms
1250:
1243:
1235:
1228:
1221:
1219:
1214:
1197:
1193:
1183:
1180:
1178:
1175:
1173:
1170:
1169:
1163:
1158:for each row
1142:
1139:
1128:
1124:
1117:
1108:
1100:
1096:
1092:
1089:
1051:
1043:
1039:
1035:
1032:
986:
978:
974:
970:
967:
942:
934:
929:
919:
899:
895:
891:
889:
884:
877:
874:
864:
862:
857:
837:
796:
769:
768:inner product
766:asks for the
765:
761:
760:
759:
743:
739:
726:
723:
719:
715:
712:
692:
689:
686:
666:
663:
660:
655:
651:
647:
642:
632:
619:
599:
595:
572:
562:
555:
550:
546:
530:is the table.
477:
469:
465:
461:
458:
433:
425:
420:
410:
390:
386:
385:
384:
381:
369:
364:
360:
356:
342:
340:
335:
329:
317:
313:
306:
302:
298:
285:
277:
276:inner product
256:
219:
217:
213:
209:
205:
200:
198:
194:
190:
186:
182:
178:
174:
170:
167:
166:probabilistic
163:
159:
155:
143:
138:
136:
131:
129:
124:
123:
121:
120:
115:
112:
110:
107:
106:
105:
104:
100:
99:
94:
91:
89:
86:
84:
81:
80:
79:
78:
75:
72:
71:
66:
63:
61:
58:
56:
53:
51:
48:
46:
43:
42:
41:
40:
37:
33:
32:Probabilistic
30:
29:
25:
21:
20:
1503:
1484:
1449:
1445:
1439:
1421:
1415:
1386:
1359:
1353:
1343:
1336:
1301:
1297:
1283:
1272:the original
1259:
1255:
1242:
1233:
1196:
897:
896:
892:
885:
878:
870:
861:count sketch
858:
855:
763:
620:
533:
388:
382:
367:
362:
358:
354:
343:
333:
327:
315:
311:
304:
300:
296:
262:columns and
257:
225:
204:count sketch
201:
161:
157:
151:
74:Random trees
54:
50:Count sketch
45:Bloom filter
389:point query
114:HyperLogLog
1537:Categories
1209:References
538:, one has
326:1 −
208:AMS sketch
189:collisions
183:uses only
181:hash table
175:. It uses
1508:CiteSeerX
1489:CiteSeerX
1466:0163-5999
1426:CiteSeerX
1364:CiteSeerX
1306:CiteSeerX
1262:: 29–38.
1201:decrease.
1134:^
1061:←
923:^
894:benefit.
859:Like the
727:∈
720:∑
693:δ
690:−
664:ε
648:≤
636:^
566:^
556:≤
414:^
399:, namely
376:, column
162:CM sketch
154:computing
65:Skip list
1166:See also
705:, where
587:, where
496:, where
380:by one.
314:= ⌈ln 1/
24:a series
22:Part of
1543:Hashing
1328:4779754
1182:MinHash
881:hCount*
164:) is a
101:Related
1510:
1491:
1464:
1428:
1403:
1366:
1326:
1308:
328:δ
156:, the
1324:S2CID
1275:(PDF)
1252:(PDF)
1230:(PDF)
1188:Notes
88:Treap
1462:ISSN
1401:ISBN
811:and
309:and
290:and
270:and
206:and
195:and
1454:doi
1391:doi
1316:doi
1264:doi
1064:max
939:min
762:An
430:min
337:is
299:= ⌈
152:In
1539::
1460:.
1450:36
1448:.
1424:,
1399:.
1377:^
1362:,
1322:,
1314:,
1300:,
1292:;
1260:55
1258:.
1254:.
1232:.
1217:^
886:A
357:=
341:.
26:on
1516:.
1497:.
1468:.
1456::
1409:.
1393::
1318::
1302:8
1266::
1160:j
1146:}
1143:c
1140:+
1129:i
1125:a
1118:,
1115:]
1112:)
1109:i
1106:(
1101:j
1097:h
1093:,
1090:j
1087:[
1083:t
1080:n
1077:u
1074:o
1071:c
1067:{
1058:]
1055:)
1052:i
1049:(
1044:j
1040:h
1036:,
1033:j
1030:[
1026:t
1023:n
1020:u
1017:o
1014:c
993:]
990:)
987:i
984:(
979:j
975:h
971:,
968:j
965:[
961:t
958:n
955:u
952:o
949:c
943:j
935:=
930:i
920:a
906:i
902:c
852:.
838:b
833:t
830:n
827:u
824:o
821:c
797:a
792:t
789:n
786:u
783:o
780:c
744:i
740:a
732:U
724:i
716:=
713:N
687:1
667:N
661:+
656:i
652:a
643:i
633:a
616:i
600:i
596:a
573:i
563:a
551:i
547:a
536:i
517:t
514:n
511:u
508:o
505:c
484:]
481:)
478:i
475:(
470:j
466:h
462:,
459:j
456:[
452:t
449:n
446:u
443:o
440:c
434:j
426:=
421:i
411:a
397:i
393:i
378:k
374:j
370:)
368:i
366:(
363:j
359:h
355:k
350:j
346:i
334:e
322:ε
318:⌉
316:δ
312:d
307:⌉
305:ε
303:/
301:e
297:w
292:d
288:w
280:d
272:d
268:w
264:d
260:w
241:U
229:i
160:(
141:e
134:t
127:v
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.