One problem is that hard disk drive capacities have increased substantially, but their error rates remain unchanged. The data corruption rate has always been roughly constant in time, meaning that modern disks are not much safer than old disks. In old disks the probability of data corruption was very small because they stored tiny amounts of data. In modern disks the probability is much larger because they store much more data, whilst not being safer. As a result, silent data corruption was not a serious concern while storage devices remained relatively small and slow. In modern times, with the advent of larger drives and very fast RAID setups, users are capable of transferring 10^16 bits in a reasonably short time, thus easily reaching the data corruption thresholds.
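To make the scale concrete, here is a rough back-of-the-envelope sketch. It assumes an illustrative rate of one corrupted bit per 10^16 bits transferred and two hypothetical sustained throughputs; neither figure comes from a specific drive datasheet, and the function name is made up for this example.

```python
# Illustrative assumption: one silent error per 1e16 bits transferred.
BITS_PER_ERROR = 10**16

def days_to_transfer(bits: int, throughput_bytes_per_s: float) -> float:
    """Return the number of days needed to move `bits` at the given rate."""
    seconds = bits / (throughput_bytes_per_s * 8)
    return seconds / 86_400

# Hypothetical sustained throughputs, chosen only for illustration.
for label, rate in [("single older disk, 10 MB/s", 10e6),
                    ("fast RAID array, 2 GB/s", 2e9)]:
    print(f"{label}: {days_to_transfer(BITS_PER_ERROR, rate):.1f} days")
```

Under these assumptions the fast array moves 10^16 bits in about a week, whereas the single slow disk would need roughly four years, which is why the same error rate matters far more on modern systems.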
Data corruption can occur at any level in a system, from the host to the storage medium. Modern systems attempt to detect corruption at many layers and then recover or correct the corruption; this is almost always successful, but very rarely the information arriving in the system's memory is corrupted and can cause unpredictable results.
Undetected data corruption, also known as silent data corruption, results in the most dangerous errors, as there is no indication that the data is incorrect. Detected data corruption may be permanent, with the loss of data, or may be temporary, when some part of the system is able to detect and correct the error; there is no data corruption in the latter case.
Many errors are detected and corrected by the hard disk drives using the ECC codes which are stored on disk for each sector. If the disk drive detects multiple read errors on a sector, it may make a copy of the failing sector on another part of the disk, by remapping the failed sector to a spare sector.
Data scrubbing is another method to reduce the likelihood of data corruption, as disk errors are caught and recovered from before multiple errors accumulate and overwhelm the number of parity bits. Instead of parity being checked on each read, the parity is checked during a regular scan of the disk, often done as a low priority background process. The "data scrubbing" operation activates a parity check. If a user simply runs a normal program that reads data from the disk, then the parity would not be checked unless parity-check-on-read was both supported and enabled on the disk subsystem.
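The scrubbing idea can be sketched in a few lines. This is a minimal illustration, assuming an in-memory dictionary stands in for the disk and a CRC-32 checksum stands in for the parity information; both are simplifications, and the helper names `write_block` and `scrub` are hypothetical rather than part of any real storage API.

```python
import zlib

def write_block(store: dict, block_id: int, data: bytes) -> None:
    """Store a block together with a CRC-32 checksum of its contents."""
    store[block_id] = (data, zlib.crc32(data))

def scrub(store: dict) -> list:
    """Scan every block and return the IDs whose data no longer matches its checksum."""
    return [block_id for block_id, (data, checksum) in store.items()
            if zlib.crc32(data) != checksum]

store = {}
write_block(store, 0, b"account balance: 100")
write_block(store, 1, b"account balance: 250")

# Simulate silent corruption: one bit of block 1 flips after it was written.
data, checksum = store[1]
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
store[1] = (corrupted, checksum)

print(scrub(store))  # IDs of blocks that failed verification
```

The point of running such a scan periodically, rather than only on reads, is that rarely accessed blocks are still checked before further errors accumulate.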
Some programs can offer to repair the file automatically after the error, and some programs cannot repair it. Whether repair is possible depends on the level of corruption and on the application's built-in functionality for handling the error. There are various causes of the corruption.
Environmental conditions can interfere with data transmission, especially when dealing with wireless transmission methods. Heavy clouds can block satellite transmissions. Wireless networks are susceptible to interference from devices such as microwave ovens.
The webshop Amazon.com has acknowledged similarly high data corruption rates in its systems. In 2021, faulty processor cores were identified as an additional cause in publications by Google and Facebook; cores were found to be faulty at a rate of several in thousands of cores.
As a result, the file might not open, or might open with some of the data corrupted (or in some cases, completely corrupted, leaving the document unintelligible). The adjacent image is a corrupted image file in which most of the information has been lost.
There are many error sources beyond the disk storage subsystem itself. For instance, cables might be slightly loose, the power supply might be unreliable, there might be external vibrations such as a loud sound, the network might introduce undetected corruption, or cosmic radiation and many other causes of soft memory errors might play a part.
Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of measures to provide end-to-end data integrity, or lack of errors.
End-to-end data protection contrasts with other data integrity approaches, which do not span different layers in the storage stack and allow data corruption to occur while the data passes boundaries between the different layers.
Silent Data Corruption (SDC), sometimes referred to as Silent Data Error (SDE), is an industry-wide issue impacting not only long-protected memory, storage, and networking, but also computer CPUs.
As another example, a real-life study performed by NetApp on more than 1.5 million HDDs over 41 months found more than 400,000 silent data corruptions, of which more than 30,000 were not detected by the hardware RAID controller (they were detected only during scrubbing).
Silent data corruption may result in cascading failures, in which the system may run for a period of time with an undetected initial error causing increasingly more problems until it is ultimately detected.
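In addition, if a corruption is detected and the file system uses integrated RAID mechanisms that provide data redundancy, such file systems can also reconstruct corrupted data in a transparent way. This approach allows improved data integrity protection covering the entire data path, which is usually known as end-to-end data protection.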
As an example, ZFS creator Jeff Bonwick stated that the fast database at Greenplum, a database software company specializing in large-scale data warehousing and analytics, faces silent corruption every 15 minutes.
In general, when data corruption occurs, a file containing that data will produce unexpected results when accessed by the system or the related application. Results could range from a minor loss of data to a system crash.
If a virus or trojan with this payload method manages to alter files critical to the running of the computer's operating system software or physical hardware, the entire system may be rendered unusable.
The remapping of a failed sector to a spare sector takes place without the involvement of the operating system (though it may be delayed until the next write to the sector).
If appropriate mechanisms are employed to detect and remedy data corruption, data integrity can be maintained.
Certain levels of RAID disk arrays have the ability to store and evaluate parity bits for data across a set of hard disks and can reconstruct corrupted data upon the failure of a single or multiple disks, depending on the level of RAID implemented.
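How parity allows reconstruction can be shown with a small sketch. It assumes a single-parity scheme in which the parity block is the bytewise XOR of the data blocks; real RAID levels add striping, parity rotation and other details that are omitted here, and the disk contents are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks holding one equal-sized block each.
data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_disks)  # would be stored on a fourth disk

# Disk 1 fails: rebuild its block from the surviving disks plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])
```

Because XOR is its own inverse, any one missing block equals the XOR of all the others; a single parity block therefore tolerates one failure, and tolerating more requires additional redundancy.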
Some types of malware may intentionally corrupt files as part of their payloads, usually by overwriting them with inoperative or garbage code, while a non-malicious virus may also unintentionally corrupt files when it accesses them.
This "silent correction" can be monitored using S.M.A.R.T. and tools available for most operating systems to automatically check the disk drive for impending failures by watching for deteriorating SMART parameters.
Some file systems, such as Btrfs, HAMMER, ReFS, and ZFS, use internal data and metadata checksumming to detect silent data corruption.
In 39,000 storage systems that were analyzed, firmware bugs accounted for 5–10% of storage failures.
Maintaining data integrity is particularly important in commercial applications (e.g. banking), where an undetected error could either corrupt a database index or change data to drastically affect an account balance.
There are two types of data corruption associated with computer systems: undetected and detected.
For example, a failure affecting file system metadata can result in multiple files being partially damaged or made completely inaccessible as the file system is used in its corrupted state.
Some errors go unnoticed, without being detected by the disk firmware or the host operating system; these errors are known as silent data corruption.
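When data corruption behaves as a Poisson process, where each bit of data has an independently low probability of being changed, data corruption can generally be detected by the use of checksums, and can often be corrected by the use of error correcting codes (ECC).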
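As an illustration of how an error correcting code can repair a flipped bit, here is a minimal sketch of the classic Hamming(7,4) code. It is chosen only because it is small enough to read in full; it is an assumption for this example and not the code that disk drives or memory modules actually use, which rely on considerably stronger schemes.

```python
def encode(d: list) -> list:
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(c: list) -> tuple:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome points at the flipped bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position

word = encode([1, 0, 1, 1])
word[4] ^= 1                      # simulate a single-bit corruption
data, position = decode(word)
print(data, position)
```

The three parity bits form a syndrome that is zero for an intact codeword and otherwise gives the position of the single flipped bit. Two simultaneous flips would be miscorrected by this small code, which is one reason practical systems use stronger codes.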
If an uncorrectable data corruption is detected, procedures such as automatic retransmission or restoration from backups can be applied.
Background radiation, head crashes, and aging or wear of the storage device fall into the former category, while software failure typically occurs due to bugs in the code.
Data corruption during transmission has a variety of causes. Interruption of data transmission causes information loss.
Photo data corruption; in this case, a result of a failed data recovery from a hard disk drive
Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing
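Another study, performed by CERN over six months and involving about 97 petabytes of data, found that about 128 megabytes of data became permanently corrupted silently somewhere in the pathway from network to disk.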
For example, if a document file is corrupted, when a person tries to open that file with a document editor they may get an error message.
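Some CPU architectures employ various transparent checks to detect and mitigate data corruption in CPU caches, CPU buffers and instruction pipelines; an example is Intel Instruction Replay technology, which is available on Intel Itanium processors.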
Errors in computer data that introduce unintended changes to the original data
A Tunable, Software-based DRAM Error Detection and Correction Library for HPC
All in all, the error rates as observed by a CERN study on silent corruption are far higher than one in every 10^16 bits.
End-to-end Data Protection in SAS and Fibre Channel Hard Disk Drives
Data integrity also matters in the use of encrypted or compressed data, where a small error can make an extensive dataset unusable.
Epilepsy warning: This video contains bright, flashing images.
End-to-end Data Integrity for File Systems: A ZFS Case Study
Hardware and software failure are the two main causes for data loss.
SoftECC: A System for Software Memory Integrity Checking
creator Jeff Bonwick stated that the fast database at
291:disk arrays have the ability to store and evaluate
1140:DRAM Errors in the Wild: A Large-Scale Field Study
68:may intentionally corrupt files as part of their
227:over six months and involving about 97
49:In general, when data corruption occurs, a
92:Photo of an Atari 2600 with corrupted RAM.
171:Hard disk drive error rates and handling
238:Silent data corruption may result in
123:and can cause unpredictable results.
231:of data, found that about 128
315:technology, which is available on
287:can be applied. Certain levels of
260:When data corruption behaves as a
108:A video that has been corrupted.
223:). Another study, performed by
451:List of data recovery software
256:Error detection and correction
1147:, and an associated paper on
1145:A study on silent corruptions
466:Reed–Solomon error correction
360:end-to-end data protection
187:and many other causes of
446:Forward error correction
350:, use internal data and
313:Intel Instruction Replay
402:, also called data rot
278:error correcting codes
177:silent data corruption
117:silent data corruption
309:instruction pipelines
46:, or lack of errors.
436:Data Integrity Field
139:Background radiation
37:refers to errors in
421:Radiation hardening
396:Various resources:
272:, and can often be
416:Database integrity
240:cascading failures
189:soft memory errors
432:Countermeasures:
406:Computer science
400:Data degradation
311:; an example is
185:cosmic radiation
128:information loss
356:data redundancy
262:Poisson process
250:Countermeasures
207:As an example,
35:Data corruption
1149:data integrity
1114:External links
411:Data integrity
367:Data scrubbing
276:by the use of
64:Some types of
44:data integrity
317:Intel Itanium
264:, where each
153:in the code.
59:error message
55:document file
39:computer data
1151:(CERN, 2007)
426:Software rot
332:file systems
319:processors.
143:head crashes
305:CPU buffers
293:parity bits
Cosmic rays cause most soft errors in DRAM.
441:ECC memory
384:compressed
334:, such as
325:S.M.A.R.T.
301:CPU caches
254:See also:
197:Amazon.com
169:See also:
380:encrypted
274:corrected
270:checksums
233:megabytes
229:petabytes
221:scrubbing
213:Greenplum
161:in DRAM.
135:data loss
456:Parchive
390:See also
352:metadata
244:metadata
84:Overview
70:payloads
376:banking
285:backups
280:(ECC).
66:malware
1157:(HGST)
346:, and
340:HAMMER
217:NetApp
165:Silent
145:, and
74:trojan
336:Btrfs
330:Some
147:aging
461:RAID
344:ReFS
307:and
289:RAID
225:CERN
193:CERN
151:bugs
51:file
382:or
348:ZFS
297:CPU
266:bit
209:ZFS
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.