Collection of data
This article is about the general concept. For files on IBM mainframes, see Data set (IBM mainframe). For data communications, see Modem.

Figure: Various plots of the multivariate Iris flower data set, introduced by Ronald Fisher (1936).

A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as the height and weight of an object, for each member of the data set. Data sets can also consist of a collection of documents or files.
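The row/column picture can be shown with a small example. The Python sketch below models a tiny tabular data set in which each dictionary is one row (record) and each key is one column (variable), following the height/weight example in the text. The variable names (`id`, `height_cm`, `weight_kg`), the values and the `column` helper are hypothetical and serve only as illustration.

```python
# Hypothetical toy data set: each dict is one row (record),
# each key is one column (variable). Values are illustrative only.
rows = [
    {"id": "A", "height_cm": 172.0, "weight_kg": 68.5},
    {"id": "B", "height_cm": 181.5, "weight_kg": 80.2},
    {"id": "C", "height_cm": 165.0, "weight_kg": 59.0},
]

# The variables are the column names shared by every record.
variables = list(rows[0].keys())

def column(data, name):
    """Return all values of one variable, one value per record."""
    return [record[name] for record in data]

heights = column(rows, "height_cm")
print(variables)  # ['id', 'height_cm', 'weight_kg']
print(heights)    # [172.0, 181.5, 165.0]
```

A table library would store the same structure column by column. Plain dictionaries are used here only to keep the sketch dependency-free.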

In the open data discipline, the data set is the unit used to measure the information released in a public open data repository. The European data.europa.eu portal aggregates more than a million data sets.

Properties
Several characteristics define a data set's structure and properties. These include the number and types of the attributes or variables, and various statistical measures applicable to them, such as standard deviation and kurtosis.
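The following sketch computes two of the statistical measures mentioned above for a single numeric variable. The population standard deviation comes from Python's standard `statistics` module. That module has no kurtosis function, so the hypothetical `kurtosis` helper below computes the population (non-bias-corrected) form m4 / m2² from central moments and by default subtracts 3 to give "excess" kurtosis. Statistical packages that apply a bias-corrected sample estimator will return slightly different values.

```python
import statistics

def kurtosis(values, excess=True):
    """Population kurtosis m4 / m2**2, minus 3 when excess=True."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    k = m4 / m2 ** 2
    return k - 3 if excess else k

data = [1, 2, 3, 4, 5]
sd = statistics.pstdev(data)  # population standard deviation
print(round(sd, 4))              # 1.4142
print(round(kurtosis(data), 4))  # -1.3
```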

The values may be numbers, such as real numbers or integers, for example representing a person's height in centimeters, but may also be nominal data (i.e., not consisting of numerical values), for example representing a person's ethnicity. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values are normally all of the same kind. Missing values may exist, which must be indicated somehow.
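Data sets mark missing values in different ways, for example with NaN, an empty field, or a code such as -999. The sketch below assumes Python's `None` as the missing-value marker. It checks that all non-missing values of one variable are of the same kind (numeric or nominal) and counts the missing entries. The records and the `kind`/`describe` helpers are hypothetical.

```python
from numbers import Real

# Hypothetical records; None marks a missing value.
rows = [
    {"height_cm": 172.0, "eye_colour": "brown"},
    {"height_cm": None,  "eye_colour": "blue"},
    {"height_cm": 165.5, "eye_colour": None},
]

def kind(value):
    """Classify a non-missing value as numeric or nominal."""
    if isinstance(value, bool):
        return "nominal"  # treat booleans as categories, not numbers
    return "numeric" if isinstance(value, Real) else "nominal"

def describe(data, name):
    """Report the single kind of a variable and how many values are missing."""
    values = [r[name] for r in data]
    present = [v for v in values if v is not None]
    kinds = {kind(v) for v in present}
    if len(kinds) > 1:
        raise ValueError(f"{name!r} mixes kinds: {sorted(kinds)}")
    return {"kind": kinds.pop() if kinds else None,
            "missing": len(values) - len(present)}

for var in ("height_cm", "eye_colour"):
    print(var, describe(rows, var))
# height_cm {'kind': 'numeric', 'missing': 1}
# eye_colour {'kind': 'nominal', 'missing': 1}
```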

In statistics, data sets usually come from actual observations obtained by sampling a statistical population, and each row corresponds to the observations on one element of that population. Data sets may further be generated by algorithms for the purpose of testing certain kinds of software. Some modern statistical analysis software, such as SPSS, still presents its data in the classical data set fashion. If data is missing or suspicious, an imputation method may be used to complete a data set.
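The sketch below shows two of the steps just described. It first draws a simple random sample without replacement from a hypothetical population, seeded so that runs are repeatable. It then completes a column with mean imputation, which replaces each missing value with the mean of the observed values. Mean imputation is only one of many imputation methods, and it is a crude one because it shrinks the variance. It is used here for brevity, not as a recommendation. The population, the measurements and the helper names are all hypothetical.

```python
import random
import statistics

def simple_random_sample(population, n, seed=0):
    """Draw n elements without replacement; seeded for repeatability."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical population of 100 element IDs.
population = list(range(100))
sample = simple_random_sample(population, 5)

# Hypothetical measurements for the sampled elements, one missing.
heights = [170.0, None, 160.0, 180.0, 170.0]
print(mean_impute(heights))  # [170.0, 170.0, 160.0, 180.0, 170.0]
```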

Classics
Several classic data sets have been used extensively in the statistical literature:

Iris flower data set – Multivariate data set introduced by Ronald Fisher (1936). Provided online by the University of California-Irvine Machine Learning Repository.
MNIST database – Images of handwritten digits commonly used to test classification, clustering, and image processing algorithms.
Categorical data analysis – Data sets used in the book An Introduction to Categorical Data Analysis, provided online by UCLA Advanced Research Computing.
Robust statistics – Data sets used in Robust Regression and Outlier Detection (Rousseeuw and Leroy, 1987). Provided online at the University of Cologne.
Time series – Data used in Chatfield's book The Analysis of Time Series are provided on-line by StatLib.
Extreme values – Data used in the book An Introduction to the Statistical Modeling of Extreme Values are a snapshot of the data as it was provided on-line by Stuart Coles, the book's author.
Bayesian Data Analysis – Data used in the book are provided on-line by Andrew Gelman, one of the book's authors.
The Bupa liver data – Used in several papers in the machine learning (data mining) literature.
Anscombe's quartet – Small data set illustrating the importance of graphing the data to avoid statistical fallacies.
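
See also
Data blending
Data (computer science)
Sampling
Data store
Interoperability
Data collection system

References
Fisher, R.A. (1936). "The Use of Multiple Measurements in Taxonomic Problems" (PDF). Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227. Archived from the original (PDF) on 2011-09-28. Retrieved 2007-05-22.
Snijders, C.; Matzat, U.; Reips, U.-D. (2012). "'Big Data': Big gaps of knowledge in the field of Internet". International Journal of Internet Science. 7: 1–5. Archived from the original on 2019-11-23. Retrieved 2017-02-10.
"European open data portal". European open data portal. European Commission. Retrieved 2016-09-23.
Żytkow, Jan M.; Rauch, Jan (2000). Principles of data mining and knowledge discovery. Springer. ISBN 978-3-540-66490-1.
United Nations Statistical Commission; United Nations Economic Commission for Europe (2007). Statistical Data Editing: Impact on Data Quality: Volume 3 of Statistical Data Editing, Conference of European Statisticians Statistical standards and studies (PDF). United Nations Publications. p. 20. ISBN 978-9211169522.
"UCI Machine Learning Repository: Iris Data Set". Archived from the original on 2023-04-26. Retrieved 2023-05-02.
"Textbook Examples An Introduction to Categorical Data Analysis by Alan Agresti". Archived from the original on 2023-01-31. Retrieved 2023-05-02.
"The ROUSSEEUW datasets". Archived from the original on 2005-02-07.
"StatLib :: Data, Software and News from the Statistics Community". Archived from the original on 2011-01-02.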

External links
Data.gov – the U.S. Government's open data
GCMD – the Global Change Master Directory containing over 34,000 descriptions of Earth science and environmental science data sets and services
Humanitarian Data Exchange (HDX) – The Humanitarian Data Exchange (HDX) is an open humanitarian data sharing platform managed by the United Nations Office for the Coordination of Humanitarian Affairs.
NYC Open Data – free public data published by New York City agencies and other partners.
Relational data set repository
Research Pipeline – a wiki/website with links to data sets on many different topics
StatLib–JASA Data Archive
UCI – a machine learning repository
UK Government Public Data
World Bank Open Data – Free and open access to global development data by the World Bank