Data set

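This article is about the general concept. For files on IBM mainframes, see Data set (IBM mainframe). For data communications, see Modem.

Figure: Various plots of the multivariate Iris flower data set introduced by Ronald Fisher (1936).

A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as the height and weight of an object, for each member of the data set. Data sets can also consist of a collection of documents or files.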
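
As a minimal sketch of the tabular layout described above, the following Python snippet represents a data set as a list of records, where each record is a row and each key is a variable. The object names, the variable names (`height_cm`, `weight_kg`), the values, and the `column` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical example data: each dict is one record (row) and
# each key is one variable (column). The values are made up.
records = [
    {"name": "crate", "height_cm": 40.0, "weight_kg": 12.5},
    {"name": "barrel", "height_cm": 90.0, "weight_kg": 30.0},
    {"name": "box", "height_cm": 25.0, "weight_kg": 3.2},
]


def column(data, variable):
    """Return the values of one variable across all records."""
    return [row[variable] for row in data]


variables = list(records[0].keys())      # the column (variable) names
heights = column(records, "height_cm")   # one column: a single variable
first_record = records[0]                # one row: a single member

print(variables)
print(heights)
print(first_record)
```

A list of dictionaries is only one possible representation; a database table or a data frame library would express the same row and column structure.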

In the open data discipline, a data set is the unit used to measure the information released in a public open data repository. The European data.europa.eu portal aggregates more than a million data sets.

Properties

Several characteristics define a data set's structure and properties. These include the number and types of the attributes or variables, and various statistical measures applicable to them, such as standard deviation and kurtosis.

The values may be numbers, such as real numbers or integers, for example representing a person's height in centimeters, but may also be nominal data (i.e., not consisting of numerical values), for example representing a person's ethnicity. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values are normally all of the same kind. Missing values may exist, which must be indicated somehow.

In statistics, data sets usually come from actual observations obtained by sampling a statistical population, and each row corresponds to the observations on one element of that population. Data sets may further be generated by algorithms for the purpose of testing certain kinds of software. Some modern statistical analysis software, such as SPSS, still presents its data in the classical data set fashion. If data is missing or suspicious, an imputation method may be used to complete a data set.

Classics

Several classic data sets have been used extensively in the statistical literature:

Iris flower data set – Multivariate data set introduced by Ronald Fisher (1936). Provided online by the University of California, Irvine Machine Learning Repository.
MNIST database – Images of handwritten digits commonly used to test classification, clustering, and image processing algorithms.
Categorical data analysis – Data sets used in the book An Introduction to Categorical Data Analysis, provided online by UCLA Advanced Research Computing.
Robust statistics – Data sets used in Robust Regression and Outlier Detection (Rousseeuw and Leroy, 1987). Provided online at the University of Cologne.
Time series – Data used in Chatfield's book, The Analysis of Time Series, are provided online by StatLib.
Extreme values – Data used in the book An Introduction to the Statistical Modeling of Extreme Values are a snapshot of the data as it was provided online by Stuart Coles, the book's author.
Bayesian Data Analysis – Data used in the book are provided online (archive link) by Andrew Gelman, one of the book's authors.
The Bupa liver data – Used in several papers in the machine learning (data mining) literature.
Anscombe's quartet – Small data set illustrating the importance of graphing the data to avoid statistical fallacies.

See also

Data blending
Data (computer science)
Sampling
Data store
Interoperability
Data collection system

References

Fisher, R.A. (1936). "The Use of Multiple Measurements in Taxonomic Problems" (PDF). Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227. Archived from the original (PDF) on 2011-09-28. Retrieved 2007-05-22.
Snijders, C.; Matzat, U.; Reips, U.-D. (2012). "'Big Data': Big gaps of knowledge in the field of Internet". International Journal of Internet Science. 7: 1–5. Archived from the original on 2019-11-23. Retrieved 2017-02-10.
"European open data portal". European open data portal. European Commission. Retrieved 2016-09-23.
Żytkow, Jan M.; Rauch, Jan (2000). Principles of data mining and knowledge discovery. Springer. ISBN 978-3-540-66490-1.
United Nations Statistical Commission; United Nations Economic Commission for Europe (2007). Statistical Data Editing: Impact on Data Quality: Volume 3 of Statistical Data Editing, Conference of European Statisticians Statistical standards and studies (PDF). United Nations Publications. p. 20. ISBN 978-9211169522.
"UCI Machine Learning Repository: Iris Data Set". Archived from the original on 2023-04-26. Retrieved 2023-05-02.
"Textbook Examples An Introduction to Categorical Data Analysis by Alan Agresti". Archived from the original on 2023-01-31. Retrieved 2023-05-02.
"The ROUSSEEUW datasets". Archived from the original on 2005-02-07.
"StatLib :: Data, Software and News from the Statistics Community". Archived from the original on 2011-01-02.

External links

Data.gov – the U.S. Government's open data
GCMD – the Global Change Master Directory containing over 34,000 descriptions of Earth science and environmental science data sets and services
Humanitarian Data Exchange (HDX) – an open humanitarian data sharing platform managed by the United Nations Office for the Coordination of Humanitarian Affairs
NYC Open Data – free public data published by New York City agencies and other partners
Relational data set repository (archived 2018-03-07 at the Wayback Machine)
Research Pipeline – a wiki/website with links to data sets on many different topics
StatLib–JASA Data Archive
UCI – a machine learning repository
UK Government Public Data
World Bank Open Data – free and open access to global development data by the World Bank

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.