Knowledge

Overdispersion

Source đź“ť

25: 292:(Gaussian) has variance as a parameter, any data with finite variance (including any finite data) can be modeled with a normal distribution with the exact variance – the normal distribution is a two-parameter model, with mean and variance. Thus, in the absence of an underlying model, there is no notion of data being overdispersed relative to the normal model, though the fit may be poor in other respects (such as the higher moments of 360:, however, meanings have been transposed, so that overdispersion is actually taken to mean more even (lower variance) than expected. This confusion has caused some ecologists to suggest that the terms 'aggregated', or 'contagious', would be better used in ecology for 'overdispersed'. Such preferences are creeping into 241:
distribution is a popular and analytically tractable alternative model to the binomial distribution since it provides a better fit to the observed data. To capture the heterogeneity of the families, one can think of the probability parameter of the binomial model (say, probability of being a boy) is
233:
for one possible explanation) i.e. there are more all-boy families, more all-girl families and not enough families close to the population 51:49 boy-to-girl mean ratio than expected from a binomial distribution, and the resulting empirical variance is larger than specified by a binomial model.
197:. The Poisson distribution has one free parameter and does not allow for the variance to be adjusted independently of the mean. The choice of a distribution from the Poisson family is often dictated by the nature of the empirical data. For example, 319:
of repeated surveys of a fixed population (say with a given sample size, so margin of error is the same), one expects the results to fall on normal distribution with standard deviation equal to the margin of error. However, in the presence of
336:
all with a margin of error of 3%, if they are conducted by different polling organizations, one expects the results to have standard deviation greater than 3%, due to pollster bias from different methodologies.
176:
means that there was less variation in the data than predicted. Overdispersion is a very common feature in applied data analysis because in practice, populations are frequently
300:, etc.). However, in the case that the data is modeled by a normal distribution with an expected variation, it can be over- or under-dispersed relative to that prediction. 280:
With respect to binomial random variables, the concept of overdispersion makes sense only if n>1 (i.e. overdispersion is nonsensical for Bernoulli random variables).
205:. If overdispersion is a feature, an alternative model with additional free parameters may provide a better fit. In the case of count data, a Poisson 213:
can be proposed instead, in which the mean of the Poisson distribution can itself be thought of as a random variable drawn – in this case – from the
217:
thereby introducing an additional free parameter (note the resulting negative binomial distribution is completely characterized by two parameters).
570: 160:. However, especially for simple models with few parameters, theoretical predictions may not match empirical observations for higher 353:, the term 'overdispersion' is generally used as defined here – meaning a distribution with a higher than expected variance. 89: 225:
As a more concrete example, it has been observed that the number of boys born to families does not conform faithfully to a
61: 229:
as might be expected. Instead, the sex ratios of families seem to skew toward either boys or girls (see, for example the
68: 548: 521: 389: 329: 206: 108: 42: 273:. In this case, if the variance of the normal variable is zero, the model reduces to the standard (undispersed) 75: 230: 210: 46: 152:
of the chosen model. It is usually possible to choose the model parameters in such a way that the theoretical
193:
Overdispersion is often encountered when fitting very simple parametric models, such as those based on the
57: 371:, overdispersion is often evident in the analysis of death count data, but demographers prefer the term ' 580: 575: 372: 35: 332:
and will be overdistributed relative to the predicted distribution. For example, given repeated
180:(non-uniform) contrary to the assumptions implicit within widely used simple parametric models. 258: 130: 364:
too. Generally this suggestion has not been heeded, and confusion persists in the literature.
251: 226: 82: 277:. This model has an additional free parameter, namely the variance of the normal variable. 243: 194: 161: 8: 384: 321: 289: 274: 266: 262: 238: 492: 346: 304: 214: 198: 544: 517: 484: 443: 435: 247: 148:
to fit a given set of empirical observations. This necessitates an assessment of the
134: 496: 474: 427: 394: 270: 145: 511: 308: 153: 149: 345:
Over- and underdispersion are terms which have been adopted in branches of the
312: 538: 257:
Another common model for overdispersion—when some of the observations are not
564: 439: 325: 316: 177: 479: 462: 415: 488: 447: 361: 350: 333: 431: 157: 368: 315:
and hence dispersion of results on repeated surveys. If one performs a
202: 141: 122: 24: 16:
Presence of greater variability in a data set than would be expected
297: 293: 165: 357: 463:"Analysis of the Human Sex Ratio by using Overdispersion Models" 340: 416:"The most widely publicized gender problem in human genetics" 414:
Stansfield, William D.; Carlton, Matthew A. (February 2009).
269:. Software is widely available for fitting this type of 133:) in a data set than would be expected based on a given 168:
is higher than the variance of a theoretical model,
49:. Unsourced material may be challenged and removed. 516:(Third ed.). University of California Press. 467:Journal of the Royal Statistical Society, Series C 254:(beta-binomial) has an additional free parameter. 413: 562: 460: 341:Differences in terminology among disciplines 509: 156:of the model is approximately equal to the 250:as the mixing distribution. The resulting 478: 461:Lindsey, J. K.; Altham, P. M. E. (1998). 311:(determined by sample size) predicts the 109:Learn how and when to remove this message 129:is the presence of greater variability ( 563: 536: 283: 47:adding citations to reliable sources 18: 201:analysis is commonly used to model 13: 14: 592: 540:Evolutionary Ecology of Parasites 390:Compound probability distribution 571:Probability distribution fitting 328:, the distribution is instead a 23: 246:) drawn for each family from a 242:itself a random variable (i.e. 34:needs additional citations for 543:. Princeton University Press. 530: 503: 454: 407: 211:negative binomial distribution 1: 400: 324:where studies have different 7: 378: 261:—arises from introducing a 220: 183: 10: 597: 513:Quantitative Plant Ecology 231:Trivers–Willard hypothesis 188: 172:has occurred. Conversely, 140:A common task in applied 510:Greig-Smith, P. (1983). 373:unobserved heterogeneity 480:10.1111/1467-9876.00103 263:normal random variable 131:statistical dispersion 330:compound distribution 252:compound distribution 227:binomial distribution 432:10.3378/027.081.0101 244:random effects model 195:Poisson distribution 164:. When the observed 43:improve this article 537:Poulin, R. (2006). 385:Index of dispersion 347:biological sciences 322:study heterogeneity 290:normal distribution 284:Normal distribution 275:logistic regression 239:beta-binomial model 305:statistical survey 303:For example, in a 237:In this case, the 215:gamma distribution 199:Poisson regression 356:In some areas of 248:beta distribution 135:statistical model 119: 118: 111: 93: 588: 581:Spatial analysis 555: 554: 534: 528: 527: 507: 501: 500: 482: 458: 452: 451: 411: 395:Quasi-likelihood 271:multilevel model 146:parametric model 114: 107: 103: 100: 94: 92: 58:"Overdispersion" 51: 27: 19: 596: 595: 591: 590: 589: 587: 586: 585: 576:Point processes 561: 560: 559: 558: 551: 535: 531: 524: 508: 504: 459: 455: 412: 408: 403: 381: 367:Furthermore in 343: 309:margin of error 286: 223: 191: 186: 174:underdispersion 154:population mean 115: 104: 98: 95: 52: 50: 40: 28: 17: 12: 11: 5: 594: 584: 583: 578: 573: 557: 556: 549: 529: 522: 502: 473:(1): 149–157. 453: 405: 404: 402: 399: 398: 397: 392: 387: 380: 377: 342: 339: 313:sampling error 285: 282: 267:logistic model 222: 219: 190: 187: 185: 182: 170:overdispersion 144:is choosing a 127:overdispersion 117: 116: 31: 29: 22: 15: 9: 6: 4: 3: 2: 593: 582: 579: 577: 574: 572: 569: 568: 566: 552: 550:9780691120850 546: 542: 541: 533: 525: 523:0-632-00142-9 519: 515: 514: 506: 498: 494: 490: 486: 481: 476: 472: 468: 464: 457: 449: 445: 441: 437: 433: 429: 425: 421: 420:Human Biology 417: 410: 406: 396: 393: 391: 388: 386: 383: 382: 376: 374: 370: 365: 363: 359: 354: 352: 348: 338: 335: 334:opinion polls 331: 327: 326:sampling bias 323: 318: 317:meta-analysis 314: 310: 306: 301: 299: 295: 291: 281: 278: 276: 272: 268: 264: 260: 255: 253: 249: 245: 240: 235: 232: 228: 218: 216: 212: 208: 207:mixture model 204: 200: 196: 181: 179: 178:heterogeneous 175: 171: 167: 163: 159: 155: 151: 147: 143: 138: 136: 132: 128: 124: 113: 110: 102: 91: 88: 84: 81: 77: 74: 70: 67: 63: 60: â€“  59: 55: 54:Find sources: 48: 44: 38: 37: 32:This article 30: 26: 21: 20: 539: 532: 512: 505: 470: 466: 456: 423: 419: 409: 366: 362:parasitology 355: 351:parasitology 344: 302: 287: 279: 256: 236: 224: 192: 173: 169: 139: 126: 120: 105: 99:January 2008 96: 86: 79: 72: 65: 53: 41:Please help 36:verification 33: 426:(1): 3–11. 158:sample mean 565:Categories 401:References 369:demography 203:count data 142:statistics 123:statistics 69:newspapers 440:1534-6617 259:Bernoulli 209:like the 497:22354905 489:12293397 448:19589015 379:See also 298:kurtosis 221:Binomial 184:Examples 166:variance 358:ecology 288:As the 265:into a 189:Poisson 162:moments 83:scholar 547:  520:  495:  487:  446:  438:  307:, the 85:  78:  71:  64:  56:  493:S2CID 349:. In 90:JSTOR 76:books 545:ISBN 518:ISBN 485:PMID 444:PMID 436:ISSN 294:skew 62:news 475:doi 428:doi 375:'. 150:fit 121:In 45:by 567:: 491:. 483:. 471:47 469:. 465:. 442:. 434:. 424:81 422:. 418:. 296:, 137:. 125:, 553:. 526:. 499:. 477:: 450:. 430:: 112:) 106:( 101:) 97:( 87:· 80:· 73:· 66:· 39:.

Index


verification
improve this article
adding citations to reliable sources
"Overdispersion"
news
newspapers
books
scholar
JSTOR
Learn how and when to remove this message
statistics
statistical dispersion
statistical model
statistics
parametric model
fit
population mean
sample mean
moments
variance
heterogeneous
Poisson distribution
Poisson regression
count data
mixture model
negative binomial distribution
gamma distribution
binomial distribution
Trivers–Willard hypothesis

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑