Point estimation

In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown population parameter (for example, the population mean). More formally, it is the application of a point estimator to the data to obtain a point estimate.

Point estimation can be contrasted with interval estimation: such interval estimates are typically either confidence intervals, in the case of frequentist inference, or credible intervals, in the case of Bayesian inference. More generally, a point estimator can be contrasted with a set estimator; examples are given by confidence sets or credible sets. A point estimator can also be contrasted with a distribution estimator; examples are given by confidence distributions, randomized estimators, and Bayesian posteriors.

Properties of point estimates

Biasedness

“Bias” is defined as the difference between the expected value of the estimator and the true value of the population parameter being estimated. Put another way, the closer the expected value of the estimator is to the value of the measured parameter, the smaller the bias. When the expected value and the true value are equal, the estimator is considered unbiased: the estimator T is called an unbiased estimator for the parameter θ if E(T) = θ, irrespective of the value of θ. For example, from the same random sample we have E(x̄) = μ (the mean) and E(s²) = σ² (the variance), so x̄ and s² are unbiased estimators for μ and σ². The difference E(T) − θ is called the bias of T; if this difference is nonzero, then T is called biased.

An unbiased estimator becomes the best unbiased estimator if it also has minimum variance. However, a biased estimator with a small variance may be more useful than an unbiased estimator with a large variance. Most importantly, we prefer point estimators that have the smallest mean square errors. A quick simulation check of bias is sketched below.
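
The following Python sketch is an illustration, not part of the source: it approximates the bias of two variance estimators by simulation, the unbiased sample variance s² (dividing by n − 1) and the biased maximum-likelihood version (dividing by n).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000

s2_unbiased = np.empty(reps)  # sample variance dividing by n - 1
s2_biased = np.empty(reps)    # ML variance dividing by n
for i in range(reps):
    x = rng.normal(mu, sigma, size=n)
    s2_unbiased[i] = x.var(ddof=1)
    s2_biased[i] = x.var(ddof=0)

print("true variance:", sigma**2)                   # 4.0
print("mean of s^2 (ddof=1):", s2_unbiased.mean())  # close to 4.0, bias near 0
print("mean of s^2 (ddof=0):", s2_biased.mean())    # close to 3.6, bias near -sigma^2/n
```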

Consistency

Consistency is about whether the point estimate stays close to the true value of the parameter as the sample size increases: the larger the sample size, the more accurate the estimate tends to be. If a point estimator is consistent, its expected value converges to the true value of the parameter and its variance shrinks toward zero as the sample size grows. In particular, an unbiased estimator is consistent if the limit of the variance of the estimator T equals zero as the sample size tends to infinity. The sketch below illustrates this behaviour for the sample mean.
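
A minimal simulation (illustrative only, not from the source) showing the sample mean of exponential draws settling toward the true mean as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 2.0  # exponential distribution with scale 2 has mean 2

# Estimate the mean from increasingly large samples.
for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.exponential(scale=true_mean, size=n)
    print(f"n={n:>7}  sample mean={x.mean():.4f}  error={abs(x.mean() - true_mean):.4f}")
```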

Efficiency

Let T1 and T2 be two unbiased estimators for the same parameter θ. The estimator T2 is called more efficient than the estimator T1 if Var(T2) < Var(T1), irrespective of the value of θ. We can also say that the most efficient estimator is the one with the least variability of outcomes: if an estimator has the smallest variance from sample to sample among unbiased estimators, it is both the most efficient and unbiased. The notion of efficiency can be extended by saying that estimator T2 is more efficient than estimator T1 (for the same parameter of interest) if the MSE (mean square error) of T2 is smaller than the MSE of T1.

Generally, we must consider the distribution of the population when determining the efficiency of estimators. For example, in a normal distribution the mean is considered more efficient than the median, but the same does not apply in asymmetrical, or skewed, distributions. A small comparison of the two is sketched below.
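
An illustrative simulation (not from the source): for normally distributed data the sample mean has a smaller sampling variance than the sample median.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20_000

means = np.empty(reps)
medians = np.empty(reps)
for i in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    means[i] = x.mean()
    medians[i] = np.median(x)

# For the normal distribution the mean is the more efficient estimator of the centre.
print("variance of sample mean:  ", means.var())    # about 1/n = 0.020
print("variance of sample median:", medians.var())  # about pi/(2n) = 0.031
```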

Sufficiency

In statistics, the job of a statistician is to interpret the data that they have collected and to draw statistically valid conclusions about the population under investigation. But in many cases the raw data, which are too numerous and too costly to store, are not suitable for this purpose. Therefore, the statistician would like to condense the data by computing some statistics and to base the analysis on these statistics, so that there is no loss of relevant information in doing so; that is, the statistician would like to choose those statistics which exhaust all the information about the parameter that is contained in the sample. We define sufficient statistics as follows: let X = (X1, X2, …, Xn) be a random sample. A statistic T(X) is said to be sufficient for θ (or for the family of distributions) if the conditional distribution of X given T is free of θ.

Types of point estimation

Bayesian point estimation

Bayesian inference is typically based on the posterior distribution. Many Bayesian point estimators are the posterior distribution's statistics of central tendency, e.g., its mean, median, or mode:

- the posterior mean, which minimizes the (posterior) risk (expected loss) for the squared-error loss function; in Bayesian estimation, the risk is defined in terms of the posterior distribution, as observed by Gauss;
- the posterior median, which minimizes the posterior risk for the absolute-value loss function, as observed by Laplace;
- the maximum a posteriori (MAP) estimator, which finds a maximum of the posterior distribution; for a uniform prior probability, the MAP estimator coincides with the maximum-likelihood estimator.

The MAP estimator has good asymptotic properties, even for many difficult problems on which the maximum-likelihood estimator has difficulties. For regular problems, where the maximum-likelihood estimator is consistent, the maximum-likelihood estimator ultimately agrees with the MAP estimator. Bayesian estimators are admissible, by Wald's theorem. The Minimum Message Length (MML) point estimator is based in Bayesian information theory and is not so directly related to the posterior distribution.

Special cases of Bayesian filters are important, such as the Kalman filter and the Wiener filter, and several methods of computational statistics have close connections with Bayesian analysis, such as the particle filter and Markov chain Monte Carlo (MCMC). A small conjugate-prior sketch of these point estimators follows.
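
As an illustration (not from the source): with a Beta(a, b) prior and a binomial likelihood, the posterior is Beta(a + k, b + n − k), so the posterior mean, median, and mode (MAP) can be computed directly. The prior and data below are arbitrary choices for the sketch.

```python
from scipy import stats

# Observed data: k successes in n Bernoulli trials; Beta(2, 2) prior on the success probability.
k, n = 7, 20
a, b = 2.0, 2.0

post = stats.beta(a + k, b + n - k)             # conjugacy: the posterior is again a Beta
posterior_mean = post.mean()                     # point estimate under squared-error loss
posterior_median = post.median()                 # point estimate under absolute-error loss
posterior_mode = (a + k - 1) / (a + b + n - 2)   # MAP estimate (mode of the Beta posterior)

print(posterior_mean, posterior_median, posterior_mode)
```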

Methods of finding point estimates

Below are some commonly used methods of estimating unknown parameters which are expected to provide estimators having some of these important properties. In general, depending on the situation and the purpose of our study, we apply any one of the methods of point estimation that may be suitable.

Method of maximum likelihood (MLE)

The method of maximum likelihood, due to R. A. Fisher, is the most important general method of estimation. It seeks the values of the unknown parameters that maximize the likelihood function: given a known model (for example, the normal distribution), it picks the parameter values under which the observed data are most probable, and hence the most suitable match for the data.

Let X = (X1, X2, …, Xn) denote a random sample with joint p.d.f. or p.m.f. f(x, θ) (θ may be a vector). The function f(x, θ), considered as a function of θ, is called the likelihood function, denoted L(θ). The principle of maximum likelihood consists of choosing, within the admissible range of θ, an estimate that maximizes the likelihood; this estimator is called the maximum likelihood estimate (MLE) of θ. In order to obtain the MLE of θ, we solve the likelihood equation

d log L(θ)/dθi = 0,  i = 1, 2, …, k.

If θ is a vector, then partial derivatives are taken to obtain the likelihood equations. A one-parameter sketch follows.
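
An illustrative sketch (the exponential model and data are assumptions of this example, not the source): for an exponential model the MLE of the rate has the closed form 1/x̄, and the code checks that a numerical maximization of the log-likelihood agrees with it.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 1.5, size=500)  # data drawn with true rate 1.5

def neg_log_likelihood(rate):
    # log L(rate) = n*log(rate) - rate*sum(x) for the exponential model
    return -(len(x) * np.log(rate) - rate * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print("numerical MLE:", res.x)
print("closed-form MLE 1/mean(x):", 1 / x.mean())
```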

Method of moments (MOM)

The method of moments was introduced by K. Pearson and P. Chebyshev in 1887, and it is one of the oldest methods of estimation. This method is based on the law of large numbers: it uses the known facts about a population and applies those facts to a sample of the population by deriving equations that relate the population moments to the unknown parameters, and then solves those equations using the corresponding sample moments. However, owing to its simplicity, this method is not always accurate and can easily be biased.

Let (X1, X2, …, Xn) be a random sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ1, θ2, …, θk). The objective is to estimate the parameters θ1, θ2, …, θk. Further, let the first k population moments about zero exist as explicit functions of θ, i.e. μr = μr(θ1, θ2, …, θk), r = 1, 2, …, k. In the method of moments, we equate k sample moments with the corresponding population moments. Generally, the first k moments are taken because the errors due to sampling increase with the order of the moment. Thus, we get the k equations μr(θ1, θ2, …, θk) = mr, r = 1, 2, …, k, where mr = (1/n) ΣXi^r is the r-th sample moment. Solving these equations gives the method-of-moments estimators (or estimates). See also the generalized method of moments. A two-parameter sketch is given below.
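
A sketch of the idea (illustrative, not from the source) for a Gamma model with shape k and scale s: matching E[X] = k·s and Var[X] = k·s² to the sample moments gives closed-form estimators.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=3.0, scale=2.0, size=5_000)

m1 = x.mean()       # first sample moment
v = x.var(ddof=0)   # second central sample moment

# Solve E[X] = k*s and Var[X] = k*s**2 for the two parameters.
scale_hat = v / m1
shape_hat = m1 / scale_hat
print("method-of-moments estimates:", shape_hat, scale_hat)  # close to (3.0, 2.0)
```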

Method of least square

In the method of least squares, we consider the estimation of parameters using some specified form of the expectation and second moments of the observations. For fitting a curve of the form y = f(x; β0, β1, …, βp) to the data (xi, yi), i = 1, 2, …, n, we may use the method of least squares. This method consists of minimizing the sum of squares of the deviations of the observations from the fitted curve.

When f(x; β0, β1, …, βp) is a linear function of the parameters and the x-values are known, the least square estimators will be best linear unbiased estimators (BLUE). Again, if we assume that the least square estimates are independently and identically normally distributed, then a linear estimator will be the minimum-variance unbiased estimator (MVUE) for the entire class of unbiased estimators. See also minimum mean squared error (MMSE). A small line-fitting sketch follows.
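
A minimal ordinary-least-squares sketch (illustrative, not from the source), fitting a straight line y = β0 + β1·x by solving the least-squares problem with NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 100)
y = 1.0 + 2.5 * x + rng.normal(scale=1.0, size=x.size)  # true intercept 1.0, slope 2.5

# Design matrix with an intercept column; lstsq minimizes ||y - X b||^2.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated intercept and slope:", beta_hat)
```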

Minimum-variance mean-unbiased estimator (MVUE)

The method of minimum-variance mean-unbiased estimation minimizes the risk (expected loss) of the squared-error loss function.

Median unbiased estimator

A median-unbiased estimator minimizes the risk of the absolute-error loss function.

Best linear unbiased estimator (BLUE)

The best linear unbiased estimator is characterized by the Gauss–Markov theorem, which states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, provided the errors in the linear regression model are uncorrelated, have equal variances, and have an expectation value of zero.

Point estimate vs. confidence interval estimate

There are two major types of estimates: the point estimate and the confidence interval estimate. In the point estimate we try to choose a unique point in the parameter space which can reasonably be considered the true value of the parameter. In interval estimation, instead of a unique estimate of the parameter, we are interested in constructing a family of sets that contain the true (unknown) parameter value with a specified probability. In many problems of statistical inference we are not interested only in estimating the parameter or testing some hypothesis concerning the parameter; we also want to get a lower or an upper bound, or both, for the real-valued parameter. To do this, we need to construct a confidence interval.

A confidence interval describes how reliable an estimate is. We can calculate the upper and lower confidence limits of the interval from the observed data. Suppose a dataset x1, …, xn is given, modeled as a realization of random variables X1, …, Xn. Let θ be the parameter of interest and γ a number between 0 and 1. If there exist sample statistics Ln = g(X1, …, Xn) and Un = h(X1, …, Xn) such that P(Ln < θ < Un) = γ for every value of θ, then (ln, un), where ln = g(x1, …, xn) and un = h(x1, …, xn), is called a 100γ% confidence interval for θ, and the number γ is called the confidence level. Here two limits, ln and un, are computed from the set of observations, and it is claimed with a certain degree of confidence (measured in probabilistic terms) that the true value of the parameter lies between them. Thus we get an interval (ln, un) which we expect to include the true value of the parameter, and this type of estimation is called confidence interval estimation. It provides a range of values within which the parameter is expected to lie, generally gives more information than a point estimate, and is preferred when making inferences. In some way, we can say that point estimation is the opposite of interval estimation.

In general, with a normally distributed sample mean X̄ and a known value for the standard deviation σ, a 100(1 − α)% confidence interval for the true μ is formed by taking X̄ ± e, with e = z(1 − α/2) · σ/√n, where z(1 − α/2) is the 100(1 − α/2)% cumulative value of the standard normal curve and n is the number of data values. For example, z(1 − α/2) equals 1.96 for 95% confidence. A numerical sketch of this interval is given below.

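A numerical illustration (not from the source) of the z-interval described above, for a sample with known σ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
sigma, n = 2.0, 40
x = rng.normal(loc=10.0, scale=sigma, size=n)

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)   # about 1.96 for a 95% interval
e = z * sigma / np.sqrt(n)          # margin of error
print("95% confidence interval for the mean:", (x.mean() - e, x.mean() + e))
```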