
Empirical risk minimization


Empirical risk minimization is a principle in statistical learning theory which defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the true "risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk".
Background

The following situation is a general setting of many supervised learning problems. There are two spaces of objects X and Y, and we would like to learn a function h : X → Y (often called a hypothesis) which outputs an object y ∈ Y given x ∈ X. To do so, we have at our disposal a training set of n examples (x_1, y_1), …, (x_n, y_n), where x_i ∈ X is an input and y_i ∈ Y is the corresponding response that is desired from h(x_i).

To put it more formally, assume that there is a joint probability distribution P(x, y) over X and Y, and that the training set consists of n instances (x_1, y_1), …, (x_n, y_n) drawn i.i.d. from P(x, y). The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in the data), because y is not a deterministic function of x but rather a random variable with conditional distribution P(y | x) for a fixed x.
It is also assumed that there is a non-negative real-valued loss function L(ŷ, y) which measures how different the prediction ŷ of a hypothesis is from the true outcome y. For classification tasks these loss functions can be scoring rules. The risk associated with a hypothesis h(x) is then defined as the expectation of the loss function:

R(h) = \mathbb{E}[L(h(x), y)] = \int L(h(x), y) \, dP(x, y).

A loss function commonly used in theory is the 0–1 loss function:

L(\hat{y}, y) = \begin{cases} 1 & \text{if } \hat{y} \neq y \\ 0 & \text{if } \hat{y} = y. \end{cases}

The ultimate goal of a learning algorithm is to find a hypothesis h* among a fixed class of functions \mathcal{H} for which the risk R(h) is minimal:

h^* = \underset{h \in \mathcal{H}}{\operatorname{arg\,min}} \, R(h).

For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function.
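These definitions can be made concrete with a small worked example. The sketch below is illustrative only: the toy joint distribution, the two candidate hypotheses and all numbers are assumptions, not taken from this article. It computes the true risk R(h) under a fully known discrete distribution with the 0–1 loss, and shows that the hypothesis agreeing with the more probable label for each x attains the smaller risk, as the Bayes classifier does.

```python
# Illustrative sketch: true risk R(h) = E[L(h(x), y)] under a fully known,
# finite joint distribution P(x, y), using the 0-1 loss. (Toy values assumed.)

# Hypothetical joint distribution on X = {0, 1}, Y = {0, 1}; probabilities sum to 1.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def zero_one_loss(y_pred, y):
    return 0 if y_pred == y else 1

def risk(h):
    """True risk: expectation of the loss over the known distribution P."""
    return sum(p * zero_one_loss(h(x), y) for (x, y), p in P.items())

h_const_0 = lambda x: 0   # always predict label 0
h_identity = lambda x: x  # predict the label equal to x

print(risk(h_const_0))   # 0.4
print(risk(h_identity))  # 0.3 -- the Bayes classifier for this P, since the more
                         # probable label given x = 0 is 0 and given x = 1 is 1
```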
Empirical risk minimization

In general, the risk R(h) cannot be computed because the distribution P(x, y) is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, given a sample of i.i.d. training data points, we can compute an estimate of the risk, called the empirical risk, by computing the average of the loss function over the training set; more formally, by computing the expectation with respect to the empirical measure:

R_{\text{emp}}(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i).

The empirical risk minimization principle states that the learning algorithm should choose a hypothesis ĥ which minimizes the empirical risk over the hypothesis class \mathcal{H}:

\hat{h} = \underset{h \in \mathcal{H}}{\operatorname{arg\,min}} \, R_{\text{emp}}(h).

Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem.
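As a minimal sketch of the principle (the toy sample, the threshold family h_t(x) = 1 if x ≥ t else 0, and the candidate grid below are assumptions made for illustration, not part of the article), one can compute the empirical risk of every hypothesis in a small finite class and pick the minimizer:

```python
# Illustrative sketch: empirical risk minimization with the 0-1 loss over a small,
# finite class of threshold classifiers. (Toy data assumed.)

data = [(0.2, 0), (0.5, 0), (0.9, 0), (1.1, 1), (1.4, 1), (1.8, 1)]  # (x_i, y_i) pairs

def empirical_risk(h, sample):
    """R_emp(h): average 0-1 loss of h over the training sample."""
    return sum(int(h(x) != y) for x, y in sample) / len(sample)

def make_threshold(t):
    return lambda x: int(x >= t)

candidate_thresholds = [0.0, 0.5, 1.0, 1.5, 2.0]   # defines the finite hypothesis class

# ERM: choose the hypothesis with the smallest empirical risk.
best_t = min(candidate_thresholds, key=lambda t: empirical_risk(make_threshold(t), data))
print(best_t, empirical_risk(make_threshold(best_t), data))  # 1.0 0.0 on this toy sample
```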
Properties

Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class.

For simplicity, consider the case of binary classification tasks. It is then possible to bound the probability that the selected classifier φ_n is much worse than the best possible classifier φ*. Consider the risk L defined over the hypothesis class \mathcal{C} with growth function S(\mathcal{C}, n), given a dataset of size n. Then, for every ε > 0:

\mathbb{P}\left( L(\phi_n) - L(\phi^*) > \epsilon \right) \leq 8 \, S(\mathcal{C}, n) \exp\{-n\epsilon^2 / 32\}.

Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk uniformly over the hypothesis class.
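To get a feel for how the bound behaves, the sketch below simply evaluates its right-hand side for a few sample sizes. The polynomial growth-function value used here is an assumption chosen purely for illustration; the article does not specify a particular function class.

```python
# Illustrative sketch: evaluate the right-hand side 8 * S(C, n) * exp(-n * eps**2 / 32)
# of the bound above. The growth function S(C, n) is supplied by the caller; the
# (n + 1) ** 3 value below is an assumed, purely illustrative choice.
import math

def erm_bound(n, eps, growth):
    return 8 * growth(n) * math.exp(-n * eps**2 / 32)

growth = lambda n: (n + 1) ** 3  # assumed illustrative growth-function value

for n in (10_000, 100_000, 1_000_000):
    print(n, erm_bound(n, eps=0.05, growth=growth))
# The bound is vacuous (greater than 1) for small n and becomes meaningful only for large n.
```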
Impossibility results

It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the no free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can provide guarantees on the error for a given sample size for all distributions.

Specifically, let ε > 0, and consider a sample size n and a classification rule φ_n. Then there exists a distribution of (X, Y) with risk L* = 0 (meaning that perfect prediction is possible) such that

\mathbb{E} L_n \geq 1/2 - \epsilon,

where L_n denotes the risk of the rule φ_n on a sample of size n.

It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence (a_n) of decreasing positive numbers converging to zero, it is possible to find a distribution such that

\mathbb{E} L_n \geq a_n

for all n. This result shows that universally good classification rules do not exist, in the sense that any rule must be of low quality for at least one distribution.
Computational complexity

Empirical risk minimization for a classification problem with a 0–1 loss function is known to be an NP-hard problem, even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., when the data are linearly separable.

In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like the hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P(x, y) (and thus stop being agnostic learning algorithms, to which the above result applies).

In the case of convexification, Zhang's lemma bounds the excess risk of the original problem by the excess risk of the convexified problem. Minimizing the latter using convex optimization therefore also allows one to control the former.
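As a sketch of the convex-surrogate approach (illustrative only: the tiny dataset, the plain subgradient-descent loop, its step size and iteration count are assumptions, not a prescribed algorithm), one can minimize the average hinge loss of a linear classifier instead of its 0–1 empirical risk:

```python
# Illustrative sketch: replace the (NP-hard to minimize) 0-1 empirical risk of a linear
# classifier by the convex hinge-loss surrogate and minimize it with subgradient descent.
# Toy data and optimization settings are assumed for illustration.

data = [((1.0, 2.0), +1), ((2.0, 1.0), +1), ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]

def hinge_risk(w, b, sample):
    """Average hinge loss max(0, 1 - y*(w.x + b)) -- a convex upper bound on the 0-1 loss."""
    total = 0.0
    for (x1, x2), y in sample:
        total += max(0.0, 1.0 - y * (w[0] * x1 + w[1] * x2 + b))
    return total / len(sample)

w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(200):  # plain subgradient descent on the surrogate objective
    g_w, g_b = [0.0, 0.0], 0.0
    for (x1, x2), y in data:
        if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:  # margin violated -> nonzero subgradient
            g_w[0] -= y * x1
            g_w[1] -= y * x2
            g_b -= y
    w = [w[0] - lr * g_w[0] / len(data), w[1] - lr * g_w[1] / len(data)]
    b -= lr * g_b / len(data)

print(w, b, hinge_risk(w, b, data))  # hinge risk reaches 0.0 on this separable toy sample
```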
Tilted empirical risk minimization

Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions, such as the squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.
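One common formulation of the tilted objective, shown here as a sketch (the log-sum-exp form with tilt parameter t and the toy per-example losses are assumptions based on the usual presentation of the technique, not details given in this article), replaces the plain average of per-example losses by an exponentially tilted average:

```python
# Illustrative sketch of a tilted empirical risk: for tilt parameter t != 0,
#   R_tilt(h) = (1/t) * log( (1/n) * sum_i exp(t * L(h(x_i), y_i)) ).
# t -> 0 recovers the ordinary average; t > 0 emphasizes large (worst-case) losses,
# t < 0 emphasizes small losses. Toy per-example losses are assumed for illustration.
import math

def tilted_risk(losses, t):
    if t == 0:
        return sum(losses) / len(losses)
    return math.log(sum(math.exp(t * l) for l in losses) / len(losses)) / t

example_losses = [0.1, 0.2, 0.15, 2.5]  # one outlying example with a large loss

print(tilted_risk(example_losses, t=0))    # ~0.74, the ordinary empirical risk
print(tilted_risk(example_losses, t=5))    # closer to the worst-case loss
print(tilted_risk(example_losses, t=-5))   # closer to the smallest losses
```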
See also

M-estimator
Maximum likelihood estimation
References

V. Vapnik (1992). Principles of Risk Minimization for Learning Theory.
Devroye, Luc; Györfi, László; Lugosi, Gábor (1996). A Probabilistic Theory of Pattern Recognition. Stochastic Modelling and Applied Probability. New York: Springer. doi:10.1007/978-1-4612-0711-5. ISBN 978-1-4612-6877-2.
Györfi, László; Kohler, Michael; Krzyzak, Adam; Walk, Harro (2010). A Distribution-Free Theory of Nonparametric Regression (softcover reprint of the original 1st ed.). New York: Springer. ISBN 978-1-4419-2998-3.
V. Feldman, V. Guruswami, P. Raghavendra and Yi Wu (2009). Agnostic Learning of Monomials by Halfspaces is Hard.
"Mathematics of Machine Learning, Lecture 9 Notes". MIT OpenCourseWare. Retrieved 2023-10-28.

Further reading

Vapnik, V. (2000). The Nature of Statistical Learning Theory. Information Science and Statistics. Springer-Verlag. ISBN 978-0-387-98780-4.
