Empirical risk minimization

Empirical risk minimization is a principle in statistical learning theory which defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the true "risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk".
Background

The following situation is a general setting of many supervised learning problems. There are two spaces of objects X and Y, and we would like to learn a function h : X → Y (often called a hypothesis) which outputs an object y ∈ Y, given x ∈ X. To do so, there is a training set of n examples (x_1, y_1), …, (x_n, y_n), where x_i ∈ X is an input and y_i ∈ Y is the corresponding response that is desired from h(x_i).

To put it more formally, assume that there is a joint probability distribution P(x, y) over X and Y, and that the training set consists of n instances (x_1, y_1), …, (x_n, y_n) drawn i.i.d. from P(x, y). The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data), because y is not a deterministic function of x but rather a random variable with conditional distribution P(y | x) for a fixed x.

It is also assumed that there is a non-negative real-valued loss function L(ŷ, y) which measures how different the prediction ŷ of a hypothesis is from the true outcome y. For classification tasks these loss functions can be scoring rules. The risk associated with hypothesis h(x) is then defined as the expectation of the loss function:

    R(h) = \mathbb{E}[L(h(x), y)] = \int L(h(x), y) \, dP(x, y).

A loss function commonly used in theory is the 0–1 loss function:

    L(\hat{y}, y) = \begin{cases} 1 & \text{if } \hat{y} \neq y \\ 0 & \text{if } \hat{y} = y \end{cases}

The ultimate goal of a learning algorithm is to find a hypothesis h^* among a fixed class of functions \mathcal{H} for which the risk R(h) is minimal:

    h^* = \underset{h \in \mathcal{H}}{\operatorname{arg\,min}} \, R(h).

For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function.
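Because the risk is an expectation, it can be computed exactly whenever the distribution is fully known. The following sketch (illustrative only; the toy distribution and hypothesis are invented for this example) evaluates R(h) under the 0–1 loss for a small discrete joint distribution:

    # Computing the true risk R(h) = sum over (x, y) of L(h(x), y) * P(x, y)
    # for a toy, fully known discrete distribution P and the 0-1 loss.
    P = {
        (0, 0): 0.35, (0, 1): 0.05,  # at x = 0 the label is usually 0
        (1, 0): 0.10, (1, 1): 0.50,  # at x = 1 the label is usually 1
    }

    def zero_one_loss(y_pred, y):
        return 1 if y_pred != y else 0

    def risk(h):
        # Expectation of the loss under P.
        return sum(zero_one_loss(h(x), y) * p for (x, y), p in P.items())

    h = lambda x: x        # hypothesis: predict the label equal to x
    print(risk(h))         # 0.05 + 0.10 = 0.15

In practice the distribution is unknown, which is what motivates the empirical estimate introduced next.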
Empirical risk minimization

In general, the risk R(h) cannot be computed because the distribution P(x, y) is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, given a sample of i.i.d. training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure:

    R_{\text{emp}}(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i).

The empirical risk minimization principle states that the learning algorithm should choose a hypothesis ĥ which minimizes the empirical risk over the hypothesis class \mathcal{H}:

    \hat{h} = \underset{h \in \mathcal{H}}{\operatorname{arg\,min}} \, R_{\text{emp}}(h).

Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem.
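For a finite hypothesis class, this optimization problem can be solved by direct enumeration. A minimal sketch (with an invented toy dataset and a class of one-dimensional threshold rules):

    # ERM over a finite class: evaluate R_emp(h) for every candidate h
    # and return the minimizer, using the 0-1 loss.
    training_set = [(0.2, 0), (0.9, 1), (0.4, 0), (0.7, 1), (0.6, 1), (0.1, 0)]

    def empirical_risk(h, data):
        # R_emp(h) = (1/n) * sum of L(h(x_i), y_i)
        return sum(1 for x, y in data if h(x) != y) / len(data)

    # Hypothesis class H: predict 1 when x exceeds a threshold t.
    thresholds = [i / 10 for i in range(11)]
    H = {t: (lambda x, t=t: 1 if x > t else 0) for t in thresholds}

    t_hat = min(H, key=lambda t: empirical_risk(H[t], training_set))
    print(t_hat, empirical_risk(H[t_hat], training_set))  # 0.4 with risk 0.0

For rich hypothesis classes, enumeration is impossible, and whether the minimizer can be found efficiently is the subject of the computational complexity section below.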
Properties

Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class.

For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier \phi_n being much worse than the best possible classifier \phi^*. Consider the risk L defined over the hypothesis class \mathcal{C} with growth function \mathcal{S}(\mathcal{C}, n), given a dataset of size n. Then, for every \epsilon > 0:

    \mathbb{P}\left( L(\phi_n) - L(\phi^*) > \epsilon \right) \leq 8 \, \mathcal{S}(\mathcal{C}, n) \, \exp\{-n\epsilon^2/32\}.

Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class.
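To get a feel for how such a bound behaves, one can plug in a concrete growth function. The sketch below is illustrative: it assumes, via Sauer's lemma, a class of VC dimension d with S(C, n) ≤ (n + 1)^d, and evaluates the right-hand side numerically:

    # Evaluating the distribution-free bound 8 * S(C, n) * exp(-n * eps^2 / 32)
    # under the Sauer's-lemma growth bound S(C, n) <= (n + 1)^d.
    import math

    def erm_bound(n, epsilon, d):
        growth = (n + 1) ** d
        return 8 * growth * math.exp(-n * epsilon ** 2 / 32)

    # The bound is vacuous (greater than 1) for moderate sample sizes and
    # only becomes meaningful once n is very large.
    for n in [1_000, 100_000, 1_000_000]:
        print(n, erm_bound(n, epsilon=0.1, d=3))

This illustrates the remark above: distribution-free bounds are typically too coarse to be practical, even though they yield asymptotic guarantees such as consistency.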
Impossibility results

It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can guarantee low error for a given sample size across all distributions.

Specifically, let \epsilon > 0, and consider a sample size n and a classification rule \phi_n. Then there exists a distribution of (X, Y) with risk L^* = 0 (meaning that perfect prediction is possible) such that:

    \mathbb{E} L_n \geq 1/2 - \epsilon.

It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a_n converging to zero, it is possible to find a distribution such that:

    \mathbb{E} L_n \geq a_n

for all n. This result shows that universally good classification rules do not exist, in the sense that the rule must be of low quality for at least one distribution.
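The mechanism behind these lower bounds can be illustrated by simulation (a sketch, not the formal construction; the "memorize the sample, guess elsewhere" rule and the distribution are invented here). When the support of the distribution is much larger than the sample, the observed labels carry almost no information about unseen points, so the expected error stays near 1/2 even though L^* = 0:

    # Labels are a fixed deterministic function of x (so L* = 0), but the
    # support size N vastly exceeds the sample size n, so a rule that can
    # only memorize its sample errs on about half of the unseen points.
    import random

    random.seed(0)
    N, n, trials = 10_000, 50, 200
    truth = {x: random.randint(0, 1) for x in range(N)}  # fixed labeling

    err = 0.0
    for _ in range(trials):
        sample = random.sample(range(N), n)
        memory = {x: truth[x] for x in sample}
        test = random.randrange(N)
        # Rule: recall a memorized label, guess 0 on unseen points.
        err += (memory.get(test, 0) != truth[test])

    print(err / trials)  # close to 1/2, despite perfect prediction being possible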
Computational complexity

Empirical risk minimization for a classification problem with a 0–1 loss function is known to be an NP-hard problem, even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., when the data is linearly separable.

In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like the hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P(x, y) (and thus stop being agnostic learning algorithms to which the above result applies).

In the case of convexification, Zhang's lemma bounds the excess risk of the original problem in terms of the excess risk of the convexified problem; minimizing the latter with convex optimization therefore also controls the former.
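A minimal sketch of the convex-surrogate idea (illustrative; the one-dimensional dataset and plain subgradient descent are invented for this example, not a production SVM): minimize the hinge loss max(0, 1 − y·w·x), a convex upper bound on the 0–1 loss for labels y ∈ {−1, +1}, and then read off the 0–1 empirical risk of the result:

    # Subgradient descent on the average hinge loss for a 1-D linear
    # classifier sign(w * x); the hinge loss upper-bounds the 0-1 loss.
    data = [(-2.0, -1), (-1.0, -1), (-0.5, -1), (0.5, 1), (1.5, 1), (2.5, 1)]

    def hinge_risk(w):
        return sum(max(0.0, 1 - y * w * x) for x, y in data) / len(data)

    def zero_one_risk(w):
        return sum(1 for x, y in data if y * w * x <= 0) / len(data)

    w, lr = 0.0, 0.1
    for _ in range(200):
        # Subgradient of the average hinge loss at the current w.
        g = sum(-y * x for x, y in data if 1 - y * w * x > 0) / len(data)
        w -= lr * g

    print(w, hinge_risk(w), zero_one_risk(w))  # w > 0 separates the data

Because this data is linearly separable, the surrogate minimizer also drives the 0–1 empirical risk to zero, in line with the remark above.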
Tilted empirical risk minimization

Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.
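One common formulation (an assumption here; the article does not spell out the functional form) aggregates the per-example losses with a log-sum-exp, R_t = (1/t) log((1/n) Σ_i exp(t · ℓ_i)), so that the tilt t interpolates between averaging and emphasizing extremes. A sketch with invented loss values:

    # Tilted empirical risk: t -> 0 recovers the ordinary average loss,
    # t > 0 emphasizes the largest losses, t < 0 suppresses them.
    import math

    def tilted_risk(losses, t):
        if t == 0:
            return sum(losses) / len(losses)
        m = max(t * l for l in losses)  # stabilize the exponentials
        return (m + math.log(sum(math.exp(t * l - m) for l in losses)
                             / len(losses))) / t

    losses = [0.1, 0.2, 0.1, 5.0]  # one badly fit (or rare-class) example
    for t in [-1.0, 0.0, 1.0, 5.0]:
        print(t, round(tilted_risk(losses, t), 3))

A positive tilt pulls the objective toward the worst-case examples (useful for imbalanced data), while a negative tilt makes the objective robust to outliers.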
See also

M-estimator
Maximum likelihood estimation

References

V. Vapnik (1992). Principles of Risk Minimization for Learning Theory.
Devroye, L.; Gyorfi, L.; Lugosi, G. (1997). A Probabilistic Theory of Pattern Recognition. Discrete Appl Math 73, 192–194.
Györfi, László; Kohler, Michael; Krzyzak, Adam; Walk, Harro (2010-12-01). A Distribution-Free Theory of Nonparametric Regression (Softcover reprint of the original 1st ed.). New York: Springer. ISBN 978-1-4419-2998-3.
Devroye, Luc; Györfi, László; Lugosi, Gábor (1996). A Probabilistic Theory of Pattern Recognition. Stochastic Modelling and Applied Probability. 31. doi:10.1007/978-1-4612-0711-5. ISBN 978-1-4612-6877-2. ISSN 0172-4568.
V. Feldman, V. Guruswami, P. Raghavendra and Yi Wu (2009). Agnostic Learning of Monomials by Halfspaces is Hard. (See the paper and references therein.)
"Mathematics of Machine Learning Lecture 9 Notes | Mathematics of Machine Learning | Mathematics". MIT OpenCourseWare. Retrieved 2023-10-28.

Further reading

Vapnik, V. (2000). The Nature of Statistical Learning Theory. Information Science and Statistics. Springer-Verlag. ISBN 978-0-387-98780-4.