
Nicholas Carlini


Nicholas Carlini is an American researcher affiliated with Google DeepMind who has published research in the fields of computer security and machine learning. He is known for his work on adversarial machine learning, and in particular for the Carlini & Wagner attack, developed with David Wagner in 2016. This attack was particularly useful in defeating defensive distillation, a method used to increase model robustness, and has since been effective against other defenses against adversarial input. In 2018, Carlini demonstrated an attack on Mozilla's DeepSpeech model, showing that hidden commands could be embedded in speech inputs, which the model would execute even if they were inaudible to humans. He also led a team at UC Berkeley that successfully broke seven out of eleven defenses against adversarial attacks presented at the 2018 International Conference on Learning Representations.

In addition to his work on adversarial attacks, Carlini has made significant contributions to understanding the privacy risks of machine learning models. In 2020, he revealed that large language models, like GPT-2, could memorize and output personally identifiable information. His research demonstrated that this issue worsened with larger models, and he later showed similar vulnerabilities in generative image models, such as Stable Diffusion.

Life and career

Nicholas Carlini obtained his Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley, in 2013. He then continued his studies at the same university, pursuing a PhD under the supervision of David Wagner and completing it in 2018 with the thesis "Evaluation and Design of Robust Neural Network Defenses".

Carlini became known for his work on adversarial machine learning. In 2016, he worked alongside Wagner to develop the Carlini & Wagner attack, a method of generating adversarial examples against machine learning models. The attack proved useful against defensive distillation, a popular mechanism in which a student model is trained on the outputs of a parent model to increase the robustness and generalizability of student models. The attack gained popularity when it was shown to be effective against most other defenses as well, rendering them ineffective. In 2018, Carlini demonstrated an attack against Mozilla's DeepSpeech model in which he showed that, by hiding malicious commands inside normal speech input, the model could be made to respond to those commands even when they were not discernible by humans. In the same year, Carlini and his team at UC Berkeley showed that, of the 11 papers presenting defenses to adversarial attacks accepted at that year's ICLR conference, seven of the defenses could be broken.

Since 2021, he and his team have been working on large language models, creating a questionnaire on which humans typically scored 35% whereas AI models scored in the 40% range, with GPT-3 getting 38%, a score that could be improved to 40% through few-shot prompting. The best performer on the test was UnifiedQA, a model developed by Google specifically for question-and-answer sets. Carlini has also developed methods to cause large language models like ChatGPT to answer harmful questions, such as how to construct bombs.

He is also known for his work studying the privacy of machine learning models. In 2020, he showed for the first time that large language models would memorize some of the text data they were trained on. For example, he found that GPT-2 could output personally identifiable information. He then led an analysis of larger models and studied how memorization increased with model size. In 2022, he showed the same vulnerability in generative image models, specifically diffusion models, by demonstrating that Stable Diffusion could output images of people's faces that it was trained on. Following on this, Carlini showed that ChatGPT would also sometimes output exact copies of webpages it was trained on, including personally identifiable information. Some of these studies have since been referenced by the courts in debates over the copyright status of AI models.

Other work

Carlini received the Best of Show award at the 2020 IOCCC for implementing a tic-tac-toe game entirely with calls to printf, expanding on work from a research paper of his from 2015. The judges commented on his submission: "This year's Best of Show (carlini) is such a novel way of obfuscation that it would be worth [sic] of a special mention in the (future) Best of IOCCC list!"

Awards

Best Student Paper Award, IEEE S&P 2017 ("Towards Evaluating the Robustness of Neural Networks")
Best Paper Award, ICML 2018 ("Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples")
Distinguished Paper Award, USENIX 2021 ("Poisoning the Unlabeled Dataset of Semi-Supervised Learning")
Distinguished Paper Award, USENIX 2023 ("Tight Auditing of Differentially Private Machine Learning")
Best Paper Award, ICML 2024 ("Stealing Part of a Production Language Model")
Best Paper Award, ICML 2024 ("Considerations for Differentially Private Learning with Large-Scale Public Pretraining")
