What is Nicholas Carlini known for?

American artificial intelligence researcher

Nicholas Carlini: Security researcher (1990 – present)

Nicholas Carlini is an American researcher affiliated with Anthropic and previously with Google DeepMind who has published research in computer security and machine learning. He is known for his work on adversarial machine learning, in particular the Carlini & Wagner attack of 2016. This attack was especially useful in defeating defensive distillation, a method used to increase model robustness, and has since proved effective against other defenses against adversarial inputs.

Early Life and Education

Nicholas Carlini earned a Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley in 2013. He remained at Berkeley for his doctoral studies, completing his PhD in 2018 under the supervision of David Wagner, a prominent figure in computer security research.

Adversarial Machine Learning

Carlini's most widely cited contribution to the field is the Carlini & Wagner (C&W) attack, developed in collaboration with his PhD advisor David Wagner in 2016. The attack is a method for generating adversarial examples — inputs crafted to cause machine learning models to produce incorrect outputs. It proved particularly effective against defensive distillation, a technique in which a student model is trained on the probabilistic outputs of a parent model to improve robustness and generalizability. The C&W attack demonstrated that defensive distillation offered far less protection than previously believed, and its methodology was subsequently shown to defeat most other proposed defenses against adversarial inputs, significantly raising the bar for what constitutes a credible defense evaluation.
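To make the idea of an optimization-based adversarial example concrete, the following is a minimal sketch in the spirit of the C&W L2 formulation, not the published implementation. It minimizes the squared perturbation size plus a weighted margin term on a toy linear classifier. The weights, the constants c and kappa, the learning rate, and all function names are assumptions chosen for illustration. The original attack additionally uses a change of variables to respect input bounds, a more sophisticated optimizer, and a search over c, all omitted here.

```python
# Toy 3-class linear model: logits Z(x) = W x + b. All weights are made up.
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
B = [0.0, 0.0, -0.2]


def logits(x):
    """Return the raw class scores for input x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]


def predict(x):
    """Return the index of the highest-scoring class."""
    z = logits(x)
    return max(range(len(z)), key=z.__getitem__)


def cw_style_attack(x, target, c=5.0, kappa=0.1, lr=0.01, steps=1000):
    """Gradient descent on ||delta||^2 + c * max(max_{i != t} Z_i - Z_t, -kappa).

    Returns the lowest-distortion iterate classified as `target`,
    or None if no iterate succeeded.
    """
    delta = [0.0] * len(x)
    best, best_dist = None, float("inf")
    for _ in range(steps):
        adv = [xi + di for xi, di in zip(x, delta)]
        dist = sum(d * d for d in delta)
        # Keep the smallest perturbation seen so far that reaches the target.
        if predict(adv) == target and dist < best_dist:
            best, best_dist = adv, dist
        z = logits(adv)
        # Strongest competing class other than the target.
        other = max((i for i in range(len(z)) if i != target), key=z.__getitem__)
        grad = [2.0 * d for d in delta]  # gradient of the squared L2 term
        if z[other] - z[target] > -kappa:  # margin term is active
            for j in range(len(x)):
                # For a linear model the logit gradient is just the weight row.
                grad[j] += c * (W[other][j] - W[target][j])
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return best


if __name__ == "__main__":
    x = [1.0, 0.0]
    adv = cw_style_attack(x, target=1)
    print("original class:", predict(x))
    print("adversarial input:", adv, "class:", predict(adv))
```

The margin term stops contributing once the target class leads by at least kappa, so the distance term can then pull the perturbation back toward zero. That trade-off between small distortion and confident misclassification is what the weight c controls in this sketch.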

In 2018, Carlini demonstrated a further adversarial attack targeting Mozilla Foundation's DeepSpeech automatic speech recognition model. By embedding hidden commands within ordinary audio input, he showed that the model would respond to those commands even when they were entirely inaudible to human listeners. That same year, Carlini led a team at UC Berkeley that evaluated eleven papers presenting adversarial defenses accepted at the International Conference on Learning Representations (ICLR). The team broke seven of those defenses and published the findings as "Obfuscated Gradients Give a False Sense of Security," which received the Best Paper Award at ICML 2018.

Privacy of Machine Learning Models

Beginning around 2020, Carlini shifted significant research attention toward the privacy risks inherent in large-scale machine learning models. He demonstrated for the first time that large language models could memorize portions of their training data and reproduce them verbatim. Using GPT-2 as a case study, he showed that the model could output personally identifiable information present in its training corpus. Follow-on research examined how this memorization behavior scaled with model size, finding that larger models exhibited greater memorization.
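One way to picture what "reproduce them verbatim" means is an overlap check between model output and training text. The sketch below is a simplified illustration under stated assumptions, not the procedure used in the studies: it splits text on whitespace and flags any run of n consecutive tokens in a generation that also appears in a training document. The function names, the window size n, and the sample strings are all hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(training_docs, n):
    """Collect every n-gram that occurs anywhere in the training documents."""
    index = set()
    for doc in training_docs:
        index |= ngrams(doc.split(), n)
    return index


def memorized_spans(generation, index, n):
    """List the n-token spans of a generation that occur verbatim in training data."""
    tokens = generation.split()
    return [
        " ".join(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) in index
    ]


if __name__ == "__main__":
    # Made-up data standing in for a training corpus and a model output.
    training = ["contact jane doe at 555 0100 for details"]
    index = build_index(training, n=5)
    print(memorized_spans("you can contact jane doe at 555 0100 today", index, n=5))
    print(memorized_spans("the weather is pleasant in the spring months", index, n=5))
```

A real study would work with a model's tokenizer and a far larger corpus, and would need an efficient index over the training data. The set lookup here only shows the shape of the verbatim-match test.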

In 2022, Carlini extended this line of inquiry to generative image models, showing that Stable Diffusion could reproduce recognizable images of individuals' faces drawn from its training data. He subsequently demonstrated that ChatGPT would, under certain conditions, output exact copies of webpages it had been trained on, including personally identifiable information. Several of these studies have been referenced in legal proceedings concerning the copyright status of AI-generated content and training data.

Other Research

Carlini and collaborators also investigated the question-answering capabilities of AI models relative to humans, constructing a benchmark on which human participants typically scored around 35% while AI models scored in the 40% range. GPT-3 achieved 38% on this benchmark, improvable to 40% through few-shot prompting, while Google's UnifiedQA model was the top performer. Additionally, Carlini has researched methods by which large language models such as ChatGPT can be prompted to respond to harmful queries.

Outside of machine learning security, Carlini received the Best of Show award at the 2020 International Obfuscated C Code Contest (IOCCC) for an entry that implemented a tic-tac-toe game entirely through calls to the C standard library function printf, building on a research paper he had authored in 2015.

Recognition

Carlini has received numerous paper awards across top-tier venues. These include the Best Student Paper Award at IEEE S&P 2017, Best Paper Awards at ICML 2018 and ICML 2024 (for two separate papers), and Distinguished Paper Awards at USENIX Security in both 2021 and 2023. He has been affiliated with Google DeepMind and, subsequently, Anthropic.