
Liwei Jiang | 姜力炜
Ph.D. Candidate

Paul G. Allen School of Computer Science & Engineering, University of Washington

I am a Ph.D. candidate at the Paul G. Allen School of Computer Science & Engineering, University of Washington, advised by Professor Yejin Choi. I am also a graduate research intern at NVIDIA and was previously a student researcher at the Allen Institute for Artificial Intelligence (Ai2). I earned a B.A. in Computer Science and a B.A. in Mathematics from Colby College.

My research centers on humanistic AI safety, aiming to foster the synergistic, secure, and sustainable coexistence of AI and society—ultimately shaping the co-evolution of AI and humanity:

From Human to AI: Developing human-centered, continually-evolving, and future-oriented AI systems—drawing upon interdisciplinary insights into human intelligence, values, and global needs.

From AI to Human: Advancing the frontiers of knowledge about individuals and society, augmenting human capabilities, and addressing consequential sociotechnical challenges—through robust, efficient, and scalable innovations in data, learning algorithms, and system design.

My current research spans pluralistic alignment, self-improving algorithms for steerable and secure language models, anticipatory strategies for long-term risks—including overreliance and the erosion of human creativity—and socially beneficial applications.

Artificial Intelligence · AI Safety · Natural Language Processing · Machine Learning · Pluralistic Alignment · Human-AI Interaction

News

(Upcoming) Oct 2025: Co-organizing another edition of the SoLaR workshop on Socially Responsible Language Modelling Research at COLM 2025.
(Upcoming) July 2025: Our tutorial on Guardrails and Security for LLMs: Safe, Secure, and Controllable Steering of LLM Applications has been accepted at ACL 2025. See you in Vienna!
Oct/Nov 2024: Gave two guest lectures on How to Build AI with Deep Concerns for Human Traits, Values, and Needs? at UIUC and the University of Pittsburgh.
Oct/Nov 2024: Gave two guest lectures on LLM Reasoning (In-Context Learning, Prompting, and Reasoning) at KAIST and UW.
Jan 2024: Led the TA effort and co-designed a class module (an LLM version!) for CSE 447/517, the undergraduate and graduate NLP course at UW CSE.

Publications (*, + indicate equal contribution) Google Scholar

Preprints
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Maarten Sap, Yulia Tsvetkov, Nouha Dziri, Alon Albalak, Yejin Choi
In submission (PDF coming soon)
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Liwei Jiang*, Mickel Liu*, Yancheng Liang, Simon S. Du, Yejin Choi, Tim Althoff+, Natasha Jaques+
In submission (PDF coming soon)
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Liwei Jiang*, Salman Rahman*, James Shiffer*, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel
In submission
AI Debate Aids Assessment of Controversial Claims
Salman Rahman, Sheriff Issaka, Ashima Suvarna, Genglin Liu, James Shiffer, Jaeyoung Lee, Md Rizwan Parvez, Hamid Palangi, Shi Feng, Nanyun Peng, Yejin Choi, Julian Michael, Liwei Jiang, Saadia Gabriel
In submission
HAICosystem: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Xuhui Zhou, Hyunwoo Kim*, Faeze Brahman*, Liwei Jiang, Hao Zhu, Ximing Lu, Frank Xu, Bill Yuchen Lin, Yejin Choi, Niloofar Mireshghallah, Ronan Le Bras, Maarten Sap
In submission
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar*, Devansh Jain*, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Tom Hartvigsen, Maarten Sap
In submission
WildHallucinations: Evaluating Long-Form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi
Preprint
2025
Investigating Machine Moral Judgment through the Delphi Experiment 📘
Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Sydney Levine, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Taylor Sorensen, Jon Borchardt, Jack Hessel, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi
Nature Machine Intelligence, 2025
Can Language Models Reason about Individualistic Human Values and Preferences?
Liwei Jiang, Taylor Sorensen, Sydney Levine, Yejin Choi
ACL 2025
CulturalBench: Building a Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming
Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, Yejin Choi
ACL 2025
Position Paper: Political Neutrality in AI is Impossible—But Here is How to Approximate it
Jillian Fisher, Ruth Elisabeth Appel, Chan Young Park, Yujin Potter, Liwei Jiang, Taylor Sorensen, Shangbin Feng, Yulia Tsvetkov, Margaret Roberts, Jennifer Pan, Dawn Song, Yejin Choi
ICML 2025
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
ICML 2025
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi
(Spotlight) ICLR 2025
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi
(Oral) ICLR 2025
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, Kavel Rao, Will Smith, Shayne Longpre, Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri
IAAI 2025
2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Liwei Jiang, Kavel Rao*, Seungju Han*, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri
NeurIPS 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han*, Kavel Rao*, Liwei Jiang+, Allyson Ettinger+, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri
NeurIPS Datasets & Benchmarks 2024
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
AAAI 2024
Culture-Gen: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Huihan Li, Liwei Jiang, Jena D. Hwang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi
COLM 2024
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
ICLR 2024
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, Maarten Sap
AIES 2024
Information-Theoretic Distillation for Reference-less Summarization
Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi
COLM 2024
Position Paper: A Roadmap to Pluralistic Alignment
Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi
ICML 2024
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi
NAACL 2024
JAMDEC: Unsupervised Authorship Obfuscation Using Constrained Decoding Over Small Language Models
Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi
NAACL 2024
MigraineTracker: Examining Patient Experiences with Goal-Directed Self-Tracking for a Chronic Health Condition 🏆
Yasaman S. Sefidgar, Carla L. Castillo, Shaan Chopra, Liwei Jiang, Tae Jones, Anant Mittal, Hyeyoung Ryu, Jessica Schroeder, Allison Cole, Natalia Murinova, Sean A. Munson, James Fogarty
(Outstanding Paper Award) CHI 2024
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Peter West*, Ximing Lu*, Nouha Dziri*, Faeze Brahman*, Linjie Li*, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi
ICLR 2024
2023
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
Liwei Jiang*, Kavel Rao*, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
(Findings) EMNLP 2023
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi
(Findings) EMNLP 2023
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization 🏆
Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi
(Outstanding Paper Award) EMNLP 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu
EMNLP 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
EMNLP 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap
(Short Paper) EMNLP 2023
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri*, Ximing Lu*, Melanie Sclar*, Liwei Jiang+, Xiang Lorraine Li+, Bill Yuchen Lin+, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
(Spotlight) NeurIPS 2023
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula
ACL 2023
2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi
NAACL 2022
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West, Chandra Bhagavatula*, Jack Hessel*, Jena D. Hwang*, Liwei Jiang*, Ronan Le Bras*, Ximing Lu*, Sean Welleck*, Yejin Choi
NAACL 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics 🏆
Ximing Lu, Sean Welleck*, Peter West*, Liwei Jiang+, Jungo Kasai+, Daniel Khashabi+, Ronan Le Bras+, Lianhui Qin+, Youngjae Yu+, Rowan Zellers+, Noah A. Smith, Yejin Choi
(Best Paper Award) NAACL 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Hyunwoo Kim*, Youngjae Yu*, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap
EMNLP 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu, Sean Welleck*, Jack Hessel*, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi
NeurIPS 2022
2021
“I’m Not Mad”: Commonsense Implications of Negation and Contradiction
Liwei Jiang, Antoine Bosselut, Chandra Bhagavatula, Yejin Choi
NAACL-HLT 2021
EnglishBot: An AI-Powered Conversational System for Second Language Learning
Liwei Jiang*, Sherry Ruan*, Qianyao Xu*, Zhiyuan Liu, Glenn M. Davis, Emma Brunskill, James A. Landay
ACM IUI 2021
2019
QuizBot: A Dialogue-Based Adaptive Learning System for Factual Knowledge
Sherry Ruan, Liwei Jiang, Justin Xu, Bryce Joe-Kun Tham, Zhengneng Qiu, Yeshuang Zhu, Elizabeth L. Murnane, Emma Brunskill, James A. Landay
CHI 2019

Education & Experience

Education

University of Washington

Ph.D. in Computer Science and Engineering
Sept 2019 - Dec 2025 (expected), Seattle, Washington

Colby College

B.A. in Computer Science and Mathematics (Summa Cum Laude, Top 0.5%)
Sept 2015 - Dec 2018, Waterville, Maine

Professional Experience

NVIDIA

Research Intern on the NeMo Guardrails Team
March 2025 - Present, Santa Clara, California

Allen Institute for Artificial Intelligence (Ai2)

Graduate Research Intern on the Mosaic and AllenNLP Teams
June 2020 - December 2024, Seattle, Washington

The Future Laboratory, Tsinghua University

Visiting Student Researcher
July 2019 - September 2019, Beijing, China

Stanford University

Undergraduate Research Intern in the Computer Science Department
June 2017 - September 2019, Stanford, California

Honors & Awards

Outstanding Paper Award

CHI 2024 (May 2024)

Outstanding Paper Award

EMNLP 2023 (Dec 2023)

Best Paper Award

NAACL 2022 (Jul 2022)

Anne Dinning - Michael Wolf Endowed Regental Fellowship

University of Washington (2019)
Paul G. Allen School First-Year Ph.D. Fellowship

Member of the Phi Beta Kappa Society

Colby College (2018)
Elected as a member of Phi Beta Kappa with junior standing

Honorable Mention in the Interdisciplinary Contest in Modeling (ICM)

COMAP (2018)
20th annual Interdisciplinary Contest in Modeling

Phi Beta Kappa Undergraduate Scholastic Achievement Award

Colby College (2017)
Awarded to the top two students in the sophomore and junior classes

Julius Seelye Bixler Scholar

Colby College (2016, 2017, 2018)
Awarded to top-ranking students as determined by cumulative academic record; three-time recipient

Phi Beta Kappa Summer Research Scholar

Colby College (2016)
Summer research stipend