
Liwei Jiang | 姜力炜
Ph.D. Candidate

Paul G. Allen School of Computer Science & Engineering, University of Washington

I am a Ph.D. candidate at the Paul G. Allen School of Computer Science & Engineering, University of Washington, advised by Professor Yejin Choi. I am also a graduate research intern at NVIDIA and was previously a student researcher at the Allen Institute for Artificial Intelligence (Ai2). I earned a B.A. in Computer Science and a B.A. in Mathematics from Colby College.

My research centers on humanistic AI safety, aiming to foster the synergistic, secure, and sustainable coexistence of AI and society—ultimately shaping the co-evolution of AI and humanity:

From Human to AI: Developing human-centered, continually-evolving, and future-oriented AI systems—drawing upon interdisciplinary insights into human intelligence, values, and global needs.

From AI to Human: Advancing the frontiers of knowledge about individuals and society, augmenting human capabilities, and addressing consequential sociotechnical challenges—through robust, efficient, and scalable innovations in data, learning algorithms, and system design.

My current research spans pluralistic alignment, self-improving algorithms for steerable and secure language models, anticipatory strategies for long-term risks—including overreliance and the erosion of human creativity—and socially beneficial applications.

Artificial Intelligence · AI Safety · Natural Language Processing · Machine Learning · Pluralistic Alignment · Human-AI Interaction

News

(Upcoming) Oct 2025: Co-organizing another edition of the SoLaR workshop on Socially Responsible Language Modelling Research at COLM 2025.
(Upcoming) July 2025: Our tutorial on Guardrails and Security for LLMs: Safe, Secure, and Controllable Steering of LLM Applications has been accepted at ACL 2025. See you in Vienna!
Oct/Nov 2024: Gave two guest lectures on How to Build AI with Deep Concerns for Human Traits, Values, and Needs? at UIUC and the University of Pittsburgh.
Oct/Nov 2024: Gave two guest lectures on LLM Reasoning (In-Context Learning, Prompting, and Reasoning) at KAIST and UW.
Jan 2024: Led the TA effort and co-designed a class module (an LLM version!) for CSE 447/517, the undergraduate and graduate NLP course at UW CSE.

Publications (*, + indicate equal contribution) Google Scholar

Preprints
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Maarten Sap, Yulia Tsvetkov, Nouha Dziri, Alon Albalak, Yejin Choi
In submission (PDF coming soon)
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Liwei Jiang*, Mickel Liu*, Yancheng Liang, Simon S. Du, Yejin Choi, Tim Althoff+, Natasha Jaques+
In submission (PDF coming soon)
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
Liwei Jiang*, Salman Rahman*, James Shiffer*, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel
In submission
AI Debate Aids Assessment of Controversial Claims
Salman Rahman, Sheriff Issaka, Ashima Suvarna, Genglin Liu, James Shiffer, Jaeyoung Lee, Md Rizwan Parvez, Hamid Palangi, Shi Feng, Nanyun Peng, Yejin Choi, Julian Michael, Liwei Jiang, Saadia Gabriel
In submission
HAICosystem: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Xuhui Zhou, Hyunwoo Kim*, Faeze Brahman*, Liwei Jiang, Hao Zhu, Ximing Lu, Frank Xu, Bill Yuchen Lin, Yejin Choi, Niloofar Mireshghallah, Ronan Le Bras, Maarten Sap
In submission
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar*, Devansh Jain*, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Tom Hartvigsen, Maarten Sap
In submission
WildHallucinations: Evaluating Long-Form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi
Preprint
2025
Investigating Machine Moral Judgment through the Delphi Experiment 📘
Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Sydney Levine, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Taylor Sorensen, Jon Borchardt, Jack Hessel, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi
Nature Machine Intelligence, 2025
Can Language Models Reason about Individualistic Human Values and Preferences?
Liwei Jiang, Taylor Sorensen, Sydney Levine, Yejin Choi
ACL 2025
CulturalBench: Building a Robust, Diverse, and Challenging Cultural Benchmark by Human-AI CulturalTeaming
Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, Yejin Choi
ACL 2025
Position Paper: Political Neutrality in AI is Impossible—But Here is How to Approximate it
Jillian Fisher, Ruth Elisabeth Appel, Chan Young Park, Yujin Potter, Liwei Jiang, Taylor Sorensen, Shangbin Feng, Yulia Tsvetkov, Margaret Roberts, Jennifer Pan, Dawn Song, Yejin Choi
ICML 2025
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
ICML 2025
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi
(Spotlight) ICLR 2025
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi
(Oral) ICLR 2025
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, Kavel Rao, Will Smith, Shayne Longpre, Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri
IAAI 2025
2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Liwei Jiang, Kavel Rao*, Seungju Han*, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri
NeurIPS 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han*, Kavel Rao*, Liwei Jiang+, Allyson Ettinger+, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri
NeurIPS Datasets & Benchmarks 2024
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
AAAI 2024
Culture-Gen: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Huihan Li, Liwei Jiang, Jena D. Hwang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi
COLM 2024
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
ICLR 2024
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, Maarten Sap
AIES 2024
Information-Theoretic Distillation for Reference-less Summarization
Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi
COLM 2024
Position Paper: A Roadmap to Pluralistic Alignment
Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi
ICML 2024
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi
NAACL 2024
JAMDEC: Unsupervised Authorship Obfuscation Using Constrained Decoding Over Small Language Models
Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi
NAACL 2024
MigraineTracker: Examining Patient Experiences with Goal-Directed Self-Tracking for a Chronic Health Condition 🏆
Yasaman S. Sefidgar, Carla L. Castillo, Shaan Chopra, Liwei Jiang, Tae Jones, Anant Mittal, Hyeyoung Ryu, Jessica Schroeder, Allison Cole, Natalia Murinova, Sean A. Munson, James Fogarty
(Outstanding Paper Award) CHI 2024
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Peter West*, Ximing Lu*, Nouha Dziri*, Faeze Brahman*, Linjie Li*, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi
ICLR 2024
2023
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
Liwei Jiang*, Kavel Rao*, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
(Findings) EMNLP 2023
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi
(Findings) EMNLP 2023
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization 🏆
Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi
(Outstanding Paper Award) EMNLP 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu
EMNLP 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu, Faeze Brahman, Peter West, Jaehun Jung, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
EMNLP 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap
(Short Paper) EMNLP 2023
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri*, Ximing Lu*, Melanie Sclar*, Liwei Jiang+, Xiang Lorraine Li+, Bill Yuchen Lin+, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
(Spotlight) NeurIPS 2023
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula
ACL 2023
2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi
NAACL 2022
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West, Chandra Bhagavatula*, Jack Hessel*, Jena D. Hwang*, Liwei Jiang*, Ronan Le Bras*, Ximing Lu*, Sean Welleck*, Yejin Choi
NAACL 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics 🏆
Ximing Lu, Sean Welleck*, Peter West*, Liwei Jiang+, Jungo Kasai+, Daniel Khashabi+, Ronan Le Bras+, Lianhui Qin+, Youngjae Yu+, Rowan Zellers+, Noah A. Smith, Yejin Choi
(Best Paper Award) NAACL 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Hyunwoo Kim*, Youngjae Yu*, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap
EMNLP 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu, Sean Welleck*, Jack Hessel*, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi
NeurIPS 2022
2021
“I’m Not Mad”: Commonsense Implications of Negation and Contradiction
Liwei Jiang, Antoine Bosselut, Chandra Bhagavatula, Yejin Choi
NAACL-HLT 2021
EnglishBot: An AI-Powered Conversational System for Second Language Learning
Liwei Jiang*, Sherry Ruan*, Qianyao Xu*, Zhiyuan Liu, Glenn M. Davis, Emma Brunskill, James A. Landay
ACM IUI 2021
2019
QuizBot: A Dialogue-Based Adaptive Learning System for Factual Knowledge
Sherry Ruan, Liwei Jiang, Justin Xu, Bryce Joe-Kun Tham, Zhengneng Qiu, Yeshuang Zhu, Elizabeth L. Murnane, Emma Brunskill, James A. Landay
CHI 2019

Education & Experience

Education

University of Washington

Ph.D. in Computer Science and Engineering
Sept 2019 - Dec 2025 (expected), Seattle, Washington

Colby College

B.A. in Computer Science and Mathematics (Summa Cum Laude, Top 0.5%)
Sept 2015 - Dec 2018, Waterville, Maine

Professional Experience

NVIDIA

Research Intern on the NeMo Guardrails Team
March 2025 - Present, Santa Clara, California

Allen Institute for Artificial Intelligence (Ai2)

Graduate Research Intern on the Mosaic and AllenNLP Teams
June 2020 - December 2024, Seattle, Washington

The Future Laboratory, Tsinghua University

Visiting Student Researcher
July 2019 - September 2019, Beijing, China

Stanford University

Undergraduate Research Intern in the Computer Science Department
June 2017 - September 2019, Stanford, California

Honors & Awards

Outstanding Paper Award

CHI 2024 (May 2024)

Outstanding Paper Award

EMNLP 2023 (Dec 2023)

Best Paper Award

NAACL 2022 (Jul 2022)

Anne Dinning - Michael Wolf Endowed Regental Fellowship

University of Washington (2019)
Paul G. Allen School First-Year Ph.D. Fellowship

Member of the Phi Beta Kappa Society

Colby College (2018)
Elected as a member of Phi Beta Kappa with junior standing

Honorable Mention in the Interdisciplinary Contest in Modeling (ICM)

COMAP (2018)
20th annual Interdisciplinary Contest in Modeling

Phi Beta Kappa Undergraduate Scholastic Achievement Award

Colby College (2017)
Awarded to the top two students in the sophomore and junior classes

Julius Seelye Bixler Scholar

Colby College (2016, 2017, 2018)
Awarded to top-ranking students as determined by cumulative academic record; three-time recipient

Phi Beta Kappa Summer Research Scholar

Colby College (2016)
Summer research stipend