Zeming Wei (魏泽明)

I’m Zeming Wei (魏泽明), an incoming Ph.D. student at the School of Mathematical Sciences, Peking University, proudly supervised by Professor Meng Sun. I obtained my Bachelor's degree from Peking University in 2025 and visited UC Berkeley in Fall 2023. I have also been fortunate to be advised by Prof. Yisen Wang, Prof. Jun Sun, Prof. Yang Liu, and Prof. David Wagner.

My research focuses on the trustworthiness of AI, specifically on mechanism interpretability, adversarial robustness, and generative model safety. If you are interested in collaborating (or just chatting) with me, feel free to email me.

🔥 News

2025.06: 🎊 I got my Bachelor of Science degree from Peking University and will start my Ph.D. studies in September 2025. Thank you to all my advisors and collaborators!
2025.06: 🏅 I received the Outstanding Bachelor Dissertation Award from Peking University (Top 3%, pdf).
2025.06: 🌟 3 new preprints exploring fresh paradigms in LLM safe alignment are available online, including safety modeling, retrieval, and fine-tuning.
2025.05: 🎉 1 Paper (as first author) accepted by ICML 2025.
2025.04: 🌟 Our new preprint discussing risks in LLM-based agents is available online.
2025.03: 🎖 I was selected for the Somersault Cloud Talent Program (筋斗云人才计划) by ByteDance (internship track).
2024.12: 🎉 2 Papers (as corresponding author; 1 as Oral) accepted by ICASSP 2025.
2024.12: 🏅 I received the Academic Rising Star Award (Top 5 undergraduates university-wide, blog) from the School of Computer Science, Peking University.
2024.11: 🏅 I received the May 4th Scholarship (五四奖学金, blog), the highest-honor scholarship at Peking University (only 1 undergraduate awardee in the School of Mathematical Sciences, Top 0.1%).
2024.10: ✨ My research grant (as Principal Investigator) was approved by the Beijing Natural Science Foundation.
2024.09: 🎉 3 Papers accepted by NeurIPS 2024.
2024.07: 🎡 I attended ICML 2024 in Vienna and presented our poster.
2024.05: 🎉 1 Paper (as corresponding author) accepted by ICML 2024.
2023.12: 💯 I achieved a full GPA (4.0/4.0) during my study at UC Berkeley (with 1 A and 2 A+ grades).
2023.10: 🔗 I serve as a fellow of the Berkeley AI Safety Initiative for Students (BASIS).
2023.09: 🏅 I received the Exceptional Award for Academic Innovation of Peking University (only 1 undergraduate awardee in the School of Mathematical Sciences, Top 0.1%).
2023.08: 🎉 1 Paper (as first author) accepted by Journal of Logical and Algebraic Methods in Programming.
2023.06: 🍁 I attended CVPR 2023 in Vancouver and presented our poster.
2023.05: 🥈 Won Second prize in the Chinese Mathematics Competitions for Undergraduates (National Final, Top 0.2%).
2023.02: 🎉 1 Paper (as first author) accepted by CVPR 2023.
2022.12: 🥇 Won First prize in the Chinese Mathematics Competitions for Undergraduates (Beijing Division) and qualified for the finals.

📝 Selected Preprints

(${}^{\boldsymbol\dagger}$: Corresponding Author; *: Equal Contribution)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei, Yifei Wang, Ang Li, Yichuan Mo, Yisen Wang
Preprint (Cited 300+ times)
[pdf]
Position: Agent-Specific Trustworthiness Risk as a Research Priority
Zeming Wei, Tianlin Li, Xiaojun Jia, Yang Liu, Meng Sun
Preprint
[pdf]
ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs
Zeming Wei, Chengcan Wu, Meng Sun
Preprint
[pdf]
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen*, Zeming Wei*, Ang Li, Yisen Wang
Preprint
[pdf]
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization
Chengcan Wu*, Zhixin Zhang*, Zeming Wei*, Yihao Zhang, Meng Sun
Preprint
[pdf]

📝 Selected Publications

(${}^{\boldsymbol\dagger}$: Corresponding Author; *: Equal Contribution)

Identifying and Understanding Cross-Class Features in Adversarial Training
Zeming Wei, Yiwen Guo, Yisen Wang
ICML 2025
[pdf] [arxiv] [code]
Boosting Jailbreak Attack with Momentum
Yihao Zhang*, Zeming Wei*${}^{\boldsymbol\dagger}$
ICASSP 2025 (Oral)
[pdf] [arxiv] [code]
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang*, Hangzhou He*, Jingyu Zhu*, Huanran Chen, Yifei Wang, Zeming Wei${}^{\boldsymbol\dagger}$
ICML 2024
[pdf] [arxiv] [code]
Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks
Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun
Journal of Logical and Algebraic Methods in Programming
[pdf] [arxiv] [code]
CFA: Class-wise Calibrated Fair Adversarial Training
Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang
CVPR 2023
[pdf] [arxiv] [code]

💻 Grants

Adversarial Safety Testing and Defense of AI Foundation Models
Principal Investigator, Beijing Natural Science Foundation (Grant No. QY24035)
2024.10 - 2026.09

🎖️ Talent Programs

Somersault Cloud Talent Program (筋斗云人才计划), internship track, ByteDance
Elite Ph.D. Program in Mathematics, Peking University
Elite Ph.D. Program in Applied Mathematics, Center for Machine Learning Research, Peking University

🏅 Honors and Awards

Outstanding Bachelor Dissertation Award (Top 3%), Peking University, 2025
Excellent Graduate of Beijing Municipal (Top 5%), Beijing Municipal Education Commission, 2025
Excellent Graduate of Peking University, Peking University, 2025
May 4th Scholarship (五四奖学金, Top 0.1%) [blog], the highest honor scholarship of Peking University, 2024
Academic Rising Star Award (Top 5 undergraduates university-wide) [blog], School of Computer Science, Peking University, 2024
Merit Student (Top 10%), Peking University, 2024
Spotlight Award (Best Paper), 1st ICML Workshop on In-Context Learning, 2024
ICML Travel Award, 2024
Exceptional Award for Academic Innovation (Top 0.1%), Peking University, 2023
Merit Student (Top 10%), Peking University, 2023
Second prize, Chinese Mathematics Competitions for Undergraduates (National Final, Top 0.2%), 2023
First prize, Chinese Mathematics Competitions for Undergraduates (Beijing Division), 2022
Merit Student (Top 10%), Peking University, 2022
Award for Contribution in Student Organizations, Peking University, 2021

📖 Education

2025.09 (expected) -, Ph.D. Student, School of Mathematical Sciences, Peking University
2023.08 - 2023.12, Visiting Student, University of California, Berkeley
2021.06 - 2025.06, Undergraduate Student, School of Mathematical Sciences, Peking University

💼 Academic Service

Reviewer (Conference): NeurIPS, ICLR, ICML, AISTATS, ECCV, CVPR, AAAI
Reviewer (Journal): TIFS, TMLR
Reviewer (Workshop): XAIA (@NeurIPS 2023), ICL (@ICML 2024), IAI (@NeurIPS 2024)
Fellow, Berkeley AI Safety Initiative for Students (BASIS), UC Berkeley

🔗 Links

(Alphabetical Order)

Ph.D. Supervisor: Meng Sun
Advisors & Senior Co-authors: Stefanie Jegelka (MIT), Yang Liu (NTU), Jun Sun, David Wagner (UCB), Yisen Wang
Co-authors: Huanran Chen, Hangzhou He, Julien Piet (UCB), Chawin Sitawarin (Google), Yifei Wang (MIT), Xiyue Zhang