I’m Zeming Wei (魏泽明), a third-year undergraduate at the School of Mathematical Sciences, Peking University. I was also a visiting student at UC Berkeley in Fall 2023. I am interested in improving the trustworthiness of machine learning, specifically focusing on mechanism interpretability, adversarial robustness, and generation safety.

If you are interested in working with me, please send me an email.

🔥 News

  • 2023.12:  💯 I achieved a full GPA (4.0/4.0) during my study at UC Berkeley (with 1 A and 2 A+ grades).
  • 2023.11:  🎙 I gave a lightning talk on our LLM safety paper at Constellation, Berkeley.
  • 2023.10:  🔗 I serve as a fellow of the Berkeley AI Safety Initiative for Students (BASIS).
  • 2023.10:  ✨ New preprint on LLM safety is available on arXiv.
  • 2023.09:  🎖 I received the Exceptional Award for Academic Innovation for the 2022-2023 academic year (the sole awardee among undergraduates in the School of Mathematical Sciences, Peking University; top 0.1%).
  • 2023.08:  🎉 1 Paper (as first author) accepted by the Journal of Logical and Algebraic Methods in Programming.
  • 2023.08:  🏫 I started a visiting student program at UC Berkeley in Fall 2023.
  • 2023.07:  🏖 I attended ICML 2023 in Honolulu and presented our workshop poster.
  • 2023.07:  🔍 I reviewed 11 papers for NeurIPS 2023 (9 regular + 2 ethics).
  • 2023.06:  🎉 1 Paper (as first author & corresponding author) accepted by the ICML 2023 AdvML-Frontiers Workshop.
  • 2023.06:  🍁 I attended CVPR 2023 in Vancouver and presented our poster.
  • 2023.05:  🥈 Won second prize in the Chinese Mathematics Competitions for College Students (National Final).
  • 2023.05:  🎙 I gave a talk on our CVPR paper at the Safe & Responsible AI workshop (ICLR 2023 social event) at Tsinghua University.
  • 2023.02:  🎉 1 Paper (as first author) accepted by CVPR 2023.
  • 2022.12:  🥇 Won first prize in the Chinese Mathematics Competitions for College Students (Beijing Division) and qualified for the national final.

📝 Selected First-Author Papers

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations (Preprint)

Explores manipulating the safety of LLMs with only a few in-context demonstrations, introduces In-Context Attack (ICA) and In-Context Defense (ICD) for jailbreaking and safeguarding these models, and highlights the potential of in-context learning (ICL) to influence the alignment of LLMs.

CFA: Class-wise Calibrated Fair Adversarial Training (CVPR 2023)

  • Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang
  • [pdf] [arxiv] [code]

Delves into the theoretical and empirical analysis of class preferences in adversarial configurations within Adversarial Training (AT), introduces a CFA Framework that automatically tailors training configurations for each class, and demonstrates that CFA achieves state-of-the-art robust fairness while seamlessly integrating with other AT variants.

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks (Journal of Logical and Algebraic Methods in Programming)

  • Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun
  • [pdf] [arxiv] [code]

Extends WFA extraction of RNNs from formal to natural language tasks by identifying the transition sparsity and context dependency problems, proposes an RNN interpretation framework with a transition-based word embedding of the extracted automata, and further proposes two applications (pretraining and adversarial attack) on RNNs with the embedding.

📝 Selected Corresponding-Author Papers

(*: Equal Contribution; ${}^\dagger$: Corresponding Author)

Sharpness-Aware Minimization Alone can Improve Adversarial Robustness (ICML 2023 AdvML-Frontiers Workshop)

  • Zeming Wei*${}^{\boldsymbol\dagger}$, Jingyu Zhu*, Yihao Zhang*
  • [pdf] [arxiv] [code]

Uncovers an intriguing benefit of Sharpness-Aware Minimization (SAM), namely that it can notably improve adversarial robustness without sacrificing natural accuracy, provides empirical and theoretical insights into this property, and proposes that SAM can be regarded as a lightweight substitute for AT under certain requirements.

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

  • Yihao Zhang*, Hangzhou He*, Jingyu Zhu*, Huanran Chen, Yifei Wang, Zeming Wei${}^{\boldsymbol\dagger}$
  • [pdf] [arxiv] [code]


💡 Patents

An image classification method based on fair and robust neural networks (patent pending)

Yisen Wang and Zeming Wei

🎖 Honors and Awards

  • Exceptional Award for Academic Innovation (Top 0.1%), Peking University, 2023
  • Merit Student (Top 10%), Peking University, 2023
  • University Scholarship, Peking University, 2023
  • Second prize, Chinese Mathematics Competitions for College Students (National Final), 2023
  • First prize, Chinese Mathematics Competitions for College Students (Beijing Division), 2022
  • Merit Student (Top 10%), Peking University, 2022
  • University Scholarship, Peking University, 2022
  • Award for Contribution in Student Organizations, Peking University, 2021
  • University Scholarship, Peking University, 2021

📖 Education

  • 2023.08 - 2023.12, Visiting Student, University of California, Berkeley
  • 2021.06 - 2025.06 (expected), Undergraduate Student, School of Mathematical Sciences, Peking University
  • 2020.09 - 2021.06, Undergraduate Student, College of Engineering, Peking University
  • 2017.09 - 2020.06, Senior High School Student, Beijing No.4 High School

💼 Academic Service

  • Journal Reviewer: TMLR
  • Conference Reviewer: NeurIPS 2023, ICLR 2024, AISTATS 2024, ICML 2024, ECCV 2024
  • Workshop Reviewer: XAIA (@NeurIPS 2023)
  • Fellow, Berkeley AI Safety Initiative for Students (BASIS), UC Berkeley

🔗 Links

(Alphabetical Order)