I am a third-year undergraduate at the School of Mathematical Sciences, Peking University, and currently a visiting student at UC Berkeley. I am interested in improving the trustworthiness of machine learning, with a focus on adversarial robustness and explainability.

🔥 News

  • 2023.11:  🎙 I gave a lightning talk on our LLM safety paper at Constellation, Berkeley.
  • 2023.10:  🔗 I am serving as a fellow of the Berkeley AI Safety Initiative for Students (BASIS).
  • 2023.10:  ✨ A new preprint on LLM safety is available on arXiv.
  • 2023.09:  🎖 I received the Award for Academic Innovation for the 2022-2023 academic year (the only undergraduate awardee in the School of Mathematical Sciences, Peking University; top 0.1%).
  • 2023.08:  🎉 One paper (first author) accepted by the Journal of Logical and Algebraic Methods in Programming.
  • 2023.08:  🏫 I started a visiting student program at UC Berkeley in Fall 2023.
  • 2023.07:  🏖 I attended ICML 2023 in Honolulu and presented our workshop poster.
  • 2023.07:  🔍 I reviewed 11 papers for NeurIPS 2023 (9 regular + 2 ethics).
  • 2023.06:  🎉 One paper (first and corresponding author) accepted by the ICML 2023 AdvML-Frontiers Workshop.
  • 2023.06:  🍁 I attended CVPR 2023 in Vancouver and presented our poster.
  • 2023.05:  🥈 Won Second Prize in the Chinese Mathematics Competitions for College Students (National Final).
  • 2023.05:  🎙 I gave a talk on our CVPR paper at the Safe & Responsible AI Workshop (an ICLR 2023 social event) at Tsinghua University.
  • 2023.02:  🎉 One paper (first author) accepted by CVPR 2023.
  • 2022.12:  🥇 Won First Prize in the Chinese Mathematics Competitions for College Students (Beijing Division) and qualified for the National Final.

📝 Selected Papers

(*: Equal Contribution; ${}^\dagger$: Corresponding Author)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations (Preprint)

Zeming Wei, Yifei Wang, Yisen Wang${}^\dagger$

  • Uncover the vulnerability and robustness of aligned language models given only a few in-context demonstrations, without fine-tuning
  • Propose In-Context Attack (ICA) and In-Context Defense (ICD) to jailbreak and guard aligned language models
  • Shed light on the potential of in-context learning (ICL) to manipulate the alignment of LLMs
  • [pdf] [arxiv] [code] [website]

CFA: Class-wise Calibrated Fair Adversarial Training (CVPR 2023)

Zeming Wei, Yifei Wang, Yiwen Guo, Yisen Wang${}^\dagger$

  • Theoretically and empirically investigate the preference of different classes for adversarial configurations in Adversarial Training (AT)
  • Propose the CFA framework, which automatically customizes training configurations for each class
  • CFA improves both overall robustness and fairness, and can be easily incorporated into other AT variants
  • [pdf] [arxiv] [code]

Sharpness-Aware Minimization Alone can Improve Adversarial Robustness (ICML 2023 Workshop)

Zeming Wei*${}^{\boldsymbol\dagger}$, Jingyu Zhu*, Yihao Zhang*

  • Theoretically show that using Sharpness-Aware Minimization (SAM) can improve adversarial robustness
  • Empirically show that SAM can improve robustness at a low computational cost and without decreasing natural accuracy
  • Suggest that SAM can serve as a lightweight substitute for AT under certain conditions
  • [pdf] [arxiv] [code]

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks (Journal of Logical and Algebraic Methods in Programming)

Zeming Wei, Xiyue Zhang, Yihao Zhang, Meng Sun${}^\dagger$

  • Extend weighted finite automata (WFA) extraction from RNNs from formal to natural language tasks by identifying the transition sparsity and context dependency problems
  • Propose an RNN interpretation framework with a transition-based word embedding of the extracted automata
  • Further propose two applications of the embedding to RNNs: pretraining and adversarial attack
  • [pdf] [arxiv] [code]

💡 Patents

An image classification method based on fair and robust neural networks (patent pending)

Yisen Wang and Zeming Wei

🎖 Honors and Awards

  • Award for Academic Innovation (Top 0.1%), Peking University, 2023
  • Merit Student (Top 10%), Peking University, 2023
  • University Scholarship, Peking University, 2023
  • Second prize, Chinese Mathematics Competitions for Undergraduates (National Final), 2023
  • First prize, Chinese Mathematics Competitions for Undergraduates (Beijing Division), 2022
  • Merit Student (Top 10%), Peking University, 2022
  • University Scholarship, Peking University, 2022
  • Award for Contribution in Student Organizations, Peking University, 2021
  • University Scholarship, Peking University, 2021

📖 Education

  • 2023.08 - 2023.12 (expected), Visiting Student, University of California, Berkeley
  • 2021.06 - 2025.06 (expected), Undergraduate Student, School of Mathematical Sciences, Peking University
  • 2020.09 - 2021.06, Undergraduate Student, College of Engineering, Peking University
  • 2017.09 - 2020.06, Senior High School Student, Beijing No.4 High School

💼 Academic Service

  • Journal Reviewer: TMLR
  • Conference Reviewer: NeurIPS 2023, ICLR 2024, AISTATS 2024
  • Workshop Reviewer: XAIA (@NeurIPS 2023)
  • Fellow, Berkeley AI Safety Initiative for Students (BASIS), UC Berkeley

🔗 Links

(Alphabetical Order)