Taken in Seattle (credit to Tiffany 😊)

Email: zliu2803[at]usc[dot]edu

Ziyi Liu

Hi there 👋, I am a first-year PhD student at the University of Southern California, advised by Prof. Jieyu Zhao. Previously, I obtained my master's degree at USC, and I was also a Research Assistant at USC ISI - Ink Lab for two years, advised by Professor Xiang Ren. My current research focuses on explainable AI and human-in-the-loop methods for NLP models. More specifically, we look into how machine explanations benefit models and users, and how human intervention improves model performance. I am also very interested in grounding natural language, and took a class on it. I think this research field is full of challenges but is a very promising direction for NLP.

My research interests mainly include:

  • how machine learning can benefit humans and augment human intelligence, and how human-in-the-loop techniques can better intervene in a model
  • how we should explain a model, and how we can align the model's reasoning process with human thinking
  • social bias mitigation in language models, and how to give models social intelligence


Nov 15th: Check out our new preprint, SCORE: A framework for Self-Contradictory Reasoning Evaluation

May 1st: Our paper 'Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales' got ACCEPTED at ACL 2023! 🥳


My previous and current research work

Explanation regularization for NLP models

The reasoning processes of neural language models (NLMs) are notoriously hard to explain. There has been much progress in automatically generating machine rationales of NLM behavior, but less in using those rationales to improve NLM behavior. For the latter, explanation regularization (ER) aims to improve NLM generalization by pushing the machine rationales to align with human rationales. Whereas prior works primarily evaluate such ER models via in-distribution (ID) generalization, ER's impact on out-of-distribution (OOD) generalization is largely underexplored. In our research, we explored ER along four dimensions:

  • Which regularization criteria perform best for ER?
  • How do distantly supervised rationales affect ER performance?
  • How is ER affected by the number/choice of train instances with human rationales?
  • How is ER affected by the time taken to annotate the rationales?
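
The general ER idea described above can be sketched as a task loss plus a penalty on the gap between machine and human rationales. This is only a minimal illustration under stated assumptions: the mean-squared-error criterion, the `strength` weight, and all function names are hypothetical choices for this sketch, not necessarily the objective used in our experiments.

```python
import math


def cross_entropy(probs, label):
    """Task loss: negative log-probability of the gold label."""
    return -math.log(probs[label])


def mse_alignment(machine_scores, human_mask):
    """Assumed criterion: mean squared gap between per-token machine
    attribution scores and a binary human rationale mask."""
    gaps = [(m - h) ** 2 for m, h in zip(machine_scores, human_mask)]
    return sum(gaps) / len(gaps)


def er_loss(probs, label, machine_scores, human_mask, strength=1.0):
    """Task loss plus a weighted rationale-alignment penalty.
    `strength` is a hypothetical hyperparameter for this sketch."""
    return cross_entropy(probs, label) + strength * mse_alignment(
        machine_scores, human_mask
    )


# Toy example: 2-class prediction over a 3-token input.
probs = [0.8, 0.2]                 # model's class probabilities
machine_scores = [0.9, 0.1, 0.4]   # attribution score per token
human_mask = [1, 0, 0]             # tokens a human marked as important
loss = er_loss(probs, 0, machine_scores, human_mask, strength=0.5)
```

When the machine scores match the human mask exactly, the penalty vanishes and only the task loss remains; other criteria could be swapped in for `mse_alignment` without changing the overall structure.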

My other research interests

More specifically, my current research interests are at the intersection of interpretability, human-centered AI, robustness and generalizability. Below are some high-level research questions I'd like to investigate, roughly ordered from short-term to long-term:

  • How to intervene with models' reasoning process/self-explanation?

    The paper ‘Ethical Advice Taker’ inspired me to think about intervention using natural language. In my previous research on human-in-the-loop intervention, we used attribution scores to regularize the explanation, but didn’t see much improvement in model performance. I think it is crucial to come up with new ways to intervene and mitigate bias, including gender bias, race bias, social group bias, or just bias internal to the model. Can we mitigate bias by improving the interpretability of models? Can we better align the model's reasoning process with that of humans?
  • How to improve models' robustness and generalizability?

    Nowadays there are many evaluation methods, such as out-of-domain tests and contrast sets, for testing the generalizability and robustness of models. Though many models perform excellently on in-domain test sets, they might just learn shortcuts and not be robust to trivial perturbations of the input.

  • How to give models social intelligence in human-centered AI systems?

    I believe that AI agents should be companions for people, which means that machines should be capable of social intelligence. The paper ‘Experience Grounds Language’ inspired me a lot about where NLP is going.
  • Recently, I have also become interested in text generation. In my current research, I found that although GPT-3 can generate concise, coherent explanations for tasks, many of the explanations are not factually correct.


Google Summer of Code (Open Source)

I worked with Red Hen Lab in GSoC 2019 on the Chinese Pipeline, and it was a pleasure to work with professors from UCLA and CWRU. We built a pipeline to process multimedia resources. My work mainly focused on automatic speech recognition of Chinese news audio. More information can be found here.

2D game (Mario style with puzzles)

This is actually a game development class project (I learnt that game development is definitely very hard 🤣). It is just a demo for now, and we are still working on the game mechanics. You are welcome to give our game a try! We would love your feedback!



Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales (ACL 2023)

Brihi Joshi*, Ziyi Liu*, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren (* means equal contribution)


ER-Test: Evaluating Explanation Regularization Methods for NLP Models (Findings of EMNLP 2022)

Brihi Joshi*, Aaron Z. Chan*, Ziyi Liu*, Shaoliang Nie, Maziar Sanjabi, Hamed Firooz, Xiang Ren (* means equal contribution)


A deep-learning framework for multi-level peptide–protein interaction prediction (Nature Communications)

Yipin Lei, Shuya Li, Ziyi Liu, Fangping Wan, Tingzhong Tian, Shao Li, Dan Zhao, Jianyang Zeng