Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study
Abstract
While large language models (LLMs) exhibit impressive performance on a wide range of NLP tasks, most of them fail to distinguish causality from correlation. We propose a causality-aware self-attention mechanism (CASAM) and eight kinds of legal-specific attacks for evaluation. Experimental results demonstrate that CASAM achieves state-of-the-art performance and the strongest robustness on three legal text prediction benchmarks.
Type
Publication
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024)

Authors
Haotian Chen
(he/him)
Research Assistant Professor
I am a Research Assistant Professor at the School of Artificial Intelligence, Shanghai Jiao Tong University, where I work with Prof. Junchi Yan at RethinkLab. I study how to build AI systems that can automate long-horizon, effort-intensive, and creativity-demanding tasks such as research, engineering, and development. My current work focuses on autonomous agents, large language models, and AI4Research. Before joining SJTU, I received my PhD in Data Science from Fudan University, advised by Prof. Xiangdong Zhou, and completed postdoctoral research at Tsinghua University (THUNLP), working with Prof. Zhiyuan Liu and Prof. Maosong Sun. I was also a research intern at the Machine Learning Research Group of Microsoft Research Asia, mentored by Xiao Yang, and at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University, working with Prof. Yang Yu.