| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | 📄 🤗 | European Court of Human Rights judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | 📄 💾 | European Court of Human Rights judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgments | 🇨🇳 | 2.6M cases w/ 6 outcomes |
| AnnoCaseLaw (2025) | 📄 💻 | US Appeals Court negligence cases | 🇺🇸 | 471 annotated cases with expert labels |
| IndianBailJudgments-1200 (2025) | 📄 🤗 💻 | Indian court bail decisions | 🇮🇳 | 1.2K judgments with 20+ structured attributes |
| CaseSumm (2025) | 📄 🤗 | US Supreme Court opinions | 🇺🇸 | 25.6K opinions with official syllabuses |
| JUSTICE (2022) | 📄 💻 | US Supreme Court cases | 🇺🇸 | Benchmark for judgment prediction |
| Cambridge Law Corpus (CLC) (2023) | 📄 | UK court cases | 🇬🇧 | 258K+ cases (16th century–present) |
| Super-SCOTUS (2025) | 📄 💻 | US Supreme Court decisions | 🇺🇸 | Decision direction and related tasks |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| GLC (Papaloukas et al., 2021) | 📄 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (+18 more) | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy policies | 🇬🇧 | 115 policies w/ 23K labels |
| FairLex (2022) | 📄 🤗 💻 | Multi-jurisdictional legal texts | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Fairness-focused classification datasets |
| Legal Case Document Summarization (Kaggle) | 📄 | Legal case summaries | Various | Large-scale dataset |
| Legal Citation Text Classification Dataset (Kaggle) | 📄 | General legal documents | 🇬🇧 | 25K cases with catchphrases and citations |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 8.9K triplets of cases |
| CLERC (2024) | 📄 🤗 💻 | Legal case retrieval | 🇬🇧 | Large corpus for retrieval and RAG |
| LEAD (2024) | 📄 💻 | Legal case retrieval | Various | 100K+ pairs of similar legal cases |
| Legal IR Philippines (2024) | 📄 | Philippine legal documents | 🇵🇭 | Datasets with synthetic queries |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |
| LLeQA (2024) | 📄 🤗 💻 | French-Belgian statutes | 🇫🇷 | 1,868 expert-annotated long-form QA |
| IndicLegalQA (2025) | 📄 | Indian Supreme Court judgments | 🇮🇳 | 10K QA pairs from 1,256 judgments |
| GerLayQA (2024) | 📄 💻 | German civil law | 🇩🇪 | 21K laymen legal Qs with lawyer answers |
| LEGAL-UQA (2024) | 📄 | Legal questions | 🇵🇰 | 619 parallel Urdu–English QA pairs |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |
| LAR-ECHR (2024) | 📄 | European Court of Human Rights | 🇬🇧 | Legal argument reasoning task dataset |
| δ-Stance (2025) | 📄 | US legal argumentation | 🇺🇸 | Large-scale stances and arguments |

| Dataset | Links | Domain | Language | Size |
|---|---|---|---|---|
| UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |
| EurLexSummarization (2022) | 📄 🤗 💻 | EU legislation | 🌍 | Multilingual summarization across 24 languages |
| Multi-LexSum (2025) | 📄 | Legal documents | 🇬🇧 | 40K+ documents with 9K+ expert summaries |
| CaseSumm (2025) | 📄 🤗 | US Supreme Court opinions | 🇬🇧 | 25.6K opinions with official syllabuses |

| Dataset | Links | Language | Size |
|---|---|---|---|
| Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |
| MultiLegalPile (2024) | 📄 🤗 | 🌍 | 689GB multilingual legal corpus from 17 jurisdictions |

| Dataset | Links | Language | Tasks |
|---|---|---|---|
| FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgment prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |
- [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]
- [2024] Large Language Models and International Law, Chicago Journal of International Law [🌐]
- [2024] Computational Legal Studies Comes of Age, SSRN [📄]

- [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
- [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley [pdf]
- [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]
- [2024] Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models and Challenges, F. Ariai et al. [📄]
- [2025] Computational Law: Datasets, Benchmarks, and Ontologies, D. Küçük & F. Can [📄]
- [2025] A Comprehensive Survey on Legal Summarization, arXiv [📄]
- [2024] Large Language Models in Law: A Survey, J. Lai et al. [📄]
- [2025] Large Language Models in Argument Mining: A Survey, arXiv [📄]
- [2024] When Large Language Models Meet Law: Dual-Lens Survey, arXiv [📄]

- [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
- [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]

- The Natural Legal Language Processing (NLLP) Workshop [website]
- The International Conference on Artificial Intelligence and Law (ICAIL) [website]
- The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
- The EXplainable AI in Law (XAILA) Workshop [website]
- The International Workshop on Juris-informatics (JURISIN) [website]
- The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
- The International Workshop on Legal Information Retrieval [website]
- NLLP 2025 - Natural Legal Language Processing Workshop (EMNLP 2025, Suzhou) [🌐]
- RegNLP 2025 - Regulatory Natural Language Processing Workshop (COLING 2025) [🌐]
- JURIX 2025 - 38th International Conference on Legal Knowledge and Information Systems (Turin, December 9-11, 2025) [🌐]
- ICAIL 2025 - 20th International Conference on Artificial Intelligence and Law (Chicago, June 16-20, 2025) [🌐]
- MWAiL 2025 - Multilingual Workshop on AI & Law Research (Chicago, June 20, 2025) [🌐]
- LLMFinLegal 2025 - Workshop on Large Language Models for Finance and Legal (COLING 2025) [🌐]
- 8th World Legal Tech and AI Summit (Berlin, September 18-19, 2025) [🌐]
- AI Legal Summit 2025 - Various industry conferences on AI in legal practice [🌐]
- Legal AI Conferences Online Platform - Centralized platform for legal AI events [🌐]
- Embedding Benchmarking Tools: MTEB, Hugging Face evaluate, LegalBench, COLIEE [🌐]
- Legal Argument Mining Tools: RMU:ECHR corpus and mining models [💻]
- Multilingual Legal Processing: Evaluation pipelines for multilingual legal LLMs [📄]
- LegalEval-Q: Quality evaluation for LLM-generated legal text [📄]
- FairLex Evaluation: Bias and fairness assessment [🌐]

Last Updated: 2025-09-30. Research Coverage: 2024-01 to 2025-09. Sources: 180+ academic papers, datasets, and conference proceedings.
