Logparser provides a toolkit and benchmarks for automated log parsing, which is a crucial step towards structured log analytics. By applying logparser, users can automatically learn event templates from unstructured logs and convert raw log messages into a sequence of structured events. In the literature, the process of log parsing is sometimes referred to as message template extraction, log key extraction, or log message clustering.
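
To make the idea of "learning event templates" concrete, here is a minimal sketch in plain Python. It is not one of the parsers shipped in logparser: the grouping heuristic (same token count and same first token), the function names, and the sample log lines are all assumptions made for illustration. Positions whose tokens differ within a group are replaced by the wildcard `<*>`, and each message is then mapped to an event ID and its template.

```python
from collections import defaultdict


def group_key(tokens):
    # Hypothetical heuristic: messages with the same token count and the
    # same first token are assumed to belong to the same event.
    return (len(tokens), tokens[0] if tokens else "")


def extract_templates(messages):
    """Return {group key: template}, wildcarding positions that vary."""
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        groups[group_key(tokens)].append(tokens)

    templates = {}
    for key, rows in groups.items():
        # zip(*rows) walks the group column by column (position by position).
        merged = [
            column[0] if len(set(column)) == 1 else "<*>"
            for column in zip(*rows)
        ]
        templates[key] = " ".join(merged)
    return templates


def parse(messages):
    """Convert raw messages into a list of (event_id, template) pairs."""
    templates = extract_templates(messages)
    event_ids = {}
    structured = []
    for message in messages:
        template = templates[group_key(message.split())]
        # Assign E1, E2, ... in order of first appearance.
        event_id = event_ids.setdefault(template, f"E{len(event_ids) + 1}")
        structured.append((event_id, template))
    return structured


# Made-up sample messages, loosely styled after HDFS logs.
raw_logs = [
    "Received block blk_1 of size 512 from 10.0.0.1",
    "Received block blk_2 of size 1024 from 10.0.0.2",
    "Deleting block blk_1 file /data/blk_1",
    "Deleting block blk_7 file /data/blk_7",
]

structured = parse(raw_logs)
for event_id, template in structured:
    print(event_id, template)
```

The parsers in the toolkit use considerably more robust strategies than this heuristic; the sketch only shows the shape of the input and output.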

An illustrative example of log parsing
👉 Read the docs: https://logparser.readthedocs.io
🔭 If you use any of our tools or benchmarks in your research for publication, please kindly cite the following papers.
- [ICSE'19] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. International Conference on Software Engineering (ICSE), 2019.
- [DSN'16] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
Code organization:
- benchmark: the benchmark scripts to reproduce the evaluation results of log parsing
- demo: the demo files to show how to run logparser on HDFS logs
- logparser: the logparser package
- logs: some log samples and manually parsed structured logs with their templates (ground truth)
Please follow the installation steps and demo in the docs to get started.
All the log parsers have been evaluated across 16 different logs available in loghub. We report parsing accuracy as the percentage of accurately parsed log messages. To reproduce the experimental results, please run the benchmark scripts.
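
As an illustration of how such an accuracy figure can be computed, the sketch below takes one strict reading of "accurately parsed": a message counts as correct only if the set of messages sharing its predicted event is exactly the set sharing its ground-truth event. The function name and the toy labels are assumptions; the benchmark scripts remain the reference for the numbers reported here.

```python
from collections import defaultdict


def parsing_accuracy(truth_ids, parsed_ids):
    """Fraction of messages whose predicted group equals its true group."""
    if len(truth_ids) != len(parsed_ids):
        raise ValueError("both label lists must cover the same messages")
    if not truth_ids:
        return 0.0

    # Map each event label to the set of message indices carrying it.
    truth_groups = defaultdict(set)
    parsed_groups = defaultdict(set)
    for index, (truth, parsed) in enumerate(zip(truth_ids, parsed_ids)):
        truth_groups[truth].add(index)
        parsed_groups[parsed].add(index)

    correct = 0
    for members in parsed_groups.values():
        # Look up the true group of any one member; the predicted group is
        # correct only if it matches that true group exactly.
        sample = next(iter(members))
        if members == truth_groups[truth_ids[sample]]:
            correct += len(members)
    return correct / len(truth_ids)


# Toy labels: event B is split into two predicted events, so its two
# messages count as wrong even though each sits in a "pure" group.
truth = ["A", "A", "B", "B", "C"]
parsed = ["x", "x", "y", "z", "w"]
print(parsing_accuracy(truth, parsed))
```

Under this reading, splitting or merging a single event costs every message in the affected groups, which is why the metric is stricter than per-message template matching.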
👇 Check the detailed benchmarking result table
In the table, accuracy values above 0.9 are marked in bold, and the best accuracy results achieved are marked with *. Some of the accuracy values may be lower than those reported by previous studies (e.g., Drain, LogMine). The reasons are two-fold: 1) We use a more rigorous accuracy metric, which rejects events that are only partially matched. 2) For fairness of comparison, we apply only a few preprocessing regular expressions (e.g., IP or number replacement) to each log parser. Adding more preprocessing rules can boost parsing accuracy, but requires more manual effort as well.
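
The kind of preprocessing mentioned above can be sketched with Python's `re` module. The two patterns below (an IPv4 address with an optional port, and a standalone integer) are illustrative assumptions, not the exact rules used in the benchmark scripts; the `<*>` placeholder and the function name are likewise chosen for this example.

```python
import re

# Hypothetical preprocessing rules, applied in order.
PREPROCESS_RULES = [
    # IPv4 address with an optional ":port" suffix.
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b"),
    # Integer that is not glued to a word character or a dot.
    re.compile(r"(?<![\w.])-?\d+(?![\w.])"),
]


def preprocess(message, rules=PREPROCESS_RULES, placeholder="<*>"):
    """Replace every match of each rule with the placeholder."""
    for rule in rules:
        message = rule.sub(placeholder, message)
    return message


print(preprocess("Connection from 10.251.43.21:50010 closed after 3 retries"))
print(preprocess("Deleting blk_123"))
```

With only these two rules, an identifier such as `blk_123` is left untouched, so the parser itself has to recognize it as a variable. Adding a dedicated rule for block IDs would help accuracy on such logs, at the cost of the extra manual effort noted above.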
- [ICSE'19] Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. International Conference on Software Engineering (ICSE), 2019.
- [TDSC'18] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. Towards Automated Log Parsing for Large-Scale Log Data Analysis. IEEE Transactions on Dependable and Secure Computing (TDSC), 2018.
- [ICWS'17] Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed Depth Tree. IEEE International Conference on Web Services (ICWS), 2017.
- [DSN'16] Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
Logparser is implemented based on a number of existing open-source projects:
- SLCT (C++)
- LogCluster (Perl)
- LenMa (Python 2)
- MoLFI (Python 3)
For any questions or feedback, please post to the issue page.



