Cowa/parallel-document-identification

Parallel document identification

A simple hapax-based method to identify parallel documents
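As an illustration only, here is a minimal sketch of what a hapax-based comparison could look like. It is not the code from this repository. It assumes that a hapax is a word occurring exactly once in a document, and that a French and an English version of the same article tend to share hapaxes such as names and numbers. The function names (`hapaxes`, `hapax_overlap`, `best_match`) and the Jaccard score are hypothetical choices made for this example.

```python
import re
from collections import Counter


def hapaxes(text):
    """Return the set of words that occur exactly once in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n == 1}


def hapax_overlap(doc_a, doc_b):
    """Jaccard similarity between the hapax sets of two documents.

    Assumption for this sketch: the more hapaxes two documents share,
    the more likely they are to be parallel.
    """
    a, b = hapaxes(doc_a), hapaxes(doc_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(source, candidates):
    """Return the index of the candidate sharing the most hapaxes with `source`."""
    scores = [hapax_overlap(source, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    french = "Marie Curie est née à Varsovie en 1867. Elle a reçu le prix Nobel."
    english = [
        "Albert Einstein was born in Ulm in 1879. He developed relativity.",
        "Marie Curie was born in Warsaw in 1867. She received the Nobel prize.",
    ]
    print(best_match(french, english))
```

In this toy example the French sentence is paired with the second English sentence, because they share the hapaxes "marie", "curie", "1867" and "nobel". Real articles would likely need more care (tokenization, length normalization, a decision threshold).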

Report

The report (in French) can be found here.

Where is the data?

The data consisted of Wikipedia articles in French and English.
I was not allowed to publish it here, and it was too big anyway.
