A simple hapax-based method to identify parallel documents
The report (in French) can be found here.
The data used were Wikipedia articles in French and English.
I was not allowed to publish the data here, and it was too large anyway.
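
The report describes the actual method. As a rough illustration only, here is a minimal sketch of what a hapax-based pairing could look like, assuming that a hapax is a token occurring exactly once in a document and that parallel documents tend to share such tokens across languages (proper nouns, numbers, dates). The function names `hapaxes`, `hapax_overlap` and `pair_documents`, the tokenization, and the Jaccard scoring are all assumptions made for this example, not taken from the report.

```python
import re
from collections import Counter


def hapaxes(text):
    """Return the set of tokens that occur exactly once in `text`."""
    # Assumption: lowercase word tokens; the report may tokenize differently.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return {tok for tok, n in counts.items() if n == 1}


def hapax_overlap(text_a, text_b):
    """Jaccard similarity between the hapax sets of two documents."""
    a, b = hapaxes(text_a), hapaxes(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_documents(source_docs, target_docs):
    """Map each source id to the target id with the highest hapax overlap.

    Both arguments are dicts of {doc_id: text}. Ties go to the first
    target in iteration order; an empty target set yields None.
    """
    pairs = {}
    for src_id, src_text in source_docs.items():
        pairs[src_id] = max(
            target_docs,
            key=lambda tgt_id: hapax_overlap(src_text, target_docs[tgt_id]),
            default=None,
        )
    return pairs
```

With two toy French texts and two toy English texts, `pair_documents(fr, en)` would pair each French text with the English text that shares its rare tokens (for instance a name and a year). This brute-force comparison is quadratic in the number of documents, so a corpus the size of Wikipedia would need an index from hapax to documents instead.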