You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
stephantul edited this page Feb 21, 2020
·
3 revisions
The pattern.nl module contains a fast part-of-speech
tagger for Dutch (identifies nouns, adjectives, verbs, etc. in a
sentence), sentiment analysis, and tools for Dutch verb conjugation and
noun singularization & pluralization.
The functions in this module take the same parameters and return the
same values as their counterparts in pattern.en.
Refer to the documentation there for more details.
Noun singularization & pluralization
For Dutch nouns there is singularize()
and pluralize(). The implementation is
slightly less robust than the English version (accuracy 91% for
singularization and 80% for pluralization).
>>> from pattern.nl import singularize, pluralize
>>>
>>> print singularize('katten')
>>> print pluralize('kat')
kat
katten
Verb conjugation
For Dutch verbs there is conjugate(),
lemma(), lexeme() and tenses(). The lexicon for verb conjugation
contains about 4,000 common Dutch verbs. For unknown verbs it will fall
back to a rule-based approach with an accuracy of about 81%.
>>> from pattern.nl import conjugate
>>> from pattern.nl import INFINITIVE, PRESENT, SG
>>>
>>> print conjugate('ben', INFINITIVE)
>>> print conjugate('ben', PRESENT, 2, SG)
zijn
bent
Attributive & predicative adjectives
Dutch adjectives followed by a noun inflect with an -e suffix (e.g., braaf → brave kat). You
can get the base form with the predicative() function, or vice versa
with attributive(). Accuracy is 99%.
For opinion mining there is sentiment(), which returns a (polarity, subjectivity)-tuple, based on a lexicon of
adjectives. Polarity is a value between -1.0 and +1.0, subjectivity between 0.0 and 1.0.
The accuracy is around 82% (P 0.79, R 0.86) for book reviews:
>>> from pattern.nl import sentiment
>>> print sentiment('Een onwijs spannend goed boek!')
(0.69, 0.90)
Parser
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given
string with their part-of-speech
tags (e.g.,
NN for nouns and VB for verbs). The parsetree() function takes a
string and returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation
(here) how
to manipulate Text objects.
>>> from pattern.nl import parse, split
>>>
>>> s = parse('De kat zit op de mat.')
>>> for sentence in split(s):
>>> print sentence
Sentence('De/DT/B-NP/O kat/NN/I-NP/O zit/VBZ/B-VP/O op/IN/B-PP/B-PNP'
'de/DT/B-NP/I-PNP mat/NN/I-NP/I-PNP ././O/O')
The parser is built on Jeroen Geertzen's Dutch language
model. The accuracy is
around 91%. The
original WOTAN
tagset is mapped to Penn Treebank. If you need to
work with the original tags you can also use parse() with an optional parameter tagset="WOTAN".