stephantul edited this page Feb 21, 2020 · 3 revisions
The pattern.fr module contains a fast part-of-speech
tagger for French (identifies nouns, adjectives, verbs, etc. in a
sentence), sentiment analysis, and tools for French verb conjugation and
noun singularization & pluralization.
The functions in this module take the same parameters and return the
same values as their counterparts in pattern.en.
Refer to the documentation there for more details.
Noun singularization & pluralization
For French nouns there is singularize()
and pluralize(). The implementation
uses a statistical approach with 93% accuracy for singularization and
92% for pluralization.
Verb conjugation
For French verbs there is conjugate(),
lemma(), lexeme() and tenses(). The lexicon for verb conjugation
contains about 1,750 common French verbs (constructed with Bob Salita's
verb conjugation rules). For unknown verbs it falls back to regular
expressions, with an accuracy of about 83%.
French verbs have more tenses than English verbs. In particular, the
plural differs for each person, and there are additional forms for
the FUTURE tense, the IMPERATIVE, CONDITIONAL and SUBJUNCTIVE mood and the PERFECTIVE aspect:
>>> from pattern.fr import conjugate
>>> from pattern.fr import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
>>>
>>> print(conjugate('suis', INFINITIVE))
>>> print(conjugate('suis', PRESENT, 1, SG, mood=SUBJUNCTIVE))
>>> print(conjugate('suis', PAST, 3, SG))
>>> print(conjugate('suis', PAST, 3, SG, aspect=PERFECTIVE))
être
sois
était
fut
For PAST tense + PERFECTIVE aspect we can also use PRETERITE (passé simple). For PAST tense + IMPERFECTIVE aspect we can also use IMPERFECT (imparfait):
>>> from pattern.fr import conjugate
>>> from pattern.fr import IMPERFECT, PRETERITE, SG
>>>
>>> print(conjugate('suis', IMPERFECT, 3, SG))
>>> print(conjugate('suis', PRETERITE, 3, SG))
était
fut
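The equivalence between the two spellings can be pictured as a small lookup table. The following is a minimal sketch, not pattern.fr's implementation: `SHORTHAND` and `expand()` are hypothetical names, and plain strings stand in for the library's constants.

```python
# Plain strings standing in for the pattern.fr constants (an assumption of this sketch).
PAST, IMPERFECT, PRETERITE = "past", "imperfect", "preterite"
IMPERFECTIVE, PERFECTIVE = "imperfective", "perfective"

# Each shorthand tense names a (tense, aspect) pair.
SHORTHAND = {
    IMPERFECT: (PAST, IMPERFECTIVE),  # imparfait
    PRETERITE: (PAST, PERFECTIVE),    # passé simple
}

def expand(tense, aspect=IMPERFECTIVE):
    """Return the (tense, aspect) pair that the given arguments stand for."""
    return SHORTHAND.get(tense, (tense, aspect))

print(expand(PRETERITE))         # ('past', 'perfective')
print(expand(PAST, PERFECTIVE))  # ('past', 'perfective')
```

Both calls name the same form, which is why conjugate() returns fut for either spelling in the example above.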
The conjugate() function takes the
following optional parameters:
| Tense | Person | Number | Mood | Aspect | Alias | Example |
|---|---|---|---|---|---|---|
| INFINITIVE | None | None | None | None | "inf" | être |
| PRESENT | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sg" | je __suis__ |
| PRESENT | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sg" | tu __es__ |
| PRESENT | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sg" | il __est__ |
| PRESENT | 1 | PL | INDICATIVE | IMPERFECTIVE | "1pl" | nous __sommes__ |
| PRESENT | 2 | PL | INDICATIVE | IMPERFECTIVE | "2pl" | vous __êtes__ |
| PRESENT | 3 | PL | INDICATIVE | IMPERFECTIVE | "3pl" | ils __sont__ |
| PRESENT | None | None | INDICATIVE | PROGRESSIVE | "part" | étant |
| PRESENT | 2 | SG | IMPERATIVE | IMPERFECTIVE | "2sg!" | sois |
| PRESENT | 1 | PL | IMPERATIVE | IMPERFECTIVE | "1pl!" | soyons |
| PRESENT | 2 | PL | IMPERATIVE | IMPERFECTIVE | "2pl!" | soyez |
| PRESENT | 1 | SG | CONDITIONAL | IMPERFECTIVE | "1sg->" | je __serais__ |
| PRESENT | 2 | SG | CONDITIONAL | IMPERFECTIVE | "2sg->" | tu __serais__ |
| PRESENT | 3 | SG | CONDITIONAL | IMPERFECTIVE | "3sg->" | il __serait__ |
| PRESENT | 1 | PL | CONDITIONAL | IMPERFECTIVE | "1pl->" | nous __serions__ |
| PRESENT | 2 | PL | CONDITIONAL | IMPERFECTIVE | "2pl->" | vous __seriez__ |
| PRESENT | 3 | PL | CONDITIONAL | IMPERFECTIVE | "3pl->" | ils __seraient__ |
| PRESENT | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sg?" | je __sois__ |
| PRESENT | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sg?" | tu __sois__ |
| PRESENT | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sg?" | il __soit__ |
| PRESENT | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1pl?" | nous __soyons__ |
| PRESENT | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2pl?" | vous __soyez__ |
| PRESENT | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3pl?" | ils __soient__ |
| PAST | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgp" | j' __étais__ |
| PAST | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgp" | tu __étais__ |
| PAST | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgp" | il __était__ |
| PAST | 1 | PL | INDICATIVE | IMPERFECTIVE | "1ppl" | nous __étions__ |
| PAST | 2 | PL | INDICATIVE | IMPERFECTIVE | "2ppl" | vous __étiez__ |
| PAST | 3 | PL | INDICATIVE | IMPERFECTIVE | "3ppl" | ils __étaient__ |
| PAST | None | None | INDICATIVE | PROGRESSIVE | "ppart" | été |
| PAST | 1 | SG | INDICATIVE | PERFECTIVE | "1sgp+" | je __fus__ |
| PAST | 2 | SG | INDICATIVE | PERFECTIVE | "2sgp+" | tu __fus__ |
| PAST | 3 | SG | INDICATIVE | PERFECTIVE | "3sgp+" | il __fut__ |
| PAST | 1 | PL | INDICATIVE | PERFECTIVE | "1ppl+" | nous __fûmes__ |
| PAST | 2 | PL | INDICATIVE | PERFECTIVE | "2ppl+" | vous __fûtes__ |
| PAST | 3 | PL | INDICATIVE | PERFECTIVE | "3ppl+" | ils __furent__ |
| PAST | 1 | SG | SUBJUNCTIVE | IMPERFECTIVE | "1sgp?" | je __fusse__ |
| PAST | 2 | SG | SUBJUNCTIVE | IMPERFECTIVE | "2sgp?" | tu __fusses__ |
| PAST | 3 | SG | SUBJUNCTIVE | IMPERFECTIVE | "3sgp?" | il __fût__ |
| PAST | 1 | PL | SUBJUNCTIVE | IMPERFECTIVE | "1ppl?" | nous __fussions__ |
| PAST | 2 | PL | SUBJUNCTIVE | IMPERFECTIVE | "2ppl?" | vous __fussiez__ |
| PAST | 3 | PL | SUBJUNCTIVE | IMPERFECTIVE | "3ppl?" | ils __fussent__ |
| FUTURE | 1 | SG | INDICATIVE | IMPERFECTIVE | "1sgf" | je __serai__ |
| FUTURE | 2 | SG | INDICATIVE | IMPERFECTIVE | "2sgf" | tu __seras__ |
| FUTURE | 3 | SG | INDICATIVE | IMPERFECTIVE | "3sgf" | il __sera__ |
| FUTURE | 1 | PL | INDICATIVE | IMPERFECTIVE | "1plf" | nous __serons__ |
| FUTURE | 2 | PL | INDICATIVE | IMPERFECTIVE | "2plf" | vous __serez__ |
| FUTURE | 3 | PL | INDICATIVE | IMPERFECTIVE | "3plf" | ils __seront__ |
Instead of optional parameters, a single short alias, or PARTICIPLE or PAST+PARTICIPLE can also be given. With no
parameters, the infinitive form of the verb is returned.
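The aliases in the table follow a regular spelling scheme: a person digit, sg or pl, a p for PAST (after sg, before pl), an f for FUTURE, and a trailing mark for mood or aspect. The sketch below decodes that scheme with a regular expression. It is an illustration only, not how pattern.fr resolves aliases: `decode()` is a hypothetical helper, plain strings stand in for the library's constants, and spellings that fit the scheme but are absent from the table are not rejected.

```python
import re

# Aliases that do not follow the person/number scheme.
SPECIAL = {
    "inf":   ("infinitive", None, None, None, None),
    "part":  ("present", None, None, "indicative", "progressive"),
    "ppart": ("past", None, None, "indicative", "progressive"),
}
MOODS = {"!": "imperative", "->": "conditional", "?": "subjunctive"}
# person, number (with optional past marker), future marker, mood/aspect mark
ALIAS = re.compile(r"^([123])(sgp|sg|ppl|pl)(f?)(->|[!?+]?)$")

def decode(alias):
    """Split an alias such as "3sgp+" into (tense, person, number, mood, aspect)."""
    if alias in SPECIAL:
        return SPECIAL[alias]
    m = ALIAS.match(alias)
    if m is None:
        raise ValueError("unrecognized alias: %r" % alias)
    person, body, future, mark = m.groups()
    past = body in ("sgp", "ppl")
    if past and future:
        raise ValueError("alias cannot be both past and future: %r" % alias)
    tense = "past" if past else "future" if future else "present"
    number = "singular" if body.startswith("sg") else "plural"
    mood = MOODS.get(mark, "indicative")
    aspect = "perfective" if mark == "+" else "imperfective"
    return (tense, int(person), number, mood, aspect)

print(decode("3sgp+"))  # ('past', 3, 'singular', 'indicative', 'perfective')
print(decode("1pl->"))  # ('present', 1, 'plural', 'conditional', 'imperfective')
```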
Attributive & predicative adjectives
French adjectives inflect with an -e, -s
or -es suffix depending on gender and number.
There are many irregular cases (e.g., curieux → une fille curieuse).
You can get the base form with the predicative() function. A statistical
approach is used with an accuracy of 95%.
>>> from pattern.fr import predicative
>>> print(predicative('curieuse'))
curieux
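For regular adjectives, the suffix rule runs in the opposite direction from predicative(): start from the base form and append the ending for the wanted gender and number. A minimal sketch of that regular case follows. `inflect_regular()` is a hypothetical helper, not part of pattern.fr, and it deliberately ignores irregular adjectives such as curieux.

```python
# Suffix for each (gender, number) pair of a regular adjective.
SUFFIX = {
    ("m", "sg"): "",
    ("f", "sg"): "e",
    ("m", "pl"): "s",
    ("f", "pl"): "es",
}

def inflect_regular(base, gender="m", number="sg"):
    """Append the regular suffix; irregular adjectives need their own rules."""
    return base + SUFFIX[(gender, number)]

print(inflect_regular("noir", "f", "sg"))  # noire
print(inflect_regular("noir", "m", "pl"))  # noirs
print(inflect_regular("noir", "f", "pl"))  # noires
```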
Sentiment analysis
For opinion mining there is sentiment(), which returns a (polarity, subjectivity)-tuple, based on a lexicon of
adjectives. Polarity is a value between -1.0 and +1.0, subjectivity between 0.0 and 1.0.
The accuracy is around 74% (P 0.77, R 0.73) for book reviews.
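One way to picture a lexicon-based score is to look up each known adjective and average the values found. The sketch below does only that and should not be read as pattern.fr's algorithm: `toy_sentiment()` is a hypothetical function, and the lexicon entries and their scores are invented for illustration.

```python
# Toy lexicon: adjective -> (polarity, subjectivity).
# These scores are invented for this sketch, not taken from pattern.fr.
LEXICON = {
    "magnifique": (1.0, 1.0),
    "ennuyeux": (-0.5, 0.8),
    "long": (-0.1, 0.3),
}

def toy_sentiment(text):
    """Average the scores of the known adjectives in text; (0.0, 0.0) if none."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return (0.0, 0.0)
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

print(toy_sentiment("Un livre magnifique !"))  # (1.0, 1.0)
```

Averaging keeps polarity within -1.0 to +1.0 and subjectivity within 0.0 to 1.0 as long as every lexicon entry stays within those ranges.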
Parser
For parsing there is parse(), parsetree() and split(). The parse() function annotates words in the given
string with their part-of-speech tags (e.g., NN for nouns and VB for verbs). The parsetree() function takes a string and
returns a tree of nested objects (Text → Sentence → Chunk → Word). The split() function takes the output of parse() and returns a Text. See the pattern.en documentation for how to
manipulate Text objects.
>>> from pattern.fr import parse, split
>>>
>>> s = parse(u"Le chat noir s'était assis sur le tapis.")
>>> for sentence in split(s):
...     print(sentence)
...
Sentence('Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O'
"s'/PRP/B-NP/O était/VB/B-VP/O assis/VBN/I-VP/O"
'sur/IN/B-PP/B-PNP le/DT/B-NP/I-PNP tapis/NN/I-NP/I-PNP ././O/O')
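Each token in the output above is a word followed by slash-separated tags. As a rough illustration of that layout, the sketch below splits such a string with plain string operations. `read_tokens()` is a hypothetical helper, and the field names (part-of-speech, chunk, prepositional noun phrase) are this sketch's reading of the four positions. For real work, split() and the resulting Text objects are the intended route.

```python
def read_tokens(tagged):
    """Split a tagged string into (word, pos, chunk, pnp) tuples."""
    return [tuple(token.split("/")) for token in tagged.split()]

# First part of the tagged sentence shown above.
tagged = "Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O"
tokens = read_tokens(tagged)

# Keep only the words tagged as nouns.
nouns = [word for word, pos, chunk, pnp in tokens if pos.startswith("NN")]
print(nouns)  # ['chat']
```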
The parser is based on Lefff. For words in Lefff that can have multiple
part-of-speech tags, we used Lexique to find the most frequent POS-tag.
References:
Sagot, B. (2010). The Lefff, a freely available and large-coverage
morphological and syntactic lexicon for French. Proceedings of
LREC'10.
New, B., Pallier, C., Ferrand, L. & Matos, R. (2001). A lexical database
for contemporary French: LEXIQUE. L'Année Psychologique.