Projects in Statistical Methods & Text Analysis
1. Categorizing news articles
Given a set of Reuters news service articles, develop a set of labels for categorizing them.
Labels should be a single word or short phrase. Some articles might fit more than one label, and some might not fit any.
Aim for roughly 10–15 labels.
Use the methods from the labs so far (keyword analysis, terminology extraction, topic models).
No specific ‘correct’ answer; the process you use to develop the list is more important than the solution.
List of labels
For each label, the number of articles from the dataset that fit that label
The number of articles that don't fit any of the labels (ideally this won't be a big number)
Annotated notebook showing your process
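The counts listed above (articles per label, plus articles that fit no label) could be produced along these lines. This is only a minimal sketch: the `LABEL_KEYWORDS` mapping, the sample articles, and the helper names are hypothetical, and plain keyword matching stands in for whatever labeling process the notebook actually uses.

```python
from collections import Counter

# Hypothetical label -> keyword mapping (an assumption for illustration);
# real keywords would come from keyword analysis, terminology extraction,
# or topic models.
LABEL_KEYWORDS = {
    "energy": {"oil", "crude", "opec"},
    "grain": {"wheat", "corn", "harvest"},
    "finance": {"bank", "rates", "loan"},
}


def assign_labels(text, label_keywords=LABEL_KEYWORDS):
    """Return every label with at least one keyword in the text."""
    tokens = set(text.lower().split())
    return {label for label, words in label_keywords.items() if tokens & words}


def label_counts(articles, label_keywords=LABEL_KEYWORDS):
    """Count articles per label and articles that fit no label."""
    counts = Counter()
    unlabeled = 0
    for text in articles:
        labels = assign_labels(text, label_keywords)
        if not labels:
            unlabeled += 1
        counts.update(labels)  # an article may count toward several labels
    return counts, unlabeled


# Made-up sample articles, not taken from the Reuters dataset.
articles = [
    "OPEC agrees to cut crude oil output",
    "Bank raises loan rates as wheat prices climb",
    "Local team wins the championship",
]
counts, unlabeled = label_counts(articles)
```

Because an article can match several labels, the per-label counts may sum to more than the number of articles; the unlabeled count is tracked separately.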
2. Classifying Mislabeled Wine
Presented with mislabeled wine bottles, construct a classifier to determine which wine is which
Find words that the model is using to predict labels (model coefficients, interpretation through LIME)
Find how many test examples must be excluded for F1 to exceed 0.81
Improve accuracy by changing the labels: use the confusion matrix to design a new labeling scheme.
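One way to read the F1 exclusion step is to drop the least-confident test predictions until the score clears the threshold. The sketch below follows that reading. It is an assumption, not necessarily the procedure used in the project: the choice of macro-averaged F1, the `confidence` scores, the sample labels, and both function names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def exclusions_needed(y_true, y_pred, confidence, target=0.81):
    """Drop the least-confident predictions one at a time until macro F1
    exceeds `target`. Returns the number dropped, or None if the target
    is never reached."""
    order = sorted(range(len(y_true)), key=lambda i: confidence[i])
    for n_excluded in range(len(order)):
        kept = order[n_excluded:]
        f1 = macro_f1([y_true[i] for i in kept], [y_pred[i] for i in kept])
        if f1 > target:
            return n_excluded
    return None


# Made-up labels and confidence scores, for illustration only.
y_true = ["merlot", "merlot", "pinot", "pinot", "pinot", "merlot"]
y_pred = ["merlot", "pinot", "pinot", "pinot", "merlot", "merlot"]
confidence = [0.90, 0.55, 0.80, 0.95, 0.60, 0.85]
n_dropped = exclusions_needed(y_true, y_pred, confidence)
```

In this toy data the two wrong predictions are also the two least confident, so dropping them is enough. On real data the confidence would typically come from the classifier's predicted probabilities.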
3. Topic Models for Amazon video game reviews
multiword expressions, candidate terms, c-values, tokenization, hyperparameter tuning, wordclouds
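For the c-value step, the commonly cited formula scores a candidate term a as log2|a| * f(a) when a is not nested in a longer candidate, and as log2|a| * (f(a) - mean f(b)) over the longer candidates b that contain it. The sketch below applies that formula directly. It is a simplified illustration under stated assumptions: the candidate terms and frequencies are made up, the function names are hypothetical, and the project's notebook may compute c-values differently.

```python
import math


def is_nested(short, long):
    """True if the word tuple `short` occurs contiguously inside the
    strictly longer word tuple `long`."""
    n = len(short)
    return len(long) > n and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def c_values(freqs):
    """Score candidate terms.

    freqs maps each candidate term (a tuple of words) to its frequency.
    Single-word candidates score 0 because log2(1) == 0, so this is only
    meaningful for multiword expressions.
    """
    scores = {}
    for term, freq in freqs.items():
        containers = [f for other, f in freqs.items() if is_nested(term, other)]
        if containers:
            # Discount occurrences explained by longer candidate terms.
            freq = freq - sum(containers) / len(containers)
        scores[term] = math.log2(len(term)) * freq
    return scores


# Made-up candidate terms and frequencies, not drawn from the review data.
freqs = {
    ("open", "world"): 10,
    ("open", "world", "game"): 4,
    ("boss", "fight"): 6,
}
scores = c_values(freqs)
```

Here "open world" is discounted by the 4 occurrences that belong to "open world game", so its score reflects only its independent uses.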