
xtroubleclef/stat-methods_text-analysis

Projects in Statistical Methods & Text Analysis

1. Categorizing news articles

Your task

  • Given a collection of Reuters news service articles, develop a set of labels for categorizing them
  • Labels should be a single word or short phrase. Some articles might fit more than one label, and some might not fit any.
  • Aim for roughly 10–15 labels
  • Use methods from the labs so far (keyword analysis, terminology extraction, topic models)
  • No specific ‘correct’ answer; the process you use to develop the list is more important than the solution.
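
One way to start on candidate labels is a simple keyword analysis: compare word frequencies in a subset of articles against the rest of the collection. The sketch below is only an illustration, not the method used in the labs. The function names (`tokenize`, `keyness`), the add-one smoothing, and the toy sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def keyness(target_docs, reference_docs, smoothing=1.0):
    """Rank words by the log2 ratio of their smoothed relative frequency
    in the target documents versus the reference documents."""
    target = Counter(tok for doc in target_docs for tok in tokenize(doc))
    reference = Counter(tok for doc in reference_docs for tok in tokenize(doc))
    vocab = set(target) | set(reference)
    target_total = sum(target.values()) + smoothing * len(vocab)
    reference_total = sum(reference.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        target_rate = (target[word] + smoothing) / target_total
        reference_rate = (reference[word] + smoothing) / reference_total
        scores[word] = math.log2(target_rate / reference_rate)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical, not taken from the Reuters dataset).
target_docs = [
    "Oil prices rose as crude oil supply tightened",
    "Crude oil exports fell",
]
reference_docs = [
    "The bank raised interest rates",
    "Shares rose as the bank reported profit",
]
top_keywords = keyness(target_docs, reference_docs)[:3]
print(top_keywords)
```

Words that score highly for a cluster of articles are natural candidates for a label name, which can then be checked against terminology extraction and topic model output.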

Deliverables

  • List of labels
  • For each label, the number of articles from the dataset that fit that label
  • The number of articles that don't fit any of the labels (ideally this won't be a big number)
  • Annotated notebook showing your process
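
The counts in the deliverables can be produced with a few lines once each label has a rule for deciding whether an article fits. The sketch below assumes a hypothetical keyword-based rule: `LABEL_KEYWORDS`, `assign_labels`, `summarize`, and the sample articles are invented for illustration, and a real solution might assign labels from topic model output instead.

```python
import re
from collections import Counter

# Hypothetical labels and cue words; a real list would come from your analysis.
LABEL_KEYWORDS = {
    "energy": {"oil", "crude", "gas"},
    "banking": {"bank", "loan", "rates"},
    "agriculture": {"wheat", "grain", "harvest"},
}


def assign_labels(text, label_keywords=LABEL_KEYWORDS):
    # An article fits a label if it contains at least one of the label's cue words.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {label for label, cues in label_keywords.items() if tokens & cues}


def summarize(articles, label_keywords=LABEL_KEYWORDS):
    """Return (articles per label, number of articles with no label)."""
    counts = Counter({label: 0 for label in label_keywords})
    unlabeled = 0
    for text in articles:
        labels = assign_labels(text, label_keywords)
        if not labels:
            unlabeled += 1
        counts.update(labels)  # an article may count toward several labels
    return dict(counts), unlabeled


# Toy articles (hypothetical).
articles = [
    "Crude oil prices rose after the bank cut rates",
    "Wheat harvest beats forecasts",
    "The company named a new chairman",
]
label_counts, unlabeled_count = summarize(articles)
print(label_counts, unlabeled_count)
```

Because articles may fit more than one label, the per-label counts need not sum to the number of labeled articles.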

2. Classifying mislabeled wine

Your task

  • Presented with mislabeled wine bottles, construct a classifier to determine which wine is which
  • Find the words the model relies on to predict labels (via model coefficients and LIME interpretation)
  • Find the number of test examples that must be excluded to reach F1 > 0.81
  • Improve accuracy by changing the labels: use the confusion matrix to design a new labeling scheme
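
The last two steps can be sketched without committing to a particular classifier. The example below makes several assumptions that the task description does not specify: that "excluded" means dropping the least-confident predictions first, that F1 is macro-averaged, and that labels are merged pairwise. The function names, wine varieties, and confidence values are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-label F1 over labels seen in truth or predictions.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def excluded_for_target_f1(y_true, y_pred, confidence, target=0.81):
    """Drop the least-confident predictions one at a time until macro F1
    exceeds the target. Returns the number dropped, or None if never reached."""
    order = sorted(range(len(y_true)), key=lambda i: confidence[i])
    for n_dropped in range(len(order)):
        kept = order[n_dropped:]
        score = macro_f1([y_true[i] for i in kept], [y_pred[i] for i in kept])
        if score > target:
            return n_dropped
    return None


def most_confused_pair(y_true, y_pred):
    # Count errors per unordered label pair; the top pair is a merge candidate.
    pair_errors = Counter()
    for t, p in zip(y_true, y_pred):
        if t != p:
            pair_errors[tuple(sorted((t, p)))] += 1
    return pair_errors.most_common(1)[0] if pair_errors else None


# Toy predictions (hypothetical varieties and confidences).
y_true = ["merlot", "merlot", "syrah", "syrah", "riesling", "riesling"]
y_pred = ["merlot", "syrah", "syrah", "merlot", "riesling", "riesling"]
confidence = [0.9, 0.4, 0.8, 0.45, 0.95, 0.9]

n_excluded = excluded_for_target_f1(y_true, y_pred, confidence)
merge_candidate = most_confused_pair(y_true, y_pred)
print(n_excluded, merge_candidate)
```

With a real model, `confidence` would typically be the highest predicted class probability for each test example, and the merge candidates would be read off the full confusion matrix rather than a single top pair.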

3. Topic Models for Amazon video game reviews

  • Topics covered: multiword expressions, candidate terms, C-values, tokenization, hyperparameter tuning, word clouds
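
For readers unfamiliar with C-values, the sketch below scores candidate terms with the standard C-value formula: a term's frequency is weighted by the log of its length, and a term nested inside longer candidates is penalized by the average frequency of those longer candidates. The helper names and the frequency counts are hypothetical, and the notebook's own implementation may differ.

```python
import math


def contains(longer, shorter):
    # True if `shorter` occurs as a contiguous token run inside a strictly longer term.
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_values(freq):
    """Compute C-values for candidate terms.

    `freq` maps each candidate term (a tuple of tokens) to its corpus frequency.
    Single-word terms score 0 because log2(1) == 0, so this is meant for
    multiword candidates.
    """
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidates that contain this term.
        nesting = [f_b for other, f_b in freq.items() if contains(other, term)]
        weight = math.log2(len(term))
        if nesting:
            scores[term] = weight * (f - sum(nesting) / len(nesting))
        else:
            scores[term] = weight * f
    return scores


# Hypothetical candidate-term counts, as might come from game reviews.
freq = {
    ("boss", "fight"): 10,
    ("final", "boss", "fight"): 4,
    ("open", "world"): 7,
    ("open", "world", "game"): 2,
}
scores = c_values(freq)
for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(" ".join(term), round(score, 2))
```

The nesting penalty keeps a fragment from ranking highly merely because it appears inside a longer term that is the real unit of meaning.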
