---
title: "PepsiCo Data Challenge - Linear Regression Model"
author: "Sam Tauke"
date: "10/14/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Summary

This code is my submission to the 2020 PepsiCo Data Science Challenge, sponsored by the New York Academy of Sciences. The original challenge description is available
[here](https://www.nyas.org/challenges/pepsico-challenge/?utm_source=PepsiCo+Data+Challenge+2020&utm_campaign=7deec2e1ed-EMAIL_CAMPAIGN_2020_10_08_08_22&utm_medium=email&utm_term=0_0568b79be0-7deec2e1ed-331285254).

The purpose of the challenge was to devise a model that predicts the scores from 28 different assessments of two grain varieties across a number of testing sites. To do this, I built a linear regression model that regresses the assessment scores on a set of explanatory variables, including weather, soil quality, and growth stage. The model allows for variation between testing sites, assessment types, and grain varieties.
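
As a rough illustration of this kind of model, the sketch below fits a linear regression with `lm()` on synthetic data. It is not the code in "nyas_submission_model.r": every column name (`site`, `assessment`, `variety`, `rainfall`, `soil_quality`, `growth_stage`, `score`) and the exact formula are assumptions made up for the example.

```r
# Synthetic stand-in data; all column names and values are hypothetical.
set.seed(42)
n <- 240
trials <- data.frame(
  site         = factor(sample(c("site_a", "site_b", "site_c"), n, replace = TRUE)),
  assessment   = factor(sample(c("assess_1", "assess_2"), n, replace = TRUE)),
  variety      = factor(sample(c("variety_1", "variety_2"), n, replace = TRUE)),
  rainfall     = runif(n, 0, 50),
  soil_quality = rnorm(n, 6.5, 0.5),
  growth_stage = sample(1:5, n, replace = TRUE)
)
trials$score <- 10 + 0.2 * trials$rainfall + 1.5 * trials$growth_stage +
  2 * (trials$variety == "variety_2") + rnorm(n)

# Site enters as an additive offset; assessment * variety gives each
# assessment/variety combination its own baseline.
fit <- lm(
  score ~ rainfall + soil_quality + growth_stage + site + assessment * variety,
  data = trials
)

# Overall fit quality and in-sample predictions.
print(summary(fit)$adj.r.squared)
trials$predicted <- predict(fit, newdata = trials)
print(head(trials[, c("score", "predicted")]))
```

Treating site, assessment type, and grain variety as factors is one simple way to let the fitted baseline differ between them; the actual submission may structure this differently.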

The model performs well for most assessment types, but its predictions clearly fail in several cases. These are easy to identify: they are the assessment/grain variety combinations with very low adjusted R-squared values. Future work will focus on making the model more reliable for these specific cases and on validating that a linear model is appropriate for this data.
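
One way to surface the weak combinations is to fit a model per assessment/grain variety group and compare adjusted R-squared values. The sketch below does this on synthetic data; the column names, the single `rainfall` predictor, and the 0.3 cutoff are all assumptions chosen for illustration, not values taken from the submission.

```r
# Synthetic data with hypothetical column names.
set.seed(1)
n <- 400
obs <- data.frame(
  assessment = sample(c("assess_1", "assess_2"), n, replace = TRUE),
  variety    = sample(c("variety_1", "variety_2"), n, replace = TRUE),
  rainfall   = runif(n, 0, 50)
)

# Make one combination nearly pure noise so it shows up as a weak fit.
noisy <- obs$assessment == "assess_2" & obs$variety == "variety_2"
obs$score <- ifelse(noisy, rnorm(n, 20, 5), 5 + 0.4 * obs$rainfall + rnorm(n))

# Fit one linear model per assessment/variety combination and
# collect the adjusted R-squared of each fit.
groups <- split(obs, list(obs$assessment, obs$variety), drop = TRUE)
adj_r2 <- sapply(groups, function(g) {
  summary(lm(score ~ rainfall, data = g))$adj.r.squared
})

# Flag combinations below an arbitrary cutoff (an assumption, not a rule).
threshold <- 0.3
weak <- names(adj_r2)[adj_r2 < threshold]
print(round(adj_r2, 3))
print(weak)
```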

## Instructions

1. Download the data
* Download the data either from the challenge description [website](https://www.nyas.org/challenges/pepsico-challenge/?utm_source=PepsiCo+Data+Challenge+2020&utm_campaign=7deec2e1ed-EMAIL_CAMPAIGN_2020_10_08_08_22&utm_medium=email&utm_term=0_0568b79be0-7deec2e1ed-331285254) (if it is still live) or from my OneDrive [folder](https://1drv.ms/x/s!AkFlokYzR3chglaicMpBTYD59LqN?e=3R4gy0).
* Save the data as an Excel file under its current name, "nyas-challenge-2020-data.xlsx"
* Save the data in the same directory as the R code

2. Run the code
* Run the code file "nyas_submission_model.r"
* The file depends on several R libraries, which must be installed and loaded for it to run

3. Output
* The code file exports a CSV called "predicted_scores.csv" to your working directory. This CSV contains all of the original observations, each with either its predicted score or text indicating that no score was predicted because of the observation date or data errors.
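
The steps above can be sketched as a short R session. The file names come from these instructions, but the package list is only a placeholder assumption (`readxl` is a guess); check the `library()` calls at the top of "nyas_submission_model.r" for the real list. The sketch assumes the working directory is the folder that holds the script and the data.

```r
data_file   <- "nyas-challenge-2020-data.xlsx"
script_file <- "nyas_submission_model.r"
output_file <- "predicted_scores.csv"

# Hypothetical package list; replace it with the packages the script loads.
needed <- c("readxl")
not_installed <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Install before running: ", paste(not_installed, collapse = ", "))
}

# Only run the model when the inputs and packages are in place.
if (file.exists(data_file) && file.exists(script_file) && length(not_installed) == 0) {
  source(script_file)
  predictions <- read.csv(output_file, stringsAsFactors = FALSE)
  str(predictions)  # inspect the columns, including the predicted scores
} else {
  message("Place the data and script in the working directory, then rerun.")
}
```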

## Contact

If you have questions, I can be reached at samtauke@gmail.com.
