
bigcode-project/the-stack-v2

The Stack v2 & StarCoder2Data

This repository contains the code for building The Stack v2 dataset, as well as the extra sources used to make StarCoder2Data, the training corpus of the StarCoder2 family of models.

This repository is a follow-up to the work in bigcode-dataset, which was used for The Stack v1 and StarCoderData.

About

Code for the curation of The Stack v2 and StarCoder2 training data
