🚀 Version 2.11.0 out now! Read the release notes here.
skpro is a library for supervised probabilistic prediction in Python.
It provides scikit-learn-like, scikit-base compatible interfaces to:
- tabular supervised regressors for probabilistic prediction - interval, quantile and distribution predictions
- tabular probabilistic time-to-event and survival prediction - instance-individual survival distributions
- metrics to evaluate probabilistic predictions, e.g., pinball loss, empirical coverage, CRPS, survival losses
- reductions to turn `scikit-learn` regressors into probabilistic `skpro` regressors, such as bootstrap or conformal
- building pipelines and composite models, including tuning via probabilistic performance metrics
- symbolic probability distributions with value domain of `pandas.DataFrame`-s and `pandas`-like interface
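To make two of the metrics named above concrete, here is a minimal from-scratch sketch of the pinball loss (for quantile predictions) and empirical coverage (for interval predictions). It uses plain Python only and is not skpro's implementation; the function names `pinball_loss` and `empirical_coverage` and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def pinball_loss(
    y_true: Sequence[float], y_quantile: Sequence[float], alpha: float
) -> float:
    """Mean pinball (quantile) loss of predicted alpha-quantiles."""
    total = 0.0
    for y, q in zip(y_true, y_quantile):
        diff = y - q
        # under-prediction is weighted by alpha, over-prediction by (1 - alpha)
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


def empirical_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of observations that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# hypothetical observations and predictions, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
q50 = [1.5, 1.5, 3.5, 3.5]    # predicted medians (alpha = 0.5)
lower = [0.0, 0.0, 3.2, 3.0]  # predicted interval lower bounds
upper = [2.0, 1.8, 4.0, 5.0]  # predicted interval upper bounds

print(pinball_loss(y_true, q50, alpha=0.5))
print(empirical_coverage(y_true, lower, upper))
```

A lower pinball loss indicates better quantile predictions, while empirical coverage is usually compared against the nominal coverage the intervals were requested at.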
| Documentation | |
|---|---|
| ⭐ Tutorials | New to skpro? Here's everything you need to know! |
| 📋 Binder Notebooks | Example notebooks to play with in your browser. |
| 👩💻 User Guides | How to use skpro and its features. |
| ✂️ Extension Templates | How to build your own estimator using skpro's API. |
| 🎛️ API Reference | The detailed reference for skpro's API. |
| 🛠️ Changelog | Changes and version history. |
| 🌳 Roadmap | skpro's software and community development plan. |
| 📝 Related Software | A list of related software. |
Questions and feedback are extremely welcome! We strongly believe in the value of sharing help publicly, as it allows a wider audience to benefit from it.
skpro is maintained by the sktime community, and we use the same social channels.
| Type | Platforms |
|---|---|
| 🐛 Bug Reports | GitHub Issue Tracker |
| ✨ Feature Requests & Ideas | GitHub Issue Tracker |
| 👩💻 Usage Questions | GitHub Discussions · Stack Overflow |
| 💬 General Discussion | GitHub Discussions |
| 🏭 Contribution & Development | dev-chat channel · Discord |
| 🌐 Community collaboration session | Discord - Fridays 13 UTC, dev/meet-ups channel |
Our objective is to enhance the interoperability and usability of the AI model ecosystem:
- `skpro` is compatible with scikit-learn and sktime, e.g., an `sktime` proba forecaster can be built with an `skpro` proba regressor, which is an `sklearn` regressor with proba mode added by `skpro`
- `skpro` provides a mini-package management framework for first-party implementations, and for interfacing popular second- and third-party components, such as cyclic-boosting, MAPIE, or ngboost packages.
skpro curates libraries of components of the following types:
| Module | Status | Links |
|---|---|---|
| Probabilistic tabular regression | maturing | Tutorial · API Reference · Extension Template |
| Time-to-event (survival) prediction | maturing | Tutorial · API Reference · Extension Template |
| Performance metrics | maturing | API Reference |
| Probability distributions | maturing | Tutorial · API Reference · Extension Template |
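As a small illustration of how probability distributions and performance metrics fit together, the sketch below evaluates the continuous ranked probability score (CRPS) of a normal predictive distribution using its standard closed form. It is written in plain Python and is not skpro's implementation; the function name `crps_normal` and its helpers are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_normal(y: float, mu: float, sigma: float) -> float:
    """Closed-form CRPS of a N(mu, sigma^2) prediction for observation y."""
    z = (y - mu) / sigma  # standardized observation
    return sigma * (
        z * (2.0 * normal_cdf(z) - 1.0)
        + 2.0 * normal_pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


# a prediction centered on the observation scores better (lower)
# than one that is three standard deviations off
print(crps_normal(y=0.0, mu=0.0, sigma=1.0))
print(crps_normal(y=3.0, mu=0.0, sigma=1.0))
```

In this form the score is in the units of the target and lower values are better; averaging it over test instances gives a single number for comparing probabilistic regressors.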
To install skpro, use pip:
pip install skpro
or, with maximum dependencies,
pip install skpro[all_extras]
Releases are available as source packages and binary wheels. You can see all available wheels here.
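After installing, one way to confirm that the package is visible to the current Python environment is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("skpro")
if version is None:
    print("skpro is not installed in this environment")
else:
    print(f"skpro {version} is installed")
```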
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skpro.regression.residual import ResidualDouble
# step 1: data specification
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_new, y_train, y_test = train_test_split(X, y)
# step 2: specifying the regressor - any compatible regressor is valid!
# example - "squaring residuals" regressor
# random forest for mean prediction
# linear regression for variance prediction
reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)
# step 3: fitting the model to training data
reg_proba.fit(X_train, y_train)
# step 4: predicting labels on new data
# probabilistic prediction modes - pick any or multiple
# full distribution prediction
y_pred_proba = reg_proba.predict_proba(X_new)
# interval prediction
y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)
# quantile prediction
y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])
# variance prediction
y_pred_var = reg_proba.predict_var(X_new)
# mean prediction is same as "classical" sklearn predict, also available
y_pred_mean = reg_proba.predict(X_new)
# step 5: specifying evaluation metric
from skpro.metrics import CRPS
metric = CRPS()  # continuous ranked probability score - any skpro metric works!
# step 6: evaluate metric, compare predictions to actuals
metric(y_test, y_pred_proba)
>>> 32.19
There are many ways to get involved with development of skpro, which is
developed by the sktime community.
We follow the all-contributors
specification: all kinds of contributions are welcome - not just code.
To cite skpro in a scientific publication, see citations.

