datamindedbe/eu-data-platform

Spin up a minimalistic Data Analytics Platform on a European cloud provider

Data Platform Stack

This repository contains the code discussed in the following blogposts:

The project consists of the infrastructure for each of the supported EU cloud providers and the open-source components that make up the data platform.

  • The open source components are:
    • Trino
    • Airflow
    • Open Policy Agent
    • HashiCorp Vault
    • ArgoCD
    • Zitadel
  • The infrastructure is provided for the following EU-based providers:
    • OVH
    • Scaleway
    • UpCloud
    • Exoscale

Architecture

The core of the platform is a Trino cluster, which provides a SQL interface to the data. It is used by:

  • data engineers who can query the data via a database client
  • jobs scheduled by Airflow

Supporting components:

  • Zitadel - for single sign-on
  • ArgoCD - for application deployment
  • Vault - for secrets management
  • Open Policy Agent - for authorization of Trino queries
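
To illustrate how query authorization with Open Policy Agent can work, here is a minimal sketch of asking OPA for an allow/deny decision over its REST Data API. Everything specific in it is an assumption and is not taken from this repository: the `OPA_URL` endpoint, the `trino/allow` policy path, and the helper names are hypothetical, and the shape of the input document is only modeled on what the Trino OPA access control plugin sends. In the platform itself, Trino makes this call, not a user script.

```python
import json
import urllib.request

# Hypothetical policy endpoint; the real path depends on the policy package name.
OPA_URL = "http://opa.example.com:8181/v1/data/trino/allow"


def post_json(url, payload):
    """POST a JSON document and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def build_input(user, operation, catalog, schema, table):
    """Build an OPA input document (assumed shape, see the note above)."""
    return {
        "input": {
            "context": {"identity": {"user": user}},
            "action": {
                "operation": operation,
                "resource": {
                    "table": {
                        "catalogName": catalog,
                        "schemaName": schema,
                        "tableName": table,
                    }
                },
            },
        }
    }


def is_allowed(user, operation, catalog, schema, table, url=OPA_URL, send=post_json):
    """Return True only if the policy explicitly evaluates to true."""
    decision = send(url, build_input(user, operation, catalog, schema, table))
    # OPA omits "result" when the rule is undefined; treat that as a deny.
    return decision.get("result") is True
```

The `send` parameter exists so the HTTP call can be replaced in tests; a missing or non-boolean result is treated as a deny, which keeps the sketch fail-closed.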

Interacting with the Platform

Data engineers interact with the platform through the Airflow UI and through a database client connected to Trino.
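
As a rough illustration of what a database client does when it talks to Trino, the sketch below submits a statement to the Trino client REST API and follows the `nextUri` links until all rows are returned. It uses only the Python standard library. The `TRINO_URL` value and the helper names are hypothetical, and authentication is left out: a deployment that sits behind Zitadel single sign-on will likely require a token or other credentials that this sketch does not handle.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the Trino coordinator URL of your deployment.
TRINO_URL = "https://trino.example.com"


def http_json(method, url, headers, body=None):
    """Send one HTTP request and return the decoded JSON response."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_query(sql, user, base_url=TRINO_URL, send=http_json):
    """Yield result rows for a SQL statement, one list per row."""
    headers = {"X-Trino-User": user}
    # The statement text is sent as the raw request body.
    page = send("POST", f"{base_url}/v1/statement", headers, sql.encode("utf-8"))
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # A response page may carry no rows yet while the query is still running.
        yield from page.get("data", [])
        next_uri = page.get("nextUri")
        if next_uri is None:
            return  # no further pages: the query has finished
        page = send("GET", next_uri, headers, None)
```

A call could look like `for row in run_query("SELECT 1", user="data-engineer"): print(row)`. In practice an existing client (a SQL IDE, the Trino CLI, or a client library) is the more convenient choice; the sketch only shows the request flow behind it.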

Deploying the platform

Tools needed

Requirements

Infra deployment

  • pick a provider in the infra folder and follow the instructions from the README.md in that folder
  • follow the README.md in the bootstrap-data-platform folder to set up ArgoCD
  • if needed, continue bootstrapping the platform with the relevant infra provider (databases, credentials, etc.)

Deploy some ETL jobs

  • deploy the Airflow DAGs
