GitHub - stephanie-w/kubernetes-bigquery-python: Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub · GitHub

stephanie-w/kubernetes-bigquery-python

 
 


Example apps: Real-time data analysis using Google Kubernetes Engine, Redis or PubSub, and BigQuery

This repository contains two related example Google Kubernetes Engine (GKE) apps that show how to build a 'pipeline' to stream data into BigQuery.

The app in the pubsub directory uses Google Cloud PubSub.

The app in the redis subdirectory uses Redis.

How to use

Specify the GCP settings (Pub/Sub topic, BigQuery project ID, dataset and table names) and the number of tweets to process in a Kubernetes ConfigMap named tw-conf:

$ kubectl create configmap tw-conf \
    --from-literal=PUBSUB_TOPIC=<projects/your-project/topics/your-topic> \
    --from-literal=BATCH_SIZE=50 \
    --from-literal=TOTAL_TWEETS=10000000 \
    --from-literal=NUM_RETRIES=3 \
    --from-literal=PROJECT_ID=<project_id> \
    --from-literal=BQ_DATASET=<bq_dataset> \
    --from-literal=BQ_TABLE=<bq_table>
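
Inside a pod, settings like these are commonly read from the environment. The following is a minimal sketch, assuming the tw-conf keys are exposed to the container as environment variables of the same name (this README does not show how the pod specs consume the ConfigMap). `PipelineSettings` and `load_settings` are hypothetical names used for illustration, not part of the repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    """Typed view of the tw-conf keys (hypothetical helper)."""
    pubsub_topic: str
    project_id: str
    bq_dataset: str
    bq_table: str
    batch_size: int
    total_tweets: int
    num_retries: int


def load_settings(env=os.environ):
    """Build settings from a mapping that holds the tw-conf keys.

    Assumption: the ConfigMap keys appear as environment variables.
    """
    required = [
        "PUBSUB_TOPIC", "PROJECT_ID", "BQ_DATASET", "BQ_TABLE",
        "BATCH_SIZE", "TOTAL_TWEETS", "NUM_RETRIES",
    ]
    # Fail early with one message that lists every missing key.
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing tw-conf keys: " + ", ".join(missing))
    return PipelineSettings(
        pubsub_topic=env["PUBSUB_TOPIC"],
        project_id=env["PROJECT_ID"],
        bq_dataset=env["BQ_DATASET"],
        bq_table=env["BQ_TABLE"],
        # ConfigMap values are always strings, so convert the numeric ones.
        batch_size=int(env["BATCH_SIZE"]),
        total_tweets=int(env["TOTAL_TWEETS"]),
        num_retries=int(env["NUM_RETRIES"]),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise outside a cluster.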

Specify your Twitter credentials (consumer key/secret, access token/secret) in a Kubernetes secret named tw-sec:

$ kubectl create secret generic tw-sec --from-literal=CONSUMERKEY=<consumerkey> --from-literal=CONSUMERSECRET=<consumersecret> --from-literal=ACCESSTOKEN=<accesstoken> --from-literal=ACCESSTOKENSEC=<accesstokensec>
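
Kubernetes stores Secret values base64-encoded in the object's `data` field, so the output of `kubectl get secret tw-sec -o json` does not show the credentials in plain text. To check what was stored, one could decode that JSON as sketched below. `decode_secret_data` is a hypothetical helper, and the sketch assumes the JSON document is passed in as a string.

```python
import base64
import json


def decode_secret_data(secret_json):
    """Return the decoded key/value pairs from a Secret's JSON document.

    Assumption: secret_json is the text printed by
    `kubectl get secret tw-sec -o json`.
    """
    secret = json.loads(secret_json)
    # Each value under "data" is a base64-encoded string.
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }
```

The decoded mapping should contain the four keys set above: CONSUMERKEY, CONSUMERSECRET, ACCESSTOKEN, and ACCESSTOKENSEC.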

Create pods:

$ kubectl create -f pubsub/bigquery-controller.yaml
$ kubectl create -f pubsub/twitter-stream.yaml
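
The BATCH_SIZE, TOTAL_TWEETS, and NUM_RETRIES settings suggest a loop that groups tweets into batches, stops after a fixed total, and retries failed writes. The sketch below shows one way such a loop could look; it is not the repository's code, and its logic may differ. `batches`, `send_with_retries`, and the `insert_rows` callable are hypothetical, and treating NUM_RETRIES as the total number of attempts is an assumption.

```python
from itertools import islice


def batches(items, batch_size, total):
    """Yield lists of up to batch_size items, stopping after total items."""
    limited = islice(items, total)
    while True:
        batch = list(islice(limited, batch_size))
        if not batch:
            return
        yield batch


def send_with_retries(batch, insert_rows, num_retries):
    """Call insert_rows(batch), trying up to num_retries times in total.

    insert_rows stands in for whatever writes rows to BigQuery.
    """
    last_error = None
    for _ in range(num_retries):
        try:
            return insert_rows(batch)
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
    raise RuntimeError(
        "insert failed after %d attempts" % num_retries
    ) from last_error
```

A caller would iterate `for batch in batches(stream, batch_size, total_tweets)` and pass each batch to `send_with_retries`.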
