Sunbelt Computer Software

SQL Server big data clusters

Pre-requisites

Kubernetes cluster configuration and the kubectl command-line utility
curl utility
sqlcmd and bcp utilities (Installation instructions here for Linux and here for Windows)
Azure Data Studio or SQL Server Management Studio
SQL Server 2019 big data cluster

Installation instructions for SQL Server 2019 big data cluster can be found here.

Samples Setup

Before you begin, run the CMD script bootstrap-sample-db.cmd or the shell script bootstrap-sample-db.sh, depending on your platform. The script performs the following operations:

Downloads the tpcx-bb 1GB sample database
Restores the database on the SQL Master instance
Executes the bootstrap-sample-db.sql script
Exports the web_clickstreams, inventory, customer and product_reviews tables to files
Uploads the web_clickstreams CSV file to HDFS inside the SQL Server 2019 big data cluster
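
The steps above can be sketched as a dry-run plan. This is not the content of the bootstrap scripts; it is a minimal illustration that only builds the command lines and runs nothing. The backup URL, the database name `sales`, the `dbo` schema, the server-side backup path, the HDFS target directory and the Knox gateway path are all assumptions made for the example, and it assumes the downloaded backup has already been made visible to the SQL Server master instance at `backup_path`.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (description, argv)


def bootstrap_plan(
    sql_host: str,
    knox_host: str,
    sql_user: str,
    sql_password: str,
    knox_user: str,
    knox_password: str,
    backup_url: str,                     # hypothetical download location
    database: str = "sales",             # assumed name of the restored database
    backup_path: str = "/var/opt/mssql/data/tpcxbb_1gb.bak",  # assumed path
) -> List[Step]:
    """Build the command lines for each bootstrap step without running them."""
    sql_auth = ["-S", sql_host, "-U", sql_user, "-P", sql_password]
    restore = (
        f"RESTORE DATABASE [{database}] "
        f"FROM DISK = N'{backup_path}' WITH REPLACE"
    )
    steps: List[Step] = [
        ("download backup", ["curl", "-L", "-o", "tpcxbb_1gb.bak", backup_url]),
        ("restore database", ["sqlcmd", *sql_auth, "-Q", restore]),
        ("run bootstrap script",
         ["sqlcmd", *sql_auth, "-d", database, "-i", "bootstrap-sample-db.sql"]),
    ]
    # Export each table in character mode with a comma field terminator.
    for table in ("web_clickstreams", "inventory", "customer", "product_reviews"):
        steps.append((
            f"export {table}",
            ["bcp", f"{database}.dbo.{table}", "out", f"{table}.csv",
             *sql_auth, "-c", "-t", ","],
        ))
    # WebHDFS "create" through the gateway; the path below is an assumption.
    url = (
        f"https://{knox_host}/gateway/default/webhdfs/v1/"
        "clickstream_data/web_clickstreams.csv?op=create&overwrite=true"
    )
    steps.append((
        "upload to HDFS",
        ["curl", "-k", "-L", "-u", f"{knox_user}:{knox_password}",
         "-X", "PUT", url, "-T", "web_clickstreams.csv"],
    ))
    return steps
```

Printing each `argv` with `shlex.join` gives a reviewable list of commands before anything touches the cluster.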

data-pool

A SQL Server 2019 big data cluster contains a data pool, which consists of multiple SQL Server instances that store and query data in a scale-out manner.

Data ingestion using Spark

The sample script data-pool/data-ingestion-spark.sql shows how to perform data ingestion from Spark into data pool table(s).

Data ingestion using T-SQL

The sample script data-pool/data-ingestion-sql.sql shows how to perform data ingestion from T-SQL into data pool table(s).
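
As a rough idea of what T-SQL ingestion into a data pool involves, the helper below renders a script that registers the data pool as an external data source, creates a round-robin distributed external table, and fills it with an INSERT ... SELECT. It is a sketch, not the contents of data-ingestion-sql.sql. The function name, table name and columns are hypothetical; `GO` is the batch separator understood by sqlcmd and Azure Data Studio, not a T-SQL statement.

```python
from typing import Dict


def data_pool_ingest_sql(table: str, columns: Dict[str, str], select_sql: str) -> str:
    """Render T-SQL that creates a data pool table and loads it from a query."""
    column_defs = ", ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    return "\n".join([
        "-- Register the data pool once per database.",
        "IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')",
        "    CREATE EXTERNAL DATA SOURCE SqlDataPool",
        "    WITH (LOCATION = 'sqldatapool://controller-svc/default');",
        "GO",
        "-- Rows are spread across the data pool instances.",
        f"CREATE EXTERNAL TABLE [{table}] ({column_defs})",
        "WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);",
        "GO",
        f"INSERT INTO [{table}]",
        f"{select_sql.rstrip().rstrip(';')};",
    ])


if __name__ == "__main__":
    # Hypothetical example: click counts per user from the sample database.
    print(data_pool_ingest_sql(
        "web_clicks_per_user",
        {"wcs_user_sk": "BIGINT", "clicks": "BIGINT"},
        "SELECT wcs_user_sk, COUNT_BIG(*) AS clicks "
        "FROM sales.dbo.web_clickstreams GROUP BY wcs_user_sk",
    ))
```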

data-virtualization

SQL Server 2019 or SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.

External table over Storage Pool

SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark and SQL Server instances. The data-virtualization/storage-pool folder contains samples that demonstrate how to query data in HDFS inside SQL Server 2019 big data cluster.
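
To give a feel for querying HDFS from T-SQL, the sketch below renders the three objects such a query typically needs: an external data source pointing at the storage pool, a delimited-text file format, and an external table over an HDFS directory. It is an illustration under assumptions, not a copy of the samples in the folder. The function name, the file format name `csv_file`, and the directory and column names in the usage are hypothetical.

```python
from typing import Dict


def hdfs_external_table_sql(table: str, columns: Dict[str, str], hdfs_dir: str) -> str:
    """Render T-SQL that exposes CSV files in an HDFS directory as a table."""
    column_defs = ", ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    return "\n".join([
        "IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'SqlStoragePool')",
        "    CREATE EXTERNAL DATA SOURCE SqlStoragePool",
        "    WITH (LOCATION = 'sqlhdfs://controller-svc/default');",
        "GO",
        "-- Comma-separated text; missing values fall back to type defaults.",
        "CREATE EXTERNAL FILE FORMAT csv_file",
        "WITH (FORMAT_TYPE = DELIMITEDTEXT,",
        "      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', USE_TYPE_DEFAULT = TRUE));",
        "GO",
        f"CREATE EXTERNAL TABLE [{table}] ({column_defs})",
        f"WITH (DATA_SOURCE = SqlStoragePool, LOCATION = '{hdfs_dir}', FILE_FORMAT = csv_file);",
        "GO",
        f"SELECT TOP (10) * FROM [{table}];",
    ])


if __name__ == "__main__":
    # Hypothetical directory and columns for the uploaded clickstream file.
    print(hdfs_external_table_sql(
        "web_clickstreams_hdfs",
        {"wcs_click_date_sk": "BIGINT", "wcs_user_sk": "BIGINT"},
        "/clickstream_data",
    ))
```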

External table over Oracle

SQL Server 2019 uses new ODBC connectors to enable connectivity to SQL Server, Oracle, Teradata, MongoDB and generic ODBC data sources.

The data-virtualization/oracle folder contains samples that demonstrate how to query data in Oracle using external tables.

deployment

The deployment folder contains the scripts for deploying a Kubernetes cluster for SQL Server 2019 big data cluster.

machine-learning

SQL Server 2016 added support for executing R scripts from T-SQL. SQL Server 2017 added support for executing Python scripts from T-SQL. SQL Server 2019 adds support for executing Java code from T-SQL. SQL Server 2019 big data cluster adds support for executing Spark code inside the big data cluster.

SQL Server Machine Learning Services

The machine-learning/sql folder contains the sample SQL scripts that show how to invoke R, Python, and Java code from T-SQL.
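
R and Python code is invoked from T-SQL through the `sp_execute_external_script` stored procedure. The sketch below renders such a call; it is illustrative and not one of the sample scripts. The helper name and the usage values are hypothetical, it assumes external scripts are enabled on the instance, and it covers only R and Python because a Java call is set up differently.

```python
def _quote(value: str) -> str:
    """Escape single quotes for use inside a T-SQL string literal."""
    return value.replace("'", "''")


def external_script_sql(language: str, script: str, input_query: str,
                        result_columns: str) -> str:
    """Render an sp_execute_external_script call for an R or Python script."""
    if language not in ("R", "Python"):
        raise ValueError("this sketch only covers R and Python")
    return "\n".join([
        "EXEC sp_execute_external_script",
        f"    @language = N'{language}',",
        f"    @script = N'{_quote(script)}',",
        f"    @input_data_1 = N'{_quote(input_query)}'",
        f"WITH RESULT SETS (({result_columns}));",
    ])


if __name__ == "__main__":
    # Echo the input rows back unchanged; InputDataSet and OutputDataSet are
    # the default variable names used by the external script runtime.
    print(external_script_sql(
        "Python",
        "OutputDataSet = InputDataSet",
        "SELECT 1 AS x",
        "x INT",
    ))
```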

Spark Machine Learning

The machine-learning/spark folder contains the Spark samples.

PL/B Language Development and Support
