airbnb/tbl-diff

Table Differ

Compare two SQL tables at different levels of granularity:

  • Row count diff
  • Table-level aggregates: diff of the NULL percentage and approximate distinct count for each column
  • Per-segment diff: split the rows into segments by dimension columns and diff each non-dimension column within each segment
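
The first two levels can be illustrated with a minimal sketch. It is not the project's implementation: it uses Python's standard-library sqlite3 module in place of Trino, DuckDB, or Spark, swaps the approximate distinct count for an exact COUNT(DISTINCT ...), and the names column_stats, diff_stats, left_t, and right_t are hypothetical.

```python
import sqlite3


def column_stats(conn, query, columns):
    """Row count plus NULL percentage and distinct count per column for one query."""
    parts = ["COUNT(*)"]
    for col in columns:
        parts.append(f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)")
        # Exact count here; an engine such as Trino offers an approximate variant.
        parts.append(f"COUNT(DISTINCT {col})")
    row = conn.execute(f"SELECT {', '.join(parts)} FROM ({query}) AS t").fetchone()
    row_count = row[0]
    stats = {"row_count": row_count}
    for i, col in enumerate(columns):
        nulls, distinct = row[1 + 2 * i], row[2 + 2 * i]
        stats[col] = {
            "null_pct": 100.0 * nulls / row_count if row_count else 0.0,
            "distinct": distinct,
        }
    return stats


def diff_stats(left, right, columns):
    """Right-minus-left difference for the row count and each column statistic."""
    out = {"row_count": right["row_count"] - left["row_count"]}
    for col in columns:
        out[col] = {
            "null_pct": right[col]["null_pct"] - left[col]["null_pct"],
            "distinct": right[col]["distinct"] - left[col]["distinct"],
        }
    return out


# Hypothetical demo data: two small tables with the same schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE left_t (country TEXT, price REAL)")
conn.execute("CREATE TABLE right_t (country TEXT, price REAL)")
conn.executemany(
    "INSERT INTO left_t VALUES (?, ?)",
    [("US", 10.0), ("US", 20.0), ("FR", None), ("FR", 30.0)],
)
conn.executemany(
    "INSERT INTO right_t VALUES (?, ?)",
    [("US", 10.0), ("US", None), ("FR", None), ("FR", 30.0), ("DE", 30.0)],
)

cols = ["country", "price"]
left_stats = column_stats(conn, "SELECT * FROM left_t", cols)
right_stats = column_stats(conn, "SELECT * FROM right_t", cols)
table_diff = diff_stats(left_stats, right_stats, cols)
print(table_diff)
```

Each side is aggregated on its own, so the comparison only ever handles one row of statistics per query rather than the raw rows.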

Installation

# Basic install
pip install -e .

# With Trino support
pip install -e ".[trino]"

# With DuckDB support
pip install -e ".[duckdb]"

# With Spark support
pip install -e ".[spark]"

# With dev tools
pip install -e ".[dev]"

# Using uv (recommended)
uv pip install -e ".[trino]"

Usage

  1. Provide a left and a right SQL query; no join condition is required
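
One way to see why no join condition is needed: each query can be aggregated independently by its dimension columns, and the per-segment results are then aligned by dimension value outside the database. The sketch below illustrates that idea under stated assumptions and is not the project's actual API. It uses the standard-library sqlite3 module, and stats_by_dimension, diff_by_dimension, left_t, and right_t are hypothetical names.

```python
import sqlite3


def stats_by_dimension(conn, query, dimension, metric):
    """Per-segment row count and NULL count of `metric`, keyed by dimension value."""
    sql = (
        f"SELECT {dimension}, COUNT(*), "
        f"SUM(CASE WHEN {metric} IS NULL THEN 1 ELSE 0 END) "
        f"FROM ({query}) AS t GROUP BY {dimension}"
    )
    return {dim: {"rows": n, "nulls": nulls} for dim, n, nulls in conn.execute(sql)}


def diff_by_dimension(left, right):
    """Align segments by key in Python; a segment missing on one side counts as empty."""
    empty = {"rows": 0, "nulls": 0}
    result = {}
    for key in set(left) | set(right):
        l, r = left.get(key, empty), right.get(key, empty)
        result[key] = {"rows": r["rows"] - l["rows"], "nulls": r["nulls"] - l["nulls"]}
    return result


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE left_t (country TEXT, price REAL)")
conn.execute("CREATE TABLE right_t (country TEXT, price REAL)")
conn.executemany(
    "INSERT INTO left_t VALUES (?, ?)",
    [("US", 10.0), ("US", 20.0), ("FR", None), ("FR", 30.0)],
)
conn.executemany(
    "INSERT INTO right_t VALUES (?, ?)",
    [("US", 10.0), ("US", None), ("FR", None), ("FR", 30.0), ("DE", 30.0)],
)

left_query = "SELECT country, price FROM left_t"
right_query = "SELECT country, price FROM right_t"
segment_diff = diff_by_dimension(
    stats_by_dimension(conn, left_query, "country", "price"),
    stats_by_dimension(conn, right_query, "country", "price"),
)
print(segment_diff)
```

The two queries never appear in the same SQL statement, so they need no shared key beyond the dimension column names.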

Configuration

  1. Set SQL_ENGINE in variables.py
  2. Implement the connection to that engine in get_fetch_raw_query_function, also in variables.py
  3. To store diff results in S3, set S3_ENABLED=True and configure the S3 details in variables.py
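
As a rough illustration of these steps, a variables.py might look something like the sketch below. Only the names SQL_ENGINE, S3_ENABLED, and get_fetch_raw_query_function come from this README. Everything else is an assumption: the function's contract (returning a callable that takes a SQL string and returns rows), the "sqlite" engine value, and the S3_BUCKET and S3_PREFIX settings are hypothetical, so check the real module before copying anything.

```python
# variables.py (hypothetical sketch; the real module's contents may differ)
import sqlite3

# Assumption: a placeholder engine name used only for this sketch.
SQL_ENGINE = "sqlite"

# S3 storage of diff results is off here; the bucket and prefix names are hypothetical.
S3_ENABLED = False
S3_BUCKET = "my-diff-results"
S3_PREFIX = "tbl-diff/"


def get_fetch_raw_query_function():
    """Assumed contract: return a callable that runs a SQL string and returns rows."""
    conn = sqlite3.connect(":memory:")

    def fetch_raw_query(sql):
        # Execute the query on the shared connection and return all rows as tuples.
        return conn.execute(sql).fetchall()

    return fetch_raw_query


fetch = get_fetch_raw_query_function()
rows = fetch("SELECT 1 + 1")
print(rows)
```

Returning a closure keeps the connection details in one place, so swapping engines would only mean changing the body of get_fetch_raw_query_function.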
