docs: replace roadmap with project-status checklist#1

Closed
andygrove wants to merge 15 commits into main from docs/readme-project-status

@andygrove

Owner

Summary

Restructure the README around a feature-status checklist covering the
three ways to execute queries today (SQL, limited DataFrame, DataFusion-Proto
logical plans) plus data-source, result-materialization, and not-yet
sections, and tighten the prose throughout.

Build prerequisites, build/test commands, and the TPC-H test-data
section move from README into CONTRIBUTING.md so the README focuses on
using the library rather than building it.

andygrove and others added 15 commits May 12, 2026 17:25
Seed the project with a minimal end-to-end JNI binding from the JVM to
Apache DataFusion, plus the build, format, and license-check tooling
needed for ongoing contribution.

Java surface (org.apache.datafusion):
- SessionContext: AutoCloseable session, sql(String) returning a lazy
  DataFrame, registerParquet(String, String) for registering local
  Parquet files as SQL tables.
- DataFrame: AutoCloseable, collect(BufferAllocator) executes the plan
  and returns result batches as an Arrow ArrowReader via the Arrow C
  Data Interface. collect() consumes the DataFrame; close() releases
  the native plan if never collected.

Native side (native/, crate datafusion-jni):
- JNI entry points for SessionContext create/close/registerParquet/
  createDataFrame and DataFrame collect/close.
- Results are exported as FFI_ArrowArrayStream so the JVM reads batches
  without per-row JNI crossings or row-by-row copies.

Build and contributor tooling:
- pom.xml with Maven wrapper, JUnit 5, Arrow 19, JDK 17 toolchain.
- apache-rat-plugin (license-header check) and spotless-maven-plugin
  (google-java-format) both bound to the verify phase.
- Makefile targets for native build, JVM build, test, clean, and TPC-H
  SF1 test data generation via tpchgen-cli.
- GitHub Actions workflow running spotless:check and cargo fmt --check
  on push and pull_request to main.
Co-authored-by: Oleks V <comphead@users.noreply.github.com>

Re-enable datafusion's default features (parquet, sql) and add arrow
dependency with the ffi feature so FFI_ArrowArrayStream, ctx.sql, and
register_parquet compile again.

feat: initial seed of Apache DataFusion Java bindings

## Summary

- Removes the entire `protected_branches` block from `.asf.yaml`.
- Removes `.github/workflows/format.yml`.

## Rationale

GitHub Actions has not yet been enabled on the apache repo, so the
`Format` workflow ends in `startup_failure` on every run and never
reports the `Java (spotless)` / `Rust (cargo fmt)` status checks.
Combined with `protected_branches.main.required_status_checks` requiring
those contexts, PRs are unmergeable.

This change clears both sides of the deadlock: the protection block is
removed so future PRs aren't blocked, and the workflow is removed so
we're not carrying a check definition that can't run. Branch protection
and CI can be reintroduced once Actions is enabled by INFRA and the
project's review/CI practices have stabilized.
apache#11)

## Summary

- Sets `github.protected_branches.main: ~` in `.asf.yaml`.

## Rationale

The previous PR (apache#7) removed the `protected_branches` block entirely,
but ASF INFRA only applies settings that are present in `.asf.yaml` — it
does not clear settings when keys are simply omitted. The required
status checks (`Java (spotless)`, `Rust (cargo fmt)`) therefore remain
configured on `main` and continue to block every PR, because the
workflow that would produce those check contexts is also gone.

Explicitly setting `protected_branches.main` to YAML null (`~`) tells
INFRA to clear the existing branch protection rules on `main`. Branch
protection can be reintroduced later once CI is in place.
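
As a rough illustration of the setting described above, the relevant part of `.asf.yaml` would look something like the fragment below. Only the `github.protected_branches.main: ~` key path comes from the commit message; any other keys the real file carries are omitted here.

```yaml
# Sketch of the .asf.yaml fragment described above (other keys omitted).
github:
  protected_branches:
    # YAML null: asks ASF INFRA to clear the existing protection rules on main,
    # rather than leaving them untouched as an omitted key would.
    main: ~
```
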
## Summary

- Adds `.github/workflows/build.yml` running on pushes to `main` and on
PRs targeting `main`.
- Sets up Temurin JDK 17 and a stable Rust toolchain, caches Maven
(`~/.m2`) and cargo (`native/target`) artifacts, then runs `make test`.

## Rationale

The project currently has no CI: `format.yml` was removed in apache#7 to clear
a deadlock when GitHub Actions wasn't yet enabled, and apache#11 nulled out
the stale branch protection rules left behind. With Actions now in
place, we need at least one workflow that exercises the build on every
PR so regressions surface before merge.

`make test` is the canonical full build entry point documented in
`README.md` and `CONTRIBUTING.md` — it depends on the `native` target
(so it builds the Rust crate first) and then runs the JVM JUnit suite,
which is exactly what CI needs to cover.

## What's in this PR

- `.github/workflows/build.yml`: single `build` job on `ubuntu-latest`,
JDK 17 (Temurin) with Maven cache via `actions/setup-java`, Rust stable
via `dtolnay/rust-toolchain`, cargo cache via `Swatinem/rust-cache`
scoped to the `native` workspace, then `make test`.
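
A minimal sketch of what such a workflow could look like, based only on the description above. The workflow name, the checkout step, and the action version tags (`@v4`, `@v2`) are assumptions and are not taken from the PR.

```yaml
# Sketch of .github/workflows/build.yml; name, checkout step, and version tags are assumed.
name: Build

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # JDK 17 (Temurin) with the built-in Maven dependency cache
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
          cache: maven
      # Stable Rust toolchain
      - uses: dtolnay/rust-toolchain@stable
      # Cargo cache scoped to the native/ workspace
      - uses: Swatinem/rust-cache@v2
        with:
          workspaces: native
      # Builds the Rust crate first, then runs the JVM JUnit suite
      - run: make test
```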

## Not in this PR

- Spotless / Apache RAT / clippy / `cargo fmt` checks. These run at
Maven's `verify` phase or via separate cargo commands and aren't wired
into the Makefile yet; they can be added in a follow-up once we decide
whether to extend the Makefile or invoke them directly from CI.
- Matrix builds across OS or JDK versions — start minimal, expand if
needed.
Rewrite the README around a feature checklist covering the three query
interfaces (SQL, limited DataFrame, DataFusion-Proto logical plans),
data sources, result materialization, and not-yet items. Move build
prerequisites, build/test commands, and the TPC-H test-data section
from README into CONTRIBUTING.md so the README focuses on using the
library rather than building it.
@andygrove andygrove closed this May 13, 2026
@andygrove andygrove deleted the docs/readme-project-status branch May 13, 2026 06:41