docs: replace roadmap with project-status checklist #1
Closed
andygrove wants to merge 15 commits into
Seed the project with a minimal end-to-end JNI binding from the JVM to Apache DataFusion, plus the build, format, and license-check tooling needed for ongoing contribution.

Java surface (org.apache.datafusion):
- SessionContext: AutoCloseable session, sql(String) returning a lazy DataFrame, registerParquet(String, String) for registering local Parquet files as SQL tables.
- DataFrame: AutoCloseable, collect(BufferAllocator) executes the plan and returns result batches as an Arrow ArrowReader via the Arrow C Data Interface. collect() consumes the DataFrame; close() releases the native plan if never collected.

Native side (native/, crate datafusion-jni):
- JNI entry points for SessionContext create/close/registerParquet/createDataFrame and DataFrame collect/close.
- Results are exported as FFI_ArrowArrayStream so the JVM reads batches without per-row JNI crossings or row-by-row copies.

Build and contributor tooling:
- pom.xml with Maven wrapper, JUnit 5, Arrow 19, JDK 17 toolchain.
- apache-rat-plugin (license-header check) and spotless-maven-plugin (google-java-format) both bound to the verify phase.
- Makefile targets for native build, JVM build, test, clean, and TPC-H SF1 test data generation via tpchgen-cli.
- GitHub Actions workflow running spotless:check and cargo fmt --check on push and pull_request to main.
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Re-enable datafusion's default features (parquet, sql) and add arrow dependency with the ffi feature so FFI_ArrowArrayStream, ctx.sql, and register_parquet compile again.
feat: initial seed of Apache DataFusion Java bindings
## Summary

- Removes the entire `protected_branches` block from `.asf.yaml`.
- Removes `.github/workflows/format.yml`.

## Rationale

GitHub Actions has not yet been enabled on the apache repo, so the `Format` workflow ends in `startup_failure` on every run and never reports the `Java (spotless)` / `Rust (cargo fmt)` status checks. Combined with `protected_branches.main.required_status_checks` requiring those contexts, PRs are unmergeable.

This change clears both sides of the deadlock: the protection block is removed so future PRs aren't blocked, and the workflow is removed so we're not carrying a check definition that can't run. Branch protection and CI can be reintroduced once Actions is enabled by INFRA and the project's review/CI practices have stabilized.
apache#11)

## Summary

- Sets `github.protected_branches.main: ~` in `.asf.yaml`.

## Rationale

The previous PR (apache#7) removed the `protected_branches` block entirely, but ASF INFRA only applies settings that are present in `.asf.yaml`; it does not clear settings when keys are simply omitted. The required status checks (`Java (spotless)`, `Rust (cargo fmt)`) therefore remain configured on `main` and continue to block every PR, because the workflow that would produce those check contexts is also gone.

Explicitly setting `protected_branches.main` to YAML null (`~`) tells INFRA to clear the existing branch protection rules on `main`. Branch protection can be reintroduced later once CI is in place.
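For illustration, the setting described above would look roughly like the fragment below. This is a minimal sketch of only the relevant keys; any other content in the project's `.asf.yaml` is omitted, and the comment wording is not taken from the actual file.

```yaml
# .asf.yaml (fragment, other keys omitted)
github:
  protected_branches:
    # YAML null: present in the file, so INFRA applies it and clears
    # the existing protection rules on main. Omitting the key entirely
    # would leave the old rules in place.
    main: ~
```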
## Summary

- Adds `.github/workflows/build.yml` running on pushes to `main` and on PRs targeting `main`.
- Sets up Temurin JDK 17 and a stable Rust toolchain, caches Maven (`~/.m2`) and cargo (`native/target`) artifacts, then runs `make test`.

## Rationale

The project currently has no CI: `format.yml` was removed in apache#7 to clear a deadlock when GitHub Actions wasn't yet enabled, and apache#11 nulled out the stale branch protection rules left behind. With Actions now in place, we need at least one workflow that exercises the build on every PR so regressions surface before merge.

`make test` is the canonical full build entry point documented in `README.md` and `CONTRIBUTING.md`: it depends on the `native` target (so it builds the Rust crate first) and then runs the JVM JUnit suite, which is exactly what CI needs to cover.

## What's in this PR

- `.github/workflows/build.yml`: single `build` job on `ubuntu-latest`, JDK 17 (Temurin) with Maven cache via `actions/setup-java`, Rust stable via `dtolnay/rust-toolchain`, cargo cache via `Swatinem/rust-cache` scoped to the `native` workspace, then `make test`.

## Not in this PR

- Spotless / Apache RAT / clippy / `cargo fmt` checks. These run at Maven's `verify` phase or via separate cargo commands and aren't wired into the Makefile yet; they can be added in a follow-up once we decide whether to extend the Makefile or invoke them directly from CI.
- Matrix builds across OS or JDK versions: start minimal, expand if needed.
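A workflow matching that description might look something like the sketch below. It is not the file from the PR: the workflow name, step names, and the pinned action versions (`@v4`, `@v2`, `@stable`) are assumptions made for illustration, while the runner, JDK distribution and version, cache scoping, and `make test` entry point follow the description above.

```yaml
# Hypothetical sketch of .github/workflows/build.yml.
# Action versions and step names are assumptions, not taken from the PR.
name: Build

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 17 (Temurin)
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
          cache: maven          # caches the local Maven repository (~/.m2)

      - name: Set up stable Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo artifacts
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: native    # cache native/target rather than ./target

      - name: Build and test
        run: make test          # builds the Rust crate, then runs the JUnit suite
```

Keeping the job to a single `make test` step means CI and local development share one entry point, so a contributor can reproduce a CI failure by running the same command.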
Rewrite the README around a feature checklist covering the three query interfaces (SQL, limited DataFrame, DataFusion-Proto logical plans), data sources, result materialization, and not-yet items. Move build prerequisites, build/test commands, and the TPC-H test-data section from README into CONTRIBUTING.md so the README focuses on using the library rather than building it.

Summary
Restructure the README around a feature-status checklist covering the
three ways to execute queries today (SQL, limited DataFrame, DataFusion-Proto
logical plans) plus data-source, result-materialization, and not-yet
sections, and tighten the prose throughout.
Build prerequisites, build/test commands, and the TPC-H test-data
section move from README into CONTRIBUTING.md so the README focuses on
using the library rather than building it.