This file provides guidance for AI coding agents working with the Apache Flink codebase.
- Java 11, 17 (default), or 21. Java 11 syntax must be used in all modules. Java 17 syntax (records, sealed classes, pattern matching) is only permitted in the `flink-tests-java17` module.
- Maven 3.8.6 (Maven wrapper `./mvnw` included; prefer it)
- Git
- Unix-like environment (Linux, macOS, WSL, Cygwin)
- Fast dev build: `./mvnw clean install -DskipTests -Dfast -Pskip-webui-build -T1C`
- Full build (Java 17 default): `./mvnw clean package -DskipTests -Djdk17 -Pjava17-target`
- Java 11: `./mvnw clean package -DskipTests -Djdk11 -Pjava11-target`
- Java 21: `./mvnw clean package -DskipTests -Djdk21 -Pjava21-target`
- Full build with tests: `./mvnw clean verify`
- Single module: `./mvnw clean package -DskipTests -pl flink-core-api`
- Single module with tests: `./mvnw clean verify -pl flink-core-api`
- Single test class: `./mvnw -pl flink-core-api -Dtest=MemorySizeTest test`
- Single test method: `./mvnw -pl flink-core-api -Dtest=MemorySizeTest#testParseBytes test`
- Format code (Java + Scala): `./mvnw spotless:apply`
- Check formatting: `./mvnw spotless:check`
- Checkstyle: `./mvnw checkstyle:check -T1C`
- Checkstyle config: `tools/maven/checkstyle.xml`
Every module from the root `pom.xml`, organized by function. Flink provides three main user-facing APIs (recommended in this order: SQL, Table API, DataStream API) plus a newer DataStream v2 API.
- `flink-annotations` — Stability annotations (`@Public`, `@PublicEvolving`, `@Internal`, `@Experimental`) and `@VisibleForTesting`
- `flink-core-api` — Core API interfaces (functions, state, types) shared by all APIs
- `flink-core` — Core implementation (type system, serialization, memory management, configuration)
- `flink-runtime` — Distributed runtime (JobManager, TaskManager, scheduling, network, state)
- `flink-clients` — CLI and client-side job submission
- `flink-rpc/` — RPC framework
  - `flink-rpc-core` — RPC interfaces
  - `flink-rpc-akka`, `flink-rpc-akka-loader` — Pekko-based RPC implementation
- `flink-table/`
  - `flink-sql-parser` — SQL parser (extends the Calcite SQL parser)
  - `flink-table-common` — Shared types, descriptors, catalog interfaces
  - `flink-table-api-java` — Table API for Java
  - `flink-table-api-scala` — Table API for Scala
  - `flink-table-api-bridge-base`, `flink-table-api-java-bridge`, `flink-table-api-scala-bridge` — Bridges between the Table and DataStream APIs
  - `flink-table-api-java-uber` — Uber JAR for the Table API
  - `flink-table-planner` — SQL/Table query planning and optimization (Calcite-based)
  - `flink-table-planner-loader`, `flink-table-planner-loader-bundle` — Classloader isolation for the planner
  - `flink-table-runtime` — Runtime operators for Table/SQL queries
  - `flink-table-calcite-bridge` — Bridge to Apache Calcite
  - `flink-sql-gateway-api`, `flink-sql-gateway` — SQL Gateway for remote SQL execution
  - `flink-sql-client` — Interactive SQL CLI
  - `flink-sql-jdbc-driver`, `flink-sql-jdbc-driver-bundle` — JDBC driver for the SQL Gateway
  - `flink-table-code-splitter` — Code generation utilities
  - `flink-table-test-utils` — Test utilities for Table/SQL
- `flink-streaming-java` — DataStream API and stream processing operator implementations
- `flink-datastream-api` — DataStream v2 API definitions
- `flink-datastream` — DataStream v2 API implementation
- `flink-connectors/`
  - `flink-connector-base` — Base classes for source/sink connectors
  - `flink-connector-files` — Unified file system source and sink
  - `flink-connector-datagen` — DataGen source for testing
  - `flink-connector-datagen-test` — Tests for the DataGen connector
  - `flink-hadoop-compatibility` — Hadoop InputFormat/OutputFormat compatibility
  - `flink-file-sink-common` — Common file sink utilities
- Most connectors (Kafka, JDBC, Elasticsearch, etc.) live in separate repos under github.com/apache; see README.md for the full list
- `flink-formats/`
  - `flink-json`, `flink-csv`, `flink-avro`, `flink-parquet`, `flink-orc`, `flink-protobuf` — Serialization formats
  - `flink-avro-confluent-registry` — Avro with Confluent Schema Registry
  - `flink-sequence-file`, `flink-compress`, `flink-hadoop-bulk`, `flink-orc-nohive` — Hadoop-related formats
  - `flink-format-common` — Shared format utilities
  - `flink-sql-json`, `flink-sql-csv`, `flink-sql-avro`, `flink-sql-parquet`, `flink-sql-orc`, `flink-sql-protobuf` — SQL-layer format integrations
  - `flink-sql-avro-confluent-registry` — SQL-layer Avro with Confluent Schema Registry
- `flink-state-backends/`
  - `flink-statebackend-rocksdb` — RocksDB state backend
  - `flink-statebackend-forst` — ForSt state backend (experimental; a fork of RocksDB)
  - `flink-statebackend-heap-spillable` — Heap-based spillable state backend
  - `flink-statebackend-changelog` — Changelog state backend
  - `flink-statebackend-common` — Shared state backend utilities
- `flink-dstl/`
  - `flink-dstl-dfs` — State changelog storage (DFS-based persistent changelog for incremental checkpointing)
- `flink-filesystems/`
  - `flink-hadoop-fs` — Hadoop FileSystem abstraction
  - `flink-s3-fs-hadoop`, `flink-s3-fs-presto`, `flink-s3-fs-base` — S3 file systems
  - `flink-oss-fs-hadoop` — Alibaba OSS
  - `flink-azure-fs-hadoop` — Azure Blob Storage
  - `flink-gs-fs-hadoop` — Google Cloud Storage
  - `flink-fs-hadoop-shaded` — Shaded Hadoop dependencies
- `flink-queryable-state/`
  - `flink-queryable-state-runtime` — Server-side queryable state service
  - `flink-queryable-state-client-java` — Client for querying operator state from running jobs
- `flink-kubernetes` — Kubernetes integration
- `flink-yarn` — YARN integration
- `flink-dist`, `flink-dist-scala` — Distribution packaging
- `flink-container` — Container entry point and utilities for containerized deployments
- `flink-metrics/`
  - `flink-metrics-core` — Metrics API and core implementation
  - Reporter implementations: `flink-metrics-jmx`, `flink-metrics-prometheus`, `flink-metrics-datadog`, `flink-metrics-statsd`, `flink-metrics-graphite`, `flink-metrics-influxdb`, `flink-metrics-slf4j`, `flink-metrics-dropwizard`, `flink-metrics-otel`
- `flink-libraries/`
  - `flink-cep` — Complex Event Processing
  - `flink-state-processing-api` — Offline state access (savepoint reading/writing)
- `flink-models` — AI model integration (sub-modules: `flink-model-openai`, `flink-model-triton`)
- `flink-python` — PyFlink (Python API)
- `flink-runtime-web` — Web UI for the JobManager dashboard
- `flink-external-resources` — External resource management (e.g., GPU)
- `docs/` — Documentation content (Hugo site). This is where user-facing docs are written.
- `flink-docs` — Documentation build module (auto-generated config reference docs)
- `flink-examples` — Example programs
- `flink-quickstart` — Maven archetype for new projects
- `flink-walkthroughs` — Tutorial walkthrough projects
- `flink-tests` — Integration tests
- `flink-end-to-end-tests` — End-to-end tests
- `flink-test-utils-parent` — Test utility classes
- `flink-yarn-tests` — YARN-specific tests
- `flink-fs-tests` — FileSystem tests
- `flink-architecture-tests` — ArchUnit architectural boundary tests
- `tools/ci/flink-ci-tools` — CI tooling
- Client submits jobs to the cluster. Submission paths include the CLI (`bin/flink run` via `flink-clients`), the SQL Client (`bin/sql-client.sh` via `flink-sql-client`), the SQL Gateway (`flink-sql-gateway`, also accessible via the JDBC driver), the REST API (direct HTTP to the JobManager), programmatic execution (`StreamExecutionEnvironment.execute()` or `TableEnvironment.executeSql()`), and PyFlink (`flink-python`, which wraps the Java APIs).
- JobManager (`flink-runtime`) orchestrates execution: it receives jobs, creates the execution graph, manages scheduling, coordinates checkpoints, and handles failover. It never runs user code directly.
- TaskManager (`flink-runtime`) executes the user's operators in task slots and manages network buffers, state backends, and I/O.
- Table Planner (`flink-table-planner`) translates SQL/Table API programs into DataStream programs. The planner is loaded in a separate classloader (`flink-table-planner-loader`) to isolate Calcite dependencies.
- Connectors communicate with external systems. Source connectors implement the `Source` API (FLIP-27); sinks implement the `Sink` API (package `sink2`). Most connectors are externalized to separate repositories.
- State Backends persist keyed state and operator state. RocksDB is the primary backend for production use.
- Checkpointing provides exactly-once guarantees. The JobManager coordinates barriers through the data stream; TaskManagers snapshot local state to a distributed file system.
Key separations:
- Planner vs Runtime: The table planner generates code and execution plans; the runtime executes them. Changes to planning logic live in `flink-table-planner`; changes to runtime operators live in `flink-table-runtime` or `flink-streaming-java`.
- API vs Implementation: Public API surfaces (`flink-core-api`, `flink-datastream-api`, `flink-table-api-java`) are separate from implementation modules. API stability annotations control what users can depend on.
- ArchUnit enforcement: `flink-architecture-tests/` contains ArchUnit tests that enforce module boundaries. New violations should be avoided; if unavoidable, follow the freeze procedure in `flink-architecture-tests/README.md`.
This section maps common types of Flink changes to the modules they touch and the verification they require.
- Register in `flink-table-common` in `BuiltInFunctionDefinitions.java` (definition, input/output type strategies, runtime class reference)
- Implement in `flink-table-runtime` under `functions/` (extend the appropriate base class: `BuiltInScalarFunction`, `BuiltInTableFunction`, `BuiltInAggregateFunction`, or `BuiltInProcessTableFunction`)
- Add tests in `flink-table-planner` and `flink-table-runtime`
- Extend Table API support
- Document in `docs/`
- See `flink-table/flink-table-planner/AGENTS.md` and `flink-table/flink-table-runtime/AGENTS.md` for detailed patterns
- Define a `ConfigOption<T>` in the relevant config class (e.g., `ExecutionConfigOptions.java` in `flink-table-api-java`)
- Use the `ConfigOptions.key("table.exec....")` builder with type, default value, and description
- Add the `@Documentation.TableOption` annotation for auto-generated docs
- Document in `docs/` if user-facing
- Verify: unit test for the default value, ITCase for behavior changes
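A minimal sketch of such a definition is shown below; the option key and field name are hypothetical, not real Flink options.

```java
// Hypothetical option; "table.exec.example.enabled" is illustrative only.
@Documentation.TableOption(execMode = Documentation.ExecMode.BATCH_STREAMING)
public static final ConfigOption<Boolean> TABLE_EXEC_EXAMPLE_ENABLED =
        ConfigOptions.key("table.exec.example.enabled")
                .booleanType()
                .defaultValue(false)
                .withDescription(
                        "Enables the hypothetical example optimization. "
                                + "Description text feeds the generated docs.");
```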
- Involves `flink-table-runtime` (operator), `flink-table-planner` (ExecNode, physical/logical rules), and tests across both
- See `flink-table/flink-table-planner/AGENTS.md` and `flink-table/flink-table-runtime/AGENTS.md` for detailed development order and testing patterns
- Implement the `Source` API (`flink-connector-base`): `SplitEnumerator`, `SourceReader`, `SourceSplit`, serializers (`SimpleVersionedSerializer`)
- Or implement the `Sink` API (package `sink2`) for sinks
- Most new connectors go in separate repos under github.com/apache, not in the main Flink repo
- Verify: unit tests + ITCase with a real or embedded external system
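The shape of a FLIP-27 source can be sketched as below. All `Example*` classes are hypothetical placeholders for your connector's types; only the `Source` interface methods are real.

```java
// Hypothetical FLIP-27 source. ExampleSplit, ExampleEnumState, and the
// reader/enumerator/serializer classes are placeholders, not Flink classes.
public class ExampleSource implements Source<String, ExampleSplit, ExampleEnumState> {

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.CONTINUOUS_UNBOUNDED;
    }

    @Override
    public SourceReader<String, ExampleSplit> createReader(SourceReaderContext context) {
        return new ExampleSourceReader(context);
    }

    @Override
    public SplitEnumerator<ExampleSplit, ExampleEnumState> createEnumerator(
            SplitEnumeratorContext<ExampleSplit> context) {
        return new ExampleSplitEnumerator(context);
    }

    @Override
    public SplitEnumerator<ExampleSplit, ExampleEnumState> restoreEnumerator(
            SplitEnumeratorContext<ExampleSplit> context, ExampleEnumState checkpoint) {
        // Rebuild enumerator state from the last completed checkpoint.
        return new ExampleSplitEnumerator(context, checkpoint);
    }

    @Override
    public SimpleVersionedSerializer<ExampleSplit> getSplitSerializer() {
        return new ExampleSplitSerializer();
    }

    @Override
    public SimpleVersionedSerializer<ExampleEnumState> getEnumeratorCheckpointSerializer() {
        return new ExampleEnumStateSerializer();
    }
}
```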
- Changes to a `TypeSerializer` require a corresponding `TypeSerializerSnapshot` for migration
- Bump the version in `getCurrentVersion()`; handle old versions in `readSnapshot()`
- The snapshot must have a no-arg constructor for reflection-based deserialization
- Implement `resolveSchemaCompatibility()` for upgrade paths
- Verify: serializer snapshot migration tests, checkpoint restore tests across versions
- New user-facing API requires a voted FLIP (Flink Improvement Proposal); this applies to `@Public`, `@PublicEvolving`, and `@Experimental`, since users build against all three
- Every user-facing API class and method must carry a stability annotation
- Changes to existing `@Public` or `@PublicEvolving` APIs must maintain backward compatibility
- `@Internal` APIs can be changed freely; users should not depend on them
- Update the JavaDoc on the changed class/method
- Add to release notes
- Verify: ArchUnit tests pass, no new architecture violations
- Format Java files with Spotless immediately after editing: `./mvnw spotless:apply`. Uses google-java-format with AOSP style.
- Scala formatting: Spotless + scalafmt (config at `.scalafmt.conf`, maxColumn 100).
- Checkstyle: `tools/maven/checkstyle.xml` (version defined in the root `pom.xml` as `checkstyle.version`). Some modules (`flink-core`, `flink-optimizer`, `flink-runtime`) are not covered by checkstyle enforcement, but conventions should still be followed.
- No new Scala code. All Flink Scala APIs are deprecated per FLIP-265. Write all new code in Java.
- Apache License 2.0 header required on all new files (enforced by Apache Rat). Use an HTML comment for markdown files.
- API stability annotations: Every user-facing API class and method must have a stability annotation. `@Public` (stable across minor releases), `@PublicEvolving` (may change in minor releases), `@Experimental` (may change at any time). These are all part of the public API surface that users build against. `@Internal` marks APIs with no stability guarantees that users should not depend on.
- Logging: Use parameterized log statements (SLF4J `{}` placeholders), never string concatenation.
- No Java serialization for new features (except internal RPC message transport).
- Use `final` for variables and fields where applicable.
- Comments: Do not add unnecessary comments that restate what the code does. Add comments that explain "the why" where relevant.
- Reuse existing code. Before implementing new utilities or abstractions, search for existing ones in the codebase. Prioritize architecture consistency and code reusability.
- Full code style guide: https://flink.apache.org/how-to-contribute/code-style-and-quality-preamble/
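The logging rule can be illustrated with the fragment below (the logger name and variables are hypothetical):

```java
private static final Logger LOG = LoggerFactory.getLogger(ExampleOperator.class);

// Good: SLF4J placeholders; the message is only formatted if INFO is enabled.
LOG.info("Restored {} state entries for subtask {}.", numEntries, subtaskIndex);

// Bad: string concatenation always pays the formatting cost, even when
// the log level is disabled.
LOG.info("Restored " + numEntries + " state entries for subtask " + subtaskIndex + ".");
```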
- Add tests for new behavior, covering success, failure, and edge cases.
- Use JUnit 5 + AssertJ assertions. Do not use JUnit 4 or Hamcrest in new test code.
- Prefer real test implementations over Mockito mocks where possible.
- Integration tests: Name classes with the `ITCase` suffix (e.g., `MyFeatureITCase.java`).
- Red-green verification: For bug fixes, verify that new tests actually fail without the fix before confirming they pass with it.
- Test location mirrors source structure within each module.
- Follow the testing conventions at https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing
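A minimal test in the expected style might look like this; `MemoryBudget` and its methods are hypothetical, standing in for the class under test:

```java
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

import org.junit.jupiter.api.Test;

// JUnit 5 + AssertJ; no JUnit 4, Hamcrest, or Mockito.
class MemoryBudgetTest {

    @Test
    void parseAcceptsKilobytes() {
        assertThat(MemoryBudget.parse("2kb").getBytes()).isEqualTo(2048L);
    }

    @Test
    void parseRejectsNegativeSize() {
        assertThatThrownBy(() -> MemoryBudget.parse("-1kb"))
                .isInstanceOf(IllegalArgumentException.class);
    }
}
```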
- `[FLINK-XXXX][component] Description`, where FLINK-XXXX is the JIRA issue number
- `[hotfix][component] Description` for typo fixes without a JIRA issue
- Each commit must have a meaningful message including the JIRA ID. If you don't know the ticket number, ask.
- Separate cleanup/refactoring from functional changes into distinct commits
- When AI tools were used: add a `Generated-by: <Tool Name and Version>` trailer per ASF generative tooling guidance
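Putting these rules together, a commit message might look like the example below. The ticket number, component, and body are invented for illustration:

```
[FLINK-12345][table-planner] Fix nullability inference for COALESCE

The planner derived a non-null result type even when all inputs were
nullable, causing a NullPointerException at runtime. Derive the result
nullability from the input types instead.

Generated-by: ExampleAgent 1.0
```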
- Title format: `[FLINK-XXXX][component] Title of the pull request`
- A corresponding JIRA issue is required (except hotfixes for typos)
- Fill out the PR template completely but concisely: describe purpose, change log, testing approach, impact assessment
- Each PR should address exactly one issue
- Ensure `./mvnw clean verify` passes before opening a PR
- Always push to your fork, not directly to `apache/flink`
- Rebase onto the latest target branch before submitting
- Disclose AI usage by checking the AI disclosure checkbox and uncommenting the `Generated-by` line in the PR template
- Add `Generated-by: <Tool Name and Version>` to commit messages
- Never add `Co-Authored-By` with an AI agent as co-author; agents are assistants, not authors
- You must be able to explain the design, code, and tests, debug them, and respond to review feedback substantively
- Reviewer-ready quality bar: the author owns PR quality. PRs that look AI-generated without author refinement (walls of unreviewed prose, scaffolding without behaviour, tests that do not exercise the change, padded commit messages) will be closed without review
- Adding or changing `@Public`, `@PublicEvolving`, or `@Experimental` annotations (these are user-facing API commitments requiring a FLIP)
- Large cross-module refactors
- New dependencies
- Changes to serialization formats (affects state compatibility)
- Changes to checkpoint/savepoint behavior
- Changes that could impact performance on hot paths (per-record processing, serialization, state access)
- Commit secrets, credentials, or tokens
- Push directly to `apache/flink`; always work from your fork
- Mix unrelated changes into one PR
- Use Java serialization for new features
- Edit generated files by hand when a generation workflow exists
- Use the legacy `SourceFunction` or `SinkFunction` interfaces for connectors; use the `Source` API (FLIP-27) and `Sink` API (package `sink2`) instead
- Add `Co-Authored-By` with an AI agent as co-author in commit messages; AI agents are assistants, not authors. Use `Generated-by: <Tool Name and Version>` instead.
- Suppress or bypass checkstyle rules (no `CHECKSTYLE:ON`/`CHECKSTYLE:OFF` comments, no adding entries to `tools/maven/suppressions.xml`, no `@SuppressWarnings`). Fix the code to satisfy checkstyle instead.
- Add, change, or remove classes outside the `org.apache.flink.*` package (for example, classes copied from Calcite)
- Modify `Parser.jj` (Calcite's generated parser grammar; expected to be removed in future Calcite upgrades)
- Use destructive git operations unless explicitly requested
- README.md — Build instructions and project overview
- DEVELOPMENT.md — IDE setup and development environment
- .github/CONTRIBUTING.md — Contribution process
- .github/PULL_REQUEST_TEMPLATE.md — PR checklist
- Code Style Guide — Detailed coding guidelines
- ASF Generative Tooling Guidance — AI tooling policy
