edg956 · 2026-03-14T02:05:50Z

Summary

Implements #21475

Add support for auto-classification (PII detection) on storage service containers (S3, GCS, etc.), enabling automatic tagging of sensitive data in files stored in cloud storage.

This extends OpenMetadata's existing auto-classification capabilities from database tables to storage containers with structured data (CSV, Parquet, etc.).

Changes

Schema & API

Add sampleData field to container.json schema for storing sample data
Create storageServiceAutoClassificationPipeline.json schema defining configuration for storage service auto-classification workflows
Add REST endpoints in ContainerResource for sample data operations: PUT/GET/DELETE /{id}/sampleData
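
As a rough illustration of the three sample data operations listed above, the sketch below builds the HTTP method and path for each one. The `/v1/containers` prefix and the `sample_data_request` helper are assumptions made for illustration; the PR description only states the `/{id}/sampleData` suffix.

```python
from uuid import UUID

# Assumed resource prefix; only the `/{id}/sampleData` suffix comes from the PR text.
CONTAINERS_PATH = "/v1/containers"


def sample_data_request(method: str, container_id: UUID) -> tuple[str, str]:
    """Return (HTTP method, path) for a container sample data operation."""
    allowed = {"PUT", "GET", "DELETE"}
    method = method.upper()
    if method not in allowed:
        raise ValueError(
            f"unsupported method {method!r}; expected one of {sorted(allowed)}"
        )
    return method, f"{CONTAINERS_PATH}/{container_id}/sampleData"
```

The returned pair can be handed to whatever HTTP client is in use; authorization (VIEW_SAMPLE_DATA / EDIT_SAMPLE_DATA) is enforced server-side and is not modelled here.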

Backend (Java)

ContainerRepository: Implement sample data persistence and retrieval from entity_extension table
EntityRepository: Refactor validateColumn() to support both Table and Container column validation
PIIMasker: Extend PII masking to support Container entities with proper tag-based masking
ContainerResource: Add authorization for VIEW_SAMPLE_DATA and EDIT_SAMPLE_DATA operations

Ingestion Framework (Python)

Storage Samplers: Implement StorageSampler base class with S3 and GCS concrete implementations for reading structured files
Fetcher Strategy: Add StorageFetcherStrategy for fetching Container entities from storage services
SamplerProcessor: Extend to handle Container entities alongside Table entities
PII Processor: Update to classify container columns using ClassifiableEntityType union (Table | Container)
Metadata Sink: Add Container sample data ingestion via OMetaContainerMixin
Patch Mixin: Support Container dataModel column tag updates

Testing

Add comprehensive integration tests for container classification (MinIO/S3 with PII detection)
Add unit tests for StorageFetcherStrategy filtering and SamplerProcessor container handling
Reorganize auto-classification tests by entity type (databases/ and containers/)

Bug Fixes

Fix sample data not being returned when a container is requested with fields=["sampleData"]; ContainerRepository.setFields() now handles the sampleData field.

Type of Change

New feature
Bug fix (sample data retrieval)

Test Plan

Unit Tests

cd ingestion
pytest tests/unit/profiler/test_container_fetcher.py
pytest tests/unit/sampler/test_container_sampler_processor.py

Integration Tests

cd ingestion
pytest tests/integration/auto_classification/containers/

Manual Testing

Configure a storage service (S3/GCS) with structured files containing PII
Run storage service metadata ingestion
Run storage service auto-classification pipeline with storeSampleData: true
Verify containers have:
- PII tags on sensitive columns (email, SSN, credit card, etc.)
- Sample data stored and retrievable via API
- PII masking applied when user lacks authorization
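
To make the last check concrete, here is a minimal sketch of what tag-based masking of sample data could look like. It is a stand-in written for illustration, not the PIIMasker code from this PR; the `mask_sample_rows` helper and the `MASKED_VALUE` placeholder string are assumptions.

```python
from typing import Any, List, Set

# Placeholder shown instead of sensitive values; the exact string is an assumption.
MASKED_VALUE = "********"


def mask_sample_rows(
    columns: List[str],
    rows: List[List[Any]],
    sensitive_columns: Set[str],
    can_view_pii: bool,
) -> List[List[Any]]:
    """Return rows with values of sensitive columns masked for unauthorized users."""
    if can_view_pii:
        return rows
    # Positions of the columns that carry a sensitive PII tag.
    sensitive_idx = {i for i, name in enumerate(columns) if name in sensitive_columns}
    return [
        [MASKED_VALUE if i in sensitive_idx else value for i, value in enumerate(row)]
        for row in rows
    ]
```

When verifying manually, an unauthorized user should see the placeholder only in columns that were tagged as sensitive, while the other columns stay readable.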

Checklist

I have read the CONTRIBUTING document
I have added tests around the new logic
I have added a test that covers the bug fix scenario
For JSON Schema changes: Updated to add new pipeline type and container field
Code formatted with mvn spotless:apply, make py_format

Summary by Gitar

UI/UX Enhancements:
- Added EntityTabs.SAMPLE_DATA to CONTAINER_DEFAULT_TABS to expose the new sample data feature in the UI.

github-actions · 2026-03-27T13:49:49Z

github-actions · 2026-03-27T13:56:57Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.13)`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`libpng-dev`	CVE-2026-33416	🚨 HIGH	1.6.39-2+deb12u3	1.6.39-2+deb12u4
`libpng-dev`	CVE-2026-33636	🚨 HIGH	1.6.39-2+deb12u3	1.6.39-2+deb12u4
`libpng16-16`	CVE-2026-33416	🚨 HIGH	1.6.39-2+deb12u3	1.6.39-2+deb12u4
`libpng16-16`	CVE-2026-33636	🚨 HIGH	1.6.39-2+deb12u3	1.6.39-2+deb12u4

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (37)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http`	CVE-2026-33870	🚨 HIGH	4.1.96.Final	4.1.132.Final, 4.2.10.Final
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	CVE-2026-33871	🚨 HIGH	4.1.96.Final	4.1.132.Final, 4.2.11.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.spark:spark-core_2.12`	CVE-2025-54920	🚨 HIGH	3.5.6	3.5.7
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (15)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`apache-airflow`	CVE-2026-26929	🚨 HIGH	3.1.5	3.1.8
`apache-airflow`	CVE-2026-28779	🚨 HIGH	3.1.5	3.1.8
`apache-airflow`	CVE-2026-30911	🚨 HIGH	3.1.5	3.1.8
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`jaraco.context`	CVE-2026-23949	🚨 HIGH	5.3.0	6.1.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`pyOpenSSL`	CVE-2026-27459	🚨 HIGH	24.1.0	26.0.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/extended_sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/lineage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data_aut.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

No Vulnerabilities Found

github-actions · 2026-03-27T13:57:47Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion:trivy (debian 12.12)`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`libpam-modules`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-modules-bin`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-runtime`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam0g`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (37)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http`	CVE-2026-33870	🚨 HIGH	4.1.96.Final	4.1.132.Final, 4.2.10.Final
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	CVE-2026-33871	🚨 HIGH	4.1.96.Final	4.1.132.Final, 4.2.11.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.spark:spark-core_2.12`	CVE-2025-54920	🚨 HIGH	3.5.6	3.5.7
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (33)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`Authlib`	CVE-2026-27962	🔥 CRITICAL	1.6.6	1.6.9
`Authlib`	CVE-2026-28490	🚨 HIGH	1.6.6	1.6.9
`Authlib`	CVE-2026-28498	🚨 HIGH	1.6.6	1.6.9
`Authlib`	CVE-2026-28802	🚨 HIGH	1.6.6	1.6.7
`PyJWT`	CVE-2026-32597	🚨 HIGH	2.10.1	2.12.0
`Werkzeug`	CVE-2024-34069	🚨 HIGH	2.2.3	3.0.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.12.12	3.13.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.13.2	3.13.3
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`apache-airflow`	CVE-2026-26929	🚨 HIGH	3.1.5	3.1.8
`apache-airflow`	CVE-2026-28779	🚨 HIGH	3.1.5	3.1.8
`apache-airflow`	CVE-2026-30911	🚨 HIGH	3.1.5	3.1.8
`apache-airflow-providers-http`	CVE-2025-69219	🚨 HIGH	5.6.0	6.0.0
`azure-core`	CVE-2026-21226	🚨 HIGH	1.37.0	1.38.0
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`google-cloud-aiplatform`	CVE-2026-2472	🚨 HIGH	1.130.0	1.131.0
`google-cloud-aiplatform`	CVE-2026-2473	🚨 HIGH	1.130.0	1.133.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	5.3.0	6.1.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`protobuf`	CVE-2026-0994	🚨 HIGH	4.25.8	6.33.5, 5.29.6
`pyOpenSSL`	CVE-2026-27459	🚨 HIGH	24.1.0	26.0.0
`pyasn1`	CVE-2026-23490	🚨 HIGH	0.6.1	0.6.2
`pyasn1`	CVE-2026-30922	🚨 HIGH	0.6.1	0.6.3
`python-multipart`	CVE-2026-24486	🚨 HIGH	0.0.20	0.0.22
`ray`	CVE-2025-62593	🔥 CRITICAL	2.47.1	2.52.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`tornado`	CVE-2026-31958	🚨 HIGH	6.5.3	6.5.5
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `usr/bin/docker`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`stdlib`	CVE-2025-68121	🔥 CRITICAL	v1.25.5	1.24.13, 1.25.7, 1.26.0-rc.3
`stdlib`	CVE-2025-61726	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2025-61728	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2026-25679	🚨 HIGH	v1.25.5	1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

No Vulnerabilities Found

github-actions · 2026-03-27T15:39:28Z

🔴 Playwright Results — 3 failure(s), 18 flaky

✅ 3944 passed · ❌ 3 failed · 🟡 18 flaky · ⏭️ 86 skipped

Shard	Passed	Failed	Flaky	Skipped
🔴 Shard 1	295	1	1	4
🟡 Shard 2	753	0	6	8
🟡 Shard 3	730	0	2	7
🟡 Shard 4	751	0	1	18
🟡 Shard 5	686	0	1	41
🔴 Shard 6	729	2	7	8

Genuine Failures (failed on all attempts)

❌ Pages/SearchSettings.spec.ts › Restore default search settings (shard 1)

Error: expect(received).toEqual(expected) // deep equality

- Expected  - 0
+ Received  + 5

@@ -45,10 +45,15 @@
        "boost": 20,
        "field": "displayName.keyword",
        "matchType": "exact",
      },
      Object {
+       "boost": 20,
+       "field": "name.keyword",
+       "matchType": "exact",
+     },
+     Object {
        "boost": 10,
        "field": "name",
        "matchType": "phrase",
      },
      Object {

❌ Pages/Glossary.spec.ts › Add and Remove Assets (shard 6)

Test timeout of 180000ms exceeded.

❌ Pages/Users.spec.ts › Check permissions for Data Steward (shard 6)

ReferenceError: getApiContext is not defined

🟡 18 flaky test(s) (passed on retry)

Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
Features/DomainFilterQueryFilter.spec.ts › Subdomain assets should be visible when parent domain is selected (shard 2, 1 retry)
Features/DomainFilterQueryFilter.spec.ts › Domain filter should work with different asset types (shard 2, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 2 retries)
Features/Glossary/MUIGlossaryMutualExclusivity.spec.ts › MUI-ME-S01: Selecting ME child should auto-deselect siblings (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for Store Procedure (shard 4, 1 retry)
Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
Pages/InputOutputPorts.spec.ts › Lineage section collapse/expand (shard 6, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Data Model (shard 6, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Spreadsheet (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Extend container entity schema to support sample data storage, enabling PII detection and classification workflows on storage service containers.
Changes:
- Add sampleData field to container.json for storing sample data
- Create storageServiceAutoClassificationPipeline.json schema defining configuration for storage service auto-classification pipelines
- Update workflow.json to include StorageServiceAutoClassificationPipeline as a supported pipeline type
This provides the schema foundation for running auto-classification workflows on S3, GCS, and other storage service containers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Implement Java backend functionality to handle sample data ingestion, storage, and PII masking for container entities.
Changes:
- ContainerRepository: Add sample data retrieval and storage operations
- EntityRepository: Extend sample data support to container entities
- ContainerResource: Add REST endpoint for container sample data ingestion
- PIIMasker: Extend PII masking to support container entities
This enables the backend to process and store sample data from storage service containers and apply PII masking rules during data retrieval.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Add Container to the ClassifiableEntityType union, enabling PII detection and auto-classification workflows to process storage service containers alongside database tables.
Changes:
- Update ClassifiableEntityType from Table-only to Union[Table, Container]
- Import Container entity type
- Update module docstring to reflect current support
This type extension allows the PII processor to handle both database tables and storage containers uniformly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Implement container-specific API mixin for sample data operations and integrate it into the main OpenMetadata client.
Changes:
- Add OMetaContainerMixin with ingest_container_sample_data method
- Handle binary data encoding (base64) and serialization errors
- Register mixin in OpenMetadata class hierarchy
- Mirror table sample data ingestion patterns for consistency
This provides the Python API layer for ingesting sample data from storage service containers into OpenMetadata.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
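
The base64 handling mentioned in this commit can be pictured with the following minimal sketch. It is not the mixin's actual code: the `encode_binary_cells` helper and the payload shape are assumptions, shown only to illustrate why binary cells need encoding before a JSON request body can be built.

```python
import base64
import json
from typing import Any, List


def encode_binary_cells(rows: List[List[Any]]) -> List[List[Any]]:
    """Replace bytes values with base64 text so the rows are JSON-serializable."""
    return [
        [
            base64.b64encode(cell).decode("ascii")
            if isinstance(cell, (bytes, bytearray))
            else cell
            for cell in row
        ]
        for row in rows
    ]


# Hypothetical sample with one binary column.
rows = [["alice", b"\x00\x01\xff"], ["bob", None]]
payload = json.dumps({"columns": ["name", "blob"], "rows": encode_binary_cells(rows)})
```

Without the encoding step, `json.dumps` raises a `TypeError` on the `bytes` value, which is the kind of serialization error the commit refers to.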

Add sampler implementations for storage services to extract sample data from structured containers (Parquet, CSV) for auto-classification.
Changes:
- Create base StorageSamplerInterface for storage service sampling
- Implement S3Sampler for AWS S3 containers with structured file support
- Implement GCSSampler for Google Cloud Storage containers
- Support column extraction and data sampling for structured formats
- Handle dataModel-based column definitions from containers
Storage samplers read container metadata, fetch file contents, and generate sample datasets for downstream PII detection.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
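
For the CSV case, the sampling step could look roughly like the sketch below. It uses only the standard library and works on bytes already fetched from the bucket; the `sample_csv` function and its `sample_size` default are hypothetical and are not the S3Sampler or GCSSampler implementation.

```python
import csv
import io
from itertools import islice
from typing import List, Tuple


def sample_csv(content: bytes, sample_size: int = 50) -> Tuple[List[str], List[List[str]]]:
    """Return (column names, up to sample_size rows) from CSV bytes with a header row."""
    reader = csv.reader(io.StringIO(content.decode("utf-8")))
    header = next(reader, [])  # empty file -> no columns
    rows = list(islice(reader, sample_size))  # stop reading once the sample is full
    return header, rows
```

Capping the rows with `islice` keeps the parsed sample bounded regardless of file size, although in this sketch the whole object is still held in memory as bytes.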

Extend the base PII processor to handle both Table and Container entities with unified column extraction logic.
Changes:
- Add _get_entity_columns helper to extract columns from Table or Container
- Handle Container entities with optional dataModel.columns structure
- Improve column matching with safe fallback for missing columns
- Use generic entity reference in error reporting
- Add early return when entity has no columns to process
This enables PII detection to run on storage containers the same way it processes database tables.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
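
The column extraction described in this commit can be sketched with stand-in types. The dataclasses below are simplified placeholders for the generated OpenMetadata entities, and `get_entity_columns` is an illustrative approximation of the `_get_entity_columns` helper, not its actual body.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Column:
    name: str


@dataclass
class Table:
    columns: List[Column] = field(default_factory=list)


@dataclass
class DataModel:
    columns: Optional[List[Column]] = None


@dataclass
class Container:
    dataModel: Optional[DataModel] = None


# Mirrors the union named in the PR description.
ClassifiableEntityType = Union[Table, Container]


def get_entity_columns(entity: ClassifiableEntityType) -> Optional[List[Column]]:
    """Return the columns to classify, or None when there is nothing to classify."""
    if isinstance(entity, Table):
        return entity.columns
    # Containers only carry columns when a data model was extracted.
    if entity.dataModel is None:
        return None
    return entity.dataModel.columns
```

A caller can then return early on a falsy result, which covers a container without a data model as well as one whose data model has no columns.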

Extend the sampler processor to handle both database and storage service entities with appropriate sampler class selection.
Changes:
- Detect service type from source config (Database vs Storage)
- Import StorageServiceAutoClassificationPipeline
- Handle both Table and Container entity types in _run method
- Add column validation for Container entities (via dataModel.columns)
- Create storage-specific sampler interfaces for S3 and GCS
- Update sampler_interface to support Container entities
- Improve error messages with entity type context
The processor now dynamically selects database or storage samplers based on the pipeline configuration type.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
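
A simplified picture of sampler selection by service type, assuming a plain dictionary registry. The class names follow the PR description, but the `SAMPLERS` mapping, the service type keys, and `resolve_sampler` are hypothetical and do not reflect how the ingestion framework actually registers samplers.

```python
from typing import Dict, Type


class StorageSampler:
    """Stand-in base class; the real samplers read files from the storage service."""


class S3Sampler(StorageSampler):
    pass


class GCSSampler(StorageSampler):
    pass


# Hypothetical registry keyed by storage service type.
SAMPLERS: Dict[str, Type[StorageSampler]] = {"S3": S3Sampler, "GCS": GCSSampler}


def resolve_sampler(service_type: str) -> Type[StorageSampler]:
    """Look up the sampler class, failing with an explicit message if none exists."""
    try:
        return SAMPLERS[service_type]
    except KeyError:
        supported = ", ".join(sorted(SAMPLERS))
        raise ValueError(
            f"No sampler registered for storage service type {service_type!r}; "
            f"supported: {supported}"
        ) from None
```

With a lookup like this, a service type without a sampler fails fast with a message that names the supported types, rather than surfacing as an opaque error later in the run.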

Implement fetcher strategy pattern for storage services to retrieve containers for auto-classification workflows.
Changes:
- Add StorageFetcherStrategy to handle storage service entity fetching
- Update EntityFetcher to select appropriate strategy based on service type
- Support both DatabaseService and StorageService in strategy selection
- Import StorageService type for service detection
- Improve error messages with specific service type information
The fetcher now dynamically creates database or storage-specific strategies to retrieve entities based on pipeline configuration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Add AutoClassification pipeline support to S3 and GCS storage service specifications, enabling UI and workflow registration.
Changes:
- Add AutoClassification to S3ServiceSpec supported pipelines
- Add AutoClassification to GCSServiceSpec supported pipelines
- Import StorageServiceAutoClassificationPipeline in both specs
This registers the auto-classification workflow type for storage services in the ingestion framework's service registry.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

manerow · 2026-04-14T11:09:46Z

+    "supportsProfiler": {
+      "title": "Supports Profiler",
+      "$ref": "../connectionBasicType.json#/definitions/supportsProfiler"
    }


Flagging this in case it's missed (also called out in my comment on CreateIngestionPipelineImpl). This claims ADLS supports the profiler / auto-classification pipeline, but the Python samplers shipped in this PR cover only S3 and GCS (ingestion/src/metadata/ingestion/source/storage/{s3,gcs}/service_spec.py).

With this flag set, the UI will let a user configure an auto-classification pipeline for an ADLS service which then fails at pipeline execution time with a "no sampler registered" style error. Same applies to customStorageConnection.json.

Suggestion is to drop the supportsProfiler addition from adlsConnection.json and customStorageConnection.json for now, and bring them back in the follow-up PRs that ship their samplers.

This is OK because the UI does not let users create ADLS services in OM. This is Collate-specific.

IceS2 · 2026-04-14T11:17:03Z

+            logger.debug(
+                f"Container {container.fullyQualifiedName.root} has no dataModel, skipping column tag patch"
+            )


Shouldn't this be a Warning?

IceS2 · 2026-04-14T11:25:18Z

+                and entry.get("Key")
+                and not entry.get("Key").endswith("/")
+                and "/_delta_log/" not in entry.get("Key")
+                and not entry.get("Key").endswith("/_SUCCESS")


Could potentially be extracted
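
One way to act on this suggestion, as a minimal sketch: pull the four key checks shown in the diff into a named predicate. The name `is_candidate_key` is hypothetical, and any conditions that precede these lines in the original expression are not covered here.

```python
from typing import Optional


def is_candidate_key(key: Optional[str]) -> bool:
    """Return True for object keys that may hold sample data."""
    if not key:
        return False
    if key.endswith("/"):  # directory placeholder
        return False
    if "/_delta_log/" in key:  # Delta Lake transaction log
        return False
    if key.endswith("/_SUCCESS"):  # job completion marker file
        return False
    return True
```

The listing code could then read `[e["Key"] for e in entries if is_candidate_key(e.get("Key"))]`, and the predicate can be unit tested on its own.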

gitar-bot · 2026-04-23T21:48:52Z

Code Review 👍 Approved with suggestions 17 resolved / 18 findings

Adds auto-classification support for storage service containers with comprehensive fixes addressing 14 validation, null-safety, and error-handling issues. Consider forwarding queryText parameter to the POST aggregation path to maintain consistency across both code paths.

💡 Quality: queryText not forwarded in independent (POST) aggregation path

In getAggregationOptions, the new queryText parameter is only passed to the getAggregateFieldOptions (GET) path. When isIndependent is true, postAggregateFieldOptions is called without queryText, so the search text won't influence filter options in the independent mode. This may be intentional if the POST endpoint doesn't support it, but it's worth confirming the behavior is expected.

✅ 17 resolved

✅ Edge Case: Sample data accepted without validation when dataModel is null

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java:497
In ContainerRepository.addSampleData, column name validation is skipped when container.getDataModel() is null, but the sample data is still persisted. This means a container without a data model can accept sample data with completely arbitrary column names. TableRepository.addSampleData always enforces column validation. Consider whether containers without a data model should reject sample data entirely, or if this permissiveness is intentional.
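
The stricter of the two options raised in this finding can be sketched as follows. The real check is Java code in ContainerRepository; this Python stand-in, including the `validate_sample_columns` name and its error messages, is an assumption that only illustrates the rule.

```python
from typing import Iterable, Optional


def validate_sample_columns(
    sample_columns: Iterable[str], model_columns: Optional[Iterable[str]]
) -> None:
    """Raise ValueError if sample data cannot be checked or names unknown columns."""
    if model_columns is None:
        # Strict option: without a data model there is nothing to validate against.
        raise ValueError("container has no data model; cannot validate sample data columns")
    known = set(model_columns)
    unknown = [name for name in sample_columns if name not in known]
    if unknown:
        raise ValueError(f"unknown sample data columns: {unknown}")
```

The permissive alternative would simply return when `model_columns` is `None`; which behavior the PR settled on is not stated in this thread.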

✅ Edge Case: base_processor accesses entity.dataModel.columns without null check

📄 ingestion/src/metadata/pii/base_processor.py:93-99
In _get_entity_columns (line 98), for Container entities the code accesses entity.dataModel.columns with a ternary guard on entity.dataModel, but this is only used in _run at line 112-114 where a None result causes a silent early return. However, if dataModel exists but columns is None, the method returns None and classification is silently skipped—this seems intentional but worth noting.

More importantly, in the existing _run method at line 120, the code does next((c for c in columns if c.name == column_name), None) which correctly handles missing columns with a continue. This is good defensive coding.

✅ Edge Case: _filter_entities accesses attributes without getattr guard

📄 ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py:430-439
_filter_entities at lines 430-431 accesses self.source_config.bucketFilterPattern and self.source_config.containerFilterPattern directly, while _filter_buckets (line 371) and _filter_containers (line 397-398) use getattr(..., None) for the same attributes. This inconsistency means _filter_entities would raise an AttributeError if the source config doesn't have these attributes, whereas the individual filter methods handle it gracefully.
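
The difference between the two access styles is easy to show in isolation. In this sketch `SimpleNamespace` stands in for a source config that defines only one of the two patterns; the `get_filter_pattern` helper is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_filter_pattern(source_config: Any, name: str) -> Optional[Any]:
    """Return the named filter pattern, or None when the config does not define it."""
    return getattr(source_config, name, None)


# Config that defines containerFilterPattern but not bucketFilterPattern.
config = SimpleNamespace(containerFilterPattern={"includes": ["^sales/.*"]})

bucket_pattern = get_filter_pattern(config, "bucketFilterPattern")  # None, no AttributeError
container_pattern = get_filter_pattern(config, "containerFilterPattern")
```

Direct access (`config.bucketFilterPattern`) raises `AttributeError` on the same object, which is the inconsistency the finding points out.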

✅ Bug: hasPiiSensitiveTag(Container) NPE when tags is null

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/mask/PIIMasker.java:305 📄 openmetadata-service/src/main/java/org/openmetadata/service/security/mask/PIIMasker.java:312-316
The new hasPiiSensitiveTag(Container) method calls container.getTags().stream() without a null check. The tags field is optional in the Container JSON schema and can be null at runtime (e.g., when not fetched/populated). This will throw a NullPointerException when PII masking is triggered on a container whose tags haven't been loaded.

Note: the existing hasPiiSensitiveTag(Table) has the same issue, but it's out of scope for this PR.

✅ Bug: addSampleData missing @Transaction annotation

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java:494 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java:508-515 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java:508
ContainerRepository.addSampleData performs a DAO insert (lines 512-518) but lacks the @Transaction annotation. Both TableRepository.addSampleData and FileRepository.addSampleData consistently use @Transaction. Without it, the insert is not protected by a transaction boundary, risking partial/inconsistent state on failure.

...and 12 more resolved from earlier reviews

sonarqubecloud · 2026-04-23T22:17:00Z

Quality Gate passed for 'open-metadata-ui'

Issues
1061 New issues
0 Accepted issues

Measures
17 Security Hotspots
0.0% Coverage on New Code
3.9% Duplication on New Code

See analysis details on SonarQube Cloud

sonarqubecloud · 2026-04-23T22:50:48Z

github-actions Bot added the Ingestion and safe to test labels Mar 14, 2026

gitar-bot Bot reviewed Mar 14, 2026

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/security/mask/PIIMasker.java

gitar-bot Bot reviewed Mar 14, 2026

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java

gitar-bot Bot reviewed Mar 14, 2026

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java Outdated

Base automatically changed from feat/refactor-to-make-autoclassification-tableless to main March 18, 2026 01:20

edg956 force-pushed the feat/support-container-classification branch from d9e3d87 to 6195e0f Compare March 25, 2026 20:01

gitar-bot Bot reviewed Mar 25, 2026

Comment thread ingestion/src/metadata/pii/base_processor.py

gitar-bot Bot reviewed Mar 26, 2026

Comment thread ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py Outdated

edg956 force-pushed the feat/support-container-classification branch from 3853ec8 to 25a28e0 Compare March 27, 2026 12:32

gitar-bot Bot reviewed Mar 27, 2026

Comment thread ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py

edg956 changed the title ~~wip~~ feat: Add auto-classification support for storage service containers Mar 27, 2026

edg956 self-assigned this Mar 27, 2026

edg956 marked this pull request as ready for review March 27, 2026 13:45

edg956 requested a review from a team as a code owner March 27, 2026 13:45

edg956 force-pushed the feat/support-container-classification branch from 8deee2f to 2837179 Compare March 27, 2026 13:45

github-actions Bot requested a review from a team as a code owner March 27, 2026 13:49

edg956 and others added 9 commits March 27, 2026 20:46

manerow reviewed Apr 14, 2026

IceS2 previously approved these changes Apr 14, 2026

edg956 added 3 commits April 15, 2026 15:44

Apply comments from reviews

6185e17

Merge branch 'main' into feat/support-container-classification

bf3c5bc

Extract candidate column logic in samplers

6e3a7d0

IceS2 previously approved these changes Apr 15, 2026

manerow previously approved these changes Apr 15, 2026

edg956 added 6 commits April 16, 2026 15:34

Fix tests

f00c225

Merge branch 'main' into feat/support-container-classification

5e79e7d

Merge branch 'main' into feat/support-container-classification

455f675

Merge branch 'main' into feat/support-container-classification

637eb67

Merge branch 'main' into feat/support-container-classification

ef993c6

Merge branch 'main' into feat/support-container-classification

27cf45b

edg956 mentioned this pull request Apr 17, 2026

fix(ci): fix broken playwright and flaky operator build #27490

Merged

Merge branch 'main' into feat/support-container-classification

21c7fe6

IceS2 previously approved these changes Apr 20, 2026

edg956 added 11 commits April 21, 2026 11:30

Merge branch 'main' into feat/support-container-classification

ab2dfd9

Merge branch 'main' into feat/support-container-classification

8a1803f

Merge branch 'main' into feat/support-container-classification

37f5844

Merge branch 'main' into feat/support-container-classification

2662940

Merge branch 'main' into feat/support-container-classification

6b6e780

Merge branch 'main' into feat/support-container-classification

cbd8c65

Merge branch 'main' into feat/support-container-classification

1051a59

Merge branch 'main' into feat/support-container-classification

903cfd5

Merge branch 'main' into feat/support-container-classification

e72cbac

Merge branch 'main' into feat/support-container-classification

60926a7

Fix container customization test

7ba84fd

github-actions Bot commented Mar 27, 2026

✅ TypeScript Types Auto-Updated

github-actions Bot commented Mar 27, 2026 • edited

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)
Vulnerabilities (4)

Target: Java
Vulnerabilities (37)

Target: Node.js
No Vulnerabilities Found

Target: Python
Vulnerabilities (15)

Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found

Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found

Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found

Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found

github-actions Bot commented Mar 27, 2026 • edited

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)
Vulnerabilities (4)

Target: Java
Vulnerabilities (37)

github-actions Bot commented Mar 27, 2026 • edited

Target: `openmetadata-ingestion:trivy (debian 12.12)`

Target: `Java`

Target: `Node.js`

Target: `Python`

Target: `usr/bin/docker`

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

github-actions Bot commented Mar 27, 2026 • edited

gitar-bot Bot commented Apr 23, 2026 • edited