feat: add native deletion support for identity removal in gzip files by mihir20 · Pull Request #6896 · rudderlabs/rudder-server · GitHub

feat: add native deletion support for identity removal in gzip files#6896

Open
mihir20 wants to merge 3 commits into master from mihir/pipe-2904


@mihir20 mihir20 commented Apr 21, 2026

Description

GZIPLocalFileHandler.RemoveIdentity uses exec("bash -c sed ...") to strip suppressed user records from an NDJSON gzip file. Two problems with that:

  1. Shell injection surface: every suppressed user ID is interpolated into a bash command. We sanitise with regexp.QuoteMeta, but that escapes regex metacharacters only; it does not escape shell
    metacharacters. Inputs like $(...) or backticks remain risky.
  2. Does not scale with suppression list size: sed runs every pattern against every line, so cost grows as O(records × users). For 10k records × 100 users, it takes ~1.27s (1 MB/s).

This change adds native Go support for removing identities, gated behind a feature flag. The native path iterates over the file line by line and deletes the lines that contain suppressed user IDs.

Benchmarks

Linear Ticket

pipe-2904

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

🔒 Scanned for secrets using gitleaks 8.28.0
codecov Bot commented Apr 21, 2026


@ktgowtham ktgowtham left a comment


LGTM, as long as S3 and S3 data lake have the data in the format that this PR expects. Assuming we tested that too.

@mihir20 mihir20 requested review from 0xShad3 and aris1009 April 23, 2026 07:39

4 participants