fix(api): set finite backlog timeout for crawl jobs by firecrawl-spring[bot] · Pull Request #3412 · firecrawl/firecrawl · GitHub

fix(api): set finite backlog timeout for crawl jobs#3412

Open
firecrawl-spring[bot] wants to merge 1 commit into main from micah/backlog-times-out-at-for-crawls

Conversation

@firecrawl-spring
Contributor

@firecrawl-spring firecrawl-spring Bot commented Apr 21, 2026

Summary

  • Crawl/batch-scrape jobs enqueued into NuQ's concurrency backlog are currently stored with times_out_at = NULL. The nuq_queue_scrape_backlog_reaper cron only deletes rows where times_out_at < now(), so NULL rows live forever.
  • nuq_group_crawl_finished has a NOT EXISTS check against queue_scrape_backlog, so any stale NULL row on a group permanently blocks the crawl from flipping to completed; the crawl appears stuck in the UI indefinitely.
  • This was the root cause of stuck crawl 019dae72-0d94-7723-ab1e-2c078084583b: 0 active/queued scrape rows, 4 backlog rows with times_out_at IS NULL.
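To make the failure mode above concrete, here is a minimal sketch (with a hypothetical row shape, not the real schema types) of why the reaper's `times_out_at < now()` filter never matches NULL rows — in SQL, `NULL < now()` evaluates to NULL rather than true:

```typescript
// Hypothetical simplified row shape mirroring nuq.queue_scrape_backlog.
interface BacklogRow {
  jobId: string;
  timesOutAt: Date | null; // NULL in SQL maps to null here
}

// Equivalent of the reaper's `WHERE times_out_at < now()` predicate:
// a NULL timeout can never satisfy the comparison, so the row is never deleted.
function reaperWouldDelete(row: BacklogRow, now: Date = new Date()): boolean {
  return row.timesOutAt !== null && row.timesOutAt.getTime() < now.getTime();
}

const stale: BacklogRow = { jobId: "a", timesOutAt: null };
const expired: BacklogRow = { jobId: "b", timesOutAt: new Date(Date.now() - 1000) };

console.log(reaperWouldDelete(stale));   // false — NULL rows live forever
console.log(reaperWouldDelete(expired)); // true — finite timeouts get swept
```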

Fix

Give crawl-linked backlog rows a finite 24h fallback timeout (matches crawl Redis state TTL and default group_crawl.ttl). Standalone-scrape behavior is unchanged (still uses the per-scrape timeout).
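The timeout selection can be sketched as follows; all names here are illustrative assumptions, not the actual nuq.ts implementation:

```typescript
// 24h fallback for crawl-linked rows, matching the crawl Redis state TTL
// and the default group_crawl.ttl described above.
const CRAWL_BACKLOG_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: pick the backlog row's times_out_at.
function backlogTimesOutAt(
  opts: {
    crawlId?: string; // set for crawl/batch-scrape jobs
    scrapeTimeoutMs?: number; // per-scrape timeout for standalone scrapes
  },
  now: Date = new Date(),
): Date | null {
  if (opts.crawlId !== undefined) {
    // Crawl-linked rows get a finite 24h fallback instead of NULL.
    return new Date(now.getTime() + CRAWL_BACKLOG_TTL_MS);
  }
  // Standalone scrapes keep their per-scrape timeout (behavior unchanged).
  return opts.scrapeTimeoutMs !== undefined
    ? new Date(now.getTime() + opts.scrapeTimeoutMs)
    : null;
}
```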

If a crawl is genuinely still producing work after 24h, promoteJobFromBacklogOrAdd already falls back to addJobIfNotExists when the backlog row has been reaped (nuq.ts:1069-1073), so we'll re-enqueue fresh rows in that edge case.
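The fallback path referenced above can be sketched like this; the real logic lives in nuq.ts (promoteJobFromBacklogOrAdd), and the signatures below are assumptions for illustration only:

```typescript
type Job = { id: string };

// Illustrative promote-or-add: try to promote the stored backlog row; if the
// row has been reaped (e.g. past the 24h fallback timeout), enqueue a fresh
// job so a genuinely long-running crawl still makes progress.
async function promoteOrAdd(
  jobId: string,
  takeFromBacklog: (id: string) => Promise<Job | null>,
  addJobIfNotExists: (id: string) => Promise<Job>,
): Promise<Job> {
  const backlogJob = await takeFromBacklog(jobId);
  if (backlogJob !== null) {
    return backlogJob; // normal path: backlog row still present
  }
  return addJobIfNotExists(jobId); // reaped-row edge case: re-enqueue fresh
}
```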

Paired with

#3413 — defensive fallback in nuq_queue_scrape_backlog_reaper to sweep NULL/stale rows that predate this fix.

Test plan

  • CI green
  • Manually verify: new crawl → inspect nuq.queue_scrape_backlog rows for that group → times_out_at is non-NULL
  • Verify standalone scrape backlog rows still use scrapeOptions.timeout
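The manual inspection step above amounts to a simple invariant over the group's backlog rows; a hypothetical check (types and names invented for illustration) might look like:

```typescript
// Hypothetical row shape for nuq.queue_scrape_backlog inspection results.
interface BacklogRow {
  groupId: string;
  timesOutAt: Date | null;
}

// After this fix, every crawl-linked backlog row for the group should carry
// a finite (non-NULL) times_out_at.
function groupHasFiniteTimeouts(rows: BacklogRow[], groupId: string): boolean {
  return rows
    .filter((r) => r.groupId === groupId)
    .every((r) => r.timesOutAt !== null);
}
```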

Crawl/batch-scrape jobs enqueued into the NuQ concurrency backlog were
stored with times_out_at=NULL, which nuq_queue_scrape_backlog_reaper does
not clean (its filter is times_out_at < now()). If the concurrency-queue
reconciler never promotes a row (crawl abandoned, worker restart, etc.),
the orphaned backlog row persists forever — and nuq_group_crawl_finished
blocks on its NOT EXISTS check against the backlog, leaving the group
stuck in 'active' indefinitely.

Give crawl-linked backlog rows a 24h fallback timeout (matches crawl
Redis state TTL and default group_crawl.ttl). If a crawl is still
actively producing work past 24h, promoteJobFromBacklogOrAdd already
falls back to addJobIfNotExists when the backlog row is missing, so
re-enqueue is handled.

Co-Authored-By: micahstairs <micah@sideguide.dev>
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file

@blacksmith-sh
Contributor

blacksmith-sh Bot commented Apr 21, 2026

