fix(map): apply cosine similarity before limit cutoff when search is active by octo-patch · Pull Request #3337 · firecrawl/firecrawl · GitHub

fix(map): apply cosine similarity before limit cutoff when search is active #3337

Open
octo-patch wants to merge 1 commit into firecrawl:main from octo-patch:fix/issue-3335-map-search-limit-order

@octo-patch octo-patch commented Apr 11, 2026

Fixes #3335

Problem

When a /map request includes both a search query and a limit, the limit was applied before cosine similarity ranking. This meant that only the first N results (ordered by initial link discovery, not relevance) were ranked, so pages that happened to appear later in the initial crawl order were excluded from consideration entirely.

For example, with search: "post" and limit: 5, the first 5 pages discovered (e.g. /contact, /about) were kept and ranked, while actual post pages deeper in the list were never scored.

Solution

Two targeted changes to apps/api/src/lib/map-utils.ts:

  1. Fetch more index candidates when search is active — instead of querying the index with limit (which could be very small), query with Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults)) so enough URLs exist for meaningful similarity ranking.

  2. Reorder: cosine similarity before the cutoff — move performCosineSimilarityV2 to run before mapResults.slice(0, minimumCutoff), so the final N results are the most relevant ones rather than the first N by link order.
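
The two changes above can be illustrated with a minimal, self-contained sketch. It is not the code from map-utils.ts: the constant values, the helper names (candidateLimit, rankThenCut, cutThenRank), and the toy bag-of-words scorer are assumptions made for illustration, and the scorer only stands in for performCosineSimilarityV2, whose actual implementation is not shown in this PR description.

```typescript
// Hypothetical values; the real constants are defined in the firecrawl codebase.
const MAX_MAP_LIMIT = 100000;
const MAX_FIRE_ENGINE_RESULTS = 500; // stand-in for maxFireEngineResults

// Change 1: when search is active, request more candidates than `limit`
// so that ranking has enough URLs to work with.
function candidateLimit(limit: number, searchActive: boolean): number {
  if (!searchActive) return limit;
  return Math.min(MAX_MAP_LIMIT, Math.max(limit, MAX_FIRE_ENGINE_RESULTS));
}

// Toy scorer (assumption): token counts from a string, split on non-alphanumerics.
function tokenize(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const tok of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    counts.set(tok, (counts.get(tok) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two token-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((n, tok) => {
    dot += n * (b.get(tok) || 0);
    normA += n * n;
  });
  b.forEach((n) => {
    normB += n * n;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Change 2: score every candidate first, then keep the top `limit`.
// Ties fall back to discovery order.
function rankThenCut(urls: string[], search: string, limit: number): string[] {
  const query = tokenize(search);
  return urls
    .map((url, index) => ({ url, index, score: cosine(query, tokenize(url)) }))
    .sort((x, y) => y.score - x.score || x.index - y.index)
    .slice(0, limit)
    .map((r) => r.url);
}

// Old order of operations, kept for comparison: cut first, then rank.
function cutThenRank(urls: string[], search: string, limit: number): string[] {
  return rankThenCut(urls.slice(0, limit), search, limit);
}

// Made-up discovery order: generic pages first, post pages later.
const discovered = [
  "https://example.com/contact",
  "https://example.com/about",
  "https://example.com/post/a",
  "https://example.com/post/b",
];

console.log(cutThenRank(discovered, "post", 2)); // old behaviour
console.log(rankThenCut(discovered, "post", 2)); // new behaviour
```

With this made-up input, the old ordering can only ever return the two generic pages, because the post pages are sliced away before they are scored. Ranking first lets them reach the top of the list before the cutoff is applied.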

Testing

The fix can be verified with the reproduction case from the issue:

curl -X POST http://localhost:3002/v2/map \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://semanticbrain.net", "search": "post", "limit": 5, "ignoreCache": true}'

Before: returns generic pages (/contact, /about, etc.)
After: returns post-related pages (/post/introducing-bizml-..., etc.)


Summary by cubic

Apply cosine similarity before the limit when search is provided, and fetch more index candidates so relevant pages aren’t dropped. Fixes #3335 and makes /map with search + limit return the top-N most relevant pages.

  • Bug Fixes
    • Query the index with an expanded indexLimit when search is active: Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults)).
    • Run performCosineSimilarityV2 before applying the minimumCutoff slice so the final N results are ranked by relevance, not discovery order.

Written for commit 00c2c23. Summary will update on new commits.

…active (fixes firecrawl#3335)

When a map request includes both a `search` query and a `limit`, the limit
was previously applied before cosine similarity ranking. This caused only
the first N results (by link order) to be ranked, excluding potentially
more relevant pages deeper in the list.

Two changes:
- Query the index with a higher candidate count when search is active so
  more URLs are available for ranking before the limit is applied.
- Move the cosine similarity ranking step to run before the minimumCutoff
  slice, ensuring the final N results are the most relevant ones.
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

Development

Successfully merging this pull request may close these issues.

[Bug] Limit/search order of operations
