fix(map): apply cosine similarity before limit cutoff when search is active #3337
Open
octo-patch wants to merge 1 commit into firecrawl:main from
…active (fixes firecrawl#3335)

When a map request includes both a `search` query and a `limit`, the limit was previously applied before cosine similarity ranking. This caused only the first N results (by link order) to be ranked, excluding potentially more relevant pages deeper in the list.

Two changes:
- Query the index with a higher candidate count when search is active so more URLs are available for ranking before the limit is applied.
- Move the cosine similarity ranking step to run before the minimumCutoff slice, ensuring the final N results are the most relevant ones.

Fixes #3335
Problem
When a `/map` request includes both a `search` query and a `limit`, the `limit` was applied before cosine similarity ranking. As a result, only the first N results (ordered by initial link discovery, not relevance) were ranked, and pages that happened to appear later in the initial crawl order were excluded from consideration entirely.

For example, with `search: "post"` and `limit: 5`, the first 5 pages discovered (e.g. `/contact`, `/about`) were kept and ranked, while actual post pages deeper in the list were never scored.

Solution
Two targeted changes to `apps/api/src/lib/map-utils.ts`:

- Fetch more index candidates when search is active: instead of querying the index with `limit` (which could be very small), query with `Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults))` so enough URLs exist for meaningful similarity ranking.
- Reorder cosine similarity before the cutoff: move `performCosineSimilarityV2` to run before `mapResults.slice(0, minimumCutoff)`, so the final N results are the most relevant ones rather than the first N by link order.

Testing
The fix can be verified with the reproduction case from the issue:

- Before: returns generic pages (`/contact`, `/about`, etc.)
- After: returns post-related pages (`/post/introducing-bizml-...`, etc.)

Summary by cubic
Apply cosine similarity before the limit when `search` is provided, and fetch more index candidates so relevant pages aren't dropped. Fixes #3335 and makes `/map` with `search` + `limit` return the top-N most relevant pages.

- Use a larger `indexLimit` when `search` is active: `Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults))`.
- Run `performCosineSimilarityV2` before applying the `minimumCutoff` slice so the final N results are ranked by relevance, not discovery order.

Written for commit 00c2c23. Summary will update on new commits.
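
As a rough illustration of the two changes described in this PR, here is a minimal, self-contained sketch. It is not the code from `map-utils.ts`: the constant values, the helper names (`indexLimitFor`, `toyScore`, `rankByScore`), the `/post/a` and `/post/b` paths, and the substring-based score standing in for cosine similarity are all assumptions made for the example. Only the shape of the `Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults))` expression is taken from the description above.

```typescript
// Assumed values for illustration only; the real constants live in the firecrawl codebase.
const MAX_MAP_LIMIT = 100000;
const maxFireEngineResults = 500;

// Hypothetical helper: how many candidates to request from the index.
// With search active, the candidate count is raised so ranking has enough URLs to work with.
// Without search, this sketch assumes the plain limit is still used.
function indexLimitFor(limit: number, searchActive: boolean): number {
  return searchActive
    ? Math.min(MAX_MAP_LIMIT, Math.max(limit, maxFireEngineResults))
    : limit;
}

// Toy relevance score standing in for cosine similarity:
// 1 if the URL contains the query string, otherwise 0.
function toyScore(url: string, query: string): number {
  return url.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;
}

// Sort a copy of the URLs by descending score.
// Array.prototype.sort is stable, so ties keep their discovery order.
function rankByScore(urls: string[], query: string): string[] {
  return [...urls].sort((a, b) => toyScore(b, query) - toyScore(a, query));
}

// Made-up discovery order: generic pages first, post pages later.
const discovered = ["/contact", "/about", "/pricing", "/post/a", "/post/b"];
const limit = 2;

// Old order of operations: cut to the limit first, then rank what is left.
const sliceThenRank = rankByScore(discovered.slice(0, limit), "post");

// New order of operations: rank every candidate, then cut to the limit.
const rankThenSlice = rankByScore(discovered, "post").slice(0, limit);

console.log(indexLimitFor(5, true)); // 500
console.log(sliceThenRank);          // [ '/contact', '/about' ]
console.log(rankThenSlice);          // [ '/post/a', '/post/b' ]
```

In this toy setup, slicing first can only ever return the first `limit` discovered URLs, whatever their score, which mirrors the behavior reported in the issue. Ranking first lets the later post pages surface.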