Draft PR: Configurable `timeout_ms` and `num_retries` in `model_config` by Rahik-Sikder · Pull Request #2908 · typesense/typesense · GitHub

Draft PR: Configurable timeout_ms and num_retries in model_config #2908

Draft
Rahik-Sikder wants to merge 1 commit into typesense:v31 from Rahik-Sikder:configable-errors-external-embeddings

@Rahik-Sikder


This is a partial implementation of #2839, covering per-field timeout and retry configuration. The fallback provider mechanism is left pending design clarification (see questions below).


Problem

When using an external embedding provider, Typesense's error handling is entirely hardcoded: a 5-second CURL timeout and 2 attempts in total, with no fallback. If a provider is unreliable, each request can stall for up to 10 s (2 × 5 s), and there is no way to tune or bypass this behavior. The original issue also requests a fallback provider mechanism: if all retries against the primary provider fail, Typesense should attempt the request against a secondary provider. It also mentions Plexus-style features such as provider cooldown and routing to the lowest-latency provider.

Proposed Solution

Expose timeout_ms and num_retries as optional fields in embed.model_config, allowing per-field overrides of the collection-level defaults. Fallback provider support is deferred pending design input (see the open questions below). The Plexus-style features currently seem out of scope; they can be revisited once the fallback design is settled, possibly in a separate PR.


Changes

include/field.h: Added the fields::timeout_ms and fields::num_retries constants.

src/field.cpp: Added validation in json_field_to_field():

  • timeout_ms: optional; must be a positive integer. 0 is rejected, because a 0 ms timeout would make every call fail.
  • num_retries: optional; must be a non-negative integer. 0 is valid and means a single attempt with no retries.

src/index.cpp: In batch_embed_fields(), field-level values are read from model_config and fall back to the collection-level defaults when absent. Note that num_retries is user-facing and counts additional attempts, whereas embed_documents takes num_tries (total attempts), so 1 is added to num_retries when it is read.

test/collection_schema_change_test.cpp: Four tests covering valid settings, rejection of timeout_ms: 0, acceptance of num_retries: 0, and rejection of a num_retries value with an invalid type.


Open Questions (Fallback Design)

Before implementing fallback providers, input on the following would be helpful:

  1. Config structure — Should model_config be able to accept an array of provider objects so fallbacks are defined inline? Or would a dedicated fallback_config field be preferable?

  2. Stickiness / cooldown — If the primary provider fails and a fallback is used, should the fallback be sticky for a cooldown period before retrying the primary (requiring failure state to be tracked)? Or should the primary always be attempted first on each new document/request?

Happy to extend this once the design direction is confirmed.
