PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline by bewithgaurav · Pull Request #549 · microsoft/mssql-python · GitHub
PERF: Add C++ DetectParamTypes + SQLExecuteFast pipeline #549

Draft
bewithgaurav wants to merge 9 commits into main from bewithgaurav/insertmany-perf-detect-types

Conversation

@bewithgaurav bewithgaurav commented Apr 29, 2026

Collaborator

Work Item / Issue Reference

AB#44979

GitHub Issue: #500


Summary

This pull request refactors the parameter handling logic in the execute method of mssql_python/cursor.py to introduce a more efficient "fast path" for parameter binding and execution. The new approach allows the code to skip Python-side type detection and binding when there are no inputsizes overrides, delegating the entire process to the C++ layer for better performance. The slow path is retained for cases where inputsizes are set.

Performance improvements and code simplification:

  • Added a "fast path" execution branch that uses DDBCSQLExecuteFast to handle parameter type detection, binding, and execution entirely in C++ when there are no inputsizes overrides, improving performance for common cases.
  • Removed the unconditional creation of the Python-side parameter type list and its associated logic from the main execution path; the list is now built only in the "slow path" when inputsizes are present.
  • Retained the existing Python-side type detection and parameter binding logic as a fallback for cases where inputsizes overrides are specified, ensuring compatibility with advanced usage.
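
The routing described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in cursor.py: `route_execute`, `fast_path`, and `slow_path` are hypothetical names, and the two callables stand in for the DDBCSQLExecuteFast call and the existing Python-side binding path.

```python
from typing import Any, Callable, Optional, Sequence


def route_execute(
    sql: str,
    params: Sequence[Any],
    inputsizes: Optional[Sequence[Any]],
    fast_path: Callable[[str, list], Any],
    slow_path: Callable[[str, list, Sequence[Any]], Any],
) -> Any:
    """Pick an execution path based on whether inputsizes overrides exist."""
    if not inputsizes:
        # No overrides: hand the raw parameters to the native layer in one call.
        return fast_path(sql, list(params))
    # Overrides present: keep Python-side type detection and binding.
    return slow_path(sql, list(params), inputsizes)


# Record which path each call takes (stand-ins for the real bindings).
calls = []
route_execute("SELECT ?", [1], None,
              lambda q, p: calls.append(("fast", p)),
              lambda q, p, s: calls.append(("slow", p)))
route_execute("SELECT ?", [1], [("int", 0, 0)],
              lambda q, p: calls.append(("fast", p)),
              lambda q, p, s: calls.append(("slow", p)))
```

Passing a copy of the parameters to either path keeps the caller's sequence untouched, which is an assumption of this sketch rather than a statement about the real method.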

Move parameter type detection from Python into C++ using raw CPython
type checks (PyLong_CheckExact, PyFloat_CheckExact, etc.). Merge the
DetectParamTypes → BindParameters → SQLExecute pipeline into a single
DDBCSQLExecuteFast call so ParamInfo never crosses the pybind11 boundary.

- DetectParamTypes: handles int (range-detected), float, bool, str
  (unicode + geometry sniffing), bytes, datetime/date/time, Decimal
  (MONEY range + generic numeric), UUID, None, with fallback to string
- SQLExecuteFast_wrap: single pipeline with GIL release, always uses
  SQLPrepare for parameterized queries
- cursor.py: fast path routing when no setinputsizes overrides present;
  old DDBCSQLExecute path preserved for setinputsizes callers
- Named constants: MAX_INLINE_CHAR, MAX_INLINE_BINARY, MAX_NUMERIC_PRECISION,
  MONEY/SMALLMONEY ranges, PARAM_C_TYPE_TEXT platform macro
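
For readers who want a feel for the detection order, here is a rough Python rendering of the kind of dispatch DetectParamTypes performs. The type labels are illustrative strings rather than the real ODBC constants, the integer range buckets are an assumption (the commit only says ints are range-detected), and geometry sniffing and the MONEY range check are left out.

```python
import datetime
import decimal
import uuid


def detect_param_type(value):
    """Return an illustrative SQL type label for a Python parameter value."""
    if value is None:
        return "NULL"
    # bool is a subclass of int, so it has to be tested first.
    if isinstance(value, bool):
        return "BIT"
    if isinstance(value, int):
        # Assumed buckets; the commit message does not list the exact ranges.
        if 0 <= value <= 255:
            return "TINYINT"
        if -2**15 <= value < 2**15:
            return "SMALLINT"
        if -2**31 <= value < 2**31:
            return "INTEGER"
        if -2**63 <= value < 2**63:
            return "BIGINT"
        return "NUMERIC"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "NVARCHAR"
    if isinstance(value, bytes):
        return "VARBINARY"
    # datetime is a subclass of date, so it has to be tested first.
    if isinstance(value, datetime.datetime):
        return "DATETIME"
    if isinstance(value, datetime.date):
        return "DATE"
    if isinstance(value, datetime.time):
        return "TIME"
    if isinstance(value, decimal.Decimal):
        return "NUMERIC"
    if isinstance(value, uuid.UUID):
        return "GUID"
    # This commit falls back to a string binding for anything else.
    return "VARCHAR"
```

The two ordering comments are the part worth keeping in mind: testing int before bool, or date before datetime, would silently misclassify values.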
Comment thread mssql_python/pybind/ddbc_bindings.cpp Fixed
- Add complete DAE (Data-At-Execution) loop to SQLExecuteFast_wrap:
  SQL_NEED_DATA → SQLParamData/SQLPutData for large str/bytes/binary,
  matching the existing SQLExecute_wrap logic exactly
- Fix DAE type assignment: non-unicode DAE strings use SQL_C_CHAR
  (not PARAM_C_TYPE_TEXT which maps to SQL_C_WCHAR on macOS/Linux)
- Fix MONEY range lower bound: use MONEY_MIN not SMALLMONEY_MIN so
  negative decimals in MONEY range bind as VARCHAR (matches Python path)
- Raise TypeError for unknown param types instead of silent str conversion
- Add SQLFreeStmt(SQL_RESET_PARAMS) to unbind after execute
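
The MONEY lower-bound fix is easy to see with concrete numbers. The sketch below uses the documented SQL Server MONEY and SMALLMONEY limits; the function names are hypothetical and only model the range test, not the binding that follows it.

```python
from decimal import Decimal

# SQL Server limits for the two money types.
SMALLMONEY_MIN = Decimal("-214748.3648")
MONEY_MIN = Decimal("-922337203685477.5808")
MONEY_MAX = Decimal("922337203685477.5807")


def in_money_range(value: Decimal) -> bool:
    """Range test with the corrected lower bound (MONEY_MIN)."""
    return MONEY_MIN <= value <= MONEY_MAX


def in_money_range_buggy(value: Decimal) -> bool:
    """Range test as it behaved before the fix (SMALLMONEY_MIN as lower bound)."""
    return SMALLMONEY_MIN <= value <= MONEY_MAX


# A negative value inside MONEY but below SMALLMONEY_MIN shows the difference.
sample = Decimal("-1000000.00")
fixed_result = in_money_range(sample)
buggy_result = in_money_range_buggy(sample)
```

With the old bound, any negative decimal below roughly -214748 was treated as out of range and took a different binding than the Python path would have chosen.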
@github-actions

github-actions Bot commented Apr 29, 2026

- Comment out use_prepare parameter name (C4100: unreferenced parameter)
- Remove unused catch variable name (C4101: unreferenced local variable)
Add explicit null pointer and zero-length guards before memcpy in
build_numeric_data to satisfy DevSkim code scanning rule DS121708.
github-actions Bot added the "pr-size: large" (Substantial code update) label on May 7, 2026
…or attrs, parity test

Six review fixes for SQLExecuteFast_wrap and DetectParamTypes:

1. Encoding key: read 'encoding' from settings dict (was 'charEncoding'
   which never matched). Only honor when ctype==SQL_C_CHAR so the default
   utf-16le doesn't corrupt SQL_C_CHAR DAE/inline byte paths.
2. Subclass support: PyLong_Check/PyFloat_Check/PyUnicode_Check/PyBytes_Check
   instead of *_CheckExact. Fixes user-defined int/str/bytes/float
   subclasses that were silently rejected with TypeError. Switched
   PyBytes_GET_SIZE to PyBytes_Size for subclass-safe length.
3. GIL release in DAE loop: SQLParamData and SQLPutData now release the
   GIL during each ODBC call, matching slow-path concurrency for large
   blobs/strings.
4. Preserve exec_rc: stash the SQLExecute return code before SQLFreeStmt
   so SUCCESS_WITH_INFO and other non-success-non-error codes are not
   clobbered by the unbind call.
5. Shallow-copy params: params = py::list(params) at function entry so
   DetectParamTypes' in-place PyList_SET_ITEM cannot mutate the caller's
   list under any future code path that might pass it directly.
6. Cursor attrs: SQLSetStmtAttr(SQL_ATTR_CURSOR_TYPE/CONCURRENCY) at
   entry to match slow-path semantics regardless of prior hstmt state.

Also adds tests/test_023_fast_path_parity.py covering int/str/bytes/float
subclasses, caller-list non-mutation, and unsupported-type TypeError.
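
The subclass fix (item 2) has a direct Python analogue: `type(x) is int` behaves like an exact check, while `isinstance(x, int)` accepts subclasses. The snippet below is only an illustration of that distinction; `UserId` is a made-up subclass, not something from the test file.

```python
class UserId(int):
    """A user-defined int subclass, similar to what the parity tests cover."""


def is_int_exact(value) -> bool:
    # Analogue of an exact type check: subclasses are rejected.
    return type(value) is int


def is_int(value) -> bool:
    # Analogue of a subclass-aware check: subclasses are accepted.
    return isinstance(value, int)


uid = UserId(42)
exact_result = is_int_exact(uid)
subclass_result = is_int(uid)
```

An exact check sends `UserId(42)` down the unsupported-type branch even though it can be bound like any other integer, which is the behavior the review asked to change.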
bewithgaurav and others added 2 commits May 7, 2026 14:46
Eight follow-up fixes after review feedback on c5a827f.

1. Refcount leak (BLOCKER): replace PyList_SET_ITEM (uppercase, no decref of
   old slot) with PyList_SetItem (decrefs old slot before stealing the new
   reference) in DetectParamTypes time/Decimal/UUID branches. The previous
   shallow-copy defense via py::list(params) was a no-op because pybind11's
   list constructor only inc_refs an already-list argument.
2. Geometry + DAE conflict: gate the geometry-prefix override on the not-DAE
   branch so a long POLYGON/POINT/LINESTRING string does not end up with
   isDAE=true, dataPtr set, AND a non-zero columnSize.
3. Decimal NaN/Infinity: throw ValueError instead of silently binding 0 via
   build_numeric_data on an empty digits tuple.
4. Time format: always emit microseconds (HH:MM:SS.ffffff), matching slow
   path isoformat(timespec="microseconds").
5. PyObject_IsInstance: explicit equality check so a custom __instancecheck__
   that raises (returns -1) does not fall through with a Python error set.
6. Dead code: removed unused SMALLMONEY_MIN/SMALLMONEY_MAX constants and the
   unused utf16Len assignments in DetectParamTypes.
7. Encoding-key contract: only honor encoding_settings encoding when the
   user explicitly opted in via setencoding(..., ctype=SQL_C_CHAR=1). The
   Python layer SQL_C_CHAR constant is numerically -8 (real ODBC SQL_C_WCHAR),
   so by default the wide-char path is taken and encoding is irrelevant.
8. Parity test rewrite: drop the dead _force_slow_path_roundtrip helper, use
   the project cursor fixture instead of a hard-coded conn string, and add
   (a) a real fast-vs-slow parity check via setinputsizes-forced slow path,
   (b) a refcount-leak regression test using a Decimal subclass + weakref,
   (c) explicit NaN-rejection coverage.
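
Items 3 and 4 can be checked from plain Python. The helpers below are hypothetical and only mirror the described behavior: non-finite decimals are rejected, and time values are always formatted with six fractional digits.

```python
import datetime
from decimal import Decimal


def format_time_param(value: datetime.time) -> str:
    """Format a time with microseconds always present (HH:MM:SS.ffffff)."""
    return value.isoformat(timespec="microseconds")


def check_decimal_param(value: Decimal) -> Decimal:
    """Reject NaN and Infinity instead of letting them bind as a number."""
    if not value.is_finite():
        raise ValueError(f"cannot bind non-finite Decimal: {value}")
    return value


# The default isoformat() drops the fraction when microsecond == 0.
short_form = datetime.time(12, 30).isoformat()
full_form = format_time_param(datetime.time(12, 30))

# NaN has an empty digits tuple, which is why it previously ended up as 0.
nan_digits = Decimal("NaN").as_tuple().digits
```

The empty digits tuple for NaN is the detail behind the original bug: a numeric builder that sums over the digits sees nothing and produces zero.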
Resolve conflicts in ddbc_bindings.cpp from main's GH-610 work:
- Keep both build_numeric_data (this PR) and ResolveNullParamType (main)
- Adopt main's BindParameters/BindParameterArray signatures that take
  SqlHandle& handle; update the SQLExecuteFast_wrap call site to pass
  *statementHandle so the fast path uses the per-handle NULL describe cache
- Migrate SQLExecuteFast_wrap from std::wstring + WStringToSQLWCHAR to
  std::u16string + reinterpretU16stringAsSqlWChar (main's uniform 16-bit
  query/param representation), dropping the platform #ifdef in both the
  prepare path and the DAE wide-char put-data loop

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
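
As background for the 16-bit representation and the wide-char put-data loop mentioned in the merge commit, the sketch below splits a string into UTF-16 chunks on code-unit boundaries. It is a pure-Python illustration: `utf16_chunks` and the chunk size are hypothetical, and no ODBC call is made.

```python
def utf16_chunks(text: str, max_units: int) -> list:
    """Split text into UTF-16LE byte chunks of at most max_units code units."""
    if max_units <= 0:
        raise ValueError("max_units must be positive")
    data = text.encode("utf-16-le")
    step = max_units * 2  # each UTF-16 code unit is two bytes
    return [data[i:i + step] for i in range(0, len(data), step)]


# A character outside the BMP takes two code units (a surrogate pair).
sample_text = "a\U0001F600"
unit_count = len(sample_text.encode("utf-16-le")) // 2
chunks = utf16_chunks("hello wide world", 5)
```

Chunking by code units rather than by Python characters keeps every chunk an even number of bytes, although a surrogate pair can still be split across two chunks; the receiver only sees the concatenated stream.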