[Fix](decimal) Fix unaligned memory access in read_column_from_arrow#62759
[Fix](decimal) Fix unaligned memory access in read_column_from_arrow#62759linrrzqqq wants to merge 1 commit intoapache:masterfrom
Conversation
|
run buildall |
|
/review |
There was a problem hiding this comment.
Found 1 blocking issue.
- [High]
be/src/core/data_type_serde/data_type_decimal_serde.cpp:344still performs an unaligned load in theTYPE_DECIMALV2branch. This PR correctly replaces theTYPE_DECIMAL256raw-pointer dereference withmemcpy, but*reinterpret_cast<const Decimal128V2*>(concrete_array->Value(value_i))remains in the same Arrow import path.arrow::DecimalArray::Value()also returns raw bytes, so a record batch backed by an unaligned buffer can still trigger UBSan/undefined behavior when importing DECIMALV2 columns throughread_column_from_arrow(for example viaFromRecordBatchToBlockConverter).
Critical checkpoint conclusions:
- Goal of current task: Partially achieved. The Decimal256 path is fixed and a Decimal256 unit test was added, but the broader unaligned Arrow decimal-read problem is not fully eliminated because the DecimalV2 branch still uses the unsafe cast.
- Small/clear/focused: Yes. The patch is minimal and localized.
- Concurrency: Not involved. No thread-safety concerns in this change.
- Lifecycle/static initialization: No new lifecycle or static-init issues introduced.
- Configuration: None.
- Compatibility/storage format: None. No FE/BE protocol or persistence changes.
- Parallel code paths: Not fully covered. The sibling
TYPE_DECIMALV2branch in the same function still uses the old reinterpret-cast pattern. - Special conditional checks: None added.
- Test coverage: Improved but incomplete. The new unit test covers the Decimal256 case only; a DecimalV2 misalignment case is still missing.
- Test result changes: New test logic for Decimal256 looks reasonable, but it does not prove the remaining DecimalV2 path is safe.
- Observability: No additional observability needed for this local fix.
- Transaction/persistence/data-write concerns: Not applicable.
- Performance: The
memcpyapproach is appropriate here and matches the existing Decimal32/64/128I handling; no material performance concern. - Other issues: No additional blockers found beyond the remaining DecimalV2 unaligned read.
User focus points: there were no additional user-provided focus points in /tmp/review_focus.txt, so no extra focus-specific findings to report.
| column_data.emplace_back( | ||
| *reinterpret_cast<const FieldType*>(concrete_array->Value(value_i))); | ||
| FieldType decimal_value {}; | ||
| memcpy(&decimal_value, concrete_array->Value(value_i), sizeof(FieldType)); |
There was a problem hiding this comment.
This fixes the TYPE_DECIMAL256 unaligned read, but the sibling TYPE_DECIMALV2 branch a few lines above still does *reinterpret_cast<const Decimal128V2*>(concrete_array->Value(value_i)). arrow::DecimalArray::Value() has the same raw-byte contract, so an unaligned Arrow buffer can still trigger UBSan/undefined behavior for DECIMALV2 imports through the same read_column_from_arrow path. Please switch that branch to the same memcpy pattern and add a DECIMALV2 misalignment test; otherwise this PR only fixes one instance of the bug.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report |

What problem does this PR solve?
Issue Number: close #xxx
Related PR: #58002
Problem Summary:
arrow::Decimal256Array::Value()returns raw bytes (const uint8_t*), we previously reinterpret_cast it toDecimal256*which is a misaligned access .Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)