OracleDialect: Hardcoded scale of 10 in Decimal(38,10) causes silent precision loss for bare NUMBER columns · Issue #56738 · apache/spark · GitHub
Skip to content

OracleDialect: Hardcoded scale of 10 in Decimal(38,10) causes silent precision loss for bare NUMBER columns #56738

Description

@sivakumardg

Problem Statement

When reading Oracle NUMBER columns (without explicit precision/scale) via JDBC, Spark's OracleDialect maps them to DecimalType(38, 10) with a hardcoded scale of 10. This causes silent data precision loss for values that require more than 10 decimal places, with no warning or error to the user.

Current Behavior

Oracle bare NUMBER (no precision/scale defined) returns via JDBC metadata:

  • getPrecision() = 0
  • getScale() = -127

Spark's OracleDialect.scala handles this with a hardcoded scale of 10:

case 0 => Option(DecimalType(DecimalType.MAX_PRECISION, 10))

So any bare NUMBER column becomes Decimal(38, 10) in Spark — regardless of the actual data stored in the column.

Steps to Reproduce

Oracle setup:

CREATE TABLE test_number (
  col1 NUMBER,  -- stores small decimal values
  col2 NUMBER   -- stores large integer values
);

INSERT INTO test_number VALUES (0.0000123456789123456, 12345678901234567890);

PySpark read:

df = spark.read.format("jdbc") \
    .option("url", oracle_url) \
    .option("dbtable", "test_number") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()

df.printSchema()
# root
#  |-- col1: decimal(38,10)
#  |-- col2: decimal(38,10)

df.show(truncate=False)
# col1: 0.0000123457           <- SILENT DATA LOSS!
# original value: 0.0000123456789123456
# col2: 12345678901234567890   <- OK

Expected Behavior

Spark should not silently truncate data. When the scale of a bare NUMBER column cannot be determined from JDBC metadata, the user should at minimum be warned that precision loss may occur, or provided a way to configure the fallback behavior.

Why This is Critical

  • The scale of bare NUMBER varies per column in Oracle—hardcoding 10 cannot be correct for all cases
  • The precision loss is completely silent—no warning, no error, no indication in the schema that data has been truncated
  • NUMBER(38, 25) works correctly -> Decimal(38, 25) in Spark, proving the issue is only with bare NUMBER
  • There is currently no global workaround across multiple tables without knowing all column names upfront

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions