Problem Statement
When reading Oracle NUMBER columns (without explicit precision/scale) via JDBC, Spark's OracleDialect maps them to DecimalType(38, 10) with a hardcoded scale of 10. This causes silent data precision loss for values that require more than 10 decimal places, with no warning or error to the user.
Current Behavior
Oracle bare NUMBER (no precision/scale defined) returns via JDBC metadata:
- getPrecision() = 0
- getScale() = -127
Spark's OracleDialect.scala handles this with a hardcoded scale of 10:
case 0 => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
So any bare NUMBER column becomes Decimal(38, 10) in Spark — regardless of the actual data stored in the column.
Steps to Reproduce
Oracle setup:
CREATE TABLE test_number (
col1 NUMBER, -- stores small decimal values
col2 NUMBER -- stores large integer values
);
INSERT INTO test_number VALUES (0.0000123456789123456, 12345678901234567890);
PySpark read:
df = spark.read.format("jdbc") \
.option("url", oracle_url) \
.option("dbtable", "test_number") \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.load()
df.printSchema()
# root
# |-- col1: decimal(38,10)
# |-- col2: decimal(38,10)
df.show(truncate=False)
# col1: 0.0000123457 <- SILENT DATA LOSS!
# original value: 0.0000123456789123456
# col2: 12345678901234567890 <- OK
Expected Behavior
Spark should not silently truncate data. When the scale of a bare NUMBER column cannot be determined from JDBC metadata, the user should at minimum be warned that precision loss may occur, or provided a way to configure the fallback behavior.
Why This is Critical
- The scale of bare NUMBER varies per column in Oracle—hardcoding 10 cannot be correct for all cases
- The precision loss is completely silent—no warning, no error, no indication in the schema that data has been truncated
- NUMBER(38, 25) works correctly -> Decimal(38, 25) in Spark, proving the issue is only with bare NUMBER
- There is currently no global workaround across multiple tables without knowing all column names upfront
Problem Statement
When reading Oracle NUMBER columns (without explicit precision/scale) via JDBC, Spark's OracleDialect maps them to DecimalType(38, 10) with a hardcoded scale of 10. This causes silent data precision loss for values that require more than 10 decimal places, with no warning or error to the user.
Current Behavior
Oracle bare NUMBER (no precision/scale defined) returns via JDBC metadata:
Spark's OracleDialect.scala handles this with a hardcoded scale of 10:
So any bare NUMBER column becomes Decimal(38, 10) in Spark — regardless of the actual data stored in the column.
Steps to Reproduce
Oracle setup:
PySpark read:
Expected Behavior
Spark should not silently truncate data. When the scale of a bare NUMBER column cannot be determined from JDBC metadata, the user should at minimum be warned that precision loss may occur, or provided a way to configure the fallback behavior.
Why This is Critical