SPARK-30100

Decimal Precision Inferred from JDBC via Spark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL

    Description

      When trying to load data from JDBC (Oracle) into Spark, there appears to be precision loss in a decimal field; as per my understanding, Spark supports DECIMAL(38,18). The field in Oracle is DECIMAL(38,14), whereas Spark rounds off the last four digits, reducing it to DECIMAL(38,10). This happens to a few fields in the DataFrame where the column is fetched using a CASE statement, whereas another field in the same query gets the correct schema. A hypothetical reproduction is sketched below.
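      A minimal sketch of how the mismatch shows up, assuming a hypothetical table my_table whose amount column is declared DECIMAL(38,14) and a second column derived from it with a CASE expression; table, column, URL, and credentials are placeholders, and an existing SparkSession `spark` is assumed:

      # Hypothetical reproduction; all names and the URL are placeholders.
      probe = """(SELECT amount,
                         CASE WHEN flag = 'Y' THEN amount ELSE 0 END AS amount_cased
                  FROM my_table) tmp"""
      df = spark.read \
          .format("jdbc") \
          .option("url", "jdbc:oracle:thin:@//host:1521/service") \
          .option("dbtable", probe) \
          .option("user", "USERNAME") \
          .option("password", "PASSWORD") \
          .load()
      df.printSchema()
      # Behaviour as reported:
      #   amount:       decimal(38,14)  -- direct column keeps its declared type
      #   amount_cased: decimal(38,10)  -- CASE-derived column loses four digits of scale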

      Tried passing the conf

      spark.sql.decimalOperations.allowPrecisionLoss=false

      to spark-submit, but it did not give the desired results.
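      For reference, a minimal sketch of how that conf can be set, assuming either the spark-submit flag or the SparkSession builder; the app and script names are placeholders:

      # Minimal sketch, assuming the conf is passed either on spark-submit:
      #   spark-submit --conf spark.sql.decimalOperations.allowPrecisionLoss=false job.py
      # or on the SparkSession builder before the JDBC read:
      from pyspark.sql import SparkSession

      spark = SparkSession.builder \
          .appName("oracle-jdbc-read") \
          .config("spark.sql.decimalOperations.allowPrecisionLoss", "false") \
          .getOrCreate()

      As far as I understand, this conf governs precision adjustments in decimal arithmetic inside Spark SQL, so it may not influence the schema the JDBC source infers from the query metadata, which could explain why it had no effect here.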

      # Original read from Oracle over JDBC; URL, query, and credentials are placeholders.
      jdbcDF = spark.read \
          .format("jdbc") \
          .option("url", "ORACLE") \
          .option("dbtable", "QUERY") \
          .option("user", "USERNAME") \
          .option("password", "PASSWORD") \
          .load()

      So, considering that Spark infers the schema from sample records, how does this work here? Does it use the result of the query, i.e. (SELECT * FROM TABLE_NAME JOIN ...), or does it take a different route to guess the schema? Can someone shed some light on this and advise how to achieve the right decimal precision without manipulating the query? Doing a CAST in the query does solve the issue, but I would prefer some alternatives; two options are sketched below.
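      As far as I can tell, for JDBC Spark does not sample rows at all: it asks the database for the query's metadata (roughly SELECT * FROM (query) WHERE 1=0) and maps the reported types through the Oracle dialect, which would explain why a CASE expression whose precision/scale Oracle does not report falls back to a default scale. A minimal sketch of the two workarounds, reusing the placeholder names from the reproduction above: the CAST inside the pushed-down query that is confirmed to work, and, as an alternative that leaves the query untouched, the JDBC reader's customSchema option (available in recent Spark versions, as far as I know), which overrides the inferred type for the listed columns. The read_oracle helper below is hypothetical:

      # Sketch only; names and the URL are the same placeholders as in the repro above.
      def read_oracle(query, **extra):
          """Hypothetical helper to read a pushed-down Oracle query over JDBC."""
          reader = spark.read \
              .format("jdbc") \
              .option("url", "jdbc:oracle:thin:@//host:1521/service") \
              .option("dbtable", query) \
              .option("user", "USERNAME") \
              .option("password", "PASSWORD")
          for key, value in extra.items():
              reader = reader.option(key, value)
          return reader.load()

      # Option 1: CAST inside the pushed-down query (confirmed above to restore the precision).
      df_cast = read_oracle("""(SELECT CAST(CASE WHEN flag = 'Y' THEN amount ELSE 0 END
                                            AS DECIMAL(38,14)) AS amount_cased
                                FROM my_table) tmp""")

      # Option 2: leave the query untouched and override the inferred type for the
      # affected column via the JDBC customSchema option.
      df_custom = read_oracle("""(SELECT CASE WHEN flag = 'Y' THEN amount ELSE 0 END AS amount_cased
                                  FROM my_table) tmp""",
                              customSchema="amount_cased DECIMAL(38,14)")

      df_cast.printSchema()    # amount_cased: decimal(38,14)
      df_custom.printSchema()  # amount_cased: decimal(38,14)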

People

    Assignee: Unassigned
    Reporter: Joby Joje
