SPARK-30100

Decimal Precision Inferred from JDBC via Spark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: PySpark, SQL

    Description

      When trying to load data from JDBC (Oracle) into Spark, there appears to be precision loss in a decimal field; as per my understanding, Spark supports DECIMAL(38,18). The field in Oracle is DECIMAL(38,14), whereas Spark rounds off the last four digits, reducing it to DECIMAL(38,10). This happens to a few fields in the DataFrame where the column is fetched using a CASE statement, whereas another field in the same query gets the correct schema. A hypothetical reproduction is sketched below.
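      A minimal sketch of how the mismatch shows up, assuming a hypothetical table my_table whose amount column is declared DECIMAL(38,14) and a second column derived from it with a CASE expression; table, column, URL, and credentials are placeholders, and an existing SparkSession `spark` is assumed:

      # Hypothetical reproduction; all names and the URL are placeholders.
      probe = """(SELECT amount,
                         CASE WHEN flag = 'Y' THEN amount ELSE 0 END AS amount_cased
                  FROM my_table) tmp"""
      df = spark.read \
          .format("jdbc") \
          .option("url", "jdbc:oracle:thin:@//host:1521/service") \
          .option("dbtable", probe) \
          .option("user", "USERNAME") \
          .option("password", "PASSWORD") \
          .load()
      df.printSchema()
      # Behaviour as reported:
      #   amount:       decimal(38,14)  -- direct column keeps its declared type
      #   amount_cased: decimal(38,10)  -- CASE-derived column loses four digits of scale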

      Tried passing the conf

      spark.sql.decimalOperations.allowPrecisionLoss=false

      to spark-submit, but it did not give the desired results.
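      For reference, a minimal sketch of how that conf can be set, assuming either the spark-submit flag or the SparkSession builder; the app and script names are placeholders:

      # Minimal sketch, assuming the conf is passed either on spark-submit:
      #   spark-submit --conf spark.sql.decimalOperations.allowPrecisionLoss=false job.py
      # or on the SparkSession builder before the JDBC read:
      from pyspark.sql import SparkSession

      spark = SparkSession.builder \
          .appName("oracle-jdbc-read") \
          .config("spark.sql.decimalOperations.allowPrecisionLoss", "false") \
          .getOrCreate()

      As far as I understand, this conf governs precision adjustments in decimal arithmetic inside Spark SQL, so it may not influence the schema the JDBC source infers from the query metadata, which could explain why it had no effect here.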

      # Original read from Oracle over JDBC; URL, query, and credentials are placeholders.
      jdbcDF = spark.read \
          .format("jdbc") \
          .option("url", "ORACLE") \
          .option("dbtable", "QUERY") \
          .option("user", "USERNAME") \
          .option("password", "PASSWORD") \
          .load()

      So, considering that Spark infers the schema from sample records, how does this work here? Does it use the result of the query, i.e. (SELECT * FROM TABLE_NAME JOIN ...), or does it take a different route to guess the schema? Can someone shed some light on this and advise how to achieve the right decimal precision without manipulating the query? Doing a CAST in the query does solve the issue, but I would prefer some alternatives; two options are sketched below.
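      As far as I can tell, for JDBC Spark does not sample rows at all: it asks the database for the query's metadata (roughly SELECT * FROM (query) WHERE 1=0) and maps the reported types through the Oracle dialect, which would explain why a CASE expression whose precision/scale Oracle does not report falls back to a default scale. A minimal sketch of the two workarounds, reusing the placeholder names from the reproduction above: the CAST inside the pushed-down query that is confirmed to work, and, as an alternative that leaves the query untouched, the JDBC reader's customSchema option (available in recent Spark versions, as far as I know), which overrides the inferred type for the listed columns. The read_oracle helper below is hypothetical:

      # Sketch only; names and the URL are the same placeholders as in the repro above.
      def read_oracle(query, **extra):
          """Hypothetical helper to read a pushed-down Oracle query over JDBC."""
          reader = spark.read \
              .format("jdbc") \
              .option("url", "jdbc:oracle:thin:@//host:1521/service") \
              .option("dbtable", query) \
              .option("user", "USERNAME") \
              .option("password", "PASSWORD")
          for key, value in extra.items():
              reader = reader.option(key, value)
          return reader.load()

      # Option 1: CAST inside the pushed-down query (confirmed above to restore the precision).
      df_cast = read_oracle("""(SELECT CAST(CASE WHEN flag = 'Y' THEN amount ELSE 0 END
                                            AS DECIMAL(38,14)) AS amount_cased
                                FROM my_table) tmp""")

      # Option 2: leave the query untouched and override the inferred type for the
      # affected column via the JDBC customSchema option.
      df_custom = read_oracle("""(SELECT CASE WHEN flag = 'Y' THEN amount ELSE 0 END AS amount_cased
                                  FROM my_table) tmp""",
                              customSchema="amount_cased DECIMAL(38,14)")

      df_cast.printSchema()    # amount_cased: decimal(38,14)
      df_custom.printSchema()  # amount_cased: decimal(38,14)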

People

    Assignee: Unassigned
    Reporter: Joby Joje
