  1. Spark
  2. SPARK-29316

CLONE - schemaInference option not to convert strings with leading zeros to int/long


Details

    Description

      It would be great to have an option in Spark's schema inference that keeps a column whose values have leading zeros from being converted to the int/long datatype. Think zip codes, for example.

      df = (sqlc.read.format('csv')
                    .option('inferSchema', True)
                    .option('header', True)
                    .option('delimiter', '|')
                    .option('leadingZeros', 'KEEP')       # this is the new proposed option
                    .option('mode', 'FAILFAST')
                    .load('csvfile_withzipcodes_to_ingest.csv')
                  )
      
      Values with leading zeros are typically identifiers. Converting them to int/long drops the zeros, which defeats the purpose of inferSchema. Whether such values are converted to int/long should be controlled by a flag.
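
      The requested rule can be illustrated with a minimal sketch in plain Python. This is not Spark's inference code: the function name infer_column_type, the 'CONVERT' default, and the returned type names are assumptions made for illustration, and 'KEEP' mirrors the proposed leadingZeros option above.

```python
import re

# Hypothetical helper, not part of Spark: it only illustrates the proposed rule.
_INT_RE = re.compile(r"[+-]?\d+")            # any integer-looking string
_LEADING_ZERO_RE = re.compile(r"[+-]?0\d+")  # integer-looking with a leading zero


def infer_column_type(values, leading_zeros="CONVERT"):
    """Return 'long' or 'string' for a column of raw CSV strings.

    leading_zeros="KEEP" mirrors the option proposed in this issue: if any
    integer-looking value starts with a zero (e.g. '02134'), the column is
    kept as a string so the zeros are not lost.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        # Assumption: a column with no usable values stays a string.
        return "string"
    if not all(_INT_RE.fullmatch(v) for v in non_empty):
        return "string"
    if leading_zeros == "KEEP" and any(
        _LEADING_ZERO_RE.fullmatch(v) for v in non_empty
    ):
        return "string"
    return "long"


zip_codes = ["02134", "10001", "00501"]
print(infer_column_type(zip_codes))                        # default behaviour
print(infer_column_type(zip_codes, leading_zeros="KEEP"))  # proposed behaviour
```

      In this sketch a lone "0" is still treated as a number, since it has no digits after the zero. Whether the real option should behave that way is an open design choice.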

            People

              Assignee: Unassigned
              Reporter: Ambar Raghuvanshi (ambar.raghuvanshi)