Apache Drill / DRILL-7595

Data type changes from BIGINT to INT when a Parquet table is written by multiple fragments

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      As in DRILL-7104, there is a bug that changes the column type from BIGINT to INT when a Parquet table is written by multiple fragments.

      With a source file containing few rows, all is fine: we store a BIGINT and the Parquet output really contains a BIGINT.

      apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 1_0      | 1500                      |
      +----------+---------------------------+
      apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
      +--------+
      | EXPR$0 |
      +--------+
      | BIGINT |
      +--------+
      

      With a source file containing "enough" rows to be written by more than one fragment, there is a problem: we store a BIGINT but unfortunately the Parquet output contains an INT.

      apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 1_1      | 934111                    |
      | 1_0      | 1488743                   |
      +----------+---------------------------+
      apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
      +--------+
      | EXPR$0 |
      +--------+
      | INT    |
      +--------+
      

       
      It is not really satisfactory, but note that there is a workaround for this problem: use CAST('0' AS BIGINT) (a string literal) instead of CAST(0 AS BIGINT) (a numeric literal).

      apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST('0' AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 1_1      | 934111                    |
      | 1_0      | 1488743                   |
      +----------+---------------------------+
      apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
      +--------+
      | EXPR$0 |
      +--------+
      | BIGINT |
      +--------+
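
      To check whether every written file is affected or only some of them, one possible diagnostic is to group the reported type by source file. This is only a sketch and has not been run against the data in this report. It assumes Drill's implicit file-name column is exposed under its default name `filename` and that `out_pqt` is the table created above.

```sql
-- Sketch: report the type Drill sees for column d in each Parquet file of the table.
-- `filename` is Drill's implicit file-name column (default name assumed).
SELECT filename, typeof(d) AS d_type, COUNT(*) AS row_count
FROM dfs.tmp.`out_pqt`
GROUP BY filename, typeof(d);
```

      If the result lists one row per file, the d_type column shows which files Drill reads as INT rather than BIGINT.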
      

       


          People

            Assignee: Unassigned
            Reporter: benj (benj641)
            Votes: 0
            Watchers: 1
