Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 1.17.0
Description
Like DRILL-7104, there is a bug that changes the column type from BIGINT to INT when the Parquet write produces multiple fragments.
With a file containing few rows, all is fine (we store a BIGINT and really get a BIGINT in the Parquet file):
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`fewrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_0      | 1500                      |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+
With a file containing "enough" rows, there is a problem (we store a BIGINT but unfortunately get an INT in the Parquet file):
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST(0 AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| INT    |
+--------+
This is not really satisfactory, but note that there is a trick to avoid the problem: use CAST('0' AS BIGINT) instead of CAST(0 AS BIGINT):
apache drill> CREATE TABLE dfs.tmp.`out_pqt` AS (SELECT CAST('0' AS BIGINT) AS d FROM dfs.tmp.`manyrowfile`);
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 1_1      | 934111                    |
| 1_0      | 1488743                   |
+----------+---------------------------+
apache drill> SELECT typeof(d) FROM dfs.tmp.`out_pqt`;
+--------+
| EXPR$0 |
+--------+
| BIGINT |
+--------+