Description
How to reproduce:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet/dir' STORED AS parquet select cast(1 as decimal) as decimal1;
create table test_parquet stored as parquet as select cast(1 as decimal) as decimal1;
$ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
file:         file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
creator:      parquet-mr version 1.6.0 (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)

file schema:  hive_schema
--------------------------------------------------------------------------------
decimal1:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

row group 1:  RC:1 TS:46 OFFSET:4
--------------------------------------------------------------------------------
decimal1:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:48/46/0.96 VC:1 ENC:BIT_PACKED,PLAIN,RLE ST:[no stats for this column]
This happens because Spark still uses com.twitter:parquet-hadoop-bundle:1.6.0 for these write paths, so the DECIMAL column is written as FIXED_LEN_BYTE_ARRAY.
Maybe we should refactor CreateHiveTableAsSelectCommand and InsertIntoHiveDirCommand, or upgrade the built-in Hive.
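For context, the physical type a spec-compliant Parquet writer may pick for a DECIMAL depends on its precision (a bare CAST(1 AS DECIMAL) in Spark SQL yields decimal(10,0)). Below is a minimal sketch of that mapping, assuming the rules from the Parquet logical-types specification; the function name is hypothetical and this is not Spark or parquet-mr code:

```python
import math

def parquet_decimal_physical_type(precision: int) -> str:
    """Sketch of the Parquet-spec mapping from DECIMAL precision to a
    physical type (assumption based on the Parquet logical-types spec):
    precision <= 9 fits INT32, precision <= 18 fits INT64, otherwise a
    FIXED_LEN_BYTE_ARRAY of the minimal byte length is required."""
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    # Minimal number of bytes to hold a signed value with `precision`
    # decimal digits: ceil((precision * log2(10) + 1) / 8)
    nbytes = math.ceil((precision * math.log2(10) + 1) / 8)
    return f"FIXED_LEN_BYTE_ARRAY({nbytes})"

print(parquet_decimal_physical_type(10))
```

So a modern writer could store decimal(10,0) as INT64, whereas the parquet-mr 1.6.0 path used here always falls back to the legacy FIXED_LEN_BYTE_ARRAY representation seen in the meta output above.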
Issue Links
- relates to SPARK-23710 "Upgrade the built-in Hive to 2.3.5 for hadoop-3.2" (Resolved)