HIVE-19580: Hive 2.3.2 with ORC files stored on S3 is case sensitive on EMR


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.3.2
    • Fix Version/s: 2.3.2
    • Component/s: None
    • Labels: None
    • Environment: EMR s3:// connector

      Spark 2.3 (but also true for lower versions)

      Hive 2.3.2

    Description

      The original file is CSV:

      COL1,COL2
      1,2

      The ORC files are created with Spark 2.3:

      scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")

      scala> df.printSchema
      root
       |-- COL1: string (nullable = true)
       |-- COL2: string (nullable = true)

      scala> df.write.orc("s3://bucket/prefix")

      In Hive:

      hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC LOCATION 's3://bucket/prefix';

      hive> SELECT * FROM test_orc;
      OK
      NULL NULL
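
      Reading the files back in the Spark shell confirms that the upper-case
      names were persisted in the ORC footers (a quick sanity check added
      here, using the same path as above; it is not part of the original
      repro):

      scala> spark.read.orc("s3://bucket/prefix").printSchema
      root
       |-- COL1: string (nullable = true)
       |-- COL2: string (nullable = true)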

      Every field is NULL. Hive stores column names in lower case in the
      metastore, and the ORC reader's name-based column matching appears to be
      case sensitive, so the lower-cased table columns never match the
      upper-case names in the files. Accordingly, if the fields are generated
      in lower case in the Spark schema, everything works (see the Spark-side
      sketch below).
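
      A minimal Spark-side workaround following that observation (a sketch
      only; the rename-before-write step is an assumption drawn from the
      lower-case behaviour described above, not something prescribed in this
      issue): rename every column to lower case before writing, so the names
      stored in the ORC footer match Hive's lower-cased column names.

      scala> // Rename all columns to lower case before writing the ORC files
      scala> val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)

      scala> lowered.printSchema
      root
       |-- col1: string (nullable = true)
       |-- col2: string (nullable = true)

      scala> lowered.write.orc("s3://bucket/prefix")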

      The reason I'm raising this bug is that we have customers using Hive
      2.3.2 to read files we generate with Spark, and our entire code base
      addresses fields in upper case, which is incompatible with their Hive
      instances.


          People

            Assignee: Unassigned
            Reporter: Arthur Baudry (artb)
            Votes: 0
            Watchers: 5
