SPARK-36325

Writing to hiveserver through jdbc throws ParseException


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: SQL
    • Environment:

      OS: Debian 10

      Spark version: 3.1.2

      Zeppelin Notebook: 0.9.0

      JDBC driver: org.apache.hive:hive-jdbc:3.1.2

    Description

      Hello everyone, I am new to working with Spark and this is my first post. If I make a mistake, please be kind to me; I have searched the web and haven't found anything related. If this bug is a duplicate or similar, please feel free to close it and tell me where to look.

      I am working with Zeppelin. I got a dataframe from the Solr API, processed it, and now I want to write it to a table through Thrift and read that new table from Apache Superset.

       

      I have this df with this schema:

      %spark
      df_linux.printSchema()
      root
       |-- time: string (nullable = false)
       |-- raw_log: string (nullable = false)
       |-- service_name: string (nullable = false)
       |-- hostname: string (nullable = false)
       |-- pid: string (nullable = false)
       |-- username: string (nullable = false)
       |-- source_ip: string (nullable = false)
      
      

       

      And this content:

       

      %spark
      df_linux.show()
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      |                time|             raw_log|service_name|          hostname|  pid|username|source_ip|
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      |2021-07-28T07:41:53Z|Jul 28 07:41:52 s...| sshd[11611]|sa3secessuperset01|11611|  debian| 10.0.9.3|
      |2021-07-28T07:41:44Z|Jul 28 07:41:43 s...| sshd[11590]|sa3secessuperset01|11590|  debian| 10.0.9.3|
      |2021-07-27T08:46:11Z|Jul 27 08:46:10 s...| sshd[16954]|sa3secessuperset01|16954|  debian| 10.0.9.3|
      |2021-07-27T08:44:55Z|Jul 27 08:44:54 s...| sshd[16511]|sa3secessuperset01|16511|  debian| 10.0.9.3|
      |2021-07-27T08:30:03Z|Jul 27 08:30:02 s...| sshd[14511]|sa3secessuperset01|14511|  debian| 10.0.9.3|
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      

       

      When I write the dataframe through JDBC I get this error:

       

       

      df_linux.write.mode("overwrite")
       .format("jdbc")
       .option("driver","org.apache.hive.jdbc.HiveDriver")
       .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:10000")
       .option("dbtable", "o365new")
       .option("createTableColumnTypes", "time VARCHAR(1024) NOT NULL, raw_log VARCHAR(1024) NOT NULL, service_name VARCHAR(1024), hostname VARCHAR(1024), pid VARCHAR(1024), username VARCHAR(1024), source_ip STRING")
       .save()
      
      java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException:
      no viable alternative at input '("time"'(line 1, pos 22)

      == SQL ==
      CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL)
      ----------------------^^^
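
      As far as I can tell (this is only my assumption from reading about the JDBC data source, I have not checked the Spark code), the double quotes come from the JdbcDialect that Spark matches against the URL: no bundled dialect handles jdbc:hive2, so the fallback quoteIdentifier is used and it wraps every column name in double quotes, which is exactly what appears in the generated DDL above. Something like this reproduces the quoting:

      %spark
      import org.apache.spark.sql.jdbc.JdbcDialect

      // Illustration only (assumption): a dialect that does not override
      // quoteIdentifier, like the fallback Spark uses when nothing matches
      // the URL, wraps identifiers in double quotes.
      val fallbackLike = new JdbcDialect {
        override def canHandle(url: String): Boolean = true
      }
      println(fallbackLike.quoteIdentifier("time"))  // prints "time", with the double quotes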

       

      From what I have seen, the problem is the way it tries to create the table. If you run the generated SQL statement in Beeline it throws exactly the same error:

       

      %hive
      CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL);
      
      org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
      no viable alternative at input '("time"'(line 2, pos 22)

       

       

      Then I just removed the quotes and the table was created without any problem:

       

      %hive
      CREATE TABLE o365new (time varchar(1024) NOT NULL, raw_log varchar(1024) NOT NULL, service_name varchar(1024) NOT NULL, hostname varchar(1024) NOT NULL, pid varchar(1024) NOT NULL, username varchar(1024) NOT NULL, source_ip string NOT NULL)
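
      It also works if the identifiers are quoted with backticks instead of double quotes, which is the identifier quoting that the Spark SQL / HiveQL parser accepts (take this as a sketch of the same statement as above, only with the quoting changed):

      %hive
      CREATE TABLE o365new (`time` varchar(1024) NOT NULL, `raw_log` varchar(1024) NOT NULL, `service_name` varchar(1024) NOT NULL, `hostname` varchar(1024) NOT NULL, `pid` varchar(1024) NOT NULL, `username` varchar(1024) NOT NULL, `source_ip` string NOT NULL)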
      

       

       

      So, the problem is the quotes; that's why I think this is a bug, but I don't know how to "override" the generated query the way I can override the column types with "createTableColumnTypes". Maybe this is not the right way to do it and there is another approach, but I don't know it.
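
      The only workaround I can think of (just a sketch built on the JdbcDialect developer API; I have no confirmation that this is the intended solution) is to register a custom dialect for jdbc:hive2 URLs that quotes identifiers with backticks before calling write:

      %spark
      import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

      // Sketch of a workaround (assumption: backtick quoting is the only thing
      // that needs to change; the rest of the default dialect behaviour stays).
      object HiveBacktickDialect extends JdbcDialect {
        // Claim jdbc:hive2 URLs so this dialect is used instead of the fallback one.
        override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

        // Quote identifiers the way the HiveQL / Spark SQL parser expects.
        override def quoteIdentifier(colName: String): String = s"`$colName`"
      }

      JdbcDialects.registerDialect(HiveBacktickDialect)

      After registering the dialect, the same df_linux.write ... .save() call should generate CREATE TABLE o365new (`time` varchar(1024) NOT NULL, ...), which is the statement that worked above.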

       

      Best regards.

       

       

       

       

People

    Assignee: Unassigned
    Reporter: Jesús Ricardo Ballesteros Molina (jrballesteros05)
