SPARK-36325

Writing to hiveserver through jdbc throws ParseException


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: SQL
    • Environment:

      OS: Debian 10

      Spark version: 3.1.2

      Zeppelin Notebook: 0.9.0

      JDBC driver: org.apache.hive:hive-jdbc:3.1.2

    Description

      Hello everyone, I am new to working with Spark and this is my first post. If I make a mistake, please be kind to me; I have searched the web and haven't found anything related. If this bug is a duplicate or similar, please feel free to close it and tell me where to look.

      I am working with Zeppelin. I got a dataframe from the Solr API, processed it, and now I want to write it to a table through Thrift and read that new table from Apache Superset.

       

      I have this df with this schema:

      %spark
      df_linux.printSchema()
      root
       |-- time: string (nullable = false)
       |-- raw_log: string (nullable = false)
       |-- service_name: string (nullable = false)
       |-- hostname: string (nullable = false)
       |-- pid: string (nullable = false)
       |-- username: string (nullable = false)
       |-- source_ip: string (nullable = false)
      
      

       

      And this content:

       

      %spark
      df_linux.show()
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      |                time|             raw_log|service_name|          hostname|  pid|username|source_ip|
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      |2021-07-28T07:41:53Z|Jul 28 07:41:52 s...| sshd[11611]|sa3secessuperset01|11611|  debian| 10.0.9.3|
      |2021-07-28T07:41:44Z|Jul 28 07:41:43 s...| sshd[11590]|sa3secessuperset01|11590|  debian| 10.0.9.3|
      |2021-07-27T08:46:11Z|Jul 27 08:46:10 s...| sshd[16954]|sa3secessuperset01|16954|  debian| 10.0.9.3|
      |2021-07-27T08:44:55Z|Jul 27 08:44:54 s...| sshd[16511]|sa3secessuperset01|16511|  debian| 10.0.9.3|
      |2021-07-27T08:30:03Z|Jul 27 08:30:02 s...| sshd[14511]|sa3secessuperset01|14511|  debian| 10.0.9.3|
      +--------------------+--------------------+------------+------------------+-----+--------+---------+
      

       

      When I write the dataframe through JDBC I get this error:

       

       

      df_linux.write.mode("overwrite")
       .format("jdbc")
       .option("driver","org.apache.hive.jdbc.HiveDriver")
       .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:10000")
       .option("dbtable", "o365new")
       .option("createTableColumnTypes", "time VARCHAR(1024) NOT NULL, raw_log VARCHAR(1024) NOT NULL, service_name VARCHAR(1024), hostname VARCHAR(1024), pid VARCHAR(1024), username VARCHAR(1024), source_ip STRING")
       .save()
      
      java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException:
      no viable alternative at input '("time"'(line 1, pos 22)

      == SQL ==
      CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL)
      ----------------------^^^
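
      As far as I can tell (this is only my assumption from reading about the JDBC data source, I have not checked the Spark code), the double quotes come from the JdbcDialect that Spark matches against the URL: no bundled dialect handles jdbc:hive2, so the fallback quoteIdentifier is used and it wraps every column name in double quotes, which is exactly what appears in the generated DDL above. Something like this reproduces the quoting:

      %spark
      import org.apache.spark.sql.jdbc.JdbcDialect

      // Illustration only (assumption): a dialect that does not override
      // quoteIdentifier, like the fallback Spark uses when nothing matches
      // the URL, wraps identifiers in double quotes.
      val fallbackLike = new JdbcDialect {
        override def canHandle(url: String): Boolean = true
      }
      println(fallbackLike.quoteIdentifier("time"))  // prints "time", with the double quotes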

       

      From what I have seen, the problem is the way it tries to create the table. If you run the generated SQL statement in Beeline it throws exactly the same error:

       

      %hive
      CREATE TABLE o365new ("time" varchar(1024) NOT NULL, "raw_log" varchar(1024) NOT NULL, "service_name" varchar(1024) NOT NULL, "hostname" varchar(1024) NOT NULL, "pid" varchar(1024) NOT NULL, "username" varchar(1024) NOT NULL, "source_ip" string NOT NULL);
      
      org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
      no viable alternative at input '("time"'(line 2, pos 22)

       

       

      Then I just removed the quotes and the table was created without any problem:

       

      %hive
      CREATE TABLE o365new (time varchar(1024) NOT NULL, raw_log varchar(1024) NOT NULL, service_name varchar(1024) NOT NULL, hostname varchar(1024) NOT NULL, pid varchar(1024) NOT NULL, username varchar(1024) NOT NULL, source_ip string NOT NULL)
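
      It also works if the identifiers are quoted with backticks instead of double quotes, which is the identifier quoting that the Spark SQL / HiveQL parser accepts (take this as a sketch of the same statement as above, only with the quoting changed):

      %hive
      CREATE TABLE o365new (`time` varchar(1024) NOT NULL, `raw_log` varchar(1024) NOT NULL, `service_name` varchar(1024) NOT NULL, `hostname` varchar(1024) NOT NULL, `pid` varchar(1024) NOT NULL, `username` varchar(1024) NOT NULL, `source_ip` string NOT NULL)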
      

       

       

      So, the problem is the quotes; that's why I think this is a bug, but I don't know how to "override" the generated query the way I can override the column types with "createTableColumnTypes". Maybe this is not the right way to do it and there is another approach, but I don't know it.
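
      The only workaround I can think of (just a sketch built on the JdbcDialect developer API; I have no confirmation that this is the intended solution) is to register a custom dialect for jdbc:hive2 URLs that quotes identifiers with backticks before calling write:

      %spark
      import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

      // Sketch of a workaround (assumption: backtick quoting is the only thing
      // that needs to change; the rest of the default dialect behaviour stays).
      object HiveBacktickDialect extends JdbcDialect {
        // Claim jdbc:hive2 URLs so this dialect is used instead of the fallback one.
        override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

        // Quote identifiers the way the HiveQL / Spark SQL parser expects.
        override def quoteIdentifier(colName: String): String = s"`$colName`"
      }

      JdbcDialects.registerDialect(HiveBacktickDialect)

      After registering the dialect, the same df_linux.write ... .save() call should generate CREATE TABLE o365new (`time` varchar(1024) NOT NULL, ...), which is the statement that worked above.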

       

      Best regards.

       

       

       

       

People

    Assignee: Unassigned
    Reporter: Jesús Ricardo Ballesteros Molina (jrballesteros05)
