  1. Kudu
  2. KUDU-1533

Spark Kudu Rdd/Dataframe upsert


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.10.0
    • None
    • None
    • Spark

    Description

      It is not clear how to upsert a Kudu RDD/DataFrame into an existing Kudu table.
      The documentation section "Kudu integration with Spark" lists these write operations:
      ***********************************************
      // then we can insert data into the kudu table
      df.write.options(Map("kudu.master" -> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("append").kudu

      // to update existing data change the mode to 'overwrite'
      df.write.options(Map("kudu.master" -> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("overwrite").kudu
      ****************************************************************
      But there is no way to write with an upsert mode, for example:
      kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu
      ***************************************************************
      The current workaround, which is quite slow, is to call DataFrame.foreachPartition and, for each partition:

      • open the table
      • create a session
      • for each row in the partition:
          • create an upsert operation
          • get the row from the operation
          • add all fields and values to this row
          • apply the operation
      ----------------------------------
      This workaround is quite slow. Adding an upsert mode to the DataFrame write path for Kudu tables would be better than opening sessions and creating operations by hand. The requested call would look like this:
            kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu
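
For reference, the foreachPartition workaround described above might look roughly like the following sketch. It is not taken from the reporter's code: the object and method names (`UpsertWorkaround`, `upsertPartitions`) are hypothetical, the `org.apache.kudu` package name assumes a 0.10.0 or later client, and only a handful of column types are handled.

```scala
import org.apache.kudu.client.{KuduClient, SessionConfiguration}
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper illustrating the manual workaround; names are assumptions.
object UpsertWorkaround {

  def upsertPartitions(df: DataFrame, kuduMaster: String, tableName: String): Unit = {
    // Column names are captured on the driver; a plain Array[String] is serializable.
    val fieldNames: Array[String] = df.schema.fieldNames

    df.rdd.foreachPartition { (rows: Iterator[Row]) =>
      // One client per partition, created on the executor.
      val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
      try {
        val table = client.openTable(tableName)   // open the table
        val session = client.newSession()         // create a session
        // Buffer writes in the background instead of one round trip per row.
        session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)

        rows.foreach { row =>
          val upsert = table.newUpsert()          // create an upsert operation
          val partial = upsert.getRow             // get the row from the operation
          // Add all fields and values to this row (only a few types shown).
          fieldNames.zipWithIndex.foreach { case (name, i) =>
            row.get(i) match {
              case null       => partial.setNull(name)
              case v: String  => partial.addString(name, v)
              case v: Int     => partial.addInt(name, v)
              case v: Long    => partial.addLong(name, v)
              case v: Double  => partial.addDouble(name, v)
              case v: Boolean => partial.addBoolean(name, v)
              case other =>
                throw new IllegalArgumentException(
                  s"Unsupported type for column $name: ${other.getClass.getName}")
            }
          }
          session.apply(upsert)                   // apply the operation
        }
        // Closing the session flushes any buffered operations. Row-level
        // errors in this flush mode are not checked here.
        session.close()
      } finally {
        client.close()
      }
    }
  }
}
```

A call such as `UpsertWorkaround.upsertPartitions(kuduDataFrame, Kudu_Master, TargetTable)` would then stand in for the requested `.mode("upsert")` write. Each partition opens its own client and session and builds one operation per row, which is the overhead the report complains about.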

          People

            wdberkeley William Berkeley
            Qutiba Qutiba