CarbonData / CARBONDATA-4239

CarbonData 2.1.1 MV: Incremental refresh does not aggregate data correctly

Details

    • Important

    Description

      Hi Team,

      We are doing a POC with CarbonData using MVs.
      Our MV does not contain the AVG function because we wanted to use incremental refresh.
      But with incremental refresh, we noticed that the MV does not aggregate values correctly.
      When a row is inserted, it creates another row in the MV instead of adding the incremental value to the existing row.
      As a result, the number of rows in the MV is almost the same as in the raw table.
      This does not happen with a full-refresh MV.
      Below is the data in the MV, with 3 rows:

      scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
      +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric|ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
      +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |ff6cb0f7-fba0-413...            |eUtranCell.HHO.X2...           |2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-413...            |eUtranCell.HHO.X2...           |2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-413...            |eUtranCell.HHO.X2...           |2020-09-25 06:00:00|58.112            |58.112   |58.112   |2020-09-25 05:30:00         |
      +--------------------------------+-------------------------------+-------------------+------------------+---------+---------+----------------------------+

      Below, I am inserting data for the 6th hour (06:05), and it should add its values incrementally to the existing 06:00 row of the MV.
      Note the data being inserted: the columns that are part of the GROUP BY clause have the same values as the existing data.

      scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25 05:30:00')").show()
      21/06/28 16:01:31 AUDIT audit: {"time":"June 28, 2021 4:01:31 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"START"}
      21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:33 AUDIT audit: {"time":"June 28, 2021 4:01:33 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"START"}
      [Stage 40:=====================================================>(199 + 1) / 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch one more time.
      21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332284066443156","opStatus":"SUCCESS","opTime":"11343 ms","table":"default.fact_365_1_eutrancell_21_30_minute","extraInfo":{}}
      21/06/28 16:01:44 AUDIT audit: {"time":"June 28, 2021 4:01:44 PM IST","username":"root","opName":"INSERT INTO","opId":"7332282307468267","opStatus":"SUCCESS","opTime":"13137 ms","table":"default.fact_365_1_eutrancell_21","extraInfo":{}}
      +----------+
      |Segment ID|
      +----------+
      |8         |
      +----------+

      Below we can see that it has added another row for 2020-09-25 06:00:00.
      Note: all columns that are part of the GROUP BY clause have the same values in both rows.
      This means there should have been a single row for 2020-09-25 06:00:00.

      scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|58.112            |58.112   |58.112   |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|118.112           |118.112  |118.112  |2020-09-25 05:30:00         |
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
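
      For reference, partial SUM, MIN and MAX results for the same group key can always be merged into one row, which is what we expected incremental refresh to do. A minimal sketch in plain Scala, with no Spark or CarbonData dependency; `PartialAggregates`, `Partial`, `merge` and `collapse` are hypothetical names used only for illustration, not CarbonData APIs:

```scala
object PartialAggregates {
  // Hypothetical model of the aggregate columns of one MV row.
  final case class Partial(sum: Double, min: Double, max: Double)

  // SUM, MIN and MAX are mergeable: two partial rows combine into one.
  def merge(a: Partial, b: Partial): Partial =
    Partial(a.sum + b.sum, math.min(a.min, b.min), math.max(a.max, b.max))

  // Collapse all rows that share a group key into a single row per key.
  def collapse[K](rows: Seq[(K, Partial)]): Map[K, Partial] =
    rows.groupBy(_._1).map { case (key, group) =>
      key -> group.map(_._2).reduce(merge)
    }

  def main(args: Array[String]): Unit = {
    // The two 06:00 rows currently present in the MV.
    val rows = Seq(
      "2020-09-25 06:00:00" -> Partial(58.112, 58.112, 58.112),
      "2020-09-25 06:00:00" -> Partial(118.112, 118.112, 118.112)
    )
    println(collapse(rows))
  }
}
```

      Applied to the two 06:00 rows above, this yields a single row with a sum of approximately 176.224, min 58.112 and max 118.112.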

      scala> carbon.sql("select * from fact_365_1_eutrancell_21").show(1000,false)
      +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
      |ts                 |metric                                |tags_id                             |value   |ts2                |
      +-------------------+--------------------------------------+------------------------------------+--------+-------------------+
      |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
      |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
      |2020-09-25 05:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|392.2345|2020-09-25 05:30:00|
      |2020-09-25 06:30:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|31.345  |2020-09-25 05:30:00|
      |2020-09-25 06:40:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|745.112 |2020-09-25 05:30:00|
      |2020-09-25 06:50:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|4578.112|2020-09-25 05:30:00|
      |2020-09-25 06:55:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
      |2020-09-25 06:25:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|58.112  |2020-09-25 05:30:00|
      |2020-09-25 06:05:00|eUtranCell.HHO.X2.InterFreq.PrepAttOut|ff6cb0f7-fba0-4134-81ee-55e820574627|118.112 |2020-09-25 05:30:00|
      +-------------------+--------------------------------------+------------------------------------+--------+-------------------+

       

      After dropping and re-creating the MV, we can see a single row for 2020-09-25 06:00:00.

      scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show(1000,false)
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |fact_365_1_eutrancell_21_tags_id    |fact_365_1_eutrancell_21_metric       |ts                 |sum_value         |min_value|max_value|fact_365_1_eutrancell_21_ts2|
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:30:00|5412.6810000000005|31.345   |4578.112 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 05:30:00|1176.7035         |392.2345 |392.2345 |2020-09-25 05:30:00         |
      |ff6cb0f7-fba0-4134-81ee-55e820574627|eUtranCell.HHO.X2.InterFreq.PrepAttOut|2020-09-25 06:00:00|176.224           |58.112   |118.112  |2020-09-25 05:30:00         |
      +------------------------------------+--------------------------------------+-------------------+------------------+---------+---------+----------------------------+
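
      As a cross-check, the full-refresh result can be reproduced from the nine raw rows. This is a rough sketch in plain Scala, assuming the MV groups `ts` into 30-minute buckets (inferred from the MV name and the `ts` values in the output; the MV definition itself is not shown in this report). `FullRefreshCheck`, `bucket` and `aggregate` are hypothetical names, not CarbonData APIs:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object FullRefreshCheck {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Floor a timestamp to the start of its 30-minute bucket (assumed granularity).
  def bucket(ts: String): String = {
    val t = LocalDateTime.parse(ts, fmt)
    t.withMinute(t.getMinute / 30 * 30).withSecond(0).withNano(0).format(fmt)
  }

  // (ts, value) pairs from the raw table. metric, tags_id and ts2 are
  // identical in every row, so they are left out of the group key here.
  val raw: Seq[(String, Double)] = Seq(
    "2020-09-25 05:30:00" -> 392.2345,
    "2020-09-25 05:30:00" -> 392.2345,
    "2020-09-25 05:30:00" -> 392.2345,
    "2020-09-25 06:30:00" -> 31.345,
    "2020-09-25 06:40:00" -> 745.112,
    "2020-09-25 06:50:00" -> 4578.112,
    "2020-09-25 06:55:00" -> 58.112,
    "2020-09-25 06:25:00" -> 58.112,
    "2020-09-25 06:05:00" -> 118.112
  )

  // Group by bucket and compute (sum, min, max) per bucket.
  def aggregate(rows: Seq[(String, Double)]): Map[String, (Double, Double, Double)] =
    rows.groupBy(r => bucket(r._1)).map { case (b, group) =>
      val values = group.map(_._2)
      b -> ((values.sum, values.min, values.max))
    }

  def main(args: Array[String]): Unit =
    aggregate(raw).toSeq.sortBy(_._1).foreach(println)
}
```

      Under that assumption the nine raw rows fall into three buckets (05:30, 06:00 and 06:30), and both the 06:25 and 06:05 rows land in the 06:00 bucket, which agrees with the three rows of the re-created MV.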

       

      Please check what the issue is with the incremental-refresh MV.

          People

            Assignee: Unassigned
            Reporter: Sushant Sammanwar (sushantsam)