  1. CarbonData
  2. CARBONDATA-4322

Insert into local sort partition table select * from text table launches thousands of tasks

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None

    Description

      [Reproduce steps]

      1. CREATE TABLE partitionthree1 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int) STORED AS carbondata tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
      2. CREATE TABLE partitionthree2 (empno int, doj Timestamp, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int, utilization int,salary int, empname String, designation String) PARTITIONED BY (workgroupcategory int);
      3. LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'TIMESTAMPFORMAT'='dd-MM-yyyy');
      4. set hive.exec.dynamic.partition.mode=nonstrict;
      5. insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
        insert into partitionthree2 select * from partitionthree1;
      6. insert into partitionthree1 select * from partitionthree2;
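
Step 5 runs the same statement 16 times to grow the text table. A minimal sketch for generating steps 4 to 6 as a script instead of pasting them by hand; the helper name `repro_statements` and its `repeat` parameter are illustrative, and executing the statements is left to whatever SQL client is in use.

```python
def repro_statements(repeat=16):
    """Return steps 4-6 of the reproduction as a list of SQL strings."""
    # Step 5: each run appends another copy of partitionthree1 to the text table.
    grow = "insert into partitionthree2 select * from partitionthree1;"
    return (
        ["set hive.exec.dynamic.partition.mode=nonstrict;"]  # step 4
        + [grow] * repeat                                     # step 5
        + ["insert into partitionthree1 select * from partitionthree2;"]  # step 6
    )


if __name__ == "__main__":
    print("\n".join(repro_statements()))
```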

       

      [Expected Result]

      Step 6 launches only as many tasks as there are nodes.

       

      [Current Behavior]

      Step 6 launches far more tasks than there are nodes.

       

      [Impact]

      At several product sites, query performance is significantly impacted.

       

      [Initial analysis]

      Inserting into a non-partitioned local sort table launches as many tasks as there are nodes; inserting into a partitioned table should behave the same way.
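
The difference between the two behaviors can be pictured with a toy model: one task per input split versus one task per node. This is only a sketch of the idea, not CarbonData's implementation; the function names, node names, and split layout are made up for illustration.

```python
from collections import defaultdict


def tasks_per_split(splits):
    """One task per input split: task count grows with the number of files."""
    return [[split] for split in splits]


def tasks_per_node(splits):
    """Group splits by the node that holds them: one task per node."""
    by_node = defaultdict(list)
    for path, node in splits:
        by_node[node].append((path, node))
    return list(by_node.values())


# Hypothetical layout: 3000 small text-table files spread over 3 nodes.
splits = [(f"part-{i:05d}", f"node{i % 3}") for i in range(3000)]

if __name__ == "__main__":
    print(len(tasks_per_split(splits)), "tasks when scheduling per split")
    print(len(tasks_per_node(splits)), "tasks when scheduling per node")
```

In this model the per-split count tracks how many files the repeated inserts in step 5 produced, while the per-node count stays fixed at the cluster size.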

          People

            Assignee: Unassigned
            Reporter: SHREELEKHYA GAMPA

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 8h 10m