IMPALA-12235

test_multiple_coordinators() failed because _start_impala_cluster() returned a non-zero exit status

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.3.0
    • Component/s: None

    Description

      We found that test_multiple_coordinators() could fail because _start_impala_cluster() returned a non-zero exit status. test_multiple_coordinators() calls _start_impala_cluster() at https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.

      Error Message

      CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
      

      Stacktrace

      custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
          self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
      common/custom_cluster_test_suite.py:330: in _start_impala_cluster
          check_call(cmd + options, close_fds=True)
      /data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call
          raise CalledProcessError(retcode, cmd)
      E   CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
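
The stack trace above ends in subprocess.check_call(), which raises CalledProcessError whenever the child process exits with a non-zero status. A minimal sketch of that behavior follows; run_and_report() and the stand-in child commands are hypothetical and are not part of the Impala test suite.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return 0 on success, or the child's exit status on failure.

    Hypothetical helper for illustration only; it is not Impala code.
    """
    try:
        # check_call() raises CalledProcessError for any non-zero exit status.
        subprocess.check_call(cmd, close_fds=True)
    except subprocess.CalledProcessError as e:
        # e.returncode holds the exit status, e.cmd the command that failed.
        return e.returncode
    return 0


# A stand-in child process that exits with status 1, as the
# start-impala-cluster.py invocation did in this report.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_and_report(failing_cmd)
```

The exception itself says nothing about why the child failed, so the child's own console output is needed to find the cause.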
      

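      The following console output shows that 'num_known_live_backends' reached 3 on the impalad at port 25000, but stayed at 2 on the impalad at port 25001 for 4 minutes (20:54:48 to 20:58:48). The wait therefore timed out, and the command that starts the cluster exited with a non-zero status.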

      -- 2023-06-21 20:54:40,594 INFO     MainThread: Starting cluster with command: /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50     --statestore_priority_update_frequency_ms=50     --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=2 --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests --log_level=1 --impalad_args=--default_query_options=
      20:54:41 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
      20:54:41 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
      20:54:42 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
      20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
      20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
      20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
      20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:54:46 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
      20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1
      20:54:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:54:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
      20:54:47 MainThread: Waiting for num_known_live_backends=3. Current value: 1
      20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
      20:54:48 MainThread: num_known_live_backends has reached value: 3
      20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
      20:54:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
      ...
      20:58:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      20:58:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
      20:58:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
      20:58:49 MainThread: Error starting cluster
      Traceback (most recent call last):
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py", line 931, in <module>
          expected_cluster_size - expected_catalog_delays)
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_cluster.py", line 205, in wait_until_ready
          early_abort_fn=check_processes_still_running)
        File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_service.py", line 374, in wait_for_num_known_live_backends
          assert 0, 'num_known_live_backends did not reach expected value in time'
      AssertionError: num_known_live_backends did not reach expected value in time
      -- 2023-06-21 20:58:49,141 DEBUG    MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
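
The log above reflects a poll-until-timeout pattern: a value is read repeatedly until it matches the expected value or a deadline passes. The sketch below illustrates that general pattern in Python 3. It is not the code of wait_for_num_known_live_backends() in tests/common/impala_service.py; wait_for_value(), its parameters, and the 240-second default are assumptions chosen to mirror the roughly 4-minute wait seen in the log.

```python
import time


def wait_for_value(get_value, expected, timeout_s=240.0, interval_s=1.0,
                   early_abort_fn=None):
    """Poll get_value() until it returns expected or timeout_s elapses.

    Hypothetical helper for illustration only; it is not Impala code.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if early_abort_fn is not None:
            # Lets the caller stop the wait early by raising, for example
            # when a monitored process has already died.
            early_abort_fn()
        current = get_value()
        if current == expected:
            return current
        if time.monotonic() >= deadline:
            raise AssertionError(
                "value did not reach %r in time; last value: %r"
                % (expected, current))
        time.sleep(interval_s)


# Usage with fake readings standing in for a metric that converges.
readings = iter([1, 1, 3])
result = wait_for_value(lambda: next(readings), expected=3,
                        timeout_s=5, interval_s=0)
```

Including the last observed value in the failure message makes a timeout like the one above easier to diagnose from the exception alone.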
      

          People

            Assignee: wzhou Wenzhe Zhou
            Reporter: fangyurao Fang-Yu Rao