Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9124 Transparently retry queries that fail due to cluster membership changes
  3. IMPALA-9254

Queries should only be retried if all fragments fail with retryable errors

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Distributed Exec
    • None
    • ghx-label-4

    Description

      Currently, Impala only propagates an overall_status from an executor to the coordinator. The overall_status is set in the QueryState and "If multiple fragments have errors, the first fragment to hit an error is given preference.".

      The issue is that if multiple fragments fail, it is possible some of the errors should trigger a retry, while other errors shouldn't. For example, one fragment could fail due to faulty disks, but others could fail due to mem limit exceptions. These types of queries shouldn't be retried because it is likely the query will just fail again.

      This can only happen if the non-retryable error occurs in a specific time window: [when the retryable error occurs, the query is cancelled]. Since any fragment failure causes the entire query to be cancelled, this can only occur if the non-retryable error occurs after the retryable error, but before the query is cancelled.

      Attachments

        Activity

          People

            Unassigned Unassigned
            stakiar Sahil Takiar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: