  1. Spark
  2. SPARK-46810

Clarify error class terminology


Details

    Description

      We use inconsistent terminology when talking about error classes. I'd like to get some clarity on that before contributing any potential improvements to this part of the documentation.

      Consider INCOMPLETE_TYPE_DEFINITION. It has several key pieces of hierarchical information that have inconsistent names throughout our documentation and codebase:

      • 42
        • K01
          • INCOMPLETE_TYPE_DEFINITION
            • ARRAY
            • MAP
            • STRUCT

      What are the names of these different levels of information?

      Some examples of inconsistent terminology:

      • In one place we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION we call that an "error class". So what exactly is a class, the 42 or the INCOMPLETE_TYPE_DEFINITION?
      • In one place we call K01 the "subclass", but in another we call ARRAY, MAP, and STRUCT the subclasses. And on the main page for INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". So what exactly is a subclass?
      • On one page we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other places we refer to it as an "error class".

      I don't think we should keep the status quo. I see a few ways to fix this.

      Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"

      One solution is to use the following terms:

      • Error class: 42
      • Error sub-class: K01
      • Error state: 42K01
      • Error condition: INCOMPLETE_TYPE_DEFINITION
      • Error sub-condition: ARRAY, MAP, STRUCT
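
To make the Option 1 names concrete, here is a minimal sketch that labels each piece of the example with those terms. The `ErrorDescriptor` class and the `describe` helper are hypothetical names invented for this illustration, not Spark APIs, and the dotted form `INCOMPLETE_TYPE_DEFINITION.ARRAY` is an assumed way of writing a condition together with its sub-condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorDescriptor:
    """Hypothetical container using the Option 1 names; not a Spark class."""

    error_class: str                           # e.g. "42"
    error_sub_class: str                       # e.g. "K01"
    error_condition: str                       # e.g. "INCOMPLETE_TYPE_DEFINITION"
    error_sub_condition: Optional[str] = None  # e.g. "ARRAY"

    @property
    def error_state(self) -> str:
        # The error state is the class followed by the sub-class.
        return self.error_class + self.error_sub_class


def describe(state: str, name: str) -> ErrorDescriptor:
    """Split a five-character state and a dotted name (assumed format)."""
    if len(state) != 5:
        raise ValueError(f"expected a 5-character state, got {state!r}")
    # Everything before the first dot is the condition; the rest, if any,
    # is the sub-condition.
    condition, _, sub_condition = name.partition(".")
    return ErrorDescriptor(
        error_class=state[:2],
        error_sub_class=state[2:],
        error_condition=condition,
        error_sub_condition=sub_condition or None,
    )


example = describe("42K01", "INCOMPLETE_TYPE_DEFINITION.ARRAY")
```

Under these assumptions, `example.error_state` recombines the class and sub-class into `42K01`, and a name without a dot yields no sub-condition.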

      Pros: 

      • This terminology seems (to me at least) the most natural and intuitive.
      • It aligns most closely to the SQL standard.

      Cons:

      • We use errorClass all over our codebase – literally in thousands of places – to refer to strings like INCOMPLETE_TYPE_DEFINITION.
        • It's probably not practical to update all these usages to say errorCondition instead, so if we go with this approach there will be a divide between the terminology we use in user-facing documentation and what the codebase uses.
        • We can perhaps rename the existing error-classes.json to error-conditions.json and explain the reason for this divide between code and user docs in the documentation for ErrorClassesJsonReader.

      Option 2: 42 becomes an "Error Category"

      Another approach is to use the following terminology:

      • Error category: 42
      • Error sub-category: K01
      • Error state: 42K01
      • Error class: INCOMPLETE_TYPE_DEFINITION
      • Error sub-classes: ARRAY, MAP, STRUCT

      Pros:

      • We continue to use "error class" as we do today in our code base.
      • The change from calling "42" a "class" to a "category" is low impact and may not show up in user-facing documentation at all. (See my side note below.)

      Cons:

      • These terms do not align with the SQL standard.
      • We will have to retire the term "error condition", which we have already used in user-facing documentation.

      Option 3: "Error Class" and "State Class"

      • SQL state class: 42
      • SQL state sub-class: K01
      • SQL state: 42K01
      • Error class: INCOMPLETE_TYPE_DEFINITION
      • Error sub-classes: ARRAY, MAP, STRUCT

      Pros:

      • We continue to use "error class" as we do today in our code base.
      • The change from calling "42" a "class" to a "state class" is low impact and may not show up in user-facing documentation at all. (See my side note below.)

      Cons:

      • "State class" vs. "Error class" is a bit confusing.
      • These terms do not align with the SQL standard.
      • We will have to retire the term "error condition", which we have already used in user-facing documentation.
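
The three options can be compared side by side with a small lookup sketch. The table below only restates the term lists from Options 1 to 3; the names `EXAMPLE_VALUES`, `OPTIONS`, `names_for`, and `meanings_of` are hypothetical and exist only for this illustration.

```python
# The pieces of the running example, from the broadest level to the narrowest.
EXAMPLE_VALUES = [
    "42",
    "K01",
    "42K01",
    "INCOMPLETE_TYPE_DEFINITION",
    "ARRAY, MAP, STRUCT",
]

# The term each option proposes for the value at the same position above.
OPTIONS = {
    "Option 1": ["Error class", "Error sub-class", "Error state",
                 "Error condition", "Error sub-condition"],
    "Option 2": ["Error category", "Error sub-category", "Error state",
                 "Error class", "Error sub-classes"],
    "Option 3": ["SQL state class", "SQL state sub-class", "SQL state",
                 "Error class", "Error sub-classes"],
}


def names_for(value):
    """Return the term each option would use for one example value."""
    index = EXAMPLE_VALUES.index(value)
    return {option: terms[index] for option, terms in OPTIONS.items()}


def meanings_of(term):
    """Return, per option, the example value that a given term would label."""
    return {
        option: EXAMPLE_VALUES[terms.index(term)]
        for option, terms in OPTIONS.items()
        if term in terms
    }
```

For instance, `meanings_of("Error class")` maps to 42 under Option 1 but to INCOMPLETE_TYPE_DEFINITION under Options 2 and 3, which is the ambiguity this issue is trying to resolve.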

      Side note: In any case, I believe talking about "42" and "K01" – regardless of what we end up calling them – in front of users is not helpful. I don't think anybody cares what "42" by itself means, or what "K01" by itself means. Accordingly, we should limit how much we talk about these concepts in the user-facing documentation.

            People

              nchammas Nicholas Chammas