Hadoop Common / HADOOP-18838

Some fs.s3a.* config values are different in sources and documentation


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.6
    • Fix Version/s: None
    • Component/s: documentation, fs/s3
    • Labels: None

    Description

      For the config option fs.s3a.retry.throttle.interval, the default value in the source code is 500ms:

      public static final String RETRY_THROTTLE_INTERVAL_DEFAULT = "500ms";
      

      https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L921

      In core-default.xml the value is 100ms, although the description mentions 500ms:

      <property>
        <name>fs.s3a.retry.throttle.interval</name>
        <value>100ms</value>
        <description>
          Initial between retry attempts on throttled requests, +/- 50%. chosen at random.
          i.e. for an intial value of 3000ms, the initial delay would be in the range 1500ms to 4500ms.
          Backoffs are exponential; again randomness is used to avoid the thundering heard problem.
          500ms is the default value used by the AWS S3 Retry policy.
        </description>
      </property>
      

      https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L1750
      This change was introduced in HADOOP-16823.
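
      To show why the mismatch matters in practice, here is a minimal sketch of the "+/- 50%, chosen at random" rule from the description above, applied to the three values quoted in this issue. It is not taken from the S3A code: the class and method names are hypothetical, and doubling the base interval on each attempt is only an assumed reading of "backoffs are exponential".

```java
import java.util.Random;

// Hypothetical illustration only; not the S3A retry implementation.
public class ThrottleBackoffSketch {

    /** Bounds of the first delay for a base interval: base -50% .. base +50%. */
    public static long[] initialRange(long baseMs) {
        return new long[] {baseMs / 2, baseMs + baseMs / 2};
    }

    /** Picks a delay uniformly at random within +/- 50% of the base interval. */
    public static long jitter(long baseMs, Random rng) {
        double factor = 0.5 + rng.nextDouble(); // uniform in [0.5, 1.5)
        return Math.round(baseMs * factor);
    }

    /** Assumption: the base doubles per attempt (0-based) before jitter is applied. */
    public static long delayForAttempt(long initialMs, int attempt, Random rng) {
        int shift = Math.min(Math.max(attempt, 0), 20); // cap the shift to avoid overflow
        return jitter(initialMs << shift, rng);
    }

    public static void main(String[] args) {
        // The three values quoted in this issue for fs.s3a.retry.throttle.interval.
        for (long base : new long[] {100, 500, 1000}) {
            long[] r = initialRange(base);
            System.out.println(base + "ms -> first delay in " + r[0] + "ms.." + r[1] + "ms");
        }
    }
}
```

      Under these assumptions, the first retry delay differs by up to a factor of ten depending on which of the three documented values a reader trusts.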

      In the Hadoop-AWS module documentation the value is 1000ms:

      <property>
        <name>fs.s3a.retry.throttle.interval</name>
        <value>1000ms</value>
        <description>
          Interval between retry attempts on throttled requests.
        </description>
      </property>
      

      https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md?plain=1#L1223
      The file was created in HADOOP-13786, and the value has been left unchanged since then.

      The performance tuning page has the up-to-date value of 500ms:

      <property>
        <name>fs.s3a.retry.throttle.interval</name>
        <value>500ms</value>
        <description>
          Interval between retry attempts on throttled requests.
        </description>
      </property>
      

      https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md?plain=1#L435
      This change was introduced in HADOOP-15076.

      The same issue applies to:

      • fs.s3a.retry.throttle.limit - 20 in the source code, but some documents still show the old value ${fs.s3a.attempts.maximum}
      • fs.s3a.connection.establish.timeout - 50_000 in the source code, 5_000 in the config file & documentation
      • fs.s3a.attempts.maximum - 10 in the source code, 20 in the config file & documentation
      • fs.s3a.threads.max - 10 in the source code & documentation, 64 in the config file
      • fs.s3a.max.total.tasks - 32 in the source code & config file, 5 in the documentation
      • fs.s3a.connection.maximum - 96 in the source code & config file, 15 or 30 in the documentation

      Please sync these values; outdated documentation is very painful to work with.
      As an idea, would it be possible to use core-default.xml directly in the documentation, or to generate this documentation from docstrings in the Java code?
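
      Until something like that exists, a small drift check could catch these mismatches automatically. The sketch below is only an illustration: the class ConfigDriftCheck and its methods are hypothetical, the XML is an inline fragment rather than the real core-default.xml, and the "source defaults" map is filled in by hand with two values quoted in this issue instead of being read from Constants.java.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Hadoop.
public class ConfigDriftCheck {

    /** Parses <property><name>..</name><value>..</value></property> entries into a map. */
    public static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                result.put(name, value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Returns one entry per key that is present in both maps with different values. */
    public static Map<String, String> findDrift(Map<String, String> sourceDefaults,
                                                Map<String, String> xmlValues) {
        Map<String, String> drift = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sourceDefaults.entrySet()) {
            String found = xmlValues.get(e.getKey());
            if (found != null && !found.equals(e.getValue())) {
                drift.put(e.getKey(), "source=" + e.getValue() + ", xml=" + found);
            }
        }
        return drift;
    }

    public static void main(String[] args) {
        // Inline stand-in for core-default.xml, using values quoted in this issue.
        String xml = "<configuration>"
            + "<property><name>fs.s3a.retry.throttle.interval</name><value>100ms</value></property>"
            + "<property><name>fs.s3a.max.total.tasks</name><value>32</value></property>"
            + "</configuration>";
        // Hand-written stand-in for the defaults in Constants.java.
        Map<String, String> sourceDefaults = new LinkedHashMap<>();
        sourceDefaults.put("fs.s3a.retry.throttle.interval", "500ms");
        sourceDefaults.put("fs.s3a.max.total.tasks", "32");
        findDrift(sourceDefaults, parseProperties(xml))
            .forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

      With the sample data, only fs.s3a.retry.throttle.interval is reported, since fs.s3a.max.total.tasks matches in both places. A real check would need to load the actual files and normalize units (for example 500ms versus 500) before comparing.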

          People

            Assignee: Unassigned
            Reporter: Maxim Martynov (dolfinus)