Hadoop Common > HADOOP-17566 Über-jira: S3A Hadoop 3.3.2 features > HADOOP-17771

S3AFS creation fails "Unable to find a region via the region provider chain."


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.3.2
    • Component/s: fs/s3

    Description

      If you don't have fs.s3a.endpoint set and lack a region set in the
      environment variable AWS_REGION, the system property aws.region, or the file
      ~/.aws/config, then S3A filesystem creation fails with the message
      "Unable to find a region via the region provider chain."

      This is caused by the move to the AWS S3 client builder API in HADOOP-13551.
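
      As a minimal reproduction sketch (Java, assuming hadoop-aws 3.3.1 and its
      AWS SDK bundle on the classpath; the bucket name is a placeholder):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;

      public class S3ARegionRepro {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // fs.s3a.endpoint deliberately left unset; with no AWS_REGION,
          // no aws.region and no region in ~/.aws/config either, the SDK's
          // region provider chain has nothing to return.
          FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
          System.out.println(fs.getUri());
        }
      }

      On an affected setup this fails during filesystem creation with
      "Unable to find a region via the region provider chain."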

      This is pretty dramatic, and no doubt everyone will be asking "why didn't you notice this?".

      But in fact there are some reasons:

      1. When running in EC2, all is well, meaning our big test runs were all happy.
      2. If a developer has fs.s3a.endpoint set for the test bucket, all is well.
        Those of us who work with buckets in other regions tend to do this, not least
        because it can save a HEAD request every time an FS is created.
      3. If you have a region set in ~/.aws/config, then all is well.

      Reason #3 is the real surprise and the one which has really caught us out. Even my tests against buckets in usw-2 through central didn't fail because, of course, I, like my colleagues, have the AWS CLI installed locally. That was sufficient to make the problem go away. It is also why this has been an intermittent problem on test clusters outside AWS infra: whether things worked really depended on the VM/Docker image.
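
      To illustrate how little it takes to mask the bug: a ~/.aws/config with
      just a default region, such as the one the AWS CLI's "aws configure"
      command writes, is enough (the region value here is only an example):

      [default]
      region = us-west-2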

      Quick Fix: set fs.s3a.endpoint to s3.amazonaws.com

      If you have found this JIRA because you are encountering this problem, you can fix it by explicitly declaring the endpoint in core-site.xml:

      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.amazonaws.com</value>
      </property>
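
      If you build the Hadoop Configuration in code instead, the same quick fix
      can be applied programmatically (a sketch, using the same imports as the
      reproduction above; the bucket name is a placeholder):

      Configuration conf = new Configuration();
      // Same quick fix, applied in code rather than in core-site.xml.
      conf.set("fs.s3a.endpoint", "s3.amazonaws.com");
      FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);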
      

      For Apache Spark, this can be done in spark-defaults.conf

      spark.hadoop.fs.s3a.endpoint s3.amazonaws.com
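
      Or, without editing spark-defaults.conf, pass it per-job on the command
      line (the application jar name is a placeholder):

      spark-submit \
        --conf spark.hadoop.fs.s3a.endpoint=s3.amazonaws.com \
        your-app.jar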
      

      If you know the exact AWS region your data lives in, set the endpoint to that region's endpoint, and so save an HTTPS request to s3.amazonaws.com every time an S3A filesystem instance is created.
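
      For example, for a bucket in us-west-2, following the standard
      s3.<region>.amazonaws.com endpoint pattern:

      <property>
        <name>fs.s3a.endpoint</name>
        <value>s3.us-west-2.amazonaws.com</value>
      </property>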

People

  Assignee: Steve Loughran (stevel@apache.org)
  Reporter: Steve Loughran (stevel@apache.org)


Time Tracking

  Original Estimate: Not Specified
  Remaining Estimate: 0h
  Time Spent: 3h 50m