Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.10.1
    • None
    • fs/s3
    • Patch

    Description

      We are using the latest version of delta-sharing, which relies on the hadoop-aws (S3A) connector from Hadoop release 2.10.1 to mount an AWS S3 file system. In our setup, all services run in Amazon Elastic Kubernetes Service (EKS) and must comply with the AWS security concept of IAM roles for service accounts (IRSA).

      Because the delta-sharing S3 connection offers no corresponding support, we patched hadoop-aws-2.10.1 to address this need with a new credentials provider class, org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider. We also upgraded the aws-java-sdk-bundle dependency to its latest version, 1.12.167, because the AWS WebIdentityTokenCredentialsProvider class was not yet available in the original version, 1.11.271.
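
      To illustrate how a non-Spark service might select the patched provider, here is a minimal sketch. It assumes the pod runs under IRSA, in which case EKS injects the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables, and that the provider is selected through the standard S3A property `fs.s3a.aws.credentials.provider`. The helper class `IrsaS3aConfig` and its method names are hypothetical and are not part of the patch or of hadoop-aws; whether the patched class reads these variables itself is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of hadoop-aws or the patch.
final class IrsaS3aConfig {
    // Environment variables that EKS injects into pods configured for IRSA.
    static final String ROLE_ARN_ENV = "AWS_ROLE_ARN";
    static final String TOKEN_FILE_ENV = "AWS_WEB_IDENTITY_TOKEN_FILE";

    // Standard S3A property that names the credentials provider class.
    static final String PROVIDER_KEY = "fs.s3a.aws.credentials.provider";
    // Class name as given in the issue description.
    static final String PROVIDER_CLASS =
            "org.apache.hadoop.fs.s3a.OIDCTokenCredentialsProvider";

    // Checks that the IRSA variables are present, then returns the S3A
    // property that would select the patched provider.
    static Map<String, String> s3aProperties(Map<String, String> env) {
        for (String name : new String[] {ROLE_ARN_ENV, TOKEN_FILE_ENV}) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                throw new IllegalStateException(
                        name + " is not set; is IRSA enabled for this pod?");
            }
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(PROVIDER_KEY, PROVIDER_CLASS);
        return props;
    }

    public static void main(String[] args) {
        try {
            // Print the property in key=value form for the current environment.
            s3aProperties(System.getenv())
                    .forEach((k, v) -> System.out.println(k + "=" + v));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

      The returned entry could then be copied into a Hadoop `Configuration` or a `core-site.xml` file. Failing early when the variables are missing is a design choice of this sketch, meant to surface a misconfigured service account before the first S3 request.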

      We believe that other delta-sharing users could benefit from this short-term contribution. Sooner or later, the delta-sharing owners will have to upgrade their project to a more recent version of hadoop-aws, which is probably more widely used. The effort required to promote this change is probably low.

      Additional note: the AWS WebIdentityTokenCredentialsProvider class is directly supported by Spark applications submitted with the configuration properties `spark.hadoop.fs.s3a.aws.credentials.provider` and `spark.kubernetes.authenticate.submission.oauthToken` (doc). Bringing this support to Hadoop will therefore primarily interest non-Spark users.

            People

              Assignee: Unassigned
              Reporter: Ju Clarysse (jclarysse)
              Votes: 1
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m