Hadoop YARN / YARN-10940

Fix Documentation for AWS-Hadoop integration / yarn-site.xml


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The Hadoop-AWS integration documentation describes authenticating via the AWS environment variables:
      https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_the_AWS_Environment_Variables

      It provides a warning:

      Important: These environment variables are generally not propagated from client to server when YARN applications are launched. That is: having the AWS environment variables set when an application is launched will not permit the launched application to access S3 resources. The environment variables must (somehow) be set on the hosts/processes where the work is executed.

      This is somewhat cryptic. A few things need to be clarified in the doc:

      1. This is true even when YARN is running on a single node (pseudo-distributed mode).
      2. This also affects authentication via named profile credentials: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Using_Named_Profile_Credentials_with_ProfileCredentialsProvider This method depends on the AWS_PROFILE variable.
      3. Please give some pointers on how the variables can be propagated. One way is to whitelist the variable in yarn.nodemanager.env-whitelist (set in yarn-site.xml): https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons
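
      The whitelist approach from item 3 could look roughly like the fragment below. This is a minimal sketch, not a verified configuration. It assumes AWS_PROFILE is already set in the environment of the NodeManager process, and that the NodeManager is restarted after the change. The entries before AWS_PROFILE are only an example list. Setting this property replaces the default value, so keep whatever entries your cluster already relies on and append the new variable.

      ```xml
      <!-- yarn-site.xml (fragment) -->
      <configuration>
        <property>
          <name>yarn.nodemanager.env-whitelist</name>
          <!-- Assumption: everything before AWS_PROFILE is an example list;
               keep the entries your cluster already uses and append AWS_PROFILE. -->
          <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME,AWS_PROFILE</value>
        </property>
      </configuration>
      ```

      With this in place, containers launched by that NodeManager should inherit AWS_PROFILE from its environment. The same pattern would presumably apply to the other AWS environment variables mentioned in the linked documentation.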

      I was trying to figure out why Hive was failing on a query (using mapred) against an external table created from S3. After a while I realized that the job was not getting the AWS_PROFILE variable, and that adding the variable to the YARN whitelist does the trick. Hopefully this ticket will help someone else.


          People

            Assignee: Unassigned
            Reporter: Girish Ganesan (gganesan74)
              Time Tracking

                Original Estimate: 2h
                Remaining Estimate: 2h
                Time Spent: Not Specified