Spark / SPARK-14743

Improve delegation token handling in secure clusters

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      In a way, I'd consider this a parent bug of SPARK-7252.

      Spark's current support for delegation tokens is a little all over the place:

      • for HDFS, there's support for re-creating tokens if a principal and keytab are provided
      • for HBase and Hive, Spark will fetch delegation tokens so that apps can work in cluster mode, but will not re-create them, so apps that need those will stop working after 7 days
      • for anything else, Spark doesn't do anything. Lots of other services use delegation tokens, and supporting them as data sources in Spark becomes more complicated because of that. For example, Kafka will (hopefully) soon support them.

      It would be nice if Spark had consistent support for handling delegation tokens regardless of who needs them. I'd list these as the requirements:

      • Spark to provide a generic interface for fetching delegation tokens. This would allow Spark's delegation token support to be extended using some plugin architecture (e.g. Java services), meaning Spark itself doesn't need to support every possible service out there.

      This would be used to fetch tokens when launching apps in cluster mode, and when a principal and a keytab are provided to Spark.
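
A minimal sketch of what such a plugin interface could look like, assuming providers are discovered through Java's `ServiceLoader`. The names `DelegationTokenProvider`, `DelegationTokenManager`, `isRequired`, and `obtainToken` are hypothetical and are not taken from Spark; plain `Map` and `byte[]` types stand in for Hadoop's configuration and token classes so the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical plugin interface: one implementation per token-issuing service. */
interface DelegationTokenProvider {
    /** Short name of the service, for example "hdfs" or "hbase". */
    String serviceName();

    /** Whether a token is needed for this service under the given configuration. */
    boolean isRequired(Map<String, String> conf);

    /** Fetches a fresh token; byte[] is a stand-in for a real token type. */
    byte[] obtainToken(Map<String, String> conf);
}

/** Hypothetical manager that collects tokens from every registered provider. */
final class DelegationTokenManager {
    private final List<DelegationTokenProvider> providers;

    DelegationTokenManager(List<DelegationTokenProvider> providers) {
        this.providers = new ArrayList<>(providers);
    }

    /** Discovers providers declared under META-INF/services on the classpath. */
    static DelegationTokenManager fromServiceLoader() {
        List<DelegationTokenProvider> found = new ArrayList<>();
        for (DelegationTokenProvider p : ServiceLoader.load(DelegationTokenProvider.class)) {
            found.add(p);
        }
        return new DelegationTokenManager(found);
    }

    /** Asks each provider that reports itself as required for a token. */
    Map<String, byte[]> obtainAll(Map<String, String> conf) {
        Map<String, byte[]> tokens = new LinkedHashMap<>();
        for (DelegationTokenProvider p : providers) {
            if (p.isRequired(conf)) {
                tokens.put(p.serviceName(), p.obtainToken(conf));
            }
        }
        return tokens;
    }
}
```

With this shape, supporting a new service would mean shipping a jar that contains one provider implementation and a service registration file, without changing the manager itself.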

      • A way to manually update delegation tokens in Spark. For example, a new SparkContext API, or some configuration that tells Spark to monitor a file for changes and load tokens from said file.

      This would allow external applications to manage tokens outside of Spark and be able to update a running Spark application (think, for example, a job server like Oozie, or something like Hive-on-Spark which manages Spark apps running remotely).
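
The file-monitoring option could be sketched roughly as below. This is an illustration only: `TokenFileMonitor` and `pollOnce` are hypothetical names, the file contents are treated as opaque bytes rather than a real credentials format, and change detection is done by comparing contents instead of relying on file timestamps.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical monitor that reloads tokens when the watched file changes. */
final class TokenFileMonitor {
    private final Path tokenFile;
    private final Consumer<byte[]> onUpdate;
    private byte[] lastLoaded = new byte[0];

    TokenFileMonitor(Path tokenFile, Consumer<byte[]> onUpdate) {
        this.tokenFile = tokenFile;
        this.onUpdate = onUpdate;
    }

    /**
     * Reads the file once and invokes the callback if the contents differ
     * from what was last loaded. Returns true if new tokens were loaded.
     */
    boolean pollOnce() {
        try {
            if (!Files.exists(tokenFile)) {
                return false;
            }
            byte[] current = Files.readAllBytes(tokenFile);
            // Ignore empty files and unchanged contents.
            if (current.length == 0 || Arrays.equals(current, lastLoaded)) {
                return false;
            }
            lastLoaded = current;
            onUpdate.accept(current.clone());
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application could call `pollOnce()` from a `ScheduledExecutorService` at a fixed interval, so an external process only needs to overwrite the file to push new tokens into the running job.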

      • A way to notify running code that new delegation tokens have been loaded.

      This may not be strictly necessary; it might be possible for code to detect that, e.g., by peeking into the UserGroupInformation structure. But an event sent to the listener bus would allow applications to react when new tokens are available (e.g., the Hive backend could re-create connections to the metastore server using the new tokens).
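
As a rough illustration of the notification idea, the sketch below uses a small stand-alone event bus. It does not use Spark's actual listener bus; `TokensUpdatedEvent`, `TokenUpdateListener`, and `TokenEventBus` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event describing which services received new tokens. */
final class TokensUpdatedEvent {
    final List<String> services;

    TokensUpdatedEvent(List<String> services) {
        this.services = Collections.unmodifiableList(new ArrayList<>(services));
    }
}

/** Hypothetical callback implemented by code that must react to new tokens. */
interface TokenUpdateListener {
    void onTokensUpdated(TokensUpdatedEvent event);
}

/** Minimal stand-in for a listener bus: delivers events synchronously. */
final class TokenEventBus {
    private final List<TokenUpdateListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(TokenUpdateListener listener) {
        listeners.add(listener);
    }

    void post(TokensUpdatedEvent event) {
        for (TokenUpdateListener listener : listeners) {
            listener.onTokensUpdated(event);
        }
    }
}
```

A component such as a metastore client could register a listener and rebuild its connection whenever the event lists its service, instead of polling the current credentials for changes.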

      Also, cc'ing busbey and steve_l since you've talked about this on the mailing list recently.

            People

              Assignee: Saisai Shao (jerryshao)
              Reporter: Marcelo Masiero Vanzin (vanzin)