Hadoop YARN / YARN-11473

Create a safe mode RM service to enable DB access


Details

    Description

      We have seen various issues where RM fails to start because bad state in the state store leads to exceptions on startup.

      E.g.: https://issues.apache.org/jira/browse/YARN-2340

      Another issue we have seen internally is an invalid capacity scheduler config:

      org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
      java.lang.IllegalArgumentException: Illegal queue capacity setting, (abs-capacity=0.009548) > (abs-maximum-capacity=0.0095). When label=[]
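      The exception above is a startup validation rejecting a queue whose absolute capacity is marginally larger than its absolute maximum capacity; one such invariant failing is enough to abort RM startup. A minimal sketch of that kind of check, assuming a hypothetical QueueCapacityCheck.validate helper (this is not the actual CapacityScheduler code, only an illustration of the invariant reported in the log):

```java
public class QueueCapacityCheck {
    // Hypothetical helper, not the real CapacityScheduler implementation.
    // Rejects a queue whose absolute capacity exceeds its absolute maximum capacity,
    // mirroring the invariant described by the error message in the log.
    static void validate(float absCapacity, float absMaxCapacity, String label) {
        if (absCapacity > absMaxCapacity) {
            throw new IllegalArgumentException(
                "Illegal queue capacity setting, (abs-capacity=" + absCapacity
                    + ") > (abs-maximum-capacity=" + absMaxCapacity
                    + "). When label=[" + label + "]");
        }
    }

    public static void main(String[] args) {
        try {
            // Values taken from the log above: 0.009548 > 0.0095, so validation fails.
            validate(0.009548f, 0.0095f, "");
            System.out.println("queue config is valid");
        } catch (IllegalArgumentException e) {
            // In the RM, an exception like this propagates and startup is aborted.
            System.out.println("RM startup would fail: " + e.getMessage());
        }
    }
}
```

      Because the bad value lives in persisted configuration, restarting RM just hits the same check again, which is why an out-of-band way to correct the stored data is needed.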

      In such cases, we can't recover until a bug fix is deployed that lets RM start so that the data can be corrected. And while RM is forcibly brought up in those cases, it can still serve client / AM requests, which further complicates things.

      Ideally, we should be able to fix the database independently of whether RM can start. But with LevelDB, which is an embedded database, this isn't possible without RM being up. Separate tools like leveldb-cli aren't always useful, because they require additional code to handle specific comparators etc. and have to be deployed together with the RM binaries.

      A patch to delete applications from the state store was implemented in https://issues.apache.org/jira/browse/YARN-3410, but it doesn't cover other bad entries in the state store, such as delegation tokens (DTs) / master keys / app attempts / CS conf, from which we currently can't recover.

      Generic DB access would help to delete / update invalid keys.

      A better solution is to create a safe mode feature in RM that starts RM with only the basic functionality needed to fix its state. RM will not serve client / AM / NM requests in this mode; it will enable only selective admin functionality (read / write access to the state store).
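      The proposed behaviour can be illustrated with a small sketch. It assumes entirely hypothetical names (SafeModeSketch, RequestType, SafeModeGate, StateStore, adminDelete) and uses an in-memory TreeMap as a stand-in for the embedded LevelDB state store; it is not an existing YARN API, only a model of "reject client / AM / NM requests, allow admin read / write on the state store".

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class SafeModeSketch {
    // Hypothetical request categories an RM would receive.
    enum RequestType { CLIENT, APPLICATION_MASTER, NODE_MANAGER, ADMIN }

    // Hypothetical in-memory stand-in for the embedded LevelDB state store.
    static final class StateStore {
        private final Map<String, byte[]> entries = new TreeMap<>();
        byte[] get(String key) { return entries.get(key); }
        void put(String key, byte[] value) { entries.put(key, value); }
        boolean delete(String key) { return entries.remove(key) != null; }
    }

    // Hypothetical gate: when safe mode is on, only admin requests are admitted.
    static final class SafeModeGate {
        private final boolean safeMode;
        SafeModeGate(boolean safeMode) { this.safeMode = safeMode; }
        void check(RequestType type) {
            if (safeMode && type != RequestType.ADMIN) {
                throw new IllegalStateException(
                    "RM is in safe mode; rejecting " + type + " request");
            }
        }
    }

    // Hypothetical admin operation: delete a bad entry so a normal restart can succeed.
    static boolean adminDelete(SafeModeGate gate, StateStore store, String key) {
        gate.check(RequestType.ADMIN);
        return store.delete(key);
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        // Hypothetical key for a corrupt application entry.
        store.put("app/application_1_0001", "corrupt".getBytes(StandardCharsets.UTF_8));
        SafeModeGate gate = new SafeModeGate(true);

        try {
            gate.check(RequestType.CLIENT); // client traffic is refused in safe mode
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("deleted: " + adminDelete(gate, store, "app/application_1_0001"));
    }
}
```

      Keeping the gate as a single check in front of every request path, rather than stopping individual services, mirrors the idea that RM comes up far enough to open its state store while still refusing normal traffic.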


            People

              krishan1390 Krishan Goyal
              krishan1390 Krishan Goyal
              Votes: 0
              Watchers: 3
