Hadoop HDFS / HDFS-12943

Consistent Reads from Standby Node


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.3.0, 3.1.4, 3.2.2
    • Component/s: hdfs
    • Labels: None
    • Hadoop Flags: Reviewed
    Release Note
      Observer is a new type of NameNode, in addition to the Active and Standby NameNodes in HA setups. Like a Standby NameNode, an Observer maintains a replica of the namespace; in addition, it serves read requests from clients.

      To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to a client request only after its own state has caught up with the client's state ID, which the client previously received from the Active NameNode.
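      A minimal sketch of the Observer-side check, assuming a single Observer whose edit-tailing thread reports the highest transaction ID it has applied. The class and method names (ObserverStateGate, advanceTo, awaitCatchUp) are hypothetical, not Hadoop's API. For simplicity the calling thread waits; returning false after the timeout loosely mirrors the sub-task in which an ObserverNode rejects reads when it is too far behind.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: gates reads on the client's state ID.
// Not Hadoop's implementation; names are illustrative only.
class ObserverStateGate {
    // Highest journal transaction ID this Observer has applied so far.
    private long appliedTxId;

    ObserverStateGate(long initialTxId) {
        this.appliedTxId = initialTxId;
    }

    // Called by the edit-tailing thread after applying a batch of edits.
    synchronized void advanceTo(long txId) {
        if (txId > appliedTxId) {
            appliedTxId = txId;
            notifyAll(); // wake readers waiting for this state
        }
    }

    synchronized long appliedTxId() {
        return appliedTxId;
    }

    // Waits until appliedTxId >= clientStateId. Returns false if the
    // timeout expires first (Observer too far behind) or on interrupt.
    synchronized boolean awaitCatchUp(long clientStateId, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (appliedTxId < clientStateId) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false;
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

class ObserverStateGateDemo {
    public static void main(String[] args) throws InterruptedException {
        ObserverStateGate gate = new ObserverStateGate(100);
        // The client last saw txid 105 from the Active; the Observer is at 100.
        Thread tailer = new Thread(() -> gate.advanceTo(105));
        tailer.start();
        System.out.println(gate.awaitCatchUp(105, 1_000)); // expected: true
        tailer.join();
        System.out.println(gate.awaitCatchUp(200, 50)); // expected: false
    }
}
```

      Blocking a thread keeps the sketch short; a real RPC server would more likely defer or re-queue the call rather than tie up a handler while the Observer catches up.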

      Clients can explicitly invoke a new client protocol call, msync(), which ensures that subsequent reads by that client from an Observer are consistent.
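      The role of msync() can be illustrated with a toy in-memory model, assuming a state ID is simply the last journal transaction ID. ActiveNN, ObserverNN and their methods are hypothetical names for illustration only. A client that has never contacted the Active carries state ID 0, so a lagging Observer would answer it with stale data; after msync() the client's state ID reflects the Active's latest state, and the Observer must catch up before answering.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; a state ID here is the last journal txid.
class ActiveNN {
    // Each entry is {path, value}; its txid is (index + 1).
    final List<String[]> journal = new ArrayList<>();

    // A write appends to the journal and returns the new state ID.
    long write(String path, String value) {
        journal.add(new String[] {path, value});
        return journal.size();
    }

    // msync: report the latest state ID without changing anything.
    long msync() {
        return journal.size();
    }
}

class ObserverNN {
    private final Map<String, String> namespace = new HashMap<>();
    private long appliedTxId = 0;

    // Applies journal entries up to txid upTo (edit tailing).
    void tail(ActiveNN active, long upTo) {
        long limit = Math.min(upTo, active.journal.size());
        while (appliedTxId < limit) {
            String[] entry = active.journal.get((int) appliedTxId);
            namespace.put(entry[0], entry[1]);
            appliedTxId++;
        }
    }

    // Serves a read only if this Observer has caught up with the caller.
    String read(String path, long clientStateId) {
        if (appliedTxId < clientStateId) {
            throw new IllegalStateException(
                    "Observer behind: " + appliedTxId + " < " + clientStateId);
        }
        return namespace.get(path);
    }
}

class MsyncDemo {
    public static void main(String[] args) {
        ActiveNN active = new ActiveNN();
        ObserverNN observer = new ObserverNN();
        active.write("/f", "v1"); // another client writes via the Active

        long myStateId = 0; // this client has not contacted the Active yet
        System.out.println(observer.read("/f", myStateId)); // null: stale read

        myStateId = Math.max(myStateId, active.msync()); // now 1
        try {
            observer.read("/f", myStateId);
        } catch (IllegalStateException e) {
            System.out.println("retry: " + e.getMessage()); // Observer behind: 0 < 1
        }
        observer.tail(active, myStateId);
        System.out.println(observer.read("/f", myStateId)); // v1
    }
}
```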

      A new client-side ObserverReadProxyProvider is introduced to switch automatically between the Active and Observer NameNodes, submitting write requests to the Active and read requests to the Observer.
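      The read/write switching can be sketched with a JDK dynamic proxy that routes each call by whether the method is marked read-only. This assumes a hypothetical two-method protocol (MiniClientProtocol) and a local @ReadOnlyOp annotation; the sub-task list mentions a ReadOnly annotation on ClientProtocol methods and an ObserverReadInvocationHandler, but this code does not reproduce either. Falling back to the Active on any Observer failure is a simplification.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for read-only protocol methods.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ReadOnlyOp {}

// Hypothetical, much-reduced stand-in for a NameNode client protocol.
interface MiniClientProtocol {
    @ReadOnlyOp
    String getFileInfo(String path);

    void mkdirs(String path);
}

// Sends @ReadOnlyOp calls to the Observer (if any) and all others to the Active.
class ReadWriteRoutingHandler implements InvocationHandler {
    private final MiniClientProtocol active;
    private final MiniClientProtocol observer; // null means "no Observer"

    private ReadWriteRoutingHandler(MiniClientProtocol active, MiniClientProtocol observer) {
        this.active = active;
        this.observer = observer;
    }

    static MiniClientProtocol wrap(MiniClientProtocol active, MiniClientProtocol observer) {
        return (MiniClientProtocol) Proxy.newProxyInstance(
                MiniClientProtocol.class.getClassLoader(),
                new Class<?>[] {MiniClientProtocol.class},
                new ReadWriteRoutingHandler(active, observer));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (observer != null && method.isAnnotationPresent(ReadOnlyOp.class)) {
            try {
                return method.invoke(observer, args);
            } catch (InvocationTargetException e) {
                // Simplification: retry any failed read on the Active.
            }
        }
        try {
            return method.invoke(active, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the Active's own exception
        }
    }
}

// Test double that records which calls reach it.
class RecordingNN implements MiniClientProtocol {
    final String name;
    final List<String> calls = new ArrayList<>();

    RecordingNN(String name) {
        this.name = name;
    }

    @Override
    public String getFileInfo(String path) {
        calls.add("getFileInfo " + path);
        return name;
    }

    @Override
    public void mkdirs(String path) {
        calls.add("mkdirs " + path);
    }
}

class RoutingDemo {
    public static void main(String[] args) {
        RecordingNN active = new RecordingNN("active");
        RecordingNN observer = new RecordingNN("observer");
        MiniClientProtocol client = ReadWriteRoutingHandler.wrap(active, observer);
        client.mkdirs("/a");                          // write: routed to the Active
        System.out.println(client.getFileInfo("/a")); // read: prints "observer"
        System.out.println(active.calls);             // [mkdirs /a]
    }
}
```

      A real client would retry on the Active only for specific "Observer unavailable or too far behind" errors, not for every failure, so that genuine errors such as a missing file are not masked.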

    Description

      The StandbyNode in HDFS is a replica of the active NameNode; the states of the NameNodes are coordinated via the journal. It is therefore natural to consider the StandbyNode as a read-only replica. As with any replicated distributed system, the problem of stale reads must be resolved. Our main goal is to provide reads from the standby in a consistent way, in order to support a wide range of existing applications running on top of HDFS.

      Attachments

        1. HDFS-12943-004.patch (353 kB, Konstantin Shvachko)
        2. HDFS-12943-003.patch (353 kB, Konstantin Shvachko)
        3. HDFS-12943-002.patch (354 kB, Konstantin Shvachko)
        4. HDFS-12943-001.patch (328 kB, Konstantin Shvachko)
        5. TestPlan-ConsistentReadsFromStandbyNode.pdf (79 kB, Konstantin Shvachko)
        6. ConsistentReadsFromStandbyNode.pdf (396 kB, Konstantin Shvachko)
        7. ConsistentReadsFromStandbyNode.pdf (394 kB, Konstantin Shvachko)

        Issue Links

          1. Tailing edits should not update quota counts on ObserverNode (Sub-task, Resolved, Erik Krogen)
          2. Changes to the NameNode to support reads from standby (Sub-task, Resolved, Chao Sun)
          3. Introduce ObserverReadProxyProvider (Sub-task, Resolved, Chao Sun)
          4. [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC (Sub-task, Resolved, Erik Krogen)
          5. Make Client field AlignmentContext non-static (Sub-task, Resolved, Plamen Jeliazkov)
          6. Add stateId to RPC headers (Sub-task, Resolved, Plamen Jeliazkov)
          7. Fine-grained locking while consuming journal stream (Sub-task, Resolved, Konstantin Shvachko)
          8. StandbyNode should upload FsImage to ObserverNode after checkpointing (Sub-task, Resolved, Chen Liang)
          9. Add haadmin commands to transition between standby and observer (Sub-task, Resolved, Chao Sun)
          10. Support observer reads for WebHDFS (Sub-task, Open, Chao Sun)
          11. Allow Observer to participate in NameNode failover (Sub-task, Open, Unassigned)
          12. Standby NameNode should roll active edit log when checkpointing (Sub-task, Resolved, Unassigned)
          13. Add lastSeenStateId to RpcRequestHeader (Sub-task, Resolved, Plamen Jeliazkov)
          14. HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients (Sub-task, Resolved, Simbarashe Dzinamarira)
          15. Support observer nodes in MiniDFSCluster (Sub-task, Resolved, Konstantin Shvachko)
          16. Add ReadOnly annotation to methods in ClientProtocol (Sub-task, Resolved, Chao Sun)
          17. [Edit Tail Fast Path Pt 1] Enhance JournalNode with an in-memory cache of recent edit transactions (Sub-task, Resolved, Erik Krogen)
          18. [Edit Tail Fast Path Pt 2] Add ability for JournalNode to serve edits via RPC (Sub-task, Resolved, Erik Krogen)
          19. [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC (Sub-task, Resolved, Erik Krogen)
          20. [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync (Sub-task, Resolved, Erik Krogen)
          21. Move RPC response serialization into Server.doResponse (Sub-task, Resolved, Plamen Jeliazkov)
          22. Introduce msync API call (Sub-task, Resolved, Chen Liang)
          23. NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode (Sub-task, Open, Unassigned)
          24. ClientGCIContext should be correctly named ClientGSIContext (Sub-task, Resolved, Konstantin Shvachko)
          25. Use getServiceStatus to discover observer namenodes (Sub-task, Resolved, Chao Sun)
          26. Add msync server implementation (Sub-task, Resolved, Chen Liang)
          27. TestStateAlignmentContextWithHA should use real ObserverReadProxyProvider instead of AlignmentContextProxyProvider (Sub-task, Resolved, Plamen Jeliazkov)
          28. Implement performFailover logic for ObserverReadProxyProvider (Sub-task, Resolved, Erik Krogen)
          29. Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call (Sub-task, Resolved, Chen Liang)
          30. Unit tests for standby reads (Sub-task, Resolved, Unassigned)
          31. ObserverReadProxyProvider should work with IPFailoverProxyProvider (Sub-task, Resolved, Konstantin Shvachko)
          32. Reduce logging frequency of QuorumJournalManager#selectInputStreams (Sub-task, Resolved, Erik Krogen)
          33. Limit logging frequency of edit tail related statements (Sub-task, Resolved, Erik Krogen)
          34. Refactor NameNode failover proxy providers (Sub-task, Resolved, Konstantin Shvachko)
          35. Remove AlignmentContext from AbstractNNFailoverProxyProvider (Sub-task, Resolved, Konstantin Shvachko)
          36. Only some protocol methods should perform msync wait (Sub-task, Resolved, Erik Krogen)
          37. ObserverNode should reject read requests when it is too far behind (Sub-task, Resolved, Konstantin Shvachko)
          38. Add mechanism to allow certain RPC calls to bypass sync (Sub-task, Resolved, Chen Liang)
          39. Throw retriable exception for getBlockLocations when ObserverNameNode is in safemode (Sub-task, Resolved, Chao Sun)
          40. Add a configuration to turn on/off observer reads (Sub-task, Open, Shweta)
          41. Handle BlockMissingException when reading from observer (Sub-task, Resolved, Chao Sun)
          42. Unit Test for transitioning between different states (Sub-task, Resolved, Sherwood Zheng)
          43. Fix crlf line endings in HDFS-12943 branch (Sub-task, Resolved, Konstantin Shvachko)
          44. Test reads from standby on a secure cluster with IP failover (Sub-task, Resolved, Chen Liang)
          45. TestObserverNode refactoring (Sub-task, Resolved, Konstantin Shvachko)
          46. Introduce the single Observer failure (Sub-task, Resolved, Sherwood Zheng)
          47. ObserverReadProxyProvider should enable observer read by default (Sub-task, Resolved, Chen Liang)
          48. ObserverReadProxyProviderWithIPFailover should work with HA configuration (Sub-task, Resolved, Chen Liang)
          49. Emulate Observer node falling far behind the Active (Sub-task, Resolved, Sherwood Zheng)
          50. NN status discovery does not leverage delegation token (Sub-task, Resolved, Chen Liang)
          51. Test reads from standby on a secure cluster with Configured failover (Sub-task, Resolved, Plamen Jeliazkov)
          52. Allow manual failover between standby and observer (Sub-task, Resolved, Chao Sun)
          53. Allow manual transition from Standby to Observer (Sub-task, Resolved, Unassigned)
          54. Fix the order of logging arguments in ObserverReadProxyProvider (Sub-task, Resolved, Ayush Saxena)
          55. Fix class cast error in NNThroughputBenchmark with ObserverReadProxyProvider (Sub-task, Resolved, Chao Sun)
          56. ORFPP should also clone DT for the virtual IP (Sub-task, Resolved, Chen Liang)
          57. Make ZKFC ObserverNode aware (Sub-task, Resolved, xiangheng)
          58. Create user guide for "Consistent reads from Observer" feature (Sub-task, Resolved, Chao Sun)
          59. Move ipfailover config key out of HdfsClientConfigKeys (Sub-task, Resolved, Chen Liang)
          60. Handle exception from internalQueueCall (Sub-task, Resolved, Chao Sun)
          61. Adjust annotations on new interfaces/classes for SBN reads (Sub-task, Resolved, Chao Sun)
          62. Description errors in the comparison logic of transaction ID (Sub-task, Resolved, xiangheng)
          63. Update "Consistent Read from Observer" User Guide with Edit Tailing Frequency (Sub-task, Resolved, Erik Krogen)
          64. Document dfs.ha.tail-edits.period in user guide (Sub-task, Resolved, Chao Sun)
          65. ObserverReadInvocationHandler should implement RpcInvocationHandler (Sub-task, Resolved, Konstantin Shvachko)
          66. Balancer should work with ObserverNode (Sub-task, Resolved, Erik Krogen)
          67. Fix white spaces related to SBN reads (Sub-task, Resolved, Konstantin Shvachko)
          68. [SBN read] Unclear Log.WARN message in GlobalStateIdContext (Sub-task, Resolved, Shweta)
          69. [SBN Read] StateId and TransactionId not present in Trace level logging (Sub-task, Resolved, Shweta)
          70. Throwing RemoteException in the time of Read Operation (Sub-task, Resolved, Unassigned)
          71. [SBN Read] Add the document link to the top page (Sub-task, Resolved, Takanobu Asanuma)
          72. [SBN read] Got an unexpected txid when tail editlog (Sub-task, Resolved, Zhaohui Wang)
          73. Fix logging error in TestEditLog#testMultiStreamsLoadEditWithConfMaxTxns (Sub-task, Resolved, Jonathan Hung)
          74. [SBN read] Change client logging to be less aggressive (Sub-task, Resolved, Chen Liang)
          75. [SBN read] StandbyNode does not come out of safemode while adding new blocks (Sub-task, Resolved, Unassigned)
          76. [SBN read] reportBadBlock is rejected by Observer (Sub-task, Open, Unassigned)
          77. [SBN read] Revisit GlobalStateIdContext locking when getting server state id (Sub-task, Resolved, Chen Liang)
          78. [SBN read] Allow configurably enable/disable AlignmentContext on NameNode (Sub-task, Resolved, Chen Liang)
          79. Prevent Observer NameNode from becoming StandBy NameNode (Sub-task, Resolved, Aihua Xu)
          80. RBF: Support observer node from Router-Based Federation (Sub-task, Resolved, Simbarashe Dzinamarira)


            People

              Assignee: Konstantin Shvachko (shv)
              Reporter: Konstantin Shvachko (shv)
              Votes: 4
              Watchers: 87


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 21.5h