  1. Hadoop HDFS
  2. HDFS-14032 [libhdfs++] Phase 2 improvements
  3. HDFS-8746

Reduce the latency of streaming reads by re-using DN connections

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels: None

    Description

      The current libhdfspp implementation opens a new connection for each pread. For streaming reads, throughput will be dominated by the latency of reconnecting to the DataNodes. This is especially true for streaming short-buffer reads coming from the C API, and it will get worse once SSL handshake overhead is added.

      The target use case is a multi-block file that the client application streams sequentially, consuming the data as it arrives from the DN and then discarding it. The data is read into moderately small buffers (roughly 64 KB to 1 MB) owned by the consumer, and overall throughput is the critical metric.
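
      To make the proposal concrete, here is a minimal sketch of what re-using DN connections across sequential reads could look like. It is not the libhdfspp implementation: `FakeDnConnection`, `DnConnectionCache`, and `CountConnects` are hypothetical names invented for illustration, and the "connection" is a plain struct so that only the number of connects is modelled, not real network I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an open DataNode connection; not a libhdfs++ type.
struct FakeDnConnection {
  std::string datanode;  // address of the DataNode this connection targets
};

// Hypothetical cache of idle connections, keyed by DataNode address.
class DnConnectionCache {
 public:
  // Returns a cached idle connection to `datanode` if there is one,
  // otherwise "opens" a new one and counts the connect.
  std::unique_ptr<FakeDnConnection> Acquire(const std::string& datanode) {
    auto it = idle_.find(datanode);
    if (it != idle_.end() && !it->second.empty()) {
      std::unique_ptr<FakeDnConnection> conn = std::move(it->second.back());
      it->second.pop_back();
      return conn;
    }
    ++connects_;  // stands in for the TCP connect plus any handshake cost
    return std::unique_ptr<FakeDnConnection>(new FakeDnConnection{datanode});
  }

  // Hands a connection back so that the next read can reuse it.
  void Release(std::unique_ptr<FakeDnConnection> conn) {
    assert(conn != nullptr);
    const std::string key = conn->datanode;  // copy before moving conn
    idle_[key].push_back(std::move(conn));
  }

  std::size_t connects() const { return connects_; }

 private:
  std::map<std::string, std::vector<std::unique_ptr<FakeDnConnection>>> idle_;
  std::size_t connects_ = 0;
};

// Simulates a sequential streaming read. block_locations[i] is the DataNode
// serving block i; each block is consumed in `reads_per_block` small reads.
// Returns how many connections were opened in total.
std::size_t CountConnects(const std::vector<std::string>& block_locations,
                          std::size_t reads_per_block, bool reuse) {
  DnConnectionCache cache;
  for (const std::string& dn : block_locations) {
    for (std::size_t i = 0; i < reads_per_block; ++i) {
      std::unique_ptr<FakeDnConnection> conn = cache.Acquire(dn);
      // ... the read into the consumer-owned buffer would happen here ...
      if (reuse) {
        cache.Release(std::move(conn));
      }
      // Without reuse, conn is destroyed here: one connection per pread.
    }
  }
  return cache.connects();
}
```

      For three blocks served by two DataNodes (`{"dn1", "dn1", "dn2"}`) with four reads per block, `CountConnects` reports 2 connects with reuse and 12 without. A real implementation would also need to handle connections that fail or go stale while idle, which this sketch ignores.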

          People

            James Clampffer
            Bob Hansen (bobthansen)
            Votes: 0
            Watchers: 3