Kudu / KUDU-2693

Buffer DiskRowSet flushes to more efficiently write many columns

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: fs, perf, tablet
    • Labels: None

    Description

      When looking at a trace of some MRS flushes on a table with 280 columns, it was observed that during the course of the flush some 695 fdatasync() calls occurred.

      One possible way to minimize the number of fsync calls would be to flush directly to memory buffers first, determine the ideal layout on disk for the flushed blocks (possibly striped across one log block container per data disk) and then potentially write the data out to the containers in parallel. This would require some memory buffer space to be reserved per maintenance manager thread, possibly 64MB since the DRS roll size is 32MB.
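      To make the idea concrete, here is a minimal sketch, not Kudu code. `BufferedBlock`, `FakeContainer`, `PlanLayout`, and `FlushBuffered` are hypothetical names invented for illustration, and the container only counts simulated syncs instead of calling fdatasync(). It assumes a simple round-robin striping of column blocks across one container per data disk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one flushed column block held in memory.
struct BufferedBlock {
  int column_id;
  std::string data;
};

// Hypothetical stand-in for one log block container on one data disk.
// A real container would append to a file and call fdatasync(); this one
// only records the bytes and counts the syncs that would have happened.
struct FakeContainer {
  std::string contents;
  int sync_calls = 0;
  void Append(const std::string& d) { contents += d; }
  void Sync() { ++sync_calls; }
};

// Decide the on-disk layout before writing anything: stripe the buffered
// blocks round-robin across the available disks (an assumed policy).
std::vector<std::vector<const BufferedBlock*>> PlanLayout(
    const std::vector<BufferedBlock>& blocks, std::size_t num_disks) {
  assert(num_disks > 0);
  std::vector<std::vector<const BufferedBlock*>> plan(num_disks);
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    plan[i % num_disks].push_back(&blocks[i]);
  }
  return plan;
}

// Write each disk's share contiguously and sync once per container.
// Each loop iteration touches only its own container, so the iterations
// are independent and could be handed to one writer thread per disk.
// Returns the total number of simulated sync calls.
int FlushBuffered(const std::vector<BufferedBlock>& blocks,
                  std::vector<FakeContainer>* containers) {
  const auto plan = PlanLayout(blocks, containers->size());
  int total_syncs = 0;
  for (std::size_t d = 0; d < containers->size(); ++d) {
    FakeContainer& c = (*containers)[d];
    for (const BufferedBlock* b : plan[d]) {
      c.Append(b->data);
    }
    if (!plan[d].empty()) {
      c.Sync();
    }
    total_syncs += c.sync_calls;
  }
  return total_syncs;
}
```

      In this sketch the number of syncs scales with the number of data disks rather than with the number of columns, which is the effect the proposal is after. The real cost would also depend on metadata writes and container preallocation, which are not modeled here.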

      According to Todd, we could probably do it all in LogBlockManager by adding a new flag to CreateBlockOptions that says whether the block's writes should be buffered.
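      A rough sketch of what such a flag might look like follows. Everything here is hypothetical: `SketchCreateBlockOptions`, its `buffer_writes` field, `SketchBlockManager`, and `CountSyncs` are invented for illustration and are not the actual Kudu API. Syncs are simulated with a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical options struct. `buffer_writes` is the invented flag that
// says whether a block should be held in memory instead of synced at once.
struct SketchCreateBlockOptions {
  std::string tablet_id;
  bool buffer_writes = false;
};

// Hypothetical block manager that simulates durability with counters.
class SketchBlockManager {
 public:
  // Creates and finalizes a block in one step, for brevity.
  void WriteBlock(const SketchCreateBlockOptions& opts,
                  const std::string& data) {
    if (opts.buffer_writes) {
      pending_.push_back(data);  // Held in memory until CommitBuffered().
    } else {
      durable_bytes_ += data.size();
      ++sync_calls_;  // Stands in for one sync per unbuffered block.
    }
  }

  // Writes all buffered blocks together, followed by one simulated sync.
  void CommitBuffered() {
    if (pending_.empty()) return;
    for (const std::string& d : pending_) {
      durable_bytes_ += d.size();
    }
    pending_.clear();
    ++sync_calls_;
  }

  int sync_calls() const { return sync_calls_; }
  std::size_t durable_bytes() const { return durable_bytes_; }

 private:
  std::vector<std::string> pending_;
  std::size_t durable_bytes_ = 0;
  int sync_calls_ = 0;
};

// Writes one block per column and returns the simulated sync count.
int CountSyncs(int num_columns, bool buffer) {
  assert(num_columns >= 0);
  SketchBlockManager bm;
  SketchCreateBlockOptions opts;
  opts.tablet_id = "example-tablet";  // Placeholder value.
  opts.buffer_writes = buffer;
  for (int i = 0; i < num_columns; ++i) {
    bm.WriteBlock(opts, "column data");
  }
  bm.CommitBuffered();
  return bm.sync_calls();
}
```

      Putting the decision in the options struct would keep callers such as the flush path unchanged apart from setting the flag, which seems to be the appeal of doing it all inside the block manager.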

          People

            tlipcon Todd Lipcon
            mpercy Mike Percy
            Votes: 0
            Watchers: 2
