ARROW-14805 (sub-task of ARROW-13233: [C++] Support ORC in Arrow Dataset)

[C++][Dataset] Support Count function without projections in ORC to avoid loading all columns

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

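      With ORC support in Dataset, executing a count query without projections, such as "select count(*) from table", loads all columns. This happens because the ORC library selects every column when no columns are explicitly requested; see https://github.com/apache/orc/blob/22828f79a526069d9629719c9476b7addad91ae6/c%2B%2B/src/Reader.cc#L120-L144.

      The Arrow side can improve this in the same way the Parquet format does in Dataset, so that a count without projections does not have to read column data.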
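
The proposed improvement can be sketched roughly as follows. This is a minimal, self-contained model, not Arrow or ORC code: FakeOrcFile, TryCountFromMetadata, CountByScanning and CountRows are hypothetical names introduced only for illustration. It assumes the row count is available from the file footer (ORC does record it there) and that a count with no filter can therefore be answered without decoding any column.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an ORC file. The footer stores the row count,
// so it is known without decoding any column data.
struct FakeOrcFile {
  std::int64_t footer_num_rows;           // row count taken from file metadata
  std::vector<std::string> column_names;  // columns a full scan would decode
  int columns_loaded = 0;                 // bookkeeping for this sketch only
};

// Slow path: decode every column, which is the behaviour the issue describes.
std::int64_t CountByScanning(FakeOrcFile& file) {
  file.columns_loaded += static_cast<int>(file.column_names.size());
  return file.footer_num_rows;
}

// Fast path: use the metadata row count when there is no filter to evaluate.
// Returns std::nullopt to tell the caller that a real scan is required.
std::optional<std::int64_t> TryCountFromMetadata(const FakeOrcFile& file,
                                                 bool has_filter) {
  if (has_filter) return std::nullopt;
  return file.footer_num_rows;
}

// Entry point: try the metadata-only path first, fall back to scanning.
std::int64_t CountRows(FakeOrcFile& file, bool has_filter) {
  if (auto fast = TryCountFromMetadata(file, has_filter)) return *fast;
  return CountByScanning(file);
}
```

In this model, CountRows(file, false) returns the footer row count and leaves columns_loaded at zero, while a count with a filter falls back to the scan. A real implementation would more likely load only the columns the filter references instead of all of them.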

          People

            Assignee: Unassigned
            Reporter: zhixingheyi-tian
            Votes: 0
            Watchers: 3
