  1. REEF (Retired)
  2. REEF-1771 Enable IDistributedDataSet in .NET for Parquet files
  3. REEF-1765

Building a Parquet Reader for Potential Integrations with Other ML Frameworks

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.16
    • Component/s: REEF.NET IO

    Description

      The Parquet file format is widely used by well-known frameworks such as Hadoop and Spark. By enabling REEF to read Parquet files, we could potentially integrate with those frameworks. For now we want to support only data of non-nested types with a table-like structure. This allows us to transform the data into formats such as RDDs.
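
      To illustrate the scope restriction, here is a minimal sketch of what "non-nested, table-like" could mean in practice. It is not taken from the PR and does not use any REEF or Parquet API: the schema representation, the PRIMITIVE_TYPES set, and the helper names is_flat and columns_to_rows are all assumptions made for this example.

```python
from typing import Any, Dict, List

# Hypothetical schema description: column name -> type name.
# A nested column is modeled as a dict (struct) or list (repeated field).
Schema = Dict[str, Any]

# Assumed set of supported primitive type names, for illustration only.
PRIMITIVE_TYPES = {"boolean", "int32", "int64", "float", "double", "string"}


def is_flat(schema: Schema) -> bool:
    """Return True when every column has a primitive (non-nested) type."""
    return all(isinstance(t, str) and t in PRIMITIVE_TYPES for t in schema.values())


def columns_to_rows(schema: Schema, columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn column-oriented data into a list of row records."""
    if not is_flat(schema):
        raise ValueError("only non-nested schemas are supported")
    names = list(schema)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("columns must all have the same length")
    # Zip the columns together so each tuple holds one row's values.
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]
```

      A flat schema such as {"id": "int64", "name": "string"} passes the check and its columns can be turned into row records, while a schema containing a struct or repeated field is rejected up front.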

      A draft of ParquetReader is provided here in a PR: https://github.com/apache/reef/pull/1283

          People

            Assignee: Unassigned
            Reporter: Shouheng Yi (shouhengyi)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified