Apache Arrow / ARROW-5718

[R] auto splice data frames in record_batch() and table()


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: R

    Description

      ARROW-3814 (https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94) changed the API of `record_batch()` and `arrow::table()` so that you can no longer pass a data.frame to either function, at least not without massaging it yourself. That broke the sparklyr integration tests with an opaque `cannot infer type from data` error, and it's unfortunate that there's no longer a direct way to go from a data.frame to a record batch, which seems like a common need.

      To follow best practices (cf. the tibble package), we should (1) add an `as_record_batch()` generic, whose data.frame method is probably just `as_record_batch.data.frame <- function(x) record_batch(!!!x)`; and (2) if a user supplies a single, unnamed data.frame as the argument to `record_batch()`, raise an error telling them to use `as_record_batch()`. We may later decide to call `as_record_batch()` automatically, but in case that is too magical and prevents some legitimate use case, let's hold off for now. It's easier to add magic than to remove it.
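      The two proposals can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not arrow's implementation: `record_batch_stub()` is a hypothetical stand-in for `arrow::record_batch()` so the example runs without arrow installed, and `do.call()` over the data.frame's columns plays the role of the `!!!x` splice.

```r
# HYPOTHETICAL stand-in for arrow::record_batch(): it only returns its
# arguments as a list, so this sketch runs without arrow installed.
record_batch_stub <- function(...) {
  args <- list(...)
  # Proposal (2): a single, unnamed data.frame argument is rejected with a
  # message that points the user at as_record_batch().
  if (length(args) == 1 && is.null(names(args)) && is.data.frame(args[[1]])) {
    stop("record_batch() got a single data.frame; use as_record_batch() instead",
         call. = FALSE)
  }
  args
}

# Proposal (1): an S3 generic with a data.frame method.
as_record_batch <- function(x, ...) UseMethod("as_record_batch")

as_record_batch.data.frame <- function(x, ...) {
  # Base R analogue of record_batch(!!!x): each column becomes a named argument.
  do.call(record_batch_stub, as.list(x))
}
```

      Under these assumptions, `as_record_batch(data.frame(a = 1:3))` passes `a` through as a named column, while `record_batch_stub(data.frame(a = 1:3))` stops with the suggested error; a named argument such as `record_batch_stub(df = x)` is left alone, so that use case stays open.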

      Once this function exists, the sparklyr tests can try `as_record_batch()` and, if it doesn't exist, fall back to `record_batch()`: its absence means an older released version of arrow is installed, in which `record_batch(df)` should still work.
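      A minimal sketch of that fallback in base R. The helper name `df_to_record_batch()` and its `ns` argument are hypothetical, not part of sparklyr or arrow; `ns` defaults to the installed arrow namespace and exists so the logic can be exercised with a fake namespace.

```r
# HYPOTHETICAL helper a sparklyr test might use. `ns` defaults to the
# installed arrow namespace and is only evaluated when first used.
df_to_record_batch <- function(df, ns = asNamespace("arrow")) {
  if (exists("as_record_batch", envir = ns, inherits = FALSE)) {
    # Newer arrow: use the dedicated conversion function.
    convert <- get("as_record_batch", envir = ns, inherits = FALSE)
  } else {
    # Older released arrow: record_batch(df) still accepts a data.frame.
    convert <- get("record_batch", envir = ns, inherits = FALSE)
  }
  convert(df)
}
```

      Checking for the function itself, rather than comparing `utils::packageVersion("arrow")` against a hard-coded version, keeps the test working whichever release first ships `as_record_batch()`.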

      cc javierluraschi


            People

              Assignee: Romain Francois (romainfrancois)
              Reporter: Neal Richardson (npr)
              Votes: 0
              Watchers: 4


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h