  1. Beam
  2. BEAM-13002

Update to_dataframe API Docs to focus on schema use

Details

    • Type: Task
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Component/s: dsl-dataframe

    Description

      The API documentation for to_dataframe (https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe) is very sparse. It also focuses on specifying a proxy rather than relying on a schema-aware PCollection as input. This function is often people's entry point into the API, so the documentation should make it very clear how to use it. Let's expand the documentation and focus on using schemas rather than specifying a proxy.

      We should also link to the to_dataframe documentation from https://beam.apache.org/documentation/dsls/dataframes/overview/#embedding-dataframes-in-a-pipeline

          People

            Assignee: Unassigned
            Reporter: Brian Hulette (bhulette)
            Votes: 0
            Watchers: 3
