Apache Arrow / ARROW-7112

Wrong contents when initializing a pyarrow.Table from a boolean DataFrame

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Labels: None
    • Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu

    Description

      When a Table is initialized from a boolean pandas.DataFrame whose underlying array is not in Fortran order, the contents of the resulting Table differ from the contents of the DataFrame.

      Sample:

       

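      import pandas as pd
      import pyarrow as pa
      import numpy as np
      # 3x3 boolean array in NumPy's default C (row-major) order
      mask = np.full((3,3), False)
      # set the whole middle column to True
      mask[:,1] = True
      df = pd.DataFrame(mask)
      print(df)
      # round trip through a pyarrow Table and back to pandas
      print(pa.table(df).to_pandas())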
      

       

      The output:

       

             0     1      2
      0  False  True  False
      1  False  True  False
      2  False  True  False
             0      1      2
      0  False   True  False
      1  False  False  False
      2  False  False  False
      

      I.e., column 1 is different before and after roundtripping through pa.Table.

      If I add order='F' to the np.full invocation, the result is as expected. Also, the problem seems to disappear if I use dtype=int.
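      The two observations above can be sketched as follows. The order='F' variant is the one described in the report; using np.asfortranarray on an existing array and astype(int) are assumed equivalents that were not tested in the report. The flags check only shows the memory layout of the NumPy array, not what pandas or pyarrow do with it.

```python
import numpy as np
import pandas as pd

# Default layout: C (row-major) order, the case that triggers the problem.
mask = np.full((3, 3), False)
mask[:, 1] = True
print(mask.flags['C_CONTIGUOUS'], mask.flags['F_CONTIGUOUS'])  # True False

# Variant from the report: create the array in Fortran (column-major) order.
mask_f = np.full((3, 3), False, order='F')
mask_f[:, 1] = True
df_f = pd.DataFrame(mask_f)

# Assumed equivalent for an array that already exists in C order.
df_f2 = pd.DataFrame(np.asfortranarray(mask))

# Second observation from the report: an integer dtype instead of bool.
df_int = pd.DataFrame(mask.astype(int))
```

      All three frames hold the same logical values as the original; only the memory layout or dtype of the source array differs.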

       

       

            People

              Assignee: Unassigned
              Reporter: Joachim Haga (jobh)