Apache Arrow / ARROW-6578

[C++] Casting int64 to string columns

Details

    Description

      I wanted to cast a list of tables to the same schema so that I could use concat_tables later. However, I encountered an ArrowNotImplementedError:

      ---------------------------------------------------------------------------
      ArrowNotImplementedError                  Traceback (most recent call last)
      <ipython-input-11-bd4916c221bf> in <module>
      ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
      
      <ipython-input-11-bd4916c221bf> in <listcomp>(.0)
      ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
      
      ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in itercolumns()
      
      ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.Column.cast()
      
      ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
      
      ArrowNotImplementedError: No cast implemented from int64 to string
      

      Some context: I want to read and concatenate a set of CSV files that are partitions of the same table. Casting after reading the CSV is usually significantly faster than specifying column_types in ConvertOptions. Some string columns are mostly populated with integer-like values, so in a particular file such a column can be inferred as integer-only. This situation is rather common, so an option to cast an int64 column to a string column would be helpful.

       

      Attachments

        Issue Links

          Activity

            People

              apitrou Antoine Pitrou
              Igor Yastrebov Igor Yastrebov

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 10m