  1. Apache Arrow
  2. ARROW-7728

Duplicated binaries in the python package


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.15.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None

    Description

      Hello,

       

      I'm not sure if it is a desired feature or not, but there's no "question" issue type, so I'm opening it as a bug - please correct if necessary.

       

      Most of the binary files in the Python "pyarrow" package are present in two versions, e.g.:

       

      libarrow.so
      libarrow.so.15
      

      or  

      libarrow.dylib
      libarrow.15.dylib
      

      (I presume that ".15" corresponds to the version of pyarrow?)

      The two files in each pair are actually identical:

      $ diff libarrow.15.dylib libarrow.dylib  # returns nothing
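
The same check can be run over the whole package rather than one pair at a time. The following is only a minimal sketch: the function names `file_digest` and `find_identical_files` are hypothetical helpers, not part of pyarrow, and the approach (grouping regular files by SHA-256 digest) is an assumption about what a convenient check looks like.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root: Path) -> list[list[Path]]:
    """Group regular files under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip symlinks: they point at another file rather than duplicating it.
        if path.is_file() and not path.is_symlink():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

With pyarrow installed, calling `find_identical_files(Path(pyarrow.__file__).parent)` should list each pair such as `libarrow.so` and `libarrow.so.15` as one group, if the files really are full copies rather than symlinks.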
      

      So let me ask:

      • Is it necessary to have both of them in the distribution?
      • Which one is actually imported, and is it safe to remove the other one?

       

      Out of the 130 MB of a full pyarrow installation, 105 MB are these binaries, so removing the duplicates would save quite some space (especially important when using pyarrow in AWS Lambda, where the function is limited in size).
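
Figures like these can be reproduced by summing file sizes under the installed package directory. The sketch below is an illustration only: `is_shared_library` and `size_report` are hypothetical helpers, and the rule for recognising shared libraries by file name is an assumption based on the names quoted above.

```python
from pathlib import Path

# Assumption: shared libraries are recognisable from their file names.
LIB_SUFFIXES = (".so", ".dylib")


def is_shared_library(path: Path) -> bool:
    """True for names such as libarrow.so, libarrow.so.15 or libarrow.15.dylib."""
    name = path.name
    return name.endswith(LIB_SUFFIXES) or ".so." in name


def size_report(root: Path) -> tuple[int, int]:
    """Return (total_bytes, shared_library_bytes) for regular files under root."""
    total = libs = 0
    for path in root.rglob("*"):
        # Symlinks are skipped so that a linked file is not counted twice.
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            total += size
            if is_shared_library(path):
                libs += size
    return total, libs
```

Dividing both numbers by `1024 ** 2` gives the sizes in MB for comparison with the package size limit of the deployment target.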

            People

              Assignee: Unassigned
              Reporter: Filimonov Vladimir
              Votes: 0
              Watchers: 3
