Apache Arrow / ARROW-6762

[C++] JSON reader segfaults on newline

    Description

      Using the SampleRecord.jl attachment from ARROW-6737, I noticed that trying to read this file on master crashes the process (a failed check followed by an abort, rather than a true segfault):

      In [1]: from pyarrow import json 
         ...: import pyarrow.parquet as pq 
         ...:  
         ...: r = json.read_json('SampleRecord.jl') 
      WARNING: Logging before InitGoogleLogging() is written to STDERR
      F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: (string_view(*next_partial).find_first_not_of(" \t\n\r")) == (string_view::npos) 
      *** Check failure stack trace: ***
      Aborted (core dumped)
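
      The check that fails at reader.cc:93 requires that the leftover partial block contain nothing but whitespace (space, tab, newline, carriage return). As a rough illustration of that condition only, here is a Python analogue; the function name is hypothetical and this is not the Arrow implementation:

```python
# Characters the failed check treats as ignorable, copied from the log message.
WHITESPACE = b" \t\n\r"


def is_only_whitespace(chunk: bytes) -> bool:
    """Rough analogue of `find_first_not_of(" \\t\\n\\r") == npos`.

    Returns True when `chunk` is empty or made up solely of the
    whitespace bytes listed above.
    """
    return chunk.strip(WHITESPACE) == b""


if __name__ == "__main__":
    # A block holding only line endings satisfies the condition.
    print(is_only_whitespace(b"\n \r\n"))
    # A block that still holds JSON content does not.
    print(is_only_whitespace(b'\n{"a": 1}'))
```

      The abort above means the reader found some non-whitespace byte in that leftover block; the sketch does not say why it was there.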
      

      while with 0.14.1 this works fine:

      In [24]: from pyarrow import json 
          ...: import pyarrow.parquet as pq 
          ...:  
          ...: r = json.read_json('SampleRecord.jl')

      In [25]: r
      Out[25]: 
      pyarrow.Table
      _type: string
      provider_name: string
      arrival: timestamp[s]
      berthed: timestamp[s]
      berth: null
      cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>>
        child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>
            child 0, movement: string
            child 1, product: string
            child 2, volume: string
            child 3, volume_unit: string
            child 4, buyer: null
            child 5, seller: null
      departure: timestamp[s]
      eta: null
      installation: null
      port_name: string
      next_zone: null
      reported_date: timestamp[s]
      shipping_agent: null
      vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
        child 0, beam: null
        child 1, build_year: null
        child 2, call_sign: null
        child 3, dead_weight: null
        child 4, dwt: null
        child 5, flag_code: null
        child 6, flag_name: null
        child 7, gross_tonnage: null
        child 8, imo: string
        child 9, length: int64
        child 10, mmsi: null
        child 11, name: string
        child 12, type: null
        child 13, vessel_type: null
      
      In [26]: pa.__version__
      Out[26]: '0.14.1'
      

      cc apitrou bkietz

            People

              apitrou Antoine Pitrou
              jorisvandenbossche Joris Van den Bossche