IMPALA-13015

Dataload fails due to concurrency issue with test.jceks


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 4.4.0
    • None
    • Infrastructure

    Description

      When running dataload locally, it fails with this error:

      Traceback (most recent call last):
        File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in <module>
          if __name__ == "__main__": main()
        File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in main
          os.remove(jceks_path)
      OSError: [Errno 2] No such file or directory: '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
      Background task Loading functional-query data (pid 501094) failed.
      

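      testdata/bin/create-load-data.sh calls bin/load-data.py for functional, TPC-H, and TPC-DS in parallel, so the following check-then-remove logic has a race condition. Two processes can both see that test.jceks exists, and whichever one calls os.remove() second fails with the "No such file or directory" error above: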

        jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
        if os.path.exists(jceks_path):
          os.remove(jceks_path)
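A minimal sketch of a race-tolerant removal, assuming the only goal is to stop the crash: attempt the delete directly and treat "already gone" (ENOENT) as success. The helper name `remove_if_exists` is hypothetical and not part of bin/load-data.py. Checking `e.errno` rather than catching `FileNotFoundError` works in both Python 2 and Python 3.

```python
import errno
import os


def remove_if_exists(path):
    """Hypothetical helper: delete path, treating 'already gone' as success.

    Unlike os.path.exists() followed by os.remove(), there is no window in
    which another process can delete the file between the check and the act.
    """
    try:
        os.remove(path)
    except OSError as e:
        # ENOENT means another process removed it first; anything else
        # (permissions, path is a directory, ...) is still a real error.
        if e.errno != errno.ENOENT:
            raise
```

This only hides the symptom. Parallel loaders would still each delete a shared keystore that another loader might already be using, which is why moving the cleanup out of the parallel section is the more robust fix.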

      I don't see a specific reason for this to be in bin/load-data.py. It should be moved somewhere that doesn't run in parallel. One option is to add a step to testdata/bin/create-load-data.sh.
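A minimal sketch of the suggested restructuring, written in Python rather than shell for illustration: the cleanup runs once in a serial step, and only afterwards are the per-workload loads started in parallel. `reset_test_jceks`, `load_workload`, and `load_all` are hypothetical names, and the thread pool merely stands in for the background processes that create-load-data.sh launches.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def reset_test_jceks(jceks_dir):
    """Hypothetical serial cleanup step: remove test.jceks once."""
    jceks_path = os.path.join(jceks_dir, "test.jceks")
    # The original check-then-remove is safe here because only one process
    # runs this step, so nothing can delete the file between the two calls.
    if os.path.exists(jceks_path):
        os.remove(jceks_path)


def load_workload(workload):
    """Hypothetical stand-in for one bin/load-data.py invocation.

    It no longer touches test.jceks at all.
    """
    return "loaded " + workload


def load_all(jceks_dir, workloads):
    # Serial step first, before any parallel work starts.
    reset_test_jceks(jceks_dir)
    # Parallel step: one worker per workload; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max(1, len(workloads))) as pool:
        return list(pool.map(load_workload, workloads))
```

For example, `load_all(some_dir, ["functional-query", "tpch", "tpcds"])` removes `some_dir/test.jceks` if present and then returns `["loaded functional-query", "loaded tpch", "loaded tpcds"]`.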

      This was introduced in https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432

    People

      arawat Abhishek Rawat
      joemcdonnell Joe McDonnell