Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21305

The BKM (best known methods) of using native BLAS to improvement ML/MLLIB performance

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.2.0
    • 2.3.0
    • Documentation, ML
    • None

    Description

      Many ML/MLLIB algorithms use native BLAS (like Intel MKL, ATLAS, OpenBLAS) to improvement the performance.
      The methods to use native BLAS is important for the performance, sometimes (high opportunity) native BLAS even causes worse performance.
      For example, for the ALS recommendForAll method before SPARK 2.2 which uses BLAS gemm for matrix multiplication.
      If you only test the matrix multiplication performance of native BLAS gemm (like Intel MKL, and OpenBLAS) and netlib-java F2j BLAS gemm , the native BLAS is about 10X performance improvement. But if you test the Spark Job end-to-end performance, F2j is much faster than native BLAS, very interesting.

      I spend much time for this problem, and find we should not use native BLAS (like OpenBLAS and Intel MKL) which support multi-thread with no any setting. By default, this native BLAS will enable multi-thread, which will conflict with Spark executor. You can use multi-thread native BLAS, but it is better to disable multi-thread first.

      https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
      https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications

      I think we should add some comments in docs/ml-guilde.md for this first.

      Attachments

        Issue Links

          Activity

            People

              peng.meng@intel.com Peng Meng
              peng.meng@intel.com Peng Meng
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 504h
                  504h
                  Remaining:
                  Remaining Estimate - 504h
                  504h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified