  1. Spark
  2. SPARK-44111

Prepare Apache Spark 4.0.0

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Build

    Description

      For now, this issue aims to collect ideas for planning Apache Spark 4.0.0.

      We will add more items as they are excluded from Apache Spark 3.5.0 (Feature Freeze: July 16th, 2023).

      Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3)
      Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8)
      Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x)
      Spark 4: 2024.06 (4.0.0, NEW)
      

        Issue Links

          1. Drop mesos support (Sub-task, Resolved, Sean R. Owen)
          2. Drop K8s v1.25 and lower version support (Sub-task, Resolved, Dongjoon Hyun)
          3. Drop K8s v1.26 Support (Sub-task, Resolved, Dongjoon Hyun)
          4. Remove shim classes for Hive prior 2.0.0 (Sub-task, Resolved, Cheng Pan)
          5. Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight` (Sub-task, Resolved, Dongjoon Hyun)
          6. Upgrade Scala to 2.13.12 (Sub-task, Resolved, Yang Jie)
          7. Upgrade Scala to 2.13.13 (Sub-task, Resolved, BingKun Pan)
          8. Upgrade Scala to 2.13.14 (Sub-task, Open, BingKun Pan)
          9. Enable spark.shuffle.service.removeShuffle by default (Sub-task, Resolved, Dongjoon Hyun)
          10. Enable spark.eventLog.compress by default (Sub-task, Resolved, Dongjoon Hyun)
          11. Enable spark.eventLog.rolling.enabled by default (Sub-task, Resolved, Dongjoon Hyun)
          12. Enable `spark.metrics.appStatusSource.enabled` by default (Sub-task, Resolved, Dongjoon Hyun)
          13. Update `spark.speculation.multiplier` to 3 and `spark.speculation.quantile` to 0.9 (Sub-task, Resolved, Dongjoon Hyun)
          14. Deprecate spark.sql.parser.escapedStringLiterals (Sub-task, Resolved, Max Gekk)
          15. Change default of spark.sql.legacy.timeParserPolicy from EXCEPTION to CORRECTED (Sub-task, Resolved, Serge Rielau)
          16. Make EventLoggingListenerSuite independent from spark.eventLog.compress conf (Sub-task, Resolved, Dongjoon Hyun)
          17. Fix EventLogFileWriters to handle `none` codec case (Sub-task, Resolved, Dongjoon Hyun)
          18. Upgrade `Volcano` to 1.8.0 (Sub-task, Resolved, Dongjoon Hyun)
          19. Upgrade `Volcano` to 1.8.1 (Sub-task, Resolved, Dongjoon Hyun)
          20. Upgrade `Volcano` to 1.8.2 (Sub-task, Resolved, Dongjoon Hyun)
          21. Migrate antlr4 from 4.9 to 4.10+ (Sub-task, Resolved, Yang Jie)
          22. Upgrade Python to 3.11 in Maven builds (Sub-task, Resolved, Hyukjin Kwon)
          23. Support Python 3.12 (Sub-task, Resolved, Dongjoon Hyun)
          24. Remove pinned version of torch for Python 3.12 support (Sub-task, Resolved, Hyukjin Kwon)
          25. Handle PEP-632 by checking `ModuleNotFoundError` on `setuptools` in Python 3.12 (Sub-task, Resolved, Dongjoon Hyun)
          26. `mypy` should have `--python-executable` parameter (Sub-task, Resolved, Dongjoon Hyun)
          27. Upgrade Pandas to 2.2.0 (Sub-task, Resolved, Haejoon Lee)
          28. Remove `distutils` usage (Sub-task, Resolved, Dongjoon Hyun)
          29. Remove deprecated Hadoop-2 `LocatedFileStatus` constructor (Sub-task, Resolved, Dongjoon Hyun)
          30. Support AWS_ENDPOINT_URL env variable (Sub-task, Resolved, Dongjoon Hyun)
          31. Improve InMemoryFileIndex to use FileSystem.listFiles API (Sub-task, Resolved, Dongjoon Hyun)
          32. Change RocksDB as default shuffle service db backend (Sub-task, Resolved, Jia Fan)
          33. Eliminate unnecessary reflection invocation in Hive shim classes (Sub-task, Resolved, Cheng Pan)
          34. Upgrade kubernetes-client to 6.9.0 for K8s 1.28 (Sub-task, Resolved, Dongjoon Hyun)
          35. Upgrade `kubernetes-client` to 6.9.1 (Sub-task, Resolved, Dongjoon Hyun)
          36. Upgrade kubernetes-client to 6.10.0 for K8s v1.29.0 (Sub-task, Resolved, Bjørn Jørgensen)
          37. Upgrade kubernetes-client to 6.11.0 (Sub-task, Resolved, Bjørn Jørgensen)
          38. Upgrade `kubernetes-client` to 6.12.0 (Sub-task, Resolved, Dongjoon Hyun)
          39. Upgrade kubernetes-client to 6.12.1 (Sub-task, Resolved, Bjørn Jørgensen)
          40. Use `built-in` storage classes in PVTestsSuite (Sub-task, Resolved, Dongjoon Hyun)
          41. Create and use a K8s test tag for `PersistentVolume` (Sub-task, Resolved, Dongjoon Hyun)
          42. Use the latest minikube in K8s IT (Sub-task, Resolved, Dongjoon Hyun)
          43. Remove threeten-extra exclusion in enforceBytecodeVersion rule (Sub-task, Resolved, Dongjoon Hyun)
          44. Update `YuniKorn` docs with v1.4 (Sub-task, Resolved, Dongjoon Hyun)
          45. Update `YuniKorn` docs with v1.5 (Sub-task, Resolved, Dongjoon Hyun)
          46. Support APP_ID and EXECUTOR_ID placeholder in labels (Sub-task, Resolved, Xi Chen)
          47. Upgrade Apache ORC to 2.0 (Sub-task, Resolved, Dongjoon Hyun)
          48. Support ORC Brotli codec (Sub-task, Resolved, dzcxzl)
          49. Fix ORC tests to be independent from default compression (Sub-task, Resolved, Dongjoon Hyun)
          50. Use `zstd` as the default ORC compression (Sub-task, Resolved, Dongjoon Hyun)
          51. Use the default ORC compression in OrcReadBenchmark (Sub-task, Resolved, Dongjoon Hyun)
          52. Improve `TPCDSQueryBenchmark` to support other file formats (Sub-task, Resolved, Dongjoon Hyun)
          53. Use default ORC compression in data source benchmarks (Sub-task, Resolved, Dongjoon Hyun)
          54. Upgrade `Parquet` to 1.14.0 (Sub-task, Open, Fokko Driesprong)
          55. Upgrade Avro to 1.11.3 (Sub-task, Resolved, Dongjoon Hyun)
          56. Add `VolumeSuite` to K8s IT (Sub-task, Resolved, Dongjoon Hyun)
          57. Enable `spark.ui.prometheus.enabled` by default (Sub-task, Resolved, Dongjoon Hyun)
          58. Document a few missed `spark.ui.*` configs to `Configuration` page (Sub-task, Resolved, Dongjoon Hyun)
          59. Upgrade Maven to 3.9.6 for MNG-7913 (Sub-task, Resolved, Dongjoon Hyun)
          60. Use Scala 2.13 Spark distribution in HiveExternalCatalogVersionsSuite (Sub-task, Resolved, Dongjoon Hyun)
          61. Add Apple Silicon Maven build test to GitHub Action CI (Sub-task, Resolved, Dongjoon Hyun)
          62. Add Daily Apple Silicon Github Action Job (Java/Scala) (Sub-task, Resolved, Hyukjin Kwon)
          63. Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows (Sub-task, Resolved, Hyukjin Kwon)
          64. Fix docker-integration-tests on Apple Chips (Sub-task, Resolved, Kent Yao)
          65. Attach codec extension to avro datasource files (Sub-task, Resolved, Kent Yao)
          66. Benchmarking Avro with Compression Codecs (Sub-task, Resolved, Kent Yao)
          67. Codec xz and zstandard support compression level for avro files (Sub-task, Resolved, Kent Yao)
          68. Disable unsupported `ExtendedLevelDBTest` on `MacOS/aarch64` (Sub-task, Resolved, Yang Jie)
          69. Change to use bcprov/bcpkix-jdk18on for test (Sub-task, Resolved, Yang Jie)
          70. Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 (Sub-task, Resolved, Yang Jie)
          71. Add `bcpkix-jdk18on` test dependencies to `hive` module for Hadoop 3.4.0 (Sub-task, Resolved, Dongjoon Hyun)
          72. Upgrade `bouncycastle` to 1.78 (Sub-task, Resolved, Dongjoon Hyun)
          73. Use Hadoop 3.3.5 winutils in AppVeyor build (Sub-task, Resolved, BingKun Pan)
          74. Upgrade Hadoop to 3.3.6 (Sub-task, Resolved, Dongjoon Hyun)
          75. Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+ (Sub-task, Resolved, Dongjoon Hyun)
          76. Exclude `logback` dependency from SBT like Maven (Sub-task, Resolved, Dongjoon Hyun)
          77. Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` (Sub-task, Resolved, Dongjoon Hyun)
          78. Upgrade Hadoop to 3.4.0 (Sub-task, Resolved, Dongjoon Hyun)
          79. Set spark.hadoop.fs.s3a.connection.establish.timeout to 30s (Sub-task, Resolved, Dongjoon Hyun)
          80. Regenerate benchmark results (Sub-task, Resolved, Dongjoon Hyun)
          81. Use hadoop 3.4.0 in some docs (Sub-task, Resolved, BingKun Pan)
          82. Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor (Sub-task, Resolved, Hyukjin Kwon)
          83. Use R 4.3.3 in `windows` R GitHub Action job (Sub-task, Resolved, Dongjoon Hyun)
          84. Use `Ubuntu 22.04` in `dev/infra/Dockerfile` (Sub-task, Resolved, Dongjoon Hyun)
          85. Support MergeInto in DataFrameWriterV2 (Sub-task, Resolved, Huaxin Gao)
          86. Upgrade Arrow to 14.0.0 (Sub-task, Resolved, Yang Jie)
          87. Upgrade Arrow to 14.0.1 (Sub-task, Resolved, Dongjoon Hyun)
          88. Upgrade Arrow to 14.0.2 (Sub-task, Resolved, Dongjoon Hyun)
          89. Upgrade pyarrow to 14 (Sub-task, Resolved, Ruifeng Zheng)
          90. Upgrade Arrow to 15.0.0 (Sub-task, Resolved, Yang Jie)
          91. Upgrade Arrow to 15.0.2 (Sub-task, Resolved, BingKun Pan)
          92. Upgrade pyarrow to 15.0.0 (Sub-task, Resolved, Ruifeng Zheng)
          93. Upgrade `Arrow` to 16.0.0 (Sub-task, Resolved, dzcxzl)
          94. Upgrade the minimum version of PyArrow to 10.0.0 (Sub-task, Resolved, Haejoon Lee)
          95. Upgrade the minimum version of `arrow` R package to 10.0.0 (Sub-task, Resolved, Dongjoon Hyun)
          96. Move `o.a.s.variant` to `o.a.s.types.variant` (Sub-task, Resolved, Dongjoon Hyun)
          97. Remove Spark 3.0~3.2 pyspark/version.py workaround from release scripts (Sub-task, Resolved, Dongjoon Hyun)
          98. Add `slf4j-api` jar to the class path first before the others of `jars` directory (Sub-task, Resolved, Dongjoon Hyun)
          99. Use Java 21 instead of 21-jre in K8s Dockerfile (Sub-task, Resolved, Dongjoon Hyun)
          100. Make Spark build with -release instead of -target (Sub-task, Resolved, Yang Jie)
          101. Use `HiveConf.getConfVars` or Hive conf names directly (Sub-task, Resolved, Dongjoon Hyun)
          102. Upgrade hive-service-rpc 4.0.0 (Sub-task, Resolved, Cheng Pan)
          103. Upgrade Kafka to 3.7.0 (Sub-task, Resolved, BingKun Pan)
          104. Fix `GenerateMIMAIgnore.isPackagePrivateModule` to return false instead of failing (Sub-task, Resolved, Dongjoon Hyun)
          105. Remove redundant rules from `MimaExcludes` (Sub-task, Resolved, Dongjoon Hyun)
          106. Skip deleting pod from k8s if the pod does not exists (Sub-task, Resolved, leesf)
          107. Run `ANSI` SQL CI twice per day (Sub-task, Resolved, Dongjoon Hyun)
          108. Use ANSI SQL mode by default (Sub-task, Resolved, Dongjoon Hyun)
          109. Switch ANSI SQL CI job to NON-ANSI SQL CI job (Sub-task, Resolved, Dongjoon Hyun)
          110. Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` (Sub-task, Resolved, Dongjoon Hyun)
          111. Fix a bug in try_divide function when with decimals (Sub-task, Resolved, Gengliang Wang)
          112. Regenerate benchmark results after turning ANSI on (Sub-task, Resolved, Kent Yao)
          113. Remove install_scala from build/mvn (Sub-task, Resolved, Cheng Pan)
          114. Use Hive tables explicitly for Hive table capability tests (Sub-task, Resolved, Dongjoon Hyun)
          115. Parameterize max limits of `spark.sql.test.randomDataGenerator` (Sub-task, Resolved, Dongjoon Hyun)
          116. Fix StreamingQueryHashPartitionVerifySuite to update golden files correctly (Sub-task, Resolved, Dongjoon Hyun)
          117. Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default (Sub-task, Resolved, Dongjoon Hyun)
          118. Enable `spark.stage.ignoreDecommissionFetchFailure` by default (Sub-task, Resolved, Dongjoon Hyun)
          119. Introduces a universal BinaryFormatter to make binary output consistent (Sub-task, Resolved, Kent Yao)
          120. Support SPARK_SQL_LEGACY_CREATE_HIVE_TABLE env variable (Sub-task, Resolved, Dongjoon Hyun)
          121. Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules (Sub-task, Resolved, Dongjoon Hyun)
          122. Disable a flaky `SparkSessionE2ESuite.interrupt tag` test (Sub-task, Resolved, Dongjoon Hyun)
          123. `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` (Sub-task, Resolved, Dongjoon Hyun)
          124. Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` (Sub-task, Reopened, Unassigned)
          125. Switch `spark.history.store.serializer` to use `PROTOBUF` by default (Sub-task, In Progress, Dongjoon Hyun)
          126. Use Magic Committer for all S3 buckets by default (Sub-task, In Progress, Dongjoon Hyun)
          127. Support Hive 4.0 metastore (Sub-task, Open, Unassigned)
          128. Spark to support S3 Express One Zone Storage (Sub-task, Open, Unassigned)
          129. Enable `spark.authenticate` by default in K8s environment (Sub-task, Open, Unassigned)
          130. Remove/Reduce usage of TypeTag in public APIs (Sub-task, Open, Unassigned)
          131. Upgrade `pyarrow` to 16.0.0 in GitHub Action CI (Sub-task, Open, Unassigned)
          132. TPCDSBenchmark fails with divide by zero in q90 for ANSI (Sub-task, Open, Unassigned)
          133. Re-enable `SparkSessionE2ESuite.interrupt tag` (Sub-task, Open, Unassigned)
          134. Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` (Sub-task, Resolved, Unassigned)
          135. Add toArrow() DataFrame method to PySpark (Sub-task, Resolved, Ian Cook)
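
          Several resolved sub-tasks above change configuration defaults. As a rough sketch for trying those values on an existing 3.x deployment before upgrading, the snippet below collects the keys named in the sub-task titles and renders them as `spark-submit` arguments or `spark-defaults.conf` lines. The numeric and enum values are taken from the titles; the `true` values are an assumption inferred from the "Enable ... by default" wording. The names `SPARK_4_DEFAULTS`, `to_submit_args`, and `to_defaults_file` are hypothetical helpers, not part of Spark.

```python
# Config keys named in the resolved sub-tasks of this issue.
# "true" entries are inferred from "Enable <key> by default" titles (assumption).
SPARK_4_DEFAULTS = {
    "spark.shuffle.service.removeShuffle": "true",
    "spark.eventLog.compress": "true",
    "spark.eventLog.rolling.enabled": "true",
    "spark.metrics.appStatusSource.enabled": "true",
    "spark.speculation.multiplier": "3",
    "spark.speculation.quantile": "0.9",
    "spark.sql.legacy.timeParserPolicy": "CORRECTED",
    "spark.ui.prometheus.enabled": "true",
    "spark.sql.legacy.createHiveTableByDefault": "false",
    "spark.stage.ignoreDecommissionFetchFailure": "true",
}


def to_submit_args(conf):
    """Render a config mapping as a list of spark-submit `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def to_defaults_file(conf):
    """Render a config mapping as spark-defaults.conf text (key, whitespace, value)."""
    width = max(len(key) for key in conf)
    return "\n".join(f"{key.ljust(width)}  {conf[key]}" for key in sorted(conf))


if __name__ == "__main__":
    # Print the arguments on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_args(SPARK_4_DEFAULTS)))
```

          Keeping the values in one mapping makes it easy to drop individual keys when a workload depends on the older behavior.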

            People

              Assignee: Unassigned
              Reporter: Dongjoon Hyun
              Votes: 2
              Watchers: 30
