Hadoop Common / HADOOP-16942

S3A creating folder level delete markers


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.8.3, 3.2.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Writing data from Spark to S3 with the S3A URL scheme (`s3a://`) creates many folder-level delete markers.

      Writing the same data with the S3 URL scheme (`s3://`) does not create any delete markers at all.

       

      Spark - 2.4.4

      Hadoop - 3.2.1

      EMR version - 6.0.0

      Write Mode - Append

      [hadoop@ip-192-0-161-212 ~]$ spark-shell
      Setting default log level to "WARN".
      To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
      20/03/27 07:37:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      Spark context Web UI available at http://ip-192-0-161-212.ec2.internal:4040
      Spark context available as 'sc' (master = yarn, app id = application_1585294390030_0003).
      Spark session available as 'spark'.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
            /_/
               
      Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> val df = spark.sql("select 1 as a")
      df: org.apache.spark.sql.DataFrame = [a: int]
      
      scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3://my-bucket/tmp/vijayant/test/s3/")
                                                                                      
      scala> df.write.mode(org.apache.spark.sql.SaveMode.Append).save("s3a://my-bucket/tmp/vijayant/test/s3a/")
                                                                                      
      scala> 
      

      Object versions after the `s3` write (no delete markers are listed):

      aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3/
      {
          "Versions": [
              {
                  "LastModified": "2020-03-27T07:38:17.000Z", 
                  "VersionId": "V06OzeE7j221Tq7keSGj8bveCYyJFIcf", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3/_SUCCESS", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:38:16.000Z", 
                  "VersionId": "dLYtHDugLhFIdw2YHLFmoFOxXkm.21Wo", 
                  "ETag": "\"26e70a1e26c709e3e8498acd49cfaaa3-1\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3/part-00000-9d9a8925-f119-415d-b547-b742396e2ca7-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 384
              }
          ]
      } 
      

      Object versions after the `s3a` write (many delete markers on folder-level keys):

      aws s3api list-object-versions --bucket my-bucket --prefix tmp/vijayant/test/s3a/
      
      {
          "DeleteMarkers": [
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "NJWRZMcb_eYYwCJh_isX4H1Ox6W362Wb", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "F0h0mLcVVwkMtcHxd95Hj7BACL4Up_Q9", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": ".sBcE6cXeggekOnSgZ4n7pyCDHnsLERK", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "nzm39jiUPC4H0ZaS.5Shp0FYPnR8wNf9", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "BPM65R1HkZngPDYtDL3zPZYPw_G_m9Ic", 
                  "Key": "tmp/vijayant/test/s3a/", 
                  "LastModified": "2020-03-27T07:39:08.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "LJt8_MVDOiD4UdgUqEMycxjvtinJlTNt", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "RqunJTn8Od0PgFR4yu44PX4kL54k6EDv", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "4vY8cnqUI5VJAk3VfEt_VD_KEczo3bmY", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/", 
                  "LastModified": "2020-03-27T07:39:08.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "ln47YYy.yiE.k70cvqvfgYCEQoYFnKQW", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "5Bsrt7s1caM90mzGNgk0MsTU9q8UjTTA", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "pN3HzDfnmqIqrMwAL2jqKEBkvoHZALor", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "wg91poO1KXReXxvsZHzZXrHR1IgIX8t2", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "cv5Noykq3sMilQqJXAH3E.N7qAWnIBx7", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "LastModified": "2020-03-27T07:39:11.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "VersionId": "6xzt9SxlCUJaOLD8krkE3yXfQU14rErX", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "LastModified": "2020-03-27T07:39:09.000Z"
              }, 
              {
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "VersionId": "wGmJAo7x_gkLWAiHzxPGdPMVSus7Wcp1", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "LastModified": "2020-03-27T07:39:10.000Z"
              }
          ], 
          "Versions": [
              {
                  "LastModified": "2020-03-27T07:39:11.000Z", 
                  "VersionId": "2py_ZXKl7yh6fwhzksAx8Os1BriDJCBb", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_SUCCESS", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:08.000Z", 
                  "VersionId": "lDqTnLCqDYtjrOiY.V7E6AKTRQLKrqUT", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:10.000Z", 
                  "VersionId": "g.rGoTDdmrGrNjrLchvwz3jMmGePkgiD", 
                  "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 0
              }, 
              {
                  "LastModified": "2020-03-27T07:39:09.000Z", 
                  "VersionId": ".ZCpY2UW4hRlbLL87dFUJRuk021Hyq8p", 
                  "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/attempt_20200327073907_0001_m_000000_1/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": false, 
                  "Size": 384
              }, 
              {
                  "LastModified": "2020-03-27T07:39:10.000Z", 
                  "VersionId": "JSNjTDHSQqe9zSAV93bc6TXPuqA.vDJE", 
                  "ETag": "\"3def7238a0858c17c62d7045290175cf\"", 
                  "StorageClass": "STANDARD", 
                  "Key": "tmp/vijayant/test/s3a/part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet", 
                  "Owner": {
                      "DisplayName": "sysops+stage", 
                      "ID": "08939105f417dc74b1fa237e211185ff2d9f528d54b1380501de07bd0657b5e1"
                  }, 
                  "IsLatest": true, 
                  "Size": 384
              }
          ]
      }
      
      

      This, in turn, makes listing objects slow, and we have even seen timeouts caused by the large number of delete markers.
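
      To gauge how many delete markers a prefix has accumulated, the JSON printed by `aws s3api list-object-versions` can be tallied offline. The following is a minimal sketch, not part of the original report: the function name `summarize_delete_markers` and the trimmed `sample` listing are hypothetical, and the sketch assumes only the `DeleteMarkers` and `Key` fields visible in the output above.

```python
from collections import Counter


def summarize_delete_markers(listing):
    """Tally delete markers in one parsed `list-object-versions` response.

    `listing` is the dict obtained by loading the CLI's JSON output.
    The "DeleteMarkers" key is absent when there are none (as in the
    `s3` listing above), so it defaults to an empty list.
    """
    markers = listing.get("DeleteMarkers", [])
    per_key = Counter(marker["Key"] for marker in markers)
    # Keys ending in "/" are the folder-level markers this issue is about.
    folder_level = sum(n for key, n in per_key.items() if key.endswith("/"))
    return {
        "total": len(markers),
        "folder_level": folder_level,
        "per_key": dict(per_key),
    }


# Hypothetical sample, trimmed to the "Key" field of a few entries
# taken from the `s3a` listing above.
sample = {
    "DeleteMarkers": [
        {"Key": "tmp/vijayant/test/s3a/"},
        {"Key": "tmp/vijayant/test/s3a/"},
        {"Key": "tmp/vijayant/test/s3a/_temporary/"},
        {"Key": "tmp/vijayant/test/s3a/_temporary/0/_temporary/"
                "attempt_20200327073907_0001_m_000000_1/"
                "part-00000-3923e1b1-406c-4202-b9a8-3bd7cb2d97b2-c000.snappy.parquet"},
    ]
}

summary = summarize_delete_markers(sample)
print(summary["total"], summary["folder_level"])
```

      To run it against real output, save the CLI response to a file and pass `json.load(...)` of that file to the function. Counted by hand, the full `s3a` listing above holds 15 delete markers, 14 of them on keys ending in `/`, all produced by a single one-row append.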


          People

            Assignee: Unassigned
            Reporter: vijayant soni
            Votes: 0
            Watchers: 3
