SOLR-16670

Couldn't restore a backed up config set from S3

Details

    Description

       
      Solr 9.1.x currently doesn't let me fully restore a backup whose data and config set are stored in an S3 bucket. Every run fails with the error "The specified key does not exist". The full message is:
       

      An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=2C6] [httpStatus=404] [s3ErrorCode=NoSuchKey] [message=The specified key does not exist.] 

       
      After investigating the problem further, I found that the path the isDirectory method uses to check whether a key is a directory makes the `S3Client.headObject` call fail. On line 324, the path pointing to a file is transformed into a path ending with a slash. For example, when the path is "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json", `sanitizedDirPath` appends a slash "/" character, producing "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json/". No object exists under that key, so S3 responds with NoSuchKey. Although I can restore the backup if the cluster already has the config schema definition in ZooKeeper, I cannot restore the backed-up config schema files when starting from an empty cluster because of this error.
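
      The following is a minimal, self-contained sketch of the behavior described above. It is not Solr's actual code: the class name, the in-memory key set, and the boolean `headObject` stand-in are assumptions made for illustration, and only the trailing-slash handling mirrors what I observed. A real S3 client would answer the second lookup with a 404 NoSuchKey error instead of returning false.

```java
import java.util.Set;

public class TrailingSlashSketch {
    // Hypothetical stand-in for the bucket: S3 stores flat keys, not real directories.
    static final Set<String> KEYS = Set.of(
        "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json");

    // Mimics the described behavior: make sure the path ends with "/".
    static String sanitizedDirPath(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    // Stand-in for a HEAD request: succeeds only when the exact key exists.
    static boolean headObject(String key) {
        return KEYS.contains(key);
    }

    public static void main(String[] args) {
        String file =
            "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json";
        System.out.println(headObject(file));                   // true: the file key exists
        System.out.println(headObject(sanitizedDirPath(file))); // false: no key ends with ".json/"
    }
}
```

      The sketch suggests that a directory check for a file path needs to treat a missing "<path>/" key as "not a directory" rather than letting the lookup error propagate.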
        
      For completeness, the other parts of my setup are described below.
       
      Backup definition:

        <backup>
          <repository name="s3-repo" class="org.apache.solr.s3.S3BackupRepository" default="false">
            <str name="s3.bucket.name">com.dev.bucket.backup.folder</str>
            <str name="s3.region">us-east-2</str>
          </repository>
        </backup> 

       
      The backup folder structure on S3:
       

      .
      └── bucket-name
          └── path1
              └── path2
                  └── backup-name
                      └── collection-name
                          ├── backup_0.properties
                          ├── index ...
                          ├── shard_backup_metadata
                          │   └── md_shard1_0.json
                          └── zk_backup_0
                              ├── collection_state.json
                              └── configs
                                  └── config-set-v1
                                      ├── configoverlay.json
                                      ├── solrconfig.xml
                                      ├── stopwords.txt
                                      └── synonyms.txt 

       

       
      The cURL request I use for restore:

      curl -i -X POST \
         -H "Content-Type:application/json" \
         -d \
      '{
        "restore-collection": {
          "name": "backup-name",
          "collection": "collection-name-restored",
          "location": "path1/path2/",
          "repository": "s3-repo"
        }
      }' \
       'http://localhost:8983/api/c' 

      This is the original question providing the same description.
       

       

            People

              houston Houston Putman
              ozlerhakan Hakan Özler

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m