  1. Jackrabbit Oak
  2. OAK-3629

Index corruption seen with CopyOnRead when index definition is recreated

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.5, 1.6.0
    • Component/s: lucene
    • Labels: None

    Description

      The CopyOnRead logic uses reindexCount to determine the name of the directory into which index files are copied. In the normal flow, reindexing increments this count, so the newer index files are copied to a new directory.

      However, if the index definition node is recreated by some deployment process, this count is reset to 0. As a result, index files newly created by reindexing are copied into an already existing directory, which can lead to corruption.
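
The directory collision can be illustrated with a minimal sketch. The class and method names (`ReindexCountDirNaming`, `hash`, `localDirFor`) and the choice of SHA-256 for hash(indexpath) are assumptions made for illustration only; this is not Oak's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of reindexCount-based directory naming (not Oak's real code).
final class ReindexCountDirNaming {

    // Hex-encoded SHA-256 of the index path. The hash function is an assumption.
    static String hash(String indexPath) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(indexPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Local copy location: <indexRoot>/hash(indexPath)/<reindexCount>
    static Path localDirFor(Path indexRoot, String indexPath, long reindexCount) {
        return indexRoot.resolve(hash(indexPath)).resolve(Long.toString(reindexCount));
    }
}
```

Under these assumptions, a normal reindex (count 1 to 2) yields a new directory, whereas a recreated definition that is reindexed once (count 0 to 1) maps back to the directory that already holds the old files.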

      What happened here was:

      1. The system started with index definition I1, and indexing completed with the index files saved under index/hash(indexpath)/1 (where 1 is the current reindex count).
      2. A new index definition package was deployed, which reset the reindex count. Reindexing then happened again, and the CopyOnRead logic, per the current design, reused the existing index directory. It so happens that Lucene creates files with the same name and the same size but different content. This defeats CopyOnRead's length-based index corruption check and thus causes the new Lucene index to become corrupt.
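
To see why a length-only comparison cannot catch this case, consider the sketch below. `passesLengthCheck` is a hypothetical stand-in for the length-based check described above, and `contentMatches` is shown only for contrast; neither is taken from Oak.

```java
import java.util.Arrays;

// Hypothetical sketch: a length-only check versus a full content comparison.
final class LengthCheckSketch {

    // Treats the local copy as valid whenever its size matches the persisted file.
    static boolean passesLengthCheck(byte[] localCopy, byte[] persisted) {
        return localCopy.length == persisted.length;
    }

    // Compares every byte, so it also detects same-size files with different content.
    static boolean contentMatches(byte[] localCopy, byte[] persisted) {
        return Arrays.equals(localCopy, persisted);
    }
}
```

A stale local file left over from the old index and a freshly written file of the same size pass `passesLengthCheck` but fail `contentMatches`, which is the situation described in step 2.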

      Note that the corruption here is transient, i.e. the persisted index is not corrupted; only the locally copied index is. Cleaning up the index directory fixes the issue and can be used as a workaround.

      Fix

      After discussion with tmueller, the following approach can be used.

      Instead of relying on the reindex count, we can maintain a hidden, randomly generated uuid and store it in the index config. This would be used to derive the name of the directory on the filesystem. If the index definition gets reset, the uuid can be regenerated.
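
A minimal sketch of the proposed approach follows. The index config is modelled as a plain `Map`, and the property name `:uid`, the class `UidDirNaming` and its methods are all hypothetical; using the uuid directly as the directory name is likewise an assumption, since the description only says the name is derived from it.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of uuid-based directory naming (not Oak's real code).
final class UidDirNaming {

    // Assumed name of the hidden property that stores the uuid.
    static final String PROP_UID = ":uid";

    // Returns the stored uuid, generating and storing a random one if it is absent.
    static String ensureUid(Map<String, String> indexConfig) {
        return indexConfig.computeIfAbsent(PROP_UID, k -> UUID.randomUUID().toString());
    }

    // Local copy location: <indexRoot>/<uid>
    static Path localDirFor(Path indexRoot, Map<String, String> indexConfig) {
        return indexRoot.resolve(ensureUid(indexConfig));
    }
}
```

Because a recreated definition starts without the hidden property, it receives a fresh uuid and therefore a fresh directory, regardless of what the reindex count is.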

      Workaround

      Clean the directory used by CopyOnRead, which is <repo home>/index, before restarting.
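
The cleanup can be done by hand; for completeness, here is a sketch of doing it programmatically. `CleanLocalIndexDir` and `clean` are hypothetical names, and the code assumes the repository is stopped while it runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: removes the local CopyOnRead copies under <repoHome>/index.
final class CleanLocalIndexDir {

    // Recursively deletes <repoHome>/index if it exists; does nothing otherwise.
    static void clean(Path repoHome) throws IOException {
        Path indexDir = repoHome.resolve("index");
        if (!Files.exists(indexDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(indexDir)) {
            // Reverse order so that children are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Only the local copies are removed; the persisted index is left untouched, which matches the note above that the corruption is limited to the local copy.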

          People

            Assignee: Chetan Mehrotra (chetanm)
            Reporter: Chetan Mehrotra (chetanm)