  1. Solr
  2. SOLR-6184

Replication fetchLatestIndex always fails, which causes a recovery error.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.6, 4.6.1
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Environment: the index size is more than 70GB

    Description

      Copying a full 70GB index takes at least 20 minutes over a 100Mb/s network (or at comparable disk read/write throughput). If even one hard commit happens during those 20 minutes, the full-index snap pull fails, and the temp folder is removed because the pull task is considered failed. In production, the index is updated every minute, so the retried pull task fails every time because the index is constantly changing.

      Constantly retrying the pull also keeps network and disk usage at a high level.

      My suggestion: allow fetchLatestIndex to be retried at some frequency without starting over. Don't remove the tmp folder, and copy the largest index files first. A retried fetchLatestIndex should not download the same large files again; it only needs to copy the files from the most recent commit, so the task can eventually succeed.
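The suggested retry behavior can be sketched as follows. This is a hypothetical illustration, not Solr's actual SnapPuller/IndexFetcher code: the class name `ResumablePullSketch`, the method `filesToFetch`, and the example file names are all assumptions. The idea is that files already fully present in the tmp folder (matched by size here; a real implementation would also verify checksums) are skipped on retry, and the remaining files are fetched largest-first, so each retry only has the small, recently committed segments left to copy.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResumablePullSketch {

    // Given the remote file list (name -> size) and the surviving tmp
    // folder from a previous failed pull, return only the files that
    // still need to be fetched, largest first.
    static List<String> filesToFetch(Map<String, Long> remoteFiles, File tmpDir) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Long> e : remoteFiles.entrySet()) {
            File local = new File(tmpDir, e.getKey());
            // Skip files fully downloaded by a previous attempt.
            if (local.exists() && local.length() == e.getValue().longValue()) {
                continue;
            }
            pending.add(e.getKey());
        }
        // Largest files first: a retry then only has small, recent
        // segments left, instead of re-copying the 70GB files.
        pending.sort((a, b) -> Long.compare(remoteFiles.get(b), remoteFiles.get(a)));
        return pending;
    }

    public static void main(String[] args) throws Exception {
        File tmp = Files.createTempDirectory("pull-sketch").toFile();
        Map<String, Long> remote = new LinkedHashMap<>();
        remote.put("_0.fdt", 70_000_000_000L);  // the big, stable segment
        remote.put("_1.fdt", 5_000_000L);
        remote.put("segments_2", 300L);         // tiny commit metadata
        // Nothing downloaded yet, so everything is pending, biggest first.
        System.out.println(filesToFetch(remote, tmp));
    }
}
```

On a retry, any file whose complete copy survived in the tmp folder drops out of the pending list, which is exactly why the tmp folder must not be deleted when the pull fails.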



          People

            Assignee: Unassigned
            Reporter: Raintung Li

            Dates

              Created:
              Updated:
