When a cluster is passive (receiving edits only via replication) in a cyclic replication setup of 2 clusters, the size of its oldWALs directory keeps growing. On analysis, we observed the following behaviour:
- A new entry is added to the WAL (an edit replicated from the other cluster).
- ReplicationSourceWALReaderThread (RSWALRT) reads the entry and applies the configured filters (because of the cyclic replication setup, ClusterMarkingEntryFilter discards the new entry, since it originated on the other cluster).
- Because the filtered entry is null, RSWALRT neither updates the batch stats (WALEntryBatch.lastWalPosition) nor puts a batch in the entryBatchQueue.
- The ReplicationSource thread stays blocked in entryBatchQueue.take().
- So ReplicationSource#updateLogPosition is never invoked, and the WAL file is never removed from the replication queue.
- Hence the LogCleaner on the master does not delete the oldWAL files from Hadoop.
NOTE: When a new edit is added via hbase-client, the ReplicationSource thread processes it and clears the oldWAL files from the replication queues, and hence the master cleans up the WALs.
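The sequence above can be reproduced in a small, self-contained simulation. This is only a sketch of the behaviour as we understand it, not HBase code: the class `PassiveClusterWalSketch`, its `Entry` and `Batch` types, the `filter` and `simulate` methods, and the `enqueueEmptyBatches` flag are all hypothetical names made up for illustration. The flag models one possible direction (letting the reader hand over the position even when every entry was filtered); it is an assumption, not a statement of how HBase does or should fix this.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class PassiveClusterWalSketch {

    /** Hypothetical stand-in for a WAL entry: end offset in the WAL file plus originating cluster id. */
    static final class Entry {
        final long endPosition;
        final String originCluster;

        Entry(long endPosition, String originCluster) {
            this.endPosition = endPosition;
            this.originCluster = originCluster;
        }
    }

    /** Hypothetical stand-in for WALEntryBatch: entry count and last WAL position read. */
    static final class Batch {
        final int entryCount;
        final long lastWalPosition;

        Batch(int entryCount, long lastWalPosition) {
            this.entryCount = entryCount;
            this.lastWalPosition = lastWalPosition;
        }
    }

    /** Mimics the effect of ClusterMarkingEntryFilter: entries that originated on the peer become null. */
    static Entry filter(Entry entry, String peerCluster) {
        return entry.originCluster.equals(peerCluster) ? null : entry;
    }

    /**
     * Returns the WAL position the shipper side would persist, or -1 if it never saw a batch.
     * enqueueEmptyBatches=false models the observed behaviour; true models the assumed alternative.
     */
    static long simulate(List<Entry> wal, String peerCluster, boolean enqueueEmptyBatches) {
        Queue<Batch> entryBatchQueue = new ArrayDeque<>();

        // Reader side (RSWALRT in the report): one batch per entry, to keep the sketch small.
        for (Entry raw : wal) {
            Entry kept = filter(raw, peerCluster);
            if (kept != null) {
                entryBatchQueue.add(new Batch(1, raw.endPosition));
            } else if (enqueueEmptyBatches) {
                // Assumed alternative: still hand over the position, with zero entries to ship.
                entryBatchQueue.add(new Batch(0, raw.endPosition));
            }
            // Otherwise nothing is recorded for this entry, as described in the report.
        }

        // Shipper side (ReplicationSource in the report). The real thread blocks in take();
        // poll() is used here so the simulation terminates on an empty queue.
        long persistedPosition = -1L;
        Batch batch;
        while ((batch = entryBatchQueue.poll()) != null) {
            persistedPosition = batch.lastWalPosition; // stands in for updateLogPosition
        }
        return persistedPosition;
    }

    public static void main(String[] args) {
        List<Entry> replicatedOnly = Arrays.asList(new Entry(100L, "peer"), new Entry(200L, "peer"));
        System.out.println(simulate(replicatedOnly, "peer", false)); // prints -1
        System.out.println(simulate(replicatedOnly, "peer", true));  // prints 200
    }
}
```

With only replicated entries in the WAL and the flag off, the position is never advanced, which corresponds to the WAL staying in the replication queue. Appending a single locally originated entry to the list makes the position advance even with the flag off, matching the NOTE above.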
Please suggest a solution.