A segment starts from a non-zero sequence number (UNREGISTERED_PRODUCER).
A gap exists between or within segments (MISSING).
Data within a segment is corrupted (CORRUPT).
Producers have produced duplicate messages, which is expected due to retries (DUPLICATE).
Different behaviors before and after End Of Push (EOP)#
End Of Push (EOP) is a control message in the Kafka topic sent once per partition at bulk load end, after all data producers come online.
When pushing a new store version, if DIV detects fatal data validation issues (including UNREGISTERED_PRODUCER, MISSING, and CORRUPT) while consuming batch push data before EOP, the ingestion task errors and aborts.
If issues appear after receiving EOP, the ingestion task continues, but DIV logs a warning about data integrity issues.
Performs validation using drainer DIV (all topics)
Persists data to storage engine
Checkpoints offsets to disk.
Consumer is always ahead of drainer
The consumer thread reads from Kafka faster than the drainer can persist to disk. There's buffering in between.
Kafka producer buffers (for leaders)
StoreBufferService queue etc.
Time T1: Consumer validates message at offset 1000 → Consumer DIV state = offset 1000
Time T2: Consumer validates message at offset 2000 → Consumer DIV state = offset 2000
Time T3: Drainer persists message at offset 1000 → Drainer DIV state = offset 1000
If we only had one consumer DIV, we couldn't checkpoint the correct state to disk.
DIV State Must Match Persisted Data
The DIV checkpoint saved to disk must exactly match what's actually persisted in the storage engine. Since the drainer is responsible for persistence, only the Drainer DIV state can be safely checkpointed.
Leader Double Validation
Consumer DIV (in consumer thread): Validates RT messages before producing to VT
Drainer DIV (in drainer thread): Validates same messages again before persisting (unnecessary)
OFFLINE -> STANDBY: DIV state restoration. The DIV state is restored from the persisted OffsetRecord (drainerDiv). In STANDBY mode, the DIV continues to validate incoming messages against the restored state and keeps it updated.
STANDBY -> OFFLINE: The DIV state is cleared immediately without being persisted to disk. Any DIV state accumulated since the last checkpoint is lost The system relies on periodic checkpoints during consumption, not on-demand checkpoints during state transitions. This design choice prioritizes fast un-subscription over preserving the absolute latest DIV state.
STANDBY -> LEADER: Wipes all DIV state for the partition in the consumer DIV. Copies the producer states from the drainer's DIV validator to the consumer DIV.
LEADER -> STANDBY: The drainer DIV maintains producer states that have been validated and persisted. If this replica becomes leader again, it will clone the drainer DIV state.