Unique IDs ensure (best effort) that writes to BigQuery are idempotent: for example, we don't write the same record twice after a VM failure.
Currently the Python BQ sink generates BQ insert IDs here, but they will be re-generated after a VM failure, resulting in data duplication.
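To illustrate why the IDs must stay stable across retries, here is a minimal sketch with a hypothetical mock (not the real BQ API): BigQuery's streaming-insert path best-effort dedupes rows that arrive with the same insertId, so a retry that reuses the ID is harmless, while a retry that regenerates it writes a duplicate.

```python
import uuid

class MockBigQueryTable:
    """Toy stand-in that dedupes on insert_id, mimicking BigQuery's
    best-effort streaming-insert deduplication."""
    def __init__(self):
        self.rows = {}

    def insert(self, insert_id, row):
        # A second insert with the same ID is dropped.
        self.rows.setdefault(insert_id, row)

table = MockBigQueryTable()
record = {"user": "alice"}

# Stable ID: generated once, reused on retry -> exactly one row survives.
stable_id = str(uuid.uuid4())
table.insert(stable_id, record)
table.insert(stable_id, record)          # retry after a (simulated) VM failure
print(len(table.rows))                   # -> 1

# Regenerated ID (the current Python sink's failure mode) -> duplicate row.
table.insert(str(uuid.uuid4()), record)  # "retry" with a freshly generated ID
print(len(table.rows))                   # -> 2
```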
The correct fix is to apply a Reshuffle to checkpoint the unique IDs once they are generated, similar to how the Java BQ sink operates.
Pablo, can you do an initial assessment here? I think this is a relatively small fix, but I might be wrong.