Each row presents a sample from the source speaker, its conversion to the target speaker's voice, and the same text as recorded by the target speaker.