The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech

Phat Do¹, Matt Coler¹, Jelske Dijkstra², Esther Klabbers³
¹ Language, Technology and Culture Department, Campus Fryslân, University of Groningen (the Netherlands)
² Fryske Akademy/Mercator Research Centre (the Netherlands)
³ ReadSpeaker (the Netherlands)


This is the companion webpage for our paper "The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech", which has been accepted for presentation at INTERSPEECH 2023.

Below we present some of the Frisian audio samples synthesized in our experiments. We used 700 samples in total (100 randomly selected utterances × 7 systems) for greater statistical robustness; for practical reasons, we present only 140 of them here (20 randomly selected utterances × 7 systems). The systems are listed in the order in which they appear in the paper's charts:

  • Resynthesized (resynth) - Vocoded from the ground-truth spectrogram (should be closest to the real recording)
  • Phoneme input - ground-truth dictionary (ph-gt) - Sample from the TTS system with phone labels as input, using a real Frisian pronunciation dictionary
  • Phoneme input - multilingual G2P (ph-g2p) - Same as ph-gt, but with transcriptions from multilingual grapheme-to-phoneme (G2P) conversion instead of the real dictionary
  • Phoneme input - phone recognition (ph-rec) - Same as ph-gt, but with a makeshift dictionary built using universal phone recognition
  • Feature input - ground-truth dictionary (ft-gt) - Sample from the TTS system with articulatory features as input, using a real Frisian pronunciation dictionary
  • Feature input - multilingual G2P (ft-g2p) - Same as ft-gt, but with transcriptions from multilingual G2P conversion instead of the real dictionary
  • Feature input - phone recognition (ft-rec) - Same as ft-gt, but with a makeshift dictionary built using universal phone recognition
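
The sample counts quoted above (100 utterances × 7 systems = 700, of which 20 × 7 = 140 are shown here) can be illustrated with a minimal sketch. It is not the selection script used for the paper: the utterance IDs 1 to 100, the seed, and the helper name `pick_samples` are hypothetical, and the sketch assumes that the same utterances are shown for every system, as the sample table suggests.

```python
import random

# The seven system tags, in the order of the list above.
SYSTEMS = ["resynth", "ph-gt", "ph-g2p", "ph-rec", "ft-gt", "ft-g2p", "ft-rec"]


def pick_samples(utterance_ids, n_utterances, systems=SYSTEMS, seed=0):
    """Randomly pick n_utterances IDs and pair each one with every system.

    The seed is a hypothetical value, fixed only to make the sketch repeatable.
    """
    rng = random.Random(seed)
    chosen = sorted(rng.sample(list(utterance_ids), n_utterances))
    return [(utt, system) for utt in chosen for system in systems]


# Hypothetical utterance IDs 1..100 stand in for the evaluated utterances.
all_samples = pick_samples(range(1, 101), 100)
webpage_samples = pick_samples(range(1, 101), 20)

print(len(all_samples), len(webpage_samples))  # 700 140
```

Pairing every selected utterance with all seven systems keeps the samples directly comparable: each row of the table then contains the same sentence rendered by each system.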

  • # Text | Resynthesized: resynth | Phone labels input (ph-): ph-gt, ph-g2p, ph-rec | Articulatory features input (ft-): ft-gt, ft-g2p, ft-rec
    1 Pas de oare deis wie it safier.
    2 Se hie it al tocht, se kin Lize gewoan priuwe.
    3 De stêd doarde ik net mear yn. Stel je foar dat ik Aldert nochris tsjin it liif rinne soe. Wêr wenne er eins? Sels dat wist ik net.
    4 Ik woe wolris sjen wêr't dat kring wenne.
    5 Mar der wie healwei de middei net in soad folk yn de saak, gjin kop taalde nei syn boeken.
    6 De kompetysje dan? bringt er úteinlik yn. Nee man. It slaan. Dát mis ik. Sláán.
    7 Dokter, sikehûs, hechtsje, it hiele pakket.
    8 Wylst menear Kuperus oan syn didaktise kwaliteiten wurke troch út en troch in ekstra sigaret op te stekken, fermakken syn learlingen har yn Itaalje.
    9 In skoft wurdt der neat sein. Ien fan de hûnen blaft de stilte fuort, Bonita wrinzget.
    10 Bûten is it stil wurden. De beammen hawwe harren spegeling ferlern, de grêften bochtsje as tsjustere paden tusken de bebouwing troch.
    11 Fuortdaalks fielde ik my skuldich, want ík hie harren dizze reis kado dien.
    12 Sis dat wol. Dat rinnen fan dy, hoe sit dat, man?
    13 Wendy reagearret net.
    14 It is in eangst dy'tst earder field hast. Wér tinkst werom oan de rivier, oan de fiskersboat fan dyn heit.
    15 Wêr is se ferstoarn? wol Yme de spanning brekke.
    16 Wy sille it sjen.
    17 Daalks kom ik wer, sei ik ekstra lûd, om fuort myn oandacht nei Inge te ferpleatsen, dy't noch altyd ferwoeden besocht om Hindrik syn riem los te krijen.
    18 in situaasje dêr't er him by dellei.
    19 Mar Kor en ik sieten yn dy tunnel, wy tochten werklik dat er ús bedondere.
    20 Myn oare konklúzjes dielde ik net, want ik wie der net wis fan wêr't heit syn winsk presys wei kaam.