Azure Speech To Text Phoneme Detection
Hello, I am working on a very niche speech detection app. Azure has been very helpful, but I still have some large hurdles to cross.
I would like to detect a user sounding out individual phonemes. Right now, Azure's STT can split words into phonemes for you, but it refuses to transcribe phonemes by themselves. For instance, Azure will happily transcribe audio of you saying "la" as the phonemes /l/ and /a/, but if you make only the "L" sound with no vowel, Azure will not return any phoneme data and will keep waiting for more audio. Is there any way to force Azure STT responses to be as granular as possible? I would like to detect isolated phonemes even when they do not combine to form a word. FYI, I am interfacing with Azure through Unity.
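For reference, this is roughly how I am getting the word-to-phoneme breakdown today: a minimal sketch using the Speech SDK's pronunciation assessment feature with phoneme granularity. The key, region, and locale are placeholders, and this is how I understand the API rather than a guaranteed-correct setup.

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public static class PhonemeRecognition
{
    public static async Task RecognizeOnceAsync()
    {
        // Placeholders -- substitute your own key and region.
        var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var recognizer = new SpeechRecognizer(speechConfig);

        // Pronunciation assessment is what exposes per-phoneme results.
        // An empty reference text assesses whatever is spoken.
        var pronConfig = new PronunciationAssessmentConfig(
            referenceText: "",
            gradingSystem: GradingSystem.HundredMark,
            granularity: Granularity.Phoneme,
            enableMiscue: false);
        pronConfig.ApplyTo(recognizer);

        var result = await recognizer.RecognizeOnceAsync();

        // The per-phoneme breakdown lives in the detailed JSON payload.
        var json = result.Properties.GetProperty(
            PropertyId.SpeechServiceResponse_JsonResult);
        UnityEngine.Debug.Log(json);
    }
}
```

This works fine for full words like "la", but for an isolated consonant sound the recognizer never finalizes a result, which is the behavior I am trying to get around.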
Thanks