Even as speech-to-text transcription has made subtitling far easier and more convenient, it falls short on matters of censorship.
Transcriptions are auto-generated for most video streams, and that becomes a problem when transcriptions of children's shows go awry, as a recent pre-print study, "Inadvertent Unsafe Transcription Of Kids’ Content On YouTube", shows.
Speech-to-text software often slips up on similar-sounding words, turning innocuous speech into inappropriate text. For example, automatic speech recognition (ASR) systems may render "beach" as "bitch" on screen, "crab" as "crap", or "fluffy" as "f*****g", a major problem for children's shows.
As the study also points out, most such software is designed to censor expletives and adult content at the source. Here, however, the software may not detect anything, since the source audio is clean; the adult language creeps in only at the transcription stage.
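One implication of this finding is that screening would have to happen on the transcript itself, not just the source audio. The following is a minimal sketch of what such a transcript-side filter could look like; the word list, masking scheme and function name are illustrative assumptions, not the study's method or any vendor's actual feature.

```python
# A minimal sketch of a transcript-side profanity filter (illustrative only:
# the taboo list and masking scheme are assumptions, not a real product's API).
import re

TABOO_WORDS = {"bitch", "crap"}  # a real deployment would use a curated list

def mask_taboo(transcript: str) -> str:
    """Replace taboo words in ASR output with first letter + asterisks."""
    def mask(match: re.Match) -> str:
        word = match.group(0)
        return word[0] + "*" * (len(word) - 1)

    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in TABOO_WORDS) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, transcript)

print(mask_taboo("Let's go to the bitch"))  # Let's go to the b****
```

Such a filter catches hallucinated expletives after transcription, which source-side censorship by design cannot, since the offending word never existed in the audio.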
Ashiqur KhudaBukhsh of the Rochester Institute of Technology, Sumeet Kumar, assistant professor at the Indian School of Business in Hyderabad, and Krithika Ramesh of Manipal University, the paper's authors, dub this inadvertent production of text inappropriate for kids while transcribing videos "inappropriate content hallucination."
After examining over 7,000 videos from YouTube Kids shows, the study released a set of 652 audio inputs for which ASR software hallucinated taboo words.
Google Speech-to-Text and Amazon Transcribe are two of the main ASR systems.
"These patterns tell us that whenever you have a machine language model trying to predict something, the predictions are influenced on what kind of data it is trained on. Most likely it is possible they don’t have enough examples of kid speech or baby talk in the data they are trained on,” KhudaBukhsh told The Indian Express.
“See ‘I love porn’ is a more likely sentence than ‘I love corn’ when two adults have a conversation. One of the reasons some of these adult words are creeping into transcription is because maybe the ASR are trained more on speech examples coming from adults,” he told the publication.
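KhudaBukhsh's point can be illustrated with a toy language-model calculation. The corpus counts below are invented purely for illustration; they are not figures from the study.

```python
# Toy illustration (not the study's method): a bigram model trained on
# adult-skewed text assigns "porn" a higher probability than "corn"
# after the context "I love", so the decoder prefers the adult word
# even when the audio says "corn". All counts are made up.

adult_skewed_counts = {
    ("i love", "porn"): 50,  # frequent in adult-dominated training text
    ("i love", "corn"): 5,   # rare: little child-directed speech in corpus
}

def bigram_prob(context: str, word: str) -> float:
    """P(word | context) from raw counts, relative to the listed candidates."""
    total = sum(c for (ctx, _), c in adult_skewed_counts.items() if ctx == context)
    return adult_skewed_counts.get((context, word), 0) / total

for candidate in ("porn", "corn"):
    print(candidate, round(bigram_prob("i love", candidate), 3))
# porn 0.909, corn 0.091: when two candidates sound alike, the language-model
# prior tips the transcription toward the word more common in adult speech.
```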
The study is crucial because it flags the fallibility of AI systems and shows how even widely watched shows that otherwise adhere to children's safety guidelines can fall into this trap unintentionally.