Google’s AI podcast tool is dazzling. But is it useful?
The male and female AI-generated hosts not only have sonorous, FM-radio voices but punctuate their conversations with 'ums,' pauses and catchy phrases like 'get this.'
Bloomberg Opinion

Google logo.

Credit: Reuters File Photo

By Parmy Olson

Google is having its very own ChatGPT moment.

Technologists, scientists and OpenAI founder Sam Altman have been praising a feature added in September by NotebookLM, a free online research tool that Alphabet Inc.’s core business released last year. Users upload documents to the site and can then ask questions about their content or have it synthesised into summaries, briefing notes and more. Now it can also turn that content into an eerily human-sounding podcast. The male and female AI-generated hosts not only have sonorous, FM-radio voices but punctuate their conversations with “ums,” pauses and catchy phrases like “get this.” The banter sounds so seamless that you’d be forgiven for thinking the conversation was between people.

I’ve used the tool to generate a 15-minute podcast about a 208-page presentation, which would have taken an hour or more to read, while others have used it to generate deep dives into research papers or their own diaries. NotebookLM has inspired a burst of viral experimentation similar to the kind that first met ChatGPT.

The system runs on Google’s flagship AI model Gemini 1.5, which also powers the “AI overviews” that are now replacing the top results of many Google searches, but it has its own secret sauce to make the voices sound so human. “There’s some new audio technology in there that is, I don’t think, fully public,” Steven Johnson, Google’s editorial director of NotebookLM, tells me. “It’s the most realistic conversation that a computer has ever generated.” He adds that there has been a “huge spike” in NotebookLM’s usage since it added the podcast generator.

Commentators have called the feature mind-blowing, while Andrej Karpathy, a co-founder of OpenAI and former head of AI at Tesla Inc., said it was “now my favorite podcast.” Presumably, this is how Karpathy consumes much of his content now. That indeed may be where the real disruption potential for this technology lies – not in replacing podcasters, but in adding a new way to assimilate information. Wireless earbud shipments will grow 11 per cent this year and 16 per cent in 2025, according to market-research firm Canalys, suggesting more people might gravitate toward that method too.

My own take: The voices are extraordinary and display a level of realism above any other AI-generated audio I’ve heard before. But the user interface for NotebookLM is infuriating to navigate, and after listening to several of its AI podcasts I also found it difficult to pay full attention to some of the conversations.

Perhaps there’s an intangible connection that humans have through voice that naturally keeps us attentive. During my early years in radio, a veteran told me that the secret to great news reading wasn’t any vocal inflection, but to simply pay attention to what you were reading. For some reason, listeners found themselves more engaged. (Try it yourself when reading something aloud.) It’s hard to see how a computer could replicate that same phenomenon.

The bigger question for Google is whether it will turn its magical feature into something useful for business. The company has a history of failing to execute on its own innovations. Its researchers, for instance, famously invented a key algorithm called the Transformer (the T in ChatGPT), but OpenAI capitalised on the tech. Perhaps we should expect as much from a conglomerate cobbled together by acquisitions like DeepMind, Android, YouTube and DoubleClick, and which has been hamstrung by the innovator’s dilemma: Make AI searches too good and Google risks cannibalising its lucrative search business.

The “wow factor” in AI can also lead to hype and overspending, which means investors should be cautious about novel hits. Wall Street is already becoming wary of the gap between the awe-inspiring experiences people first had with ChatGPT and generative AI’s business utility.

Google will eventually add other voices to its podcast generator, and Johnson tells me the company also plans to sell a premium version, including one aimed at businesses. In that sense, the audio overviews may simply act as a neat marketing trick for NotebookLM, whose utility is far more obvious: a straightforward tool for using Google’s AI model on your own documents and data. That process, known in the industry as RAG (or retrieval-augmented generation), retrieves relevant passages from your documents and feeds them to the model rather than retraining it, and it is typically more costly and complex when carried out as part of an official subscription to Google’s Gemini or other AI models.

If lifelike AI voices get more people using NotebookLM and Gemini, Google will have turned its magic into revenue. But businesses are still grappling with the true return on investment of generative AI, and one of the field’s biggest skeptics, Daron Acemoglu, just won a Nobel prize for economics, lending credibility to looming questions about AI’s real utility. For Google, that spells an uphill battle.

(Published 17 October 2024, 11:44 IST)