OpenAI on Monday (May 13) unveiled its new generative Artificial Intelligence (gen AI) chatbot model GPT-4o; the 'o', short for 'omni', refers to its ability to understand queries in any combination of text, images and audio, and to respond in those same modes.
The previous iteration, the GPT-4 Turbo model, can understand queries in text and images and respond in text; if required, it can also read the response aloud.
The new GPT-4o can perform tasks and respond like a personal assistant. Going by the demos of live interaction and real-time translation, OpenAI's latest multimodal chatbot looks far superior to the current crop of virtual assistants such as Google Assistant, Amazon's Alexa and Apple's Siri.
GPT-4o: Notable aspects of OpenAI's latest omnimodal chatbot
-- Voice interaction with GPT-4o is similar to how humans converse with each other. GPT-4o can respond to a query in as little as 232 milliseconds, with an average of 320 milliseconds, roughly the same time humans take to respond after listening to another person.
-- GPT-4o is capable of adjusting its emotional tone when speaking with humans. During the demo, it seamlessly switched between natural human voices with different emotions, instantly moving from a normal voice to a dramatic storytelling mode on request. It was even able to sing witty limericks and shift to a cold robotic tone without any hassle. The live demo was reminiscent of the classic sci-fi movie 'Her', in which the protagonist Theodore Twombly (played by Joaquin Phoenix) falls in love with the virtual assistant Samantha (voiced by Scarlett Johansson).
"The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful," said Sam Altman, CEO of OpenAI.
-- GPT-4o matches GPT-4 Turbo's performance on English text prompts and coding. On non-English languages, it is said to be significantly better and faster than the Turbo model.
-- With its vision capability, it can read a person's emotional state, such as happiness or sadness, from facial expressions. In addition, by looking at equations written on paper, it can offer the user a step-by-step guide to solving complex trigonometry problems.
-- Similarly, it can understand what application the user is coding just by looking at the software code on the desktop screen, and it can offer tips to improve the programme's efficiency in real time (see the code sketch after this list).
-- Currently, GPT-4o supports 50 languages, including Italian, Spanish, French, Kannada, Tamil, Telugu, Hindi, Gujarati and Marathi.
-- For now, OpenAI is offering access to GPT-4o with text and image capabilities only. It is available on ChatGPT's free tier (with a limit of 10-16 messages per three hours) and on the premium ChatGPT Plus plan (with 5x higher message limits).
The company said that GPT-4o's audio and video capabilities still need further testing; initially, they will be made available through the API to a small group of trusted partners.
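For developers, an API call combining the text and image capabilities described above might look like the following minimal sketch, which uses OpenAI's Python SDK; the screenshot file name and the prompt are illustrative assumptions, not details from OpenAI's announcement.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical screenshot of code on the desktop, encoded as a base64 data URL
with open("editor_screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send a text question and the image together in one multimodal message
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What does this programme do, and how can I make it more efficient?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same endpoint handles plain-text queries; audio and video input, as noted above, are yet to be rolled out to the API.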