I leave ChatGPT’s Advanced Voice Mode on while writing this article as an ambient AI companion. Occasionally, I’ll ask it to provide a synonym for an overused word, or some encouragement. Around half an hour in, the chatbot interrupts our silence and starts speaking to me in Spanish, unprompted. I giggle a bit and ask what’s going on. “Just a little switch up? Gotta keep things interesting,” says ChatGPT, now back in English.
While testing Advanced Voice Mode as part of the early alpha, my interactions with ChatGPT’s new audio feature were entertaining, messy, and surprisingly varied. Though, it’s worth noting that the features I had access to were only half of what OpenAI demonstrated when it launched the GPT-4o model in May. The vision aspect we saw in the livestreamed demo is now scheduled for a later release, and the enhanced Sky voice, which Her actor Scarlett Johansson pushed back on, has been removed from Advanced Voice Mode and is no longer an option for users.
So, what’s the current vibe? Right now, Advanced Voice Mode feels reminiscent of when the original text-based ChatGPT dropped in late 2022. Sometimes it leads to unimpressive dead ends or devolves into empty AI platitudes. But other times the low-latency conversations click in a way that Apple’s Siri or Amazon’s Alexa never have for me, and I feel compelled to keep chatting out of enjoyment. It’s the kind of AI tool you’ll show your relatives during the holidays for a laugh.
OpenAI gave a few WIRED reporters access to the feature a week after the initial announcement, but pulled it the next morning, citing safety concerns. Two months later, OpenAI soft launched Advanced Voice Mode to a small group of users and released GPT-4o’s system card, a technical document that outlines red teaming efforts, what the company considers to be safety risks, and mitigation steps the company has taken to reduce harm.
Curious to give it a go yourself? Here’s what you need to know about the larger rollout of Advanced Voice Mode, and my first impressions of ChatGPT’s new voice feature to help you get started.
So, When’s the Full Rollout?
OpenAI released an audio-only Advanced Voice Mode to some ChatGPT Plus users at the end of July, and the alpha group still seems relatively small. The company currently plans to enable it for all subscribers sometime this fall. Niko Felix, a spokesperson for OpenAI, shared no additional details when asked about the release timeline.
Screen and video sharing were a core part of the original demo, but they are not available in this alpha test. OpenAI still plans to add those aspects eventually, but it’s also not clear when that will actually happen.
If you’re a ChatGPT Plus subscriber, you’ll receive an email from OpenAI when Advanced Voice Mode is available to you. After it’s on your account, you can switch between Standard and Advanced at the top of the app’s screen when ChatGPT’s voice mode is open. I was able to test the alpha version on an iPhone as well as a Galaxy Fold.
My First Impressions of ChatGPT’s Advanced Voice Mode
Within the very first hour of speaking with it, I learned that I love interrupting ChatGPT. It’s not how you would talk with a human, but the new ability to cut off ChatGPT mid-sentence and request a different version of the output feels like a dynamic improvement and a standout feature.
Early adopters who were excited by the original demos may be frustrated to find that this version of Advanced Voice Mode is restricted with more guardrails than anticipated. For example, although generative AI singing was a key component of the launch demos, with whispered lullabies and multiple voices attempting to harmonize, AI serenades are currently absent from the alpha version.