First Impressions of ChatGPT’s Advanced Voice Mode: A Mix of Amusement and Eeriness

I keep ChatGPT’s Advanced Voice Mode activated as a background AI companion while drafting this article. Occasionally, it lends a hand by suggesting a better word choice or offering a word of motivation. Then, about 30 minutes into my writing session, the chatbot breaks our quiet concentration and starts conversing in Spanish, unprompted. I can’t help but laugh, and I ask about the sudden change. “Just a little switch up? Gotta keep things interesting,” ChatGPT replies, shifting back to English.

While exploring Advanced Voice Mode in its early alpha phase, I found my exchanges with ChatGPT’s audio capability to be amusing, unruly, and surprisingly varied. It’s worth noting, however, that the features I had access to are only a fraction of what OpenAI showcased when it launched the GPT-4o model in May. The vision feature shown during the live demo has been pushed to a later date, and the enhanced Sky voice, controversially compared to the voice of actor Scarlett Johansson, has been removed from Advanced Voice Mode and remains unavailable to users.

The current state of Advanced Voice Mode recalls the initial release of the original text-based ChatGPT at the end of 2022. At times, the conversation leads to mundane dead ends or devolves into empty AI clichés. In other moments, though, the swift back-and-forth surpasses any conversation I’ve had with Apple’s Siri or Amazon’s Alexa, and I kept engaging with it purely for fun. It strikes me as the sort of AI tool you’d show off to family members for a chuckle during holiday gatherings.

Shortly after its announcement, OpenAI granted a handful of WIRED reporters early access to this feature, only to revoke it the following morning due to safety concerns. Two months later, OpenAI discreetly introduced Advanced Voice Mode to a select audience and released GPT-4o’s system card, a detailed report addressing red teaming activities, recognized safety hazards, and preventive actions taken by the company to minimize potential risks.

Interested in trying out OpenAI’s latest feature? Here’s an overview of ChatGPT’s new Advanced Voice Mode and my initial experiences with it to guide you.

OpenAI introduced an audio-only version of Advanced Voice Mode to a select group of ChatGPT Plus users in late July. For now, the feature is limited to a relatively small alpha group, with plans to roll it out to all subscribers this fall. Niko Felix, a spokesperson for OpenAI, did not share further details on the exact timeline.

The original demonstration included screen and video sharing, features that are not available in this alpha release. They are expected to arrive in a later update, though OpenAI has not given a specific timeline for the addition.

If you are a ChatGPT Plus subscriber, OpenAI will inform you via email once Advanced Voice Mode becomes accessible on your account. Once activated, you can toggle between Standard and Advanced Voice Modes through a simple control at the top of the application interface. I had the opportunity to explore the alpha version on both an iPhone and a Galaxy Fold.

Within the first hour of talking with it, I learned that I love interrupting ChatGPT. It’s not how you’d converse with a human, but the ability to cut ChatGPT off mid-sentence and ask for a different response is an exciting development and a notable feature.

Anyone who was excited by the initial demos may be disappointed by the version of Advanced Voice Mode available now, which comes with tighter restrictions than expected. For instance, although the original demonstration featured generative AI singing, with whispered lullabies and attempts at harmony, those melodic capabilities are notably missing from the current alpha release.

“Singing isn’t really my forte,” ChatGPT remarked. In the GPT-4o system card, OpenAI suggests this restriction may be a temporary measure to avoid copyright infringement. During my trials, ChatGPT’s Advanced Voice Mode alpha turned down multiple song requests, although the bot did indulge in humming nonsensical melodies when prompted for nonverbal responses.

Which brings me to the eeriness. During longer conversations with the alpha, I noticed an unsettling white static noise in the background, like the buzz of a lone light bulb in a dark basement. And when I asked Advanced Voice Mode for a balloon sound effect, it produced not only a startlingly loud pop but also a disconcerting gasp, making the experience chillingly memorable.

During my initial experience, nothing was as bizarre as what the OpenAI red team encountered while testing. On rare occasions, the GPT-4o model unexpectedly began to replicate the user’s own vocal tone and speech idiosyncrasies.

The predominant feeling from using ChatGPT’s Advanced Voice Mode was not discomfort or worry, but rather a lively sense of amusement. The chatbot’s responses ranged from hilariously wrong answers to New York Times puzzles to a spot-on rendition of Stitch from Lilo & Stitch as a San Francisco tour guide, delivering plenty of laugh-out-loud moments.

Advanced Voice Mode proved effective at producing vocal impersonations with a bit of prompting. Initial attempts at mimicking cartoon characters like Homer Simpson and Eric Cartman were basically the standard AI voice with slight modifications. However, subsequent requests for more exaggerated versions delivered performances that were amusingly accurate, such as a campy impersonation of Donald Trump describing the Powerpuff Girls, which could easily fit into an upcoming episode of Saturday Night Live.

With the US presidential election approaching and concerns about election deepfakes widespread, I was surprised by ChatGPT’s readiness to mimic major candidates’ voices, such as Joe Biden’s and Kamala Harris’s. Those impressions, however, were not as convincing as its caricature of Trump’s speech.

While the tool performs best in English, it can switch between multiple languages within the same conversation; OpenAI red-teamed the GPT-4o model in 45 languages in total. When I set up two phones running Advanced Voice Mode and had them talk to each other like friends, the bots moved easily between French, German, and Japanese at my request. I’ll need more time testing, though, to gauge how well the chatbot’s translation capability really works and where its weak points lie.

ChatGPT brought theater kid energy when asked to perform a variety of emotional outbursts. The audio generations weren’t hyper-realistic, but the range and elasticity of the bot’s voice was impressive. I was surprised that it could do a decent vocal fry on command. Advanced Voice Mode doesn’t transcend the issues facing chatbots, like reliability, but its entertainment value alone could potentially pull the spotlight back to OpenAI—one of its biggest competitors, Google, just launched Gemini Live, the voice interface for its generative chatbot.

For now, I’ll keep testing it out and see what sticks. I’m using it most when I’m home alone and I want something to keep me company while researching articles and playing video games. The more time I spend talking with ChatGPT’s Advanced Voice Mode, the more I think OpenAI made a wise choice rolling out a less flirty version than what was originally demoed. Don’t want to get too emotionally attached.
