OpenAI Introduces Voice Engine, a Voice Cloning Tool: It Can Replicate a Human Voice from Just a 15-Second Sample!

OpenAI has announced a new voice generation tool called Voice Engine. The tool can realistically replicate human voices and convert text into audio.

OpenAI is undoubtedly the first company that comes to mind when artificial intelligence is mentioned. The technology giant is pioneering this new era with models in many different fields, from chatbots to image generation. Most recently, we saw its “Sora” model, which creates strikingly realistic videos.

OpenAI has now announced a brand new model. This tool, called “Voice Engine”, replicates the human voice in a realistic way.

It can clone a real human voice from a 15-second sample

This is not the first time the company has focused on voice. It already had AI-powered voice tools; the feature that lets us talk to ChatGPT is the best example. Voice Engine is part of the company’s efforts to generate voice from text. According to OpenAI, the tool has been in development since late 2022 and has been tested privately with a small group of partners.

Let’s briefly explain what Voice Engine is. It is a voice cloning model. From just a 15-second sample, it can produce a copy of a real person’s voice that closely resembles the original. The user can then enter any text and have the AI-generated copy of the voice read it aloud. It is also possible to have the cloned voice speak in other languages.

In its blog post, OpenAI also provided information on where the model can be used. These include reading assistance, content translation, and helping people with speech impairments.

OpenAI is not yet rolling out Voice Engine due to security concerns

Voice Engine has not been made widely available. The company says the main reason is the risk associated with this kind of cloning technology. It adds that it had originally planned to launch a program that developers could join, but canceled it because of potential problems. The group currently testing the tool has signed agreements prohibiting the use of a person’s voice without consent.

Security is a genuinely serious problem. We know how advanced deepfake technologies are today, and we see fake images and audio everywhere. Inappropriate content impersonating celebrities and fraudulent images and voices are the best examples. That is why OpenAI’s Voice Engine model is very risky, and the company knows it. For the same reason, it has no plans for a widespread release yet, so we don’t know when the tool will arrive.

OpenAI has shared voice recordings created with Voice Engine on its website. You can hear examples in the video above. The recordings are shared in groups of two or three: the ones at the top are of real people, while the ones at the bottom, labeled “Generated audio”, were produced by artificial intelligence.