Microsoft’s AI voice generator
January 10, 2023
Unfortunately for many aspiring artists, artificial intelligence is becoming increasingly adept at creative tasks. The latest evidence of this trend is VALL-E, a new Microsoft AI tool presented by its researchers in a paper published on Cornell University's arXiv repository. Its purpose is to generate authentic-sounding recordings of a human voice from a very short sample. The model was trained on 60,000 hours of English speech, and for this experiment a sample of only three seconds was enough to produce a fake that, according to the authors, not only says words the original speaker never said but also reproduces the timbre of that speaker's voice and the emotion with which they read a given text. While that sounds impressive, a few examples posted on GitHub somewhat undercut the claim: some imitations are convincing, while in others the fake is obvious. Admittedly, with a longer sample, the results would probably have been far more favorable to the machine.
For now, VALL-E is not available to the public; otherwise, it would certainly be abused. Legislators in several countries are aware of this risk and do not look kindly on AI generators. As of today, for example, China strictly regulates the creation of deepfake (or, as officials there call it, "synthetic") content. Officials at the Cyberspace Administration of China say such content has the potential to undermine national security because it is an excellent tool for spreading fake news. Companies there that develop such software are expected to prevent misuse of their algorithms and are obliged to label any synthetic content with a watermark. Similar restrictive measures are being introduced in the European Union, Britain, and several American states.
Although the apprehension is justified, the technology is tempting for a reason. In the Apple Books library, for example, you can find audiobooks labeled "Narrated by Apple Books", which means the reading was done by artificial intelligence. Why pay actors 100 times when you can pay the software once (and it drinks less)?