Digital Einstein uses AI to move and speak almost in real-time.
Digitalized Einstein (Aflorithmic)
Say hello to Albert Einstein. And guess what? He'll say hi back. Well, a digitalized version of Einstein will, at least.
Audio content production company Aflorithmic and digital humans company UneeQ teamed up to create a digital version of the famous genius, Albert Einstein.
This proof of concept uses UneeQ's character rendering to make Einstein look incredibly realistic, as well as an advanced computational knowledge engine to act as Einstein's new brain. AI at its best!
The team chose Einstein for its project as he's so well-known, was a real genius, was interested in technology, and is someone most people would be keen to strike up a conversation with.
And now they can, sort of.
Thanks to UneeQ and Aflorithmic's work, you can ask Einstein almost anything. From the basic concepts of physics to personal life questions, the German-born theoretical physicist's digitalized version will answer your questions.
One of the trickiest parts of the project, the team said, was creating Einstein's voice in as realistic a way as possible.
In this regard, the team allowed itself a little extra leeway. Given that Einstein had a relatively thick German accent, and that voice recordings of his speeches were not very clear, the team decided to lighten his accent to make him easier to understand.
The next step was also crucial: ensuring the digitalized version answered in believable time. After all, the experience should feel like a flowing conversation rather than a stilted exchange with a computer.
In the end, Einstein's synthesized voice has a lighter German accent but still keeps the genius' friendly, slow, and slightly high-pitched voice and manner of speaking. And to top it all off, he responds in near-real-time — in under three seconds.
So, now you can "speak" with Albert Einstein himself via the team's Digital Einstein Experience, and ask him your most pressing questions.
Aflorithmic, an audio content production company, and digital human creators UneeQ have come together to synthesize the voice of legendary scientist Albert Einstein.
The organizations want users to be able to put questions to a realistic digital Einstein, much as they would engage with the real-life physicist himself. The companies chose Einstein because of his reputation as a genuine genius, historical icon, and technology enthusiast, and because they felt many people would have questions for him.
UneeQ has combined visual character rendering techniques with an advanced computational knowledge engine to make this prototype as realistic as possible. However, when it came to resurrecting an authentic voice for Albert Einstein, the researchers faced a few setbacks. The primary one was that the only accounts they managed to uncover from historical records reported that Einstein had a heavy German accent and that he spoke slowly, wisely, and kindly in a high-pitched tone.
Due to Einstein's thick accent and the poor quality of the old recordings of his voice, the development teams struggled to establish a solid frame of reference for how he may have sounded. The organizations do not, however, expect many users to worry much about the accuracy of Einstein's voice in this new bot. The researchers therefore set out to create a unique voice for Einstein that might not be identical to the physicist's, but that users of the bot would come to recognize as his.
This new version of Einstein's voice still has the physicist speaking with a German accent, with an added sense of dry humor and friendliness to reflect his real-life personality. Researchers have also given the AI the ability to speak as if reflecting upon his knowledge when interacting with users.
Along with the voice cloning aspect, researchers also built Digital Einstein to respond quickly to user questions, much like a customer service chatbot or personal assistant. To achieve this, they created a real-time pipeline in which text answers from the computational knowledge engine are passed to Aflorithmic's API for speech synthesis. In merely two weeks, the research teams managed to cut the Einstein bot's response time from 12 seconds to less than three seconds. The companies predict that the Digital Einstein project is just the beginning of the interactive potential of conversational AI with humans.
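To make that pipeline concrete, here is a minimal sketch in Python of the kind of two-step loop described above. The function names query_knowledge_engine and synthesize_speech are hypothetical placeholders standing in for the computational knowledge engine and Aflorithmic's speech API, whose actual interfaces are not described in this article.

import time

# Hypothetical two-stage pipeline: answer text from a knowledge engine
# is handed to a text-to-speech API, and end-to-end latency is timed.

def query_knowledge_engine(question: str) -> str:
    # Placeholder for the computational knowledge engine that acts
    # as Einstein's "brain".
    return "Energy equals mass times the speed of light squared."

def synthesize_speech(text: str) -> bytes:
    # Placeholder for a network call to a speech synthesis API that
    # returns audio in Einstein's voice.
    return b"\x00" * 16000

def answer(question: str) -> None:
    start = time.monotonic()
    text = query_knowledge_engine(question)
    audio = synthesize_speech(text)
    elapsed = time.monotonic() - start
    print(f"Answered in {elapsed:.2f} s ({len(audio)} bytes of audio)")

answer("Can you explain E = mc squared?")

In the real system, the reported optimization work was about driving that measured latency from roughly 12 seconds down to under three.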
Woman With Motor Neurone Disease Has Voice Reconstructed From Game Show Recordings
In a beautiful display of how modern technology can impact people's lives, a woman has had her voice reconstructed using her appearance on a game show after she lost it to motor neurone disease (MND).

“It is wonderful being able to talk to people and sound normal and not like a machine,” said Ms Whitelaw after appearing on Good Morning Britain to reunite with Tipping Point host Ben Shephard, show off her new voice, and tell her story. “My frustration has vanished and I can now have satisfactory conversations with everyone.”

Motor neurone disease is an incurable, neurodegenerative condition that affects 2.6 women per 100,000 in the USA. It rose to prominence over the past few decades after legendary scientist Professor Stephen Hawking brought the condition to the attention of the public. In 2014, the ALS Ice Bucket Challenge went viral on social media, in which people would throw a bucket of ice water over their heads to raise money and awareness for MND. Although it seemed trivial, the challenge raised over $135 million worldwide and helped fund an array of therapies aimed at helping people with MND, including an antibody treatment with ongoing clinical trials.
Helen Whitelaw, a 76-year-old from Glasgow, developed MND and rapidly lost her voice as the degenerative neurological disorder took hold. Unable to speak without the aid of a machine, Ms Whitelaw hated the robotic voice she was left with and wished for an alternative.

“The diagnosis was devastating for my whole family,” she told STV News in an interview. “I wanted people to know what I was saying and I did not want to sound like a machine.”

The family approached a voice reconstruction company, Speak Unique, but by this point Ms Whitelaw's voice had deteriorated too far. Luckily, back in 2019 she had appeared on the ITV game show Tipping Point, where she won almost £3,000. The show had clips of her happily joking with the host, Ben Shephard, which could be perfect for a voice reconstruction.

This is not common practice, so needless to say the company was apprehensive. “We were apprehensive about how we could be able to use it,” said Alice Smith, CEO of Speak Unique, in a statement to STV. “We were sort of joking that she'd definitely be able to say, ‘drop zone four’, as that was such a catchphrase during the show. But we were so pleased that we did manage to get it to work with her appearance on Tipping Point.”

Ms Whitelaw's voice was recreated from the video recordings, bringing her text-to-speech system to life in a way unlike any other. Her new voice is almost indistinguishable from a human voice, and she says the company has “given her voice back”.
You can now speak using someone else’s voice with Deep Learning
In the demo interface, I've set the text I want the computer to read as: "Did you know that the Toronto Raptors are basketball champions? Basketball is a great sport."

You can click on the "Random" buttons under each section to randomize the voice input, then click "Load" to load the voice input into the system. Dataset selects the dataset from which you will select voice samples, Speaker selects the person who is talking, and Utterance selects the phrase spoken by the input voice. To hear how the input voice sounds, simply click "Play".

Once you press the "Synthesize and vocode" button, the algorithm will run. Once it's finished, you'll hear the input Speaker reading your text out loud. You can even record your own voice as an input by clicking on the "Record one" button, which is quite fun to play around with!
Text-to-Speech (TTS) synthesis refers to the artificial transformation of text into audio. A human performs this task simply by reading; the goal of a good TTS system is to have a computer do it automatically.

One very interesting choice when creating such a system is which voice to use for the generated audio. Should it be a man or a woman? A loud voice or a soft one? This used to be a hard constraint when doing TTS with Deep Learning: you'd have to collect a dataset of text-speech pairs, and the set of speakers who recorded that speech is fixed; you can't have unlimited speakers! So if you wanted to create audio of your voice, or someone else's, the only way to do it would have been to collect a whole new dataset.

AI research from Google, nicknamed Voice Cloning, makes it possible for a computer to read out loud using any voice.
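For contrast, here is what that older, fixed-voice setup looks like in practice: a minimal sketch using the open-source pyttsx3 library, which reads text aloud with one of the voices already installed on the machine. Which voices are available depends on your operating system; this is an illustration of plain single-voice TTS, not the cloning system discussed below.

import pyttsx3

# Classic single-voice TTS: the engine can only use voices installed on
# the system, which is exactly the restriction described above.
engine = pyttsx3.init()
engine.say("Did you know that the Toronto Raptors are basketball champions?")
engine.runAndWait()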
How Voice Cloning Works
It's clear that for a computer to read out loud with any voice, it needs to understand two things: what it's reading and how to read it. Thus, Google researchers designed the voice cloning system to have two inputs: the text we want read, and a sample of the voice that should read it.

For example, if we wanted Batman to read the phrase "I love pizza", we'd give the system two things: text that says "I love pizza" and a short sample of Batman's voice so it knows what Batman should sound like. The output should then be audio of Batman's voice saying the words "I love pizza"!

From a technical view, the system is broken down into three sequential components, sketched in code below:

(1) Given a small audio sample of the voice we wish to use, encode the voice waveform into a fixed-dimensional vector representation.

(2) Given a piece of text, also encode it into a vector representation. Combine the two vectors of speech and text, and decode them into a spectrogram.

(3) Use a vocoder to transform the spectrogram into an audio waveform that we can listen to.
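Here is a minimal structural sketch in Python of how the three components fit together. SpeakerEncoder, Synthesizer, and Vocoder are hypothetical stand-ins for the trained models in the paper, with dummy outputs so the pipeline runs end to end; this is not the researchers' actual code.

import numpy as np

class SpeakerEncoder:
    def embed(self, reference_audio: np.ndarray) -> np.ndarray:
        # (1) Encode the voice sample into a fixed-dimensional vector.
        return np.zeros(256)  # e.g. a 256-dimensional speaker embedding

class Synthesizer:
    def spectrogram(self, text: str, speaker_embedding: np.ndarray) -> np.ndarray:
        # (2) Encode the text, combine it with the speaker embedding,
        # and decode the pair into a mel spectrogram.
        return np.zeros((80, 200))  # 80 mel bins x 200 frames

class Vocoder:
    def waveform(self, mel: np.ndarray) -> np.ndarray:
        # (3) Transform the spectrogram into an audible waveform.
        return np.zeros(16000)  # one second of 16 kHz audio

def clone_voice(text: str, reference_audio: np.ndarray) -> np.ndarray:
    embedding = SpeakerEncoder().embed(reference_audio)
    mel = Synthesizer().spectrogram(text, embedding)
    return Vocoder().waveform(mel)

# Three seconds of "Batman's voice" as the reference sample (dummy audio).
audio = clone_voice("I love pizza", reference_audio=np.zeros(48000))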
In the paper, the three components are trained independently. Text-to-speech systems have received a lot of research attention in the Deep Learning community over the past few years, and there are many proposed Deep Learning solutions for Text-to-Speech that work quite well. The big key here is that the system is able to take the "knowledge" the speaker encoder learns from the voice and apply it to the text. After being separately encoded, the speech and the text are combined in a common embedding space, then decoded together to create the final output waveform.

Code to clone voices

Thanks to the beauty of the open-source mindset in the AI community, there is a publicly available implementation of this voice cloning system that you can try yourself!