Google DeepMind releases blockbuster WaveNet technology: the face of a robot, the voice of an angel (Sohu Technology)

Recently DeepMind, Google's artificial intelligence company, released its latest breakthrough in computer speech synthesis: WaveNet. WaveNet is a technique that uses a neural network to model the raw audio waveform directly. Its audio quality is better than that of all current text-to-speech (TTS) systems, and it narrows the gap between computer-generated audio and natural human speech by 50%, which has led to it being called the best in the world. DeepMind said in a Twitter post: "Letting humans converse with machines has long been a dream of the field of human-computer interaction!"

The text-to-speech audio we usually hear from a computer or mobile phone sounds awkward, uncomfortable, or even strange. DeepMind's new speech synthesis system, WaveNet, should greatly improve this situation by making machine-generated audio more natural and closer to a human voice.

Of course, getting a computer to make sounds is nothing new. The most common TTS method is probably concatenative synthesis: a large number of speech fragments are recorded in advance from a single speaker to build a large corpus, and fragments are then simply selected from it and joined into complete words, sentences, and longer stretches of audio. This "mechanical" method often produces audio with unstable tone, glitches, odd artifacts, or even stuttering, and it cannot adjust the stress of syllables or the emotion of the voice. The other approach is the so-called parametric approach, which uses a mathematical model to arrange and assemble known sounds into words or sentences. This technique is less prone to audible glitches, so its output sounds less mechanical.
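The concatenative approach described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny `CORPUS` dictionary, the unit names, and the sample values are hypothetical, and a real system would select among many candidate recordings per unit rather than look up a single one.

```python
# Hypothetical corpus: each speech unit maps to a short list of
# pre-recorded waveform samples (illustrative values, not real audio).
CORPUS = {
    "hel": [0.0, 0.2, 0.4],
    "lo": [0.3, 0.1, 0.0],
    "world": [0.5, 0.6, 0.2, 0.0],
}


def concatenative_synthesis(units, corpus=CORPUS):
    """Join pre-recorded fragments end to end, in the order requested."""
    waveform = []
    for unit in units:
        if unit not in corpus:
            raise KeyError(f"no recording for unit {unit!r}")
        # Splice the stored fragment onto the output without any smoothing.
        waveform.extend(corpus[unit])
    return waveform


audio = concatenative_synthesis(["hel", "lo", "world"])
```

In this toy corpus the seam between "lo" and "world" jumps from 0.0 to 0.5. Abrupt discontinuities of that kind at the joins are one plausible source of the glitches the article mentions.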
However, the two techniques have one thing in common: they simply and mechanically splice speech segments together instead of creating the entire audio waveform from scratch.

WaveNet, by contrast, is a technique that creates the entire output audio waveform from scratch. WaveNet uses clips of real human speech, together with the corresponding linguistic and phonetic features, to train a convolutional neural network, allowing it to learn the audio patterns of both aspects (language and speech). In use, when new text, that is, the corresponding new phonetic features, is fed into the WaveNet system, the system generates the entire raw audio waveform that renders that text.

WaveNet operates step by step: it first generates one audio waveform sample, then processes it and generates the next sample, and so on. Most importantly, the generation of each new sample is influenced by the results of the previous samples.
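The sample-by-sample generation described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `toy_next_sample_distribution` is a hypothetical stand-in for the trained network, and the eight quantized amplitude levels and the four-sample context window are arbitrary choices made for the example.

```python
import math
import random


def toy_next_sample_distribution(history, num_levels=8, context=4):
    """Hypothetical stand-in for the trained network.

    Returns a probability for each quantized amplitude level, given the
    samples generated so far. The rule used here (favor levels close to
    the mean of recent samples) is a toy assumption, not WaveNet.
    """
    recent = history[-context:]
    center = sum(recent) / len(recent) if recent else (num_levels - 1) / 2
    weights = [math.exp(-abs(level - center)) for level in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0, num_levels=8):
    """Generate a waveform one sample at a time."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # The prediction for this step depends on all earlier output.
        probs = toy_next_sample_distribution(samples, num_levels)
        # Draw the next sample, then feed it back as context for later steps.
        next_sample = rng.choices(range(num_levels), weights=probs, k=1)[0]
        samples.append(next_sample)
    return samples


waveform = generate(16)
```

The point of the sketch is the feedback loop: each new sample is appended to `samples`, which is then passed back in as the input for the next step.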