PolyVoice: Language Models for Speech to Speech Translation

[Paper]


Qianqian Dong∗, Zhiying Huang∗, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang,
Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

ByteDance

Abstract. We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST). The framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, so the framework can also be applied to unwritten languages. For speech synthesis, we adopt the existing VALL-E X approach and build a unit-based audio language model, which allows the framework to preserve the voice characteristics and speaking style of the original speech. We evaluate the system on the Chinese-to-English and English-to-Spanish pairs. Experimental results show that it generates speech with high translation quality and audio quality.

This page is for research demonstration purposes only.

Model Overview

Figure. Overview of PolyVoice. The framework consists of two LM-based components: a speech-to-unit translation (S2UT) front-end for translation and a unit-to-speech (U2S) back-end for synthesis.
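The two-stage structure in the figure can be sketched as a simple composition of components. This is a minimal illustration only, not the authors' implementation: the class name `TwoStagePipeline`, its three callables, and the toy stand-ins below are all hypothetical, and the choice to pass the source speech as the speaker prompt is an assumption based on the abstract's statement that the voice of the original speech is preserved.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]  # audio samples (toy representation)
Units = List[int]       # discrete speech unit IDs


@dataclass
class TwoStagePipeline:
    """Hypothetical wiring of an S2UT front-end and a U2S back-end."""

    # Turns source speech into discrete units (assumed component).
    extract_units: Callable[[Waveform], Units]
    # S2UT front-end: source-language units -> target-language units.
    translate_units: Callable[[Units], Units]
    # U2S back-end: (target units, speaker prompt audio) -> target speech.
    synthesize: Callable[[Units, Waveform], Waveform]

    def __call__(self, source_speech: Waveform) -> Waveform:
        source_units = self.extract_units(source_speech)
        target_units = self.translate_units(source_units)
        # Assumption: the source speech itself serves as the speaker prompt.
        return self.synthesize(target_units, source_speech)


# Toy stand-ins so the sketch runs end to end; real components would be
# trained models.
toy_pipeline = TwoStagePipeline(
    extract_units=lambda wav: [1 if x >= 0 else 0 for x in wav],
    translate_units=lambda units: list(reversed(units)),
    synthesize=lambda units, prompt: [float(u) for u in units],
)

translated = toy_pipeline([0.5, -0.2, -0.1])
```

Keeping the two stages behind separate callables mirrors the description above: the translation model and the synthesis model only communicate through the sequence of discrete units.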

Zero-Shot Cross-Lingual Unit to Speech


English U2S with Chinese prompts


English Speech (extract unit) | Chinese Speaker Prompt | U-SLM
And in this last action he falls into the classic sin of pride. | 这些努力获得的反馈相当踊跃。 | And in this last action he falls into the classic sin of pride.
Edison held that the electricity sold must be measured just like gas or water and he proceeded to develop a meter. | 白色大鼠为野生褐家鼠的变种。 | Edison held that the electricity sol must be measured just like gas or water and he proceeded to develop a meter.
But they have nothing to do with the interpretation of plato and in spirit they are opposed to him. | 甘肃省的作家有什么? | But they have nothing to do with the interpretation of plato and in spirit they are opposed to him.
But it is surmised that you will find difficulties in the way of your entering at once upon your government. | 摩洛哥农业在向欧洲出口时享受特殊待遇。 | But it is surmised that you will find difficulties in the way of your entering at once upon your governor.
Grace be to you and peace from god the father and from our lord jesus christ. | 我们支持国内经济市场。 | Grace be to you in peace from god the father and from our lord jesus christ.
He began a confused complaint against the wizard who had vanished behind the curtain on the left. | 这个议会代表着欧洲民众。 | He began a confused complaint against the wizard who had vanished behind the curtain on the left.
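Several U-SLM entries above differ from the source by a word or two (for example "sold" versus "sol"). Assuming the text shown for each sample is a transcript of the corresponding audio, which this page does not state, one way to quantify such differences is a word error rate. The sketch below uses a plain word-level edit distance; the function names are illustrative and this is not the evaluation procedure used by the authors.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words processed so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n_ref


source = ("Edison held that the electricity sold must be measured just like "
          "gas or water and he proceeded to develop a meter.")
output = ("Edison held that the electricity sol must be measured just like "
          "gas or water and he proceeded to develop a meter.")

errors = word_edit_distance(source, output)  # one substituted word
rate = word_error_rate(source, output)       # 1 error over 21 reference words
```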


Spanish U2S with English prompts


Spanish Speech (extract unit) | English Speaker Prompt | U-SLM
La niña chole estaba maldita como mirra y como salomé. | Besides these fees, legitimate and illegitimate, there were others which must be paid before release. | La niña chole estaba maldita como mirra y como salomé.
No puedes dejar que las pesadillas te hagan mella. | Visits to Other Texas Cities | No puedes dejar que las pesadillas te hagan media.
Adela me trae de cabeza y me he enamorado. | It was the strongest circumstantial evidence which had ever been brought forward in a murder case. | Adela me trae de cabeza y me he enamorado.
Hablando de esta suerte nos hizo cruzar un largo corredor. | It truly is asserted the magician. | Hablando de esta suerte nos hizo cruzar un largo corredor.
Hay tres tipos de hombres: los que se lanzan sin complejos. | They count six and this singer is as good as nothing. | Hay tres tipos de opas los que se lanzan sin complejos.
El viaje se iba a realizar en septiembre. | This was at the march election eighteen fifty five. | El viaje se iba a realizar en septiembre.


Zero-Shot Speech-to-Speech Translation


Written language (Chinese to English Translation on EMIME dataset)


Chinese Speech | English Ground Truth | PolyVoice
立法已经到位,以便达成这些目标。 | Legislation is already in place to achieve these aims. | Legislation is already in place to achieve these goals.
我们支持国内经济市场。 | We support the internal economic market. | We support the domestic economic market
这个议会代表着欧洲民众。 | This parliament represents the people of europe. | This parliament represents the european people.
可再生能源也不例外。 | Renewable energy is no exception. | Renewable energy is no exception.
各项报导反映出这种主流态度。 | The various reports reflect this mainstream attitude. | Various reports reflect this mainstream attitude.
摩洛哥农业在向欧洲出口时享受特殊待遇。 | Moroccan agriculture enjoys special treatment when exporting to europe. | Moraccan agriculture enjoys special treatment when exporting to europe.


Unwritten language (English to Spanish Translation on CVSS dataset)


English Speech | Spanish Ground Truth | PolyVoice
The japanese developed the city especially the port. | Los japoneses desarrollaron la ciudad especialmente el puerto. | Los japoneses desarrollaron la ciudad especialmente el puerto.
This album sounds modern and fresh. | Este álbum suena moderno y fresco. | Este álbum suena moderno y fresco.
The band has had several formations. | A banda ha tenido varias formaciones. | La banda ha tenido varias formaciones.
Israeli explorers come from all sectors of the country's society | Los exploradores israelíes provienen de todos los sectores de la sociedad del país. | Los exploradores israelíes provienen de todos los sectores de la sociedad del país.
Political science professor at the university of paris. | Profesor de ciencias políticas en la universidad de parís. | Profesor de ciencias políticas en la universidad de parís.
The last brother's name is unknown. | El nombre del último hermano es desconocido. | El nombre del último hermano es desconocido.