PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

Anonymous submission to INTERSPEECH 2023

1. Abstract

Style transfer for text-to-speech has shown improved performance in recent years. However, style control is often restricted to discrete style categories and to systems built on expressive speech recordings. In practical situations, users may want to transfer a style without any reference speech in the target style, simply by typing a text description of the desired style. Text-guided content generation techniques have drawn wide attention recently. In this work, we explore the possibility of controllable style transfer with natural language descriptions. Specifically, we propose PromptStyle, a text prompt-guided cross-speaker style transfer system. PromptStyle consists of an improved VITS and a cross-modal style encoder. The cross-modal style encoder constructs a shared space of stylistic and semantic representations through a two-stage training process. Experiments show that PromptStyle can achieve style transfer with text prompts while maintaining relatively high stability and speaker similarity.
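The idea of a shared space built in two stages can be illustrated with a toy example. The sketch below is not the PromptStyle implementation, and this page does not specify the training details. It assumes that stage 1 has already produced fixed style vectors from reference speech, and that stage 2 fits a mapping from prompt vectors into that same space. A plain linear map trained by gradient descent stands in for the prompt branch of the style encoder, and all names (`fit_prompt_projection`, `prompts`, `styles`) are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(W, prompt_vecs, style_vecs):
    """Mean squared distance between projected prompts and style targets."""
    total = 0.0
    for p, s in zip(prompt_vecs, style_vecs):
        total += sum((y - t) ** 2 for y, t in zip(matvec(W, p), s))
    return total / len(prompt_vecs)


def fit_prompt_projection(prompt_vecs, style_vecs, lr=0.5, epochs=300):
    """Assumed stage 2: learn W so that W @ prompt is close to the style vector.

    The style vectors are treated as frozen targets (assumed stage-1 output),
    so only the prompt-side mapping is updated.
    """
    d_out, d_in = len(style_vecs[0]), len(prompt_vecs[0])
    n = len(prompt_vecs)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for p, s in zip(prompt_vecs, style_vecs):
            err = [y - t for y, t in zip(matvec(W, p), s)]
            for i in range(d_out):
                for j in range(d_in):
                    grad[i][j] += 2.0 * err[i] * p[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W


# Toy data: 3-dim "prompt" vectors and 2-dim "style" vectors (both made up).
rng = random.Random(0)
true_map = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
prompts = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
styles = [matvec(true_map, p) for p in prompts]  # stand-in for stage-1 output

W = fit_prompt_projection(prompts, styles)
print("alignment error:", mse(W, prompts, styles))
```

Once such a mapping is fitted, a style vector can come either from reference speech or from a projected text prompt, which is the property that makes prompt-only control possible in a shared space. A real system would replace the linear map with a learned text encoder and the toy vectors with learned embeddings.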

2. Audio Samples

The following sections present audio samples from PromptStyle and two baseline systems. The PromptStyle samples are organized into three parts:

Performance on the Style Transfer: We aim to show that PromptStyle can control the transferred style using different reference speech.

Ablation: We aim to verify the influence of the style embedding and speaker embedding added to different modules as conditions.

Style Control by Text Prompt: We aim to show that PromptStyle can control the transferred style using different natural language descriptions.

2.1 Performance on the Style Transfer

reference speech | target speaker | CST-TTS | GST-MLTTS | PromptStyle

2.2 Ablation

reference speech | target speaker | M1 | M2 | M3 | Proposed

2.3 Style Control by Text Prompt

style prompt | reference speech | control with reference speech | control with style prompt

感到难以置信夹杂着赞许

Feeling incredulous, mixed with some admiration

真的很失望,很恼火,太气愤了

Utterly dismayed, frustrated, and incensed by the situation.

以凄凉和无助的语气恳求道

Pleading in a desolate and helpless tone

为他人取得好成绩感到开心

Feeling happy for others when they achieve good results.

温柔的语气包含着世间美好

The gentle tone contains the beauty of the world

以一种轻松诙谐的语气问道

Asked in a light-hearted and witty tone

风趣诙谐又清晰的说明信息

Explaining information in a witty, humorous, and clear way