Microsoft Text-to-Speech API

Posted by cooolr on 2021-06-10

Text-to-speech documentation: [https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech)

SSML syntax documentation: [https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=python](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=python)

Silent audio playback on Windows: [https://qastack.cn/superuser/101974/play-a-sound-maybe-wav-from-windows-line-command](https://qastack.cn/superuser/101974/play-a-sound-maybe-wav-from-windows-line-command)

import requests

subscription_key = 'xxxxxxxxxxxxxxxxxxxxxxxxx'
# Token and synthesis endpoints for the eastus region
fetch_token_url = 'https://eastus.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
url = "https://eastus.tts.speech.microsoft.com/cognitiveservices/v1"

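def get_token(subscription_key):
    """Exchange the subscription key for a short-lived access token."""
    headers = {'Ocp-Apim-Subscription-Key': subscription_key}
    response = requests.post(fetch_token_url, headers=headers)
    response.raise_for_status()  # fail early on a bad key or wrong region
    return response.text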

token = get_token(subscription_key)

headers = {
    'Authorization': f'Bearer {token}',
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-24khz-160kbitrate-mono-mp3'
}

text = "你好,我是微软小娜"

payload = f'''
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name='zh-CN-XiaoxiaoNeural'>
        <mstts:express-as style="chat">
            {text}
        </mstts:express-as>
    </voice>
</speak>
'''.encode('utf-8')

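response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # avoid writing an error body into the MP3 file

# Save the synthesized audio
with open("temp.mp3", "wb") as f:
    f.write(response.content)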
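
The payload above interpolates `text` directly into the SSML, so input containing `&` or `<` would produce malformed XML. A minimal sketch of one way around this, assuming a hypothetical helper `build_ssml` (not part of the original script) that escapes the text with the standard library before building the payload:

```python
from xml.sax.saxutils import escape


def build_ssml(text, voice="zh-CN-XiaoxiaoNeural", style="chat", lang="zh-CN"):
    """Hypothetical helper: build the SSML payload with the text XML-escaped."""
    safe_text = escape(text)  # replaces &, < and > with XML entities
    ssml = (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">'
        f"<voice name='{voice}'>"
        f'<mstts:express-as style="{style}">{safe_text}</mstts:express-as>'
        f"</voice></speak>"
    )
    return ssml.encode("utf-8")
```

The returned bytes could then be passed as `data=build_ssml(text)` in place of `payload`. The default voice and style are the ones used in the script above; the parameter names themselves are illustrative.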