IBM Cloud Speech to Text: Speech Recognition
Published: 2019-06-22


 

 https://speech-to-text-demo.ng.bluemix.net/

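Click the purple "Start for free in IBM Cloud" button on the home page, then register for IBM Cloud and log in.

Next, add the Speech to Text service.

Click "Service credentials" on the left and create new credentials.

Copy and save your credentials.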

{

"apikey": "xxxx",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:speech-to-text:au-syd:xxx::",
"iam_apikey_name": "auto-generated-apikey-xxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::xxxx",
"url": "https://gateway-syd.watsonplatform.net/speech-to-text/api"
}
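
As a minimal sketch of how the saved credentials might be used, the snippet below builds (but does not send) an HTTP request to the service's /v1/recognize endpoint using only the Python standard library. It assumes the service accepts HTTP Basic authentication with the literal user name apikey and the API key as the password. The helper name build_recognize_request and the sample values are hypothetical, not part of any IBM SDK.

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholder credentials in the same shape as the JSON above (values are not real).
CREDENTIALS_JSON = """
{
  "apikey": "xxxx",
  "url": "https://gateway-syd.watsonplatform.net/speech-to-text/api"
}
"""


def build_recognize_request(credentials, audio, content_type, **params):
    """Build a POST request for the /v1/recognize endpoint without sending it."""
    query = urllib.parse.urlencode(params)
    endpoint = credentials["url"].rstrip("/") + "/v1/recognize"
    if query:
        endpoint += "?" + query
    # Assumption: Basic auth with user name "apikey" and the API key as password.
    token = base64.b64encode(("apikey:" + credentials["apikey"]).encode("utf-8"))
    headers = {
        "Content-Type": content_type,
        "Authorization": "Basic " + token.decode("ascii"),
    }
    return urllib.request.Request(endpoint, data=audio, headers=headers, method="POST")


credentials = json.loads(CREDENTIALS_JSON)
# Empty bytes stand in for real audio data in this sketch.
request = build_recognize_request(
    credentials, b"", "audio/flac", model="en-US_BroadbandModel"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; in that case the empty placeholder should be replaced with the bytes of a real audio file whose format matches the Content-Type header.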

REF:

https://console.bluemix.net/apidocs/speech-to-text?language=python

PARAMETERS

  • An AudioSource object that provides the audio that is to be transcribed.

  • The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

    Allowable values: [application/octet-stream, audio/basic, audio/flac, audio/g729, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis]

  • RecognizeCallback object that defines methods to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.

  • The identifier of the model that is to be used for the recognition request.

    Allowable values: [ar-AR_BroadbandModel, de-DE_BroadbandModel, en-GB_BroadbandModel, en-GB_NarrowbandModel, en-US_BroadbandModel, en-US_NarrowbandModel, en-US_ShortForm_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, fr-FR_BroadbandModel, fr-FR_NarrowbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, ko-KR_BroadbandModel, ko-KR_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel]

    Default: en-US_BroadbandModel

  • The customization ID (GUID) of a custom language model that is to be used for the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model.

    Note: Use this parameter instead of the deprecated customization_id parameter.

  • The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model.

  • If you specify a customization ID, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

    Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

    The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

  • The version of the specified base model that is to be used for the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model.

  • The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.

    Default: 30

  • If true, the service returns interim results as a stream of JSON SpeechRecognitionResults objects. If false, the service returns a single SpeechRecognitionResults object with final results only.

    Default: false

  • An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.

  • A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.

  • The maximum number of alternative transcripts that the service is to return. By default, a single transcription is returned.

    Default: 1

  • A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No alternative words are computed if you omit the parameter.

  • If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, no word confidence measures are returned.

    Default: false

  • If true, the service returns time alignment for each word. By default, no timestamps are returned.

    Default: false

  • If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

  • If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, no smart formatting is performed. Applies to US English, Japanese, and Spanish transcription only.

    Default: false

  • If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, no speaker labels are returned. Specifying true forces the timestamps parameter to be true, regardless of whether you specify false for that parameter.

    To determine whether a language model supports speaker labels, use the Get a model method and check that the attribute speaker_labels is set to true.

    Default: false

  • If you are passing requests through a proxy, specify the host name of the proxy server. Use the http_proxy_port parameter to specify the port number at which the proxy listens. Omit both parameters if you are not using a proxy.

    Default: None

  • If you are passing requests through a proxy, specify the port number at which the proxy service listens. Use the http_proxy_host parameter to specify the host name of the proxy. Omit both parameters if you are not using a proxy.

    Default: None

  • Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the request. Do not specify both parameters with a request.

  • The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

  • If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, no redaction is performed.

    When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

    Default: false
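
Several of the parameters above interact with one another, and a short sketch may make those rules easier to follow. The helper below, normalize_options, is hypothetical and not part of any IBM SDK; it only mirrors on the client side the rules described in the list. It uses the parameter names mentioned in the descriptions (keywords, keywords_threshold, max_alternatives, timestamps, speaker_labels), while smart_formatting and redaction are assumed names for the smart formatting and redaction options.

```python
def normalize_options(options):
    """Return a copy of options adjusted to the rules described in the list above."""
    opts = dict(options)
    keywords = opts.get("keywords") or []
    threshold = opts.get("keywords_threshold")

    # Keywords and their threshold must be given together.
    if keywords and threshold is None:
        raise ValueError("keywords requires keywords_threshold")
    if threshold is not None and not keywords:
        raise ValueError("keywords_threshold requires at least one keyword")
    if len(keywords) > 1000:
        raise ValueError("at most 1000 keywords can be spotted")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")

    # Speaker labels force timestamps on, even if the caller passed False.
    if opts.get("speaker_labels"):
        opts["timestamps"] = True

    # Redaction forces smart formatting, drops keyword spotting,
    # and limits the result to a single final transcript.
    if opts.get("redaction"):
        opts["smart_formatting"] = True
        opts.pop("keywords", None)
        opts.pop("keywords_threshold", None)
        opts["max_alternatives"] = 1
    return opts


print(normalize_options({"speaker_labels": True, "timestamps": False}))
```

The service applies these rules itself according to the descriptions above, so a check like this is only useful for catching invalid combinations before a request is sent.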

 

Reposted from: https://www.cnblogs.com/watermarks/p/10336687.html
