| copyright | lastupdated | keywords | subcollection | content-type | account-plan | completion-time |
|---|---|---|---|---|---|---|
| | 2023-03-14 | speech to text,IBM cloud,getting started,tutorial,transcribe audio,speech recognition | speech-to-text | tutorial | lite | 10m |
{{site.data.keyword.attribute-definition-list}}
{: #gettingStarted} {: toc-content-type="tutorial"} {: toc-completion-time="10m"}
The {{site.data.keyword.speechtotextfull}} service transcribes audio to text to enable speech transcription capabilities for applications. This `curl`-based tutorial can help you get started quickly with the service. The examples show you how to call the service's `POST /v1/recognize` method to request a transcript.
{: shortdesc}
The tutorial uses the `curl` command-line utility to demonstrate REST API calls. For more information about `curl`, see Using curl with Watson examples.
{: note}
[IBM Cloud]{: tag-ibm-cloud} Watch the following video for a visual summary of getting started with the {{site.data.keyword.speechtotextshort}} service.
{: video output="iframe" data-script="none" id="watsonmediaplayer" width="560" height="315" scrolling="no" allowfullscreen webkitallowfullscreen mozAllowFullScreen frameborder="0" style="border: 0 none transparent;"}
{: #getting-started-before-you-begin}
{: #getting-started-before-you-begin-cloud}
[IBM Cloud]{: tag-ibm-cloud}
-   Create an instance of the service:
    {: hide-dashboard}

    -   Go to the {{site.data.keyword.speechtotextshort}}{: external} page in the {{site.data.keyword.cloud_notm}} catalog.
    -   Sign up for a free {{site.data.keyword.cloud_notm}} account or log in.
    -   Read and agree to the terms of the license agreement.
    -   Click Create.
-   Copy the credentials to authenticate to your service instance:

    -   View the Manage page for the service instance:

        -   If you are on the Getting started page for your service instance, click the Manage entry in the list of topics.
        -   If you are on the Resource list page, expand the AI / Machine Learning grouping in the Name column, and click the name of your service instance.
    -   On the Manage page, click Show Credentials in the Credentials box.
    -   Copy the `API Key` and `URL` values for the service instance.

This tutorial uses an API key to authenticate. In production, use an IAM token. For more information, see Authenticating to IBM Cloud.
{: tip}
{: #getting-started-before-you-begin-icpd}
[IBM Cloud Pak for Data]{: tag-cp4d}
The {{site.data.keyword.speechtotextshort}} for {{site.data.keyword.icp4dfull_notm}} must be installed and configured before beginning this tutorial. For more information, see Watson Speech services on Cloud Pak for Data{: external}.
- Create an instance of the service by using the web client, the API, or the command-line interface. For more information about creating a service instance, see Creating a Watson Speech services instance{: external}.
- Follow the instructions in Creating a Watson Speech services instance to obtain a Bearer token for the instance. This tutorial uses a Bearer token to authenticate to the service.
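
The two credential types described in this section differ only in the HTTP `Authorization` header that accompanies each request. The following Python sketch shows how each header could be built, assuming standard HTTP Basic authentication (which is what `curl -u` sends) with the literal user name `apikey`. The function name `auth_header` and the placeholder values are hypothetical and are not part of the service's API.

```python
import base64


def auth_header(apikey=None, token=None):
    """Return an Authorization header for one of the two credential types.

    apikey: IBM Cloud API key, sent as HTTP Basic auth with the user name "apikey".
    token:  Cloud Pak for Data access token, sent as a Bearer token.
    """
    if (apikey is None) == (token is None):
        raise ValueError("Provide exactly one of apikey or token")
    if apikey is not None:
        # Basic auth Base64-encodes "user:password"; the user name is the literal "apikey".
        encoded = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholder values only; substitute your own credentials.
    print(auth_header(apikey="{apikey}"))
    print(auth_header(token="{token}"))
```

Keeping the header construction in one helper means the rest of a script does not need to know which deployment it is talking to.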
{: #getting-started-transcribe} {: step}
Call the `POST /v1/recognize` method to request a basic transcript of a FLAC audio file with no additional request parameters.
-   Download the sample audio file audio-file.flac{: external}.
-   Issue the following command to call the service's `/v1/recognize` method for basic transcription with no parameters. The example uses the `Content-Type` header to indicate the type of the audio, `audio/flac`. The example uses the default language model, `en-US_BroadbandModel`, for transcription.

    [IBM Cloud]{: tag-ibm-cloud}

    -   Replace `{apikey}` and `{url}` with your API key and URL.
        {: hide-dashboard}
    -   Modify `{path_to_file}` to specify the location of the `audio-file.flac` file.

            curl -X POST -u "apikey:{apikey}" \
              --header "Content-Type: audio/flac" \
              --data-binary @{path_to_file}audio-file.flac \
              "{url}/v1/recognize"
        {: pre}

    [IBM Cloud Pak for Data]{: tag-cp4d}

    -   Replace `{token}` and `{url}` with the access token and URL for your service instance.
    -   Modify `{path_to_file}` to specify the location of the `audio-file.flac` file.

            curl -X POST \
              --header "Authorization: Bearer {token}" \
              --header "Content-Type: audio/flac" \
              --data-binary @{path_to_file}audio-file.flac \
              "{url}/v1/recognize"
        {: pre}

The service returns the following transcription results:

    {
      "result_index": 0,
      "results": [
        {
          "alternatives": [
            {
              "confidence": 0.96,
              "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
            }
          ],
          "final": true
        }
      ]
    }
{: codeblock}
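
To use the transcript in an application, the JSON response has to be parsed. The following Python sketch pulls the transcript and confidence of the top alternative out of a response shaped like the one in this step. The helper name `final_transcripts` is hypothetical, and the sample string is a copy of the response shown in this tutorial, not live service output.

```python
import json

# Sample response copied from this tutorial (not live service output).
SAMPLE_RESPONSE = """
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}
"""


def final_transcripts(response_text):
    """Return (transcript, confidence) for the first alternative of each final result."""
    response = json.loads(response_text)
    pairs = []
    for result in response.get("results", []):
        if not result.get("final"):
            continue  # skip any result that is not marked final
        top = result["alternatives"][0]
        # The sample transcript ends with a trailing space; strip it for display.
        pairs.append((top["transcript"].strip(), top.get("confidence")))
    return pairs


if __name__ == "__main__":
    for transcript, confidence in final_transcripts(SAMPLE_RESPONSE):
        print(f"{confidence}: {transcript}")
```

The sketch reads `confidence` with `get` because, in the responses shown in this tutorial, only the top alternative carries a confidence value.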
{: #getting-started-transcribe-options} {: step}
Call the `POST /v1/recognize` method to transcribe the same FLAC audio file, but specify two transcription parameters.
-   If necessary, download the sample audio file audio-file.flac{: external}.
-   Issue the following command to call the service's `/v1/recognize` method with two extra parameters. Set the `timestamps` parameter to `true` to indicate the beginning and end of each word in the audio stream. Set the `max_alternatives` parameter to `3` to receive the three most likely alternatives for the transcription. The example uses the `Content-Type` header to indicate the type of the audio, `audio/flac`, and the request uses the default model, `en-US_BroadbandModel`.

    [IBM Cloud]{: tag-ibm-cloud}

    -   Replace `{apikey}` and `{url}` with your API key and URL.
        {: hide-dashboard}
    -   Modify `{path_to_file}` to specify the location of the `audio-file.flac` file.

            curl -X POST -u "apikey:{apikey}" \
              --header "Content-Type: audio/flac" \
              --data-binary @{path_to_file}audio-file.flac \
              "{url}/v1/recognize?timestamps=true&max_alternatives=3"
        {: pre}

    [IBM Cloud Pak for Data]{: tag-cp4d}

    -   Replace `{token}` and `{url}` with the access token and URL for your service instance.
    -   Modify `{path_to_file}` to specify the location of the `audio-file.flac` file.

            curl -X POST \
              --header "Authorization: Bearer {token}" \
              --header "Content-Type: audio/flac" \
              --data-binary @{path_to_file}audio-file.flac \
              "{url}/v1/recognize?timestamps=true&max_alternatives=3"
        {: pre}

The service returns the following results, which include timestamps and three alternative transcriptions:

    {
      "result_index": 0,
      "results": [
        {
          "alternatives": [
            {
              "timestamps": [
                ["several", 1.0, 1.51],
                ["tornadoes", 1.51, 2.15],
                ["touch", 2.15, 2.5],
                . . .
              ],
              "confidence": 0.96,
              "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
            },
            {
              "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
            },
            {
              "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "
            }
          ],
          "final": true
        }
      ]
    }
{: codeblock}
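
Both the query string and the richer response can be handled programmatically. The following Python sketch builds the request URL with the two parameters, then collects word timings and alternative transcripts from a parsed response. The helper names (`recognize_url`, `summarize`) and the base URL in the usage example are hypothetical. The sample data is based on the response in this step, with the elided timestamps left out, and it assumes that each timestamp entry is a `[word, start, end]` triple and that the best alternative carries its timestamps, confidence, and transcript in a single object.

```python
import json
from urllib.parse import urlencode


def recognize_url(base_url, **params):
    """Append query parameters to the /v1/recognize endpoint of a service URL."""
    # Lowercase booleans ("true"/"false") match the form used in the curl example.
    encoded = urlencode({
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    })
    url = f"{base_url.rstrip('/')}/v1/recognize"
    return f"{url}?{encoded}" if encoded else url


def summarize(response):
    """Return (word_timings, transcripts) gathered from the first result's alternatives."""
    word_timings, transcripts = [], []
    for alternative in response["results"][0]["alternatives"]:
        # Only some alternatives carry timestamps, so default to an empty list.
        for word, start, end in alternative.get("timestamps", []):
            word_timings.append((word, start, end))
        if "transcript" in alternative:
            transcripts.append(alternative["transcript"].strip())
    return word_timings, transcripts


# Sample based on the response in this tutorial (not live service output).
SAMPLE = json.loads("""
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [["several", 1.0, 1.51], ["tornadoes", 1.51, 2.15], ["touch", 2.15, 2.5]],
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "},
        {"transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "}
      ],
      "final": true
    }
  ]
}
""")

if __name__ == "__main__":
    # "https://example.test" is a stand-in for your service URL.
    print(recognize_url("https://example.test", timestamps=True, max_alternatives=3))
    timings, transcripts = summarize(SAMPLE)
    for word, start, end in timings:
        print(f"{word}: {start:.2f}s to {end:.2f}s")
    for transcript in transcripts:
        print(transcript)
```

Because `summarize` checks each alternative for `timestamps` and `transcript` independently, it does not depend on which alternative object the timestamps arrive in.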
{: #getting-started-next-steps}
- To try an example application that transcribes text from streaming audio input or from a file that you upload, see the {{site.data.keyword.speechtotextshort}} demo{: external}.
- For more information about the service's interfaces and features, see Service features.
- For more information about all methods of the service's interfaces, see the API & SDK reference{: external}.