Gunakan StartTranscriptionJob dengan AWS SDK atau CLI - Amazon Transcribe

Terjemahan disediakan oleh mesin penerjemah. Jika konten terjemahan yang diberikan bertentangan dengan versi bahasa Inggris aslinya, utamakan versi bahasa Inggris.

Gunakan StartTranscriptionJob dengan AWS SDK atau CLI

Contoh kode berikut menunjukkan cara menggunakanStartTranscriptionJob.

Contoh tindakan adalah kutipan kode dari program yang lebih besar dan harus dijalankan dalam konteks. Anda dapat melihat tindakan ini dalam konteks dalam contoh kode berikut:

.NET
AWS SDK for .NET
catatan

Ada lebih banyak tentang GitHub. Temukan contoh lengkapnya dan pelajari cara mengatur dan menjalankannya di AWS Repositori Contoh Kode.

/// <summary> /// Start a transcription job for a media file. This method returns /// as soon as the job is started. /// </summary> /// <param name="jobName">A unique name for the transcription job.</param> /// <param name="mediaFileUri">The URI of the media file, typically an Amazon S3 location.</param> /// <param name="mediaFormat">The format of the media file.</param> /// <param name="languageCode">The language code of the media file, such as en-US.</param> /// <param name="vocabularyName">Optional name of a custom vocabulary.</param> /// <returns>A TranscriptionJob instance with information on the new job.</returns> public async Task<TranscriptionJob> StartTranscriptionJob(string jobName, string mediaFileUri, MediaFormat mediaFormat, LanguageCode languageCode, string? vocabularyName) { var response = await _amazonTranscribeService.StartTranscriptionJobAsync( new StartTranscriptionJobRequest() { TranscriptionJobName = jobName, Media = new Media() { MediaFileUri = mediaFileUri }, MediaFormat = mediaFormat, LanguageCode = languageCode, Settings = vocabularyName != null ? new Settings() { VocabularyName = vocabularyName } : null }); return response.TranscriptionJob; }
CLI
AWS CLI

Contoh 1: Untuk mentranskripsikan file audio

start-transcription-jobContoh berikut mentranskripsikan file audio Anda.

aws transcribe start-transcription-job \ --cli-input-json file://myfile.json

Isi dari myfile.json:

{ "TranscriptionJobName": "cli-simple-transcription-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" } }

Untuk informasi selengkapnya, lihat Memulai (Antarmuka Baris AWS Perintah) di Panduan Pengembang Amazon Transcribe.

Contoh 2: Untuk mentranskripsikan file audio multi-saluran

start-transcription-jobContoh berikut mentranskripsikan file audio multi-saluran Anda.

aws transcribe start-transcription-job \ --cli-input-json file://mysecondfile.json

Isi dari mysecondfile.json:

{ "TranscriptionJobName": "cli-channelid-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "Settings":{ "ChannelIdentification":true } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-channelid-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "StartTime": "2020-09-17T16:07:56.817000+00:00", "CreationTime": "2020-09-17T16:07:56.784000+00:00", "Settings": { "ChannelIdentification": true } } }

Untuk informasi selengkapnya, lihat Mentranskripsikan Audio Multi-Saluran di Panduan Pengembang Amazon Transcribe.

Contoh 3: Untuk mentranskripsikan file audio dan mengidentifikasi speaker yang berbeda

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan mengidentifikasi speaker dalam output transkripsi.

aws transcribe start-transcription-job \ --cli-input-json file://mythirdfile.json

Isi dari mythirdfile.json:

{ "TranscriptionJobName": "cli-speakerid-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "Settings":{ "ShowSpeakerLabels": true, "MaxSpeakerLabels": 2 } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-speakerid-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "StartTime": "2020-09-17T16:22:59.696000+00:00", "CreationTime": "2020-09-17T16:22:59.676000+00:00", "Settings": { "ShowSpeakerLabels": true, "MaxSpeakerLabels": 2 } } }

Untuk informasi selengkapnya, lihat Mengidentifikasi Pembicara di Panduan Pengembang Amazon Transcribe.

Contoh 4: Untuk mentranskripsikan file audio dan menutupi kata-kata yang tidak diinginkan dalam output transkripsi

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan menggunakan filter kosakata yang sebelumnya Anda buat untuk menutupi kata-kata yang tidak diinginkan.

aws transcribe start-transcription-job \ --cli-input-json file://myfourthfile.json

Isi dari myfourthfile.json:

{ "TranscriptionJobName": "cli-filter-mask-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "Settings":{ "VocabularyFilterName": "your-vocabulary-filter", "VocabularyFilterMethod": "mask" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-filter-mask-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension" }, "StartTime": "2020-09-18T16:36:18.568000+00:00", "CreationTime": "2020-09-18T16:36:18.547000+00:00", "Settings": { "VocabularyFilterName": "your-vocabulary-filter", "VocabularyFilterMethod": "mask" } } }

Untuk informasi selengkapnya, lihat Memfilter Transkripsi di Panduan Pengembang Amazon Transcribe.

Contoh 5: Untuk mentranskripsikan file audio dan menghapus kata-kata yang tidak diinginkan dalam output transkripsi

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan menggunakan filter kosakata yang sebelumnya Anda buat untuk menutupi kata-kata yang tidak diinginkan.

aws transcribe start-transcription-job \ --cli-input-json file://myfifthfile.json

Isi dari myfifthfile.json:

{ "TranscriptionJobName": "cli-filter-remove-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "Settings":{ "VocabularyFilterName": "your-vocabulary-filter", "VocabularyFilterMethod": "remove" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-filter-remove-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "StartTime": "2020-09-18T16:36:18.568000+00:00", "CreationTime": "2020-09-18T16:36:18.547000+00:00", "Settings": { "VocabularyFilterName": "your-vocabulary-filter", "VocabularyFilterMethod": "remove" } } }

Untuk informasi selengkapnya, lihat Memfilter Transkripsi di Panduan Pengembang Amazon Transcribe.

Contoh 6: Untuk mentranskripsikan file audio dengan akurasi yang meningkat menggunakan kosakata khusus

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan menggunakan filter kosakata yang sebelumnya Anda buat untuk menutupi kata-kata yang tidak diinginkan.

aws transcribe start-transcription-job \ --cli-input-json file://mysixthfile.json

Isi dari mysixthfile.json:

{ "TranscriptionJobName": "cli-vocab-job", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "Settings":{ "VocabularyName": "your-vocabulary" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-vocab-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "the-language-of-your-transcription-job", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "StartTime": "2020-09-18T16:36:18.568000+00:00", "CreationTime": "2020-09-18T16:36:18.547000+00:00", "Settings": { "VocabularyName": "your-vocabulary" } } }

Untuk informasi selengkapnya, lihat Memfilter Transkripsi di Panduan Pengembang Amazon Transcribe.

Contoh 7: Untuk mengidentifikasi bahasa file audio dan menuliskannya

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan menggunakan filter kosakata yang sebelumnya Anda buat untuk menutupi kata-kata yang tidak diinginkan.

aws transcribe start-transcription-job \ --cli-input-json file://myseventhfile.json

Isi dari myseventhfile.json:

{ "TranscriptionJobName": "cli-identify-language-transcription-job", "IdentifyLanguage": true, "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-identify-language-transcription-job", "TranscriptionJobStatus": "IN_PROGRESS", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/Amazon-S3-prefix/your-media-file-name.file-extension" }, "StartTime": "2020-09-18T22:27:23.970000+00:00", "CreationTime": "2020-09-18T22:27:23.948000+00:00", "IdentifyLanguage": true } }

Untuk informasi selengkapnya, lihat Mengidentifikasi Bahasa di Panduan Pengembang Amazon Transcribe.

Contoh 8: Untuk mentranskripsikan file audio dengan informasi yang dapat diidentifikasi secara pribadi disunting

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dan menyunting informasi identitas pribadi apa pun dalam keluaran transkripsi.

aws transcribe start-transcription-job \ --cli-input-json file://myeighthfile.json

Isi dari myeigthfile.json:

{ "TranscriptionJobName": "cli-redaction-job", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension" }, "ContentRedaction": { "RedactionOutput":"redacted", "RedactionType":"PII" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-redaction-job", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension" }, "StartTime": "2020-09-25T23:49:13.195000+00:00", "CreationTime": "2020-09-25T23:49:13.176000+00:00", "ContentRedaction": { "RedactionType": "PII", "RedactionOutput": "redacted" } } }

Untuk informasi selengkapnya, lihat Redaksi Konten Otomatis di Panduan Pengembang Amazon Transcribe.

Contoh 9: Untuk menghasilkan transkrip dengan informasi identitas pribadi (PII) yang disunting dan transkrip yang tidak disunting

start-transcription-jobContoh berikut menghasilkan dua transkripsi file audio Anda, satu dengan informasi yang dapat diidentifikasi secara pribadi disunting, dan yang lainnya tanpa redaksi apa pun.

aws transcribe start-transcription-job \ --cli-input-json file://myninthfile.json

Isi dari myninthfile.json:

{ "TranscriptionJobName": "cli-redaction-job-with-unredacted-transcript", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension" }, "ContentRedaction": { "RedactionOutput":"redacted_and_unredacted", "RedactionType":"PII" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-redaction-job-with-unredacted-transcript", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://Amazon-S3-Prefix/your-media-file.file-extension" }, "StartTime": "2020-09-25T23:59:47.677000+00:00", "CreationTime": "2020-09-25T23:59:47.653000+00:00", "ContentRedaction": { "RedactionType": "PII", "RedactionOutput": "redacted_and_unredacted" } } }

Untuk informasi selengkapnya, lihat Redaksi Konten Otomatis di Panduan Pengembang Amazon Transcribe.

Contoh 10: Untuk menggunakan model bahasa kustom yang sebelumnya Anda buat untuk mentranskripsikan file audio.

start-transcription-jobContoh berikut mentranskripsikan file audio Anda dengan model bahasa khusus yang telah Anda buat sebelumnya.

aws transcribe start-transcription-job \ --cli-input-json file://mytenthfile.json

Isi dari mytenthfile.json:

{ "TranscriptionJobName": "cli-clm-2-job-1", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/your-audio-file.file-extension" }, "ModelSettings": { "LanguageModelName":"cli-clm-2" } }

Output:

{ "TranscriptionJob": { "TranscriptionJobName": "cli-clm-2-job-1", "TranscriptionJobStatus": "IN_PROGRESS", "LanguageCode": "language-code", "Media": { "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/your-audio-file.file-extension" }, "StartTime": "2020-09-28T17:56:01.835000+00:00", "CreationTime": "2020-09-28T17:56:01.801000+00:00", "ModelSettings": { "LanguageModelName": "cli-clm-2" } } }

Untuk informasi selengkapnya, lihat Meningkatkan Akurasi Transkripsi Khusus Domain dengan Model Bahasa Khusus di Panduan Pengembang Amazon Transcribe.

Java
SDK untuk Java 2.x
catatan

Ada lebih banyak tentang GitHub. Temukan contoh lengkapnya dan pelajari cara mengatur dan menjalankannya di AWS Repositori Contoh Kode.

public class TranscribeStreamingDemoApp { private static final Region REGION = Region.US_EAST_1; private static TranscribeStreamingAsyncClient client; public static void main(String args[]) throws URISyntaxException, ExecutionException, InterruptedException, LineUnavailableException { client = TranscribeStreamingAsyncClient.builder() .credentialsProvider(getCredentials()) .region(REGION) .build(); CompletableFuture<Void> result = client.startStreamTranscription(getRequest(16_000), new AudioStreamPublisher(getStreamFromMic()), getResponseHandler()); result.get(); client.close(); } private static InputStream getStreamFromMic() throws LineUnavailableException { // Signed PCM AudioFormat with 16kHz, 16 bit sample size, mono int sampleRate = 16000; AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false); DataLine.Info info = new DataLine.Info(TargetDataLine.class, format); if (!AudioSystem.isLineSupported(info)) { System.out.println("Line not supported"); System.exit(0); } TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info); line.open(format); line.start(); InputStream audioStream = new AudioInputStream(line); return audioStream; } private static AwsCredentialsProvider getCredentials() { return DefaultCredentialsProvider.create(); } private static StartStreamTranscriptionRequest getRequest(Integer mediaSampleRateHertz) { return StartStreamTranscriptionRequest.builder() .languageCode(LanguageCode.EN_US.toString()) .mediaEncoding(MediaEncoding.PCM) .mediaSampleRateHertz(mediaSampleRateHertz) .build(); } private static StartStreamTranscriptionResponseHandler getResponseHandler() { return StartStreamTranscriptionResponseHandler.builder() .onResponse(r -> { System.out.println("Received Initial response"); }) .onError(e -> { System.out.println(e.getMessage()); StringWriter sw = new StringWriter(); e.printStackTrace(new PrintWriter(sw)); System.out.println("Error Occurred: " + sw.toString()); }) .onComplete(() -> { System.out.println("=== All records stream successfully ==="); }) .subscriber(event -> { List<Result> results = ((TranscriptEvent) event).transcript().results(); if (results.size() > 0) { if (!results.get(0).alternatives().get(0).transcript().isEmpty()) { System.out.println(results.get(0).alternatives().get(0).transcript()); } } }) .build(); } private InputStream getStreamFromFile(String audioFileName) { try { File inputFile = new File(getClass().getClassLoader().getResource(audioFileName).getFile()); InputStream audioStream = new FileInputStream(inputFile); return audioStream; } catch (FileNotFoundException e) { throw new RuntimeException(e); } } private static class AudioStreamPublisher implements Publisher<AudioStream> { private final InputStream inputStream; private static Subscription currentSubscription; private AudioStreamPublisher(InputStream inputStream) { this.inputStream = inputStream; } @Override public void subscribe(Subscriber<? super AudioStream> s) { if (this.currentSubscription == null) { this.currentSubscription = new SubscriptionImpl(s, inputStream); } else { this.currentSubscription.cancel(); this.currentSubscription = new SubscriptionImpl(s, inputStream); } s.onSubscribe(currentSubscription); } } public static class SubscriptionImpl implements Subscription { private static final int CHUNK_SIZE_IN_BYTES = 1024 * 1; private final Subscriber<? super AudioStream> subscriber; private final InputStream inputStream; private ExecutorService executor = Executors.newFixedThreadPool(1); private AtomicLong demand = new AtomicLong(0); SubscriptionImpl(Subscriber<? super AudioStream> s, InputStream inputStream) { this.subscriber = s; this.inputStream = inputStream; } @Override public void request(long n) { if (n <= 0) { subscriber.onError(new IllegalArgumentException("Demand must be positive")); } demand.getAndAdd(n); executor.submit(() -> { try { do { ByteBuffer audioBuffer = getNextEvent(); if (audioBuffer.remaining() > 0) { AudioEvent audioEvent = audioEventFromBuffer(audioBuffer); subscriber.onNext(audioEvent); } else { subscriber.onComplete(); break; } } while (demand.decrementAndGet() > 0); } catch (Exception e) { subscriber.onError(e); } }); } @Override public void cancel() { executor.shutdown(); } private ByteBuffer getNextEvent() { ByteBuffer audioBuffer = null; byte[] audioBytes = new byte[CHUNK_SIZE_IN_BYTES]; int len = 0; try { len = inputStream.read(audioBytes); if (len <= 0) { audioBuffer = ByteBuffer.allocate(0); } else { audioBuffer = ByteBuffer.wrap(audioBytes, 0, len); } } catch (IOException e) { throw new UncheckedIOException(e); } return audioBuffer; } private AudioEvent audioEventFromBuffer(ByteBuffer bb) { return AudioEvent.builder() .audioChunk(SdkBytes.fromByteBuffer(bb)) .build(); } } }
JavaScript
SDK untuk JavaScript (v3)
catatan

Ada lebih banyak tentang GitHub. Temukan contoh lengkapnya dan pelajari cara mengatur dan menjalankannya di AWS Repositori Contoh Kode.

Mulai pekerjaan transkripsi.

// Import the required AWS SDK clients and commands for Node.js import { StartTranscriptionJobCommand } from "@aws-sdk/client-transcribe"; import { transcribeClient } from "./libs/transcribeClient.js"; // Set the parameters export const params = { TranscriptionJobName: "JOB_NAME", LanguageCode: "LANGUAGE_CODE", // For example, 'en-US' MediaFormat: "SOURCE_FILE_FORMAT", // For example, 'wav' Media: { MediaFileUri: "SOURCE_LOCATION", // For example, "https://transcribe-demo.s3-REGION.amazonaws.com/hello_world.wav" }, OutputBucketName: "OUTPUT_BUCKET_NAME" }; export const run = async () => { try { const data = await transcribeClient.send( new StartTranscriptionJobCommand(params) ); console.log("Success - put", data); return data; // For unit tests. } catch (err) { console.log("Error", err); } }; run();

Buat klien.

import { TranscribeClient } from "@aws-sdk/client-transcribe"; // Set the AWS Region. const REGION = "REGION"; //e.g. "us-east-1" // Create an Amazon Transcribe service client object. const transcribeClient = new TranscribeClient({ region: REGION }); export { transcribeClient };
Python
SDK untuk Python (Boto3)
catatan

Ada lebih banyak tentang GitHub. Temukan contoh lengkapnya dan pelajari cara mengatur dan menjalankannya di AWS Repositori Contoh Kode.

def start_job( job_name, media_uri, media_format, language_code, transcribe_client, vocabulary_name=None, ): """ Starts a transcription job. This function returns as soon as the job is started. To get the current status of the job, call get_transcription_job. The job is successfully completed when the job status is 'COMPLETED'. :param job_name: The name of the transcription job. This must be unique for your AWS account. :param media_uri: The URI where the audio file is stored. This is typically in an Amazon S3 bucket. :param media_format: The format of the audio file. For example, mp3 or wav. :param language_code: The language code of the audio file. For example, en-US or ja-JP :param transcribe_client: The Boto3 Transcribe client. :param vocabulary_name: The name of a custom vocabulary to use when transcribing the audio file. :return: Data about the job. """ try: job_args = { "TranscriptionJobName": job_name, "Media": {"MediaFileUri": media_uri}, "MediaFormat": media_format, "LanguageCode": language_code, } if vocabulary_name is not None: job_args["Settings"] = {"VocabularyName": vocabulary_name} response = transcribe_client.start_transcription_job(**job_args) job = response["TranscriptionJob"] logger.info("Started transcription job %s.", job_name) except ClientError: logger.exception("Couldn't start transcription job %s.", job_name) raise else: return job

Untuk daftar lengkap panduan pengembang AWS SDK dan contoh kode, lihatMenggunakan layanan ini dengan AWS SDK. Topik ini juga mencakup informasi tentang memulai dan detail tentang versi SDK sebelumnya.