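Using toxic speech detection
Using toxic speech detection in a batch transcription
To use toxic speech detection with a batch transcription, see the following examples: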
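- In the navigation pane, choose Transcription jobs, then select Create job (top right). This opens the Specify job details page.
- On the Specify job details page, you can also enable PII redaction if you want. Note that toxicity detection does not support the other listed options. Select Next. This takes you to the Configure job - optional page. In the Audio settings panel, select Toxicity detection.
- Select Create job to run your transcription job.
- When your transcription job is complete, you can download the transcript from the Download drop-down menu on the transcription job's details page.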
This example uses the start-transcription-job command and the ToxicityDetection parameter. For more information, see StartTranscriptionJob and ToxicityDetection.
aws transcribe start-transcription-job \
    --region us-west-2 \
    --transcription-job-name my-first-transcription-job \
    --media MediaFileUri=s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac \
    --output-bucket-name DOC-EXAMPLE-BUCKET \
    --output-key my-output-files/ \
    --language-code en-US \
    --toxicity-detection ToxicityCategories=ALL
Here is another example that uses the start-transcription-job command, this time with the request body read from a JSON file.
aws transcribe start-transcription-job \
    --region us-west-2 \
    --cli-input-json file://filepath/my-first-toxicity-job.json
The file my-first-toxicity-job.json contains the following request body.
{
  "TranscriptionJobName": "my-first-transcription-job",
  "Media": {
    "MediaFileUri": "s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac"
  },
  "OutputBucketName": "DOC-EXAMPLE-BUCKET",
  "OutputKey": "my-output-files/",
  "LanguageCode": "en-US",
  "ToxicityDetection": [
    {
      "ToxicityCategories": [ "ALL" ]
    }
  ]
}
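If the request body is generated by a script rather than written by hand, a small helper can keep the field names consistent. The following is a minimal sketch and not part of the AWS tooling: the function name build_toxicity_request is hypothetical, and the values are the placeholder names used in this section. It only builds and prints the JSON; it does not call AWS.

```python
import json


def build_toxicity_request(job_name, media_uri, bucket, output_key, language_code="en-US"):
    """Build a StartTranscriptionJob request body with toxicity detection enabled.

    Hypothetical helper for illustration; the keys mirror the request body above.
    """
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": bucket,
        "OutputKey": output_key,
        "LanguageCode": language_code,
        # ToxicityDetection is a list of settings objects, each with a category list.
        "ToxicityDetection": [{"ToxicityCategories": ["ALL"]}],
    }


request = build_toxicity_request(
    job_name="my-first-transcription-job",
    media_uri="s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac",
    bucket="DOC-EXAMPLE-BUCKET",
    output_key="my-output-files/",
)
request_json = json.dumps(request, indent=2)
print(request_json)
```

Saving the printed text as my-first-toxicity-job.json would give a file that can be passed to --cli-input-json as in the command above.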
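This example uses the AWS SDK for Python (Boto3) to enable ToxicityDetection for the start_transcription_job method. For more information, see StartTranscriptionJob and ToxicityDetection.
For additional examples that use the AWS SDKs, including feature-specific, scenario, and cross-service examples, see the Code examples for Amazon Transcribe using AWS SDKs chapter.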
import time
import boto3

# Create a Transcribe client in the target Region.
transcribe = boto3.client('transcribe', region_name='us-west-2')
job_name = "my-first-transcription-job"
job_uri = "s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac"

# Start the job with toxicity detection enabled for all categories.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={
        'MediaFileUri': job_uri
    },
    OutputBucketName='DOC-EXAMPLE-BUCKET',
    OutputKey='my-output-files/',
    LanguageCode='en-US',
    ToxicityDetection=[
        {
            'ToxicityCategories': ['ALL']
        }
    ]
)

# Poll every 5 seconds until the job completes or fails.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)
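The polling loop above runs until the job reaches a terminal status. One possible variation, sketched below, adds an upper bound on the wait. The helper wait_for_job and its parameters are hypothetical, and the status source is passed in as a function so that the sketch runs without AWS credentials; here a canned sequence of statuses stands in for get_transcription_job.

```python
import time


def wait_for_job(get_status, timeout_seconds=600, poll_seconds=5, sleep=time.sleep):
    """Poll get_status() until it returns COMPLETED or FAILED.

    Hypothetical helper: raises TimeoutError once roughly timeout_seconds
    of sleeping has elapsed without reaching a terminal status.
    """
    waited = 0
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the real service: a fixed sequence of job statuses.
fake_statuses = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With a real client, get_status could be a function that returns status['TranscriptionJob']['TranscriptionJobStatus'] from a get_transcription_job call, as in the example above.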
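Example output
Toxic speech is tagged and categorized in your transcription output. Each instance of toxic speech is categorized and assigned a confidence score (a value between 0 and 1). A larger confidence value indicates a greater likelihood that the content is toxic speech within the specified category.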
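As a rough illustration of how these scores might be consumed, the following sketch filters segments by their overall toxicity score and reports the highest-scoring category for each. The function name flagged_segments and the thresholds are assumptions chosen for the example, not values recommended by Amazon Transcribe. The sample data copies the scores from the example output in this section and omits the transcript text.

```python
def flagged_segments(transcript, threshold=0.5):
    """Return (start_time, end_time, top_category, score) for each segment whose
    overall toxicity score is at or above threshold (an assumed cutoff)."""
    flagged = []
    for segment in transcript["results"]["toxicity_detection"]:
        if segment["toxicity"] >= threshold:
            categories = segment["categories"]
            # Pick the category with the largest confidence score.
            top = max(categories, key=categories.get)
            flagged.append(
                (segment["start_time"], segment["end_time"], top, categories[top])
            )
    return flagged


# Scores copied from this section's example output; text fields omitted.
sample = {
    "results": {
        "toxicity_detection": [
            {
                "toxicity": 0.7638,
                "categories": {
                    "profanity": 0.9913, "hate_speech": 0.0382, "sexual": 0.0016,
                    "insult": 0.6572, "violence_or_threat": 0.0024,
                    "graphic": 0.0013, "harassment_or_abuse": 0.0249,
                },
                "start_time": 8.92,
                "end_time": 21.45,
            },
            {
                "toxicity": 0.9816,
                "categories": {
                    "profanity": 0.9865, "hate_speech": 0.9123, "sexual": 0.0037,
                    "insult": 0.5447, "violence_or_threat": 0.5078,
                    "graphic": 0.0037, "harassment_or_abuse": 0.0613,
                },
                "start_time": 43.459,
                "end_time": 54.639,
            },
        ]
    }
}

for start, end, category, score in flagged_segments(sample, threshold=0.8):
    print(f"{start}s to {end}s: {category} ({score})")
```

A stricter or looser threshold changes which segments are reported; the right cutoff depends on the application and is left as an assumption here.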
The following is an example output in JSON format showing categorized toxic speech with the associated confidence scores.
{
    "jobName": "my-toxicity-job",
    "accountId": "111122223333",
    "results": {
        "transcripts": [...],
        "items":[...],
        "toxicity_detection": [
            {
                "text": "What the * are you doing man? That's why I didn't want to play with your * . man it was a no, no I'm not calming down * man. I well I spent I spent too much * money on this game.",
                "toxicity": 0.7638,
                "categories": {
                    "profanity": 0.9913,
                    "hate_speech": 0.0382,
                    "sexual": 0.0016,
                    "insult": 0.6572,
                    "violence_or_threat": 0.0024,
                    "graphic": 0.0013,
                    "harassment_or_abuse": 0.0249
                },
                "start_time": 8.92,
                "end_time": 21.45
            },
            Items removed for brevity
            {
                "text": "What? Who? What the * did you just say to me? What's your address? What is your * address? I will pull up right now on your * * man. Take your * back to , tired of this **.",
                "toxicity": 0.9816,
                "categories": {
                    "profanity": 0.9865,
                    "hate_speech": 0.9123,
                    "sexual": 0.0037,
                    "insult": 0.5447,
                    "violence_or_threat": 0.5078,
                    "graphic": 0.0037,
                    "harassment_or_abuse": 0.0613
                },
                "start_time": 43.459,
                "end_time": 54.639
            },
        ]
    },
    ...
    "status": "COMPLETED"
}