Using Amazon Translate to translate large documents

You can split large documents into smaller parts so that each part stays below the document size limit. For more information about document size limits, see Service quotas. The following Java program breaks a long text document into individual sentences and then translates each sentence from the source language to the target language. The program contains two sections:

  • The SentenceSegmenter class that is responsible for breaking the source string into individual sentences. The sample uses the Java BreakIterator class.

  • The main function that calls the TranslateText operation for each sentence in the source string. The main function also handles authentication with Amazon Translate.
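The segmentation step can be tried on its own, without AWS credentials. The following is a minimal sketch that uses only the standard java.text.BreakIterator class; the SegmentDemo class name and the sample string are illustrative and are not part of the sample program.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SegmentDemo {
    // Splits text at the sentence boundaries reported by BreakIterator.
    static List<String> segment(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String sample = "Hello world. How are you? Fine.";
        // Brackets make any whitespace kept at the end of a segment visible.
        for (String s : segment(sample, Locale.ENGLISH)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Each segment keeps the whitespace that follows its sentence, so concatenating the segments reproduces the original text exactly.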

To configure the example
  1. Install and configure the AWS SDK for Java. For installation instructions, see Set up the AWS SDK for Java.

  2. Make sure you have the minimum required permissions to run this example. For the required permissions policies, see Identity-based policies for Amazon Translate.

  3. Set up the credentials needed to run the sample. For instructions, see Set up AWS Credentials and Region for Development in the AWS SDK for Java Developer Guide.

  4. Create a new project in your Java IDE and copy the source code.

  5. Change the value of region to the AWS Region where you want to run the Amazon Translate operation. For a list of supported Regions for Amazon Translate, see AWS Regions and Endpoints in the AWS General Reference.

  6. Change the values of sourceLang and targetLang to the language codes of the languages that you want to translate between (for example, en and es).

  7. Run the sample to see the translated text on standard output.

import com.amazonaws.auth.AWSCredentialsProviderChain;
import com.amazonaws.auth.EnvironmentVariableCredentialsProvider;
import com.amazonaws.auth.SystemPropertiesCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.translate.AmazonTranslate;
import com.amazonaws.services.translate.AmazonTranslateClient;
import com.amazonaws.services.translate.model.TranslateTextRequest;
import com.amazonaws.services.translate.model.TranslateTextResult;

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MultiSentenceTranslator {

    public static void main(String[] args) {
        // Define the text to be translated here
        String region = "region";
        String text = "Text to be translated";
        String sourceLang = "source language";
        String targetLang = "target language";

        // Break text into sentences
        SentenceSegmenter sentenceSegmenter = new SentenceSegmenter();
        List<String> sentences = new ArrayList<>();
        try {
            sentences = sentenceSegmenter.segment(text, sourceLang);
        } catch (Exception e) {
            System.err.println(e);
            System.exit(1);
        }

        // Create credentials using a provider chain that will evaluate in order:
        // a) Any Java system properties
        // b) Any environment variables
        // c) Any profile file
        AWSCredentialsProviderChain credentialsProviderChain = new AWSCredentialsProviderChain(
                new SystemPropertiesCredentialsProvider(),
                new EnvironmentVariableCredentialsProvider(),
                new ProfileCredentialsProvider()
        );

        // Create an Amazon Translate client
        AmazonTranslate translate = AmazonTranslateClient.builder()
                .withCredentials(credentialsProviderChain)
                .withRegion(region)
                .build();

        // Translate sentences and print the results to stdout
        for (String sentence : sentences) {
            TranslateTextRequest request = new TranslateTextRequest()
                    .withText(sentence)
                    .withSourceLanguageCode(sourceLang)
                    .withTargetLanguageCode(targetLang);
            TranslateTextResult result = translate.translateText(request);
            System.out.println("Original text: " + sentence);
            System.out.println("Translated text: " + result.getTranslatedText());
        }
    }
}

class SentenceSegmenter {

    // Splits the text into sentences using the sentence rules for the given language
    public List<String> segment(final String text, final String lang) throws Exception {
        List<String> res = new ArrayList<>();
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(new Locale(lang));
        sentenceIterator.setText(text);

        // Each pair of consecutive boundaries delimits one sentence
        int prevBoundary = sentenceIterator.first();
        int curBoundary = sentenceIterator.next();
        while (curBoundary != BreakIterator.DONE) {
            String sentence = text.substring(prevBoundary, curBoundary);
            res.add(sentence);
            prevBoundary = curBoundary;
            curBoundary = sentenceIterator.next();
        }
        return res;
    }
}
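The sample sends one request per sentence, which is simple but can mean many calls for a long document. One possible variation, sketched below, packs consecutive sentences into larger chunks that stay under a byte budget before each chunk is translated. The SentenceBatcher class and the maxBytes parameter are hypothetical names introduced for illustration; the value passed in main is not the service quota, so take the real limit from Service quotas.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SentenceBatcher {
    // Greedily packs consecutive sentences into chunks whose UTF-8 size
    // does not exceed maxBytes. A single sentence larger than maxBytes
    // becomes its own oversized chunk; the caller must handle that case.
    static List<String> batch(List<String> sentences, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String sentence : sentences) {
            int size = sentence.getBytes(StandardCharsets.UTF_8).length;
            // Start a new chunk when adding this sentence would exceed the budget.
            if (currentBytes > 0 && currentBytes + size > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(sentence);
            currentBytes += size;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("One. ", "Two. ", "Three.");
        // maxBytes = 10 is an illustrative value, not the service quota.
        System.out.println(batch(sentences, 10));
    }
}
```

Because the chunks are built from whole sentences, each chunk can be passed to the translation loop in place of a single sentence. Sizes are measured in UTF-8 bytes rather than characters, since non-ASCII text takes more than one byte per character.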