Using Amazon Data Firehose to process individual and batch records

The source page is a machine translation. If the translation conflicts with the original English content, the English version takes precedence.

The following code examples show how to use Firehose to process individual and batch records.
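
Both full examples split the sample data into batches of 500 before calling PutRecordBatch. The batching step on its own can be sketched as follows. This is a minimal illustration, not code from the examples: the helper names `encode_record` and `chunk_records` are hypothetical, and the limits (500 records and 4 MiB per call) are the PutRecordBatch quotas as documented at the time of writing, so check the current API reference before relying on them.

```python
import json
from typing import Iterable, Iterator, List

# Assumed PutRecordBatch quotas; verify against the current Firehose API reference.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # 4 MiB


def encode_record(record: dict) -> bytes:
    """Serialize one record as UTF-8 JSON (hypothetical helper)."""
    return json.dumps(record).encode("utf-8")


def chunk_records(
    records: Iterable[dict],
    max_records: int = MAX_RECORDS_PER_BATCH,
    max_bytes: int = MAX_BYTES_PER_BATCH,
) -> Iterator[List[dict]]:
    """Yield lists of {"Data": bytes} entries that respect both limits.

    A single record larger than max_bytes is still emitted in a batch of
    its own; the service would reject it, so callers should filter such
    records beforehand.
    """
    batch: List[dict] = []
    batch_bytes = 0
    for record in records:
        data = encode_record(record)
        # Start a new batch when adding this record would exceed a limit.
        if batch and (len(batch) >= max_records or batch_bytes + len(data) > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"Data": data})
        batch_bytes += len(data)
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [{"i": i} for i in range(1200)]
    print([len(b) for b in chunk_records(sample)])  # prints [500, 500, 200]
```

Each yielded list has the shape that the Boto3 `put_record_batch` call accepts for its `Records` parameter. Counting bytes as well as records matters when individual records are large, because a batch of 500 records can exceed the request size limit.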

Java
SDK for Java 2.x
Note

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

This example puts individual and batch records to Firehose.

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.Datapoint;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsRequest;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsResponse;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;
import software.amazon.awssdk.services.firehose.FirehoseClient;
import software.amazon.awssdk.services.firehose.model.PutRecordBatchRequest;
import software.amazon.awssdk.services.firehose.model.PutRecordBatchResponse;
import software.amazon.awssdk.services.firehose.model.PutRecordRequest;
import software.amazon.awssdk.services.firehose.model.Record;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;

/**
 * Amazon Firehose Scenario example using Java V2 SDK.
 *
 * Demonstrates individual and batch record processing,
 * and monitoring Firehose delivery stream metrics.
 */
public class FirehoseScenario {

    private static FirehoseClient firehoseClient;
    private static CloudWatchClient cloudWatchClient;

    public static void main(String[] args) {
        final String usage = """
                Usage:
                    <deliveryStreamName>

                Where:
                    deliveryStreamName - The Firehose delivery stream name.
                """;

        if (args.length != 1) {
            System.out.println(usage);
            return;
        }

        String deliveryStreamName = args[0];

        try {
            // Read and parse sample data.
            String jsonContent = readJsonFile("sample_records.json");
            ObjectMapper objectMapper = new ObjectMapper();
            List<Map<String, Object>> sampleData = objectMapper.readValue(jsonContent, new TypeReference<>() {});

            // Process individual records.
            System.out.println("Processing individual records...");
            sampleData.subList(0, 100).forEach(record -> {
                try {
                    putRecord(record, deliveryStreamName);
                } catch (Exception e) {
                    System.err.println("Error processing record: " + e.getMessage());
                }
            });

            // Monitor metrics.
            monitorMetrics(deliveryStreamName);

            // Process batch records.
            System.out.println("Processing batch records...");
            putRecordBatch(sampleData.subList(100, sampleData.size()), 500, deliveryStreamName);
            monitorMetrics(deliveryStreamName);

        } catch (Exception e) {
            System.err.println("Scenario failed: " + e.getMessage());
        } finally {
            closeClients();
        }
    }

    // Lazily creates the shared Firehose client.
    private static FirehoseClient getFirehoseClient() {
        if (firehoseClient == null) {
            firehoseClient = FirehoseClient.create();
        }
        return firehoseClient;
    }

    // Lazily creates the shared CloudWatch client.
    private static CloudWatchClient getCloudWatchClient() {
        if (cloudWatchClient == null) {
            cloudWatchClient = CloudWatchClient.create();
        }
        return cloudWatchClient;
    }

    /**
     * Puts a record to the specified Amazon Kinesis Data Firehose delivery stream.
     *
     * @param record The record to be put to the delivery stream. The record must be a {@link Map} of String keys and Object values.
     * @param deliveryStreamName The name of the Amazon Kinesis Data Firehose delivery stream to which the record should be put.
     * @throws IllegalArgumentException if the input record or delivery stream name is null or empty.
     * @throws RuntimeException if there is an error putting the record to the delivery stream.
     */
    public static void putRecord(Map<String, Object> record, String deliveryStreamName) {
        if (record == null || deliveryStreamName == null || deliveryStreamName.isEmpty()) {
            throw new IllegalArgumentException("Invalid input: record or delivery stream name cannot be null/empty");
        }
        try {
            String jsonRecord = new ObjectMapper().writeValueAsString(record);
            Record firehoseRecord = Record.builder()
                    .data(SdkBytes.fromByteArray(jsonRecord.getBytes(StandardCharsets.UTF_8)))
                    .build();

            PutRecordRequest putRecordRequest = PutRecordRequest.builder()
                    .deliveryStreamName(deliveryStreamName)
                    .record(firehoseRecord)
                    .build();

            getFirehoseClient().putRecord(putRecordRequest);
            System.out.println("Record sent: " + jsonRecord);
        } catch (Exception e) {
            throw new RuntimeException("Failed to put record: " + e.getMessage(), e);
        }
    }

    /**
     * Puts a batch of records to an Amazon Kinesis Data Firehose delivery stream.
     *
     * @param records a list of maps representing the records to be sent
     * @param batchSize the maximum number of records to include in each batch
     * @param deliveryStreamName the name of the Kinesis Data Firehose delivery stream
     * @throws IllegalArgumentException if the input parameters are invalid (null or empty)
     * @throws RuntimeException if there is an error putting the record batch
     */
    public static void putRecordBatch(List<Map<String, Object>> records, int batchSize, String deliveryStreamName) {
        if (records == null || records.isEmpty() || deliveryStreamName == null || deliveryStreamName.isEmpty()) {
            throw new IllegalArgumentException("Invalid input: records or delivery stream name cannot be null/empty");
        }
        ObjectMapper objectMapper = new ObjectMapper();

        try {
            // Send the records in consecutive slices of at most batchSize entries.
            for (int i = 0; i < records.size(); i += batchSize) {
                List<Map<String, Object>> batch = records.subList(i, Math.min(i + batchSize, records.size()));

                List<Record> batchRecords = batch.stream().map(record -> {
                    try {
                        String jsonRecord = objectMapper.writeValueAsString(record);
                        return Record.builder()
                                .data(SdkBytes.fromByteArray(jsonRecord.getBytes(StandardCharsets.UTF_8)))
                                .build();
                    } catch (Exception e) {
                        throw new RuntimeException("Error creating Firehose record", e);
                    }
                }).collect(Collectors.toList());

                PutRecordBatchRequest request = PutRecordBatchRequest.builder()
                        .deliveryStreamName(deliveryStreamName)
                        .records(batchRecords)
                        .build();

                PutRecordBatchResponse response = getFirehoseClient().putRecordBatch(request);

                // A batch call can succeed overall while individual records fail.
                if (response.failedPutCount() > 0) {
                    response.requestResponses().stream()
                            .filter(r -> r.errorCode() != null)
                            .forEach(r -> System.err.println("Failed record: " + r.errorMessage()));
                }
                System.out.println("Batch sent with size: " + batchRecords.size());
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to put record batch: " + e.getMessage(), e);
        }
    }

    // Prints the summed Firehose metrics for the last 10 minutes.
    public static void monitorMetrics(String deliveryStreamName) {
        Instant endTime = Instant.now();
        Instant startTime = endTime.minusSeconds(600);

        List<String> metrics = List.of("IncomingBytes", "IncomingRecords", "FailedPutCount");
        metrics.forEach(metric -> monitorMetric(metric, startTime, endTime, deliveryStreamName));
    }

    private static void monitorMetric(String metricName, Instant startTime, Instant endTime, String deliveryStreamName) {
        try {
            GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder()
                    .namespace("AWS/Firehose")
                    .metricName(metricName)
                    .dimensions(Dimension.builder().name("DeliveryStreamName").value(deliveryStreamName).build())
                    .startTime(startTime)
                    .endTime(endTime)
                    .period(60)
                    .statistics(Statistic.SUM)
                    .build();

            GetMetricStatisticsResponse response = getCloudWatchClient().getMetricStatistics(request);
            double totalSum = response.datapoints().stream().mapToDouble(Datapoint::sum).sum();
            System.out.println(metricName + ": " + totalSum);
        } catch (Exception e) {
            System.err.println("Failed to monitor metric " + metricName + ": " + e.getMessage());
        }
    }

    // Reads a classpath resource into a single string.
    public static String readJsonFile(String fileName) throws IOException {
        try (InputStream inputStream = FirehoseScenario.class.getResourceAsStream("/" + fileName);
             Scanner scanner = new Scanner(inputStream, StandardCharsets.UTF_8)) {
            // "\\A" matches the start of input, so next() returns the whole file.
            return scanner.useDelimiter("\\A").next();
        } catch (Exception e) {
            throw new RuntimeException("Error reading file: " + fileName, e);
        }
    }

    private static void closeClients() {
        try {
            if (firehoseClient != null) firehoseClient.close();
            if (cloudWatchClient != null) cloudWatchClient.close();
        } catch (Exception e) {
            System.err.println("Error closing clients: " + e.getMessage());
        }
    }
}
  • For API details, see the following topics in AWS SDK for Java 2.x API Reference.

Python
SDK for Python (Boto3)
Note

There's more on GitHub. Find the complete example and learn how to set up and run it in the AWS Code Examples Repository.

This script puts individual and batch records to Firehose.

import json
import logging
import random
from datetime import datetime, timedelta, timezone

import backoff
import boto3

from config import get_config


def load_sample_data(path: str) -> dict:
    """
    Load sample data from a JSON file.

    Args:
        path (str): The file path to the JSON file containing sample data.

    Returns:
        dict: The loaded sample data as a dictionary.
    """
    with open(path, "r") as f:
        return json.load(f)


# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class FirehoseClient:
    """
    AWS Firehose client to send records and monitor metrics.

    Attributes:
        config (object): Configuration object with delivery stream name and region.
        delivery_stream_name (str): Name of the Firehose delivery stream.
        region (str): AWS region for Firehose and CloudWatch clients.
        firehose (boto3.client): Boto3 Firehose client.
        cloudwatch (boto3.client): Boto3 CloudWatch client.
    """

    def __init__(self, config):
        """
        Initialize the FirehoseClient.

        Args:
            config (object): Configuration object with delivery stream name and region.
        """
        self.config = config
        self.delivery_stream_name = config.delivery_stream_name
        self.region = config.region
        self.firehose = boto3.client("firehose", region_name=self.region)
        self.cloudwatch = boto3.client("cloudwatch", region_name=self.region)

    @backoff.on_exception(
        backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter
    )
    def put_record(self, record: dict):
        """
        Put individual records to Firehose with backoff and retry.

        Args:
            record (dict): The data record to be sent to Firehose.

        This method attempts to send an individual record to the Firehose delivery stream.
        It retries with exponential backoff in case of exceptions.
        """
        try:
            entry = self._create_record_entry(record)
            response = self.firehose.put_record(
                DeliveryStreamName=self.delivery_stream_name, Record=entry
            )
            self._log_response(response, entry)
        except Exception:
            logger.info(f"Fail record: {record}.")
            raise

    @backoff.on_exception(
        backoff.expo, Exception, max_tries=5, jitter=backoff.full_jitter
    )
    def put_record_batch(self, data: list, batch_size: int = 500):
        """
        Put records in batches to Firehose with backoff and retry.

        Args:
            data (list): List of data records to be sent to Firehose.
            batch_size (int): Number of records to send in each batch. Default is 500.

        This method attempts to send records in batches to the Firehose delivery stream.
        The backoff decorator retries only on exceptions that escape this method.
        """
        for i in range(0, len(data), batch_size):
            batch = data[i : i + batch_size]
            record_dicts = [{"Data": json.dumps(record)} for record in batch]
            try:
                response = self.firehose.put_record_batch(
                    DeliveryStreamName=self.delivery_stream_name, Records=record_dicts
                )
                self._log_batch_response(response, len(batch))
            except Exception as e:
                # The error is logged and not re-raised, so a failed batch
                # is skipped rather than retried.
                logger.info(f"Failed to send batch of {len(batch)} records. Error: {e}")

    def get_metric_statistics(
        self,
        metric_name: str,
        start_time: datetime,
        end_time: datetime,
        period: int,
        statistics: list = ["Sum"],
    ) -> list:
        """
        Retrieve metric statistics from CloudWatch.

        Args:
            metric_name (str): The name of the metric.
            start_time (datetime): The start time for the metric statistics.
            end_time (datetime): The end time for the metric statistics.
            period (int): The granularity, in seconds, of the returned data points.
            statistics (list): A list of statistics to retrieve. Default is ['Sum'].

        Returns:
            list: List of datapoints containing the metric statistics.
        """
        response = self.cloudwatch.get_metric_statistics(
            Namespace="AWS/Firehose",
            MetricName=metric_name,
            Dimensions=[
                {"Name": "DeliveryStreamName", "Value": self.delivery_stream_name},
            ],
            StartTime=start_time,
            EndTime=end_time,
            Period=period,
            Statistics=statistics,
        )
        return response["Datapoints"]

    def monitor_metrics(self):
        """
        Monitor Firehose metrics for the last 10 minutes.

        This method retrieves and logs the 'IncomingBytes', 'IncomingRecords', and
        'FailedPutCount' metrics from CloudWatch for the last 10 minutes.
        """
        # Timezone-aware replacement for the deprecated datetime.utcnow().
        end_time = datetime.now(timezone.utc)
        start_time = end_time - timedelta(minutes=10)
        period = int((end_time - start_time).total_seconds())

        metrics = {
            "IncomingBytes": self.get_metric_statistics(
                "IncomingBytes", start_time, end_time, period
            ),
            "IncomingRecords": self.get_metric_statistics(
                "IncomingRecords", start_time, end_time, period
            ),
            "FailedPutCount": self.get_metric_statistics(
                "FailedPutCount", start_time, end_time, period
            ),
        }

        for metric, datapoints in metrics.items():
            if datapoints:
                total_sum = sum(datapoint["Sum"] for datapoint in datapoints)
                if metric == "IncomingBytes":
                    logger.info(
                        f"{metric}: {round(total_sum)} ({total_sum / (1024 * 1024):.2f} MB)"
                    )
                else:
                    logger.info(f"{metric}: {round(total_sum)}")
            else:
                logger.info(f"No data found for {metric} over the last 10 minutes")

    def _create_record_entry(self, record: dict) -> dict:
        """
        Create a record entry for Firehose.

        Args:
            record (dict): The data record to be sent.

        Returns:
            dict: The record entry formatted for Firehose.

        Raises:
            Exception: If a simulated network error occurs.
        """
        # Randomly simulate a network error or a malformed payload so that
        # the retry and logging paths are exercised.
        if random.random() < 0.2:
            raise Exception("Simulated network error")
        elif random.random() < 0.1:
            return {"Data": '{"malformed": "data"'}
        else:
            return {"Data": json.dumps(record)}

    def _log_response(self, response: dict, entry: dict):
        """
        Log the response from Firehose.

        Args:
            response (dict): The response from the Firehose put_record API call.
            entry (dict): The record entry that was sent.
        """
        if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
            logger.info(f"Sent record: {entry}")
        else:
            logger.info(f"Fail record: {entry}")

    def _log_batch_response(self, response: dict, batch_size: int):
        """
        Log the batch response from Firehose.

        Args:
            response (dict): The response from the Firehose put_record_batch API call.
            batch_size (int): The number of records in the batch.
        """
        if response.get("FailedPutCount", 0) > 0:
            logger.info(
                f'Failed to send {response["FailedPutCount"]} records in batch of {batch_size}'
            )
        else:
            logger.info(f"Successfully sent batch of {batch_size} records")


if __name__ == "__main__":
    config = get_config()
    data = load_sample_data(config.sample_data_file)
    client = FirehoseClient(config)

    # Process the first 100 sample network records
    for record in data[:100]:
        try:
            client.put_record(record)
        except Exception as e:
            logger.info(f"Put record failed after retries and backoff: {e}")
    client.monitor_metrics()

    # Process remaining records using the batch method
    try:
        client.put_record_batch(data[100:])
    except Exception as e:
        logger.info(f"Put record batch failed after retries and backoff: {e}")
    client.monitor_metrics()

This file contains the configuration for the Firehose script above.

class Config:
    def __init__(self):
        self.delivery_stream_name = "ENTER YOUR DELIVERY STREAM NAME HERE"
        self.region = "us-east-1"
        self.sample_data_file = (
            "../../../../../scenarios/features/firehose/resources/sample_records.json"
        )


def get_config():
    return Config()
  • For API details, see the following topics in AWS SDK for Python (Boto3) API Reference.
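
The batch examples above log failed records but do not resend them. A PutRecordBatch response lists one entry per request record, in the same order, and a failed entry carries an `ErrorCode`. One way to resend only the failed entries is sketched below. This is an illustration under stated assumptions, not part of the AWS examples: `failed_records` and `put_batch_with_retry` are hypothetical helpers, and `send` stands for any callable that takes a list of record entries and returns a PutRecordBatch-style response dictionary.

```python
import time
from typing import Callable, List


def failed_records(records: List[dict], response: dict) -> List[dict]:
    """Return the request entries whose matching response entry has an ErrorCode."""
    results = response.get("RequestResponses", [])
    return [rec for rec, res in zip(records, results) if "ErrorCode" in res]


def put_batch_with_retry(
    send: Callable[[List[dict]], dict],
    records: List[dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> List[dict]:
    """Send a batch, resending only failed entries with exponential backoff.

    Returns the entries that still failed after max_attempts (empty on success).
    """
    pending = list(records)
    for attempt in range(max_attempts):
        response = send(pending)
        pending = failed_records(pending, response)
        if not pending:
            return []
        if attempt < max_attempts - 1:
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
    return pending
```

With Boto3, `send` could be a small wrapper such as `lambda batch: firehose.put_record_batch(DeliveryStreamName=name, Records=batch)`, where `firehose` and `name` are the client and stream name from the script. Passing the sender in as a parameter keeps the retry logic testable without AWS credentials.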

For a complete list of AWS SDK developer guides and code examples, see Using Firehose with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.
