Tracing applications with AWS X-Ray
A request through your application might consist of calls to databases, applications, and web services running on on-premises servers, Amazon EC2, containers, or Lambda. By implementing application tracing, you can quickly identify the root cause of issues in applications that use distributed components and services. You can use AWS X-Ray to trace your application requests across multiple components. X-Ray samples requests as they flow through your application components and visualizes them on a service graph, where each component is represented as a segment. X-Ray generates trace identifiers so that you can correlate a request as it flows through multiple components, which helps you view the request from end to end. You can enhance this further by adding annotations and metadata, which help you search for requests and identify their characteristics.
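The following is a minimal sketch of how a trace identifier can tie components together. It assumes the documented X-Ray trace ID layout (a version number, the request's epoch time as 8 hexadecimal digits, and 24 random hexadecimal digits) and the X-Amzn-Trace-Id tracing header. The function names are hypothetical and exist only for illustration; the X-Ray SDKs normally generate and propagate these values for you.

```python
import os
import time


def new_trace_id(now=None):
    """Return an ID shaped like 1-<8 hex digits of epoch seconds>-<24 random hex digits>."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_segment_id():
    """Return a 16-hex-digit ID for a segment (one component's share of the work)."""
    return os.urandom(8).hex()


def build_trace_header(trace_id, parent_id, sampled=True):
    """Format the X-Amzn-Trace-Id value that is passed to a downstream component."""
    return f"Root={trace_id};Parent={parent_id};Sampled={1 if sampled else 0}"


def parse_trace_header(value):
    """Split a tracing header value into a dict of its key=value fields."""
    fields = (part.strip() for part in value.split(";"))
    return dict(field.split("=", 1) for field in fields if field)


# The upstream component creates the IDs and the downstream component reads them back.
trace_id = new_trace_id()
header = build_trace_header(trace_id, new_segment_id())
assert parse_trace_header(header)["Root"] == trace_id
```

Because every component records its segment under the same Root value, the segments can be stitched into one end-to-end view of the request.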
We recommend that you configure and instrument each server or endpoint in your application with X-Ray. You implement X-Ray in your application code by making calls to the X-Ray service. X-Ray also provides SDKs for multiple languages, including instrumented clients that automatically send data to X-Ray. The X-Ray SDKs provide patches to common libraries used for making calls to other services (for example, HTTP, MySQL, PostgreSQL, or MongoDB).
X-Ray provides a daemon that you can install and run on Amazon EC2 and Amazon ECS to relay data to X-Ray. X-Ray creates traces for your application that capture performance data from the servers and containers that serviced the request and that run the X-Ray daemon. X-Ray automatically instruments your calls to AWS services, such as Amazon DynamoDB, as subsegments by patching the AWS SDK. X-Ray can also automatically integrate with Lambda functions.
If your application components make calls to external services where you can't install and configure the X-Ray daemon or instrument the code, you can create subsegments to wrap the calls to those services. X-Ray correlates CloudWatch logs and metrics with your application traces if you are using the AWS X-Ray SDK for Java, which means you can quickly analyze the related metrics and logs for requests.
Deploying the X-Ray daemon to trace applications and services on Amazon EC2
You need to install and run the X-Ray daemon on the EC2 instances that your application components or microservices run on. You can use a user data script to deploy the X-Ray daemon when EC2 instances are provisioned or you can include it in the AMI build process if you create your own AMIs. This can be particularly useful when EC2 instances are ephemeral.
You should use State Manager to ensure that the X-Ray daemon is consistently installed on your EC2 instances. For Amazon EC2 Windows instances, you can use the Systems Manager AWS-RunPowerShellScript document to run the Windows script that downloads and installs the X-Ray daemon. For EC2 instances on Linux, you can use the AWS-RunShellScript document to run the Linux script that downloads and installs the daemon as a service.
You can use the Systems Manager AWS-RunRemoteScript document to run the script in a multi-account environment. You must create an S3 bucket that is accessible from all your accounts, and we recommend creating an S3 bucket with an organization-based bucket policy.
You can also configure State Manager to associate the scripts with EC2 instances that have the X-Ray daemon installed. Because not all of your EC2 instances might require or use X-Ray, you can target the association with instance tags. For example, you can create the State Manager association based on the presence of InstallAWSXRayDaemonWindows or InstallAWSXRayDaemonLinux tags.
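As a rough sketch of the tag-based targeting described above, the following builds the request for a State Manager association that runs the AWS-RunShellScript document on instances that carry the InstallAWSXRayDaemonLinux tag. The helper names and the install commands are hypothetical placeholders; substitute the script that you use to download and install the daemon. The sketch assumes that boto3 is installed and that credentials are configured before create_association is called.

```python
def build_daemon_association(tag_key, install_commands, document="AWS-RunShellScript"):
    """Build create_association arguments that target instances by tag presence."""
    return {
        "Name": document,
        "AssociationName": f"{tag_key}-association",
        # The "tag-key" target key matches any instance that has the tag,
        # regardless of the tag's value.
        "Targets": [{"Key": "tag-key", "Values": [tag_key]}],
        "Parameters": {"commands": list(install_commands)},
    }


def create_association(request):
    """Send the request to Systems Manager (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request can be built and reviewed offline

    return boto3.client("ssm").create_association(**request)


# Placeholder command: replace it with your own daemon install script.
request = build_daemon_association(
    "InstallAWSXRayDaemonLinux",
    ["echo 'replace with the commands that download and install the X-Ray daemon'"],
)
```

For Windows instances, the same helper could be called with the InstallAWSXRayDaemonWindows tag and document="AWS-RunPowerShellScript".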
Deploying the X-Ray daemon to trace applications and services on Amazon ECS or Amazon EKS
You can deploy the X-Ray daemon for container-based workloads on Amazon ECS or Amazon EKS.
For Amazon EKS, you can define the X-Ray daemon in your application's pod definition and then your application can connect to the daemon over localhost on the container port that you specified.
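To illustrate what connecting to the daemon over localhost involves, here is a minimal sketch that sends one segment document to the daemon over UDP. It assumes the daemon's default listener on 127.0.0.1 port 2000 and its documented message format: a JSON header line followed by the segment document. If you specified a different container port, pass it as the port argument. The function names are hypothetical, and the X-Ray SDKs perform this step for you.

```python
import json
import socket

# Header line that precedes each segment document sent to the daemon.
DAEMON_HEADER = json.dumps({"format": "json", "version": 1})


def encode_for_daemon(segment):
    """Return the UDP payload: the header line, a newline, then the segment JSON."""
    return (DAEMON_HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment, host="127.0.0.1", port=2000):
    """Send one segment document to the daemon and return the number of bytes sent."""
    payload = encode_for_daemon(segment)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

Because UDP is connectionless, send_segment doesn't confirm that the daemon is running. Check the daemon's own logs to verify that segments are received and relayed.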
Configuring Lambda to trace requests to X-Ray
Your application might include calls to Lambda functions. You don't need to install the X-Ray daemon for Lambda because the daemon process is fully managed by Lambda and cannot be configured by the user. You can enable tracing for your Lambda function by using the AWS Management Console and selecting the Active tracing option in the function's configuration.
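If you prefer to turn on active tracing programmatically instead of through the console, a sketch along the following lines could work. It assumes that boto3 is installed and that credentials are configured. The helper names and the function name in the example are hypothetical.

```python
def active_tracing_request(function_name):
    """Build update_function_configuration arguments that turn on active tracing."""
    return {"FunctionName": function_name, "TracingConfig": {"Mode": "Active"}}


def enable_active_tracing(function_name):
    """Apply the configuration change (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request can be inspected without AWS access

    client = boto3.client("lambda")
    return client.update_function_configuration(**active_tracing_request(function_name))


# "order-processor" is a placeholder function name.
request = active_tracing_request("order-processor")
```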
For further instrumentation, you can bundle the X-Ray SDK with your Lambda function to record outgoing calls and add annotations or metadata.
Instrumenting your applications for X-Ray
You should evaluate the X-Ray SDK that aligns with your application's programming language and classify all calls that your application makes to other systems. Review the clients provided by the library that you chose and see if the SDK can automatically instrument tracing for your application's requests or responses. Determine if the clients provided by the SDK can be used for other downstream systems. For external systems that your application calls and that you can't instrument with X-Ray, you should create custom subsegments to capture and identify them in your trace information.
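The following sketch shows the idea behind a custom subsegment for an external system: time the call, record what was called, and flag a failure. It builds a plain dictionary that follows the subsegment fields of the X-Ray segment document format (id, name, namespace, start_time, end_time, http, fault) instead of using an X-Ray SDK. The remote_subsegment helper and the example URL are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def remote_subsegment(name, url, method="GET"):
    """Time a call to an external system and describe it as a subsegment dict."""
    subsegment = {
        "id": os.urandom(8).hex(),
        "name": name,
        "namespace": "remote",  # marks a call to a service outside AWS
        "start_time": time.time(),
        "http": {"request": {"url": url, "method": method}},
    }
    try:
        yield subsegment
    except Exception:
        subsegment["fault"] = True  # the wrapped call raised an exception
        raise
    finally:
        subsegment["end_time"] = time.time()


# Wrap the external call; here the response status is filled in by hand.
with remote_subsegment("payments-api", "https://payments.example.com/charge", "POST") as sub:
    sub["http"]["response"] = {"status": 200}
```

In an instrumented application, the subsegment would be attached to the current segment so that the external call appears in the trace alongside the calls that the SDK instruments automatically.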
When you instrument your application, make sure that you create annotations to help you identify and search for requests. For example, your application might use an identifier for customers, such as customer id, or segment different users based on their role in the application.
You can create a maximum of 50 annotations for each trace, but you can create a metadata object containing one or more fields as long as the segment document doesn't exceed 64 kilobytes. You should use annotations selectively to locate requests, and use the metadata object to provide more context that helps you troubleshoot a request after it is located.
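To make the limits above concrete, the following sketch checks a segment document before it is sent. As a simplification, it applies the 50-annotation limit to a single segment document, although the limit in the text applies to the whole trace. The check_segment helper, the annotation keys, and the metadata contents are hypothetical examples.

```python
import json

MAX_ANNOTATIONS = 50
MAX_SEGMENT_BYTES = 64 * 1024  # 64 kilobytes


def check_segment(segment):
    """Return a list of problems found in a segment document (empty if none)."""
    problems = []
    annotations = segment.get("annotations", {})
    if len(annotations) > MAX_ANNOTATIONS:
        problems.append(f"{len(annotations)} annotations exceed the limit of {MAX_ANNOTATIONS}")
    for key, value in annotations.items():
        # Annotations are indexed for search, so values must be simple scalars.
        if not isinstance(value, (str, int, float, bool)):
            problems.append(f"annotation {key!r} must be a string, number, or Boolean")
    size = len(json.dumps(segment).encode("utf-8"))
    if size > MAX_SEGMENT_BYTES:
        problems.append(f"segment document is {size} bytes, over {MAX_SEGMENT_BYTES}")
    return problems


segment = {
    "name": "checkout",
    # A few searchable annotations to locate the request.
    "annotations": {"customer_id": "c-123", "user_role": "admin"},
    # Richer context for troubleshooting goes in metadata.
    "metadata": {"checkout": {"cart_items": ["sku-1", "sku-2"]}},
}
assert check_segment(segment) == []
```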
Configuring the X-Ray sampling rules
By customizing sampling rules, you can control the amount of data that you record and modify the sampling behavior without modifying or redeploying your code. Sampling rules tell the X-Ray SDK how many requests to record for a set of criteria. By default, the X-Ray SDK records the first request each second and five percent of any additional requests. One request per second is the reservoir. This ensures that at least one trace is recorded each second as long as the service is serving requests. Five percent is the rate at which additional requests are sampled beyond the reservoir size.
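The default behavior can be pictured with a small simulation. This is an illustrative model of the reservoir and fixed rate described above, not the SDK's actual implementation. The class name and the injectable random number generator are assumptions made so that the behavior is easy to test.

```python
import random


class DefaultRuleSampler:
    """Model of the default rule: a per-second reservoir plus a fixed rate."""

    def __init__(self, reservoir=1, rate=0.05, rng=None):
        self.reservoir = reservoir  # requests always sampled each second
        self.rate = rate            # fraction of additional requests sampled
        self.rng = rng or random.Random()
        self._second = None
        self._taken = 0

    def should_sample(self, now):
        """Decide whether to record a request that arrives at time now (in seconds)."""
        second = int(now)
        if second != self._second:
            # A new second has started, so the reservoir refills.
            self._second = second
            self._taken = 0
        if self._taken < self.reservoir:
            self._taken += 1
            return True
        # The reservoir is used up, so fall back to the fixed rate.
        return self.rng.random() < self.rate


# Simulate 100 requests per second for 60 seconds.
sampler = DefaultRuleSampler(rng=random.Random(0))
sampled = sum(sampler.should_sample(s + i / 100) for s in range(60) for i in range(100))
```

In this simulation, 60 of the sampled requests come from the reservoir, and roughly five percent of the remaining 5,940 requests are added by the fixed rate. Raising the reservoir or the rate for a critical or low-traffic service increases how many of its requests are recorded.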
You should review and update the default configuration to determine an appropriate value for your account. Your requirements might vary in development, test, performance test, and production environments. You might have applications that require their own sampling rules based on the amount of traffic that they receive or their level of criticality. You should begin with a baseline and regularly re-evaluate whether the baseline meets your requirements.