App Mesh connectivity troubleshooting
Important
End of support notice: On September 30, 2026, AWS will discontinue support for AWS App Mesh. After September 30, 2026, you will no longer be able to access the AWS App Mesh console or AWS App Mesh resources. For more information, visit this blog post Migrating from AWS App Mesh to Amazon ECS Service Connect
This topic details common issues that you may experience with App Mesh connectivity.
Unable to resolve DNS name for a virtual service
Symptoms
Your application is unable to resolve the DNS name of a virtual service that it is attempting to connect to.
Resolution
This is a known issue. For more information, see the Name VirtualServices by any
hostname or FQDNA
record for the virtual service name and the application
can resolve the virtual service name, the request will be proxied by Envoy and routed to its
appropriate destination. To resolve the issue, add a DNS A
record to any
non-loopback IP address, such as 10.10.10.10
, for the virtual service name. The
DNS A
record can be added under the following conditions:
-
In Amazon Route 53, if the name is suffixed by your private hosted zone name
-
Within the application container's
/etc/hosts
file -
In a third-party DNS server that you manage
If your issue is still not resolved, then consider opening a GitHub issue
Unable to connect to a virtual service backend
Symptoms
Your application is unable to establish a connection to a virtual service defined as a
backend on your virtual node. When attempting to establish a connection, the connection may
fail entirely, or the request from the application's perspective may fail with an HTTP
503
response code.
Resolution
If the application fails to connect at all (no HTTP 503
response code returned),
then do the following:
-
Make sure that your compute environment has been set up to work with App Mesh.
-
For Amazon ECS, make sure that you have the appropriate proxy configuration enabled. For an end-to-end walkthrough, see Getting Started with App Mesh and Amazon ECS.
-
For Kubernetes, including Amazon EKS, make sure that you have the latest App Mesh controller installed via Helm. For more information, see App Mesh Controller
on Helm Hub or Tutorial: Configure App Mesh integration with Kubernetes. -
For Amazon EC2, make sure that you have setup your Amazon EC2 instance for proxying App Mesh traffic. For more information, see Update services.
-
-
Make sure that the Envoy container that is running on your compute service has successfully connected to the App Mesh Envoy management service. You can confirm this by checking Envoy stats for the field
control_plane.connected_state
. For more information oncontrol_plane.connected_state
, see Monitor the Envoy Proxy Connectivity in our Troubleshooting Best Practices.If the Envoy was able to establish the connection initially, but later was disconnected and never reconnected, see Envoy disconnected from App Mesh Envoy management service with error text to troubleshoot why it was disconnected.
If the application connects but the request fails with an HTTP 503
response code,
try the following:
-
Make sure that the virtual service you're connecting to exists in the mesh.
-
Make sure that the virtual service has a provider (a virtual router or virtual node).
-
When using Envoy as an HTTP Proxy, if you're seeing egress traffic coming into
cluster.cds_egress_*_mesh-allow-all
instead of the correct destination through Envoy stats, most likely Envoy isn't routing requests properly throughfilter_chains
. This can be a result of using an unqualified virtual service name. We recommend that you use the service discovery name of the actual service as the virtual service name, because Envoy proxy communicates with other virtual services through their names.For more information, see virtual services.
-
Inspect the Envoy proxy logs for any of the following error messages:
-
No healthy upstream
– The virtual node that the Envoy proxy is attempting to route to does not have any resolved endpoints, or it does not have any healthy endpoints. Make sure that the target virtual node has the correct service discovery and health check settings.If requests to the service are failing during a deployment or scaling of the backend virtual service, follow the guidance in Some requests fail with HTTP status code 503 when a virtual service has a virtual node provider.
-
No cluster match for URL
– This is most likely caused when a request is sent to a virtual service that does not match the criteria defined by any of the routes defined under a virtual router provider. Make sure that the requests from the application are sent to a supported route by ensuring the path and HTTP request headers are correct. -
No matching filter chain found
– This is most likely caused when a request is sent to a virtual service on an invalid port. Make sure that the requests from the application are using the same port specified on the virtual router.
-
If your issue is still not resolved, then consider opening a GitHub issue
Unable to connect to an external service
Symptoms
Your application is unable to connect to a service outside of the mesh, such as
amazon.com
.
Resolution
By default, App Mesh does not allow outbound traffic from applications within the mesh to any destination outside of the mesh. To enable communication with an external service, there are two options:
-
Set the outbound filter on the mesh resource to
ALLOW_ALL
. This setting will allow any application within the mesh to communicate with any destination IP address inside or outside of the mesh. -
Model the external service in the mesh using a virtual service, virtual router, route, and virtual node. For example, to model the external service
example.com
, you can create a virtual service namedexample.com
with a virtual router and route that sends all traffic to a virtual node with a DNS service discovery hostname ofexample.com
.
If your issue is still not resolved, then consider opening a GitHub issue
Unable to connect to a MySQL or SMTP server
Symptoms
When allowing outbound traffic to all destinations (Mesh EgressFilter
type
=ALLOW_ALL
), such as an SMTP server or a MySQL database using a
virtual node definition, the connection from your application fails. As an example, the
following is an error message from attempting to connect to a MySQL server.
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
Resolution
This is a known issue that is resolved by using App Mesh image version 1.15.0 or later. For
more information, see the Unable to connect to MySQL with App Mesh
To work around this issue depending on your version of Envoy:
-
If your App Mesh image Envoy version is 1.15.0 or later, do not model external services such as MySQL, SMTP, MSSQL, etc. as a backend for your application's virtual node.
-
If your App Mesh image Envoy version is prior to 1.15.0, add port
3306
to the list of values for theAPPMESH_EGRESS_IGNORED_PORTS
in your services for MySQL and as the port you are using for STMP.
Important
While the standard SMTP ports are 25
, 587
, and 465
,
you should only add the port you are using to APPMESH_EGRESS_IGNORED_PORTS
and not
all three.
For more information, see Update services for Kubernetes , Update services for Amazon ECS, or Update services for Amazon EC2.
If your issue is still not resolved, then you can provide us with details on what you're
experiencing using the existing GitHub issue
Unable to connect to a service modeled as a TCP virtual node or virtual router in App Mesh
Symptoms
Your application is unable to connect to a backend that uses the TCP protocol setting in the App Mesh PortMapping definition.
Resolution
This is a known issue. For more information, see Routing to multiple TCP
destinations on the same port
-
Make sure that all destinations are using a unique port. If you are using a virtual router provider for the backend virtual service, you can change the virtual router port without changing the port on the virtual nodes that it routes to. This allows the applications to open connections on the virtual router port while the Envoy proxy continues to use the port defined in the virtual node.
-
If the destination modeled as TCP is a MySQL server, or any other TCP-based protocol in which the server sends the first packets after connection, see Unable to connect to a MySQL or SMTP server.
If your issue is still not resolved, then you can provide us with details on what you're
experiencing using the existing GitHub issue
Connectivity succeeds to service not listed as a virtual service backend for a virtual node
Symptoms
Your application is able to connect and send traffic to a destination that is not specified as a virtual service backend on your virtual node.
Resolution
If requests are succeeding to a destination that has not been modeled in the App Mesh APIs,
then the most likely cause is that the mesh's outbound filter type has
been set to ALLOW_ALL
. When the outbound filter is set to ALLOW_ALL
,
an outbound request from your application that does not match a modeled destination (backend)
will be sent to the destination IP address set by the application.
If you want to disallow traffic to destinations not modeled in the mesh, consider setting the
outbound filter value to DROP_ALL
.
Note
Setting the mesh outbound filter value affects all virtual nodes within the mesh.
Configuring egress_filter
as DROP_ALL
and enabling TLS isn't
available for outbound traffic that isn't to an AWS domain.
If your issue is still not resolved, then consider opening a GitHub issue
Some requests fail with HTTP status code
503
when a virtual service has a virtual node provider
Symptoms
A portion of your application's requests fail to a virtual service backend that is using a virtual node provider instead of a virtual router provider. When using a virtual router provider for the virtual service, requests do not fail.
Resolution
This is a known issue. For more information, see Retry policy on Virtual Node
provider for a Virtual Service
To reduce request failures to virtual node providers, use a virtual router provider instead, and specify a retry policy on its routes. For other ways to reduce request failures to your applications, see App Mesh best practices.
If your issue is still not resolved, then consider opening a GitHub issue
Unable to connect to an Amazon EFS filesystem
Symptoms
When configuring an Amazon ECS task with an Amazon EFS filesystem as a volume, the task fails to start with the following error.
ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: mount.nfs4: Connection refused : unsuccessful EFS utils command execution; code: 32
Resolution
This is a known issue. This error occurs because the NFS connection to Amazon EFS occurs before
any containers in your task are started. This traffic is routed by the proxy configuration to
Envoy, which will not be running at this point. Because of the ordering of startup, the NFS
client fails to connecting to the Amazon EFS filesystem and the task fails to launch. To resolve the
issue, add port 2049
to the list of values for the EgressIgnoredPorts
setting in the proxy configuration of your Amazon ECS task definition. For more information, see
Proxy configuration.
If your issue is still not resolved, then consider opening a GitHub issue
Connectivity succeeds to service, but the incoming request does not appear in access logs, traces, or metrics for Envoy
Symptoms
Even though your application can connect and send requests to another application, you either can not see incoming requests in the access logs or in tracing information for the Envoy proxy.
Resolution
This is a known issue. From more information, see iptables rules setup
If your issue is still not resolved, then consider opening a GitHub issue
Setting the HTTP_PROXY
/HTTPS_PROXY
environment variables at container level doesn't work as expected.
Symptoms
When HTTP_PROXY/HTTPS_PROXY is set as an environment variable at the:
-
App container in the task definition with App Mesh enabled, requests being sent to the namespace of the App Mesh services will get
HTTP 500
error responses from the Envoy sidecar. -
Envoy container in task definition with App Mesh enabled, requests coming out of Envoy sidecar will not go through the
HTTP
/HTTPS
proxy server, and the environment variable will not work.
Resolution
For the app container:
App Mesh functions by having traffic within your task go through the Envoy proxy.
HTTP_PROXY
/HTTPS_PROXY
configuration overrides this behavior by
configuring container traffic to go through a different external proxy. The traffic will still
be intercepted by Envoy, but it doesn't support proxying the mesh traffic using an external
proxy.
If you want to proxy all non-mesh traffic, please set NO_PROXY
to include your
mesh's CIDR/namespace, localhost, and the credential's endpoints like in the following
example.
NO_PROXY=localhost,127.0.0.1,169.254.169.254,169.254.170.2,10.0.0.0/16
For the Envoy container:
Envoy doesn't support a generic proxy. We do not recommend setting these variables.
If your issue is still not resolved, then consider opening a GitHub issue
Upstream request timeouts even after setting the timeout for routes.
Symptoms
You defined the timeout for:
-
The routes, but you are still getting an upstream request timeout error.
-
The virtual node listener and the timeout and the retry timeout for the routes, but you are still getting an upstream request timeout error.
Resolution
For the high latency requests greater than 15 seconds to complete successfully, you need to specify a timeout at both the route and virtual node listener level.
If you specify a route timeout that is greater than the default 15 seconds, make sure that the timeout is also specified for the listener for all participating virtual nodes. However, if you decrease the timeout to a value that is lower than the default, it's optional to update the timeouts at virtual nodes. For more information about options when setting up virtual nodes and routes, see virtual nodes and routes.
If you specified a retry policy, the duration that you specify for the request timeout should always be greater than or equal to the retry timeout multiplied by the max retries that you defined in the retry policy. This allows your request with all the retries to complete successfully. For more information, see routes.
If your issue is still not resolved, then consider opening a GitHub issue
Envoy responds with HTTP Bad request.
Symptoms
Envoy responds with HTTP 400 Bad request for all requests sent through the Network Load Balancer (NLB). When we check the Envoy logs, we see:
-
Debug logs:
dispatch error: http/1.1 protocol error: HPE_INVALID_METHOD
-
Access logs:
"- - HTTP/1.1" 400 DPE 0 11 0 - "-" "-" "-" "-" "-"
Resolution
The resolution is to disable the proxy protocol version 2 (PPv2) on your NLB's target group attributes.
As of today the PPv2 is not supported by virtual gateway and virtual node Envoy that are run
using the App Mesh control plane. If you deploy NLB using AWS load balancer controller on
Kubernetes, then disable PPv2 by setting the following attribute to false
:
service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: proxy_protocol_v2.enabled
See AWS Load Balancer Controller Annotations
If your issue is still not resolved, then consider opening a GitHub issue
Unable to configure timeout properly.
Symptoms
Your request timeouts within 15 seconds even after configuring the timeout on the virtual node listener and the timeout on the route towards virtual node backend.
Resolution
Make sure that the correct virtual service is included under the backend list.
If your issue is still not resolved, then consider opening a GitHub issue