Using Spark Connect with AWS Glue interactive sessions
Apache Spark Connect
Spark Connect is supported natively in AWS Glue version 5.1 and later. You can connect to an AWS Glue interactive session directly from an environment that supports the PySpark remote() API.
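As a minimal sketch of that connection, the following Python assumes you already have the Spark Connect endpoint URL for your session. The environment variable name `GLUE_SPARK_CONNECT_URL` and the helper names are hypothetical and chosen only for illustration. `SparkSession.builder.remote()` is the standard PySpark entry point for Spark Connect, and the `sc://` scheme check mirrors what the PySpark client itself expects in a connection string.

```python
import os
from urllib.parse import urlparse


def validate_connect_url(url: str) -> str:
    """Return the URL unchanged if it looks like a Spark Connect connection string."""
    parsed = urlparse(url)
    # The PySpark Spark Connect client only accepts the "sc" scheme.
    if parsed.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("the connection URL has no host")
    return url


def open_session(url: str):
    """Open a remote Spark session against the given endpoint URL."""
    # Imported lazily so the URL helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(validate_connect_url(url)).getOrCreate()


if __name__ == "__main__":
    # Hypothetical variable name; supply the endpoint URL however suits your setup.
    endpoint = os.environ.get("GLUE_SPARK_CONNECT_URL")
    if endpoint:
        spark = open_session(endpoint)
        spark.range(5).show()  # small remote job to confirm the connection works
    else:
        print("Set GLUE_SPARK_CONNECT_URL to try the connection.")
```

Validating the URL before calling `remote()` surfaces a malformed endpoint as a clear local error instead of a connection failure.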
Comparing session types: Livy and Spark Connect
AWS Glue interactive sessions support two session types. The following table compares Livy-based sessions and Spark Connect sessions.
| Feature | Livy | Spark Connect |
|---|---|---|
| Protocol | REST | gRPC (to send logical execution plans) + Apache Arrow (to stream results) |
| Connection method | Statement APIs | Direct connection through endpoint URL using PySpark |
| Client requirement | | PySpark with Spark Connect support |
| IDE support | Through Jupyter with SparkMagic kernel | Notebooks on SageMaker Unified Studio or IDEs with Python interpreters like VS Code, PyCharm, and others |
When to use Spark Connect
Use Spark Connect sessions when you need direct, programmatic access to an AWS Glue interactive session from your development environment. The following are common use cases:
- Notebooks in SageMaker Unified Studio – Connect to AWS Glue interactive sessions directly from your notebook environment for interactive data exploration.
- IDEs such as VS Code or PyCharm – Use PySpark from your preferred IDE to develop and test Spark applications against a remote AWS Glue cluster.
- Python scripts and applications – Access AWS Glue interactive sessions programmatically from a Python application that uses the PySpark remote() API.
Region availability
AWS Glue interactive sessions with Spark Connect are available in the following AWS Regions:
Asia Pacific (Mumbai)
Asia Pacific (Seoul)
Asia Pacific (Singapore)
Asia Pacific (Sydney)
Asia Pacific (Tokyo)
Canada (Central)
Europe (Frankfurt)
Europe (Ireland)
Europe (London)
Europe (Paris)
Europe (Stockholm)
South America (São Paulo)
US East (Ohio)
US East (N. Virginia)
US West (Oregon)
Considerations and limitations
Consider the following when you use Spark Connect with AWS Glue interactive sessions:
- Spark Connect is available for AWS Glue interactive sessions running AWS Glue version 5.1 and later.
- Statement APIs (RunStatement, CancelStatement, GetStatement, and ListStatements) are not supported for Spark Connect sessions. You interact with the session directly through the PySpark client.
- You can't change the session type after you create a session. To switch between Livy and Spark Connect, you must create a new session.
- Spark Connect is not supported on AWS Glue Studio. For interactive development using AWS Glue, you can use Notebooks in SageMaker Unified Studio or your preferred IDEs with Python interpreters.
- Fine-grained access control through Lake Formation is not supported for Spark Connect sessions.
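
The Region list and the version requirement above can be checked before you create a session. The following is a sketch only: the function and constant names are hypothetical, the Region codes are the standard AWS codes for the Regions named on this page, and the table is a snapshot that may go out of date as availability changes.

```python
from typing import List, Tuple

# Snapshot of the Regions listed on this page, keyed by standard AWS Region code.
SPARK_CONNECT_REGIONS = {
    "ap-south-1": "Asia Pacific (Mumbai)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ca-central-1": "Canada (Central)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
    "eu-west-3": "Europe (Paris)",
    "eu-north-1": "Europe (Stockholm)",
    "sa-east-1": "South America (São Paulo)",
    "us-east-2": "US East (Ohio)",
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
}

# Spark Connect requires AWS Glue version 5.1 or later.
MIN_GLUE_VERSION = (5, 1)


def parse_glue_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "5.1" into a comparable tuple (5, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def spark_connect_problems(region: str, glue_version: str) -> List[str]:
    """Return the reasons a Spark Connect session would not be available (empty if none)."""
    problems = []
    if region not in SPARK_CONNECT_REGIONS:
        problems.append(f"Region {region} is not in the listed Regions")
    if parse_glue_version(glue_version) < MIN_GLUE_VERSION:
        problems.append(f"AWS Glue version {glue_version} is earlier than 5.1")
    return problems


if __name__ == "__main__":
    for issue in spark_connect_problems("us-west-1", "5.0"):
        print(issue)
```

Returning a list of problems rather than a single boolean lets a caller report every unmet requirement at once.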