Welcome to the Amazon Redshift Cluster Management Guide. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift offers fast query performance when you analyze data sets of virtually any size, using the same SQL-based tools and business intelligence applications that you use today. With a few clicks in the AWS Management Console, you can launch an Amazon Redshift cluster, starting with a few hundred gigabytes of data and scaling to a petabyte or more.
Your first step in creating a data warehouse is to launch a set of compute nodes, called an Amazon Redshift cluster. The number and type of compute nodes that you need depend on the size of your data, the number of queries you will run, and the query performance you need. Each cluster that you provision is a fully managed Amazon Redshift data warehouse. You can use the Amazon Redshift console, API, or CLI to create and manage clusters.
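As a minimal sketch of creating a cluster through the API, the Python function below assembles the keyword arguments for the `create_cluster` operation of a boto3 (AWS SDK for Python) Redshift client. The function name, cluster identifier, node type (`dc2.large`), user name, and password are illustrative assumptions rather than values from this guide, and the real call is left as a comment so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, Optional


def build_create_cluster_args(
    cluster_id: str,
    node_type: str,
    num_nodes: int,
    master_user: str,
    master_password: str,
    db_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for a boto3 Redshift create_cluster call."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    args: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
    }
    if num_nodes == 1:
        args["ClusterType"] = "single-node"
    else:
        # NumberOfNodes is sent only for multi-node clusters.
        args["ClusterType"] = "multi-node"
        args["NumberOfNodes"] = num_nodes
    if db_name is not None:
        # Without DBName, the service names the initial database "dev".
        args["DBName"] = db_name
    return args


if __name__ == "__main__":
    # Hypothetical values; choose a node type and password that suit your account.
    args = build_create_cluster_args(
        "examplecluster", "dc2.large", 2, "awsuser", "Example-Passw0rd"
    )
    print(args)
    # With credentials configured, the request could be sent like this:
    # import boto3
    # boto3.client("redshift").create_cluster(**args)
```

Keeping request construction separate from the network call makes it easy to check the node count and cluster type before anything is provisioned.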
By default, Amazon Redshift creates one database when you create a cluster, and you can create additional databases as needed. After your cluster has been provisioned, you can upload your data set and then run analysis queries by using the SQL-based tools and business intelligence applications that you are already familiar with. Amazon Redshift is designed to deliver high query performance regardless of the size of the data set.
Amazon Redshift manages all the work of setting up, operating, and scaling a data warehouse: provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine. You can focus on using your data to acquire new insights for your business and customers.
Cluster management involves the following operations:
Create and manage clusters – Depending on your data warehousing needs, you can start with a single XL node and scale up to 100 8XL nodes as your requirements change. You can monitor the performance of your data warehouse and, if needed, add or remove compute nodes without any interruption to the service. For more information, see Amazon Redshift Clusters.
If you intend to keep your cluster running for a year or longer, you can save money by reserving compute nodes for a one-year or three-year period. Reserving compute nodes offers significant savings compared to the hourly rates that you pay when you provision compute nodes on demand. For more information, see Purchasing Amazon Redshift Reserved Nodes.
Create and manage cluster security groups – By default, any cluster that you create is closed to everyone. To enable access to your cluster, you create a security group and associate it with your cluster. You then add rules to the security group that grant explicit inbound access to a specific range of CIDR/IP addresses, or to an Amazon Elastic Compute Cloud (Amazon EC2) security group if your SQL client runs on an EC2 instance. For more information, see Amazon Redshift Cluster Security Groups.
Create and manage parameter groups – When you create an Amazon Redshift cluster, you associate a parameter group with it. The parameters in this group, such as the date presentation style and floating-point precision, apply to all the databases that you create on the cluster. A default parameter group for the Amazon Redshift engine defines preset values for the parameters. If your application requires different settings, you can create your own parameter group. For more information, see Amazon Redshift Parameter Groups.
Manage snapshots – To help protect against data loss, Amazon Redshift continuously backs up your data to an Amazon Simple Storage Service (Amazon S3) bucket. You can also create your own point-in-time backups of your cluster. These automated and manual backups are called snapshots, and you can restore a cluster to the state it was in when a snapshot was taken. For more information, see Amazon Redshift Snapshots.
Monitor cluster performance – Amazon Redshift collects metrics that you can use to track the health and performance of your clusters and queries. You can set up alarms that notify you when one or more metrics are outside an acceptable range. You can use the Amazon Redshift console to directly access the most common cluster performance metrics and to look at the resource utilization of individual queries. You can also view cluster performance metrics by using Amazon CloudWatch. For more information, see Monitoring Amazon Redshift Cluster Performance.
Control access to your Amazon Redshift resources – The AWS account that creates the cluster has full access to the cluster. Within your AWS account, you can use the AWS Identity and Access Management (IAM) service to create users and manage permissions for them. By using IAM, you can grant different users permission to perform only the cluster operations that are necessary for their work. IAM controls access only to the Amazon Redshift API; it does not control access to the databases on the cluster through JDBC or ODBC connections. For more information, see Controlling Access to Amazon Redshift Resources.
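The operations above can also be scripted. The following Python sketch assumes boto3-style `redshift` and `cloudwatch` clients and takes them as arguments, so a hypothetical `RecordingClient` stand-in can exercise it without AWS access; pass `boto3.client("redshift")` and `boto3.client("cloudwatch")` instead to make real calls. The helper names, group and cluster identifiers, parameter values, and the 80 percent threshold are assumptions chosen for illustration. The cluster security group calls follow the security group model described above; clusters launched in a VPC use VPC security groups instead.

```python
from typing import Any, Dict, List, Tuple


def open_to_cidr(redshift: Any, group: str, cidr: str) -> None:
    # Clusters start closed: create a cluster security group, then allow
    # inbound connections from one CIDR/IP range.
    redshift.create_cluster_security_group(
        ClusterSecurityGroupName=group, Description="Client access"
    )
    redshift.authorize_cluster_security_group_ingress(
        ClusterSecurityGroupName=group, CIDRIP=cidr
    )


def create_parameter_group(redshift: Any, name: str) -> None:
    # Override two engine defaults: date presentation style and the number
    # of extra digits shown for floating-point values.
    redshift.create_cluster_parameter_group(
        ParameterGroupName=name,
        ParameterGroupFamily="redshift-1.0",
        Description="Custom date style and float precision",
    )
    redshift.modify_cluster_parameter_group(
        ParameterGroupName=name,
        Parameters=[
            {"ParameterName": "datestyle", "ParameterValue": "ISO, DMY"},
            {"ParameterName": "extra_float_digits", "ParameterValue": "2"},
        ],
    )


def take_snapshot(redshift: Any, cluster_id: str, snapshot_id: str) -> None:
    # A manual snapshot, kept alongside the automated ones.
    redshift.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id, ClusterIdentifier=cluster_id
    )


def set_node_count(redshift: Any, cluster_id: str, nodes: int) -> None:
    # Ask the service to resize the cluster to `nodes` compute nodes.
    redshift.modify_cluster(ClusterIdentifier=cluster_id, NumberOfNodes=nodes)


def alarm_on_disk_usage(cloudwatch: Any, cluster_id: str, percent: float) -> None:
    # Alarm when the 5-minute average of disk usage exceeds `percent`.
    # Add AlarmActions (for example, an SNS topic ARN) to be notified.
    cloudwatch.put_metric_alarm(
        AlarmName=f"{cluster_id}-disk-usage",
        Namespace="AWS/Redshift",
        MetricName="PercentageDiskSpaceUsed",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": cluster_id}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=percent,
        ComparisonOperator="GreaterThanThreshold",
    )


# IAM policy document permitting two Amazon Redshift API actions only.
# It has no effect on database logins over JDBC or ODBC.
DESCRIBE_AND_SNAPSHOT_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["redshift:DescribeClusters", "redshift:CreateClusterSnapshot"],
            "Resource": "*",
        }
    ],
}


class RecordingClient:
    """Hypothetical stand-in for a boto3 client: records each call it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def __getattr__(self, name: str) -> Any:
        def record(**kwargs: Any) -> Dict[str, Any]:
            self.calls.append((name, kwargs))
            return {}

        return record


redshift, cloudwatch = RecordingClient(), RecordingClient()
open_to_cidr(redshift, "examplesg", "192.0.2.0/24")
create_parameter_group(redshift, "example-params")
take_snapshot(redshift, "examplecluster", "examplecluster-manual-1")
set_node_count(redshift, "examplecluster", 4)
alarm_on_disk_usage(cloudwatch, "examplecluster", 80.0)
print([name for name, _ in redshift.calls])
```

Passing the clients in, rather than creating them inside each helper, keeps the functions testable and lets you choose the Region and credentials in one place.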
If you are a first-time user of Amazon Redshift, we recommend that you begin by reading the following sections:
Service Highlights and Pricing – The product detail page provides the Amazon Redshift value proposition, service highlights, and pricing.
Getting Started – The Getting Started Guide includes an example that walks you through the process of creating a cluster, creating database tables, uploading data, and testing queries.
After you complete the Getting Started Guide, we recommend that you explore one of the following guides:
Amazon Redshift Cluster Management Guide (this document) – This guide shows you how to create and manage Amazon Redshift clusters. For more information, see the following Cluster Management Overview section.
If you are an application developer, you can use the Amazon Redshift Query API to manage clusters programmatically. Additionally, the AWS SDK libraries that wrap the underlying Amazon Redshift API simplify your programming tasks. If you prefer a more interactive way of managing clusters, you can use the Amazon Redshift console and the AWS Command Line Interface (AWS CLI). For information about the API and CLI, go to the following manuals:
Amazon Redshift Database Developer Guide – If you are a database developer, the Amazon Redshift Database Developer Guide explains how to design, build, query, and maintain the databases that make up your data warehouse.
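For the programmatic route mentioned under the Cluster Management Guide entry, the following Python sketch lists clusters with a boto3 Redshift client. It uses the client's `describe_clusters` paginator so that results spread across several pages are all collected. The `FakeRedshift` and `FakePaginator` classes are hypothetical stand-ins that let the example run offline, and the cluster data they return is invented for illustration.

```python
from typing import Any, Dict, Iterator, List


def cluster_summaries(redshift: Any) -> List[Dict[str, Any]]:
    """Return identifier, status, node type, and node count for each cluster."""
    summaries: List[Dict[str, Any]] = []
    # The paginator requests successive pages of DescribeClusters results.
    for page in redshift.get_paginator("describe_clusters").paginate():
        for cluster in page["Clusters"]:
            summaries.append(
                {
                    "id": cluster["ClusterIdentifier"],
                    "status": cluster["ClusterStatus"],
                    "node_type": cluster["NodeType"],
                    "nodes": cluster["NumberOfNodes"],
                }
            )
    return summaries


class FakePaginator:
    """Hypothetical stand-in that replays canned response pages."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.pages = pages

    def paginate(self) -> Iterator[Dict[str, Any]]:
        return iter(self.pages)


class FakeRedshift:
    """Hypothetical stand-in for boto3.client("redshift")."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.pages = pages

    def get_paginator(self, operation: str) -> FakePaginator:
        return FakePaginator(self.pages)


def _cluster(cid: str, status: str, nodes: int) -> Dict[str, Any]:
    # Minimal invented record with only the fields the sketch reads.
    return {
        "ClusterIdentifier": cid,
        "ClusterStatus": status,
        "NodeType": "dc2.large",
        "NumberOfNodes": nodes,
    }


fake = FakeRedshift(
    [
        {"Clusters": [_cluster("sales-dw", "available", 2)]},
        {"Clusters": [_cluster("test-dw", "creating", 1)]},
    ]
)
for summary in cluster_summaries(fake):
    print(summary)
# Against AWS: cluster_summaries(boto3.client("redshift"))
```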