Monitoring FSx for ONTAP file systems using Harvest and Grafana

NetApp Harvest is an open source tool for gathering performance and capacity metrics from ONTAP systems, and is compatible with FSx for ONTAP. You can use Harvest with Grafana for an open source monitoring solution.

Getting started with Harvest and Grafana

The following section details how you can set up and configure Harvest and Grafana to measure your FSx for ONTAP file system’s performance and storage capacity utilization.

You can monitor your Amazon FSx for NetApp ONTAP file system by using Harvest and Grafana. NetApp Harvest monitors ONTAP data centers by collecting performance, capacity, and hardware metrics from FSx for ONTAP file systems. Grafana provides a dashboard where the collected Harvest metrics can be displayed.

Supported Harvest dashboards

Amazon FSx for NetApp ONTAP exposes a different set of metrics than on-premises NetApp ONTAP does. Therefore, only the following out-of-the-box Harvest dashboards tagged with fsx are currently supported for use with FSx for ONTAP. Some panels in these dashboards may be empty because the underlying metrics are not supported.

  • ONTAP: Compliance

  • ONTAP: Data Protection Snapshots

  • ONTAP: Security

  • ONTAP: SVM

  • ONTAP: Volume

AWS CloudFormation template

To get started, you can deploy an AWS CloudFormation template that automatically launches an Amazon EC2 instance running Harvest and Grafana. As an input to the AWS CloudFormation template, you specify the fsxadmin user and the Amazon FSx management endpoint for the file system which will be added as part of this deployment. After the deployment is completed, you can log in to the Grafana dashboard to monitor your file system.

This solution uses AWS CloudFormation to automate the deployment of the Harvest and Grafana solution. The template creates an Amazon EC2 Linux instance and installs Harvest and Grafana software. To use this solution, download the fsx-ontap-harvest-grafana.template AWS CloudFormation template.

Note

Implementing this solution incurs billing for the associated AWS services. For more information, see the pricing details pages for those services.

Amazon EC2 instance types

When configuring the template, you provide the Amazon EC2 instance type. NetApp's recommendation for the instance size depends on how many file systems you monitor and the number of metrics you choose to collect. With the default configuration, for each 10 file systems you monitor, NetApp recommends:

  • CPU: 2 cores

  • Memory: 1 GB

  • Disk: 500 MB (mostly used by log files)

Following are some sample configurations and the t3 instance type you might choose.

File systems    CPU        Disk       Instance type
Under 10        2 cores    500 MB     t3.micro
10–40           4 cores    1000 MB    t3.xlarge
40+             8 cores    2000 MB    t3.2xlarge

For more information on Amazon EC2 instance types, see General purpose instances in the Amazon EC2 User Guide.
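The sample configurations above can be expressed as a small lookup. The following is a minimal sketch, not part of the solution itself: the function name suggest_instance_type is hypothetical, and because the table lists 40 in two rows, the sketch assumes that exactly 40 file systems falls in the 10–40 row.

```shell
# Suggest a t3 instance type from the number of monitored file systems,
# following the sample configurations in this section.
# Assumption: exactly 40 file systems is treated as part of the 10-40 row.
suggest_instance_type() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: suggest_instance_type <number of file systems>" >&2
      return 1
      ;;
  esac
  if [ "$1" -lt 10 ]; then
    echo "t3.micro"
  elif [ "$1" -le 40 ]; then
    echo "t3.xlarge"
  else
    echo "t3.2xlarge"
  fi
}
```

For example, suggest_instance_type 25 prints t3.xlarge. Treat the result as a starting point only, because the recommendation also depends on how many metrics you collect.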

Instance port rules

When you set up your Amazon EC2 instance, make sure that ports 3000 and 9090 are open for inbound traffic in the security group that the Harvest and Grafana instance belongs to. Because the instance connects to an endpoint over HTTPS, it must be able to resolve the endpoint, which requires port 53 (TCP and UDP) for DNS. To reach the endpoint, the instance also needs port 443 (TCP) for HTTPS and internet access.
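If you manage security groups from the AWS CLI, the inbound rules for ports 3000 and 9090 might look like the following sketch. The function name print_ingress_commands is hypothetical, and the security group ID and client CIDR range are placeholders that you supply. The function only prints the commands so that you can review them before running anything.

```shell
# Print (do not run) the AWS CLI calls that would open the two inbound
# ports named in this section on a security group, for one client CIDR.
# $1 = security group ID (placeholder), $2 = client CIDR range (placeholder).
print_ingress_commands() {
  for port in 3000 9090; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$1" "$port" "$2"
  done
}

# Example (prints two commands for review):
#   print_ingress_commands sg-0123456789abcdef0 203.0.113.0/24
```

Restricting the CIDR range to the clients that need the dashboards is usually preferable to opening the ports to all addresses.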

Deployment procedure

The following procedure configures and deploys the Harvest/Grafana solution. It takes about five minutes to deploy. Before you start, you must have an FSx for ONTAP file system running in an Amazon Virtual Private Cloud (Amazon VPC) in your AWS account, and the parameter information for the template listed below. For more information on creating a file system, see Creating file systems.

To launch the Harvest/Grafana solution stack
  1. Download the fsx-ontap-harvest-grafana.template AWS CloudFormation template. For more information on creating an AWS CloudFormation stack, see Creating a stack on the AWS CloudFormation console in the AWS CloudFormation User Guide.

    Note

    By default, this template launches in the US East (N. Virginia) AWS Region. You must launch this solution in an AWS Region where Amazon FSx is available. For more information, see Amazon FSx endpoints and quotas in the AWS General Reference.

  2. For Parameters, review the parameters for the template and modify them for the needs of your file system. This solution uses the following default values.

    Parameter: InstanceType
    Default: t3.micro
    Description: The Amazon EC2 instance type. Following are the t3 instance types.

    • t3.micro

    • t3.small

    • t3.medium

    • t3.large

    • t3.xlarge

    • t3.2xlarge

    For the complete list of allowed Amazon EC2 instance type values for this parameter, see the fsx-ontap-harvest-grafana.template.

    Parameter: KeyPair
    Default: No default value
    Description: The key pair that is used to access the Amazon EC2 instance.

    Parameter: SecurityGroup
    Default: No default value
    Description: The security group ID for the Harvest/Grafana instance. Ensure inbound ports 3000 and 9090, in addition to ports 53 and 443, are open from the clients you wish to use to access your Grafana dashboard.

    Parameter: Subnet Type
    Default: No default value
    Description: Specify the subnet type, either public or private. Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won't be connected to the internet. For more information, see Subnet types in the Amazon VPC User Guide.

    Parameter: Subnet
    Default: No default value
    Description: Specify the same subnet as your Amazon FSx for NetApp ONTAP file system's preferred subnet. You can find the file system's Preferred subnet ID in the Amazon FSx console, in the Network & security tab of the FSx for ONTAP file system details page.

    Parameter: LatestLinuxAmiId
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
    Description: The latest version of the Amazon Linux 2 AMI in a given AWS Region.

    Parameter: FSxEndPoint
    Default: No default value
    Description: The file system's Management endpoint IP address. You can find the file system's management endpoint IP address in the Amazon FSx console, in the Administration tab of the FSx for ONTAP file system details page.

    Parameter: SecretName
    Default: No default value
    Description: AWS Secrets Manager secret name containing the password for the file system's fsxadmin user. This is the password you provided when you created the file system.
  3. Choose Next.

  4. For Options, choose Next.

  5. For Review, review and confirm the settings. You must select the check box acknowledging that the template creates IAM resources.

  6. Choose Create to deploy the stack.

You can view the status of the stack in the AWS CloudFormation console in the Status column. You should see a status of CREATE_COMPLETE in about five minutes.
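The console procedure above could also be approximated from the AWS CLI. The following sketch prints an aws cloudformation create-stack command for review rather than running it. Several details are assumptions: the function name is hypothetical, the template file is assumed to be in the current directory, SubnetType is a guess at the key behind the Subnet Type parameter, and CAPABILITY_IAM is assumed to be the capability that corresponds to the IAM acknowledgement check box. Check the parameter keys in the template before using the output.

```shell
# Print an `aws cloudformation create-stack` command for the template.
# Arguments: $1 stack name, $2 key pair, $3 security group ID,
#            $4 subnet type, $5 subnet ID, $6 management endpoint IP,
#            $7 Secrets Manager secret name.
# Assumptions: "SubnetType" as a parameter key and CAPABILITY_IAM are not
# confirmed by this guide; LatestLinuxAmiId is left at its default.
print_create_stack_command() {
  cat <<EOF
aws cloudformation create-stack \\
  --stack-name $1 \\
  --template-body file://fsx-ontap-harvest-grafana.template \\
  --capabilities CAPABILITY_IAM \\
  --parameters \\
    ParameterKey=InstanceType,ParameterValue=t3.micro \\
    ParameterKey=KeyPair,ParameterValue=$2 \\
    ParameterKey=SecurityGroup,ParameterValue=$3 \\
    ParameterKey=SubnetType,ParameterValue=$4 \\
    ParameterKey=Subnet,ParameterValue=$5 \\
    ParameterKey=FSxEndPoint,ParameterValue=$6 \\
    ParameterKey=SecretName,ParameterValue=$7
EOF
}
```

After you create a stack this way, aws cloudformation wait stack-create-complete --stack-name followed by your stack name blocks until the stack reaches CREATE_COMPLETE or fails.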

Logging in to Grafana

After the deployment has finished, use your browser to log in to the Grafana dashboard at the IP and port 3000 of the Amazon EC2 instance:

http://EC2_instance_IP:3000

When prompted, use the Grafana default user name (admin) and password (pass). We recommend that you change your password as soon as you log in.
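Before you open the browser, you may want to confirm that Grafana is answering on port 3000. The following sketch builds the URL shown above and probes Grafana's /api/health endpoint with curl. The function names are hypothetical, and the check assumes that curl is installed and that your client can reach the instance on port 3000.

```shell
# Build the Grafana URL for an instance IP address or DNS name.
grafana_url() {
  printf 'http://%s:3000\n' "$1"
}

# Return 0 if Grafana answers its health endpoint within 5 seconds,
# and a nonzero status otherwise.
grafana_is_up() {
  curl --silent --fail --max-time 5 "$(grafana_url "$1")/api/health" >/dev/null
}

# Example, with a placeholder address:
#   grafana_is_up 198.51.100.7 && echo "Grafana is reachable"
```

A failed check most often points to the security group rules or to containers that are not running, both of which are covered elsewhere in this topic.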

For more information, see the NetApp Harvest page on GitHub.

Troubleshooting Harvest and Grafana

If data is missing from the Harvest and Grafana dashboards, or if you are having trouble setting up Harvest and Grafana with FSx for ONTAP, check the following topics for a potential solution.

SVM and volume dashboards are blank

If the AWS CloudFormation stack deployed successfully and you can reach Grafana, but the SVM and volume dashboards are blank, use the following procedure to troubleshoot your environment. You will need SSH access to the Amazon EC2 instance that Harvest and Grafana are deployed on.

  1. SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.

    [~]$ ssh ec2-user@ec2_ip_address
  2. Use the following command to open the harvest.yml file and:

    • Verify that an entry was created for your FSx for ONTAP instance as Cluster-2.

    • Verify that the entries for username and password match your fsxadmin credentials.

    [ec2-user@ip-ec2_ip_address ~]$ sudo cat /home/ec2-user/harvest_install/harvest/harvest.yml
  3. If the password field is blank, open the file in an editor and update it with the fsxadmin password, as follows:

    [ec2-user@ip-ec2_ip_address ~]$ sudo vi /home/ec2-user/harvest_install/harvest/harvest.yml
  4. Ensure the fsxadmin user credentials are stored in Secrets Manager in the following format for any future deployments, replacing fsxadmin_password with your password.

    {"username" : "fsxadmin", "password" : "fsxadmin_password"}
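The secret format in the last step can be generated rather than typed by hand. The following is a minimal sketch with a hypothetical function name. It escapes backslashes and double quotes in the password but not control characters, so review the output if your password contains unusual characters.

```shell
# Build the secret string in the format shown in the procedure above.
# $1 = the fsxadmin password.
# Sketch only: escapes backslashes and double quotes, nothing else.
fsxadmin_secret_json() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"username" : "fsxadmin", "password" : "%s"}\n' "$escaped"
}

# Example, with a placeholder secret name:
#   aws secretsmanager create-secret --name my-fsx-secret \
#     --secret-string "$(fsxadmin_secret_json 'fsxadmin_password')"
```

Passing a password as a command argument can leave it in your shell history, so you may prefer to read it from a prompt or a protected file.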

CloudFormation stack rolled back after timeout

If you are unable to deploy the CloudFormation stack successfully and it is rolling back with errors, use the following procedure to resolve the issue. You will need SSH access to the EC2 instance deployed by the CloudFormation stack.

  1. Redeploy the CloudFormation stack, making sure that automatic rollback is disabled.

  2. SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.

    [~]$ ssh ec2-user@ec2_ip_address
  3. Verify that the Docker containers were successfully started using the following command.

    [ec2-user@ip-ec2_ip_address ~]$ sudo docker ps

    In the response you should see five containers as follows:

    CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS                          PORTS                    NAMES
    6b9b3f2085ef   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Restarting (1) 20 seconds ago                            harvest_cluster-2
    3cf3e3623fde   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Up About a minute                                        harvest_cluster-1
    708f3b7ef6f8   grafana/grafana         "/run.sh"                8 minutes ago   Up 8 minutes                    0.0.0.0:3000->3000/tcp   harvest_grafana
    0febee61cab7   prom/alertmanager       "/bin/alertmanager -…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9093->9093/tcp   harvest_prometheus_alertmanager
    1706d8cd5a0c   prom/prometheus         "/bin/prometheus --c…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9090->9090/tcp   harvest_prometheus
  4. If the docker containers are not running, check for failures in the /var/log/cloud-init-output.log file as follows.

    [ec2-user@ip-ec2_ip_address ~]$ sudo cat /var/log/cloud-init-output.log
    PLAY [Manage Harvest] **********************************************************
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]
    TASK [Verify images] ***********************************************************
    failed: [localhost] (item=prom/prometheus) => {"ansible_loop_var": "item", "changed": false, "item": "prom/prometheus", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
    failed: [localhost] (item=prom/alertmanager) => {"ansible_loop_var": "item", "changed": false, "item": "prom/alertmanager", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
    failed: [localhost] (item=rahulguptajss/harvest) => {"ansible_loop_var": "item", "changed": false, "item": "rahulguptajss/harvest", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
    failed: [localhost] (item=grafana/grafana) => {"ansible_loop_var": "item", "changed": false, "item": "grafana/grafana", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"}
    PLAY RECAP *********************************************************************
    localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
  5. If there are failures, execute the following commands to deploy the Harvest and Grafana containers.

    [ec2-user@ip-ec2_ip_address ~]$ sudo su
    [ec2-user@ip-ec2_ip_address ~]$ cd /home/ec2-user/harvest_install
    [ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml
    [ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml --tags api
  6. Validate the containers started successfully by running sudo docker ps and connecting to your Harvest and Grafana URL.
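The container and log checks in this procedure can be narrowed to just the problem entries. The following sketch uses two hypothetical helper functions. The first reads tab-separated container names and statuses and prints the containers whose status does not begin with Up. The second prints the task lines marked failed in a log file. It assumes that each failed task starts its own line, as in the sample output above.

```shell
# Read "name<TAB>status" lines on standard input, for example from
#   sudo docker ps --all --format '{{.Names}}\t{{.Status}}'
# and print the name of every container whose status does not start with "Up".
list_unhealthy_containers() {
  awk -F '\t' '$2 !~ /^Up/ { print $1 }'
}

# Print the Ansible task lines that begin with "failed:" in the log file
# given as $1, for example /var/log/cloud-init-output.log.
list_failed_tasks() {
  grep '^failed:' "$1"
}

# Example usage on the instance:
#   sudo docker ps --all --format '{{.Names}}\t{{.Status}}' | list_unhealthy_containers
#   sudo cat /var/log/cloud-init-output.log > /tmp/cloud-init.log && list_failed_tasks /tmp/cloud-init.log
```

With the sample docker ps output shown in this procedure, the first helper would report harvest_cluster-2, which is in a Restarting state.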