Monitoring Networks in AWS
CloudWatch Introduction
Amazon CloudWatch is a serverless service that provides the following capabilities for monitoring AWS network components:
- A highly scalable metering service
- Log collection and graphing
- Alarm services based on the metrics and logs collected from AWS components and from on-premises systems that have the CloudWatch agent installed
By default, AWS services send their relevant metrics to CloudWatch, and many of them can also send logs.
CloudWatch offers the following benefits:
- It collects metrics from all AWS services and from other sources
- Built-in analytics tools allow us to analyze data quickly and efficiently
- It provides alerting, and it can push data out to other systems for further analysis
All the data (metrics and logs) collected by CloudWatch is stored in the CloudWatch repository. This repository is regionally bound: it only displays metrics from services in its own region and from any on-premises systems configured to send their data to that region. When an application runs across multiple regions and its data needs to be aggregated and visualized as a single set, CloudWatch can be integrated with an external tool.
We can view all CloudWatch metrics from the management console, the AWS CLI, the SDKs, or directly through the CloudWatch API.
Metrics, Logs & Alarms
With CloudWatch, we can easily get the state of our application. The core elements of CloudWatch are metrics, logs, and alarms.
Metrics in CloudWatch capture information at a certain point in time about a particular system, subsystem, or resource being consumed in AWS.
Each metric has a value that represents the state of a system or the level of resource usage by that system. Examples: memory usage, CPU usage, disk utilization.
There are two types of metrics:
- Standard metrics: collected at 5-minute intervals by default at no cost.
- Detailed metrics: collected at 1-minute intervals, if enabled by the user, at some additional cost.
CloudWatch sets the retention period for the metrics it collects:
- Custom metrics with an interval below 1 minute – 3 hours
- 1-minute metrics – 15 days
- 5-minute metrics – 63 days
- 1-hour metrics – 15 months
Once the retention period for a given metric resolution expires, the service aggregates the detailed metrics into the next tier.
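The tiering above can be sketched as a small lookup. This is only an illustration of the retention rules listed here, not CloudWatch's implementation. The names `RETENTION_TIERS` and `finest_available_period` are hypothetical, and 15 months is taken as 455 days.

```python
from datetime import timedelta

# Retention tiers from the list above: (period in seconds, retention for that resolution).
# 15 months is represented as 455 days in this sketch.
RETENTION_TIERS = [
    (1, timedelta(hours=3)),      # sub-minute custom metrics
    (60, timedelta(days=15)),     # 1-minute metrics
    (300, timedelta(days=63)),    # 5-minute metrics
    (3600, timedelta(days=455)),  # 1-hour metrics
]


def finest_available_period(age):
    """Return the smallest period (in seconds) still retained for data of
    the given age, or None if the data is older than every tier."""
    for period, retention in RETENTION_TIERS:
        if age <= retention:
            return period
    return None


if __name__ == "__main__":
    for days in (0.1, 10, 30, 100, 500):
        print(days, "days old ->", finest_available_period(timedelta(days=days)))
```

A data point that is 30 days old, for example, is past the 1-minute tier, so it can only be read back at 5-minute resolution or coarser.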
CloudWatch also collects logs from different sources, either from AWS or from external services, such as applications running on EC2 instances or on-premises applications.
With the CloudWatch Logs management console, we can view any type of log from any hosted service.
Logs are organized into log groups, and each log group can have one or more log streams in it. The collected logs do not expire by default, so any unnecessary logs should be deleted once they are no longer required.
The CloudWatch management console also lets us run analytics on our logs: with the Insights feature, we can build query expressions that help us find the exact log entries we need.
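As a rough illustration of such an expression, the sketch below builds a Logs Insights query string and the arguments for a StartQuery request. The helper names, the log group name, and the search pattern are all hypothetical, and the pattern is inserted into the query as-is, so it must not contain a `/`.

```python
import time


def build_insights_query(pattern, limit=20):
    """Build a Logs Insights query that keeps only messages matching `pattern`."""
    return (
        "fields @timestamp, @message "
        f"| filter @message like /{pattern}/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )


def build_start_query_args(log_group, pattern, minutes_back=60, now=None):
    """Assemble keyword arguments for a StartQuery request.

    Times are epoch seconds; `now` can be passed in to make the result
    deterministic (for example, in tests).
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,
        "endTime": end,
        "queryString": build_insights_query(pattern),
    }


if __name__ == "__main__":
    # "/my-app/access" is a made-up log group name.
    print(build_start_query_args("/my-app/access", " 400 ", minutes_back=10))
```

With boto3 installed and credentials configured, these arguments could be passed to `boto3.client("logs").start_query(**args)`, and the results read back later with `get_query_results`.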
Based on predefined conditions, thresholds, log values, or patterns, we can perform actions through the CloudWatch platform. These conditions, patterns, and thresholds are configured in a CloudWatch alarm.
Below are some actions that alarms can drive:
- Auto Scaling
- Auto Recovery
- Event-based computing
A CloudWatch alarm can trigger an Auto Scaling action, and an alarm can also be raised when a pattern is detected in a log file.
The following example illustrates this.
An alarm is raised when 400 errors are detected in a log file, with the trigger threshold set to a certain number of occurrences within a specified amount of time:
- When 400 errors appear in the log
- With 10 or more occurrences
- Within a span of 10 minutes
- Send a notification to the response team via SNS
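The threshold logic in this example can be sketched in plain Python. This only illustrates the rule "10 or more matches within 10 minutes" using a sliding window. In CloudWatch itself, this would normally be configured as a metric filter plus an alarm evaluated over fixed periods, and the SNS notification step is left out here. The function name and the event format are assumptions.

```python
from collections import deque


def should_alarm(events, status="400", threshold=10, window_seconds=600):
    """Return True if `threshold` or more matching events fall within any
    span shorter than `window_seconds`.

    `events` is an iterable of (epoch_seconds, status_code) pairs,
    sorted by time.
    """
    recent = deque()
    for timestamp, code in events:
        if code != status:
            continue
        recent.append(timestamp)
        # Drop matches that have fallen out of the window ending at this event.
        while timestamp - recent[0] >= window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


if __name__ == "__main__":
    burst = [(i * 60, "400") for i in range(10)]   # 10 errors in 9 minutes
    sparse = [(i * 70, "400") for i in range(10)]  # 10 errors spread over 10.5 minutes
    print(should_alarm(burst), should_alarm(sparse))
```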
If the status check of an instance fails, CloudWatch can help recover the instance, either by rebooting it or by recovering it on another host (if the issue lies in the underlying EC2 environment). This is achieved by alarming on the instance's status check metrics.
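A minimal sketch of such a recovery alarm is shown below. It only assembles the arguments for a PutMetricAlarm request. The alarm name, the one-minute period, and the two evaluation periods are assumptions chosen for illustration, and the instance ID is a placeholder.

```python
def build_recovery_alarm(instance_id, region):
    """Assemble keyword arguments for a PutMetricAlarm request that
    recovers an instance when its system status check fails."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate once per minute (assumed)
        "EvaluationPeriods": 2,  # two consecutive failing periods (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action; the reboot action ends in ":ec2:reboot".
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


if __name__ == "__main__":
    # Placeholder instance ID and region.
    print(build_recovery_alarm("i-0123456789abcdef0", "us-east-1"))
```

With boto3 installed and credentials configured, the result could be passed to `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**args)`.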
To collect audit logs of all API calls, we should use AWS CloudTrail. It is a fully managed serverless service that gives us complete insight into WHO performed WHAT action on WHICH resources and WHEN.
CloudTrail collects all this information and stores it in an S3 bucket that we designate. All data it collects is encrypted automatically, and by enabling CloudTrail log file integrity validation we can add extra protection for the log files in the bucket.
CloudTrail log file integrity validation helps in the following ways:
- Checks whether a log file has been changed or altered since it was delivered
- Detects any deletion of a log file
- Shows whether logs were delivered during a certain time period
When this feature is enabled, a hash is created for every log file delivered to the bucket. A separate hash inventory file is created to keep track of the files and their hashes. This file is created every hour and lists all files delivered in the last hour. A separate hash is also created for the inventory file itself.
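The idea behind the hash inventory can be sketched with Python's `hashlib`. This is a simplified illustration, not CloudTrail's actual digest file format: the real digest files are also digitally signed, which is omitted here. The function names and file names are hypothetical.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 hash of `data` (bytes) as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_inventory(log_files):
    """Map each delivered log file name to the hash of its content."""
    return {name: sha256_hex(content) for name, content in log_files.items()}


def inventory_hash(inventory):
    """Hash the inventory itself, so tampering with it is also detectable."""
    canonical = "\n".join(
        f"{name}:{digest}" for name, digest in sorted(inventory.items())
    )
    return sha256_hex(canonical.encode("utf-8"))


def validate(inventory, current_files):
    """Compare the files currently present against the inventory and
    report which ones were modified or deleted."""
    modified = sorted(
        name
        for name, digest in inventory.items()
        if name in current_files and sha256_hex(current_files[name]) != digest
    )
    deleted = sorted(name for name in inventory if name not in current_files)
    return {"modified": modified, "deleted": deleted}


if __name__ == "__main__":
    delivered = {"a.json": b"one", "b.json": b"two", "c.json": b"three"}
    inventory = build_inventory(delivered)
    # Later: b.json was altered and c.json was removed.
    print(validate(inventory, {"a.json": b"one", "b.json": b"TWO"}))
```

Because the inventory lists every file delivered in the period, a missing entry shows up as a deletion, and a changed hash shows up as a modification.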
Used together, CloudWatch and CloudTrail give us a complete picture of the environment, which can be used for auditing, compliance, and validation of the AWS environment, along with the historical usage of the account.