Testable Update: AWS spot instances and better dashboards

This month we introduce the ability to generate load using AWS spot instances, along with improved dashboard sharing that lets you customize and share different views for different audiences.

AWS Spot Instances

Last month we introduced AWS On Demand test runners. This month we are adding support for AWS spot instances, which can save you money on non-critical test workloads. Simply check the box and specify your desired maximum bid price. The max bid defaults to the on-demand price for the chosen instance type and region.

Look out for more test runners in the near future to support on-demand instances on other platforms like the Microsoft Cloud. See the documentation for more details on our full set of test runner options.

Better Dashboards

Dashboards are now shareable across all test cases in your account. Previously, each test case had its own set of dashboards.

When sharing results publicly, the selected dashboard is now associated with the share URL. This allows you to share the same results using several dashboards, with a unique URL for each. For example, you might want to share one high-level view with company executives and a different detailed view with your QA team.

Please check out all the new features and let us know if you have any thoughts or concerns. Happy testing!

Your AWS account or mine?

This month we introduce the ability to generate load on isolated, on-demand AWS infrastructure, using your AWS account or ours, with more cloud providers coming soon.

AWS On Demand Load Generation

Until now, all load was generated on shared infrastructure: either shared across all accounts via our public grid, or across all tests within your account using our on-premises solution.

We now offer a third option, and while we were at it, we did a major internal overhaul to support new test runners in the future.

Starting today, you can generate load from isolated, on-demand AWS instances that are spun up separately for each test, using your own AWS account or ours.

When running a test you can select multiple test runners. This means you can run your scenario from multiple AWS regions, from our shared public grid, and from within your data center (or VPC), all at the same time.

Look out for more test runners in the near future to support on-demand instances on other platforms like the Microsoft Cloud. See the documentation for more details on the new features.

Customize Percentiles

When creating a new test configuration you can now specify exactly which percentiles you want to calculate for all timing metrics. This includes our own metrics (e.g. latency) as well as any custom ones you define in your test.

While we were at it, we also improved the speed at which we process and aggregate results by 10x using a new approach for calculating percentiles.

Please check out all the new features and let us know if you have any thoughts or concerns. Happy testing!

Testable Update: Trends, Memory/CPU tracking, and more

Plenty of new features, fixes, and enhancements shipped this month, including the launch of Trends, agent memory and CPU tracking, and more.

Trends

View sparklines and metric history in your test results to get a sense of how metrics like latency and throughput have changed across recent test executions. Drill down by clicking on a sparkline to see the exact history. This feature is available for all metrics, including user-defined custom ones.

Agent Memory/CPU tracking

See how much memory and CPU was required to execute your test. Both metrics are available on the default dashboard layout as a chart and in the balloon text on the agent location map. See this information across your entire test or per region.

Usability improvements

Numerous changes have been made to the dashboard, test case, and test results pages to make things simpler and clearer.

Streaming smoke testing

When writing a script or uploading a JMeter test plan, it is useful to run a quick smoke test to make sure it works. This feature existed previously but did not work well for scripts where one iteration took longer than a minute. This has now been fixed, and smoke test results stream into the browser in real time, including full trace details.

Capacity increase, performance and reliability improvements

Numerous changes were made to further improve performance and reliability. We also increased our capacity to ensure we can scale to meet our clients' load testing demands.

JMeter improvements

JMeter test plans with long iteration times (> 1 minute) previously did not work well. This has now been remedied, and these tests should run smoothly.

Documentation updates

We made further documentation updates during this release cycle to improve depth and coverage.

Please check out all the new features and let us know if you have any thoughts or concerns. Happy testing!

Create an AWS Auto Scaling Group with Terraform that scales based on Ubuntu memory utilization

In this post we will show you how to use Terraform to spin up an AWS Auto Scaling Group that uses instance memory utilization as the trigger for adding or removing instances from the group. We use Ubuntu 14.04 (trusty) as our OS of choice. Hopefully some of you find this useful, since we could not find all of this information put together in a nice, easy-to-understand way.

Background

At Testable our focus is on making it easy for our clients to quickly spin up large performance tests and execute them across a global set of stateless agents. Our cloud provider is AWS, and we run a set of Docker containers on EC2 instances running Ubuntu 14.04 (trusty). In each location where tests can execute we have a set of agents running. Each agent works roughly as follows:

  • Connect to the central coordination service (load balanced via an ELB of course)
  • Perform a registration/handshake process
  • Listen for new “work” where “work” is defined as some part of a test to execute
  • Upon receiving new “work”, execute it and report back results/errors/logging

The agents are completely stateless and it is only the central coordination service that maintains any state about test progress, how many agents are running, etc.

This is a classic use case for an Auto Scaling Group! When demand is low we do not need many instances of the agent process running in each location. When a client comes with a large test or many clients come at the same time, we want to dynamically spin up more agents and then destroy them afterwards when they are no longer needed.

If you are unfamiliar with Auto Scaling Groups, try reading this introduction and information on dynamic scaling first.

The simple diagram above shows what we are aiming to set up. Let's go through it step by step. We assume you already have an AWS account and have installed Terraform on your computer.

Build our AMI

We need to create an AMI to use in our Auto Scaling Group launch configuration. This instance will run Ubuntu 14.04 and be configured to report memory (and disk) utilization.

  • Launch a new EC2 instance and select Ubuntu 14.04 as your operating system.
  • Once the instance is running, connect via SSH
  • Install a monitoring script that reports memory utilization as a CloudWatch metric. Amazon provides monitoring scripts that work on Ubuntu. We have created an extension that also captures disk inode utilization (source code and binaries). We will use our extension for the purposes of this blog, but it should work the same for memory utilization.
sudo apt-get install unzip
wget https://s3.amazonaws.com/testable-scripts/AwsScriptsMon-0.0.1.zip
unzip AwsScriptsMon-0.0.1.zip
rm AwsScriptsMon-0.0.1.zip

Now that the monitoring scripts are installed on our instance, let's try them out once:

cd aws-scripts-mon
./mon-put-instance-data.pl --verify --verbose --auto-scaling --mem-util --aws-access-key-id=[access-key-here] --aws-secret-key=[secret-key-here]
  • Add the monitoring script to crontab. If the trial above works successfully, we can cron this script to run every 5 minutes and report memory utilization as a CloudWatch metric:
crontab -l | { cat; echo "*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --from-cron --auto-scaling --mem-util --aws-access-key-id=[access-key-here] --aws-secret-key=[secret-key-here]"; } | crontab - 

Feel free to add other metrics like disk utilization, disk inode utilization, etc. to improve your general instance monitoring.

NOTE: This script caches the instance id locally (6 hour TTL default, 24 hours for auto scaling groups). This means that new instances launched with this image will report metrics for the wrong instance id for up to 24 hours. To get around this we use user_data to delete the cached instance id during instance creation. See the Terraform code section for more details.

  • Install any other software required for your instance
  • In the AWS console, right click your instance -> Image -> Create Image. Note the AMI ID of your image for future steps.

Terraform Code

Now that we have the image and the metrics, let's set everything else up using Terraform. We will go through it part by part.

provider "aws" {
    access_key = "${var.access_key}"
    secret_key = "${var.secret_key}"
    region = "us-east-1"
}

This first part simply initializes the AWS provider with our access key, secret, and region.
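
Note that these snippets reference several input variables (access_key, secret_key, ami, and instance_type) whose definitions are not shown in this post. Purely as an illustration, a minimal sketch of those definitions might look like the following; the defaults are placeholders to replace with your own values:

# Hypothetical variable definitions assumed by the snippets in this post;
# the original configuration does not show them. Replace the placeholder
# values with your own.
variable "access_key" {}
variable "secret_key" {}

variable "ami" {
    description = "AMI ID of the image built in the previous section"
}

variable "instance_type" {
    default = "t2.medium"    # placeholder: pick the instance type you need
}

You can also supply the values via a terraform.tfvars file or -var flags on the command line instead of hard-coding them.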

resource "aws_launch_configuration" "agent-lc" {
    name_prefix = "agent-lc-"
    image_id = "${var.ami}"
    instance_type = "${var.instance_type}"
    user_data = "${file("init-agent-instance.sh")}"

    lifecycle {
        create_before_destroy = true
    }

    root_block_device {
        volume_type = "gp2"
        volume_size = "50"
    }
}

The launch configuration is very similar to an instance configuration. See the documentation for the full set of options (e.g. availability zone, security groups, VPC, etc.). The above is just a simple example. The image_id should be the AMI ID from earlier.

Note the user_data script. This script, as mentioned earlier, deletes the cached instance id used by the monitoring scripts to ensure that we report metrics using the right instance id from the moment the instance launches.

Contents of init-agent-instance.sh:

#!/bin/bash
rm -Rf /var/tmp/aws-mon

Next we configure the actual auto scaling group.

resource "aws_autoscaling_group" "agents" {
    availability_zones = ["us-east-1a"]
    name = "agents"
    max_size = "20"
    min_size = "1"
    health_check_grace_period = 300
    health_check_type = "EC2"
    desired_capacity = 2
    force_delete = true
    launch_configuration = "${aws_launch_configuration.agent-lc.name}"

    tag {
        key = "Name"
        value = "Agent Instance"
        propagate_at_launch = true
    }
}

This defines the group as containing 1-20 instances and points at our earlier launch configuration as the way to launch new instances. The tag is propagated to every instance at launch.

resource "aws_autoscaling_policy" "agents-scale-up" {
    name = "agents-scale-up"
    scaling_adjustment = 1
    adjustment_type = "ChangeInCapacity"
    cooldown = 300
    autoscaling_group_name = "${aws_autoscaling_group.agents.name}"
}

resource "aws_autoscaling_policy" "agents-scale-down" {
    name = "agents-scale-down"
    scaling_adjustment = -1
    adjustment_type = "ChangeInCapacity"
    cooldown = 300
    autoscaling_group_name = "${aws_autoscaling_group.agents.name}"
}

The above configures the scale-up and scale-down policies that we will trigger using CloudWatch alarms. At this point we are just defining what to do in either event: add one instance or remove one instance.

resource "aws_cloudwatch_metric_alarm" "memory-high" {
    alarm_name = "mem-util-high-agents"
    comparison_operator = "GreaterThanOrEqualToThreshold"
    evaluation_periods = "2"
    metric_name = "MemoryUtilization"
    namespace = "System/Linux"
    period = "300"
    statistic = "Average"
    threshold = "80"
    alarm_description = "This metric monitors ec2 memory for high utilization on agent hosts"
    alarm_actions = [
        "${aws_autoscaling_policy.agents-scale-up.arn}"
    ]
    dimensions {
        AutoScalingGroupName = "${aws_autoscaling_group.agents.name}"
    }
}

resource "aws_cloudwatch_metric_alarm" "memory-low" {
    alarm_name = "mem-util-low-agents"
    comparison_operator = "LessThanOrEqualToThreshold"
    evaluation_periods = "2"
    metric_name = "MemoryUtilization"
    namespace = "System/Linux"
    period = "300"
    statistic = "Average"
    threshold = "40"
    alarm_description = "This metric monitors ec2 memory for low utilization on agent hosts"
    alarm_actions = [
        "${aws_autoscaling_policy.agents-scale-down.arn}"
    ]
    dimensions {
        AutoScalingGroupName = "${aws_autoscaling_group.agents.name}"
    }
}

This creates the CloudWatch metric alarms. The first triggers the scale-up policy when the group’s average memory utilization is >= 80% for two consecutive 5-minute periods. The second triggers the scale-down policy when it is <= 40% for two consecutive 5-minute periods.

Save this to your main.tf file and go ahead and apply it to initialize all the resources:

terraform apply

And that’s it. You now have a cluster of instances that automatically scales up and down based on memory usage. It is easy to extend this to also monitor disk utilization, disk inode utilization, CPU, etc.
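
As an illustration of that last point, here is a rough sketch of what an additional alarm on high disk utilization could look like. It assumes you also enabled disk reporting in the cron entry (the standard monitoring scripts support flags such as --disk-space-util and --disk-path) and that the resulting DiskSpaceUtilization metric is published with an AutoScalingGroupName dimension; verify the actual metric and dimension names in your CloudWatch console before relying on this.

# Sketch only: assumes the monitoring script reports DiskSpaceUtilization
# aggregated by AutoScalingGroupName; confirm the dimensions in CloudWatch.
resource "aws_cloudwatch_metric_alarm" "disk-high" {
    alarm_name = "disk-util-high-agents"
    comparison_operator = "GreaterThanOrEqualToThreshold"
    evaluation_periods = "2"
    metric_name = "DiskSpaceUtilization"
    namespace = "System/Linux"
    period = "300"
    statistic = "Average"
    threshold = "80"
    alarm_description = "This metric monitors ec2 disk space for high utilization on agent hosts"
    # Wire alarm_actions to an SNS topic or a scaling policy as appropriate, e.g.:
    # alarm_actions = ["${aws_autoscaling_policy.agents-scale-up.arn}"]
    dimensions {
        AutoScalingGroupName = "${aws_autoscaling_group.agents.name}"
    }
}

CPU is even simpler, since EC2 already publishes a CPUUtilization metric in the AWS/EC2 namespace without any extra scripts.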

Post in the comments if you have any questions/comments.