RDS Performance Insights is awesome

Jeremy Nagel
3 min read · Nov 28, 2019

The backend team and I at GOFAR recently did a release to staging which we confidently expected would be a crowd pleaser. After much refactoring, we were able to get the app upgraded to version 3 of Loopback, the NodeJS framework we’re using to run the app. This was kind of a big deal because the app had been running on Loopback 2, which is now officially unsupported and therefore a scary security risk.

The deploy seemed to go smoothly… until my boss started complaining about the mobile apps now running really slowly. Eek. I dug into the logs and couldn’t see anything obvious. We’ve had a few issues with the database in the past, so I upgraded our RDS instance from pg9.5 to pg11. Sounds risky, but we’d done the upgrade on pre-staging with no ill effects, and we don’t have many users on staging so we could afford to experiment a bit.

The upgrade on its own didn’t seem to improve things much, but it did give me access to one very useful service: RDS Performance Insights.

Performance Insights helps you see slow queries and issues with how clients are connecting to the DB
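Performance Insights isn’t switched on by default, so if you’re following along you’ll need to enable it on the instance (it’s a checkbox when you modify the DB in the console, or a flag via the API). Here’s a rough sketch with the AWS SDK — the region and instance identifier below are made up:

```typescript
import { RDSClient, ModifyDBInstanceCommand } from "@aws-sdk/client-rds";

// Hypothetical region and instance identifier; substitute your own.
const rds = new RDSClient({ region: "ap-southeast-2" });

async function enablePerformanceInsights() {
  await rds.send(
    new ModifyDBInstanceCommand({
      DBInstanceIdentifier: "gofar-staging-db", // hypothetical
      EnablePerformanceInsights: true,
      PerformanceInsightsRetentionPeriod: 7,    // 7 days is the free retention tier
      ApplyImmediately: true,
    })
  );
}

enablePerformanceInsights().catch(console.error);
```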

I was immediately able to spot an indexing issue: an index that we had on production was somehow missing from staging. Adding it back made a pretty big difference.
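The exact index doesn’t matter here, but the general pattern is simple: diff pg_indexes between the two environments and recreate whatever is missing. A sketch using node-postgres, with hypothetical table and column names:

```typescript
import { Client } from "pg";

async function checkAndFixIndex() {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();

  // List the indexes staging actually has on the suspect table,
  // so they can be compared against production.
  const { rows } = await db.query(
    "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = $1",
    ["trip"] // hypothetical table name
  );
  console.table(rows);

  // Recreate the missing index without blocking writes.
  // (CREATE INDEX CONCURRENTLY can't run inside a transaction.)
  await db.query(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS trip_vehicle_id_idx ON trip (vehicle_id)"
  );

  await db.end();
}

checkAndFixIndex().catch(console.error);
```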

Here is a load test result done without the index:

A whole stack of errors and p95 of 15 seconds

Here it is after adding the index:

No errors and p95 of 5.5 seconds

Almost a 3x improvement in request duration and a heck of a lot fewer 500 errors.

But what’s all this ClientRead?

The CPU is barely doing anything — it’s spending all its time waiting for the client to provide parameters

Something still didn’t look right. Performance Insights showed an inordinate amount of time being spent on ClientRead and very little pressure on the DB instance’s CPU.
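You can see the same thing without Performance Insights, because Postgres exposes wait events directly in pg_stat_activity. A quick sketch (node-postgres again, same hypothetical connection string):

```typescript
import { Client } from "pg";

async function showWaitEvents() {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();

  // Count active backends by wait event. Lots of ClientRead on active
  // sessions means the server is sitting around waiting for the client
  // to send the next protocol message, i.e. the bottleneck is the network
  // or the client, not the database itself.
  const { rows } = await db.query(`
    SELECT wait_event_type, wait_event, count(*) AS backends
    FROM pg_stat_activity
    WHERE state = 'active'
    GROUP BY 1, 2
    ORDER BY backends DESC
  `);
  console.table(rows);

  await db.end();
}

showWaitEvents().catch(console.error);
```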

A bit of googling helped me realise that would only happen if packets were taking a very long time to travel between the API instance and the database instance. Almost like they were going out over the internet… Surely not?

Yup… whoever built the Elastic Beanstalk stack we were originally using had decided to untick the “Add to VPC” box.
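Before rebuilding anything you can put a number on it: time a trivial query from the API host and see what a single round trip costs. A rough sketch (the connection string env var and sample count are just assumptions):

```typescript
import { Client } from "pg";

async function measureRoundTrip() {
  const db = new Client({ connectionString: process.env.DATABASE_URL });
  await db.connect();

  const samples = 50;
  const start = process.hrtime.bigint();
  for (let i = 0; i < samples; i++) {
    await db.query("SELECT 1"); // one round trip per iteration
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

  // Inside the same VPC this tends to be well under a millisecond per query;
  // over the public internet it can easily be tens of milliseconds, multiplied
  // by every query an API request makes.
  console.log(`average round trip: ${(elapsedMs / samples).toFixed(2)} ms`);

  await db.end();
}

measureRoundTrip().catch(console.error);
```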

How much of a difference would this make? I rebuilt the app in Fargate (inside a VPC) to see.

Here are the results of the load test:

Fargate app (inside a VPC): total time 467 seconds

Elastic Beanstalk app (outside a VPC): total time 2067 seconds

Graphs from Performance Insights

With the client inside the VPC — DB instance is working pretty hard!
DB being hit by the EB app outside a VPC — CPU massively blocked waiting for client reads

The verdict: >4x improvement

It definitely makes a difference putting your infrastructure inside a VPC! Some of the gain may have come from Fargate simply outperforming Elastic Beanstalk’s Docker instances; I could have tested that by rebuilding the EB app inside a VPC, but frankly I was keen to get rid of EB ASAP!
