Highly-Available, Scalable WordPress using ECS/Docker & RDS/MariaDB

Preface and 2019 Update

Back in 2017, this was my first real personal project to explore all the ins and outs of Amazon Web Services, after being granted unfettered access to my best friend’s AWS account (I later had to tear all of this sh*t down, because it was costing him an arm and a leg).

I have since gotten involved with Terraform, and would now automate this entire configuration so that it would be completely reproducible and the entire environment could be spun up in a matter of minutes. That being said, I feel that any AWS novice (like I was at the time) should first have to “click it out” before going off and automating all the things, and that is what is described in this blog post. I also want to stress that there are a few very stand-out aspects of this project, given the timeframe in which it was carried out…

First of all, containerizing WordPress is not something that is normally done. Historically, it has remained monolithic in nature and mostly scales vertically, not horizontally like the architecture I implemented in this blog post. I would say that a good 90% of installations use only a two-tier architecture, where a webserver like Apache or Nginx serves up the static files and PHP, and the PHP talks to a database tier. I would even go so far as to say that probably a good 50% of WordPress installations run everything on a single tier, with one server used for both the webserver and the database. What I did differently in order to implement a highly-available architecture was to deploy multiple WordPress instances across Availability Zones, all of which talked to the same Amazon RDS instance, deployed Multi-AZ for high availability.

The second thing that I did completely differently was to utilize Amazon’s Elastic File System, which had been released less than a year prior to my writing this article. There was literally zero documentation on how to implement an architecture like this at the time. Believe me, I Googled until my fingers were numb, and eventually rolled my own architecture and half-assedly documented it as this 2017 blog post, mainly so I wouldn’t forget how I did it. I had recently taken the 2016 acloud.guru AWS Certified Solutions Architect Associate course and saw that the instructor created a highly-available WordPress instance that used syncing between S3 buckets, which worked all right, but which I thought was kind of lame, because my mind instantly went to NFS shares instead — which I discovered had recently been made possible by Amazon’s Elastic File System. Unbelievably, in the updated 2019 acloud.guru AWS Certified Solutions Architect Associate course, he is still teaching the same method of creating a highly-available WordPress instance by copying files between S3 buckets — which seemed incredibly counter-intuitive to me, and at the very least an outdated architectural solution. There are clearly better ways, as I demonstrated (albeit to myself) back in early 2017.

So…my architectural solution utilized Amazon’s Elastic File System (EFS) to share a single Nginx config, PHP config, and the WordPress installation files across all EC2 instances. As an additional security measure, these host directories could be mapped as read-only container volumes, so that if an attacker were to pop a shell in a container, they would have no way to modify the part of the filesystem responsible for the website. Additionally, the containers [and even the hosts, if I had fully fleshed out this idea] could be regularly drained of network connections and destroyed, and a new container [and, like I said, potentially a host] could be spun up, restoring the integrity of the entire filesystem and application structure.
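To make that concrete, here is a minimal sketch of what the relevant parts of an ECS task definition could look like, with the EFS-backed host directories mapped into the container read-only. The image name, memory value, volume names, and host paths are placeholders rather than the original configuration.

```json
{
  "family": "wordpress-web",
  "containerDefinitions": [
    {
      "name": "nginx-php-fpm",
      "image": "example/wordpress-nginx-fpm:latest",
      "memory": 512,
      "essential": true,
      "portMappings": [
        { "hostPort": 80, "containerPort": 1443, "protocol": "tcp" }
      ],
      "mountPoints": [
        { "sourceVolume": "wordpress-code", "containerPath": "/var/www/html", "readOnly": true },
        { "sourceVolume": "webserver-config", "containerPath": "/mnt/config", "readOnly": true }
      ]
    }
  ],
  "volumes": [
    { "name": "wordpress-code", "host": { "sourcePath": "/mnt/efs/wordpress" } },
    { "name": "webserver-config", "host": { "sourcePath": "/mnt/efs/config" } }
  ]
}
```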

All updates were done via a management [bastion] host, which was segmented off from the webservers and functioned mostly as a deployment manager. For instance, it could update style.css or upload and activate a plugin, and the changes would instantly propagate out to the webservers and database. A change to the Nginx config could also be made, but it would require a rolling restart of the containers, which could likewise be managed from this host via aws-cli or from the AWS ECS Management Console.
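A rough sketch of what that management-host workflow could look like with a current aws-cli follows; the EFS file system IDs, mount points, theme path, and cluster/service names are all placeholders.

```bash
# Mount the two EFS volumes on the management host
# (file system IDs and mount points are placeholders).
sudo mkdir -p /mnt/efs/wordpress /mnt/efs/config
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/ /mnt/efs/wordpress
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-YYYYYYYY.efs.us-east-1.amazonaws.com:/ /mnt/efs/config

# Code/content changes land on EFS and are visible to every
# container immediately -- no restart required.
cp ~/deploy/style.css /mnt/efs/wordpress/wp-content/themes/example-theme/style.css

# Config changes (e.g. the Nginx config) need a rolling restart of the
# ECS service; with a current aws-cli that can be forced from here.
aws ecs update-service \
  --cluster wordpress-cluster \
  --service wordpress-web \
  --force-new-deployment
```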

One other minor detail was that the database was a MariaDB RDS instance rather than the MySQL that WordPress normally uses. I forget exactly why I chose MariaDB, but I think it had something to do with performance, and the performance of this setup was outstanding — even before implementing any caching.

Eventually I wanted to implement some really heavy, next-level caching using Redis, but the bills started rolling in before I ever got around to that, and I had to tear this beauty down. She ran wonderfully in production for a few months, hosting the website for one of the top cybersecurity firms in the industry…before I eventually had to make some performance sacrifices, since the costs were exorbitant for such a simple use case.


The initial idea

The recent Amazon S3 outage showed us just how delicate the state of the web is, especially when you don’t utilize Amazon’s built-in redundancy features. My goal was to create a highly-available and scalable WordPress installation in AWS using Docker. I would have auto-scaling Docker clusters in multiple Availability Zones running Nginx, PHP-FPM, and a Redis client. The Docker config and WordPress install would live on EFS volumes mounted into the Docker containers. I would use an RDS MariaDB instance for the database backend and Redis-based ElastiCache for serving up the site blazing fast from memory. Who needs CloudFront, right?

I will go ahead and say that I have seen a few variations on some of the techniques mentioned in this article, some of which I will provide links to at some point, if I can round them all up. However, as always, I decided to pick and choose what I liked the most and what I think will work best. A whole lot of what I was trying to do was largely undocumented (or poorly documented at best), but now that I am nearing the end of the initial setup process, I will go ahead and try my best to document how it all played out.

The basics of what I needed to accomplish this
  • Underlying VPCs, Subnets, Gateways, Routes, Roles, Security Groups, and ACLs
  • Domain registrar pointing to Route 53 DNS nameservers
  • DNS A record aliases to the Elastic Load Balancer
  • Elastic Load Balancer terminating SSL
  • ELB-to-EC2 Health Checks
  • Health Check-based DNS failover between Availability Zones
  • Autoscaling clusters of Docker web containers in at least 2 separate Availability Zones
  • Dockerfile containing bootstrap commands to pull config and content from mounted EFS volumes (a rough sketch follows this list)
  • EFS volume containing Nginx/PHP-FPM/Redis client config
  • EFS volume containing WordPress installation
  • DEV environment to test updates before rolling out to the auto-scaling group
  • Management host to mount and make changes to the EFS volumes for config changes and code deployments
  • RDS MariaDB database instance spanning multiple Availability Zones
  • Multi-AZ Redis-based ElastiCache instance to serve frequent requests stored in memory
  • A security industry professional attempting to hack it just for fun
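As promised above, here is a rough sketch of the kind of Dockerfile this implies. The base image, package names, PHP version, and paths are my assumptions, not the original image.

```dockerfile
# Ubuntu-based web image running nginx and php-fpm under supervisord.
FROM ubuntu:16.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        nginx php7.0-fpm php7.0-mysql supervisor && \
    rm -rf /var/lib/apt/lists/*

# The nginx/php-fpm config and the WordPress document root are NOT baked
# into the image; they arrive at runtime as (optionally read-only) volumes
# backed by the EFS mounts on the ECS container instance.
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 1443

# Run supervisord in the foreground so the container stays up.
CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]
```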
The Amazonian breakdown
  • Roles (too many to remember):
    • RDS
    • EFS
    • ECS
    • EC2
    • ETC…
  • Security Groups:
    • ELB Tier (ports 80, 443 from 0.0.0.0/0)
    • Web Tier (ports 80, 443 from ELB Tier, SSH from my IP)
    • Management Tier (port 22 from my IP)
    • Storage Tier (port 2049 from Web Tier and Management Tier)
    • Database Tier (port 3306 from Web Tier and Management Tier)
  • Route 53:
    • DNS A record aliases to Elastic Load Balancer
  • Elastic Load Balancer (ports 80, 443 mapped to EC2 HTTP listeners)
    • Configure Health Check to /heartbeat.html (CLI sketch after this breakdown)
    • Response Timeout: 5 seconds
    • Health Check Interval: 30 seconds
    • Unhealthy Threshold: 10
    • Healthy Threshold: 10
    • Security Group: ELB
  • ECS:
    • Auto Scaling Service Cluster
  • EC2:
    • Bootstrapped Launch Configuration
    • Auto Scaling Group
    • Host ports 80, 443 mapped to Docker container port 1443 (via ECS task)
  • EFS:
    • Webserver config
    • WordPress code
  • RDS MariaDB Database
    • Multi-AZ
    • Not publicly-accessible
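The health check above, for example, maps onto a single Classic ELB CLI call (the load balancer name is a placeholder):

```bash
# Classic ELB health check matching the settings listed above.
aws elb configure-health-check \
  --load-balancer-name wordpress-elb \
  --health-check Target=HTTP:80/heartbeat.html,Interval=30,Timeout=5,UnhealthyThreshold=10,HealthyThreshold=10
```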

The basic flow

Here is the way I currently have it set up (which I will go into greater detail on later):

  1. Requests come in to the Elastic Load Balancer (via the Route 53 alias record), which routes them to one of up to 4 EC2 hosts in an Auto-Scaling Group spread across 2-3 Availability Zones.
  2. There is an ECS Bootstrapped Auto-Scaling Cluster of Docker containers spread across the EC2 host cluster in 2-3 Availability Zones.
  3. I am running an ECS Service, which is basically an Ubuntu-based Docker image running supervisord, which kicks off non-daemonized versions of nginx and php-fpm logging to /dev/stdout (a sketch of this config follows the list).
  4. The ECS Service places one supervisord (nginx/php-fpm) task on each EC2 container instance, so that there is a task running in at least 2 AZs at all times.
  5. The nginx task reads its configuration from a host-mounted EFS volume that is mapped to the container.
  6. The document root is a WordPress installation located on a separate host-mounted EFS volume, also mapped to the container.
  7. nginx sends PHP files to php-fpm, which also reads its config from the EFS volume.
  8. php-fpm communicates with a MariaDB-based RDS instance.
  9. Config and code changes are managed on a firewalled EC2 management host.
  10. Code changes propagate instantly to the Docker containers, while config changes require rolling restarts of the supervisord task from the ECS Service control panel.
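For steps 3 and 4, a minimal sketch of the kind of supervisord config involved might look like the following; the binary paths, PHP version, and the /mnt/config mount point are assumptions rather than the original file.

```ini
; Run nginx and php-fpm in the foreground, logging to stdout/stderr.
[supervisord]
nodaemon=true

[program:php-fpm]
command=/usr/sbin/php-fpm7.0 --nodaemonize --fpm-config /mnt/config/php-fpm.conf
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;" -c /mnt/config/nginx.conf
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
```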

This is as far as I have gotten. There are plans to speed things up with Redis, and then it is a matter of further tweaking for security and performance. I will get into the nitty-gritty in the near future and hope to post some helpful links to some of the things that helped me sort all of this out. It most certainly did not happen overnight.


2019 Update

I will not, in fact, be posting a follow-up on this particular experiment (or any helpful links), because it has been well over two years, and I have forgotten nearly everything related to this project.

If I were to do this all over again — first of all, I would use Terraform (ahem, I mean CloudFormation, if anyone from AWS is reading), and I would probably use EKS — Amazon’s Elastic Kubernetes Service, which simply did not exist at the time this particular architecture was originally documented. I would also scrap supervisord and decouple Nginx from php-fpm so that they could function as two separate microservices.

So there you have it…a pretty long story about a short-lived [and completely undocumented] architecture as my first creative deep dive into the AWS platform.
