Houston, we have a problem with MySQL DB
We have all been there. We are scaling up our technology stack and our user base so rapidly and it’s an exciting journey until one fine day, we have an outage due to the MySQL database being overloaded.
The kneejerk responses to these downtimes typically take these formats like
- Can we quickly upgrade it to a larger instance to handle the load?
- Is there a magic variable on my.cnf that can be set to handle the load better?
- Is there a slow query that the DBA can fix and it solves the problem once and for all?
- Let’s add everything to monitoring and we will identify the root cause?
- My CPU utilization is less than 50%. Why am I facing downtimes? Would shifting this to a different version of MySQL help?
The answer to all the questions above as well as queries from a similar thought process is an emphatic NO.
Of course, all servers being on monitoring and correctly tuned kernel and MySQL settings help apart from optimizations like tmpfs, etc but the larger longer-term solutions wouldn’t come out of it
So what do we do now? Here is a sequence of steps we follow at E2E Networks as part of DevOps Consulting to help customers troubleshoot this and put in solutions from a longer-term perspective.
First and foremost is the meta-analysis of all queries on MySQL. In a monolithic MySQL box, how many read queries are there vs Write queries? In a typical scenario, it can be an 80:20 split but it could be varying depending on the workloads
- The output from this exercise is a rough estimate of no of read vs write queries
Once you identify all the read queries (they could include reporting and analytics queries as well), it’s important to offload this to a separate read replica server
Horizontal Scaling with MySQL read replicas and proxysql
The premise behind deploying multiple read replica servers is to scale out reads. Now there could be two major challenges in using a read replica MySQL instance
- There is no support for configuring read queries to be sent to a read replica
- There is no transaction support in the queries
If there is transaction support by default, it becomes easy to use a tool like proxySQL to automatically send read queries to a slave. In case of lack of transaction support, the DBAs have to manually identify read queries from query logs and configure regular expressions on proxySQL to send those specific queries to read replicas
What not to store on MySQL?
Any optimized DB setup can come under heavy loads if its storing data which it shouldn’t. Examples of data which ideally shouldn’t be stored on MySQL and its alternatives are
- Session data – Typically user sessions which are validated on all requests. These should be stored on either a Redis or a couchbase/Memcached clusters for faster access and the huge concurrency that these can provide
- Event data – Any user actions or polled events should always be stored outside of MySQL and preferably into a No-SQL database like MongoDB
- Logs – Strict no and it should always be stored on files and shipped to an ELK server for processing the logs and visualizing them
In conclusion, the combination of read replicas where non-transactional select queries are processed along with the usage of Redis/couchbase/Memcached/Mongo for storing specific datasets that perform better on them thereby also reducing the load on MySQL better managed. The architecture is a best practice deployment which can help speed up application response times and also provide better uptime for your application.