Progress on MySQL Proxy Partitioning
As posted here I started to think about possible ways to implement database sharding/partitioning.
I finally found the time to start prototyping a MySQL Proxy based solution that would allow you to analyze and rewrite queries to direct them to different databases. So this would going to be a nearly 100% transparent solution (some queries are impossible to support due to the nature of having multiple tables in different locations).
How does it work?
The main goal is to split up mysql tables and optionally put each of the resulting partitions on different mysql servers.
For now I concentrate on splitting up big tables into smaller ones within the same database. Distribution of these tables (i.e. partitions) over multiple databases would be the final goal and a lot more challenging task (think of transactions).
The work to be done would be divided into these 4 steps:
1. Analyze the query to find out which tables are involved and what the parition key would be (i.e. the value of the partition column or a hint – more on that later).
1.a. Validate the query and reject queries that cannot be analyzed (missing partition key etc.)
2. Determine the partition table / database. This could be done by a simple lookup, a hashing function or anything else.
3. Rewrite the query and replace the table names with the partition table names.
4. Execute the query on the correct database server and return the result back to the client.
Say this is the table you want to split up:
CREATE TABLE books (
id INTEGER NOT NULL,
name VARCHAR(100) NOT NULL,
author INTEGER NOT NULL,
The partition tables for table
books are named
books_odd all with the same layout as
books. The partition column is
author so this value determines the partition table to be used. In this example we put all books of authors with an even id into
books_even and the “odd ones” (meaning the “not even” ones not the strange ones ) into
Say the following query is sent to the proxy:
SELECT * FROM books WHERE author = 3;
The proxy would do the following:
1. Analyze the query and find that table
books is used and that it is a partitioned table. We have defined that
author is our partition column so “
3” is our partition key. Both will be passed to the next part:
2. Lookup the partition table for
books and partition key “
3. Rewrite the query to:
SELECT * FROM books_odd WHERE author = 3;
4. Execute it and send the result to the client.
What is the status right now?
After getting warm with LUA and setting up a (unit) test environment, which was an easy task since MySQL Proxy already comes with a handy solution for that, I started to implement the query analyzer (step 1) and rewriter (step 3). Both utilize the tokenizer provided by the proxy and don’t rely on regular expressions because it would be to be too error prone in my opinion.
There is already a lot of code and tests ready and it turns out to work as intended. But still there is a lot of work and head ache ahead. My next goal is to run the test suite of one of our major products with about 2,000 database bound tests against the partitioner to see how this works out.
The analyzer is already able to parse various types of queries like:
SELECT * FROM books WHERE author = 3;
SELECT * FROM books AS b, prices WHERE b.author = 3;
INSERT INTO books (id, author) VALUES (1, 3);
DELETE FROM books WHERE author = 3;
UPDATE books SET name = 'new' AND author = 3;
# Hinting works like this
/* partitionKey(books) = 'books_odd' */ SELECT * FROM books;
/* skipPartition() */ SELECT 'do not analyze me!';
… and a lot more like queries including comments or joins or invalid queries (i.e. no partition key provided).
The rewriter is able to rewrite all of the queries above.
Actually, I think a good deal of the queries of the application I intended to test with would correctly be analyzed and rewritten right now. But there is still a lot of work to do on this side to handle more complex queries including subselects, functions etc.
As said above the next goal is to run the test suites of our applications utilizing the partitioner. Once this is point is reached I will post the prototype here.
Up to now all of this looks like it could really work for us so I hope there is going to be more than a prototype but we will have to see.