pero on anything

HSCALE 0.1 released – Partitioning Using MySQL Proxy

As written here and here I’ve been working on a MySQL Proxy Lua module that transparently splits up tables into multiple partitions and rewriting all queries to go to the right partition.

I finally got everything together to release a 0.1 version. Go on and download, try and read more about HSCALE 0.1.

All this started out as a prototype just to see if it could be done. And after adopting parts of our main product to use partitions via HSCALE + MySQL Proxy (which was an easy task, we just had to rewrite a few out of hundreds of statements) I really think that this could work out in a larger scale.

What Will Come Next?

Just a few notes on what I am working on right now:

Project Page And Issue Tracker

In a few days there will be a “real” project page with more documentation and an issue tracker ready. Since we already have both in use internally this should be an easy task.

Write Another Partition Lookup Module

A partition lookup module decides the partition(s) to use for a particular query. In the current release there is only a ModulusPartitionLookup integrated. Since the partition lookup module is pluggable it is easy to write other modules doing other things. The main focus now is to implement a DictionaryLookupService which will store the information of which partition is where inside the database. This allows you to add and move partitions “on the fly”. At the end you just have more control over the partition scheme.

Along with the new partition lookup module there will be more administrative SQL commands like:
HSCALE ADD PARTITION ..., HSCALE MOVE PARTITION ... and so on.

Full Partition Scans For Queries With Multiple Partitioned Tables

In the current version HSCALE is already capable of performing full partition scans for queries that don’t use the partition column and thus don’t provide a partition key like:
SELECT * FROM my_partitioned_table;
(Just a side note: Results returned from this query are not in natural order due to the fact that the data is spread over multiple tables. Thus your application cannot rely on the natural order for statements against partitioned tables (if full partition scan is performed). You should not rely on natural order anyway.)

Even though you should avoid full partition scans where you can sometime you just have to look into every partition. And even worse sometimes you join multiple partitioned tables like:
SELECT * FROM my_partitioned_table LEFT JOIN my_other_partitioned_table ON ...;
Currently HSCALE rejects queries of this kind. In future it will join every partition of my_partitioned_table with every partition of my_other_partitioned_table. In most cases this is evil but sometimes you just have to.

Finally it is up to the partition lookup module to find out the combinations of partitions to use so we can optimize here for tables that use the same partitioning scheme.

Performance Profiling And Optimization

I was really astonished by the performance of the Lua scripting inside MySQL Proxy. I was able to analyze more than 100,000 statements in just a few seconds (without network overhead). This is already pretty good but can be improved. First of all I will have to find out the performance patterns to be used when scripting with Lua like “Is it better to inline functions?”, “Does ‘OO’ hurt?” and so on. Then performance tests analyzing both, speed and memory consumption, will have to be implemented to see if there is progression.

Distribute Partitions Across Multiple MySQL Servers

Right now we need to just split up huge tables but later on we want to distribute partitions over multiple MySQL server instances to have real horizontal scale out. The hardest part will be dealing with transactions where we have to use distributed transactions (XA) or disallow transactions involving partitions on different hosts. The latter one works well for parts of (our) application since they just don’t use transactions. Other parts will have to use XA. At this point I am not sure about the overhead XA will add but this has to be worked out once we come to this.

So, any feedback is welcome!

9 Responses to “HSCALE 0.1 released – Partitioning Using MySQL Proxy”

  • mike says:

    This is cool. Sharding pushed from application level to a transparent DB proxy?

    Didn’t even think of that. I hope this idea gets picked up and someone runs with it, in some shape or form! Thanks for getting the ball rolling!

  • Regarding all big traffic web site they all did that, but without mysql-proxy. That’s very nice to share these scripts.
    Did you find a way to dynamically add a node to the list of database servers when you need some more.

  • [...] HSCALE 0.1 released – Partitioning Using MySQL Proxy (tags: links) [...]

  • eriam says:

    Hello,

    Just a quick question, how do you plan to distribute queries and combine results that contains ORDER BY and/or LIMIT statements ?

    I’m curious because I’m working on something similar (written in Perl though).

    Thanks

  • pero says:

    eriam,

    currently LIMIT clauses are handled correctly (even though the natural order cannot be preserved due to the nature of having data spread along multiple tables). Queries that go to all tables (i.e. a full partition scan is performed) that contain ORDER BY are rejected. (Our application doesn’t use ORDER BY in full partition scans).

    The only way to implement ORDER BY in these cases would be to perform the sort in the MySQL Proxy. I would rather solve this in the application layer, i.e. rewrite queries and application logic. Remember, HSCALE is not a “magical plug-in and forget” module that abstracts the MySQL Server in all its functions but a module to aid you writing application based partitioning.

    Anyway, if you have any more ideas, please let me know!

    Greetings

    Peter

  • Kevin McCravy says:

    To make the ORDER BY work just have each database perform the ORDER BY on it’s part of the data and merge the results in the proxy. [http://en.wikipedia.org/wiki/Heapsort]

  • pero says:

    Kevin,

    it is not that easy. Especially if functions or multiple columns are involved like in: SELECT * FROM tbl_name ORDER BY CONCAT(TO_DAYS(date_column), SUBSTRING(string_column, LOCATE(‘substr’, string_column))) In order to merge all the results we would have to mimic mysql server logic (or we rewrite the query to include the order by conditions as hidden columns so we can use them for sort later on). Currently this is much better handled within your application. Remember that “full partition scan” is something you should try to avoid as much as you can anyway since it becomes quite a heavy operation. But you are right that simple ORDER BY could be integrated quite easy.

    Greetings

    Peter

  • [...] HSCALE 0.1 released – Partitioning Using MySQL Proxy (tags: mysql database cluster 247up sysadmin) [...]

  • [...] Tinynatz, Domains, Dupont Model and Business information plus more related topics on BrokerBible.compero on anything Blog Archive HSCALE 0.1 released …Regarding all big traffic web site they all did that, but without mysql-proxy. … HSCALE 0.1 [...]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">