<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>pero on anything &#187; hscale</title>
	<link>http://pero.blogs.aprilmayjune.org</link>
	<description></description>
	<pubDate>Thu, 11 Sep 2008 15:16:17 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>LuaSQL fetches results about 15% faster than MySQL Proxy?</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/#comments</comments>
		<pubDate>Sun, 07 Sep 2008 14:57:51 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[lua]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/</guid>
		<description><![CDATA[While evaluating LuaSQL as a backend connection replacement, I came across this. I did a quick performance test using mysqlslap, and it showed that just reading and copying the result can be significantly faster with LuaSQL.
Benchmark details
All I did was send the query to the backend and build a new result set in Lua. 
This [...]]]></description>
			<content:encoded><![CDATA[<p>While evaluating <a href="http://www.keplerproject.org/luasql/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/luasql/');">LuaSQL</a> as a <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >backend connection replacement</a>, I came across this. I did a quick performance test using <a href="http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html');">mysqlslap</a>, and it showed that just reading and copying the result <em>can be</em> significantly faster with LuaSQL.</p>
<h2>Benchmark details</h2>
<p>All I did was send the query to the backend and build a new result set in Lua. </p>
<p>This is the code for LuaSQL:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;luasql.mysql&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> _sqlEnv = <span class="kw1">assert</span><span class="br0">&#40;</span>luasql.mysql<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> _con = <span class="kw1">nil</span></p>
<p><span class="kw1">function</span> read_auth<span class="br0">&#40;</span>auth<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> host, port = <span class="kw1">string</span>.match<span class="br0">&#40;</span>proxy.backends<span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span>.address, <span class="st0">&quot;(.*):(.*)&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="co1">-- We explicitly connect to db &quot;test&quot; since mysqlslap drops the database</span><br />
&nbsp; &nbsp; <span class="co1">-- and LuaSQL needs the db to exist beforehand. Anyway, this is just a</span><br />
&nbsp; &nbsp; <span class="co1">-- quick test, so don&#8217;t bother.</span><br />
&nbsp; &nbsp; _con = <span class="kw1">assert</span><span class="br0">&#40;</span>_sqlEnv:connect<span class="br0">&#40;</span><span class="st0">&quot;test&quot;</span>, auth.username, auth.password, host, port<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> disconnect_client<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span><span class="br0">&#40;</span>_con:close<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> read_query<span class="br0">&#40;</span>packet<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>packet:byte<span class="br0">&#40;</span><span class="br0">&#41;</span> == proxy.COM_QUERY<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> query = packet:sub<span class="br0">&#40;</span><span class="nu0">2</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> result = <span class="kw1">nil</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> cur = <span class="kw1">assert</span><span class="br0">&#40;</span>_con:<span class="kw1">execute</span><span class="br0">&#40;</span>query<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw1">type</span><span class="br0">&#40;</span>cur<span class="br0">&#41;</span> == <span class="st0">&quot;number&quot;</span><span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response.<span class="kw1">type</span> = proxy.MYSQLD_PACKET_RAW;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response.packets = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;<span class="es0">\0</span>00&quot;</span> .. <span class="co1">-- fields (0 = OK packet)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">string.char</span><span class="br0">&#40;</span>cur<span class="br0">&#41;</span> .. <span class="co1">-- affected rows (one byte, so only valid below 251)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;<span class="es0">\0</span>00&quot;</span> <span class="co1">-- insert_id</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result = proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">&#8211; Build up the result set.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> fields = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> colNames = cur:getcolnames<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> colTypes = cur:getcoltypes<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> a = <span class="nu0">1</span>, #colNames, <span class="nu0">1</span> <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>fields, <span class="br0">&#123;</span>name = colNames<span class="br0">&#91;</span>a<span class="br0">&#93;</span>, <span class="kw1">type</span>=proxy.MYSQL_TYPE_STRING<span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
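&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> rows = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- fetch() fills and returns the table it is given, so pass a fresh one per row;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- reusing a single table would leave every entry of rows pointing at the last row.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> curRow = cur:fetch<span class="br0">&#40;</span><span class="br0">&#123;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> curRow <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>rows, curRow<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; curRow = cur:fetch<span class="br0">&#40;</span><span class="br0">&#123;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />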
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = proxy.MYSQLD_PACKET_OK,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; resultset = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fields = fields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rows = rows<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result = proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>result ~= <span class="kw1">nil</span><span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> result<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</div>
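<p>A detail worth checking in any LuaSQL fetch loop: <code>cur:fetch(t)</code> copies the next row into the table you pass and returns that same table. Inserting one reused table into <code>rows</code> therefore stores many references to a single table, all showing the last row. The sketch below uses a hypothetical mock cursor (<code>make_mock_cursor</code>, not part of LuaSQL) that imitates this fill-and-return behaviour, so it runs without a database.</p>

```lua
-- Hypothetical mock of a LuaSQL cursor: fetch(t) copies the next row into t
-- (numeric indices) and returns t, or returns nil when no rows are left.
local function make_mock_cursor(data)
    local i = 0
    return {
        fetch = function(self, t)
            i = i + 1
            local src = data[i]
            if not src then return nil end
            for k, v in ipairs(src) do t[k] = v end
            return t
        end
    }
end

local data = { {"1", "alice"}, {"2", "bob"}, {"3", "carol"} }

-- Pitfall: one table reused for every row.
local shared = {}
local reused_rows = {}
local cur = make_mock_cursor(data)
while cur:fetch(shared) do
    table.insert(reused_rows, shared)
end
-- reused_rows now holds three references to the same table (the last row).

-- Fix: hand fetch a fresh table for each row.
local fresh_rows = {}
cur = make_mock_cursor(data)
local row = cur:fetch({})
while row do
    table.insert(fresh_rows, row)
    row = cur:fetch({})
end
```

<p>The fresh-table variant allocates one small table per row, which the proxy-only script pays as well, since its row iterator also yields a new table for each row.</p>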
<p>And this is the code for using MySQL Proxy only:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">function</span> read_query<span class="br0">&#40;</span>packet<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>packet:byte<span class="br0">&#40;</span><span class="br0">&#41;</span> == proxy.COM_QUERY<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> query = packet:sub<span class="br0">&#40;</span><span class="nu0">2</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- We append the query so read_query_result gets triggered.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; proxy.queries:append<span class="br0">&#40;</span><span class="nu0">1</span>, <span class="kw1">string.char</span><span class="br0">&#40;</span>proxy.COM_QUERY<span class="br0">&#41;</span> .. query<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> proxy.PROXY_SEND_QUERY <br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> read_query_result<span class="br0">&#40;</span>inj<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> resultSet = <span class="kw1">assert</span><span class="br0">&#40;</span>inj.resultset<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> newFields = <span class="kw1">nil</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> fieldCount = <span class="nu0">1</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> fields = resultSet.fields<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>fields<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; newFields = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span> <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; newFields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span>.<span class="kw1">type</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; name = fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span>.name<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fieldCount = fieldCount + <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; <span class="kw1">local</span> newRows = <span class="kw1">nil</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>resultSet.rows<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; newRows = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> row <span class="kw1">in</span> resultSet.rows <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>newRows, row<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>newFields<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; proxy.response = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = proxy.MYSQLD_PACKET_OK,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; resultset = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fields = newFields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rows = newRows<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</div>
<p>As you can see, both scripts do nothing but copy the result set in Lua. This mimics the result-set aggregation HSCALE performs when a full partition scan is necessary.</p>
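<p>For a full partition scan, the aggregation step essentially concatenates the rows of every partition's result set under one field list. A minimal sketch in plain Lua, assuming each partial result is a table <code>{ fields = ..., rows = ... }</code> with rows already collected into an array (in <code>read_query_result</code> they would first have to be gathered from the row iterator) and that all partitions return identical columns. <code>merge_resultsets</code> is a hypothetical helper, not an HSCALE or MySQL Proxy function; the caller would put its result into <code>proxy.response</code> as the <code>resultset</code> field, as both scripts above do.</p>

```lua
-- Hypothetical helper: merge partial result sets from several partitions.
-- Assumes every part has the same fields and stores its rows as an array.
local function merge_resultsets(parts)
    if #parts == 0 then return nil end
    local merged = { fields = parts[1].fields, rows = {} }
    for _, part in ipairs(parts) do
        for _, row in ipairs(part.rows) do
            table.insert(merged.rows, row)
        end
    end
    return merged
end

local fields = { { name = "id" }, { name = "nickname" } }
local merged = merge_resultsets({
    { fields = fields, rows = { {"1", "martin"}, {"2", "marvel"} } },
    { fields = fields, rows = { {"3", "foobar"} } },
})
-- merged.rows holds all three rows, in partition order.
```

<p>Because every partial result is held in memory before anything is sent, memory use grows with the total size of all partitions' results.</p>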
<h2>Results</h2>
<p>Using LuaSQL:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> --auto-generate-sql --number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p>Using MySQL Proxy only:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> --auto-generate-sql --number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p>For comparison, using empty <code>read_query</code> and <code>read_query_result</code> functions (queries and results simply pass through the proxy):</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> --auto-generate-sql --number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p><i>Versions used: MySQL Proxy 0.7.0 (svn-rev 511), LuaSQL 2.1.1, MySQL server 5.0.51a, mysqlslap 5.1.26rc</i></p>
<p>Of course I repeated the tests several times to verify the results.</p>
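<p>To put the three timings side by side, here is the arithmetic in Lua. The numbers are copied from the mysqlslap runs above; the per-query figures divide the extra time over the pass-through baseline by the 10,000 queries, assuming the baseline cost is the same in all three runs.</p>

```lua
-- Wall-clock seconds reported by mysqlslap above (10000 queries, 1 client).
local QUERIES    = 10000
local baseline   = 39.657  -- empty read_query / read_query_result (pass-through)
local luasql     = 65.731  -- result set copied via LuaSQL
local proxy_only = 74.607  -- result set copied in read_query_result

-- How many times faster the LuaSQL run finished.
local speedup = proxy_only / luasql

-- Fraction of wall-clock time saved by the LuaSQL run.
local saved = (proxy_only - luasql) / proxy_only

-- Extra milliseconds per query over the pass-through baseline,
-- assuming the baseline cost is identical in all three runs.
local function overhead_ms(total)
    return (total - baseline) / QUERIES * 1000
end

print(string.format("speedup %.3f, time saved %.1f%%", speedup, saved * 100))
print(string.format("overhead per query: LuaSQL %.1f ms, MySQL Proxy %.1f ms",
    overhead_ms(luasql), overhead_ms(proxy_only)))
```

<p>With these numbers the ratio is about 1.135, i.e. roughly 13.5% higher throughput or 11.9% less wall-clock time for LuaSQL. The copying overhead works out to about 2.6 ms per query for LuaSQL versus 3.5 ms for the proxy-only script.</p>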
<p>Without digging too deep into the source of either MySQL Proxy or LuaSQL, the biggest difference seems to be that LuaSQL pushes the result set onto the Lua stack row by row, whereas MySQL Proxy pushes the whole result set at once.</p>
<p><b>Update:</b> As Jan points out below, this is not true. MySQL Proxy puts the result row by row onto the Lua stack, too.</p>
<p><b>Update #2:</b> The tests above ran on my notebook (MacBook Pro 2.4GHz, 4GB RAM, running Ubuntu 8.04 64-bit). They are reproducible. Running the same tests on an 8-core server, with the MySQL database on a separate server, results in the MySQL Proxy version running <em>slightly</em> faster (about 2-5%) than the LuaSQL version.</p>
<h2>Conclusions</h2>
<p>Even though this tiny benchmark shows that LuaSQL&#8217;s speed is acceptable, there are still drawbacks. First of all, depending on your workload, only a fraction of your queries need their result sets altered: only full partition scans need this. Most of the time you just need to change the table name or the backend, and for those queries LuaSQL is far slower than MySQL Proxy alone (65.7 s versus 39.7 s for plain pass-through in the runs above, roughly 66% more time). </p>
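<p>In that common case the proxy only has to rewrite the query before forwarding it, with no result-set copying at all. A rough sketch of the rewrite step in plain Lua, assuming plain identifier table names (no quoting, no schema prefix); <code>rewrite_table</code> and <code>newQuery</code> are hypothetical, and a real implementation would work on a tokenized query rather than raw string patterns. The rewritten string would then be queued with <code>proxy.queries:append(1, string.char(proxy.COM_QUERY) .. newQuery)</code> and <code>proxy.PROXY_SEND_QUERY</code> returned, as the MySQL Proxy-only script above does in <code>read_query</code>.</p>

```lua
-- Hypothetical helper: replace whole-word occurrences of a table name.
-- %f[%w_] and %f[^%w_] are frontier patterns, so "users" inside
-- "users_default" or "superusers" is left untouched.
-- Assumes `from` has no pattern magic characters and `to` contains no "%".
local function rewrite_table(query, from, to)
    local pattern = "%f[%w_]" .. from .. "%f[^%w_]"
    local result = query:gsub(pattern, to)  -- keep only the string, drop the count
    return result
end

local newQuery = rewrite_table("SELECT * FROM users WHERE nickname = 'martin'",
    "users", "users_nick1")
-- newQuery: SELECT * FROM users_nick1 WHERE nickname = 'martin'
```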
<p>Another downside of LuaSQL is that it does not return the MySQL field types, only Lua types. This makes it impossible to build a correct result set to send back to the client.</p>
<p>So we still need backend connection handling in MySQL Proxy that is suitable for HSCALE if we want higher performance.</p>
<p>Built-in result-set merging would be a big win, too. We could then even have streaming combined result sets, taking memory pressure off the proxy, which otherwise has to load every result set fully into memory.</p>
<p>That said, I am considering LuaSQL for configuration handling, since it is a lot easier than doing it via <code>proxy.queries:append -> read_query_result -> proxy.queries:append -> ...</code>.</p>
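<p>To illustrate why LuaSQL is attractive here: reading configuration becomes a plain synchronous loop instead of a chain of injected queries and callbacks. A sketch, assuming a hypothetical configuration table <code>hscale_partitions(table_name, partition_value, partition_name, backend)</code>. <code>con</code> is a LuaSQL connection such as the <code>_con</code> opened in <code>read_auth</code> above, and only LuaSQL&#8217;s <code>execute</code>, <code>fetch(t, "a")</code> and <code>close</code> are used, so any object with those methods (for example a test mock) works as well.</p>

```lua
-- Load the partition dictionary through a LuaSQL connection.
-- The table and column names are hypothetical.
local function load_partitions(con)
    local cur = assert(con:execute(
        "SELECT table_name, partition_value, partition_name, backend FROM hscale_partitions"))
    local dict = {}
    -- fetch(t, "a") fills t using column names as keys; a fresh table per row.
    local row = cur:fetch({}, "a")
    while row do
        dict[row.table_name] = dict[row.table_name] or {}
        dict[row.table_name][row.partition_value] = {
            name = row.partition_name,
            backend = tonumber(row.backend),
        }
        row = cur:fetch({}, "a")
    end
    cur:close()  -- LuaSQL returns false instead of raising if already closed
    return dict
end
```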
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Version 0.3 of HSCALE is almost in the door</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 03:54:19 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/</guid>
		<description><![CDATA[After working on build and test improvements (for example incorporating lualint and LuaCov) as well as other Lua &#8220;side-projects&#8221; (e.g. Log4LUA), we are now heading towards HSCALE 0.3.
The focus of the forthcoming version 0.3 of HSCALE is Dictionary Based Partition Lookup. Using this partition lookup module lets you take full control over how your partitions are [...]]]></description>
			<content:encoded><![CDATA[<p>After working on build and test improvements (for example incorporating <a href="http://lua-users.org/wiki/LuaLint" onclick="javascript:pageTracker._trackPageview('/http://lua-users.org/wiki/LuaLint');">lualint</a> and <a href="http://luacov.luaforge.net/" onclick="javascript:pageTracker._trackPageview('/http://luacov.luaforge.net/');">LuaCov</a>) as well as other Lua &#8220;side-projects&#8221; (e.g. <a href="http://hscale.org/display/LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA');">Log4LUA</a>), we are now heading towards HSCALE 0.3.</p>
<p>The focus of the forthcoming version 0.3 of HSCALE is <b>Dictionary Based Partition Lookup</b>. Using this partition lookup module lets you take full control over how your partitions are created and where they are actually located.</p>
<p><em>Please note: Due to the <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >problems with backend connection handling</a>, version 0.3 will still focus on single-server backends,</em> even though support for multiple backends is already implemented in HSCALE. Please look at the proxyUnit tests for a glimpse of how multi-server backends will be handled.</p>
<p>Please check out the current development snapshot at <a href="http://svn.hscale.org/trunk" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk');">svn.hscale.org</a> to see what is already there. Currently the partition lookup is fully functional but the administrative commands and some further hashing functions are missing (see below).</p>
<h2>How does dictionary partition lookup work?</h2>
<p>As the name implies, partitions are looked up in a dictionary. The dictionary itself is stored in the main database and cached internally. This means you can freely move partitions around and create new ones.</p>
<p>What is done internally is this:</p>
<ol>
<li>Apply a hashing function (read further) to the value.</li>
<li>Look up the partition based on the hashed value.</li>
<li>If no partition has been found use the <em>default partition</em> created for every partitioned table.</li>
<li>Return the partition (and its assigned backend).</li>
</ol>
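<p>The four steps map onto a few lines of Lua. This is a sketch only, not HSCALE&#8217;s actual code: the dictionary layout (hashed value mapped to a partition name and backend), the <code>lookup_partition</code> function and the default-partition record are hypothetical, and the hashing function is passed in as a plain Lua function.</p>

```lua
-- Hypothetical in-memory dictionary for one partitioned table.
local dictionary = {
    default = { name = "default", backend = 1 },
    partitions = {
        mar = { name = "nick1", backend = 1 },
        foo = { name = "nick2", backend = 2 },
    },
}

-- Returns the partition record (name and backend) for a column value.
local function lookup_partition(dict, hash, value)
    local key = hash(value)                 -- 1. apply the hashing function
    local partition = dict.partitions[key]  -- 2. look up the hashed value
    if partition == nil then                -- 3. fall back to the default partition
        partition = dict.default
    end
    return partition                        -- 4. partition and its backend
end

-- PREFIX(3)-style hash, just for this example.
local function prefix3(value)
    return string.sub(tostring(value), 1, 3)
end

local p = lookup_partition(dictionary, prefix3, "martin")  -- the "nick1" record
```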
<h3>Hashing functions</h3>
<p>To reduce the number of partitions to be created and the overall administration overhead, a hashing function may be applied to the partition value before the partition is looked up.<br />
Currently there are 3 hashing functions available:</p>
<ul>
<li><code>MOD(X)</code>: A modulus function that groups partition values by their remainder modulo <code>X</code>, so it yields at most <code>X</code> distinct hashed values. Works only for numbers, of course. If you have <em>lots</em> of different partition values with a small number of rows each, it might be better to group them together instead of creating <em>lots</em> of partitions.<br />
Example: Using <code>MOD(3)</code> the values <code>1</code>, <code>4</code> and <code>7</code> will end up in the same partition.
</li>
<li><code>PREFIX(length)</code>: Partition values are grouped together by their first <code>length</code> characters. Works on any value (everything is treated as a string).<br />
Example: Using <code>PREFIX(3)</code> the values &#8220;<code>foo</code>&#8221;, &#8220;<code>foobar</code>&#8221; and &#8220;<code>footalicious</code>&#8221; will end up in the same partition.
</li>
<li><code>NONE()</code>: This function does &#8230; nothing! Use it if you really want a 1:1 relationship between partition values and partitions.</li>
</ul>
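<p>A minimal Lua sketch of the three hashing functions as described above, written as factories that return a hash function. The names <code>MOD</code>, <code>PREFIX</code> and <code>NONE</code> mirror the text; how HSCALE actually implements and parses them is not shown here, so treat this as an illustration of the semantics only.</p>

```lua
-- MOD(x): values with the same remainder share a partition (at most x keys).
local function MOD(x)
    return function(value)
        return assert(tonumber(value), "MOD needs a number") % x
    end
end

-- PREFIX(length): values sharing their first `length` characters share a partition.
local function PREFIX(length)
    return function(value)
        return string.sub(tostring(value), 1, length)
    end
end

-- NONE(): identity, one partition per distinct value.
local function NONE()
    return function(value)
        return value
    end
end

local mod3, prefix3 = MOD(3), PREFIX(3)
-- mod3(1), mod3(4) and mod3(7) all return 1; prefix3("foobar") returns "foo".
```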
<p>Further hashing functions are planned (and might make it into version 0.3):</p>
<ul>
<li><code>DIV(X)</code>: As opposed to <code>MOD(X)</code>, this function divides the partition value by <code>X</code>, which makes it a fixed-range function: consecutive runs of <code>X</code> values share a partition. While <code>MOD(X)</code> creates at most <code>X</code> partitions, <code>DIV(X)</code> can create an unbounded number of partitions.
</li>
<li><code>DATE(pattern)</code>: Enables date-range based partitions.<br />
Example: Using <code>DATE("%Y-%m")</code> will group by (year-)month.
</li>
</ul>
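<p>The two planned functions could look like this in the same factory style. Both are assumptions about unreleased features: <code>DIV</code> is sketched as integer (floor) division, and <code>DATE</code> assumes the partition value is a Unix timestamp formatted with Lua&#8217;s <code>os.date</code> (the <code>!</code> prefix selects UTC); the real implementation may accept date strings instead.</p>

```lua
-- DIV(x): fixed-size ranges; 0..x-1 map to 0, x..2x-1 map to 1, and so on.
local function DIV(x)
    return function(value)
        return math.floor(assert(tonumber(value), "DIV needs a number") / x)
    end
end

-- DATE(pattern): group Unix timestamps by a date pattern, e.g. "%Y-%m" per month.
local function DATE(pattern)
    return function(timestamp)
        return os.date("!" .. pattern, timestamp)
    end
end

local div100, month = DIV(100), DATE("%Y-%m")
-- month(0) returns "1970-01"
```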
<h3>Administrative commands</h3>
<p>Because handling partitions is a delicate matter, creating and maintaining them should not be left to obscure SQL statements. Therefore the dictionary partition lookup will provide administrative commands that try to prevent misconfiguration, for example by checking whether partitions overlap.</p>
<p><em>Because this is still work in progress the commands might change.</em></p>
<h4>Table setup</h4>
<p>First of all your table has to be set up:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">HSCALE SETUP_TABLE<span class="br0">&#40;</span><span class="st0">'[table]'</span>, <span class="st0">'[column]'</span>, <span class="st0">'[default table]'</span>, <span class="br0">&#91;</span>backend<span class="br0">&#93;</span>, <span class="st0">'[hashing function]'</span><span class="br0">&#41;</span></p>
<p><span class="co2"># Example</span><br />
HSCALE SETUP_TABLE<span class="br0">&#40;</span><span class="st0">'users'</span>, <span class="st0">'nickname'</span>, <span class="st0">'users_default'</span>, <span class="nu0">1</span>, <span class="st0">'PREFIX(3)'</span><span class="br0">&#41;</span></div>
</div>
<p>The table must already exist; it will be renamed to the default table name (<code>'users_default'</code> in our example). <code>SETUP_TABLE</code> can only be called once per table.</p>
<h4>Add partitions</h4>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">HSCALE ADD_PARTITION<span class="br0">&#40;</span><span class="st0">'[table]'</span>, <span class="st0">'[partition name]'</span>, <span class="st0">'[partition value]'</span>, <span class="br0">&#91;</span>backend<span class="br0">&#93;</span><span class="br0">&#41;</span></p>
<p><span class="co2"># Example</span><br />
HSCALE ADD_PARTITION<span class="br0">&#40;</span><span class="st0">'users'</span>, <span class="st0">'nick1'</span>, <span class="st0">'mar'</span>, <span class="nu0">1</span><span class="br0">&#41;</span></div>
</div>
<p>The example above creates a partition named <code>'nick1'</code> for table <code>'users'</code> with partition value <code>'mar'</code> (users with nicknames like <code>'marvel'</code>, <code>'martin'</code> etc.). The partition name maps directly to the table name of the partition, so the table for this partition will be <code>users_nick1</code>. Usually you would use the partition value (<code>'mar'</code>) as the partition name, but you don&#8217;t have to. You can also store multiple partitions in the same table. This sounds strange, but it makes sense to define a finer partitioning scheme upfront and use a coarser one until you really need to split up the data. This makes it a lot easier to split things up later.</p>
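<p>To make the naming rule and the shared-table idea concrete, here is a sketch of an in-memory registry. <code>add_partition</code> is a hypothetical stand-in for <code>HSCALE ADD_PARTITION</code>, the <code>[table]_[partition name]</code> naming comes from the text above, and the duplicate-value check is just one simple example in the spirit of the overlap checks mentioned for the administrative commands.</p>

```lua
-- Hypothetical registry: table name -> { hashed value -> partition record }.
local registry = {}

local function add_partition(tbl, name, value, backend)
    registry[tbl] = registry[tbl] or {}
    -- Reject a second partition for the same hashed value.
    assert(registry[tbl][value] == nil, "partition value already assigned: " .. value)
    registry[tbl][value] = {
        name = name,
        backend = backend,
        table_name = tbl .. "_" .. name,  -- e.g. users_nick1
    }
end

-- A finer scheme ('mar', 'mat') stored in one table until it needs splitting.
add_partition("users", "nick1", "mar", 1)
add_partition("users", "nick1", "mat", 1)
```

<p>Splitting later would then mean giving <code>'mat'</code> its own partition name (and table) and moving its rows, while lookups for <code>'mar'</code> stay unchanged.</p>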
<h4>Moving partition data</h4>
<p>In version 0.3, <em>partition data will not be moved</em> to a newly created partition; you will have to do it by hand. Automatic moving will be implemented once multiple backends are fully supported, because data then has to be moved between different servers, which requires a different approach. An administrative command to move partitions (<code>HSCALE MOVE_PARTITION(...)</code>) will also be available then.</p>
<h3>Multiple instances of HSCALE working on the same data</h3>
<p>HSCALE is designed to run as multiple instances in parallel, mostly so that it does not become a bottleneck or a single point of failure. Every instance of HSCALE periodically refreshes its internal partition configuration to pick up changes made by other instances. When partitions are created or moved, they are locked inside HSCALE, so all clients wait until the operation finishes. This guarantees data integrity.</p>
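<p>The periodic refresh can be pictured as a cache with a time-to-live. In this sketch the loader and the clock are injected so it runs standalone; <code>make_refreshing_cache</code> and its parameters are hypothetical, and the partition locking described above is not modelled.</p>

```lua
-- Wraps a loader so its result is reused until `ttl` seconds have passed.
local function make_refreshing_cache(loader, ttl, clock)
    clock = clock or os.time
    local value, loaded_at = nil, nil
    return function()
        local now = clock()
        if value == nil or now - loaded_at >= ttl then
            value = loader()  -- e.g. re-read the partition dictionary
            loaded_at = now
        end
        return value
    end
end
```

<p>A shorter TTL makes instances notice each other&#8217;s changes sooner at the cost of more dictionary reads; correctness during create and move operations still relies on the locking, not on the refresh interval.</p>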
<p>Version 0.3 will be released within the next few weeks but definitely after MySQL Proxy 0.7 has been released so we can thoroughly test it against the newest version.</p>
<p>Please feel free to discuss certain features and design decisions shown above. Any feedback is welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/feed/</wfw:commentRss>
		</item>
		<item>
		<title>More fun with LUA - Introducing Log4LUA</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/#comments</comments>
		<pubDate>Tue, 02 Sep 2008 17:59:20 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[lua]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/</guid>
		<description><![CDATA[UPDATE Log4LUA 0.2 released. Go to the project page.
After the dust about backend connection handling in MySQL Proxy had settled, I began working on other parts of HSCALE again. (I&#8217;ll return to the backend handling later after talking to Jan Kneschke about their plans.)
One of the issues that bothered me the most was logging. Until [...]]]></description>
			<content:encoded><![CDATA[<p><b>UPDATE</b> Log4LUA 0.2 released. Go to the <a href="http://hscale.org/display/LUA/Log4LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA/Log4LUA');">project page</a>.</p>
<p>After the dust about <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >backend connection handling in MySQL Proxy</a> had settled, I began working on other parts of HSCALE again. <i>(I&#8217;ll return to the backend handling later after talking to Jan Kneschke about their plans.)</i></p>
<p>One of the <a href="http://jira.hscale.org/browse/HSCALE" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE');">issues</a> that bothered me most was logging. Until now I had just used a simple <code>debug</code> function that could be enabled or disabled using an environment variable, the pretty straightforward approach a lot of people use. But there is more to logging than printing debug messages. So I searched the web and of course found <a href="http://www.keplerproject.org/lualogging/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/lualogging/');">LuaLogging</a>. It is more flexible and has different levels (<code>DEBUG, INFO, WARN, ERROR, FATAL</code>) and appenders: you can print your log messages to the screen, write them to a file or send them by email. </p>
<p>After working with logging frameworks in other languages (mostly Log4X like <a href="http://logging.apache.org/log4j/index.html" onclick="javascript:pageTracker._trackPageview('/http://logging.apache.org/log4j/index.html');">Log4J</a>) I had grown used to a number of features that LuaLogging lacks:</p>
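<p>For reference, such an environment-variable switch typically looks like the following sketch. The variable name <code>HSCALE_DEBUG</code> and the function name <code>debug_log</code> are hypothetical (the name also avoids shadowing Lua&#8217;s standard <code>debug</code> library); the point is that there is exactly one on/off switch and one output.</p>

```lua
-- Debug output is enabled when the (hypothetical) HSCALE_DEBUG variable is set.
local DEBUG_ENABLED = os.getenv("HSCALE_DEBUG") ~= nil

local function debug_log(fmt, ...)
    if DEBUG_ENABLED then
        print(string.format(fmt, ...))
    end
end

debug_log("connected to backend %d", 1)
```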
<ul>
<li><b>There are no means of external configuration.</b> The logger is configured inside the code. You cannot use different configurations for development and production out of the box. This is crucial in my opinion, since during development it is handy to enable all log levels and print everything on screen, while in production you want to log to a file and disable at least the <code>DEBUG</code> level.</li>
<li><b>You get no information about where the log event came from.</b> Log messages are more meaningful if you know where the log message was created, i.e. in which source file and at which line. This makes it easier to track down bugs and errors.</li>
<li><b>There is no log category concept.</b> Having different log categories makes it easier to group log messages either by source or by meaning. For example, you could have one log category called &#8220;core&#8221; for all core messages and another called &#8220;connection&#8221; for all connection-related messages.</li>
<li><b>You cannot use multiple appenders.</b> In production you might want to log all messages to a file and get an email for all errors.</li>
</ul>
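<p>The second point does not require anything exotic: plain Lua already exposes the file and line of a caller through its standard <code>debug</code> library. The following is a minimal sketch, not Log4LUA&#8217;s actual implementation; the helper names <code>callerLocation</code> and <code>log</code> are hypothetical. It uses <code>debug.getinfo</code> to tag each message with the location of the code that issued the log call:</p>

```lua
-- Minimal sketch (not Log4LUA's actual code): find where a log call came from
-- using Lua's standard debug library.

-- Returns "file:line" for the function `depth` levels above callerLocation.
-- depth = 1 is the direct caller of callerLocation.
local function callerLocation(depth)
  -- +1 skips callerLocation itself
  local info = debug.getinfo(depth + 1, "Sl")
  if info == nil then
    return "?:?"
  end
  return info.short_src .. ":" .. tostring(info.currentline)
end

-- A hypothetical log function that prefixes each message with its origin.
-- depth 2 = the code that called log(), not log() itself.
local function log(message)
  local line = "[" .. callerLocation(2) .. "] " .. tostring(message)
  print(line)
  return line
end

log("connection opened")
```

<p>Computing the location inside the logging call itself means application code never has to pass file or line information explicitly.</p>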
<p>Since adding all this functionality to LuaLogging would in fact mean rewriting it, I wrote yet another logging facility and called it <b>Log4LUA</b>. I chose the name because this package behaves almost like Log4X.</p>
<p><b><a href="http://hscale.org/display/LUA/Log4LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA/Log4LUA');">Download Log4LUA</a></b></p>
<p>Detailed documentation on how to use it can be found <a href="http://static.hscale.org/log4lua/api/index.html" onclick="javascript:pageTracker._trackPageview('/http://static.hscale.org/log4lua/api/index.html');">here (luadoc)</a>.</p>
<h2>Features</h2>
<ul>
<li><b>External configuration.</b> Configure your logging system via a config file.</li>
<li><b>Logger categories.</b> Configure different categories for different logging tasks.</li>
<li><b>Detailed information available: source file, line, function or the whole stack trace.</b></li>
<li><b>Console, file and smtp (email) appenders.</b></li>
<li><b>Multiple appenders per category.</b> Log to a file and get the worst errors by email.</li>
<li><b>Level threshold for the smtp appender.</b> Don&#8217;t get every log message by mail, only the ones at or above a defined level.</li>
<li><b>Log file rotation for the file appender.</b> Based on a date pattern.</li>
</ul>
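<p>A level threshold comes down to comparing positions in an ordered list of levels. Here is a small sketch, assuming numeric ranks for the five level names used by LuaLogging and Log4LUA; the ranks and the helper names <code>passes</code> and <code>withThreshold</code> are hypothetical and not part of the Log4LUA API:</p>

```lua
-- Sketch of level ordering and a per-appender threshold.
-- Level names follow LuaLogging/Log4LUA; the numeric ranks and function
-- names are assumptions for illustration.
local LEVELS = { DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4, FATAL = 5 }

-- True if an event of `level` should pass an appender with `threshold`.
local function passes(level, threshold)
  local l, t = LEVELS[level], LEVELS[threshold]
  assert(l and t, "unknown level")
  return l >= t
end

-- Hypothetical wrapper: forwards events to `sink` only at or above `threshold`.
local function withThreshold(threshold, sink)
  return function(level, message)
    if passes(level, threshold) then
      sink(level, message)
      return true
    end
    return false
  end
end

-- Collect "mails" in a table instead of sending them.
local mails = {}
local mailAppender = withThreshold("WARN", function(level, message)
  mails[#mails + 1] = level .. ": " .. message
end)

mailAppender("INFO", "query routed")   -- below WARN: dropped
mailAppender("ERROR", "backend down")  -- at or above WARN: kept
print(#mails) -- 1
```

<p>Keeping the threshold on the appender rather than on the logger is what lets one category write everything to a file while mailing only the serious events.</p>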
<h2>Basic usage</h2>
<p>First write a configuration file (say <code>log4lua.conf.lua</code>):</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> console = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.console&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> file = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.file&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> smtp = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.smtp&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">local</span> config = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p><span class="co1">&#8211; Create a default smtp appender sending message of level WARN or higher.</span><br />
<span class="kw1">local</span> defaultSmtp = smtp.new<span class="br0">&#40;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; headers = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; from = <span class="st0">&quot;myserver@mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; to = <span class="st0">&quot;admin@mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; subject = <span class="st0">&quot;%LEVEL: %MESSAGE&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; body = <span class="st0">&quot;Hi, an error occurred at %FILE:%LINE.<span class="es0">\n</span><span class="es0">\n</span>Level: %LEVEL<span class="es0">\n</span>Message: %MESSAGE<span class="es0">\n</span>&quot;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span>,<br />
&nbsp; &nbsp; <span class="st0">&quot;mail.mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; logger.WARN<br />
<span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; ROOT category must be configured.</span><br />
<span class="co1">&#8211; For the root category we use console and smtp appender</span><br />
config<span class="br0">&#91;</span><span class="st0">&quot;ROOT&quot;</span><span class="br0">&#93;</span> = logger.Logger.new<span class="br0">&#40;</span><span class="br0">&#123;</span>console.new<span class="br0">&#40;</span><span class="br0">&#41;</span>, defaultSmtp<span class="br0">&#125;</span>, <span class="st0">&quot;ROOT&quot;</span>, logger.INFO<span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; Category &quot;core&quot; uses rotating file appender and the default message pattern</span><br />
config<span class="br0">&#91;</span><span class="st0">&quot;core&quot;</span><span class="br0">&#93;</span> = logger.Logger.new<span class="br0">&#40;</span>file.new<span class="br0">&#40;</span><span class="st0">&quot;core-%s.log&quot;</span>, <span class="st0">&quot;%Y-%m-%d&quot;</span><span class="br0">&#41;</span>, <span class="st0">&quot;core&quot;</span>, logger.INFO<span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; The config table must be returned.</span><br />
<span class="kw1">return</span> config</div>
</div>
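<p>The config above relies on two kinds of patterns: the <code>"core-%s.log"</code> / <code>"%Y-%m-%d"</code> pair passed to <code>file.new</code>, and the <code>%LEVEL</code>, <code>%MESSAGE</code>, <code>%FILE</code> and <code>%LINE</code> placeholders in the smtp subject and body. The sketch below shows one plausible way to expand them with standard Lua (<code>os.date</code>, <code>string.format</code>, <code>string.gsub</code>); the helper names are hypothetical and Log4LUA&#8217;s own code may work differently:</p>

```lua
-- Sketch: how a date pattern and %PLACEHOLDERS could be expanded.
-- Function names are hypothetical; only os.date, string.format and
-- string.gsub are standard Lua.

-- Build a file name such as "core-2008-09-02.log" from "core-%s.log" and
-- "%Y-%m-%d". When the formatted date changes, a file appender would switch
-- to a new file, which gives date-based rotation.
local function rotatedFileName(namePattern, datePattern, time)
  return string.format(namePattern, os.date(datePattern, time))
end

-- Replace %LEVEL, %MESSAGE, %FILE, %LINE, ... with values from `event`.
-- Placeholders without a value in `event` are left untouched.
local function expand(template, event)
  return (template:gsub("%%(%u+)", function(key)
    local value = event[key]
    if value == nil then
      return nil -- nil keeps the original text of the match
    end
    return tostring(value)
  end))
end

local t = os.time({ year = 2008, month = 9, day = 2, hour = 12 })
print(rotatedFileName("core-%s.log", "%Y-%m-%d", t)) -- core-2008-09-02.log
print(expand("%LEVEL: %MESSAGE", { LEVEL = "WARN", MESSAGE = "slow query" })) -- WARN: slow query
```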
<p>In your Lua source (say <code>mycode.lua</code>) use the following to get a logger:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; I would prefer calling the main logger LOG.</span><br />
<span class="co1">&#8211; This will return a logger for category &quot;ROOT&quot; since there is no category &quot;mycode&quot;.</span><br />
<span class="kw1">local</span> LOG = logger.getLogger<span class="br0">&#40;</span><span class="st0">&quot;mycode&quot;</span><span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; This will return a logger of the category &quot;core&quot; since the category name &quot;core.moduleA&quot; starts with &quot;core&quot;</span><br />
<span class="kw1">local</span> LOG_CORE = logger.getLogger<span class="br0">&#40;</span><span class="st0">&quot;core.moduleA&quot;</span><span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; Log something</span><br />
LOG:info<span class="br0">&#40;</span><span class="st0">&quot;My first log message&quot;</span><span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; Tables are converted to string for you.</span><br />
LOG:warn<span class="br0">&#40;</span><span class="br0">&#123;</span>name = <span class="st0">&quot;Paul&quot;</span>, age = <span class="st0">&quot;22&quot;</span><span class="br0">&#125;</span><span class="br0">&#41;</span></p>
<p><span class="co1">&#8211; Log to different category.</span><br />
LOG_CORE:fatal<span class="br0">&#40;</span><span class="st0">&quot;System error&quot;</span><span class="br0">&#41;</span></div>
</div>
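<p>The category lookup described in the comments (<code>core.moduleA</code> resolving to <code>core</code>, <code>mycode</code> falling back to <code>ROOT</code>) can be pictured as walking up a dotted name. This is a hypothetical sketch that assumes Log4J-style, dot-separated hierarchies; Log4LUA may instead use a plain prefix match, and <code>resolveCategory</code> is not one of its functions:</p>

```lua
-- Hypothetical sketch of category resolution, modelled on Log4J-style
-- hierarchies: "core.moduleA" falls back to "core", anything unknown to "ROOT".
-- This illustrates the idea; it is not Log4LUA's actual lookup code.
local function resolveCategory(config, name)
  local candidate = name
  while candidate do
    if config[candidate] then
      return candidate
    end
    -- Drop the last ".segment"; match returns nil when no dot is left.
    candidate = candidate:match("^(.*)%.[^.]*$")
  end
  return "ROOT"
end

local config = { ROOT = "root logger", core = "core logger" }
print(resolveCategory(config, "mycode"))       -- ROOT
print(resolveCategory(config, "core.moduleA")) -- core
```

<p>Splitting on dots rather than on raw string prefixes keeps a hypothetical category such as <code>coreutils</code> from being captured by <code>core</code>.</p>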
<p>Start your program with:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">LOG4LUA_CONFIG_FILE=<span class="st0">&quot;log4lua.conf.lua&quot;</span> lua mycode.lua</div>
</div>
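<p>How the library picks up <code>LOG4LUA_CONFIG_FILE</code> is not spelled out here. A minimal sketch, assuming the config file is simply executed with <code>dofile</code> and must return a table containing a <code>ROOT</code> entry (as the comments in the sample config require); <code>loadLogConfig</code> is a hypothetical name:</p>

```lua
-- Sketch of how a config file named by LOG4LUA_CONFIG_FILE could be loaded.
-- The helper name and the error handling are assumptions.
local function loadLogConfig(path)
  path = path or os.getenv("LOG4LUA_CONFIG_FILE")
  if path == nil then
    -- No config given: a real library might fall back to a console-only ROOT logger.
    return nil, "LOG4LUA_CONFIG_FILE is not set"
  end
  -- The config file is plain Lua that must end with "return config".
  local ok, config = pcall(dofile, path)
  if not ok then
    return nil, "cannot load " .. path .. ": " .. tostring(config)
  end
  if type(config) ~= "table" or config.ROOT == nil then
    return nil, path .. " must return a table with a ROOT entry"
  end
  return config
end
```

<p>Returning <code>nil</code> plus a message instead of raising an error leaves the decision (abort, or log to the console only) to the caller.</p>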
<p>For more examples look at the <a href="http://svn.hscale.org/trunk/hscale" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk/hscale');">HSCALE</a> sources. Log4LUA is used everywhere.</p>
<p>Please feel free to comment on this, or send usage or bug reports. I will set up a project page later on.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/feed/</wfw:commentRss>
		</item>
		<item>
		<title>MySQL Proxy vs. HSCALE</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 00:07:52 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/</guid>
		<description><![CDATA[Recently I added the first code to support multiple MySQL backends in HSCALE (see svn.hscale.org). As a &#8220;side note&#8221; I have to thank Giuseppe Maxia for MySQL Sandbox, which made multi-server testing a breeze!
While coding this I started to feel that writing HSCALE on top of MySQL Proxy is no longer as easy and [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I added the first code to support multiple MySQL backends in HSCALE (see <a href="http://svn.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org');">svn.hscale.org</a>). <i>As a &#8220;side note&#8221; I have to thank Giuseppe Maxia for <a href="https://launchpad.net/mysql-sandbox" onclick="javascript:pageTracker._trackPageview('/https://launchpad.net/mysql-sandbox');">MySQL Sandbox</a> which made multi server testing a bliss!</i></p>
<p>While coding this I started to feel that writing HSCALE on top of MySQL Proxy is no more as easy and clean as it started out to be. And finally I reached the point where I have to say: </p>
<p><b>HSCALE (and maybe other advanced multi-server applications) and (current) MySQL Proxy don&#8217;t fit very well.</b></p>
<p>Before I go into details please let me make clear that this is mostly due to the specific nature of MySQL Proxy being a connection and protocol based proxy. MySQL Proxy is great and there are a lot of cool applications that fit perfectly with it.</p>
<p>Aside from some other minor glitches (like missing tokens in the <a href="http://bugs.mysql.com/bug.php?id=36277" onclick="javascript:pageTracker._trackPageview('/http://bugs.mysql.com/bug.php?id=36277');">SQL tokenizer</a>), the biggest show-stopper is:</p>
<h2>Handling of multiple backends</h2>
<p>The biggest problem I had to face is the handling of backend connections in MySQL Proxy. What I would need for HSCALE are dedicated connections to every backend for every proxy connection: if a client connects to the proxy, a connection to each backend would be opened and attached to this particular client connection. This way I could easily maintain the state of the connection by sending commands like <code>SET <i>variable</i></code> or <code>USE <i>database</i></code> to all backends.<br />
Dedicated connections are vital for XA or transactions in general, too. </p>
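<p>To make the idea of dedicated connections more concrete, here is a hypothetical sketch of the per-client model described above: each client session owns one connection per backend, and state-changing statements (<code>SET</code>, <code>USE</code>) are replayed on all of them. The backend connections are stubs that only record what they receive; <code>Session</code>, <code>connect</code> and <code>stubConnect</code> are illustrative names, not MySQL Proxy API:</p>

```lua
-- Hypothetical sketch of dedicated per-client backend connections.
-- Session, connect and stubConnect are illustrative names, not MySQL Proxy API.

local Session = {}
Session.__index = Session

-- `connect(name)` must return an object with an :execute(sql) method.
function Session.new(backendNames, connect)
  local self = setmetatable({ backends = {} }, Session)
  for _, name in ipairs(backendNames) do
    self.backends[name] = connect(name)
  end
  return self
end

-- Statements that change connection state must reach every backend.
local function isStateChange(sql)
  local verb = sql:match("^%s*(%a+)")
  verb = verb and verb:upper()
  return verb == "SET" or verb == "USE"
end

-- Broadcast state changes; send everything else to one chosen backend.
function Session:execute(sql, targetBackend)
  if isStateChange(sql) then
    for _, conn in pairs(self.backends) do
      conn:execute(sql)
    end
  else
    self.backends[targetBackend]:execute(sql)
  end
end

-- Stub connection that just records the statements it was sent.
local function stubConnect(name)
  return {
    name = name,
    log = {},
    execute = function(c, sql) c.log[#c.log + 1] = sql end,
  }
end

local s = Session.new({ "shard1", "shard2" }, stubConnect)
s:execute("USE mydb")            -- replayed on shard1 and shard2
s:execute("SELECT 1", "shard2")  -- sent to shard2 only
```

<p>Because the connections belong to the session for its whole lifetime, a transaction opened on one backend stays on the same physical connection, which is what the shared-pool approach cannot guarantee.</p>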
<p>The way the proxy handles backend connections right now is somewhat cumbersome (see an <a href="http://forge.mysql.com/tools/tool.php?id=151" onclick="javascript:pageTracker._trackPageview('/http://forge.mysql.com/tools/tool.php?id=151');">example</a>). Having to maintain your own connection pool is hard to understand and involves too much &#8220;magic&#8221; in my opinion. Also, the pool only grows with the number of connections made to the proxy. People are having problems with this approach (read the <a href="http://forums.mysql.com/list.php?146" onclick="javascript:pageTracker._trackPageview('/http://forums.mysql.com/list.php?146');">MySQL Proxy forum</a>, for instance <a href="http://forums.mysql.com/read.php?146,197085,197085#msg-197085" onclick="javascript:pageTracker._trackPageview('/http://forums.mysql.com/read.php?146,197085,197085#msg-197085');">this thread</a>).</p>
<p>The current experimental multi-server code in HSCALE uses this connection pooling technique since it is the only way to establish connections to other backend servers. </p>
<h3>So what can we do now to implement multi-server support in HSCALE?</h3>
<ol>
<li>
<b>Wait until MySQL Proxy supports dedicated backend connections</b> This would be the easiest and most elegant solution. But it could be that Jan Kneschke and the other developers decide that this is not what they intended to do with the proxy. And this would be totally ok from their point of view! Even if they decide that this is a cool feature it could take a long while until it is implemented. (Hint: I would gladly help out here <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ).
</li>
<li>
<b>Work around this using <a href="http://www.keplerproject.org/luasql/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/luasql/');">luasql</a></b> While possible, this solution is not preferable, since we would be using two technologies for the same thing.
</li>
<li>
<b>Switch to another proxy</b> There are quite <a href="http://krow.livejournal.com/595518.html" onclick="javascript:pageTracker._trackPageview('/http://krow.livejournal.com/595518.html');">a few</a>. This would be no fun, and I have not looked at all of them to see if it is even possible. It would be a complete rewrite, though.
</li>
<li><b>Fork MySQL Proxy or implement a plugin</b> The new plugin technology could be used to implement the multi-backend connection stuff myself. As a last resort, I could fork the whole project (like <a href="http://spockproxy.sourceforge.net/" onclick="javascript:pageTracker._trackPageview('/http://spockproxy.sourceforge.net/');">Spock Proxy</a> did, see below). This would still mean a lot of work, though, and losing the benefit of the effort MySQL puts into the proxy.
</li>
<li>
<b>Use other sharding technologies and eventually abandon HSCALE</b> As an example, <a href="http://spockproxy.sourceforge.net/" onclick="javascript:pageTracker._trackPageview('/http://spockproxy.sourceforge.net/');">Spock Proxy</a>, a fork of MySQL Proxy, does a great deal of what we intend to do with HSCALE.
</li>
<li>
<b>Re-implement HSCALE as a JDBC driver</b> Since our applications are written exclusively in Java, we could do that.
</li>
</ol>
<p><b>Of course I would love to continue HSCALE as a MySQL Proxy application!</b> The main reasons are the effort MySQL is putting into MySQL Proxy, the extensibility we gain by combining multiple Lua scripts (we use the proxy for more than just HSCALE) and, last but not least, the community response HSCALE has already received.</p>
<p>As a next step I will take a deeper look into the proxy code and evaluate the effort needed to add dedicated connections.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/feed/</wfw:commentRss>
		</item>
		<item>
		<title>HSCALE 0.2 released and new project web page</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/#comments</comments>
		<pubDate>Tue, 06 May 2008 20:23:06 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/</guid>
		<description><![CDATA[The main focus of version 0.2 was to improve handling of almost all of SQL. So now you can issue DESC TABLE tbl_name or RENAME TABLE tbl_name TO another_tbl_name on a partitioned table and you get correct results for SHOW TABLES etc.
Other statements are rejected, and we settled on the feature set we want [...]]]></description>
			<content:encoded><![CDATA[<p>The main focus of version 0.2 was to improve handling of almost all of SQL. So now you can issue <code>DESC TABLE tbl_name</code> or <code>RENAME TABLE tbl_name TO another_tbl_name</code> on a partitioned table and you get correct results for <code>SHOW TABLES</code> etc.<br />
Other statements are rejected, and we settled on the feature set we want to provide for <em>full partition scans</em>.<br />
In addition to that there are some performance improvements.</p>
<p>See the <a href="http://jira.hscale.org/browse/HSCALE?report=com.atlassian.jira.plugin.system.project:changelog-panel" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE?report=com.atlassian.jira.plugin.system.project:changelog-panel');">full list of changes</a>.</p>
<p>For the next release (0.3) we focus on the dictionary based partition lookup module and further performance improvements.</p>
<p>Finally, there is a new project home page: <a href="http://www.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://www.hscale.org');">http://www.hscale.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Update: Benchmark HSCALE with MySQL Proxy 0.7.0 (svn) against 0.6.1</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/05/05/update-benchmark-hscale-with-mysql-proxy-070-svn-against-061/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/05/05/update-benchmark-hscale-with-mysql-proxy-070-svn-against-061/#comments</comments>
		<pubDate>Mon, 05 May 2008 19:26:50 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/05/05/update-benchmark-hscale-with-mysql-proxy-070-svn-against-061/</guid>
		<description><![CDATA[Earlier today I posted these benchmark results testing HSCALE and MySQL Proxy performance.
As Jan Kneschke (the author of MySQL Proxy) pointed out, there are quite a few improvements in the current development version (svn trunk). So I gave revision 369 a try.
Tests were all the same as mentioned in my previous post. And indeed we see [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier today I posted <a href="http://pero.blogs.aprilmayjune.org/2008/05/05/benchmark-mysql-proxy-and-hscale/" >these benchmark results</a> testing HSCALE and MySQL Proxy performance.</p>
<p>As Jan Kneschke (the author of MySQL Proxy) pointed out, there are quite a few improvements in the current development version (svn trunk). So I gave revision 369 a try.</p>
<p>The tests were the same as in my previous post, and indeed we see quite dramatic improvements. While the performance of the Lua scripts stayed almost the same, the footprint of the proxy itself sank to only 50 to 65% of its previous value. Here are the numbers:</p>
<table cellspacing="5">
<tr>
<th>Version / Concurrency</th>
<th>MySQL</th>
<th>MySQL Proxy</th>
<th>Empty Lua</th>
<th>Tokenizer</th>
<th>QueryAnalyzer</th>
<th>HSCALE w/o partitions</th>
<th>HSCALE w/ partitions</th>
</tr>
<tr>
<td>0.6.1 / 40</td>
<td>217</td>
<td>1302</td>
<td>7667</td>
<td>7091</td>
<td>6162</td>
<td>7552</td>
<td>7577</td>
</tr>
<tr>
<td>0.6.1 / 20</td>
<td>217</td>
<td>557</td>
<td>2536</td>
<td>4532</td>
<td>4524</td>
<td>4325</td>
<td>4564</td>
</tr>
<tr style="background-color: #cccccc">
<td>0.6.1 / 10</td>
<td>287</td>
<td>641</td>
<td>675</td>
<td>1179</td>
<td>1813</td>
<td>738</td>
<td>2711</td>
</tr>
<tr>
<td>0.6.1 / 1</td>
<td>1906</td>
<td>3914</td>
<td>4574</td>
<td>5299</td>
<td>5411</td>
<td>4465</td>
<td>6957</td>
</tr>
<tr>
<td>0.7.0 / 40</td>
<td>229</td>
<td>1061</td>
<td>5165</td>
<td>4786</td>
<td>5844</td>
<td>6163</td>
<td>5950</td>
</tr>
<tr>
<td>0.7.0 / 20</td>
<td>222</td>
<td>331</td>
<td>2553</td>
<td>1900</td>
<td>2968</td>
<td>3927</td>
<td>4074</td>
</tr>
<tr style="background-color: #cccccc">
<td>0.7.0 / 10</td>
<td>297</td>
<td>489</td>
<td>499</td>
<td>930</td>
<td>1601</td>
<td>550</td>
<td>2413</td>
</tr>
<tr>
<td>0.7.0 / 1</td>
<td>1937</td>
<td>2895</td>
<td>2614</td>
<td>3814</td>
<td>4499</td>
<td>3235</td>
<td>5578</td>
</tr>
</table>
<p>(all values are total run times in ms for 10,000 queries)</p>
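<p>One way to read the &#8220;footprint&#8221; of the proxy from this table is the MySQL Proxy time minus the plain MySQL time at the same concurrency. The following worked example assumes that definition (it is our reading, not necessarily how the figures above were derived) and prints the 0.7.0 overhead as a share of the 0.6.1 overhead:</p>

```lua
-- Worked arithmetic on the table above. Proxy overhead is taken as
-- "MySQL Proxy" minus "MySQL" (an assumed definition of "footprint").
local runs = {
  -- concurrency, { mysql, proxy } for 0.6.1, { mysql, proxy } for 0.7.0 (ms)
  { 40, { 217, 1302 },  { 229, 1061 } },
  { 20, { 217, 557 },   { 222, 331 } },
  { 10, { 287, 641 },   { 297, 489 } },
  { 1,  { 1906, 3914 }, { 1937, 2895 } },
}

-- Share of the old proxy overhead that remains in the new version.
local function overheadShare(old, new)
  local oldOverhead = old[2] - old[1]
  local newOverhead = new[2] - new[1]
  return newOverhead / oldOverhead
end

for _, r in ipairs(runs) do
  print(string.format("concurrency %2d: 0.7.0 overhead is %.0f%% of 0.6.1",
    r[1], 100 * overheadShare(r[2], r[3])))
end
```

<p>Under this definition the 0.7.0 overhead comes to about 54% of the 0.6.1 overhead at a concurrency of 10 and about 48% at a concurrency of 1, while the 20 and 40 thread rows (32% and 77%) vary more widely.</p>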
<p>Looking at the highlighted rows (concurrency = 10) you see that the difference between the MySQL server and MySQL Proxy is much smaller for the svn version. This is a great step forward!</p>
<p>If you compare all the other numbers and relate them to the execution time of the MySQL Proxy you see that the overhead stayed pretty much the same. So we see great improvements in general footprint but not in Lua execution.</p>
<p>And still the test does not scale well beyond 10 parallel threads. As Kay Roepke (co-author of MySQL Proxy) pointed out, MySQL Proxy is currently single-threaded, so improvements here were not expected (but would have been nice <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ).</p>
<p>I hope version 0.7.0 will be released soon (the last commit in SVN is apparently from February, according to <a href="http://svn.mysql.com/fisheye/browse/mysql-proxy" onclick="javascript:pageTracker._trackPageview('/http://svn.mysql.com/fisheye/browse/mysql-proxy');">http://svn.mysql.com/fisheye/browse/mysql-proxy</a>), since the performance improvement is simply great and would help MySQL Proxy gain more acceptance: &#8220;latency&#8221; is often the number one &#8220;reason&#8221; not to try out MySQL Proxy.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/05/05/update-benchmark-hscale-with-mysql-proxy-070-svn-against-061/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Benchmark MySQL Proxy and HSCALE</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/05/05/benchmark-mysql-proxy-and-hscale/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/05/05/benchmark-mysql-proxy-and-hscale/#comments</comments>
		<pubDate>Mon, 05 May 2008 15:31:47 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/05/05/benchmark-mysql-proxy-and-hscale/</guid>
		<description><![CDATA[As part of developing HSCALE, a partitioning / sharding solution, I set up a benchmark test suite. I scripted it so that it is repeatable and lets us monitor progress and spot performance regressions during development.
Test Suite
The test suite uses mysqlslap to benchmark the overhead of MySQL Proxy itself in a real-life scenario as well as the [...]]]></description>
			<content:encoded><![CDATA[<p>As part of developing <a href="http://hscale.org" onclick="javascript:pageTracker._trackPageview('/http://hscale.org');">HSCALE</a>, a partitioning / sharding solution, I set up a benchmark test suite. I made it scripted and thus repeatable to monitor the progress and performance regressions during the development.</p>
<h2>Test Suite</h2>
<p>The test suite uses <a href="http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html');">mysqlslap</a> to benchmark the overhead of <a href="http://forge.mysql.com/wiki/MySQL_Proxy" onclick="javascript:pageTracker._trackPageview('/http://forge.mysql.com/wiki/MySQL_Proxy');">MySQL Proxy</a> itself in a real-life scenario as well as the different components of HSCALE: query analysis and query rewriting. The complete test suite is available in the svn trunk at <a href="http://svn.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org');">http://svn.hscale.org</a> under <code>hscale/test/performance/mysqlslap</code>. There you will find a <code>build.xml</code>, an <a href="http://ant.apache.org" onclick="javascript:pageTracker._trackPageview('/http://ant.apache.org');">Ant</a> buildfile that sets up the test environment and performs the tests.</p>
<h2>Test Strategy</h2>
<p>There are several things we want to find out using this benchmark:</p>
<ol>
<li>How much overhead does MySQL Proxy add in a multiple server setup?</li>
<li>Does using Lua scripts add substantial overhead?</li>
<li>How many resources does proxy.tokenizer use?</li>
<li>How does HSCALE perform on <em>unpartitioned</em> tables?</li>
<li>How does HSCALE perform on <em>partitioned</em> tables?</li>
</ol>
<p>As stated above, <a href="http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html');">mysqlslap</a> is used to generate multi-threaded load. It fires this statement:<br />
<code type="sql"><br />
SELECT<br />
    id, category<br />
FROM small<br />
WHERE<br />
    small.category='books'<br />
    /* Added */ /* some */ /* comments */<br />
    /* to */ /* produce */ /* a */ /* higher */<br />
    /* tokenizer */ /* load */<br />
</code></p>
<p>against this table and content:<br />
<code type="sql"><br />
CREATE TABLE small (<br />
    id INT UNSIGNED NOT NULL,<br />
    category ENUM('books', 'hardware', 'software') NOT NULL,<br />
    PRIMARY KEY(id)<br />
) ENGINE=HEAP;</p>
<p>INSERT INTO small (id, category) VALUES (1, 'books');<br />
INSERT INTO small (id, category) VALUES (2, 'hardware');<br />
INSERT INTO small (id, category) VALUES (3, 'software');<br />
</code></p>
<p>Each run sends 10,000 queries to the MySQL Server or MySQL Proxy respectively.</p>
<h2>Test Setup</h2>
<ol>
<li>A MySQL server instance (5.0.54-enterprise-gpl-log) on a DELL PowerEdge 2850, 2xQuadCore 2.8GHz, 12GB RAM</li>
<li>A server running only MySQL Proxy (version 0.6.1) instances (DELL PowerEdge 2950, 2xQuadCore 2.33GHz, 8GB RAM)</li>
<li>A test runner on a DELL PowerEdge 1950, 2xQuadCore 1.8GHz, 8GB RAM.</li>
</ol>
<p>The test suite is entirely CPU and memory bound, so the I/O system doesn&#8217;t matter here.</p>
<h2>Results</h2>
<p><img src='http://pero.blogs.aprilmayjune.org/files/2008/05/benchmark_hscale_02_20080505.png' alt='benchmark_hscale_0.2_20080505' /></p>
<table cellpadding="5">
<tr>
<th>Concurrency</th>
<th>MySQL</th>
<th>MySQL Proxy</th>
<th>Empty Lua</th>
<th>Tokenizer</th>
<th>QueryAnalyzer</th>
<th>HSCALE w/o partitions</th>
<th>HSCALE w/ partitions</th>
</tr>
<tr>
<td>40</td>
<td>217</td>
<td>1302</td>
<td>7667</td>
<td>7091</td>
<td>6162</td>
<td>7552</td>
<td>7577</td>
</tr>
<tr>
<td>20</td>
<td>217</td>
<td>557</td>
<td>2536</td>
<td>4532</td>
<td>4524</td>
<td>4325</td>
<td>4564</td>
</tr>
<tr>
<td>10</td>
<td>287</td>
<td>641</td>
<td>675</td>
<td>1179</td>
<td>1813</td>
<td>738</td>
<td>2711</td>
</tr>
<tr>
<td>1</td>
<td>1906</td>
<td>3914</td>
<td>4574</td>
<td>5299</td>
<td>5411</td>
<td>4465</td>
<td>6957</td>
</tr>
</table>
<p>All values are total run times in ms for 10,000 queries. The columns mean:</p>
<ol>
<li>MySQL: Test ran directly against a MySQL server</li>
<li>MySQL Proxy: Test ran against a MySQL Proxy instance with no additional configuration or script</li>
<li>Empty Lua: A Lua script with an empty <code>function read_query(packet)</code> was used</li>
<li>Tokenizer: Each query has been tokenized using <code>proxy.tokenizer</code></li>
<li>QueryAnalyzer: Tokenizer and query analyzer are used but no query rewriting</li>
<li>HSCALE w/o partitions: HSCALE is used but the table is not partitioned</li>
<li>HSCALE w/ partitions: HSCALE is used against a partitioned table</li>
</ol>
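<p>The relative costs discussed in the conclusions can be recomputed from the concurrency-10 row of the results table. A small worked example; the table keys are our own shorthand for the column names:</p>

```lua
-- Worked arithmetic on the concurrency-10 row of the results table
-- (total ms per 10,000-query run). Keys are shorthand for the column headers.
local c10 = {
  mysql = 287, proxy = 641, emptyLua = 675, tokenizer = 1179,
  analyzer = 1813, hscaleNoPart = 738, hscalePart = 2711,
}

-- How many times longer `slower` took than `base`.
local function factor(slower, base)
  return slower / base
end

local comparisons = {
  { "MySQL Proxy vs. MySQL", c10.proxy, c10.mysql },
  { "Tokenizer vs. empty Lua", c10.tokenizer, c10.emptyLua },
  { "QueryAnalyzer vs. Tokenizer", c10.analyzer, c10.tokenizer },
  { "HSCALE w/o partitions vs. empty Lua", c10.hscaleNoPart, c10.emptyLua },
  { "HSCALE w/ partitions vs. MySQL", c10.hscalePart, c10.mysql },
  { "HSCALE w/ partitions vs. MySQL Proxy", c10.hscalePart, c10.proxy },
}

for _, cmp in ipairs(comparisons) do
  print(string.format("%-38s %.2fx", cmp[1], factor(cmp[2], cmp[3])))
end
```

<p>This prints factors of 2.23, 1.75, 1.54, 1.09, 9.45 and 4.23; a factor of 1.75 corresponds to the tokenizer adding 75% on top of an empty Lua script.</p>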
<h2>Conclusions</h2>
<p>First of all, please note that these benchmarks measure the <b>maximum overhead</b> of each component, and that this overhead is a constant amount per query: a statement that takes 1 minute to complete on the MySQL server does not take 2 minutes when using MySQL Proxy.</p>
<h3>CPU As Limiting Factor</h3>
<p>As you can see, at a concurrency of 20 or more everything gets progressively worse. This is because MySQL Proxy / Lua performance becomes CPU bound. In addition, you can see that the time is spent anywhere but within the Lua scripts: while we see quite distinct values at lower concurrencies (HSCALE w/ and w/o partitions show a huge difference), every benchmark takes almost the same time at 20 or 40 parallel threads.</p>
<p>According to <code>top</code>, MySQL Proxy seems to be using only a single CPU out of the 8 available. If this is the case, it would be extremely desirable to have MySQL Proxy use all available resources.</p>
<h3>MySQL Proxy Overhead</h3>
<p>As we can see, putting a plain MySQL Proxy between application and MySQL server adds about 100% to 150% to the overall execution time. This is what we could have expected because of the added latency: packets go through 2 hops instead of 1.</p>
<p>With higher concurrency the overhead grows until performance collapses at 40 parallel threads (1302 ms vs. 217 ms). Here CPU seems to be the limiting factor.</p>
<h3>Lua Scripts</h3>
<p>Adding an empty Lua script to the configuration results in little overhead up to a concurrency of 10. With higher concurrency everything gets worse. Again CPU seems to be the limiting factor.</p>
<h3>Tokenizer</h3>
<p>The SQL tokenizer adds about 75% compared to an empty Lua script (at a concurrency of 10), so we should avoid it wherever we can. With the results of this benchmark we were able to improve the overall HSCALE performance for non-partitioned tables (see this <a href="http://jira.hscale.org/browse/HSCALE-31" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE-31');">issue</a>).</p>
<h3>QueryAnalyzer</h3>
<p>Since the QueryAnalyzer uses the tokenizer, it includes the tokenizer&#8217;s overhead and adds another 50% (at a concurrency of 10). There is a lot of room for improvement here. The analyzer is now almost feature-complete, so we can concentrate on performance. First, the algorithm could be optimized (anticipating the fastest path), and then more hinting could be added.</p>
<h3>HSCALE w/o Partitions</h3>
<p>After implementing an improvement for this <a href="http://jira.hscale.org/browse/HSCALE-31" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE-31');">issue</a> (avoiding the tokenizer), we see that performance for queries against non-partitioned tables is almost as good as with an empty Lua script.</p>
<h3>HSCALE w/ Partitions</h3>
<p>Looking at the concurrency level of 10, we see that HSCALE is roughly 10 times slower than the MySQL server and about 4 times slower than an empty MySQL Proxy. Needless to say, this is quite a large factor. With performance improvements we might lower this to 2 or 3 times slower than MySQL Proxy itself. This is OK since we are still able to perform more than 3,000 statements / s. And finally we are able to use multiple proxies to spread the load.</p>
<h2>Final Thoughts</h2>
<p>This benchmark mainly showed us three things:</p>
<ol>
<li>MySQL Proxy adds the expected latency overhead, but not more. The average is about 0.035 milliseconds per query.</li>
<li>Scaling of MySQL Proxy could be improved by using all CPUs.</li>
<li>HSCALE adds a maximum overhead of about 0.24 ms per query (against a partitioned table).</li>
</ol>
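<p>The per-query figures follow from the concurrency-10 row of the results table and the 10,000 queries sent per run. A worked example, assuming that row is the one the figures refer to (it matches them); the variable names are ours:</p>

```lua
-- Per-query overhead = (run time - MySQL run time) / queries per run.
-- Values are the concurrency-10 row of the results table, in ms.
local QUERIES_PER_RUN = 10000
local mysqlMs, proxyMs, hscaleMs = 287, 641, 2711

local function perQueryOverhead(totalMs, baselineMs)
  return (totalMs - baselineMs) / QUERIES_PER_RUN
end

-- Statements per second for a run that took `totalMs` milliseconds.
local function throughput(totalMs)
  return QUERIES_PER_RUN / (totalMs / 1000)
end

print(string.format("MySQL Proxy overhead: %.4f ms/query", perQueryOverhead(proxyMs, mysqlMs)))
print(string.format("HSCALE overhead:      %.4f ms/query", perQueryOverhead(hscaleMs, mysqlMs)))
print(string.format("HSCALE throughput:    %.0f statements/s", throughput(hscaleMs)))
```

<p>This prints 0.0354 ms, 0.2424 ms and 3689 statements/s, consistent with the 0.035 ms, 0.24 ms and &#8220;more than 3,000 statements / s&#8221; quoted in this post.</p>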
<p>Please feel free to comment on the results or run the tests on your own.</p>
<p><b>UPDATE:</b> Corrected the number of milliseconds MySQL Proxy and HSCALE add per query: Old were 0.35 ms for proxy and 2.4 ms for HSCALE. The correct numbers are 0.035 ms for proxy and 0.24 ms for HSCALE.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/05/05/benchmark-mysql-proxy-and-hscale/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Presentation Slides: Introduction to HSCALE</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/04/15/presentation-slides-introduction-to-hscale/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/04/15/presentation-slides-introduction-to-hscale/#comments</comments>
		<pubDate>Tue, 15 Apr 2008 14:09:24 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/04/15/presentation-slides-introduction-to-hscale/</guid>
		<description><![CDATA[No, these slides are not fresh from the User Conference in Santa Clara&#8230;  
Today, I gave a presentation to all developers and support engineers of our technical department about database partitioning, MySQL Proxy, HSCALE and the progress we are making. 
Download the presentation slides here.
]]></description>
			<content:encoded><![CDATA[<p>No, these slides are not fresh from the User Conference in Santa Clara&#8230; <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Today, I gave a presentation to all developers and support engineers of our technical department about database partitioning, MySQL Proxy, HSCALE and the progress we are making. </p>
<p>Download the presentation slides <a href="http://pero.blogs.aprilmayjune.org/files/2008/04/introduction_to_hscale.pdf" >here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/04/15/presentation-slides-introduction-to-hscale/feed/</wfw:commentRss>
		</item>
		<item>
		<title>HSCALE 0.1 released - Partitioning Using MySQL Proxy</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/04/10/hscale-01-released-partitioning-using-mysql-proxy/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/04/10/hscale-01-released-partitioning-using-mysql-proxy/#comments</comments>
		<pubDate>Wed, 09 Apr 2008 23:17:17 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/04/10/hscale-01-released-partitioning-using-mysql-proxy/</guid>
		<description><![CDATA[As written here and here I&#8217;ve been working on a MySQL Proxy Lua module that transparently splits up tables into multiple partitions and rewrites all queries so that they go to the right partition.
I finally got everything together to release version 0.1. Download it, try it and read more about HSCALE 0.1.
All this started out [...]]]></description>
			<content:encoded><![CDATA[<p>As written <a href="http://pero.blogs.aprilmayjune.org/2008/03/26/mysql-partitioning-on-application-side/" >here</a> and <a href="http://pero.blogs.aprilmayjune.org/2008/03/29/progress-on-mysql-proxy-partitioning/" >here</a> I&#8217;ve been working on a MySQL Proxy Lua module that transparently splits up tables into multiple partitions and rewrites all queries so that they go to the right partition.</p>
<p>I finally got everything together to release version 0.1. <b><a href="http://www.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://www.hscale.org');">Download it, try it and read more about HSCALE 0.1</a></b>.</p>
<p>All this started out as a prototype, just to see if it could be done. After adapting parts of our main product to use partitions via HSCALE + MySQL Proxy (which was easy: we only had to rewrite a few out of hundreds of statements), I really think that this could work out on a larger scale. </p>
<h3>What Will Come Next?</h3>
<p>Just a few notes on what I am working on right now:</p>
<h4>Project Page And Issue Tracker</h4>
<p>In a few days there will be a &#8220;real&#8221; project page with more documentation and an issue tracker. Since we already use both internally, this should be an easy task.</p>
<h4>Write Another Partition Lookup Module</h4>
<p>A partition lookup module decides which partition(s) to use for a particular query. The current release includes only a <code>ModulusPartitionLookup</code>. Since partition lookup modules are pluggable, it is easy to write other modules that do other things. The main focus now is to implement a <code>DictionaryLookupService</code>, which will store the information about which partition is where inside the database. This allows you to add and move partitions &#8220;on the fly&#8221;. In the end, you simply have more control over the partition scheme.</p>
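<p>The pluggable design can be sketched as a small interface with two interchangeable implementations: a fixed modulus scheme and a mutable dictionary that supports adding and moving partitions. This is a Python illustration, not HSCALE's Lua API; the class and method names (<code>PartitionLookup</code>, <code>partition_for</code>, <code>ModulusLookup</code>, <code>DictionaryLookup</code>) and the partition naming (e.g. <code>books_3</code>) are hypothetical, and the dictionary here lives in memory rather than in the database.</p>

```python
from abc import ABC, abstractmethod


class PartitionLookup(ABC):
    """Hypothetical plug-in interface: (table, partition key) -> partition."""

    @abstractmethod
    def partition_for(self, table: str, key) -> str:
        ...

    @abstractmethod
    def all_partitions(self, table: str) -> list:
        ...


class ModulusLookup(PartitionLookup):
    """Fixed scheme: key modulo N picks one of N partition tables."""

    def __init__(self, partitions_per_table: int):
        self.n = partitions_per_table

    def partition_for(self, table, key):
        return f"{table}_{int(key) % self.n}"

    def all_partitions(self, table):
        return [f"{table}_{i}" for i in range(self.n)]


class DictionaryLookup(PartitionLookup):
    """Explicit key -> partition mapping that can change at runtime."""

    def __init__(self):
        self.mapping = {}  # (table, key) -> partition table name

    def add(self, table, key, partition):
        self.mapping[(table, key)] = partition

    def move(self, table, key, new_partition):
        if (table, key) not in self.mapping:
            raise KeyError(f"no partition registered for {table}/{key}")
        self.mapping[(table, key)] = new_partition

    def partition_for(self, table, key):
        return self.mapping[(table, key)]

    def all_partitions(self, table):
        return sorted({p for (t, _), p in self.mapping.items() if t == table})
```

<p>Callers only talk to <code>PartitionLookup</code>, so swapping the modulus scheme for the dictionary does not touch the analyzer or rewriter.</p>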
<p>Along with the new partition lookup module there will be more administrative SQL commands like:<br />
<code>HSCALE ADD PARTITION ...</code>, <code>HSCALE MOVE PARTITION ...</code> and so on.</p>
<h4>Full Partition Scans For Queries With Multiple Partitioned Tables</h4>
<p>In the current version HSCALE is already capable of performing full partition scans for queries that don&#8217;t use the partition column and thus don&#8217;t provide a partition key, such as:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">SELECT</span> * <span class="kw1">FROM</span> my_partitioned_table;</div>
</div>
<p>(<em>Just a side note: rows returned by this query are not in natural order, because the data is spread over multiple tables. Thus your application cannot rely on natural order for statements against partitioned tables when a full partition scan is performed. You should not rely on natural order anyway.</em>)</p>
<p>Even though you should avoid full partition scans wherever you can, sometimes you just have to look into every partition. Even worse, sometimes you join multiple partitioned tables, like:</p>
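<p>A full partition scan and the ordering caveat can be illustrated with an in-memory SQLite database standing in for MySQL: the same <code>SELECT</code> runs against every partition table and the results are concatenated. The partition names <code>my_partitioned_table_0</code>/<code>_1</code> and the helper <code>full_partition_scan</code> are hypothetical; this is a sketch of the idea, not how HSCALE executes it.</p>

```python
import sqlite3


def full_partition_scan(conn, partitions, columns="*"):
    """Run the same SELECT on every partition table and concatenate rows.

    Rows come back grouped partition by partition, not in the natural
    order an unpartitioned table would have returned them in.
    """
    rows = []
    for table in partitions:
        # Table names cannot be bound as parameters; here they come from
        # the partition configuration, never from user input.
        rows.extend(conn.execute(f"SELECT {columns} FROM {table}").fetchall())
    return rows


conn = sqlite3.connect(":memory:")
partitions = ["my_partitioned_table_0", "my_partitioned_table_1"]
for table in partitions:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, val TEXT)")
# Insert ids 1..5 in order, routing each row by id modulo 2.
for i in range(1, 6):
    conn.execute(f"INSERT INTO {partitions[i % 2]} VALUES (?, ?)", (i, f"row {i}"))

ids = [row[0] for row in full_partition_scan(conn, partitions)]
print(ids)  # even ids (partition 0) first, then odd ids (partition 1)
```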
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">SELECT</span> * <span class="kw1">FROM</span> my_partitioned_table <span class="kw1">LEFT</span> <span class="kw1">JOIN</span> my_other_partitioned_table <span class="kw1">ON</span> &#8230;;</div>
</div>
<p>Currently HSCALE rejects queries of this kind. In the future it will join every partition of <code>my_partitioned_table</code> with every partition of <code>my_other_partitioned_table</code>. In most cases this is <em>evil</em>, but sometimes you just have to.</p>
<p>Ultimately, it is up to the partition lookup module to work out which combinations of partitions to use, so we can optimize here for tables that use the same partitioning scheme.</p>
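<p>The difference between joining &#8220;every partition with every partition&#8221; and the same-scheme optimization can be sketched as follows. The function <code>join_partition_pairs</code> is hypothetical; the pairing shortcut assumes both tables are partitioned with the same scheme and that the join condition equates the two partition columns, so matching rows can only live in partitions with the same index.</p>

```python
from itertools import product


def join_partition_pairs(parts_a, parts_b, same_scheme=False):
    """Partition combinations that must be queried for A JOIN B.

    Default: every partition of A with every partition of B, which means
    len(parts_a) * len(parts_b) queries. With the same scheme (and a join
    on the partition columns), pairing index i with index i is enough.
    """
    if same_scheme:
        if len(parts_a) != len(parts_b):
            raise ValueError("same scheme implies the same number of partitions")
        return list(zip(parts_a, parts_b))
    return list(product(parts_a, parts_b))
```

<p>For three partitions per table this is the difference between 9 partition queries and 3.</p>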
<h4>Performance Profiling And Optimization</h4>
<p>I was really astonished by the performance of Lua scripting inside MySQL Proxy. I was able to analyze more than 100,000 statements in just a few seconds (without network overhead). This is already pretty good but can be improved. First, I will have to find out which performance patterns to use when scripting with Lua, for example &#8220;Is it better to inline functions?&#8221;, &#8220;Does &#8216;OO&#8217; hurt?&#8221; and so on. Then performance tests measuring both speed and memory consumption will have to be implemented to see whether we are making progress.</p>
<h4>Distribute Partitions Across Multiple MySQL Servers</h4>
<p>Right now we just need to split up huge tables, but later on we want to distribute partitions over multiple MySQL server instances to achieve real horizontal scale-out. The hardest part will be dealing with transactions: we will either have to use distributed transactions (XA) or disallow transactions involving partitions on different hosts. The latter works well for parts of our application, since they simply don&#8217;t use transactions. Other parts will have to use XA. At this point I am not sure how much overhead XA will add, but we will work that out once we get there.</p>
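<p>The &#8220;disallow transactions involving partitions on different hosts&#8221; option could look roughly like the guard below: inside a transaction the first host used is pinned, and any statement needing another host is rejected. Outside a transaction a statement may still fan out to several hosts. XA is not modeled. <code>TransactionGuard</code>, <code>CrossHostTransactionError</code> and the partition-to-host map are hypothetical.</p>

```python
class CrossHostTransactionError(Exception):
    """Raised when a transaction would need partitions on more than one host."""


class TransactionGuard:
    """Pin a transaction to one host; partition_hosts maps partition -> host."""

    def __init__(self, partition_hosts):
        self.partition_hosts = partition_hosts
        self.in_transaction = False
        self.host = None

    def begin(self):
        self.in_transaction = True
        self.host = None

    def end(self):  # COMMIT or ROLLBACK
        self.in_transaction = False
        self.host = None

    def check(self, partitions):
        """Return the hosts a statement needs; reject cross-host transactions."""
        hosts = {self.partition_hosts[p] for p in partitions}
        if self.in_transaction:
            pinned = hosts | ({self.host} if self.host else set())
            if len(pinned) > 1:
                raise CrossHostTransactionError(
                    f"transaction would span hosts {sorted(pinned)}")
            if pinned:
                self.host = pinned.pop()
        return hosts
```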
<p>So, any feedback is welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/04/10/hscale-01-released-partitioning-using-mysql-proxy/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Progress on MySQL Proxy Partitioning</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/03/29/progress-on-mysql-proxy-partitioning/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/03/29/progress-on-mysql-proxy-partitioning/#comments</comments>
		<pubDate>Sat, 29 Mar 2008 03:02:10 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[partition mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/03/29/progress-on-mysql-proxy-partitioning/</guid>
		<description><![CDATA[As posted here I started to think about possible ways to implement database sharding/partitioning.
I finally found the time to start prototyping a MySQL Proxy based solution that would allow you to analyze and rewrite queries to direct them to different databases. This would be a nearly 100% transparent solution (some queries are [...]]]></description>
			<content:encoded><![CDATA[<p>As posted <a href="http://pero.blogs.aprilmayjune.org/2008/03/26/mysql-partitioning-on-application-side/" >here</a> I started to think about possible ways to implement database sharding/partitioning.</p>
<p>I finally found the time to start prototyping a MySQL Proxy based solution that would allow you to analyze and rewrite queries to direct them to different databases. This would be a nearly 100% transparent solution (some queries are impossible to support due to the nature of having multiple tables in different locations).</p>
<p><b>How does it work?</b><br />
The main goal is to split up MySQL tables and optionally put each of the resulting partitions on a different MySQL server.</p>
<p>For now I am concentrating on splitting up big tables into smaller ones within the same database. Distributing these tables (i.e. partitions) over multiple databases is the final goal and a much more challenging task (think of transactions).</p>
<p>The work to be done is divided into these 4 steps:</p>
<p>1. Analyze the query to find out which tables are involved and what the <b>partition key</b> is (i.e. the value of the <b>partition column</b> or a hint, more on that later).<br />
1.a. Validate the query and reject queries that cannot be analyzed (missing partition key etc.)</p>
<p>2. Determine the <b>partition table</b> / database. This could be done by a simple lookup, a hashing function or anything else.</p>
<p>3. Rewrite the query and replace the table names with the <b>partition table</b> names.</p>
<p>4. Execute the query on the correct database server and return the result back to the client.</p>
<p>An example:</p>
<p>Say this is the table you want to split up:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">CREATE</span> <span class="kw1">TABLE</span> books <span class="br0">&#40;</span><br />
&nbsp; &nbsp; id INTEGER <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; name VARCHAR<span class="br0">&#40;</span><span class="nu0">100</span><span class="br0">&#41;</span> <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; author INTEGER <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">PRIMARY</span> <span class="kw1">KEY</span><span class="br0">&#40;</span>id<span class="br0">&#41;</span><br />
<span class="br0">&#41;</span>;</div>
</div>
<p>The <b>partition tables</b> for table <code>books</code> are named <code>books_even</code> and <code>books_odd</code>, both with the same layout as <code>books</code>. The <b>partition column</b> is <code>author</code>, so its value determines the <b>partition table</b> to be used. In this example we put all books of authors with an even id into <code>books_even</code> and the &#8220;odd ones&#8221; (meaning the &#8220;not even&#8221; ones, not the strange ones <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ) into <code>books_odd</code>.</p>
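<p>The even/odd scheme amounts to a one-line lookup on the <b>partition column</b>. The Python sketch below also groups rows by target table, as you would when filling <code>books_even</code> and <code>books_odd</code> from an existing <code>books</code> table; <code>partition_table</code> and <code>split_rows</code> are hypothetical helpers, and rows are assumed to be <code>(id, name, author)</code> tuples.</p>

```python
from collections import defaultdict


def partition_table(table: str, author: int) -> str:
    """Even author ids go to <table>_even, all others to <table>_odd."""
    return f"{table}_even" if author % 2 == 0 else f"{table}_odd"


def split_rows(table, rows):
    """Group (id, name, author) rows by the partition table they belong in."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_table(table, row[2])].append(row)
    return dict(groups)
```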
<p>Say the following query is sent to the proxy:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">SELECT</span> * <span class="kw1">FROM</span> books <span class="kw1">WHERE</span> author = <span class="nu0">3</span>;</div>
</div>
<p>The proxy would do the following:</p>
<p>1. Analyze the query and find that table <code>books</code> is used and that it is a partitioned table. We have defined <code>author</code> as our <b>partition column</b>, so &#8220;<code>3</code>&#8221; is our <b>partition key</b>. Both are passed to the next step:</p>
<p>2. Look up the <b>partition table</b> for <code>books</code> and <b>partition key</b> &#8220;<code>3</code>&#8221; => <code>books_odd</code>.</p>
<p>3. Rewrite the query to:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">SELECT</span> * <span class="kw1">FROM</span> books_odd <span class="kw1">WHERE</span> author = <span class="nu0">3</span>;</div>
</div>
<p>4. Execute it and send the result to the client.</p>
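<p>Steps 1 to 3 of this example can be sketched in Python. This is not HSCALE's code: HSCALE uses the tokenizer that ships with MySQL Proxy from Lua, whereas the tiny tokenizer below is a hypothetical stand-in that only recognizes <code>column = number</code> partition keys and one partitioned table per query. The names <code>tokenize</code>, <code>analyze</code>, <code>rewrite</code>, <code>AnalyzeError</code> and <code>PARTITION_COLUMNS</code> are made up for illustration, and step 4 (execution) is left out.</p>

```python
import re

# Hypothetical configuration: partitioned table -> partition column.
PARTITION_COLUMNS = {"books": "author"}

# A deliberately tiny SQL tokenizer; real SQL needs far more token types.
TOKEN_RE = re.compile(r"""
    (?P<string>'(?:[^'\\]|\\.|'')*')   # single-quoted string literal
  | (?P<comment>/\*.*?\*/)             # /* ... */ comment
  | (?P<number>\d+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<ws>\s+)
  | (?P<op>.)                          # any other single character
""", re.VERBOSE | re.DOTALL)


class AnalyzeError(ValueError):
    """Step 1.a: the query cannot be routed, e.g. no partition key."""


def tokenize(sql):
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def analyze(tokens):
    """Step 1: return (table, partition key, indexes of the table-name tokens)."""
    sig = [(i, kind, text) for i, (kind, text) in enumerate(tokens)
           if kind not in ("ws", "comment")]
    positions = [i for i, kind, text in sig
                 if kind == "ident" and text.lower() in PARTITION_COLUMNS]
    tables = {tokens[i][1].lower() for i in positions}
    if not tables:
        return None, None, []            # no partitioned table: pass through
    if len(tables) > 1:
        raise AnalyzeError("this sketch handles one partitioned table per query")
    table = tables.pop()
    column = PARTITION_COLUMNS[table]
    # Look for "<column> = <number>"; a "b." prefix is simply skipped over.
    for (_, k1, t1), (_, k2, t2), (_, k3, t3) in zip(sig, sig[1:], sig[2:]):
        if k1 == "ident" and t1.lower() == column and t2 == "=" and k3 == "number":
            return table, int(t3), positions
    raise AnalyzeError(f"no partition key ({column} = <number>) for {table}")


def rewrite(sql):
    """Steps 1-3: analyze, pick <table>_even/_odd, swap the table tokens."""
    tokens = tokenize(sql)
    table, key, positions = analyze(tokens)
    if table is None:
        return sql
    target = f"{table}_even" if key % 2 == 0 else f"{table}_odd"  # step 2
    for i in positions:
        tokens[i] = ("ident", target)    # step 3: only identifier tokens change
    return "".join(text for _, text in tokens)
```

<p>Because a string literal such as <code>'books'</code> is a single token, it is never rewritten, which a plain text search-and-replace on the query would get wrong.</p>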
<p><b>What is the status right now?</b><br />
After getting familiar with Lua and setting up a (unit) test environment, which was easy since MySQL Proxy already comes with a handy solution for that, I started to implement the query analyzer (step 1) and the rewriter (step 3). Both use the tokenizer provided by the proxy and don&#8217;t rely on regular expressions, because that would be too error-prone in my opinion.</p>
<p>There is already a lot of code and tests ready, and it turns out to work as intended. But there is still a lot of work and headache ahead. My next goal is to run the test suite of one of our major products, with about 2,000 database-bound tests, against the partitioner to see how this works out. </p>
<p>The analyzer is already able to parse various types of queries like:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">SELECT</span> * <span class="kw1">FROM</span> books <span class="kw1">WHERE</span> author = <span class="nu0">3</span>;<br />
<span class="kw1">SELECT</span> * <span class="kw1">FROM</span> books <span class="kw1">AS</span> b, prices <span class="kw1">WHERE</span> b.author = <span class="nu0">3</span>;<br />
<span class="kw1">INSERT</span> <span class="kw1">INTO</span> books <span class="br0">&#40;</span>id, author<span class="br0">&#41;</span> <span class="kw1">VALUES</span> <span class="br0">&#40;</span><span class="nu0">1</span>, <span class="nu0">3</span><span class="br0">&#41;</span>;<br />
<span class="kw1">DELETE</span> <span class="kw1">FROM</span> books <span class="kw1">WHERE</span> author = <span class="nu0">3</span>;<br />
<span class="kw1">UPDATE</span> books <span class="kw1">SET</span> name = <span class="st0">&#8216;new&#8217;</span> <span class="kw1">AND</span> author = <span class="nu0">3</span>;</p>
<p><span class="co2"># Hinting works like this</span><br />
<span class="coMULTI">/* partitionKey(books) = &#8216;books_odd&#8217; */</span> <span class="kw1">SELECT</span> * <span class="kw1">FROM</span> books;<br />
<span class="coMULTI">/* skipPartition() */</span> <span class="kw1">SELECT</span> <span class="st0">&#8216;do not analyze me!&#8217;</span>;</div>
</div>
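<p>Recognizing the two hint forms in a leading comment could be sketched like this, assuming plain ASCII single quotes around the hint value. <code>HINT_RE</code> and <code>parse_hint</code> are hypothetical; the sketch only extracts the table and value and does not decide how HSCALE interprets them.</p>

```python
import re

# Matches partitionKey(<table>) = '<value>' or skipPartition() inside a
# /* ... */ comment at the very start of the query.
HINT_RE = re.compile(
    r"^\s*/\*\s*(?:partitionKey\((?P<table>\w+)\)\s*=\s*'(?P<value>[^']*)'"
    r"|(?P<skip>skipPartition\(\)))\s*\*/")


def parse_hint(sql):
    """Return ('skip', None), ('key', (table, value)) or (None, None)."""
    m = HINT_RE.match(sql)
    if not m:
        return None, None
    if m.group("skip"):
        return "skip", None
    return "key", (m.group("table"), m.group("value"))
```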
<p>&#8230; and many more, such as queries including comments or joins, as well as invalid queries (i.e. no <b>partition key</b> provided).</p>
<p>The rewriter is able to rewrite all of the queries above.</p>
<p>Actually, I think a good share of the queries of the application I intend to test with would already be analyzed and rewritten correctly. But there is still a lot of work to do on this side to handle more complex queries including subselects, functions etc. </p>
<p><b>What&#8217;s next?</b><br />
As said above, the next goal is to run the test suites of our applications utilizing the partitioner. Once this point is reached I will post the prototype here.</p>
<p>Up to now all of this looks like it could really work for us so I hope there is going to be more than a prototype but we will have to see.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/03/29/progress-on-mysql-proxy-partitioning/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
