<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>pero on anything &#187; mysql</title>
	<link>http://pero.blogs.aprilmayjune.org</link>
	<description></description>
	<pubDate>Tue, 02 Feb 2010 19:12:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>MySQL Connector/J randomly hanging at com.mysql.jdbc.util.ReadAheadInputStream.fill</title>
		<link>http://pero.blogs.aprilmayjune.org/2010/02/02/mysql-connectorj-randomly-hanging-at-commysqljdbcutilreadaheadinputstreamfill/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2010/02/02/mysql-connectorj-randomly-hanging-at-commysqljdbcutilreadaheadinputstreamfill/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 19:12:52 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[java]]></category>

		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2010/02/02/mysql-connectorj-randomly-hanging-at-commysqljdbcutilreadaheadinputstreamfill/</guid>
		<description><![CDATA[In the past months we struggled with large SELECT queries that just got stuck at:

java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
   - locked com.mysql.jdbc.util.ReadAheadInputStream@cb9a81c
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2494)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2949)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2938)
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3481)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)
   - locked java.lang.Object@70cbccca
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2571)
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:782)
   - locked java.lang.Object@70cbccca
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:625)
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)

Whenever this happened we just restarted the Tomcat server and everything was fine again for some days or weeks. But today it struck us very hard [...]]]></description>
			<content:encoded><![CDATA[<p>In the past months we struggled with large <code>SELECT</code> queries that just got stuck at:</p>
<p><code><br />
java.net.SocketInputStream.socketRead0(Native Method)<br />
java.net.SocketInputStream.read(SocketInputStream.java:129)<br />
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)<br />
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)<br />
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)<br />
   - locked com.mysql.jdbc.util.ReadAheadInputStream@cb9a81c<br />
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2494)<br />
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2949)<br />
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2938)<br />
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3481)<br />
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)<br />
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)<br />
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)<br />
   - locked java.lang.Object@70cbccca<br />
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2571)<br />
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:782)<br />
   - locked java.lang.Object@70cbccca<br />
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:625)<br />
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)<br />
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)<br />
</code></p>
<p>Whenever this happened we just restarted the Tomcat server and everything was fine again for some days or weeks. But today it struck us very hard, so we finally took the time to hunt this down. It seems to be related to this <a href="http://bugs.mysql.com/bug.php?id=31353" onclick="javascript:pageTracker._trackPageview('/http://bugs.mysql.com/bug.php?id=31353');">bug report</a>. Some comments there suggest using <code>SQL_NO_CACHE</code> in your queries.</p>
<p>A lot of people (including me) suggest disabling the <a href="http://dev.mysql.com/doc/refman/5.0/en/query-cache.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.0/en/query-cache.html');">MySQL query cache</a> since it may cause <a href="http://www.mysqlperformanceblog.com/2009/03/19/mysql-random-freezes-could-be-the-query-cache/" onclick="javascript:pageTracker._trackPageview('/http://www.mysqlperformanceblog.com/2009/03/19/mysql-random-freezes-could-be-the-query-cache/');">severe problems</a>. To disable the query cache at server startup, set the <code>query_cache_size</code> system variable to 0.</p>
<p>This is what we usually do, but one of our servers had the query cache turned on. Disabling it solved the problem.</p>
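<p>If you cannot turn off the query cache on a server, you can apply the per-query workaround from the bug comments in the application instead. The following Java sketch adds the <code>SQL_NO_CACHE</code> hint directly after the leading <code>SELECT</code> keyword of a query string. The class and method names are hypothetical and are not part of Connector/J.</p>

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Connector/J): adds MySQL's SQL_NO_CACHE hint
 * to a plain SELECT so its result is neither served from nor stored in the
 * query cache.
 */
public class SqlNoCache {
    // A leading SELECT keyword (case-insensitive), optionally preceded by whitespace.
    private static final Pattern LEADING_SELECT =
        Pattern.compile("^(\\s*SELECT)\\b", Pattern.CASE_INSENSITIVE);

    public static String addHint(String sql) {
        // Crude check: leave the query alone if it already mentions a cache hint anywhere.
        String upper = sql.toUpperCase();
        if (upper.contains("SQL_NO_CACHE") || upper.contains("SQL_CACHE")) {
            return sql;
        }
        // replaceFirst returns the input unchanged when there is no leading SELECT.
        return LEADING_SELECT.matcher(sql).replaceFirst("$1 SQL_NO_CACHE");
    }

    public static void main(String[] args) {
        System.out.println(addHint("SELECT id, amount FROM orders WHERE customer = 1"));
        // prints: SELECT SQL_NO_CACHE id, amount FROM orders WHERE customer = 1
    }
}
```

<p>This works on the query string only and does not parse SQL. Queries that start with a comment or a parenthesis are passed through unchanged, and so is any query that already contains <code>SQL_CACHE</code> or <code>SQL_NO_CACHE</code> somewhere in its text.</p>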
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2010/02/02/mysql-connectorj-randomly-hanging-at-commysqljdbcutilreadaheadinputstreamfill/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Simulating indexes in Hadoop</title>
		<link>http://pero.blogs.aprilmayjune.org/2009/06/06/simulating-indexes-in-hadoop/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2009/06/06/simulating-indexes-in-hadoop/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 19:07:09 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hadoop]]></category>

		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2009/06/06/simulating-indexes-in-hadoop/</guid>
		<description><![CDATA[You should not try to use Hadoop as a &#8220;drop-in&#8221; replacement for your current (R)DBMS. That said, it is still possible to utilize the power of cluster computing while circumventing its weaknesses when it comes to ad-hoc or real-time queries. We use Hadoop as an on-line system tightly integrated with our application and use it [...]]]></description>
			<content:encoded><![CDATA[<p>You should not try to use Hadoop as a &#8220;drop-in&#8221; replacement for your current (R)DBMS. That said, it is still possible to utilize the power of cluster computing while circumventing its weaknesses when it comes to ad-hoc or real-time queries. We use Hadoop as an on-line system tightly integrated with our application and use it for both long-running analytical queries and ad-hoc queries.</p>
<p>In the mindset of a &#8220;traditional&#8221; database engineer, one of the biggest concerns about Hadoop, or MapReduce in conjunction with a distributed file system in general, is the lack of indexes. Setting aside that the <a href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html" onclick="javascript:pageTracker._trackPageview('/http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html');">&#8220;(R)DBMS vs MapReduce&#8221;</a> debate is mostly superfluous and sometimes borders on the religious, the absence of anything like an index is one of the biggest hurdles you face when migrating data from a traditional DBMS.<br />
Even though you will love the ability to view your data in any way you want without caring about its structure, at some point you will feel that it is not right to always scan your 45TB of log files. (Even though it is soooo easy&#8230;)</p>
<h2>Brute force is easy. Brute force is bad.</h2>
<p>When we began migrating all those TBs of log-style data from our huge MySQL installations to Hadoop, we did a lot of testing. We tested everything from Hadoop and MapReduce settings to different MapReduce abstractions like <a href="http://hadoop.apache.org/pig/" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org/pig/');">PIG</a>, <a href="http://www.cascading.org" onclick="javascript:pageTracker._trackPageview('/http://www.cascading.org');">Cascading</a>, <a href="http://hadoop.apache.org/hive/" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org/hive/');">Hive</a> and others. There was this huge mass of data grinning at us, waiting to be analysed in multiple ways, from &#8220;online&#8221; real-time access to &#8220;offline&#8221; decision-making analysis. Because we needed multiple views on the same data, we came to this conclusion quite quickly: &#8220;Brute force is easy. Brute force is bad.&#8221; Yes, we can optimize our Hadoop installations and we can choose the very best query mechanism (we actually ended up writing our own), but it will not make things <i>noticeably</i> faster if we keep scanning all of our data all of the time.</p>
<h2>Partitions are (sometimes) the better indexes</h2>
<p>So, why do you use indexes (in the context of data retrieval)? I know why we did, and still do: it is all about primary key lookups and data clustering. Say you have the following table (MySQL):</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">CREATE</span> <span class="kw1">TABLE</span> orders <span class="br0">&#40;</span><br />
&nbsp; &nbsp; id INT <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; product INT <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; customer INT <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; amount FLOAT <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
&nbsp; &nbsp; orderDate DATE <span class="kw1">NOT</span> <span class="kw1">NULL</span>,<br />
<br />
&nbsp; &nbsp; <span class="kw1">PRIMARY</span> <span class="kw1">KEY</span><span class="br0">&#40;</span>id<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; <span class="kw1">INDEX</span> idx_product_customer<span class="br0">&#40;</span>product, customer<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; <span class="kw1">INDEX</span> idx_customer<span class="br0">&#40;</span>customer<span class="br0">&#41;</span><br />
<span class="br0">&#41;</span>;</div>
</div>
<p>Just a simple order log with a unique identifier (id) and a single associated product and customer. Since we want to view our data from different perspectives, we added two additional indexes on product and customer. (In this example we need two indexes because MySQL can only use the <a href="http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html');">leftmost prefix of an index</a>.)<br />
Dumping the whole table as a single CSV file into your Hadoop cluster would mean that you always have to do what an (R)DBMS calls a &#8220;full table scan&#8221;. It would be pretty much the same as removing all indexes from your MySQL table. Try searching for all products a customer ordered without the index <code>idx_customer</code>. (In fact, Hadoop would perform this full table scan an order of magnitude faster.) It would be ridiculous to remove all indexes from your table, yet that is effectively what you did when you exported the whole table into a flat file!<br />
What you should do, and what we did with great success, is split up your flat CSV file and arrange the data so that you can decide beforehand which part of the data needs to be accessed. So let&#8217;s split up the data and simulate all of the indexes (except the primary key; more on that later). A file-system layout could look like this:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">orders/<br />
&nbsp; &nbsp; product_A/<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_1.csv<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_2.csv<br />
&nbsp; &nbsp; product_B/<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_1.csv<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_3.csv</div>
</div>
<p>So when searching for all orders placed by <code>customer_1</code>, we simply use the file pattern <code>orders/*/customer_1.csv</code>. Remember: HDFS and MapReduce&#8217;s inputs (like <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html');"><code>FileInputFormat</code></a>) support <a href="http://en.wikipedia.org/wiki/Glob_%28programming%29" onclick="javascript:pageTracker._trackPageview('/http://en.wikipedia.org/wiki/Glob_%28programming%29');">globbing</a>.</p>
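<p>To show how a file pattern selects partitions, here is a small local Java sketch. It uses the glob <code>PathMatcher</code> from <code>java.nio</code> as a stand-in for the globbing done on HDFS paths. The helper <code>partitionPath</code> and the sample file list are hypothetical.</p>

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of "indexing by partitioning": pick input files by glob instead of
 * scanning everything. java.nio's glob matcher stands in for Hadoop's globbing.
 */
public class GlobPartitions {
    // Hypothetical helper: the file a single order row would be written to.
    static String partitionPath(String product, int customer) {
        return "orders/product_" + product + "/customer_" + customer + ".csv";
    }

    // Keep only the files whose path matches the glob pattern.
    static List<String> select(List<String> files, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return files.stream()
            .filter(f -> matcher.matches(Path.of(f)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            partitionPath("A", 1), partitionPath("A", 2),
            partitionPath("B", 1), partitionPath("B", 3));
        // All orders of customer 1, regardless of product:
        System.out.println(select(files, "orders/*/customer_1.csv"));
        // prints: [orders/product_A/customer_1.csv, orders/product_B/customer_1.csv]
    }
}
```

<p>Here <code>*</code> matches within a single directory level only. Also, <code>customer_1.csv</code> does not match <code>customer_10.csv</code>. In an actual job, you would pass the pattern itself as the input path rather than filtering a list yourself.</p>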
<p>Now we have actually simulated indexes by partitioning the data!</p>
<p>From here on you can go into more detail depending on your data structure. For example, you could add the date and id ranges to the file name like this:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">orders/product_A/customer_1<span class="nu0">.2009</span><span class="nu0">-06</span><span class="nu0">-04.2009</span><span class="nu0">-06</span><span class="nu0">-05.1000</span><span class="nu0">.2000</span>.csv<br />
orders/product_A/customer_1<span class="nu0">.2009</span><span class="nu0">-06</span><span class="nu0">-06.2009</span><span class="nu0">-06</span><span class="nu0">-07.5000</span><span class="nu0">.7000</span>.csv</div>
</div>
<p>This comes in handy if you keep adding data to your cluster: new data simply goes into new files, and a query can skip every file whose date or id range it does not need.<br />
To make things even easier, you could write your own <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html');"><code>InputFormat</code></a> that encapsulates building the paths that match your query.</p>
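<p>Here is a minimal sketch of the file-name idea. It assumes the hypothetical convention <code>customer_N.FROMDATE.TODATE.FROMID.TOID.csv</code> from the example above, with inclusive ranges. The code parses the name once and decides, without opening the file, whether the file can contain rows for a query. A custom <code>InputFormat</code> could use logic along these lines when building its list of paths.</p>

```java
import java.time.LocalDate;

/**
 * Sketch: prune files by name alone. Assumes the hypothetical naming convention
 * customer_N.FROMDATE.TODATE.FROMID.TOID.csv with inclusive ranges.
 */
public class RangeFile {
    final String customer;
    final LocalDate from, to;
    final long minId, maxId;

    RangeFile(String fileName) {
        // "customer_1.2009-06-04.2009-06-05.1000.2000.csv" splits into 6 parts.
        String[] p = fileName.split("\\.");
        if (p.length != 6 || !p[5].equals("csv")) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        customer = p[0];
        from = LocalDate.parse(p[1]);
        to = LocalDate.parse(p[2]);
        minId = Long.parseLong(p[3]);
        maxId = Long.parseLong(p[4]);
    }

    // True if the file's date range overlaps the inclusive query range [qFrom, qTo].
    boolean overlapsDates(LocalDate qFrom, LocalDate qTo) {
        return !to.isBefore(qFrom) && !from.isAfter(qTo);
    }

    // True if an order with this id could be stored in the file.
    boolean mayContainId(long id) {
        return id >= minId && id <= maxId;
    }

    public static void main(String[] args) {
        RangeFile f = new RangeFile("customer_1.2009-06-04.2009-06-05.1000.2000.csv");
        System.out.println(f.overlapsDates(LocalDate.parse("2009-06-05"), LocalDate.parse("2009-06-30"))); // true
        System.out.println(f.mayContainId(5000)); // false
    }
}
```

<p>For the first file above, a query for orders placed between 2009-06-05 and 2009-06-30 has to read the file. A lookup of order id 5000 can skip it.</p>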
<h3>The small file problem</h3>
<p>Since Hadoop was designed to work on very large blocks of data, it is not efficient when dealing with <a href="http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/" onclick="javascript:pageTracker._trackPageview('/http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/');">a lot of small files</a>. To prevent the creation of millions of very small files, take a closer look at your data. Say your average customer places 50 orders. It would be a waste of resources to store multiple files for a single customer, each filling only a few KBs. A possible solution: group customers together.</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">orders/<br />
&nbsp; &nbsp; product_A/<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_1_to_1000.csv<br />
&nbsp; &nbsp; &nbsp; &nbsp; customer_1001_to_2000.csv</div>
</div>
<p>You have to find the right balance between file-size and access pattern.</p>
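<p>Here is a sketch of the grouping scheme. It assumes positive numeric customer ids and a fixed bucket size of 1000, as in the layout above. The class and method names are hypothetical.</p>

```java
/**
 * Sketch: map a customer id to a bucket file so that about 1000 customers share
 * one file. Assumes positive numeric ids; BUCKET_SIZE mirrors the example layout.
 */
public class CustomerBuckets {
    static final int BUCKET_SIZE = 1000;

    static String bucketFile(String product, long customerId) {
        if (customerId < 1) {
            throw new IllegalArgumentException("customer ids are assumed to start at 1");
        }
        long first = ((customerId - 1) / BUCKET_SIZE) * BUCKET_SIZE + 1; // 1, 1001, 2001, ...
        long last = first + BUCKET_SIZE - 1;                              // 1000, 2000, 3000, ...
        return "orders/product_" + product + "/customer_" + first + "_to_" + last + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(bucketFile("A", 1));    // orders/product_A/customer_1_to_1000.csv
        System.out.println(bucketFile("A", 1000)); // orders/product_A/customer_1_to_1000.csv
        System.out.println(bucketFile("A", 1001)); // orders/product_A/customer_1001_to_2000.csv
    }
}
```

<p>One bucket file now holds many customers, so a query for a single customer still has to filter rows by customer id. The bucket only narrows down which files are read. The bucket size is the parameter to tune.</p>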
<h3>Some final words</h3>
<p>To make it clear: even though we have found a way to partition our data, we have not gained the same flexibility we have in any decent (R)DBMS (with enough disk space, processing power and, most of all, RAM!). Querying for all orders made by a single customer may still take 0.01s in an (R)DBMS vs. 10s (or more) in Hadoop.</p>
<p>Never try to simply replace your (R)DBMS with Hadoop! Eventually you will end up writing a blog post saying that MapReduce and Hadoop are hopelessly worse than your favourite (R)DBMS. Hadoop is not a database!</p>
<h2>Real-time lookups</h2>
<p>You can still achieve real-time lookup performance with Hadoop. One option is to take a look at <a href="http://wiki.apache.org/hadoop/Hbase" onclick="javascript:pageTracker._trackPageview('/http://wiki.apache.org/hadoop/Hbase');">HBase</a>, an implementation of Google&#8217;s BigTable design.<br />
Sometimes it is enough to use <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/MapFile.html" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/MapFile.html');"><code>MapFile</code>s</a>, which are essentially a huge disk-based sorted map: a sorted key/value file plus a small index for fast key lookups.<br />
In our application we implemented primary and secondary key lookups directly on CSV files using MapFiles, and we distribute the lookups and local in-memory caches over several machines. To speed things up even more, we use a <a href="http://www.danga.com/memcached/" onclick="javascript:pageTracker._trackPageview('/http://www.danga.com/memcached/');">memcached</a> cluster. (We plan to release all of this, along with our MapReduce abstraction, as open source once we feel it is mature and stable enough.)<br />
One way or the other: Data redundancy will most likely become your best friend in these situations.</p>
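<p>To show why a key lookup in a <code>MapFile</code> is cheap, here is an in-memory Java sketch of the same idea. It does not use the Hadoop API. Arrays stand in for the sorted data file, and a list of positions stands in for the index file. The index interval of 4 is an arbitrary value chosen for the example.</p>

```java
import java.util.ArrayList;
import java.util.List;

/**
 * In-memory sketch of the idea behind a MapFile (not the Hadoop API): sorted
 * records plus a sparse index holding every INDEX_INTERVAL-th position.
 */
public class SparseIndexLookup {
    static final int INDEX_INTERVAL = 4; // arbitrary, kept tiny for the example

    private final long[] keys;      // sorted keys (stand-in for the data file)
    private final String[] values;
    private final List<Integer> indexPositions = new ArrayList<>(); // stand-in for the index file

    SparseIndexLookup(long[] sortedKeys, String[] values) {
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.length; i += INDEX_INTERVAL) {
            indexPositions.add(i);
        }
    }

    String get(long key) {
        // Binary search the sparse index for the last indexed key that is not greater than key.
        int lo = 0, hi = indexPositions.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int pos = indexPositions.get(mid);
            if (keys[pos] <= key) {
                start = pos;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) {
            return null; // key is smaller than every stored key
        }
        // Scan forward at most one index interval in the "data file".
        int end = Math.min(start + INDEX_INTERVAL, keys.length);
        for (int i = start; i < end; i++) {
            if (keys[i] == key) {
                return values[i];
            }
            if (keys[i] > key) {
                break;
            }
        }
        return null;
    }
}
```

<p>The index keeps only every fourth key. Even so, a lookup touches at most one index interval of the data after a binary search over the index. Hadoop&#8217;s <code>MapFile</code> applies the same principle on disk, with a configurable index interval.</p>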
<p>Regardless of the techniques you actually use, you still have to think about your data in a different way. You always have to when moving from a traditional (R)DBMS to any other kind of data storage and retrieval system!</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2009/06/06/simulating-indexes-in-hadoop/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Current project status - HSCALE</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/12/22/current-project-status-hscale/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/12/22/current-project-status-hscale/#comments</comments>
		<pubDate>Mon, 22 Dec 2008 17:31:06 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/12/22/current-project-status-hscale/</guid>
		<description><![CDATA[A lot of people keep asking me about the status of HSCALE, so I thought it best to just write about it.
Since I am waiting for the next GPL release of MySQL Proxy, I am concentrating on other things, both project-related and not project-related. 
Continuous integration and test strategy enhancements
First of all there is a [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of people keep asking me about the status of HSCALE, so I thought it best to just write about it.<br />
Since I am waiting for the next GPL release of <a href="http://forge.mysql.com/wiki/MySQL_Proxy" onclick="javascript:pageTracker._trackPageview('/http://forge.mysql.com/wiki/MySQL_Proxy');">MySQL Proxy</a>, I am concentrating on other things, both project-related and not project-related. </p>
<h2>Continuous integration and test strategy enhancements</h2>
<p>First of all, a CI server has been up for almost 3 months now. Check out <a href="http://teamcity.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://teamcity.hscale.org');">teamcity.hscale.org</a> (log in as guest user). I also introduced a lint process to discover bugs in my Lua code, which mostly arise from mistyped variable names. By the way: you should definitely have a lint-like process in place if you are programming in Lua or a similar dynamic language, where a mistyped variable name silently becomes a global instead of raising an error. It helps a lot.</p>
<h2>What&#8217;s in svn trunk?</h2>
<h3>Multiple backends</h3>
<p>The current status is this: spreading across multiple backends is fully implemented, and there are a lot of tests for it. Check out the latest code at <a href="http://svn.hscale.org/trunk" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk');">http://svn.hscale.org/trunk</a>. The tests running against multiple backends are &#8220;cheating&#8221; a little bit: since the proxy still does not allow ad-hoc allocation of backend connections, I just create a lot of &#8220;dummy&#8221; connections to the proxy and pool them the way the proxy&#8217;s own connection pooling does. See an <a href="http://forge.mysql.com/tools/tool.php?id=151" onclick="javascript:pageTracker._trackPageview('/http://forge.mysql.com/tools/tool.php?id=151');">example</a>. Note: this will not work in production! </p>
<h3>Dictionary partition lookup and auto-partitioning</h3>
<p>Besides the <a href="http://svn.hscale.org/trunk/hscale/src/optivo/hscale/modulusPartitionLookup.lua" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk/hscale/src/optivo/hscale/modulusPartitionLookup.lua');">modulus partition lookup</a>, which is only used in tests, there is now the <a href="http://svn.hscale.org/trunk/hscale/src/optivo/hscale/dictionaryPartitionLookup.lua" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk/hscale/src/optivo/hscale/dictionaryPartitionLookup.lua');">dictionary partition lookup</a>, which allows partitions to be defined explicitly. Detailed documentation can be found in the <a href="http://hscale.org/display/HSCALE/DictionaryPartitionLookup" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/HSCALE/DictionaryPartitionLookup');">project wiki</a>. It works as described <a href="http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/" >here</a>.<br />
Another implemented feature is auto-partitioning. Depending on the partitioning function used, new partitions can be created on the fly. This way you can automatically spread the load across your MySQL servers. Another benefit is that you have a fine-grained partition set-up right from the start, which makes re-partitioning a lot easier afterwards.<br />
Please take a look at the code and especially the tests to see how it works.</p>
<h3>Parallel setup of HSCALE servers</h3>
<p>In order to eliminate the SPOF (single point of failure) and the possible bottleneck that directing all traffic through a single proxy would imply, HSCALE is designed to run in parallel: you can set up multiple proxies running HSCALE side by side. This way you can easily implement fail-over scenarios using <a href="http://www.linux-ha.org/Heartbeat" onclick="javascript:pageTracker._trackPageview('/http://www.linux-ha.org/Heartbeat');">heartbeat</a> or your favourite high availability solution, and spread the load.<br />
Why should running HSCALE in parallel be a problem in the first place? The short answer: because partition information is cached within each instance. As soon as the partition information changes, your HSCALE instances would fall out of sync and use different partition information, which would be disastrous for your data integrity. To avoid this, HSCALE (more precisely, the dictionary partition lookup) works in two different modes: &#8220;NORMAL&#8221; and &#8220;FORCE&#8221;. If the mode is set to &#8220;NORMAL&#8221;, partition information is cached internally and only refreshed at a configurable time interval (configuration parameter &#8220;reloadInterval&#8221;). Whenever a change is made to the partition set-up, the mode is changed to &#8220;FORCE&#8221;, which forces all HSCALE instances to re-fetch the partition information before executing any query. After a configurable amount of time the system switches back to &#8220;NORMAL&#8221;. This is only the big picture; the implementation itself is a bit more complicated.<br />
This approach is simple and robust because no other components (like message queues) are involved, and it is guaranteed by design that no HSCALE instance is able to run with a wrong partition mapping. Changes to the partition information involve a little overhead, since all HSCALE instances reload the partition mapping quite often while in &#8220;FORCE&#8221; mode. But given that the partition information does not change that often (100 times a day would be huge!), it is an affordable price to pay for data integrity.</p>
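<p>The refresh rule can be sketched in a few lines. The following Java class is an illustration only, not HSCALE&#8217;s Lua implementation. The loader, the mode supplier and the clock are hypothetical hooks. They stand in for reading the partition mapping and the current mode from storage shared by all instances. The class is not thread-safe.</p>

```java
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Illustrative sketch of the NORMAL/FORCE refresh rule (not HSCALE's code).
 * All hooks are hypothetical; not thread-safe.
 */
public class PartitionCache {
    enum Mode { NORMAL, FORCE }

    private final Supplier<Map<String, String>> loader; // fetches the partition mapping
    private final Supplier<Mode> mode;                   // reads the current shared mode
    private final LongSupplier clockMillis;              // injectable clock, e.g. for tests
    private final long reloadIntervalMillis;             // corresponds to "reloadInterval"

    private Map<String, String> cached;
    private long loadedAt;
    int loads; // number of fetches, for illustration only

    PartitionCache(Supplier<Map<String, String>> loader, Supplier<Mode> mode,
                   LongSupplier clockMillis, long reloadIntervalMillis) {
        this.loader = loader;
        this.mode = mode;
        this.clockMillis = clockMillis;
        this.reloadIntervalMillis = reloadIntervalMillis;
    }

    /** Returns the mapping to use for the next query. */
    Map<String, String> mapping() {
        long now = clockMillis.getAsLong();
        boolean stale = cached == null || now - loadedAt >= reloadIntervalMillis;
        // In FORCE mode every call re-fetches, so no instance keeps using an
        // old mapping while partitions are being changed.
        if (mode.get() == Mode.FORCE || stale) {
            cached = loader.get();
            loadedAt = now;
            loads++;
        }
        return cached;
    }
}
```

<p>In &#8220;NORMAL&#8221; mode, each instance fetches the mapping at most once per reload interval. As soon as the shared mode reads &#8220;FORCE&#8221;, every call re-fetches, and that is where the extra overhead comes from.</p>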
<h2>What&#8217;s next</h2>
<p>Currently HSCALE is almost feature complete. The missing piece, which would make HSCALE production ready, is a MySQL Proxy version with different backend handling (Jan, please do not hate me!). Currently we (our company) do not have the resources to dig into MySQL Proxy ourselves, and HSCALE is currently(!) not top priority. So we will just wait and see.</p>
<p>Some of the problems we intended to solve with HSCALE have now moved to a <a href="http://hadoop.apache.org" onclick="javascript:pageTracker._trackPageview('/http://hadoop.apache.org');">Hadoop cluster</a>, since we have huge masses of log-style, read-only data which has to be analysed in multiple dimensions. HSCALE will get our attention again right after that, or as soon as a &#8220;suitable&#8221; GPL version of MySQL Proxy comes out.</p>
<p>If you are in need of a production-ready, proxy-based sharding solution, please take a look at <a href="http://spockproxy.sourceforge.net/" onclick="javascript:pageTracker._trackPageview('/http://spockproxy.sourceforge.net/');">Spock Proxy</a>. They use a different approach (they actually forked MySQL Proxy and implemented everything in it, so no Lua is used), but the idea behind it is basically the same. They also offer some features HSCALE will not offer in the near future, like handling of auto_increment columns across partitions. Other features are missing, mostly because of the different design approach: arbitrary partitioning functions (they only offer range-based partitioning, which is ok for many scenarios) or query hinting.</p>
<p>Speaking of Hadoop clusters: an idea that has been ringing in my head for a while is to implement a MySQL Proxy Lua script that enables running (basic) queries against a cluster. It would be a fun little project. <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/12/22/current-project-status-hscale/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SHOW STATUS considered harmful</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/11/show-status-considered-harmful/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/11/show-status-considered-harmful/#comments</comments>
		<pubDate>Thu, 11 Sep 2008 15:16:17 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/11/show-status-considered-harmful/</guid>
		<description><![CDATA[First of all, I know this is a known problem, but it struck me so hard, I just had to write about it!
As Peter Zaitsev points out, calling SHOW STATUS might have a huge performance impact.
We recently replaced one of our servers with a DELL R900 with 96GB RAM. Having a disk-bound workload and > [...]]]></description>
			<content:encoded><![CDATA[<p><em>First of all, I know this is a known problem, but it struck me so hard, I just had to write about it!</em></p>
<p>As Peter Zaitsev <a href="http://www.mysqlperformanceblog.com/2007/07/27/more-gotchas-with-mysql-50/" onclick="javascript:pageTracker._trackPageview('/http://www.mysqlperformanceblog.com/2007/07/27/more-gotchas-with-mysql-50/');">points out</a>, calling <code>SHOW STATUS</code> might have a huge performance impact.</p>
<p>We recently replaced one of our servers with a <a href="http://www.mysqlperformanceblog.com/2008/08/04/128gb-or-ram-finally-got-cheap/" onclick="javascript:pageTracker._trackPageview('/http://www.mysqlperformanceblog.com/2008/08/04/128gb-or-ram-finally-got-cheap/');">DELL R900</a> with 96GB RAM. With a disk-bound workload and more than 1TB of data in InnoDB, we expected a noticeable performance gain compared to the former server with 32GB. The new server even has better RAID and hard disks.</p>
<p>But that was not the case. Things got even worse! A lot of queries &#8220;hung&#8221;, the server load peaked at almost 7 and we saw a lot of CPU activity. Just before I started a deep analysis of what was going on inside, I spotted a <code>SHOW GLOBAL STATUS</code> which ran every second. DOH!</p>
<p>Where did it come from? An administrator was running <a href="http://www.mysql.com/products/tools/administrator/" onclick="javascript:pageTracker._trackPageview('/http://www.mysql.com/products/tools/administrator/');">MySQL Administrator</a> and used the Health chart to monitor some variables, so it periodically sent <code>SHOW GLOBAL STATUS</code> to the server. That resulted in a lot of queries waiting for the buffer pool (look at Peter&#8217;s post and the comments to understand why). And things get worse with a bigger InnoDB buffer pool (this particular MySQL instance uses 70GB!).</p>
<p>MySQL Administrator (the <em>tool</em>, not the person <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ) has been shut down, and everything looks great now!</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/11/show-status-considered-harmful/feed/</wfw:commentRss>
		</item>
		<item>
		<title>LuaSQL fetches results about 15% faster than MySQL Proxy?</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/#comments</comments>
		<pubDate>Sun, 07 Sep 2008 14:57:51 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[lua]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/</guid>
		<description><![CDATA[While evaluating LuaSQL as a backend connection replacement, I came across this. I did a quick performance test using mysqlslap, and it showed that just reading and copying the result can be significantly faster with LuaSQL.
Benchmark details
All I did was send the query to the backend and build up a new result set in Lua. 
This [...]]]></description>
			<content:encoded><![CDATA[<p>While evaluating <a href="http://www.keplerproject.org/luasql/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/luasql/');">LuaSQL</a> as a <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >backend connection replacement</a>, I came across this. I did a quick performance test using <a href="http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html" onclick="javascript:pageTracker._trackPageview('/http://dev.mysql.com/doc/refman/5.1/en/mysqlslap.html');">mysqlslap</a>, and it showed that just reading and copying the result <em>can be</em> significantly faster with LuaSQL.</p>
<h2>Benchmark details</h2>
<p>All I did was send the query to the backend and build up a new result set in Lua. </p>
<p>This is the code for LuaSQL:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;luasql.mysql&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> _sqlEnv = <span class="kw1">assert</span><span class="br0">&#40;</span>luasql.mysql<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> _con = <span class="kw1">nil</span></p>
<p><span class="kw1">function</span> read_auth<span class="br0">&#40;</span>auth<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> host, port = <span class="kw1">string</span>.match<span class="br0">&#40;</span>proxy.backends<span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span>.address, <span class="st0">&quot;(.*):(.*)&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="co1">-- We explicitly connect to db &quot;test&quot; since mysqlslap drops its database</span><br />
&nbsp; &nbsp; <span class="co1">-- and LuaSQL needs the db to exist beforehand. Anyway, this is just a </span><br />
&nbsp; &nbsp; <span class="co1">-- quick test, so don&#8217;t bother.</span><br />
&nbsp; &nbsp; _con = <span class="kw1">assert</span><span class="br0">&#40;</span>_sqlEnv:connect<span class="br0">&#40;</span><span class="st0">&quot;test&quot;</span>, auth.username, auth.password, host, port<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> disconnect_client<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span><span class="br0">&#40;</span>_con:close<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> read_query<span class="br0">&#40;</span>packet<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>packet:byte<span class="br0">&#40;</span><span class="br0">&#41;</span> == proxy.COM_QUERY<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> query = packet:sub<span class="br0">&#40;</span><span class="nu0">2</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> result = <span class="kw1">nil</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> cur = <span class="kw1">assert</span><span class="br0">&#40;</span>_con:<span class="kw1">execute</span><span class="br0">&#40;</span>query<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="kw1">type</span><span class="br0">&#40;</span>cur<span class="br0">&#41;</span> == <span class="st0">&quot;number&quot;</span><span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response.<span class="kw1">type</span> = proxy.MYSQLD_PACKET_RAW;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response.packets = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;<span class="es0">\0</span>00&quot;</span> .. <span class="co1">-- fields</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">string.char</span><span class="br0">&#40;</span>cur<span class="br0">&#41;</span> ..<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;<span class="es0">\0</span>00&quot;</span> <span class="co1">-- insert_id</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result = proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- Build up the result set.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> fields = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> colNames = cur:getcolnames<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> colTypes = cur:getcoltypes<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> a = <span class="nu0">1</span>, #colNames, <span class="nu0">1</span> <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>fields, <span class="br0">&#123;</span>name = colNames<span class="br0">&#91;</span>a<span class="br0">&#93;</span>, <span class="kw1">type</span>=proxy.MYSQL_TYPE_STRING<span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
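&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> rows = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- Fetch every row into a fresh table; reusing one table would make</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- all entries of rows point to the same (last) row.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> row = cur:fetch<span class="br0">&#40;</span><span class="br0">&#123;</span><span class="br0">&#125;</span>, <span class="st0">&quot;n&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> row <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>rows, row<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; row = cur:fetch<span class="br0">&#40;</span><span class="br0">&#123;</span><span class="br0">&#125;</span>, <span class="st0">&quot;n&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cur:close<span class="br0">&#40;</span><span class="br0">&#41;</span><br />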
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; proxy.response = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = proxy.MYSQLD_PACKET_OK,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; resultset = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fields = fields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rows = rows<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result = proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>result ~= <span class="kw1">nil</span><span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> result<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</div>
<p>And this is the code for using MySQL Proxy only:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">function</span> read_query<span class="br0">&#40;</span>packet<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>packet:byte<span class="br0">&#40;</span><span class="br0">&#41;</span> == proxy.COM_QUERY<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">local</span> query = packet:sub<span class="br0">&#40;</span><span class="nu0">2</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">-- We append the query so read_query_result gets triggered.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; proxy.queries:append<span class="br0">&#40;</span><span class="nu0">1</span>, <span class="kw1">string.char</span><span class="br0">&#40;</span>proxy.COM_QUERY<span class="br0">&#41;</span> .. query<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> proxy.PROXY_SEND_QUERY <br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">function</span> read_query_result<span class="br0">&#40;</span>inj<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> resultSet = <span class="kw1">assert</span><span class="br0">&#40;</span>inj.resultset<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> newFields = <span class="kw1">nil</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> fieldCount = <span class="nu0">1</span><br />
&nbsp; &nbsp; <span class="kw1">local</span> fields = resultSet.fields<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>fields<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; newFields = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span> <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; newFields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span>.<span class="kw1">type</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; name = fields<span class="br0">&#91;</span>fieldCount<span class="br0">&#93;</span>.name<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fieldCount = fieldCount + <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; <span class="kw1">local</span> newRows = <span class="kw1">nil</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>resultSet.rows<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; newRows = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> row <span class="kw1">in</span> resultSet.rows <span class="kw1">do</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">table.insert</span><span class="br0">&#40;</span>newRows, row<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>newFields<span class="br0">&#41;</span> <span class="kw1">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; proxy.response = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">type</span> = proxy.MYSQLD_PACKET_OK,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; resultset = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fields = newFields,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rows = newRows<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> proxy.PROXY_SEND_RESULT<br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</div>
<p>As you can see, both scripts do nothing but copy the result set in Lua. This mimics the result-set aggregation HSCALE performs when a full partition scan is necessary.</p>
<h2>Results</h2>
<p>Using LuaSQL:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> &#8211;auto-generate-sql &#8211;number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">65.731</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p>Using MySQL Proxy only:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> &#8211;auto-generate-sql &#8211;number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">74.607</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p>For comparison: Using empty <code>read_query</code> and <code>read_query_result</code> functions:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">&gt; $ mysqlslap -h <span class="nu0">127.0</span><span class="nu0">.0</span><span class="nu0">.1</span> -P <span class="nu0">4040</span> &#8211;auto-generate-sql &#8211;number-of-<span class="re2">queries=</span><span class="nu0">10000</span><br />
Benchmark<br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Minimum number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Maximum number of seconds to run all queries: <span class="nu0">39.657</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; Number of clients running queries: <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Average number of queries per client: <span class="nu0">10000</span></div>
</div>
<p><i>Versions used: MySQL Proxy 0.7.0 (svn-rev 511), LuaSQL 2.1.1, MySQL server 5.0.51a, mysqlslap 5.1.26rc</i></p>
<p>Of course I repeated the tests several times to verify the results.</p>
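<p>To put the three timings side by side, here is a small Lua sketch that only does arithmetic on the mysqlslap numbers reported above (10000 queries, one client). It computes the average time per query, how much longer the MySQL Proxy-only run took than the LuaSQL run, and how much time each variant spent on top of the empty-hook baseline. The variable and function names are made up for illustration.</p>

```lua
-- Timings reported by mysqlslap above (10000 queries, 1 client), in seconds.
local QUERIES  = 10000
local luasql   = 65.731  -- LuaSQL fetches and rebuilds the result set
local proxy    = 74.607  -- MySQL Proxy copies the result set in read_query_result
local baseline = 39.657  -- empty read_query / read_query_result

-- Average time per query in milliseconds.
local function per_query_ms(total)
  return total / QUERIES * 1000
end

-- Extra time spent on top of the empty-hook baseline.
local luasql_overhead = luasql - baseline
local proxy_overhead  = proxy - baseline

print(string.format("per query: LuaSQL %.2f ms, Proxy %.2f ms, baseline %.2f ms",
  per_query_ms(luasql), per_query_ms(proxy), per_query_ms(baseline)))
print(string.format("Proxy-only run takes %.1f%% longer in total",
  (proxy / luasql - 1) * 100))
print(string.format("copy overhead: LuaSQL %.3f s, Proxy %.3f s (ratio %.2f)",
  luasql_overhead, proxy_overhead, proxy_overhead / luasql_overhead))
```

<p>With these numbers, the MySQL Proxy-only run takes about 13.5% longer than the LuaSQL run in total. Its copying overhead above the baseline is roughly 1.34 times that of LuaSQL. This only restates the single-client notebook measurements. As Update #2 below notes, the picture changed on a bigger server.</p>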
<p>Without having dug too deep into the source of either MySQL Proxy or LuaSQL, the biggest difference seems to be that LuaSQL pushes the result set row by row onto the Lua stack, whereas MySQL Proxy puts the whole result set onto it at once.</p>
<p><b>Update:</b> As Jan points out below, this is not true. MySQL Proxy puts the result set onto the Lua stack row by row, too.</p>
<p><b>Update #2:</b> The tests above ran on my notebook (MacBook Pro 2.4 GHz, 4 GB RAM, running Ubuntu 8.04 64-bit), and they are reproducible. Running the same tests on an 8-core server, with the MySQL database on another server, makes the MySQL Proxy version run <em>slightly</em> faster (about 2-5%) than the LuaSQL version.</p>
<h2>Conclusions</h2>
<p>Even though this tiny benchmark suggests that LuaSQL is fast enough to be feasible, there are still drawbacks. First of all, depending on your workload, only a fraction of your queries need their result sets altered: only full partition scans require it. Most of the time you just need to change the table name or the backend. In those cases MySQL Proxy can pass the result set through untouched, while LuaSQL always has to fetch and rebuild it, which makes LuaSQL 100% slower than MySQL Proxy alone.</p>
<p>Another downside of LuaSQL is that it does not return the MySQL field types, only the Lua types. This makes it impossible to build up a correct result set to send back to the client.</p>
<p>So we still need backend connection handling in MySQL Proxy that is suitable for HSCALE if we want higher performance.</p>
<p>Built-in result-set merging would be a big win, too. We could then even have streaming combined result sets, taking the memory pressure off the proxy (right now every result set has to be fully loaded into memory).</p>
<p>That said, I am thinking about using LuaSQL for configuration handling, since it is a lot easier than doing it via <code>proxy.queries:append -> read_query_result -> proxy.queries:append -> ...</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/07/luasql-fetches-results-about-15-faster-than-mysql-proxy/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Version 0.3 of HSCALE is almost in the door</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 03:54:19 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/</guid>
		<description><![CDATA[After working on build and test improvements (for example incorporating lualint and LuaCov) as well as other Lua &#8220;side-projects&#8221; (e.g. Log4LUA) we are running towards HSCALE 0.3.
The focus of the forthcoming version 0.3 of HSCALE is Dictionary Based Partition Lookup. Using this partition lookup module lets you take full control over how your partitions are [...]]]></description>
			<content:encoded><![CDATA[<p>After working on build and test improvements (for example incorporating <a href="http://lua-users.org/wiki/LuaLint" onclick="javascript:pageTracker._trackPageview('/http://lua-users.org/wiki/LuaLint');">lualint</a> and <a href="http://luacov.luaforge.net/" onclick="javascript:pageTracker._trackPageview('/http://luacov.luaforge.net/');">LuaCov</a>) as well as other lua &#8220;side-projects&#8221; (i.e. <a href="http://hscale.org/display/LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA');">Log4LUA</a>) we are running towards HSCALE 0.3.</p>
<p>The focus of the forthcoming version 0.3 of HSCALE is <b>Dictionary Based Partition Lookup</b>. Using this partition lookup module lets you take full control over how your partitions are created and where they are actually located.</p>
<p><b>Update:</b> Dictionary Based Partition Lookup is fully implemented. See this <a href="http://pero.blogs.aprilmayjune.org/2008/12/22/current-project-status-hscale/" >blog post</a> and the <a href="http://hscale.org/display/HSCALE/DictionaryPartitionLookup" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/HSCALE/DictionaryPartitionLookup');">wiki page</a> about it.</p>
<p><em>Please note: Due to the <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >problems with backend connection handling</a>, version 0.3 will still focus on single-server backends,</em> even though support for multiple backends is already implemented in HSCALE. Have a look at the proxyUnit tests for a glimpse of how multi-server backends will be handled.</p>
<p>Please check out the current development snapshot at <a href="http://svn.hscale.org/trunk" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk');">svn.hscale.org</a> to see what is already there. Currently the partition lookup is fully functional but the administrative commands and some further hashing functions are missing (see below).</p>
<h2>How does dictionary partition lookup work?</h2>
<p>As the name implies, partitions are looked up in a dictionary. The dictionary itself is stored in the main database and cached internally. This lets you move partitions around and create new ones freely.</p>
<p>What is done internally is this:</p>
<ol>
<li>Apply a hashing function (see below) to the value.</li>
<li>Look up the partition based on the hashed value.</li>
<li>If no partition has been found, use the <em>default partition</em> created for every partitioned table.</li>
<li>Return the partition (and its assigned backend).</li>
</ol>
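<p>The four steps above can be sketched in a few lines of Lua. This is not HSCALE&#8217;s implementation. It assumes the dictionary has already been loaded from the main database into a plain Lua table. The table layout, the <code>users</code> entries and the <code>lookup_partition</code> function are hypothetical, and a fixed <code>PREFIX(3)</code>-style hash is hard-coded to keep it short.</p>

```lua
-- Hypothetical sketch of dictionary-based partition lookup (not HSCALE's code).
-- Assumes the dictionary was already loaded from the main database.

-- dictionary[tableName][hashedValue] = { partition = ..., backend = ... }
local dictionary = {
  users = {
    mar = { partition = "users_nick1", backend = 1 },
  },
}

-- Default partition of every partitioned table.
local defaults = {
  users = { partition = "users_default", backend = 1 },
}

-- Hard-coded PREFIX(3)-style hashing function for this example.
local function hash(value)
  return string.sub(tostring(value), 1, 3)
end

local function lookup_partition(tableName, value)
  -- 1. Apply the hashing function to the value.
  local hashed = hash(value)
  -- 2. Look up the partition based on the hashed value.
  local entries = dictionary[tableName]
  local entry = entries and entries[hashed]
  -- 3. Fall back to the default partition if nothing was found.
  if entry == nil then
    entry = assert(defaults[tableName], "not a partitioned table: " .. tableName)
  end
  -- 4. Return the partition and its assigned backend.
  return entry.partition, entry.backend
end
```

<p>With this data, a nickname like <code>'marvel'</code> resolves to <code>users_nick1</code>, while any nickname without a dictionary entry lands in <code>users_default</code>.</p>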
<h3>Hashing functions</h3>
<p>To reduce the number of partitions to be created and the overall administration overhead, a hashing function may be applied to the partition value before the partition is looked up.<br />
Currently there are 3 hashing functions available:</p>
<ul>
<li><code>MOD(X)</code>: A modulus function that groups partition values by their remainder modulo <code>X</code>, so there are at most <code>X</code> groups. Works only for numbers, of course. If you have <em>lots</em> of different partition values with a small number of rows each, it might be better to group them together instead of creating <em>lots</em> of partitions.<br />
Example: Using <code>MOD(3)</code> the values <code>1</code>, <code>4</code> and <code>7</code> will end up in the same partition.
</li>
<li><code>PREFIX(length)</code>: Partition values are grouped together by their first <code>length</code> characters. Works on everything (every value is treated as a string).<br />
Example: Using <code>PREFIX(3)</code> the values &#8220;<code>foo</code>&#8221;, &#8220;<code>foobar</code>&#8221; and &#8220;<code>footalicious</code>&#8221; will end up in the same partition.
</li>
<li><code>NONE()</code>: This function does &#8230; nothing! Use it if you really want a 1:1 relationship between partition values and partitions.</li>
</ul>
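<p>The three hashing functions can be thought of as factories that each return a function from partition value to dictionary key. The following Lua sketch is a hypothetical illustration of that idea; the <code>hashing</code> table and its layout are made up and are not HSCALE&#8217;s code. It reproduces the <code>MOD(3)</code> and <code>PREFIX(3)</code> examples above.</p>

```lua
-- Hypothetical hashing-function factories (illustration only, not HSCALE's code).
-- Each factory returns a function mapping a partition value to the key that
-- is looked up in the partition dictionary.
local hashing = {}

-- MOD(X): numbers with the same remainder modulo X share a partition,
-- so at most X different keys exist.
function hashing.MOD(x)
  return function(value)
    local n = assert(tonumber(value), "MOD() works only for numbers")
    return n % x
  end
end

-- PREFIX(length): every value is treated as a string and grouped by its
-- first `length` characters.
function hashing.PREFIX(length)
  return function(value)
    return string.sub(tostring(value), 1, length)
  end
end

-- NONE(): 1:1 relationship between partition values and partitions.
function hashing.NONE()
  return function(value)
    return value
  end
end
```

<p>For example, <code>hashing.MOD(3)</code> maps 1, 4 and 7 to the same key, and <code>hashing.PREFIX(3)</code> maps &#8220;foobar&#8221; to &#8220;foo&#8221;.</p>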
<p>Further hashing functions are planned (and might make it to version 0.3):</p>
<ul>
<li><code>DIV(X)</code>: As opposed to <code>MOD(X)</code>, this function divides the partition value by <code>X</code>, so it acts like a fixed-range function. While <code>MOD(X)</code> creates at most <code>X</code> partitions, <code>DIV(X)</code> creates an unbounded number of partitions.
</li>
<li><code>DATE(pattern)</code>: Enables date-range based partitions.<br />
Example: Using <code>DATE("%Y-%m")</code> will group by (year-)month.
</li>
</ul>
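<p>The two planned functions could look roughly like this, in the same hypothetical style. The sketch assumes integer division for <code>DIV(X)</code>. It also assumes that <code>DATE(pattern)</code> receives a Unix timestamp and takes a Lua <code>os.date</code> pattern. The final design may well differ.</p>

```lua
-- Hypothetical sketches of the planned hashing functions (not HSCALE's code).

-- DIV(X): integer division, so each run of X consecutive values shares a
-- partition and the number of partitions is unbounded.
local function DIV(x)
  return function(value)
    local n = assert(tonumber(value), "DIV() works only for numbers")
    return math.floor(n / x)
  end
end

-- DATE(pattern): assumes the partition value is a Unix timestamp. The
-- leading "!" makes os.date format in UTC, so the key does not depend on
-- the time zone of the machine running the proxy.
local function DATE(pattern)
  return function(timestamp)
    return os.date("!" .. pattern, timestamp)
  end
end
```

<p>With <code>DIV(1000)</code>, the values 0 to 999 share one partition and 1000 starts the next one. <code>DATE("%Y-%m")</code> turns every timestamp into a &#8220;year-month&#8221; key.</p>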
<h3>Administrative commands</h3>
<p>Because handling partitions is a delicate matter, creating and maintaining them should not be left to some obscure SQL statements. Therefore the dictionary partition lookup will provide administrative commands that try to prevent misconfiguration, for example by checking whether partitions overlap.</p>
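<p>An overlap check for adding partitions could be as simple as refusing a hashed partition value that is already assigned to another partition. The following is a hypothetical Lua sketch of such a check. <code>add_partition</code> and the dictionary layout are illustrative assumptions, not the actual <code>HSCALE ADD_PARTITION</code> implementation.</p>

```lua
-- Hypothetical validation for an ADD_PARTITION-like command (not HSCALE's code).
-- dict[tableName][value] = { name = partitionName, backend = backendNumber }
-- A partition value may be assigned only once per table, so two partitions
-- can never claim the same (hashed) value.
local function add_partition(dict, tableName, partitionName, value, backend)
  dict[tableName] = dict[tableName] or {}
  local existing = dict[tableName][value]
  if existing then
    return nil, string.format(
      "value '%s' of table '%s' already belongs to partition '%s'",
      value, tableName, existing.name)
  end
  dict[tableName][value] = { name = partitionName, backend = backend }
  return true
end
```

<p>Returning <code>nil</code> plus an error message follows the usual Lua convention, so a caller can report the conflict back to the client instead of raising an error inside the proxy.</p>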
<p><em>Because this is still work in progress the commands might change.</em></p>
<h4>Table setup</h4>
<p>First of all your table has to be set up:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">HSCALE SETUP_TABLE<span class="br0">&#40;</span><span class="st0">'[table]'</span>, <span class="st0">'[column]'</span>, <span class="st0">'[default table]'</span>, <span class="br0">&#91;</span>backend<span class="br0">&#93;</span>, <span class="st0">'[hashing function]'</span><span class="br0">&#41;</span></p>
<p><span class="co2"># Example</span><br />
HSCALE SETUP_TABLE<span class="br0">&#40;</span><span class="st0">'users'</span>, <span class="st0">'nickname'</span>, <span class="st0">'users_default'</span>, <span class="nu0">1</span>, <span class="st0">'PREFIX(3)'</span><span class="br0">&#41;</span></div>
</div>
<p>The table has to exist beforehand; it will be renamed to the default table name (<code>'users_default'</code> in our example). <code>SETUP_TABLE</code> can only be called once per table.</p>
<h4>Add partitions</h4>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">HSCALE ADD_PARTITION<span class="br0">&#40;</span><span class="st0">'[table]'</span>, <span class="st0">'[partition name]'</span>, <span class="st0">'[partition value]'</span>, <span class="br0">&#91;</span>backend<span class="br0">&#93;</span><span class="br0">&#41;</span></p>
<p><span class="co2"># Example</span><br />
HSCALE ADD_PARTITION<span class="br0">&#40;</span><span class="st0">'users'</span>, <span class="st0">'nick1'</span>, <span class="st0">'mar'</span>, <span class="nu0">1</span><span class="br0">&#41;</span></div>
</div>
<p>The example above creates a partition with the name <code>'nick1'</code> for table <code>'users'</code> and partition value <code>'mar'</code> (users with nicknames like <code>'marvel'</code>, <code>'martin'</code> etc.). The partition name maps directly to the table name of the partition, so the table for this partition will be <code>users_nick1</code>. Usually you would use the partition value (<code>'mar'</code>) as the partition name, but you don&#8217;t have to. You can also store multiple partitions in the same table by giving them the same partition name. This sounds strange, but it makes sense to define a finer partitioning scheme upfront while actually using a coarser one until you really need to split up the data. This makes it a lot easier to split things up afterwards.</p>
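<p>Here is a hypothetical Lua sketch of how partition names map to physical tables. It assumes the table name is simply <code>[table]_[partition name]</code>, as in the <code>users_nick1</code> example. Two partition values share the partition name <code>nick1</code>, so they live in the same table until the partition is split up later. The values <code>'mat'</code> and <code>'pet'</code>, the name <code>nick2</code> and both functions are made up for illustration.</p>

```lua
-- Hypothetical partition entries for table "users" (illustration only).
-- "mar" and "mat" share the partition name "nick1", so their rows are
-- stored in the same physical table for now.
local partitions = {
  { value = "mar", name = "nick1", backend = 1 },
  { value = "mat", name = "nick1", backend = 1 },
  { value = "pet", name = "nick2", backend = 1 },
}

-- Assumed naming rule: [table]_[partition name].
local function partition_table(tableName, partitionName)
  return tableName .. "_" .. partitionName
end

-- Collect the distinct physical tables backing the partitions, in order.
local function physical_tables(tableName, parts)
  local seen, result = {}, {}
  for _, p in ipairs(parts) do
    local t = partition_table(tableName, p.name)
    if not seen[t] then
      seen[t] = true
      table.insert(result, t)
    end
  end
  return result
end
```

<p>Three partition values here need only two tables. Splitting <code>'mat'</code> off later would mean giving it its own partition name and moving its rows.</p>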
<h4>Moving partition data</h4>
<p>In version 0.3 <em>partition data will not be moved</em> to a newly created partition; you will have to do it by hand. Automatic moving will be implemented once multiple backends are fully supported, because moving data between different servers requires a different approach. An administrative command to move partitions (<code>HSCALE MOVE_PARTITION(...)</code>) will also become available then.</p>
<h3>Multiple instances of HSCALE working on the same data</h3>
<p>HSCALE is designed to support multiple instances running in parallel, mostly to avoid becoming a bottleneck and a single point of failure. Every instance of HSCALE periodically refreshes its internal partition configuration to pick up changes made by other instances. While partitions are being created or moved, they are locked inside HSCALE, so all clients wait until the operation finishes. This guarantees data integrity.</p>
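<p>The periodic refresh can be pictured as a cache that reloads the partition configuration once it is older than a given age. This Lua sketch is an assumption about how such a cache might work, not HSCALE&#8217;s code. <code>loader</code> stands in for the query against the main database, and the clock is injectable so the behaviour can be tested. The locking of partitions during create and move operations is not shown.</p>

```lua
-- Hypothetical time-based configuration cache (not HSCALE's code).
-- loader: function returning a fresh configuration (stands in for the
--         query against the main database)
-- maxAge: seconds after which the configuration is reloaded
-- clock:  optional time source, defaults to os.time
local function new_config_cache(loader, maxAge, clock)
  clock = clock or os.time
  local config, loadedAt = nil, nil
  return function()
    local now = clock()
    if config == nil or now - loadedAt >= maxAge then
      config = loader()
      loadedAt = now
    end
    return config
  end
end
```

<p>Each HSCALE instance would hold its own cache, so a partition added through one instance becomes visible to the others at most <code>maxAge</code> seconds later.</p>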
<p>Version 0.3 will be released within the next few weeks, but definitely after MySQL Proxy 0.7 has been released, so we can thoroughly test it against the newest version.</p>
<p>Please feel free to discuss certain features and design decisions shown above. Any feedback is welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/06/version-03-of-hscale-is-almost-in-the-door/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Log4LUA version 0.2 and project page</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/06/log4lua-version-02-and-project-page/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/06/log4lua-version-02-and-project-page/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 02:43:50 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[lua]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/06/log4lua-version-02-and-project-page/</guid>
		<description><![CDATA[Everything is available at the project page
Please report issues and feature request here.
Version 0.2 is a bug fix release but with 2 significant changes in syntax:

Changed all constructor methods from create(...) to new. Seems to be more common in the LUA world.

The logger class is now returned by the module. So it is

local logger = [...]]]></description>
			<content:encoded><![CDATA[<p>Everything is available at the <a href="http://hscale.org/display/LUA/Log4LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA/Log4LUA');"><b>project page</b></a></p>
<p>Please report issues and feature requests <a href="http://jira.hscale.org/browse/LUALOG" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/LUALOG');">here</a>.</p>
<p>Version 0.2 is a bug fix release but with 2 significant changes in syntax:</p>
<ol>
<li>Changed all constructor methods from <code>create(...)</code> to <code>new</code>, which seems to be more common in the Lua world.
</li>
<li>The logger class is now returned by the module. So it is
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> LOG = logger.new<span class="br0">&#40;</span>&#8230;<span class="br0">&#41;</span></div>
</div>
<p>instead of</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> LOG = logger.Logger.create<span class="br0">&#40;</span>&#8230;<span class="br0">&#41;</span></div>
</div>
</li>
</ol>
<p>Some potential bugs have been spotted using (a slightly adapted version of) <a href="http://lua-users.org/wiki/LuaLint" onclick="javascript:pageTracker._trackPageview('/http://lua-users.org/wiki/LuaLint');">lualint</a> and <a href="http://luacov.luaforge.net/" onclick="javascript:pageTracker._trackPageview('/http://luacov.luaforge.net/');">LuaCov</a>.</p>
<p>With release of version 0.2 the Log4LUA API is declared as <em>stable</em>. Future versions might change the syntax but are guaranteed to be backwards compatible.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/06/log4lua-version-02-and-project-page/feed/</wfw:commentRss>
		</item>
		<item>
		<title>More fun with LUA - Introducing Log4LUA</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/#comments</comments>
		<pubDate>Tue, 02 Sep 2008 17:59:20 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[lua]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/</guid>
		<description><![CDATA[UPDATE Log4LUA 0.2 released. Go to the project page.
After the dust about backend connection handling in MySQL Proxy had settled, I began working on other parts of HSCALE again. (I&#8217;ll return to the backend handling later after talking to Jan Kneschke about their plans.)
One of the issues that bothered me the most was logging. Until [...]]]></description>
			<content:encoded><![CDATA[<p><b>UPDATE</b> Log4LUA 0.2 released. Go to the <a href="http://hscale.org/display/LUA/Log4LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA/Log4LUA');">project page</a>.</p>
<p>After the dust about <a href="http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/" >backend connection handling in MySQL Proxy</a> had settled, I began working on other parts of HSCALE again. <i>(I&#8217;ll return to the backend handling later after talking to Jan Kneschke about their plans.)</i></p>
<p>One of the <a href="http://jira.hscale.org/browse/HSCALE" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE');">issues</a> that bothered me the most was logging. Until now I had just used a simple <code>debug</code> function that could be enabled or disabled using an environment variable, the straightforward approach a lot of people use. But there is more to logging than printing debug messages. So I searched the web and of course found <a href="http://www.keplerproject.org/lualogging/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/lualogging/');">LuaLogging</a>. It is more flexible and has different levels (<code>DEBUG, INFO, WARN, ERROR, FATAL</code>) and appenders. You can print your log messages to the screen, write them to a file or send them by email.</p>
<p>After working with other logging frameworks in other languages (mostly Log4X like <a href="http://logging.apache.org/log4j/index.html" onclick="javascript:pageTracker._trackPageview('/http://logging.apache.org/log4j/index.html');">Log4J</a>) I got used to a number of features missing in LuaLogging:</p>
<ul>
<li><b>There are no means of external configuration.</b> The logger is configured inside the code. You cannot use different configurations for development and production out of the box. This is crucial in my opinion, since during development it is handy to enable all log levels and print everything on screen, while in production you want to log to a file and disable at least the <code>DEBUG</code> level.</li>
<li><b>You get no information about where the log event came from.</b> Log messages are more meaningful if you know where they were created, i.e. in which source file and at which line. This makes it easier to track down bugs and errors.</li>
<li><b>There is no log category concept.</b> Log categories make it easier to group log messages either by source or by meaning. For example, you could have one category called &#8220;core&#8221; for all core messages and another called &#8220;connection&#8221; for all connection-related messages.</li>
<li><b>You cannot use multiple appenders.</b> In production you might want to log all messages to a file and get an email for all errors.</li>
</ul>
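<p>The second point does not require anything exotic: plain Lua can report where a call came from via the standard <code>debug.getinfo</code> function. The following is a minimal sketch, not code from LuaLogging or Log4LUA; the helper names <code>callerLocation</code> and <code>formatInfo</code> are hypothetical.</p>

```lua
-- Hypothetical helper, not part of LuaLogging or Log4LUA.
-- Returns "file:line" for a frame up the call stack:
-- depth 1 = the function that called callerLocation,
-- depth 2 = that function's caller, and so on.
local function callerLocation(depth)
  -- +1 skips callerLocation's own frame.
  local info = debug.getinfo((depth or 1) + 1, "Sl")
  if info == nil then
    return "?:?"
  end
  return info.short_src .. ":" .. tostring(info.currentline)
end

-- Toy INFO formatter that prefixes the message with the location of
-- the code that called formatInfo (hence depth 2).
local function formatInfo(message)
  return ("[INFO] %s %s"):format(callerLocation(2), message)
end

print(formatInfo("connection opened"))
```

<p>A real logger would call something like this from inside its <code>info</code>, <code>warn</code>, etc. methods, adjusting the depth so that the reported line is the user's call, not the logger's own code.</p>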
<p>Since adding all this functionality to LuaLogging would effectively mean rewriting it, I wrote yet another logging facility and called it <b>Log4LUA</b>. I chose the name because this package behaves almost like Log4X.</p>
<p><b><a href="http://hscale.org/display/LUA/Log4LUA" onclick="javascript:pageTracker._trackPageview('/http://hscale.org/display/LUA/Log4LUA');">Download Log4LUA</a></b></p>
<p>Detailed documentation on how to use it can be found <a href="http://static.hscale.org/log4lua/api/index.html" onclick="javascript:pageTracker._trackPageview('/http://static.hscale.org/log4lua/api/index.html');">here (luadoc)</a>.</p>
<h2>Features</h2>
<ul>
<li><b>External configuration.</b> Configure your logging system via a config file.</li>
<li><b>Logger categories.</b> Configure different categories for different logging tasks.</li>
<li><b>Detailed information available: source file, line, function or the whole stack trace</b></li>
<li><b>Console, file and smtp (email) appenders</b></li>
<li><b>Multiple appenders per category.</b> Log to a file and get the worst errors by email.</li>
<li><b>Level threshold for smtp appender.</b> Don&#8217;t get every log message by mail, only the ones at or above a defined level.</li>
<li><b>Log file rotation for file appender.</b> Rotation is based on a date pattern.</li>
</ul>
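<p>Level thresholds and multiple appenders fit together roughly like this. It is a minimal sketch under assumed names (<code>LEVELS</code>, <code>newAppender</code> and <code>newCategory</code> are hypothetical, not Log4LUA&#8217;s API): a category first drops messages below its own level, then hands each remaining message to every appender whose threshold it meets, so one appender can take everything while a mail appender only sees WARN and above.</p>

```lua
-- Hypothetical sketch; level numbers and appender shape are assumptions,
-- not Log4LUA's API. Higher number = more severe.
local LEVELS = { DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4, FATAL = 5 }

-- An appender is just a threshold plus a write function.
local function newAppender(threshold, write)
  return { threshold = threshold, write = write }
end

-- A category with its own level and any number of appenders.
local function newCategory(level, appenders)
  local category = { level = level, appenders = appenders }

  function category:log(levelName, message)
    local value = assert(LEVELS[levelName], "unknown level")
    -- The category level filters first ...
    if value >= LEVELS[self.level] then
      for _, appender in ipairs(self.appenders) do
        -- ... then each appender applies its own threshold.
        if value >= LEVELS[appender.threshold] then
          appender.write(levelName .. ": " .. message)
        end
      end
    end
  end

  return category
end

-- Usage: everything from INFO up goes to the "file" appender,
-- only WARN and above goes to the "mail" appender.
local fileLines, mails = {}, {}
local root = newCategory("INFO", {
  newAppender("DEBUG", function(line) table.insert(fileLines, line) end),
  newAppender("WARN", function(line) table.insert(mails, line) end),
})
root:log("DEBUG", "dropped by the category level")
root:log("INFO", "file only")
root:log("ERROR", "file and mail")
```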
<h2>Basic usage</h2>
<p>First write a configuration file (say <code>log4lua.conf.lua</code>):</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> console = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.console&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> file = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.file&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">local</span> smtp = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.appenders.smtp&quot;</span><span class="br0">&#41;</span><br />
<br />
<span class="kw1">local</span> config = <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
<br />
<span class="co1">-- Create a default smtp appender that sends messages of level WARN or higher.</span><br />
<span class="kw1">local</span> defaultSmtp = smtp.new<span class="br0">&#40;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; headers = <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; from = <span class="st0">&quot;myserver@mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; to = <span class="st0">&quot;admin@mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; subject = <span class="st0">&quot;%LEVEL: %MESSAGE&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; body = <span class="st0">&quot;Hi, an error occurred at %FILE:%LINE.<span class="es0">\n</span><span class="es0">\n</span>Level: %LEVEL<span class="es0">\n</span>Message: %MESSAGE<span class="es0">\n</span>&quot;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span>,<br />
&nbsp; &nbsp; <span class="st0">&quot;mail.mydomain.com&quot;</span>,<br />
&nbsp; &nbsp; logger.WARN<br />
<span class="br0">&#41;</span><br />
<br />
<span class="co1">-- The ROOT category must be configured.</span><br />
<span class="co1">-- For the root category we use the console and smtp appenders.</span><br />
config<span class="br0">&#91;</span><span class="st0">&quot;ROOT&quot;</span><span class="br0">&#93;</span> = logger.Logger.new<span class="br0">&#40;</span><span class="br0">&#123;</span>console.new<span class="br0">&#40;</span><span class="br0">&#41;</span>, defaultSmtp<span class="br0">&#125;</span>, <span class="st0">&quot;ROOT&quot;</span>, logger.INFO<span class="br0">&#41;</span><br />
<br />
<span class="co1">-- Category &quot;core&quot; uses a rotating file appender and the default message pattern.</span><br />
config<span class="br0">&#91;</span><span class="st0">&quot;core&quot;</span><span class="br0">&#93;</span> = logger.Logger.new<span class="br0">&#40;</span>file.new<span class="br0">&#40;</span><span class="st0">&quot;core-%s.log&quot;</span>, <span class="st0">&quot;%Y-%m-%d&quot;</span><span class="br0">&#41;</span>, <span class="st0">&quot;core&quot;</span>, logger.INFO<span class="br0">&#41;</span><br />
<br />
<span class="co1">-- The config table must be returned.</span><br />
<span class="kw1">return</span> config</div>
</div>
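<p>The config above relies on two kinds of pattern strings: <code>%LEVEL</code>-style placeholders in the smtp subject and body, and a file name plus a date pattern for rotation. Here is a minimal sketch of how such strings could be expanded in plain Lua. The helpers <code>expand</code> and <code>rotatedFileName</code> are hypothetical, and the assumption that the date pattern uses <code>os.date</code> (strftime) syntax is mine; Log4LUA&#8217;s actual implementation may differ.</p>

```lua
-- Hypothetical helpers; Log4LUA's real implementation may differ.

-- Replace %NAME placeholders (upper-case letters) with values from
-- fields. Placeholders without a value are left unchanged.
local function expand(pattern, fields)
  local result = pattern:gsub("%%(%u+)", function(name)
    local value = fields[name]
    if value ~= nil then
      return tostring(value)
    end
    -- Returning nil tells gsub to keep the original "%NAME" text.
    return nil
  end)
  return result
end

-- Build a rotating log file name: the "%s" in fileNamePattern receives
-- the given time (default: now) formatted with datePattern, using
-- os.date / strftime syntax (an assumption).
local function rotatedFileName(fileNamePattern, datePattern, time)
  return fileNamePattern:format(os.date(datePattern, time))
end

print(expand("%LEVEL: %MESSAGE", { LEVEL = "WARN", MESSAGE = "disk full" }))
print(rotatedFileName("core-%s.log", "%Y-%m-%d"))
```

<p>With a daily date pattern, the file name changes once per day, so a file appender only needs to compare the computed name with the currently open file to know when to rotate.</p>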
<p>In your Lua source (say <code>mycode.lua</code>), use the following to get a logger:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;"><span class="kw1">local</span> logger = <span class="kw1">require</span><span class="br0">&#40;</span><span class="st0">&quot;optivo.common.log4lua.logger&quot;</span><span class="br0">&#41;</span><br />
<br />
<span class="co1">-- I prefer calling the main logger LOG.</span><br />
<span class="co1">-- This will return a logger for category &quot;ROOT&quot; since there is no category &quot;mycode&quot;.</span><br />
<span class="kw1">local</span> LOG = logger.getLogger<span class="br0">&#40;</span><span class="st0">&quot;mycode&quot;</span><span class="br0">&#41;</span><br />
<br />
<span class="co1">-- This will return a logger for category &quot;core&quot; since the category name &quot;core.moduleA&quot; starts with &quot;core&quot;.</span><br />
<span class="kw1">local</span> LOG_CORE = logger.getLogger<span class="br0">&#40;</span><span class="st0">&quot;core.moduleA&quot;</span><span class="br0">&#41;</span><br />
<br />
<span class="co1">-- Log something.</span><br />
LOG:info<span class="br0">&#40;</span><span class="st0">&quot;My first log message&quot;</span><span class="br0">&#41;</span><br />
<br />
<span class="co1">-- Tables are converted to strings for you.</span><br />
LOG:warn<span class="br0">&#40;</span><span class="br0">&#123;</span>name = <span class="st0">&quot;Paul&quot;</span>, age = <span class="st0">&quot;22&quot;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
<br />
<span class="co1">-- Log to a different category.</span><br />
LOG_CORE:fatal<span class="br0">&#40;</span><span class="st0">&quot;System error&quot;</span><span class="br0">&#41;</span></div>
</div>
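<p>The category lookup described in the comments above can be sketched as a longest-prefix search. This sketch assumes dot-separated category names, as in Log4X, and the function <code>resolveCategory</code> is hypothetical. Note one consequence of that assumption: a name like <code>coremodule</code> would fall back to ROOT here, whereas a plain string prefix check would match <code>core</code>.</p>

```lua
-- Hypothetical lookup rule, assuming dot-separated category names as in
-- Log4X: try the full name, then drop the last ".part" repeatedly, and
-- fall back to ROOT when nothing matches.
local function resolveCategory(config, name)
  local candidate = name
  while candidate do
    if config[candidate] ~= nil then
      return config[candidate], candidate
    end
    -- "core.moduleA" -> "core"; a name without a dot yields nil.
    candidate = candidate:match("^(.*)%.[^.]*$")
  end
  return config["ROOT"], "ROOT"
end

local config = { ROOT = "root logger", core = "core logger" }
print(select(2, resolveCategory(config, "core.moduleA"))) -- core
print(select(2, resolveCategory(config, "mycode")))       -- ROOT
```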
<p>Start your program with:</p>
<div class="codesnip-container" >
<div class="codesnip" style="font-family: monospace;">LOG4LUA_CONFIG_FILE=<span class="st0">&quot;log4lua.conf.lua&quot;</span> lua mycode.lua</div>
</div>
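<p>The environment variable presumably just tells Log4LUA where to find the config file, and the file itself is ordinary Lua that returns the config table. Here is a minimal sketch of such a loader using only the standard <code>os.getenv</code>, <code>loadfile</code> and <code>pcall</code> functions; the names <code>loadConfigFile</code> and <code>loadConfig</code> are hypothetical and not Log4LUA&#8217;s API.</p>

```lua
-- Hypothetical loader using only standard Lua functions; the names are
-- made up for illustration and are not Log4LUA's API.

-- Run a config file and return the table it returns (or nil, error).
local function loadConfigFile(path)
  local chunk, err = loadfile(path)
  if not chunk then
    return nil, err
  end
  local ok, config = pcall(chunk)
  if not ok then
    return nil, config
  end
  if type(config) ~= "table" then
    return nil, path .. " must return a table"
  end
  return config
end

-- Look up the config file path in an environment variable.
local function loadConfig(envName)
  local path = os.getenv(envName or "LOG4LUA_CONFIG_FILE")
  if path == nil then
    return nil, "environment variable not set"
  end
  return loadConfigFile(path)
end

local config, err = loadConfig()
if config == nil then
  print("no logging config: " .. tostring(err))
end
```

<p>Using <code>loadfile</code> plus <code>pcall</code> instead of <code>dofile</code> lets the caller fall back to a default configuration instead of crashing on a broken config file.</p>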
<p>For more examples, look at the <a href="http://svn.hscale.org/trunk/hscale" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org/trunk/hscale');">HSCALE</a> sources; Log4LUA is used throughout.</p>
<p>Please feel free to comment on this or send usage and bug reports. I will set up a project page later on.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/09/02/more-fun-with-lua-introducing-log4lua/feed/</wfw:commentRss>
		</item>
		<item>
		<title>MySQL Proxy vs. HSCALE</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 00:07:52 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/</guid>
		<description><![CDATA[Recently I added the first code to support multiple MySQL backends in HSCALE (see svn.hscale.org). As a &#8220;side note&#8221; I have to thank Giuseppe Maxia for MySQL Sandbox, which made multi-server testing a breeze!
While coding this I started to feel that writing HSCALE on top of MySQL Proxy is no longer as easy and [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I added the first code to support multiple MySQL backends in HSCALE (see <a href="http://svn.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://svn.hscale.org');">svn.hscale.org</a>). <i>As a &#8220;side note&#8221; I have to thank Giuseppe Maxia for <a href="https://launchpad.net/mysql-sandbox" onclick="javascript:pageTracker._trackPageview('/https://launchpad.net/mysql-sandbox');">MySQL Sandbox</a>, which made multi-server testing a breeze!</i></p>
<p>While coding this I started to feel that writing HSCALE on top of MySQL Proxy is no longer as easy and clean as it was at the start. And finally I reached the point where I have to say: </p>
<p><b>HSCALE (and maybe other advanced multi-server applications) and (current) MySQL Proxy don&#8217;t fit very well.</b></p>
<p>Before I go into details, let me make clear that this is mostly due to the specific nature of MySQL Proxy as a connection- and protocol-based proxy. MySQL Proxy is great, and there are a lot of cool applications that fit it perfectly.</p>
<p>Aside from some other minor glitches (like missing tokens in the <a href="http://bugs.mysql.com/bug.php?id=36277" onclick="javascript:pageTracker._trackPageview('/http://bugs.mysql.com/bug.php?id=36277');">SQL tokenizer</a>), the biggest showstopper is:</p>
<h2>Handling of multiple backends</h2>
<p>The biggest problem I had to face is the handling of backend connections in MySQL Proxy. What I would need for HSCALE are dedicated connections to every backend for every proxy connection: when a client connects to the proxy, a connection to each backend is opened and attached to that particular client connection. This way I could easily maintain the state of the connection by sending commands like <code>SET <i>variable</i></code> or <code>USE <i>database</i></code> to all backends.<br />
Dedicated connections are also vital for XA and for transactions in general. </p>
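<p>To make the idea of dedicated connections concrete, here is a small pure-Lua model. It uses no MySQL Proxy API at all: the backend connections are hypothetical stand-in tables that merely record statements, and <code>newClientSession</code>, <code>broadcast</code> and <code>route</code> are invented names. It only shows the intended structure: one session per client, one connection per backend inside it, state-changing statements sent to all backends and ordinary queries routed to one.</p>

```lua
-- Pure-Lua model of the idea; no MySQL Proxy API is used. A "backend
-- connection" here is a hypothetical stand-in that records statements.
local function newBackendConnection(name)
  return { name = name, sent = {} }
end

-- One session per client connection. It owns a dedicated connection to
-- every backend for as long as the client stays connected.
local function newClientSession(backendNames)
  local session = { backends = {} }
  for _, name in ipairs(backendNames) do
    session.backends[name] = newBackendConnection(name)
  end

  -- State-changing statements (SET, USE) go to every backend so that
  -- all dedicated connections share the same session state.
  function session:broadcast(statement)
    for _, backend in pairs(self.backends) do
      table.insert(backend.sent, statement)
    end
  end

  -- Ordinary queries go only to the backend holding the partition.
  function session:route(backendName, statement)
    local backend = assert(self.backends[backendName], "unknown backend")
    table.insert(backend.sent, statement)
  end

  return session
end

local session = newClientSession({ "shard1", "shard2" })
session:broadcast("USE app")
session:route("shard2", "SELECT * FROM users WHERE id = 42")
```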
<p>The way the proxy currently handles backend connections is somewhat cumbersome (see an <a href="http://forge.mysql.com/tools/tool.php?id=151" onclick="javascript:pageTracker._trackPageview('/http://forge.mysql.com/tools/tool.php?id=151');">example</a>). Maintaining your own connection pool is hard to understand and involves too much &#8220;magic&#8221; in my opinion. And the pool only grows with the number of connections made to the proxy. People are having problems with this approach (read the <a href="http://forums.mysql.com/list.php?146" onclick="javascript:pageTracker._trackPageview('/http://forums.mysql.com/list.php?146');">MySQL Proxy forum</a>, for instance <a href="http://forums.mysql.com/read.php?146,197085,197085#msg-197085" onclick="javascript:pageTracker._trackPageview('/http://forums.mysql.com/read.php?146,197085,197085#msg-197085');">this thread</a>). </p>
<p>The current experimental multi-server code in HSCALE uses this connection pooling technique since it is the only way to establish connections to other backend servers. </p>
<h3>So what can we do now to implement multi-server support in HSCALE?</h3>
<ol>
<li>
<b>Wait until MySQL Proxy supports dedicated backend connections</b> This would be the easiest and most elegant solution. But Jan Kneschke and the other developers might decide that this is not what they intended the proxy for, which would be totally OK from their point of view! Even if they decide that this is a cool feature, it could take a long while until it is implemented. (Hint: I would gladly help out here <img src='http://pero.blogs.aprilmayjune.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ).
</li>
<li>
<b>Work around this using <a href="http://www.keplerproject.org/luasql/" onclick="javascript:pageTracker._trackPageview('/http://www.keplerproject.org/luasql/');">luasql</a></b> While possible, this solution is not preferable since we would be using two technologies for the same thing.
</li>
<li>
<b>Switch to another proxy</b> There are quite <a href="http://krow.livejournal.com/595518.html" onclick="javascript:pageTracker._trackPageview('/http://krow.livejournal.com/595518.html');">a few</a>. This would be no fun, and I have not looked at all of them to see whether it is even possible. It would be a complete rewrite, though.
</li>
<li><b>Fork MySQL Proxy or implement a plugin</b> I could use the new plugin technology to implement the multi-backend connection handling myself. As a last resort, I could fork the whole project (like <a href="http://spockproxy.sourceforge.net/" onclick="javascript:pageTracker._trackPageview('/http://spockproxy.sourceforge.net/');">Spock Proxy</a> did, see below). This would still mean a lot of work, though, and losing the benefit of the effort MySQL is putting into the proxy.
</li>
<li>
<b>Use other sharding technologies and eventually abandon HSCALE</b> For example, <a href="http://spockproxy.sourceforge.net/" onclick="javascript:pageTracker._trackPageview('/http://spockproxy.sourceforge.net/');">Spock Proxy</a>, a fork of MySQL Proxy, does much of what we intend to do with HSCALE.
</li>
<li>
<b>Re-implement HSCALE as a JDBC driver</b> Since our applications are written exclusively in Java, this would be an option.
</li>
</ol>
<p><b>Of course I would love to continue HSCALE as a MySQL Proxy application!</b> The main reasons are the effort MySQL is putting into MySQL Proxy, the extensibility we gain by combining multiple Lua scripts (we do more with the proxy than just HSCALE) and, last but not least, the community response HSCALE has already received.</p>
<p>As a next step, I will take a deeper look into the proxy code and evaluate the effort needed to add dedicated connections. </p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/08/26/mysql-proxy-vs-hscale/feed/</wfw:commentRss>
		</item>
		<item>
		<title>HSCALE 0.2 released and new project web page</title>
		<link>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/</link>
		<comments>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/#comments</comments>
		<pubDate>Tue, 06 May 2008 20:23:06 +0000</pubDate>
		<dc:creator>pero</dc:creator>
		
		<category><![CDATA[hscale]]></category>

		<category><![CDATA[mysql]]></category>

		<category><![CDATA[mysql-proxy]]></category>

		<guid isPermaLink="false">http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/</guid>
		<description><![CDATA[The main focus of version 0.2 was to improve the handling of almost all SQL statements. So now you can issue DESC TABLE tbl_name or RENAME TABLE tbl_name TO another_tbl_name on a partitioned table and get correct results for SHOW TABLES etc.
Other statements are rejected, and we settled on the feature set we want [...]]]></description>
			<content:encoded><![CDATA[<p>The main focus of version 0.2 was to improve the handling of almost all SQL statements. So now you can issue <code>DESC TABLE tbl_name</code> or <code>RENAME TABLE tbl_name TO another_tbl_name</code> on a partitioned table and get correct results for <code>SHOW TABLES</code> etc.<br />
Other statements are rejected, and we settled on the feature set we want to provide for <em>full partition scans</em>.<br />
In addition, there are some performance improvements.</p>
<p>See the <a href="http://jira.hscale.org/browse/HSCALE?report=com.atlassian.jira.plugin.system.project:changelog-panel" onclick="javascript:pageTracker._trackPageview('/http://jira.hscale.org/browse/HSCALE?report=com.atlassian.jira.plugin.system.project:changelog-panel');">full list of changes</a>.</p>
<p>For the next release (0.3), we will focus on the dictionary-based partition lookup module and further performance improvements.</p>
<p>Finally, there is a new project home page: <a href="http://www.hscale.org" onclick="javascript:pageTracker._trackPageview('/http://www.hscale.org');">http://www.hscale.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://pero.blogs.aprilmayjune.org/2008/05/06/hscale-02-released-and-new-project-web-page/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
