Archive for the 'java' Category

MySQL Connector/J randomly hanging at com.mysql.jdbc.util.ReadAheadInputStream.fill

Tuesday, February 2nd, 2010

In the past months we struggled with large SELECT queries just get stuck at:


java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
- locked com.mysql.jdbc.util.ReadAheadInputStream@cb9a81c
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2494)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2949)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2938)
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3481)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)
- locked java.lang.Object@70cbccca
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2571)
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:782)
- locked java.lang.Object@70cbccca
com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:625)
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)
org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:260)

Whenever this happened we just restarted the Tomcat server and everything was fine again for some days or weeks. But today it struck us very hard so we finally took the time to hunt this down. It seems to be related to this bug report. Some comments suggested to use SQL_NO_CACHE with your queries.

A lot of people (including me) suggest to disable the MySQL query cache since it may cause severe problems. To disable the query cache at server startup, set the query_cache_size system variable to 0.

This is what we usually do, but one of our servers had query cache turned on. Disabling it solved this problem.

MySQL Partitioning on Application Side

Wednesday, March 26th, 2008

After following the scale up path (i.e. buy bigger boxes) for a while now, we definitely need to scale out as things start to become “unhandy” and simply HUGE. (See small things are better)

We are running an OLTP application with about 200 million transaction per month. Currently we have some very large tables with more than 1,5 billion rows and the total database size is about 600 GB - all InnoDB.

What we want to implement is application based sharding, i.e. splitting up the big tables and distribute them among many smaller servers. Furthermore we need to implement some sort of archiving mechanism since the db size is growing very fast. Putting old data into compressed myisam tables seems to be a good solution here.

So, how to implement this?

These solutions come into mind immediately:

  1. Using MySQL Cluster (this is not really application wise sharding but I mention it for completeness)
  2. Using MySQL Proxy with transparent query rewriting and load balancing
  3. Implement it into a JDBC driver
  4. Implement it into the application data access layer (DAOs)

Each solution comes with a list of pros and cons.

MySQL Cluster
Using the cluster would bring scalability right away but as we do not have any experience with this and had a lot of issues with the other storage engines (first crashes with MyISAM, then crashes with InnoDB, both being unsolved for a while even though the MySQL Enterprise Support did a great job!) I feel quite reluctant switching to a new storage engine. The hardware requirements would be too high too, so I don’t think that this could be the best solution.

Application Based Sharding
For all the other 3 types of partitioning I would use the following mechanism and meta tables.
Say this is a table I would like to split (Please note that this is just an example with some added redundancy (productCategory) to ease partitioning by product category.):

CREATE TABLE orders (
    orderId BIGINT(20) NOT NULL,

    productCategory VARCHAR(50) NOT NULL,
    productId BIGINT(20) NOT NULL,
    date DATETIME NOT NULL,

    PRIMARY KEY (orderId)
);

The column productCategory would be the partition key since there are only about 20 to 100 product categories used and the data is spread almost evenly. The real scenario would be different but this is a close and easy to grasp match.

The partition meta data table could look like this:

CREATE TABLE partition (
    sourceTable VARCHAR(255) NOT NULL, # would be "orders" in our example
    partitionKey VARCHAR(255) NOT NULL, # would be the value of "productCategory"
    dbHost VARCHAR(100) NOT NULL,
    dbPort VARCHAR(100) NOT NULL
    dbName VARCHAR(100) NOT NULL,
    dbUser VARCHAR(100) NOT NULL,
    dbPass VARCHAR(100) NOT NULL,
    dbTable VARCHAR(100) NOT NULL,

    PRIMARY KEY (sourceTable, partitionKey)
);

This way I could assign a different database connection to each product category or a set of categories. Furthermore we can have multiple tables of the same “type” within one database because the table name is configured, too (dbTable). So we could have the tables order_books, orders_hardware, orders_cds on server A and orders_dvds, orders_shirts on server B.

Limitations
This type of partitioning forces all queries to include a partition key (i.e. a productCategory) and IN clauses on the partition key are not possible or at least very hard to implement. Furthermore a full table scan (i.e. selecting rows based on the orderId column) requires to scan all tables. (This could be optimized if you have a correlation between orderId and productCategory in some way which is the case in our scenario - more on that later.).

MySQL Proxy and JDBC Driver
Yesterday I played a bit with the MySQL Proxy and yes it is hell of an awesome thing! I did some sort of query rewriting and load balancing which was easy at first hand.
The approach for both, MySQL Proxy and JDBC driver, would be the same: Rewrite the queries coming from the application by analyzing them and by using hints put into the SQL by the application (where analyzing does not help).

Assume the following query is issued to look up all orders of a product category:

SELECT * FROM orders WHERE productCategory=‘Books’;

After analyzing the query it is clear that we have to lookup the database configuration for the category ‘Books’, so we could rewrite that query (and send it to a different database connection if needed) like this:

SELECT * FROM orders_books WHERE productCategory=‘Books’;

We could even strip off the WHERE clause if there are only books stored in the table “orders_books”.

That was easy. What about more complex queries like (Attention: Bad database design ahead! It is just an example ;)):

SELECT
    orders.*,
    books.*
FROM
    orders,
    books
WHERE
    orders.productName = books.name

We know that the partition key has to be ‘Books’. But how can our query analyzer know that? It can’t. So there might be queries that are valid but cannot be analyzed. One way is to add a where condition AND orders.productCategory = 'Books', but that might not always be possible or wanted.
Another option would be to add hints like this:

SELECT
    orders.*,
    books.*
FROM
    orders /* HINT: productCategory=’Books’ */,
    books
WHERE
    orders.productName = books.name

I think implementing it as a MySQL Proxy LUA script or a JDBC driver would not make a big difference on the approach. I would prefer MySQL Proxy a lot though.

The pros of this approach are very clear:

  1. It is a transparent solution. Only a few parts of the application need to be changed in best case.
  2. It could be easily released as open source since it is not bound to any type of application - a huge plus IMHO!
  3. It keeps the application code simple.

The cons:

  1. Implementation is very challenging. Analyzing the queries would be the toughest part.
  2. Harder to test against the application since it is a runtime dependency. Would have to come up with a good integration test scenario.
  3. Performance could be an issue since the queries need to be parsed.
  4. Application would need to be changed anyway since we cannot issue some sorts of queries anymore. This could be small or a huge impact depending on the application.

Application Data Access Layer (DAO)
This would be the most straightforward way of implementing partitioning. Since we have total control over our application we could adopt the mechanism in any way to meet our needs. The basic concept would be to use a PartitionFactory like this:

public interface PartitionFactory {
 
    public Connection getConnection(String sourceTable, String partitionKey);
 
    public String getTable(String sourceTable, String partitionKey);
}

Within this solution we have to know which partition to use beforehand. Thus every DAO method has to know where to go. This could be done by adding the partition key to every method or rewriting methods to first find out which partition to use.
The path to this solution is clear and very straightforward, I think.

The pros:

  1. Easy to implement since we know what the application wants and does.
  2. Fast. No need to rewrite queries.
  3. Easier to grasp for (database) developers since they see the dependencies right in the code.

The cons:

  1. Proprietary solution unless embedded into other frameworks like Spring, Hibernate or iBatis.
  2. Might blowup the application code making it harder to understand.
  3. A lot of the application needs to be changed.

Summary
We are still in the process of evaluating every solution. Right now I would love to go for the MySQL Proxy solution be it just for its aesthetic image ;) We need to do a bit more prototyping to see if this would really be a good solution.

Any comment, discussion or hint is highly appreciated!

More to follow soon…

DOH! IE displays white page after switching to JSPX

Saturday, February 17th, 2007

I guess almost everybody has been hit by the “IE shows only a blank page” issue while experimenting with XHTML back then. It is because IE requires a closing tag. The following fragment will not work in IE:

<script type="text/javascript" src="…" />

But this works:

<script type="text/javascript" src="…"></script/>

Ok, we learned that.

Today I had a hard time with IE showing a white page and all I had done was converting an old .jsp into a .jspx file.

What happened?

After hitting google with every type of phrase I could imagine describing my problem, I finally found it out: The servlet container (Tomcat 5.5.12) removed the closing script tag because the content of the script tag was empty. DOH!

What did the trick?

For now I do the following:

<script type="text/javascript" src="…"><– –></script/>

Wow. If you love surprises - do web development. ;)