2014년 5월 11일 일요일

LIMIT ROWS MATCHED

* Why

Recently MariaDB 10.0 include LIMIT ROWS EXAMINED features.
I think this features will prevent our services from abnormal behavior(A sudden change of QEP or Operation miss, ... something else..) of MariaDB.

I got new idea from LIMIT ROWS EXAMINED features.
We often use "SELECT COUNT(*) .." query for counting rows matched condition.
We can limit row count with LIMIT clause when we SELECT row itself, But we can't during counting rows.
So, I added small features on MariaDB 10.0.10.



* What : LIMIT ROWS MATCHED 

This new features works before "ORDER BY" and "GROUP BY" and some aggregate function like "COUNT()" and "SUM()".
So we can count rows matched WHERE condition with LIMIT ROWS MATCHED syntax.
Below example is simple usage of LIMIT ROWS MATCHED syntax. (in this example LIMIT ROWS MATCHED works same as just LIMIT clause)

MariaDB [test]CREATE TABLE test (fd_pk INT NOT NULL auto_incrementfd1 INTfd2 INTPRIMARY KEY(fd_pk));
Query OK0 rows affected (0.04 sec)

MariaDB [test]INSERT INTO test VALUES (1,1,1), (2,2,2), (3,3,3), (4,4,4), (5,5,5);
Query OK5 rows affected (0.02 sec)
Records: 5  Duplicates: 0  Warnings: 0

MariaDB [test]SELECT FROM test;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
|     
|    |    |
|     
|    |    |
|     
|    |    |
|     
|    |    |
+
-------+------+------+

5 rows IN SET (0.00 sec)

MariaDB [test]SELECT FROM test LIMIT ROWS MATCHED 2;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
|     
|    |    |
+
-------+------+------+

2 rows IN SET (0.00 sec) 

But if you use COUNT(*) aggregate function, the result looks different.

MariaDB [test]SELECT COUNT(*) FROM test LIMIT 3;
+
----------+

COUNT(*) |
+
----------+

|        |
+
----------+

1 row IN SET (0.01 sec)

MariaDB [test]SELECT COUNT(*) FROM test LIMIT ROWS MATCHED 3;
+
----------+

COUNT(*) |
+
----------+

|        |
+
----------+

1 row IN SET (0.00 sec) 

And LIMIT ROWS MATCHED feature is processed before LIMIT. So sometimes, it looks like LIMIT is not working when you use both of LIMIT n and ROWS MATCHED n.

MariaDB [test]SELECT FROM test LIMIT 3 ROWS MATCHED 2;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
|     
|    |    |
+
-------+------+------+

2 rows IN SET (0.00 sec)  

We can use LIMITED ROWS MATCHED syntax nested fashion.

MariaDB [test]> SELECT FROM (
  SELECT FROM test WHERE fd2 BETWEEN AND 4 LIMIT ROWS MATCHED 2
LIMIT ROWS MATCHED 1;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
+
-------+------+------+

1 row IN SET (0.01 sec)

MariaDB [test]> SELECT FROM (SELECT FROM test WHERE fd2 BETWEEN AND 4 LIMIT ROWS MATCHED 2x LIMIT ROWS MATCHED 2;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
|     
|    |    |
+
-------+------+------+

2 rows IN SET (0.00 sec)

MariaDB [test]&gtSELECT FROM (
  SELECT FROM test WHERE fd2 BETWEEN AND 4 LIMIT ROWS MATCHED 2
LIMIT ROWS MATCHED 3;
+
-------+------+------+

fd_pk fd1  fd2  |
+
-------+------+------+

|     |    |    |
|     
|    |    |
+
-------+------+------+

2 rows IN SET (0.01 sec)  


In above example, derived table(subquery of FROM clause) will cut before third matched row, so outer LIMIT ROWS MATCHED 2 and LIMIT ROWS MATCHED 3 don't change the final result.
But first query of above example, outer LIMIT ROWS MATCHED is 1 (less than inner LIMIT ROWS MATCHED 2) so the result set has just 1 row.

And LIMIT ROWS MATCHED syntax could be used in join also.

MariaDB [test]SELECT FROM test t1test t2test t3 WHERE t1.fd_pk=t2.fd_pk AND t2.fd_pk=t3.fd_pk LIMIT ROWS MATCHED 3;
+
-------+------+------+-------+------+------+-------+------+------+

fd_pk fd1  fd2  fd_pk fd1  fd2  fd_pk fd1  fd2  |
+
-------+------+------+-------+------+------+-------+------+------+

|     |    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
+
-------+------+------+-------+------+------+-------+------+------+

3 rows IN SET (0.02 sec)

MariaDB [test]SELECT FROM test t1test t2test t3 WHERE t1.fd_pk=t2.fd_pk AND t2.fd_pk=t3.fd_pk AND t3.fd1<>1 LIMIT ROWS MATCHED 3;
+
-------+------+------+-------+------+------+-------+------+------+

fd_pk fd1  fd2  fd_pk fd1  fd2  fd_pk fd1  fd2  |
+
-------+------+------+-------+------+------+-------+------+------+

|     |    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
+
-------+------+------+-------+------+------+-------+------+------+

3 rows IN SET (0.01 sec)  

Result set are little weird or different from what you expected.

-- // duplicate test table's row for ORDER BY .. LIMIT ROWS MATCHED n query
MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;
Query OK5 rows affected (0.02 sec)
Records: 5  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK20 rows affected (0.01 sec)
Records: 20  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK40 rows affected (0.01 sec)
Records: 40  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK80 rows affected (0.03 sec)
Records: 80  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK160 rows affected (0.05 sec)
Records: 160  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK320 rows affected (0.12 sec)
Records: 320  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK640 rows affected (0.20 sec)
Records: 640  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;Query OK, 1280 rows affected (0.38 sec)

Records: 1280  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;Query OK, 2560 rows affected (0.74 sec)
Records: 2560  Duplicates: 0  Warnings: 0

MariaDB [test]INSERT INTO test SELECT NULL, fd1fd2 FROM test;

Query OK5120 rows affected (1.47 sec)
Records: 5120  Duplicates: 0  Warnings: 0


MariaDB [test]SELECT 

FROM test t1test t2test t3 
WHERE t1.fd_pk=t2.fd_pk AND t2.fd_pk=t3.fd_pk 
ORDER BY t3.fd2 DESCt2.fd2 DESCt1.fd2 DESC 
LIMIT 3;
+
-------+------+------+-------+------+------+-------+------+------+

fd_pk fd1  fd2  fd_pk fd1  fd2  fd_pk fd1  fd2  |
+
-------+------+------+-------+------+------+-------+------+------+

|   578 |    |    |   578 |    |    |   578 |    |    |
|  
2792 |    |    |  2792 |    |    |  2792 |    |    |
|  
8694 |    |    |  8694 |    |    |  8694 |    |    |
+
-------+------+------+-------+------+------+-------+------+------+

3 rows IN SET (2.27 sec)

MariaDB [test]SELECT 
FROM test t1test t2test t3 
WHERE t1.fd_pk=t2.fd_pk AND t2.fd_pk=t3.fd_pk 
ORDER BY t3.fd2 DESCt2.fd2 DESCt1.fd2 DESC 
LIMIT ROWS MATCHED 3;
+
-------+------+------+-------+------+------+-------+------+------+

fd_pk fd1  fd2  fd_pk fd1  fd2  fd_pk fd1  fd2  |
+
-------+------+------+-------+------+------+-------+------+------+

|     |    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
+
-------+------+------+-------+------+------+-------+------+------+

3 rows IN SET (0.01 sec)  

Both query of above example have "ORDER BY t3.fd2 DESC, t2.fd2 DESC, t1.fd2" clause and same WHERE conditions.
First query use "LIMIT ROWS MATCHED 3" on the other hand second query use just "LIMIT 3".
"LIMIT 3" is processed after sort but "LIMIT ROWS MATCHED 3" is processed before sort operation. So first query need to sort all joined rows, but second query only sort 3 rows.
(Of course, the result set of second query is not what you want, so be cautious)

And LIMIT ROWS MATCHED syntax can be used on outer join query.

MariaDB [test]>  SELECT 
FROM test t1 
  INNER JOIN test t2 ON t2.fd_pk=t1.fd_pk 
  LEFT JOIN test t3 ON t3.fd_pk=t1.fd_pk-
ORDER BY t3.fd2t2.fd2t1.fd2 
LIMIT ROWS MATCHED 3;
+
-------+------+------+-------+------+------+-------+------+------+

fd_pk fd1  fd2  fd_pk fd1  fd2  fd_pk fd1  fd2  |
+
-------+------+------+-------+------+------+-------+------+------+

|     |    |    |     |    |    |  NULL | NULL | NULL |
|     
|    |    |     |    |    |     |    |    |
|     
|    |    |     |    |    |     |    |    |
+
-------+------+------+-------+------+------+-------+------+------+

3 rows IN SET (0.02 sec)  




* Performance (Appendded)

I ran a simple row counting test. 
LIMIT ROWS MATCHED faster about 20~30% than SELECT COUNT(*) .. FROM (SELECT .. LIMIT).

-- // -----------------------------------------------------------------
-- // Counting 10000 rows
-- // -----------------------------------------------------------------

MariaDB [test]SELECT COUNT(*) FROM test limit rows matched 10000;
+
----------+
COUNT(*) |
+
----------+
|    10000 |
+
----------+
1 row IN SET (0.05 sec)

MariaDB [test]SELECT COUNT(*) FROM (SELECT FROM test limit 10000x;
+
----------+
COUNT(*) |
+
----------+
|    10000 |
+
----------+
1 row IN SET (0.06 sec)

-- // -----------------------------------------------------------------
-- // Counting 30000 rows
-- // -----------------------------------------------------------------

MariaDB [test]SELECT COUNT(*) FROM test limit rows matched 30000;
+
----------+
COUNT(*) |
+
----------+
|    30000 |
+
----------+
1 row IN SET (0.14 sec)

MariaDB [test]SELECT COUNT(*) FROM (SELECT FROM test limit 30000x;
+
----------+
COUNT(*) |
+
----------+
|    30000 |
+
----------+
1 row IN SET (0.19 sec)

-- // -----------------------------------------------------------------
-- // Counting 80000 rows
-- // -----------------------------------------------------------------

MariaDB [test]SELECT COUNT(*) FROM test limit rows matched 80000;
+
----------+
COUNT(*) |
+
----------+
|    80000 |
+
----------+
1 row IN SET (0.36 sec)

MariaDB [test]SELECT COUNT(*) FROM (SELECT FROM test limit 80000x;
+
----------+
COUNT(*) |
+
----------+
|    80000 |
+
----------+
1 row IN SET (0.47 sec)



* Finally

written on top of this blog, the priamry purpose of LIMIT ROW MATCHED syntax is limiting rows during count.
In message service or some other services, usally we need to count rows matched where condition. 
But almost case we don't need to count all matched rows. and counting all matched rows need more cpu cycle and disk reads.
Yes, we can limit counting rows with sub-query on FROM clause, but this solution need memory or disk internal temporary table.
Even I saw some people use "SELECT 1 FROM ... LIMIT n" and counting rows on client side. This solution also need additional network bandwidth.

But we can do limited counting job with minimal computing power. Because LIMIT ROWS MATCHED features never use additional network bandwidth and internal temporary table.



* Limitations

1. "LIMIT ROWS MATCHED" syntax can not be used with "LIMIT ROWS EXAMINED" together.
2. "LIMIT ROWS MATCHED n" will push "LIMIT n" automatically. So MariaDB can not employ some semi-join optimizations when query contains "LIMIT ROWS MATCHED n" (This limitation is applied "LIMIT n" query too).


* Download

https://github.com/SunguckLee/MariaDB


** This feature may have some bug,  so please be cautious.