2014년 7월 7일 월요일

MariaDB 10.0.12 Multi Threaded Slave

MariaDB 10.0 MTS

MariaDB 10.0 and MySQL 5.6 both have multi-threaded replication feature, but there's some differences.
Multi-threaded replication of MySQL 5.6 works based on schema(database) and MariaDB 10.0 works based on domain id of GTID.
MariaDB 10.0's multi-threaded replication based on GTID domain id is related with multi-source replication feature.
But we can control connection(session)'s domain id using gtid_domain_id system variable manually for slave multi-thread replication on single-master environment (Not multi-source replication).
Roughly, it is looks like MySQL 5.6 is automatic and MariaDB 10.0 is semi-automatic multi-threaded replication.

Both feature have some benefits as you think. But sometimes we want to allocate another slave sql thread for special purpose query like ALTER or heavy dml statement.
So MariaDB 10.0's semi-automatic multi-threaded replication is not so bad. On the other hand, sometimes schema based multi-threaded replication is useless because of the sharding. Usually sharded mysql instance has only one small database.

And MariaDB 10.0 has another multi-threaded replication feature. In MariaDB knowledge base, MariaDB 10.0 has two types of multi-threaded replication.
First thing is Out-of-ordered multi-threaded replication. it is GTID's domain id based replication written prior paragraph.
And second thing is In-ordered mutli-threaded replication. This type of multi-threaded replication is based on group commit of binary log on master side.
(In this blog, I will focus on second type of MTS based on group commit.)

Before going to multi-threaded replication on slave side, let's check MariaDB 10.0 group commit out first.

Binary Log Group Commit

This is not the first time MySQL binary log group commit is implemented. But this feature is removed in 5.0 because there's some issue and reborn in MySQL 5.5 and 5.6.
Whatever, MariaDB 10.0 group commit affect not only binary log flushing but also innodb transaction commit. 
So InnoDB transaction might get slow when you activate group commit parameters. But not so much if you control it properly.

MariaDB 10.0 supports two system parameters for binary log group commit.
binlog_commit_wait_usec
MariaDB will flush group of binary log events of multiple transactions requested in short time period. But there's no transactions committed at the exact same time. So MariaDB have to wait a little for waiting some transactions combined together
But MariaDB does not know the proper time to wait because it varies SLA and target performance. So MariaDB supports binlog_commit_wait_usec system variable so that DBA can control the time the oldest transaction can tolerate.
binlog_commit_wait_count
DBA can control how often group commit happen with not only time but also the count of transactions. It's transactions' count not binary log events' count.

We can observe how many transactions are committed together by decoding mysql binary log.

First create table "test" with 3 integer columns. And set binlog_commit_wait_usec and binlog_commit_wait_count system variables.
And open 4 connections and run the insert at the same time. You can run 4 insert statement using CSSHX like tool easily.

MariaDB [test]> set global binlog_commit_wait_usec=5000; /* 5 milli-seconds */
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> set global binlog_commit_wait_count=50;
Query OK, 0 rows affected (0.00 sec)

Client1 > insert into test values (1,1,1);
Client2 > insert into test values (2,2,2);
Client3 > insert into test values (3,3,3);
Client4 > insert into test values (4,4,4);

Each transaction will take more time(maximum 5 milli seconds in this case), and some trasnaction (lucky one) will take much less than 5 milli seconds.
On MariaDB 10.0, each transaction has "COMMIT ID" and a few transactions might have the same "COMMIT ID" if they committed within same group commit.
You can decode mysql binary log using mysqlbinlog or SHOW BINLOG EVENTS command and you can this "COMMIT ID" in the "BEGIN GTID ..." line. "COMMIT ID" is printed with "cid=".
Below sample is the result of above 4 clients' test.

MariaDB [test]> show binlog events in 'binlog.000015';
+-----------+-----+--------------+..+------------------------------------------------+
| Log_name  | Pos | Event_type   |..| Info                                           |
+-----------+-----+--------------+..+------------------------------------------------+
| binlog.00 |   4 | Format_desc  |..| Server ver: 10.0.11-MariaDB-log, Binlog ver: 4 |
| binlog.00 | 248 | Gtid_list    |..| [0-1-1213]                                     |
| binlog.00 | 287 | Binlog_check |..| binary-log.000014                              |
| binlog.00 | 327 | Binlog_check |..| binary-log.000015                              |
| binlog.00 | 367 | Gtid         |..| BEGIN GTID 0-1-1214 cid=1786                   |
| binlog.00 | 407 | Query        |..| use `test`; insert into test values (3,3,3)    |
| binlog.00 | 501 | Xid          |..| COMMIT /* xid=1786 */                          |
| binlog.00 | 528 | Gtid         |..| BEGIN GTID 0-1-1215 cid=1786                   |
| binlog.00 | 568 | Query        |..| use `test`; insert into test values (1,1,1)    |
| binlog.00 | 662 | Xid          |..| COMMIT /* xid=1788 */                          |
| binlog.00 | 689 | Gtid         |..| BEGIN GTID 0-1-1216 cid=1787                   |
| binlog.00 | 729 | Query        |..| use `test`; insert into test values (4,4,4)    |
| binlog.00 | 823 | Xid          |..| COMMIT /* xid=1787 */                          |
| binlog.00 | 850 | Gtid         |..| BEGIN GTID 0-1-1217 cid=1787                   |
| binlog.00 | 890 | Query        |..| use `test`; insert into test values (2,2,2)    |
| binlog.00 | 984 | Xid          |..| COMMIT /* xid=1789 */                          |
+-----------+-----+--------------+..+------------------------------------------------+

Binary log file (binlog.000015) has 4 transactions. (Each transaction start with "BEGIN GTID ...")
And all transactions have different GTID but same cid value "1786". This means all 4 transactions are (group) committed together.

MariaDB 10.0 Multi-Threaded Replication

MariaDB 10.0's In-ordered multi-threaded replication works based on binary log group commit.
Coordinator thread of slave side read all transactions which have same cid from relay log and distribute across multiple sql threads.
The reason slave can replay it parallel is all transactions within same "COMMIT ID" does not conflicted(changing same record). No confliction is certified on master MariaDB. (I think this is really cool idea. So I like MariaDB ^^).

So more transactions(group committed) on master more parallelism we can expect. 

Advantages for not grouped transactions

We can get some benefits if there's no group committed transactions on master.
MariaDB could run "commit" of each transaction parallel even though all transactions have different "COMMIT ID".
We can get a performance gain when "commit" itself is heavy like (sync_binlog=1 & log_slave_updates) or (innodb_flush_log_at_trx_commit이=1) configuration.


MTS Performance Test

Hardware spec


  • Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz * 4 with HyperThreading
  • 32G memory
  • 2 SAS + Raid controller (R-1) with 512MB cache


MySQL Configurations


  • innodb_buffer_pool_size = 20G
  • binlog_commit_wait_usec = 1000
  • binlog_commit_wait_count = 50      ## only for MTS env.
  • slave_parallel_threads = 10        ## only for MTS env.
  • slave_parallel_max_queued = 524288 ## 512KB


Sysbench Configurations


  • 20 ~ 50 clients
  • table rows : 10 million
  • run only update_nokey update statement


Test Result


  • 15k Update statement / second  




  • 25k Update statement / second



 In this test I ran 25k update statements(per second), there's no replication delay on MTS replication.
I did not test how many update statements(per second) can be replicated without delay, because it's depend on query characteristics and performance.
And with MTS configuration, Slave server takes a lot of CPU cycles. (But it's fair enough, because Coordinator and SQL threads have to synchronize their status every time.)
And this is not a big deal on stand-by slave as we usually use this replication topology.

  • Group commit vs Master performance

There's one more thing we have to consider, transactions commit performance will be down when binlog_commit_wait_usec and binlog_commit_wait_count is set greater than 0.
Because each transactions have to wait until group conditions are met unless binlog_commit_wait_usec and binlog_commit_wait_count are 0. (Intermittent performance drop, I did not want to tune this because I think this performance drop is nothing to do with group commit performance.)


I ran the test both of group-commit activated and deactivated MariaDB. As you can see, there's some performance regression.
But in this test, this transaction waiting time affected throughput greatly because I made only 50 client threads. If there's 25k update statement (/second) with 5000 client thread, performance difference would be close.
And If we can choose proper waiting time and group-commit count(binlog_commit_wait_usec and binlog_commit_wait_count) this performance gap would not be a problem.



With MTS slave, we can see multiple sql thread replay binary event parallel.

MariaDB [(none)]> show processlist;
+-----+-------------+..+---------+-------------------..-+-----------------------------------------...---+
| Id  | User        |..| Command | State             .. | Info                                    ...   |
+-----+-------------+..+---------+-------------------..-+-----------------------------------------...---+
|   3 | system user |..| Connect | Waiting for prior .. | COMMIT                                  ...   |
|   4 | system user |..| Connect | init              .. | COMMIT                                  ...   |
|   5 | system user |..| Connect | init              .. | UPDATE sbtest set c='491483753-416518378...74 |
|   6 | system user |..| Connect | init              .. | UPDATE sbtest set c='909812426-309814605...27 |
|   7 | system user |..| Connect | Waiting for prior .. | COMMIT                                  ...   |
|   8 | system user |..| Connect | Waiting for prior .. | COMMIT                                  ...   |
|   9 | system user |..| Connect | init              .. | UPDATE sbtest set c='112858395-197504108...38 |
|  10 | system user |..| Connect | freeing items     .. | NULL                                    ...   |
|  11 | system user |..| Connect | init              .. | UPDATE sbtest set c='55183839-759991643-...4- |
|  12 | system user |..| Connect | Waiting for prior .. | COMMIT                                  ...   |
|  14 | root        |..| Sleep   |                   .. | NULL                                    ...   |
|  18 | system user |..| Connect | Waiting for master.. | NULL                                    ...   |
|  19 | system user |..| Connect | Slave has read all.. | NULL                                    ...   |
| 155 | root        |..| Query   | init              .. | show processlist                        ...   |
+-----+-------------+..+---------+-------------------..-+-----------------------------------------...---+

댓글 없음:

댓글 쓰기