Saturday, June 21, 2014

EXT3 Huge File Remove

People sometimes complain about the overhead of dropping an InnoDB table.
Dropping a table requires scanning the buffer pool (older versions even scan it twice).

But I think the real overhead comes from the file system.
So you may already be using XFS or EXT4 to avoid this.

I ran a simple test: removing a 50GB file (the unlink system call on Linux).



< Linux iostat during file remove (unlink on ext3 filesystem) >

My server has only 6 SAS disks (4 in RAID 1+0 for the MySQL data directory, the remaining 2 for the MySQL logs) and a RAID controller with 1GB of cache.

In my test, removing the 50GB file took 25 seconds. During that time there was a huge number of disk read IOPS, as you can see in the chart above, and disk utilization stayed at 100% for the whole duration of the unlink.

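As you can imagine, if you drop a table with a huge data file, the MySQL server can't handle user requests until the file is completely removed. This is actually not so strange; it comes from the EXT3 file system architecture. EXT3 tracks a file's blocks through indirect block maps, so unlinking a huge file means reading all of that mapping metadata from disk before the blocks can be freed.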

On the other hand, EXT4 has several features that reduce file fragmentation.


  • Multiblock allocator
    When ext3 appends to a file, it calls the block allocator, once for each block. Consequently, if there are multiple concurrent writers, files can easily become fragmented on disk. However, ext4 uses delayed allocation which allows it to buffer data and allocate groups of blocks. Consequently the multiblock allocator can make better choices about allocating files contiguously on disk. The multiblock allocator can also be used when files are opened in O_DIRECT mode. This feature does not affect the disk format.



  • Delayed allocation
    ext4 uses a performance technique called allocate-on-flush also known as delayed allocation. That is, ext4 delays block allocation until it writes data to disk. (In contrast, some file systems allocate blocks before writing data to disk.) Delayed allocation improves performance and reduces fragmentation by using the actual file size to improve block allocation.


I quoted these from the Wikipedia article on ext4 (http://en.wikipedia.org/wiki/Ext4).

XFS has similar optimizations, so XFS and EXT4 end up with fewer fragmented blocks than EXT3. MySQL server performance also sometimes drops when the server removes a binary log file internally (the default binary log file size is 1GB, so we changed it to 100MB).
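
The post doesn't say how the binary log size was changed. A minimal sketch, assuming it was done through the standard max_binlog_size server variable in my.cnf:

```ini
[mysqld]
# Smaller binary logs mean each purge unlinks a smaller file (the default is 1GB).
max_binlog_size = 100M
```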

If you can't change the file system to EXT4 or XFS, you can use a Linux hard link plus truncate instead.

< Linux iostat during file remove (truncating the file by 1GB every 2 seconds on ext3 filesystem) >

The test program simply truncates the last 1GB of the 50GB file and sleeps for 2 seconds, then truncates the last 1GB of the remaining 49GB, and so on.

Disk utilization is really stable, and this time user requests are never blocked.
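
The truncate-and-sleep loop described above can be sketched in shell roughly like this. It is not the actual test program; the function name truncate_gradually and its three arguments are made up for illustration, and it assumes GNU coreutils (stat -c, truncate -s). The final rm of the emptied file is also my assumption about how the run ends.

```shell
# Hypothetical helper (not the author's test program): shrink FILE from its
# end by CHUNK bytes at a time, sleeping PAUSE seconds between steps.
truncate_gradually() {
    file=$1
    chunk=$2   # bytes to cut per step, e.g. 1073741824 for 1GB
    pause=$3   # seconds to sleep between steps, e.g. 2

    # A non-positive chunk would never shrink the file, so refuse it.
    [ "$chunk" -gt 0 ] || return 1

    size=$(stat -c %s "$file") || return 1
    while [ "$size" -gt 0 ]; do
        if [ "$size" -gt "$chunk" ]; then
            size=$((size - chunk))
        else
            size=0
        fi
        # Freeing only one chunk of blocks per step keeps each burst of I/O small.
        truncate -s "$size" "$file" || return 1
        if [ "$size" -gt 0 ]; then
            sleep "$pause"
        fi
    done
    # The file is empty now, so the final unlink has almost nothing left to free.
    rm -f -- "$file"
}
```

For the test in this post the call would look like `truncate_gradually /path/to/50GB_file 1073741824 2`. The chunk size and sleep time are the knobs to tune against what iostat shows on your own disks.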

InnoDB just calls the unlink system call when you drop a table, and unlink does not free the file's data while the file still has another hard link (see man unlink). So you can create another hard link to the huge ibd file before running drop table; InnoDB then removes only its own link.
After that, you can truncate the hard-linked file little by little.


shell> link /mysql_data/db1/huge_table.ibd   /mysql_data/huge_table.ibd.dropped
mysql > drop table huge_table;
shell> ## then ftruncate "huge_table.ibd.dropped" little by little
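
If you want to convince yourself of the unlink behavior before trying it on a real table, here is a small demo on throwaway temp files. The function name hardlink_demo and the file names are made up, and it assumes GNU stat (stat -c %h prints the hard link count).

```shell
# Illustrative demo on throwaway temp files; the names are made up.
hardlink_demo() {
    dir=$(mktemp -d) || return 1
    printf 'table data' > "$dir/t.ibd"

    # Second name for the same inode, like huge_table.ibd.dropped above.
    ln "$dir/t.ibd" "$dir/t.ibd.dropped"
    links_before=$(stat -c %h "$dir/t.ibd.dropped")

    # Stand-in for the unlink that drop table performs on the original name.
    rm "$dir/t.ibd"
    links_after=$(stat -c %h "$dir/t.ibd.dropped")

    # The data is still reachable through the remaining link.
    echo "$links_before $links_after $(cat "$dir/t.ibd.dropped")"
    rm -rf "$dir"
}
```

Running `hardlink_demo` prints `2 1 table data`: the link count goes from 2 to 1, and the contents survive the removal of the original name. Note that a hard link only works within one file system, so the ".dropped" link has to live on the same mount as the ibd file.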

I'm working on making this available as a built-in MySQL feature.
I'm still reading and modifying the MySQL code (not done yet).

https://github.com/SunguckLee/MariaDB/commit/87f05c2619c714007d40b9c07e151cb51ef6eca6
