mysql innodb_flush_log_at_trx_commit翻译

知道innodb_flush_log_at_trx_commit的意思,但是对它取值0,1,2一直有点模糊不清。特地找了MySQL 5.1的refrence,自己翻译一下。虽然,也有官方的中文版翻译,但是不好意思,有点不相信它。

英文原文如下:
innodb_flush_log_at_trx_commit
Command-Line Format     –innodb_flush_log_at_trx_commit[=#]
Config-File Format     innodb_flush_log_at_trx_commit
Option Sets Variable     Yes, innodb_flush_log_at_trx_commit
Variable Name     innodb_flush_log_at_trx_commit
Variable Scope     Global
Dynamic Variable     Yes
Permitted Values
Type     numeric
Default     1
Valid Values     0, 1, 2

If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.

如果innodb_flush_log_at_trx_commit设置为0,log buffer将每秒一次地写入log file中,并且log file的flush(刷到磁盘)操作同时进行;但是,这种模式下,在事务提交的时候,不会有任何动作。如果innodb_flush_log_at_trx_commit设置为1(默认值),log buffer每次事务提交都会写入log file,并且,flush刷到磁盘中去。如果innodb_flush_log_at_trx_commit设置为2,log buffer在每次事务提交的时候都会写入log file,但是,flush(刷到磁盘)操作并不会同时进行。这种模式下,MySQL会每秒一次地去做flush(刷到磁盘)操作。注意:由于进程调度策略问题,这个“每秒一次的flush(刷到磁盘)操作”并不是保证100%的“每秒”。

The default value of 1 is the value required for ACID compliance. You can achieve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, then only an operating system crash or a power outage can erase the last second of transactions. However, InnoDB’s crash recovery is not affected and thus crash recovery does work regardless of the value.

默认值1是为了ACID (atomicity, consistency, isolation, durability)原子性,一致性,隔离性和持久化的考虑。如果你不把innodb_flush_log_at_trx_commit设置为1,你将获得更好的性能,但是,你在系统崩溃的情况,可能会丢失最多一秒钟的事务数据。当你把innodb_flush_log_at_trx_commit设置为0,mysqld进程的崩溃会导致上一秒钟所有事务数据的丢失。如果你把innodb_flush_log_at_trx_commit设置为2,只有在操作系统崩溃或者系统掉电的情况下,上一秒钟所有事务数据才可能丢失。(下面的这句话到底是针对innodb_flush_log_at_trx_commit为2说的,还是针对前面这一整段说的,我就搞不清楚了,下次问问编写这一段文档的MySQL的人去。感觉是针对整段的:就是说InnoDB的crash recovery会利用log file来恢复数据文件,跟innodb_flush_log_at_trx_commit的值没有关系,管你这个值怎么设置的,我从log file拿到多少数据,就恢复多少数据。)InnoDB的crash recovery崩溃恢复机制并不受这个值的影响,不管这个值设置为多少,crash recovery崩溃恢复机制都会工作。

For the greatest possible durability and consistency in a replication setup using InnoDB with transactions, use innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 in your master server my.cnf file.

为了在使用InnoDB事务的搭建复制环境中,达到最大的持久化和一致性,你需要在你的master主机的my.cnf中设置innodb_flush_log_at_trx_commit = 1并且设置sync_binlog = 1。

Caution

Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. Then the durability of transactions is not guaranteed even with the setting 1, and in the worst case a power outage can even corrupt the InnoDB database. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer. You can also try using the Unix command hdparm to disable the caching of disk writes in hardware caches, or use some other command specific to the hardware vendor.

注意:
很多操作系统和一些磁盘硬件系统并不会真正的做flush-to-disk刷新到磁盘的这个操作。他们即使并没有真正刷到磁盘也会告诉mysqld说flush刷新到磁盘的操作已经完成了。这样的话,即使innodb_flush_log_at_trx_commit设置为1,也不能保证事务的持久化,最糟的情况下,一个主机掉电,就有可能导致InnoDB数据库崩溃。你可以考虑在SCSI磁盘控制器里面或者磁盘本身中,使用带蓄电池后备电源的磁盘缓存disk cache,来提高文件刷新操作的速度,使得这个操作更加安全。你同样可以尝试使用Unix的hdparm命令来阻止硬件缓存hardware cache的写磁盘缓存操作,或者使用其他硬件提供商hardware vendor提供的命令来避免写磁盘缓存。

这里既然提到了sync_binlog就顺便把它也翻译一下。
sync_binlog
Command-Line Format     –sync-binlog=#
Config-File Format     sync_binlog
Option Sets Variable     Yes, sync_binlog
Variable Name     sync_binlog
Variable Scope     Global
Dynamic Variable     Yes
Permitted Values
Platform Bit Size     32
Type     numeric
Default     0
Range     0-4294967295
Permitted Values
Platform Bit Size     64
Type     numeric
Default     0
Range     0-18446744073709547520

If the value of this variable is greater than 0, the MySQL server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log. There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise. The default value of sync_binlog is 0, which does no synchronizing to disk — in this case, the server relies on the operating system to flush the binary log’s contents from to time as for any other file. A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the binary log. However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).

当sync_binlog变量设置为大于0的值时,MySQL在每次“sync_binlog”这么多次写二进制日志binary log时,会使用fdatasync()函数将它的写二进制日志binary log同步到磁盘中去。如果启用了autocommit,那么每一个语句statement就会有一次写操作;否则每个事务对应一个写操作。sync_binlog的默认值是0,这种模式下,MySQL不会同步到磁盘中去。这样的话,MySQL依赖操作系统来刷新二进制日志binary log,就像操作系统刷其他文件的机制一样。当sync_binlog变量设置为1是最安全的,因为在crash崩溃的情况下,你的二进制日志binary log只有可能丢失最多一个语句或者一个事务。但是,这也是最慢的一种方式(除非磁盘有使用带蓄电池后备电源的缓存cache,使得同步到磁盘的操作非常快)。

找了一下man fdatasync:
fdatasync() flushes all data buffers of a file to disk (before the system call returns).  It resembles fsync() but is not required to update the metadata such as access time.
fdatasync() (在系统调用system call返回前)将文件中所有的数据缓存区data buffers都flush刷到磁盘中去。它类似于fsync()函数,但是它不会更新元数据metadata:比如最后访问时间等。

may you success.

6 Comments

  1. whitepoplar说道:

    2是性能最好的,因为不需要每秒都刷,当然前提是服务器有后备电源
    1会在大并发的事务里面造成一些性能损耗

  2. Chanel bags说道:

    Very nice details , I ascertained your web site on google and read a couple of of the other posts. I just added you to my Google News Reader. Keep on up the great mathematical function Look forward to reading extra from you in the future….

  3. pickup.lichun说道:

    发现我们几台机器solaris跑的比较郁闷,最后还是发现是这两个参数都设置为1的结果。
    其实理解innodb_flush_log_at_trx_commit是比较麻烦的,关键是要理解:log buffer将每秒一次地写入log file中,并且log file的flush(刷到磁盘)。有两个缓存需要刷。

  4. […] 文章来源:http://hatemysql.com/2010/05/12/mysql-innodb_flush_log_at_trx_commit%E7%BF%BB%E8%AF%91/ […]

  5. […] 文章来源:http://hatemysql.com/2010/05/12/mysql-innodb_flush_log_at_trx_commit%E7%BF%BB%E8%AF%91/ […]

Leave a Reply