Understanding MySQL’s “Binlog”. The mechanism of Binlog | by Dwen | Jun, 2022

The mechanism of Binlog


This article explains how MySQL writes its binary log (binlog), covers the key configuration parameters that affect IO performance, and shows how to use the binlog to restore data.

The writing logic of the binlog is not complicated. While a transaction executes, its log events are first written to the binlog cache; when the transaction commits, the contents of the binlog cache are written to the binlog file.


Binlog cache

The binlog cache temporarily holds binlog data while a transaction executes. Each thread has its own binlog cache, independent of the others.

If binlog logging is enabled, MySQL allocates a binary log cache per client.

If you frequently run large transactions, you can increase this cache size for better performance. The size is configured through binlog_cache_size, which defaults to 32768 bytes.
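One way to judge whether the cache is too small is to compare the server status counters Binlog_cache_use and Binlog_cache_disk_use. The sketch below is only an illustration: it assumes the two counters have already been fetched (for example with SHOW GLOBAL STATUS) into a dictionary, and that Binlog_cache_use counts every transaction that went through the cache. The helper name and the sample numbers are made up.

```python
def binlog_cache_disk_ratio(status: dict) -> float:
    """Fraction of transactions whose binlog data spilled to a temporary file.

    `status` is assumed to map status-variable names to their values,
    as they would be returned by SHOW GLOBAL STATUS.
    """
    total = int(status.get("Binlog_cache_use", 0))
    spilled = int(status.get("Binlog_cache_disk_use", 0))
    return spilled / total if total else 0.0


# Hypothetical counter values, for illustration only.
sample = {"Binlog_cache_use": "2000", "Binlog_cache_disk_use": "150"}
ratio = binlog_cache_disk_ratio(sample)
print(f"{ratio:.1%} of transactions used a temporary file")
```

A ratio that stays well above zero suggests that many transactions do not fit in the cache, which is the case where raising binlog_cache_size may help.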

This parameter controls how much memory the binlog cache may occupy in a single thread.

If the binlog cache has enough space, then when the transaction commits, its contents are written to the binlog file and the cache is emptied.

Because a transaction's binlog content must be written in one piece at commit time, no matter how large it is, content that does not fit in the binlog cache is temporarily stored on disk and then written to the binlog file at commit.
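The cache behaviour described above can be sketched as a toy model. This is not MySQL's implementation: the class name BinlogCache is invented for illustration, and for simplicity the whole buffer moves to the temporary file once the limit is exceeded.

```python
import io
import tempfile


class BinlogCache:
    """Toy per-thread cache: memory up to `limit` bytes, then a temp file."""

    def __init__(self, limit: int = 32768):
        self.limit = limit
        self.buffer = bytearray()
        self.spill = None  # temporary file, created only when needed

    def append(self, event: bytes) -> None:
        # Switch to a temporary file the first time the limit is exceeded.
        if self.spill is None and len(self.buffer) + len(event) > self.limit:
            self.spill = tempfile.TemporaryFile()
            self.spill.write(self.buffer)
            self.buffer.clear()
        if self.spill is not None:
            self.spill.write(event)
        else:
            self.buffer += event

    def commit(self, binlog_file) -> int:
        """Write the whole transaction to the binlog at once, then reset."""
        if self.spill is not None:
            self.spill.seek(0)
            data = self.spill.read()
            self.spill.close()
            self.spill = None
        else:
            data = bytes(self.buffer)
        self.buffer.clear()
        binlog_file.write(data)
        return len(data)


binlog = io.BytesIO()           # stands in for the binlog file
cache = BinlogCache(limit=8)    # tiny limit so the spill path is exercised
cache.append(b"abcd")           # fits in memory
cache.append(b"efghij")         # exceeds the limit, goes to the temp file
written = cache.commit(binlog)  # everything reaches the binlog at commit
```

Whether the transaction stayed in memory or spilled to disk, the binlog file receives it as one contiguous write at commit.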


The binlog write described above actually consists of two steps:

  • Write: The log is first written to the binlog file in the file system's page cache. The page cache is memory, so this step does not consume disk IOPS.
  • Fsync: The log is then persisted from the page cache to disk by an fsync call. This step consumes disk IOPS.
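The difference between the two steps can be seen with the operating system calls themselves. This is a minimal sketch using Python's os module on a throwaway file; the file name binlog.demo and the helper append_event are hypothetical and unrelated to MySQL's real file format.

```python
import os
import tempfile


def append_event(fd: int, event: bytes, do_fsync: bool) -> None:
    os.write(fd, event)   # write: data lands in the OS page cache
    if do_fsync:
        os.fsync(fd)      # fsync: ask the OS to persist it to the device


path = os.path.join(tempfile.mkdtemp(), "binlog.demo")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
try:
    append_event(fd, b"event-1\n", do_fsync=False)  # written, not yet durable
    append_event(fd, b"event-2\n", do_fsync=True)   # both events now flushed
finally:
    os.close(fd)
```

After the first call, a reader on the same machine already sees event-1, because reads are served from the page cache. Only the fsync in the second call asks the operating system to make the data survive a power loss.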

When to write and fsync?

The timing of write and fsync is controlled by the sync_binlog parameter:

sync_binlog=0: Each committed transaction is only written. MySQL never calls fsync itself and leaves flushing to the operating system, so binlog persistence is not guaranteed (not recommended).

sync_binlog=1: fsync is executed every time a transaction is committed.

sync_binlog=N (N > 1): Every committed transaction is written, but fsync is executed only once every N commits.

Generally speaking, when IO is the bottleneck, this parameter is set to a value between 100 and 1000 to reduce the number of fsync calls and improve throughput. The disadvantage is that if the machine crashes before fsync is executed, the binlog of up to the most recent N transactions is lost. Set it to 1 when data safety matters most.
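The trade-off can be made concrete with a little arithmetic. The function below is a simplified sketch: fsync_schedule is a hypothetical name, and it treats each commit as its own commit group, whereas a real server may group several commits together.

```python
from typing import Tuple


def fsync_schedule(sync_binlog: int, commits: int) -> Tuple[int, int]:
    """Return (fsync calls issued by the server, commits not yet fsynced)."""
    if sync_binlog < 0 or commits < 0:
        raise ValueError("sync_binlog and commits must be non-negative")
    if sync_binlog == 0:
        # The server never calls fsync; the OS flushes when it chooses.
        return 0, commits
    fsyncs = commits // sync_binlog
    unsynced = commits % sync_binlog
    return fsyncs, unsynced


for setting in (0, 1, 100):
    fsyncs, at_risk = fsync_schedule(setting, commits=250)
    print(f"sync_binlog={setting}: {fsyncs} fsyncs, {at_risk} commits at risk")
```

With sync_binlog=100 and 250 commits, the server issues 2 fsync calls instead of 250, but the last 50 commits exist only in the page cache until the next fsync.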

The official documentation describes it as follows:

“By default, the binary log is synchronized to disk at each write (sync_binlog=1). If sync_binlog was not enabled, and the operating system or machine (not only the MySQL server) crashed, there is a chance that the last statements of the binary log could be lost.

To prevent this, enable the sync_binlog system variable to synchronize the binary log to disk after every N commit groups. See Section 5.1.8, “Server System Variables”. The safest value for sync_binlog is 1 (the default), but this is also the slowest.”

What to do if binlog content is missing?

We already know that data can be recovered through the redo log, but what if binlog content is missing?

According to the description of the sync_binlog parameter, setting it to a value N greater than 1 means the binlog is persisted to disk only once every N transactions.

Meanwhile, if the redo log parameter innodb_flush_log_at_trx_commit is set to 1, the redo log is fsynced on every transaction commit.

This creates the possibility that, if the host machine or operating system crashes, the redo log has already been persisted while the binlog of the latest transactions is still in the page cache and has not yet been fsynced.

When the service is restarted after the crash and the data is recovered from the redo log, the binlog will be shorter than the data requires.
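The scenario can be walked through with a small simulation. It is a worst-case toy model, not a description of MySQL's actual recovery logic: it assumes the redo log is fsynced at every commit, the binlog every sync_binlog commits, and that a host crash discards everything still in the page cache. The function name is invented for illustration.

```python
def simulate_host_crash(txn_ids, sync_binlog):
    """Return (redo_on_disk, binlog_on_disk, missing_from_binlog)."""
    redo_on_disk = []
    binlog_on_disk = []
    binlog_page_cache = []
    for txn in txn_ids:
        redo_on_disk.append(txn)       # redo log: fsync at every commit
        binlog_page_cache.append(txn)  # binlog: write only
        if sync_binlog > 0 and len(binlog_page_cache) == sync_binlog:
            binlog_on_disk.extend(binlog_page_cache)  # binlog fsync
            binlog_page_cache.clear()
    # Host crash: whatever is still in the page cache is lost.
    persisted = set(binlog_on_disk)
    missing = [t for t in redo_on_disk if t not in persisted]
    return redo_on_disk, binlog_on_disk, missing


redo, binlog, missing = simulate_host_crash(list(range(1, 8)), sync_binlog=3)
print("transactions missing from the binlog:", missing)
```

With seven commits and sync_binlog=3, the binlog was last fsynced after the sixth commit, so the seventh transaction is in the data but not in the binlog. With sync_binlog=1 the list of missing transactions is always empty in this model.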

What happens when the data is complete but binlog content is missing? At first, I was puzzled. Finally, I found the answer in the official MySQL 5.7 documentation.

The official documentation describes it as follows:

“If the MySQL server discovers at crash recovery that the binary log is shorter than it should have been, it lacks at least one successfully committed InnoDB transaction. This should not happen if sync_binlog=1 and the disk/file system do an actual sync when they are requested to (some do not), so the server prints an error message The binary log *file_name* is shorter than its expected size.

In this case, this binary log is not correct and replication should be restarted from a fresh snapshot of the source’s data.”

According to the official description, this will not happen if sync_binlog is set to 1 and the disk or file system really performs the sync. If the binlog is shorter than expected, the server prints the following error message:

The binary log xxx is shorter than its expected size.

So, this situation requires human intervention: the binary log can no longer be trusted, and replication must be restarted from a fresh snapshot of the source's data.

How to use binlog to restore historical data?

The binlog records all logical operations and is written in append-only fashion: unlike the redo log, it does not overwrite earlier log content.

The data in the binlog is reliable: it contains only data from committed transactions, which is guaranteed by the two-phase commit between the redo log and the binlog.

As we all know, the binlog lets us restore data to any point in time. How is this done?

There is one prerequisite: we must have a mechanism for regularly making full backups, for example every half month, every week, or every day.

Hypothetical scenario: table data was deleted by mistake at noon today and needs to be retrieved. What should we do?

  • First, find the most recent full backup, for example the one taken at 1:00 a.m. yesterday, and restore it to a temporary database.
  • Then, take the binlog files recorded since that backup at 1:00 a.m. yesterday and replay them in order, stopping just before the accidental deletion at 12:00 noon today.

In this way, the temporary database reaches the state just before the data was accidentally deleted, and the table data can then be copied from the temporary database back into the production database as required.
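The logic of this procedure can be sketched in a few lines. The model below is deliberately simplified and entirely hypothetical: the "database" is a dictionary, and binlog events are (timestamp, operation, key, value) tuples rather than MySQL's real event format. In practice the replay is usually done with the mysqlbinlog utility, whose --start-datetime and --stop-datetime options select the time window.

```python
from datetime import datetime


def restore_to_point(snapshot, events, backup_time, stop_time):
    """Rebuild state from a full backup plus the events in [backup_time, stop_time)."""
    db = dict(snapshot)  # restore the full backup into a temporary "database"
    for ts, op, key, value in sorted(events, key=lambda e: e[0]):
        if ts < backup_time or ts >= stop_time:
            continue     # already in the backup, or at/after the mistake
        if op == "set":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
    return db


# Made-up dates and rows, for illustration only.
backup_time = datetime(2022, 6, 1, 1, 0)    # full backup, 1:00 a.m. yesterday
mistake_time = datetime(2022, 6, 2, 12, 0)  # accidental delete, noon today
snapshot = {"order:1": "paid"}
events = [
    (datetime(2022, 6, 1, 9, 30), "set", "order:2", "new"),
    (datetime(2022, 6, 2, 8, 15), "set", "order:2", "paid"),
    (mistake_time, "delete", "order:1", None),
    (mistake_time, "delete", "order:2", None),
]
recovered = restore_to_point(snapshot, events, backup_time, mistake_time)
```

Stopping strictly before the timestamp of the mistaken delete leaves both rows in the recovered state, which can then be copied back to the production database.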
