MAC OS X, Linux, Windows and other IT Tips and Tricks

10 Oct 13 Rebuild MySQL replication slave LIVE from master

Problem:
You have 2 MySQL servers in the configuration Master ==>> Slave and the slave has gone badly out of sync with the master. Normally this is no real problem if you can lock the master tables, make a dump of the master's databases and import it into the slave. But the real problem here is that you cannot afford to stop the applications writing to the master's MySQL databases while you rebuild the slave and bring it back in sync with the master.

Possible solution: Rebuild the slave LIVE from the master's databases without disturbing the master at all, and still be in sync at the end. You might think that while mysqldump runs, new data will be written to the master's databases that the dump might miss. Or that after mysqldump has finished, new data written to the master would leave the slave unable to catch up. Well, with the right combination of options given to mysqldump and the way the slave is restarted, it all gets solved in this manner:
– Make a list of databases to rebuild, to be fed to mysqldump, containing only the DBs that need it (not the mysql or information_schema databases).
– The database dump starts and concurrently records what the master is still receiving from applications.
– At the end of the dump it also dumps this data received in the meantime.
– Then it immediately flushes its binlog. This way the new data (not recorded in the dump file) gets written to the new binlog file.
– The dump file gets imported into the slave MySQL server, deleting each database before importing the new data.
– At the end of the import, before we start the replication process, we instruct the slave MySQL server to point itself at position 4 (the normal start position of a master binlog) of the continuously growing master binlog.
– Then we instruct the slave to start replicating as a slave.
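The pointer step above boils down to issuing a CHANGE MASTER statement against the slave. Building that statement can be checked offline with plain shell (the log file name here is just a sample, not taken from a real server):

```shell
#!/bin/sh
# Build the CHANGE MASTER statement that gets piped into the slave.
# "mysql-bin.000001" is a sample name; in practice the real name comes
# from SHOW MASTER STATUS on the master.
masterlogfile="mysql-bin.000001"
stmt="CHANGE MASTER TO master_log_file='$masterlogfile',master_log_pos=4;"
echo "$stmt"
```

In the script further below, this statement is simply echoed into the slave's mysql client.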

No data is lost, the master is never disturbed while doing this, and the slave ends up in sync with the master again. In order to do all this without making mistakes, I created a script which does the work while I'm having a cup of coffee, and maybe two.

Important note: The configuration 'master replicated to slave' is not a very stable one when it comes to keeping the replication in sync. If the slave goes down and restarts, all is fine, but if the master goes down and restarts, the slave is very often no longer in sync.
My experience is that, when only 2 servers (master and slave) are involved in a replication system, the following recommendation offers a much more stable replication setup:
Do 'loop replication' between them.
In other words, each server is both a master and a slave of the other.
It is explained in detail in this article: http://tipstricks.itmatrix.eu/?p=1320

Note 2:
The script below uses 2 methods of dumping and importing the data. One uses the 'pipe' method, where no temporary dump file is created: the dump data is immediately fed into the slave via the pipe. This method has the advantage of being fast and not requiring a large temporary dump file. The disadvantage is that, with some very big databases, the process often broke. I didn't try it too many times with big databases (2-12 GB) since one waits a long time to see a failure. For this reason I implemented the temporary dump file method. In this method the dump file is created locally on the master, transferred to the slave at the same path, and then imported into the slave MySQL server. If this 'dump file' method is used, you will need to run this script on the master.
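The difference between the two methods can be sketched with ordinary shell commands, using `tr` as a harmless stand-in for the mysqldump/mysql pair:

```shell
#!/bin/sh
# Pipe method: the producer's output is fed straight into the consumer;
# nothing is written to disk in between.
piped=$(echo "master data" | tr 'a-z' 'A-Z')
echo "$piped"

# File method: write a temporary dump file first, then import from it.
TMPFILE=$(mktemp)
echo "master data" > "$TMPFILE"
filed=$(tr 'a-z' 'A-Z' < "$TMPFILE")
rm -f "$TMPFILE"
echo "$filed"
```

Both paths produce the same result; the pipe saves the disk space but ties the two ends together, which is where the breakage with very big databases showed up.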

Note 3:
In the following script the databases 'information_schema' and 'mysql' are excluded from being rebuilt. In case one of them is corrupt, remove it from the EXCEPTIONS list at the beginning of the script. See the original information on mysqldump at: https://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
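The exclusion logic can be tried out on its own with plain shell; the database names below are made up for illustration:

```shell
#!/bin/sh
# Build a space-separated list of databases to dump, skipping the
# EXCEPTIONS, as the loop in the script does (sample database names).
EXCEPTIONS="information_schema mysql test"
databases=""
for DB in information_schema mysql shop blog; do
    # -w matches whole words, so e.g. "test" will not also swallow "test2"
    if ! echo "$EXCEPTIONS" | grep -qw "$DB"; then
        if [ -z "$databases" ]; then
            databases=$DB
        else
            databases="$databases $DB"
        fi
    fi
done
echo "$databases"
```

With the sample names above, only `shop blog` survives the filter and gets handed to mysqldump.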

Adapt the temporary dump file path in the script if needed.

So here is the bash script that does it all:
#!/bin/bash
# Name: MySQL_REBUILD_LIVE_SLAVE_from_MASTER
# Purpose: Rebuilds live all Databases of a replication SLAVE based on a good running MASTER
# This rebuilt action is done without stopping the MASTER MySQL server or even locking the tables
# or the destination server (SLAVE)
# Syntax: MySQL_REBUILD_LIVE_SLAVE_from_MASTER Master:Port Slave:Port {F|P}
# eg. MySQL_REBUILD_LIVE_SLAVE_from_MASTER db1:3306 www1:3306 P
# {F|P} - Dump method F) Uses a full temporary dump file method P) Uses the pipe method
# Note: In some situations where the databases are very big (GB) the Pipe (P) method might not work well
# In these cases it is recommended to use the File (F) method
# Changes: 05 Dec 2008 First implementation of script
# 10 Oct 2013 Added the File(F) transfer method, added --skip-lock-tables to the dump command, added the manual entry of root PW
# Notes 1: In the case of using the Dump file method, the file will be transferred via rsync to the slave for importing.
# Therefore the SSH key of the user running the script must already be installed on the slave before running the script
#-----------------------------------------------------------------------------------
# Some constants:
DUMPFILE="/tmp/mydump.sql"
# Resync the databases except the following Databases
EXCEPTIONS="information_schema mysql test"
#
# Functions
# Syntax: testhost addr port. Returns: hostOK=1
testhost () {
hostOK=0
if (nmap -n --host_timeout 1600 --max_rtt_timeout 1600 -p $2 -P0 $1 2>/dev/null | grep -q "open" &>/dev/null); then
hostOK=1
fi
};
#
usage () {
echo "ERROR: Somethig is wrong with the given arguments"
echo "Syntax: MySQL_REBUILD_LIVE_SLAVE_from_MASTER Master:Port Slave:port {F|P}"
echo " eg. MySQL_REBUILD_LIVE_SLAVE_from_MASTER master1:3306 slave1:3306 P"
exit 1
}
#
# Check the command syntax
if [ $# -ne 3 ]; then
usage
fi
#
# Get the mysql root password
read -s -p "Please enter the MySQL root password: " rootpw
#
#- Check the hosts info validity
if ! (echo $1 | grep -q ':'); then
echo "ERROR: The first parameter (master) must be the combination 'host:port'"
exit 3
fi
#
if ! (echo $2 | grep -q ':'); then
echo "ERROR: The second parameter (slave) must be the combination 'host:port'"
exit 4
fi
#
method=$3
#
# Check the hosts connectivity of master host
Mhost=$(echo $1 | cut -d: -f1)
Mport=$(echo $1 | cut -d: -f2)
#
testhost $Mhost $Mport
if [ $hostOK = "0" ]; then
echo "ERROR: The master $Mhost:$Mport does not respond"
exit 5
fi
#
# Check the hosts connectivity of slave host
#
Shost=$(echo $2 | cut -d: -f1)
Sport=$(echo $2 | cut -d: -f2)
#
testhost $Shost $Sport
if [ $hostOK = "0" ]; then
echo "ERROR: The slave $Shost:$Sport does not respond"
exit 6
fi
#
# Stop and reset the slave
echo "STOP SLAVE; RESET SLAVE;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw
#
databases=""
for DB in $(echo "show databases;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw | grep -v Database) ; do
# Only delete/add the databases that are not in the Exceptions list
if ! (echo $EXCEPTIONS | grep -qw "$DB"); then
# here I was deleting the databases one by one before recreating them in slave
# I replaced this by the option --add-drop-database in mysqldump
#echo "Deleting database $DB on Slave $Shost:$Sport"
#echo "DROP DATABASE $DB;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw
# Corrected. Thanks to Dorian Kavaja
if [ "$databases" == "" ]; then
databases=$DB
else
databases="$databases $DB"
fi
fi
done
# The following lines are thanks to Dorian Kavaja,
# because sometimes the log file on the master was changed before the dump was finished
echo "FLUSH LOGS;" | mysql --host=$Mhost --port=$Mport -u root --password=$rootpw
echo "RESET MASTER;" | mysql --host=$Mhost --port=$Mport -u root --password=$rootpw
#
# Now do the transfer
# Added "--routines --triggers" to the mysqldump command, so triggers and functions are included in the dump.
# Thanks to Dorian Kavaja
case $method in
P) # Transfer all databases from master to slave directly using a pipe(P)
echo "Transfering the all databases from master $Mhost:$Mport into slave $Shost:$Sport directly"
mysqldump -h $Mhost --port=$Mport -u root --password=$rootpw \
--single-transaction --flush-logs --master-data=2 --skip-lock-tables \
--add-drop-database --delete-master-logs --hex-blob --routines --triggers --databases $databases \
| mysql -h $Shost --port=$Sport -u root --password=$rootpw
;;
#
F) # Transfer the databases using a dump file
echo "Dumping the all databases from master $Mhost:$Mport into file $DUMPFILE"
mysqldump -h $Mhost --port=$Mport -u root --password=$rootpw \
--single-transaction --flush-logs --master-data=2 --skip-lock-tables \
--add-drop-database --delete-master-logs --hex-blob --routines --triggers --databases $databases > $DUMPFILE
#
echo "Transferring the dump file $DUMPFILE from Master $Mhost to slave $Shost via compressed rsync"
rsync -vz $DUMPFILE $Shost:$DUMPFILE
echo "Importing the dump file ($DUMPFILE) into slave MySQL server $Shost"
ssh $Shost "mysql -h $Shost --port=$Sport -u root --password=$rootpw < $DUMPFILE"
;;
#
*) usage ;;
esac
#
# Find out the master binlog file name
masterlogfile=$(echo "SHOW MASTER STATUS\G;" | mysql --host=$Mhost --port=$Mport -u root --password=$rootpw | grep "File:" | cut -d: -f2 | cut -d" " -f2)
#
# Sync the slave with master binlog position 4
echo "CHANGE MASTER TO master_log_file='$masterlogfile',master_log_pos=4;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw
#
# Start the replication on slave
echo "START SLAVE;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw
sleep 3
#
# Show slave status to see if all is in sync
echo "SHOW SLAVE STATUS \G;" | mysql -h $Shost --port=$Sport -u root --password=$rootpw

Extra Note: In order to avoid a stop of replication when a duplicate-entries error occurs, it was suggested to add the following configuration in /etc/mysql/my.cnf
slave-skip-errors=1045,126,1062,1580
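For context, that line belongs in the [mysqld] section of the slave's configuration file; a minimal sketch (error codes as suggested above):

```ini
[mysqld]
# skip these replication errors instead of halting the slave
slave-skip-errors=1045,126,1062,1580
```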

Happy replication resyncing.

Reader's Comments

  1.    

    Thanks for this useful script!

    Everything is very clear, but I don’t understand why you set the master_log_pos=4, could you explain this?

    •    

      I set the start pointer to position 4 because, as I read on the Internet (can't remember where 🙁), the start position of a brand new master binary log is position 4 and not 0 or 1.

  2.    

    First of all thank you for this awesome script. I have large DB with high writes and this script was very useful because it allowed me to re-sync the replication without downtime from the master. But some things were not working on Ubuntu so please allow me some modifications and maybe some improvements.

    Line 43, changed to:
    read -s -p "Please enter the MySQL root password: " rootpw

    Before, it was not working; this way it also hides the password.

    On line 98 I’ve added these:
    echo "FLUSH LOGS;" | mysql --host=$Mhost --port=$Mport -u root --password=$rootpw
    echo "RESET MASTER;" | mysql --host=$Mhost --port=$Mport -u root --password=$rootpw

    because sometimes the log file on the master was changed before the dump was finished

    Added "--routines --triggers" to the mysqldump command, so triggers and functions were included in the dump.

    On line 90, where the databases list is populated, it becomes like:
    if [ "$databases" == "" ]; then
    databases=$DB
    else
    databases="$databases $DB"
    fi

    The comma-separated list is not supported by mysqldump, and the string comparison was not working.

    Also, I was inclined to retrieve the "masterlogfile" before the whole duplication process, as the log file may still change before the duplication completes, but this is just a safety measure.

    •    

      Hi Dorian,
      thanks so much for the tips and for extending the effectiveness of the script. I made the changes but didn't test them myself; I rely on your judgement since you already did the testing on your own system. I like it when people add to posts like this.

      •    

        One more note, since I already used the modified version on our server: people should expect replication errors after the sync is finished. The process took us about an hour, and the dump already contained some of the inserts/updates which are later executed again from the binlog. Just do some error skipping and at the end you have a synced replication.

        Important note: If there are tables without primary keys, they might end with duplicated rows. This was not our case, but just to be clear as this might have consequences on the data integrity.

  3.    

    You might consider adding MySQL error #1242 to the list of skipped errors. In my case it was causing replication to fail, but the original query on the master would have failed anyway. Keep in mind that MySQL will write to the binlog all statements that *could* have updated data.

  4.    

    Wow. this script has a LOT of syntax issues.

