Postfix: Backup MX

In a mail server architecture we should always define a backup MX for every mail server. The reason is simple: it gives us redundancy and makes sure that no emails are returned with an error while our main mail server is having issues.

In the simplest scenario, let’s assume that we have only one mail server (mail.somedomain.com) and we’d like to set up a backup MX server for it. We can do this in a few simple steps:

Step 1: Backup postfix configuration

On the backup server we need to change a few Postfix settings in the main.cf file: add or change relay_domains, set maximal_queue_lifetime, extend smtpd_recipient_restrictions and create relay_recipient_maps:

relay_recipient_maps = hash:/etc/postfix/relay_recipients
maximal_queue_lifetime = 30d
smtpd_recipient_restrictions =
       [...]
       permit_mx_backup
relay_domains = $mydestination somedomain.com
permit_mx_backup_networks = 128.128.128.0/24 201.201.201.0/24

Now let’s go through this configuration:

  • relay_recipient_maps = hash:/etc/postfix/relay_recipients – this is optional, but I advise using it. It defines a lookup table with all valid recipients. If the backup server didn’t know the valid mailboxes, it would have to accept every email, including spam for non-existent addresses. Knowing the valid addresses, the backup server can reject mail for invalid recipients right away. This does not apply to environments that use catch-all mailboxes. A sample relay_recipients file is shown below. Remember to run postmap after every change to this file: postmap /etc/postfix/relay_recipients
  • maximal_queue_lifetime = 30d – the Postfix default is 5 days. This sets how long the backup server keeps trying to deliver queued emails to the main server, so it is the maximum downtime the main server can have before messages are bounced back to their original senders with an error.
  • relay_domains = $mydestination somedomain.com – this allows Postfix to relay emails for somedomain.com
  • permit_mx_backup – security: accepts mail for domains for which this server is listed as an MX host, see http://www.postfix.org/postconf.5.html#permit_mx_backup
  • permit_mx_backup_networks – security: restricts permit_mx_backup to domains whose primary MX hosts are in the listed networks, see http://www.postfix.org/postconf.5.html#permit_mx_backup_networks

And the sample relay_recipients file:

user1@somedomain.com   any_value
user2@somedomain.com   any_value
user3@somedomain.com   any_value
user4@somedomain.com   any_value
user5@somedomain.com   any_value

As you can see, the users’ addresses have to be replicated in the relay_recipients file on the backup MX server.
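
Keeping that file in sync by hand gets tedious, so it can be generated. Below is a minimal Python sketch, assuming the list of valid addresses is already available from somewhere (a database export, for example). The function name build_relay_recipients and the placeholder value "OK" are my own choices for illustration; as the sample file shows, the value column can be anything.

```python
def build_relay_recipients(addresses, value="OK"):
    """Return the text of a relay_recipients table: one 'address<TAB>value' line each.

    Addresses are stripped, lower-cased and de-duplicated (an assumption made
    here so that the same mailbox is not listed twice), then sorted.
    """
    cleaned = {a.strip().lower() for a in addresses if a.strip()}
    return "".join(f"{addr}\t{value}\n" for addr in sorted(cleaned))


# Hypothetical input list, reusing the addresses from the sample file.
print(build_relay_recipients(["user2@somedomain.com", "user1@somedomain.com"]), end="")
```

The output would be written to /etc/postfix/relay_recipients and followed by the postmap command mentioned above.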

Step 2: DNS configuration

With only one mail server, a single MX record in our DNS zone file is enough:

[user@server ~]# dig mx somedomain.com
;; ANSWER SECTION:
somedomain.com.		86400	IN	MX	10 mail.somedomain.com.

Here we see our only MX record, with preference 10, pointing to the A record mail.somedomain.com. To add a record for our backup MX server, we should first add a new A record, like:

mail2.somedomain.com.  	86400	IN	A	129.129.129.129

And then we can create a new MX record for the domain with lower priority (that is, a higher preference number):

somedomain.com.		86400	IN	MX	20 mail2.somedomain.com.
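
To make the priority rule concrete, here is a small Python sketch of the order in which a sending server tries the MX hosts: the lowest preference number goes first. The function name delivery_order is hypothetical, and the records are typed in by hand rather than fetched from DNS.

```python
def delivery_order(mx_records):
    """Sort (preference, host) pairs the way a sender tries them: lowest number first."""
    return [host for _preference, host in sorted(mx_records)]


# The two MX records from this step, entered by hand.
records = [(20, "mail2.somedomain.com."), (10, "mail.somedomain.com.")]
print(delivery_order(records))
# prints ['mail.somedomain.com.', 'mail2.somedomain.com.']
```

So the backup server only receives mail when the host with preference 10 cannot be reached.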

Step 3: Flushing messages

When the main MX server is down, the backup server accepts the messages and holds them until the main server is back. Postfix keeps these messages in its deferred queue, and the flush daemon keeps a record of which deferred messages belong to which destination. The flush daemon is woken up periodically, as set in /etc/postfix/master.cf:

flush     unix  n       -       n       1000?   0       flush

Here “1000?” is the wake-up time: the flush daemon is woken up every 1000 seconds. The question mark makes this conditional, so no wake-up events are sent before the service has been used for the first time.
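
Since the master.cf columns are easy to mix up, here is a short Python sketch that splits such a line into its eight fields (service, type, private, unpriv, chroot, wakeup, maxproc, command). The function name parse_master_line and the two extra keys it adds for the wake-up field are my own, not part of Postfix.

```python
FIELDS = ("service", "type", "private", "unpriv", "chroot",
          "wakeup", "maxproc", "command")


def parse_master_line(line):
    """Split one master.cf service line into a dict keyed by column name."""
    parts = line.split(None, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("expected 8 whitespace-separated fields")
    entry = dict(zip(FIELDS, parts))
    # Hypothetical helper keys: numeric wake-up time and the trailing '?' flag.
    number = entry["wakeup"].rstrip("?")
    entry["wakeup_seconds"] = int(number) if number.isdigit() else None  # '-' means default
    entry["wakeup_conditional"] = entry["wakeup"].endswith("?")
    return entry


entry = parse_master_line("flush     unix  n       -       n       1000?   0       flush")
print(entry["wakeup_seconds"], entry["wakeup_conditional"])
# prints 1000 True
```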

We can also set how often messages are flushed by the running flush daemon with the fast_flush_refresh_time parameter (12h by default). So every 12 hours, messages that haven’t had redelivery requested are kicked automatically.

When our main server is back, we could just flush all the messages manually:

postqueue -f

But this command flushes all the queued messages, which might not be the best solution: the backup MX can serve a whole bunch of main MX servers. Are you sure you want to flush the messages for all of those servers when only one is back online?

A better solution is to use:

postqueue -s somedomain.com

This command flushes only the messages for the given domain, and that’s what we want. It only works, though, when the domain is listed in fast_flush_domains. Again we’re lucky, because the default fast_flush_domains value is:

fast_flush_domains = $relay_domains

And if we listed somedomain.com in relay_domains, then our flush command will work :) If not, we only have to set:

fast_flush_domains = $relay_domains somedomain.com

And when our main MX comes back, we can flush this domain on the backup MX. It’s a good idea to wrap this in a small script :)
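
Such a wrapper could look like the Python sketch below. It assumes that "the main MX is back" can be approximated by a successful TCP connection to its SMTP port, which is my simplification and not a full health check. The function names are hypothetical; the only Postfix command used is the postqueue -s call shown above.

```python
import socket
import subprocess


def primary_accepts_smtp(host, port=25, timeout=5.0):
    """Return True if a TCP connection to the main MX can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def flush_command(domain):
    """Build the per-domain flush command as an argument list."""
    return ["postqueue", "-s", domain]


def flush_if_primary_is_up(domain, primary_host):
    """Flush the domain's queued mail only when the main MX is reachable."""
    if not primary_accepts_smtp(primary_host):
        return False
    subprocess.run(flush_command(domain), check=True)
    return True
```

Run from cron on the backup MX, a call such as flush_if_primary_is_up("somedomain.com", "mail.somedomain.com") would do nothing while the main server is still down.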

And we’re good to go: from now on (once the DNS entries have propagated, which can take up to 72 hours) our backup MX should receive emails whenever the main mail server is offline.

MySQL statement based replication with triggers, events, procedures, functions and variables

Before starting statement based replication in MySQL, we have to be aware of some behaviors specific to this setup. Statement based replication writes each query that modifies data to the Binary Log, in order to replay it on the slave or to use it for point-in-time recovery (PITR). Because of this kind of query logging, we should know how the MySQL replication engine handles special objects like triggers, functions, procedures and events.

Functions

Function calls are logged directly to the Binary Log, so if you forget to create on the slave any function that exists on the master, you will break your replication and will probably see an error like the one below:

Last_Error: Error 'FUNCTION postfix.recount_quota not exist' on query. Default database: 'postfix'. Query: 'UPDATE user_imap SET quota=(recount_quota())'

When promoting a slave to master, no additional steps are required for functions: all that is needed is to have the functions defined on both master and slave.

Procedures

Unlike function calls, procedure calls are not written to the Binary Log as such, which is important to know. Only the queries inside the procedure get logged, so you don't have to create procedures on slaves.

A slave that gets promoted to master does need the procedures, though, so it is wise to have all the procedures created on both master and slaves.

Events

Events created on the master get replicated to the slave with the DISABLE ON SLAVE option. That's why those events are not re-executed on every slave in our MySQL architecture, and we get no duplicated or corrupted data. MySQL logs only the queries run inside the event, so only those queries are replicated via the Binary Log.

Promoting a slave to master takes a bit more work where events are concerned. Here is a simple event (we create it on the master):

mysql> CREATE EVENT mysql_heartbeat
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 1 MINUTE
DO UPDATE `mysql_stat`.`heartbeat` SET `last`=CURTIME();

Now it's replicated to the slave via the Binary Log. Below is the replication entry for that event from the Binary Log:

CREATE DEFINER=`user`@`localhost` EVENT `mysql_heartbeat` ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 1 MINUTE DO UPDATE `mysql_stat`.`heartbeat` SET `last`=CURTIME();

And this is how it looks on the slave after replication:

CREATE DEFINER=`user`@`localhost` EVENT `mysql_heartbeat` ON SCHEDULE AT '2012-01-06 21:12:56' ON COMPLETION NOT PRESERVE DISABLE ON SLAVE DO UPDATE `mysql_stat`.`heartbeat` SET `last`=CURTIME();

So, with this knowledge, here is a short procedure for promoting a slave to master when events are in use:

  • Disable the event scheduler on the slave with SET GLOBAL event_scheduler = OFF;
  • Enable all the events with ALTER EVENT `event_name` ENABLE. We have to do this for each event, so writing a little script is very helpful here.
  • Enable the event scheduler with SET GLOBAL event_scheduler = ON;

To demote the master back to a slave, follow the same procedure with one change in the ALTER EVENT step: here you need to DISABLE all the events (not ENABLE them).
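
The "little script" for the promote and demote steps can be sketched in Python as a generator of SQL statements. This is only a sketch: the function names are hypothetical, and the list of event names is assumed to be collected beforehand (from SHOW EVENTS, for example). The output is meant to be reviewed and then piped into the mysql client.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def event_switch_statements(event_names, promote=True):
    """Return the statements for promoting (ENABLE) or demoting (DISABLE) a server."""
    action = "ENABLE" if promote else "DISABLE"
    statements = ["SET GLOBAL event_scheduler = OFF;"]
    statements += [f"ALTER EVENT {quote_identifier(name)} {action};"
                   for name in event_names]
    statements.append("SET GLOBAL event_scheduler = ON;")
    return statements


# Event name taken from the example above.
for statement in event_switch_statements(["mysql_heartbeat"]):
    print(statement)
```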

Triggers

To have triggers work properly on master and slaves, you have to define them on both. MySQL statement based replication writes only the original query to the Binary Log, not the statements subsequently run by the trigger, so the slave's own trigger has to fire when the query is replayed.

When promoting a slave to master, no additional steps are required for triggers: all that is needed is to have the triggers defined on both master and slave.

Mixed triggers / procedures / functions calls

Let's imagine that we have a trigger that calls a procedure, which in turn uses a function. How will this behave in statement based replication?

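  1. We should have the trigger defined on both master and slave
  2. In this case we should have the procedure defined on the slave as well: the trigger fires on the slave itself and calls the procedure there. Having it only on the master is enough only for procedures that are called directly, not from a trigger
  3. We should have the function defined on both master and slave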

Despite all this, my advice is to keep functions, triggers, procedures and events defined on all the servers (masters and slaves), just to be sure that we can always promote a slave to master without any issues.
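
One way to check that advice is to compare the object names on both servers. The Python sketch below assumes the names have already been collected per object type (from information_schema on each server, for instance); the function name missing_on_slave and the input layout are hypothetical.

```python
def missing_on_slave(master_objects, slave_objects):
    """Return, per object type, the names defined on the master but not on the slave.

    Both arguments map an object type ('function', 'trigger', ...) to a
    collection of names.
    """
    missing = {}
    for kind, names in master_objects.items():
        difference = set(names) - set(slave_objects.get(kind, ()))
        if difference:
            missing[kind] = sorted(difference)
    return missing


# Hand-written sample data; recount_quota is the function from the error above.
master = {"function": {"recount_quota"}, "event": {"mysql_heartbeat"}}
slave = {"event": {"mysql_heartbeat"}}
print(missing_on_slave(master, slave))
# prints {'function': ['recount_quota']}
```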

And one more thing before finishing this post. If you plan to start replication by just copying the FRM, MYI, MYD and InnoDB files, you should also dump the functions, triggers and stored procedures on the master (or a slave) and then import them on the new slave. You can do it (for every database) with:

mysqldump --routines --no-create-info --no-data --no-create-db --skip-opt <database> > dumpfile.sql

And recreate them on the new box:

mysql <database> < dumpfile.sql

MySQL tunneling via SSH and error “channel: open failed: connect failed: Connection refused”

Recently I wrote a short article about MySQL tunneling via SSH in order to set up secure MySQL replication. Afterwards I ran into problems when creating a new SSH tunnel for a MySQL connection in a quite different environment. After creating the SSH tunnel and trying to connect through it to the MySQL server, I got this SSH error in the tunnel's error log:

channel 2: open failed: connect failed: Connection refused

or:

channel 3: open failed: connect failed: Connection refused

And below:

ERROR 2013 (HY000): Lost connection to MySQL server during query

in the MySQL terminal.

First of all we have to make sure that our tunnel is working properly, so we kill the current tunnel and create a new one without the "-f" and "-N" options:

ssh -p 2345 mysql_tunnel@mysqlmaster-server.com -L 4406:mysqlmaster-server.com:3306

If everything is OK, then we can assume that the tunnel is working fine. We can also create another tunnel to some other service on a different target port and check whether that service works through it, just to rule out any problems with SSH tunneling itself.

My problem was that MySQL was configured to block any connections from outside localhost. This is a common default configuration, and it is set with one of these my.cnf entries:

bind-address = 127.0.0.1

or:

skip-networking

So in order to make our MySQL accessible through the tunnel, we have to comment out the skip-networking line and make sure that the tunnel points at the correct IP address. For example, if we have this line in our my.cnf:

bind-address = 127.0.0.1

Then our tunnel should look like:

ssh -p 2345 -f mysql_tunnel@mysqlmaster-server.com -L 4406:127.0.0.1:3306 -N

(notice that 127.0.0.1 in the above command).

If we bound our MySQL to some other IP, like:

bind-address = 192.168.0.12

Then We should change our tunneling parameters:

ssh -p 2345 -f mysql_tunnel@mysqlmaster-server.com -L 4406:192.168.0.12:3306 -N
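
The rule is simply that the middle part of the -L option has to match the bind-address value. As an illustration only, here is a Python sketch that builds the tunnel command from that value; the function name tunnel_command and its parameter names are hypothetical, while the ssh options are the ones used above.

```python
def tunnel_command(ssh_host, ssh_user, ssh_port, local_port,
                   bind_address="127.0.0.1", mysql_port=3306):
    """Build the ssh argument list for a tunnel ending at MySQL's bind-address."""
    forward = f"{local_port}:{bind_address}:{mysql_port}"
    return ["ssh", "-p", str(ssh_port), "-f",
            f"{ssh_user}@{ssh_host}", "-L", forward, "-N"]


# Values taken from the examples in this post.
print(" ".join(tunnel_command("mysqlmaster-server.com", "mysql_tunnel",
                              2345, 4406, bind_address="192.168.0.12")))
# prints ssh -p 2345 -f mysql_tunnel@mysqlmaster-server.com -L 4406:192.168.0.12:3306 -N
```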

After commenting out skip-networking, our security depends on the IP address we bind MySQL to. If it's a local IP address in a DMZ, then there is no security breach here. It would be unwise to bind to the WAN address and leave the MySQL port open without any SSL encryption or without filtering traffic by client IP address...