Tag Archives: dns

DNS resolution problem – dig working, ping not

Today I was reconfiguring the internal network on my laptop, as I use virtual machines a lot (KVM ftw) and /etc/hosts was not scaling anymore. I could have used Dnsmasq, but I prefer BIND, so I installed and chrooted it and configured it as a caching name server that also resolves my internal zone ‘local’ (I use addresses like ‘git.local’ or ‘dev1.local’). Next I had to make NetworkManager use this local DNS first, instead of the servers handed out by DHCP. I could of course edit /etc/resolv.conf by hand and protect it with the immutable attribute, but I suspect the NM developers never considered that resolv.conf might be unwritable, and hell knows what would happen then. So I added:

to /etc/sysconfig/network-scripts/ifcfg-Auto_WLANname and things were OK: after restarting the network I had what I wanted in /etc/resolv.conf. I fixed the VPN resolving problem in a similar way.
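A minimal sketch of such an override, assuming the standard ifcfg keys PEERDNS and DNS1 and a BIND instance listening on 127.0.0.1 (both are assumptions, not necessarily the exact lines used here):

```shell
# Sketch of lines for ifcfg-Auto_WLANname (assumed values).
# ifcfg files use shell variable syntax, so these are plain assignments.
PEERDNS=no        # do not overwrite resolv.conf with DNS servers from DHCP
DNS1=127.0.0.1    # assumed address of the local caching BIND
```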

So now I wanted to start working with my VMs, and it turned out that I couldn’t connect to any of them:

Weird… And check this:

So WTF with this DNS?

This time the WTF was much bigger. The local DNS appeared to be working correctly, so I suspected that Fedora had decided not to resolve ‘local’ addresses via DNS at all. To confirm this I ran “tcpdump -n port 53” while trying to ping ‘git.local’. Nothing: tcpdump stayed silent. Bingo – Fedora was not using DNS at all to resolve this name. So why? Let’s see:

OK – first we have nscd (which is not running on my laptop) and next we have Avahi… but where the hell is DNS? Let’s look at /etc/nsswitch.conf:

Ha – now everything is clear! You can read about Avahi, mDNS and ‘local’ domains here: http://avahi.org/wiki/AvahiAndUnicastDotLocal
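The reason dig worked while ping did not is that dig talks to the DNS server directly, whereas ping resolves names through NSS in the order given by the hosts line. A small sketch for inspecting that order; the helper name hosts_order is made up, and the sample line is an assumption modelled on the Fedora default of that time:

```shell
# Print the lookup order from the "hosts:" line of an nsswitch.conf read on stdin.
hosts_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Assumed sample line; with it, a NOTFOUND from mDNS ends the lookup before dns.
sample='hosts:      files mdns4_minimal [NOTFOUND=return] dns'
printf '%s\n' "$sample" | hosts_order

# On a real system:
#   hosts_order < /etc/nsswitch.conf
#   getent hosts git.local   # resolves through NSS, the same path ping uses
#   dig git.local            # queries the DNS server directly, bypassing NSS
```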

Solution? There are two. First, we could simply replace the above nsswitch.conf entry with the following (only if you are NOT using Avahi, of course):
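One way to make that change, sketched under the assumption that the hosts line contains mdns4_minimal [NOTFOUND=return] as on a default Fedora install of that time; the helper name drop_mdns is made up for illustration:

```shell
# Remove the mDNS source and its [NOTFOUND=return] action from the hosts line.
# Reads nsswitch.conf on stdin and writes the modified file to stdout.
drop_mdns() {
    sed -e '/^hosts:/ s/ *mdns4_minimal//' \
        -e '/^hosts:/ s/ *\[NOTFOUND=return\]//'
}

printf '%s\n' 'hosts: files mdns4_minimal [NOTFOUND=return] dns' | drop_mdns
# prints: hosts: files dns
```

Review the result before copying it over /etc/nsswitch.conf, since other lines of the file pass through unchanged.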

Second solution – we could reconfigure Avahi, as described at the above URL:

Now just restart Avahi and your web browsers, and everything should work fine.

Digging know how

Dig is one of the most important tools that every sysop uses in day-to-day work. It lets us trace the resolution path of a domain, check the status of domain records, or even fetch all the records defined for a particular domain (under some circumstances…). We won’t write an essay about DNS and its functionality here – you can always read one here: http://en.wikipedia.org/wiki/Domain_Name_System or, even better, here: https://webhostinggeeks.com/guides/dns/

Assuming that you already have dig installed (if not, try yum install bind-utils on CentOS, or the equivalent on other distros), we can start explaining how to use dig properly.

1. Simple query

Let’s check Gamedesire’s domain www.gamedesire.com:

Above we can see a bunch of data. “status: NOERROR” is the answer status; NOERROR means that the query was resolved properly. The ANSWER SECTION contains the resolved records (here we can see 2 A records for www.gamedesire.com: 174.123.95.100 and 174.123.95.101). Next we have the AUTHORITY SECTION, which answers the question: “what are the authoritative DNS servers for this domain?”. We also have the ADDITIONAL SECTION with DNS server info and some statistics (query time etc.). And remember to check the answer flags (in the above example: qr rd ra). You can find an explanation of those in RFC 1035 §4.1.1.
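When dig is called from a script, the status and the flags are usually the first things worth pulling out. A rough sketch; the helper names dig_status and dig_flags are made up, and the sample header lines only imitate dig’s usual layout:

```shell
# Extract the status and the flags from dig output read on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

dig_flags() {
    sed -n 's/^;; flags: \([a-z ]*\);.*/\1/p'
}

# Illustrative header lines in dig's usual layout (id and counts are made up).
sample=';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 2'

printf '%s\n' "$sample" | dig_status   # NOERROR
printf '%s\n' "$sample" | dig_flags    # qr rd ra

# Against a live resolver:
#   dig www.gamedesire.com | dig_status
```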

What happens when a domain name cannot be resolved? Let’s try the non-existent sysop.gamedesire.com:

As we can see, the answer status here is NXDOMAIN, which means Non-Existent Domain.

2. Querying specific records

There are plenty of DNS record types. The most commonly used are A, MX, TXT and CNAME. Here is an explanation and a complete list of those types: http://en.wikipedia.org/wiki/List_of_DNS_record_types. So now – how can we query for the MX records of a domain?

Above, in the ANSWER SECTION, we see all the MX records found. We can ask for any record type this way.
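dig takes the record type as an extra argument, and +noall +answer limits the output to the answer lines, which makes them easy to post-process. A small sketch that lists MX hosts with the most preferred one first; the helper name mx_sorted and the sample records for the reserved example.com domain are made up:

```shell
# Read dig answer lines on stdin and print "preference host", best first.
mx_sorted() {
    awk '$3 == "IN" && $4 == "MX" { print $5, $6 }' | sort -n
}

# Illustrative answer lines (made-up records).
printf '%s\n' \
    'example.com. 3600 IN MX 20 backup.example.com.' \
    'example.com. 3600 IN MX 10 mail.example.com.' | mx_sorted

# Against a live resolver:
#   dig example.com MX +noall +answer | mx_sorted
```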

3. Tracing DNS query

Every DNS query goes through some hierarchical steps. Knowing them can be very helpful under some circumstances:

Above we see all the steps of the DNS query: first a question to the root servers for the right TLD DNS servers (org), next a question to the org TLD servers for the NS servers of wikipedia.org, and then a question to wikipedia’s NS servers for the name ‘wikipedia.org’, which turns out to be a CNAME for something else…
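The hierarchy walked by such a trace can be derived from the name itself. A sketch that prints the names at which a delegation may occur, from the root down; the helper name delegation_chain is made up, and not every label is necessarily a separate zone:

```shell
# Print the root and then each parent name of the argument, top-down.
delegation_chain() {
    name=${1%.}
    chain="$name."
    # Strip the leftmost label until no dot is left.
    while [ "${name#*.}" != "$name" ]; do
        name=${name#*.}
        chain="$name.
$chain"
    done
    printf '.\n%s\n' "$chain"
}

delegation_chain wikipedia.org
# prints three lines: "." then "org." then "wikipedia.org."
```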

4. Shortening the output

Digging is quite a verbose action by design; lots of verbosity is good for debugging. Still, it’s good to know how to reduce the output of this command, for example when we want to wrap dig in some monitoring script. For this purpose I suggest getting to know:

Let’s try the strongest one from above: +short strips all of dig’s “noise”:

We can combine +short with e.g. “+trace”:

You can try the other params yourself.
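As an example of wrapping dig in a monitoring script, here is a minimal sketch built on +short. The function name check_resolves, the DIG override and the Nagios-style exit code 2 for a failure are all assumptions made for illustration:

```shell
# Succeed if the name resolves to at least one record, fail otherwise.
# DIG may be overridden (e.g. for testing); it defaults to the real dig.
check_resolves() {
    answer=$(${DIG:-dig} +short "$1" A)
    if [ -n "$answer" ]; then
        echo "OK: $1 resolves"
        return 0
    fi
    echo "CRITICAL: $1 does not resolve"
    return 2
}

# Usage: check_resolves www.gamedesire.com
```

An empty +short output covers both NXDOMAIN and a name without A records, which is usually enough for a simple check.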

5. Asking a specific DNS server

When checking DNS resolution (especially while transferring domains etc.) it is very good practice to send the query to a couple of DNS servers. We can query a particular DNS server (but only if that server allows us to). Remember that by default dig uses the DNS servers listed in your /etc/resolv.conf file. Let’s try asking Google’s DNS servers first:

And now one of ThePlanet’s:
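Asking several servers by hand gets tedious, so the comparison can be scripted with dig’s @server syntax. A sketch; the function name compare_servers and the DIG override are made up for illustration:

```shell
# Print the sorted A records each server returns, one line per server.
# DIG may be overridden for testing; it defaults to the real dig.
compare_servers() {
    name=$1
    shift
    for server in "$@"; do
        answer=$(${DIG:-dig} +short "@$server" "$name" A | sort | tr '\n' ' ')
        printf '%s: %s\n' "$server" "$answer"
    done
}

# Usage (8.8.8.8 and 8.8.4.4 are Google's public resolvers):
#   compare_servers www.gamedesire.com 8.8.8.8 8.8.4.4
```

Lines that differ between servers point at stale caches or at a zone change that has not reached every server yet.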

6. The authority

In this example:

We can see that the AA bit is set here (flags: qr aa rd). So ns1.theplanet.com is authoritative for the gamedesire.com domain (as listed in the AUTHORITY section). Now let’s dig it again using some other DNS server:

Here we can see that the AA bit is not set, and that the TTL values for www.gamedesire.com are lower than 86400 (we can see the value 19128). Why is that? Because this answer is cached somewhere along the way and our current DNS server is not authoritative for gamedesire.com. If we repeated the query, the TTL would drop with each repetition. Looking at the above dig output we can’t tell where the answer was cached; to find out we would have to repeat the query with recursion disabled and step manually through the whole DNS tree (but in 9 out of 10 cases it will be your local DNS cache… which is caching answers).
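The remaining TTL also tells us roughly how long the answer has been sitting in the cache. A small worked example, assuming the authoritative TTL is 86400 and that the cache counts it down without capping it; the function name cache_age is made up:

```shell
# How long ago was a record cached, given its original and remaining TTL in seconds?
cache_age() {
    age=$(( $1 - $2 ))
    printf '%dh %dm\n' $(( age / 3600 )) $(( age % 3600 / 60 ))
}

cache_age 86400 19128   # prints: 18h 41m
```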

7. Tracing DIG execution

As I wrote above, we can use dig’s +trace param to trace the resolution path through the DNS tree. But how can we trace exactly which queries are sent and received? With tcpdump, of course:

From the tcpdump(8) manpage: Name server requests are formatted as:

I suggest reading the manpage yourself – just look for the “UDP Name Server Requests” section.

Postfix: Backup MX

In a mail server architecture we should always have a backup MX defined for every mail server. The reason is simple: it gives us redundancy and makes sure that no emails are returned with an error while our mail server is having issues.

In the simplest scenario, let’s assume that we have only one mail server (mail.somedomain.com) and we’d like to set up a backup MX server for it. We can do this in a few simple steps:

Step 1: Backup postfix configuration

On the backup server we need to change some Postfix configuration in the main.cf file: add or change relay_domains, set maximal_queue_lifetime and smtpd_recipient_restrictions, and create relay_recipient_maps:

Now let’s explain the above configuration:

  • relay_recipient_maps = hash:/etc/postfix/relay_recipients – this is optional, but I advise using it. It defines a hash table containing the valid recipients. If the backup server didn’t know all the valid mailboxes, it would have to accept all emails, including spam for non-existent addresses. Knowing the valid addresses, the backup server can reject emails addressed to invalid recipients. This does not apply in an environment that uses catch-all mailboxes. I attached a sample relay_recipients file below. Remember to run the postmap command after every change to this file: postmap /etc/postfix/relay_recipients
  • maximal_queue_lifetime = 30d – the Postfix default is 5 days. This sets how long the backup server will keep trying to deliver emails to the main server, so it is the maximum downtime the main server can have before mails are bounced back to their original senders with an error.
  • relay_domains = $mydestination somedomain.com – this parameter allows Postfix to relay emails for somedomain.com
  • permit_mx_backup – security, see http://www.postfix.org/postconf.5.html#permit_mx_backup
  • permit_mx_backup_networks – security, see http://www.postfix.org/postconf.5.html#permit_mx_backup_networks
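Put together, the parameters above might end up in main.cf roughly as follows. This is a sketch rather than the configuration used here: the smtpd_recipient_restrictions value is an assumption (a common relay-safe setting), and the fragment is printed from a made-up shell function only so that it can be inspected before being merged by hand.

```shell
# Print a main.cf fragment for the backup MX (a sketch under the assumptions above).
backup_mx_main_cf() {
    cat <<'EOF'
relay_domains = $mydestination somedomain.com
relay_recipient_maps = hash:/etc/postfix/relay_recipients
maximal_queue_lifetime = 30d
# Assumed value, adjust to your own policy
smtpd_recipient_restrictions = permit_mynetworks, reject_unauth_destination
EOF
}

backup_mx_main_cf
```

The quoted here-document keeps $mydestination literal, so Postfix expands it instead of the shell.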

And the sample relay_recipients file:

So as you can see, the users’ addresses have to be replicated on the backup MX server in the relay_recipients file.
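The relay_recipients file is a plain Postfix lookup table: one address per line followed by some value. Postfix only checks whether the address is found, so “OK” is just a convention. A sketch that builds the table from a list of addresses; the helper name make_relay_recipients and the input file name are hypothetical:

```shell
# Turn a list of addresses (one per line, blank lines and # comments allowed)
# into "address OK" lines suitable for postmap.
make_relay_recipients() {
    awk 'NF && $1 !~ /^#/ { print $1, "OK" }'
}

printf '%s\n' 'user1@somedomain.com' '' '# comment' 'user2@somedomain.com' |
    make_relay_recipients

# Hypothetical usage on the backup MX (file names are placeholders):
#   make_relay_recipients < valid_addresses.txt > /etc/postfix/relay_recipients
#   postmap /etc/postfix/relay_recipients
```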

Step 2: DNS configuration

With only one mail server, a single MX record in our DNS zone file is enough:

Here we see our only MX record, with priority 10, pointing to the A record mail.somedomain.com. To create a record for our backup MX server we should first add a new A record, like:

And then we can create a new MX record with a lower priority (that is, a higher preference number):
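For orientation, the resulting records could look like the fragment below. The backup host name mail2.somedomain.com, the preference value 20 and the addresses (taken from the 192.0.2.0/24 documentation range) are placeholders, not values from a real zone; the fragment is printed from a made-up shell function so it can be checked before it goes into the zone file.

```shell
# Print a sketch of the MX and A records for a primary and a backup MX.
backup_mx_zone_fragment() {
    cat <<'EOF'
somedomain.com.        IN  MX  10  mail.somedomain.com.
somedomain.com.        IN  MX  20  mail2.somedomain.com.
mail.somedomain.com.   IN  A   192.0.2.10
mail2.somedomain.com.  IN  A   192.0.2.20
EOF
}

backup_mx_zone_fragment
```

Sending servers try the MX with the lowest preference number first, so mail reaches the backup only when the primary cannot be contacted.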

Step 3: Flushing messages

When the main MX server is down and the backup server receives messages to hold until the main server is back, it moves those messages immediately to the flush queue. From there they can be delivered by the flush daemon, which is woken up periodically (as set in /etc/postfix/master.cf):

Here “1000?” means that the flush daemon is woken up every 1000 seconds; the question mark after 1000 requests that the wake-up be sent only if the service is actually in use.

Now we can set how often messages are flushed by the running flush daemon using the fast_flush_refresh_time param (12h by default). So every 12h, messages that haven’t had redelivery requested are kicked automatically.

When our master server is back, we could just flush all the messages manually:

But the above command flushes all the messages in the flush queue. This might not be the best solution, as the backup MX can be a slave for a bunch of main MX servers – are you sure you want to flush the messages for all those servers when only one is back online?

A better solution is to use:

The above command flushes only the messages for the given domain, and that’s what we want. But we have to know that this command works only when the domain is listed in “fast_flush_domains”. Again we’re lucky, because the default “fast_flush_domains” value is:

And if we listed our somedomain.com in “$relay_domains”, our flush command will work :) If not, we only have to set:

And when our main MX comes back, we can flush this domain on the backup MX – it’s a good idea to wrap this in a script :)

And we’re good to go – from now on (once the DNS entries have propagated, so within 72 hours at most) our backup MX should work and receive emails while the master mail server is offline.
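Such a wrapper could look roughly like this. It is a sketch only: the function name flush_if_up and the POSTQUEUE override are made up, the probe command is left to the caller, and postqueue -s is the per-domain flush (postqueue -f would flush everything):

```shell
# Flush the queue for one domain, but only if the given probe command succeeds.
# POSTQUEUE may be overridden for testing; it defaults to the real postqueue.
flush_if_up() {
    domain=$1
    shift
    if "$@"; then
        ${POSTQUEUE:-postqueue} -s "$domain"
    else
        echo "main MX for $domain still down, not flushing" >&2
        return 1
    fi
}

# Hypothetical usage from cron, assuming an nc that supports -z:
#   flush_if_up somedomain.com nc -z mail.somedomain.com 25
```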