NONCOMPLIANT  $Id: NONCOMPLIANT,v 1.2 2005/06/10 08:33:26 andrewm Exp $

This file is the hall of shame for software that doesn't honour DNS TTL values.
Such software keeps old records for too long (sometimes indefinitely), and so
tries to contact the wrong site.  It's not your fault.  Buggy software like
this should be fixed.

 - Various HTTP proxies
 - Kerio Mailserver (tested Linux, version 6.0.9)
 - Microsoft Exchange 5.5
 - nscd with "enable-cache  hosts  yes" (Linux)
 - Sun Java (rumour; versions?)

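For contrast, here is a minimal sketch (in Python) of what honouring the TTL
means for a cache: every record is stored with its own expiry time, and an
expired record is looked up again rather than reused.  The class name TtlCache
and its methods are made up for this illustration; they are not taken from any
of the software listed above.

```python
import time


class TtlCache:
    """Cache DNS answers, but only for as long as each record's TTL allows."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behaviour can be tested without waiting.
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        # Remember when this particular record stops being valid.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: forget it, so the caller has to ask the DNS again.
            del self._entries[name]
            return None
        return address
```

With a 10 second TTL, a get() made 60 seconds after the put() returns None,
which forces a fresh lookup instead of a connection to a stale address.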




VARIOUS (UNIDENTIFIED) HTTP PROXIES

  HTTP proxy software and ISPs violate DNS TTLs.  This is documented in the
  paper "On the Responsiveness of DNS-based Network Control":

  http://www.imconf.net/imc-2004/papers/p21-pang.pdf





KERIO MAIL SERVER kerio-mailserver-6.0.9-linux

Summary: Ignores DNS TTL, caches record indefinitely (it seems)
Mitigation: None, it seems
Aggravation: They have had DNS poisoning problems on their other products
Workaround: Restart Kerio Mail Server
Workaround: ipconfig /flushdns (on Windows -- untested -- let me know if it
	works)
Fix available: Fixed in version 6.1 (available circa July 2005)

Here's how we know.  We fire up Kerio Mail Server under strace:

strace -feconnect,recv,recvfrom,send,sendto /opt/kerio/mailserver/mailserver

> Process 12727 attached
> Process 12728 attached
> [pid 12727] recvfrom(32, 0xbfffe900, 512, 0, 0xbfffeb00, 0xbfffe8fc) = -1 EAGAIN (Resource temporarily unavailable)
> Process 12729 attached
> Process 12730 attached
> Process 12731 attached
> Process 12732 attached
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12727] recvfrom(32, 0xbfffee30, 512, 0, 0xbffff030, 0xbfffee2c) = -1 EAGAIN (Resource temporarily unavailable)
> Process 12733 attached

We send it a test mail:
> [pid 12733] send(47, "220 crystal Kerio MailServer 6.0"..., 48, 0) = 48
> [pid 12733] recv(47, "EHLO first-time.test\r\n", 4096, 0) = 22
> [pid 12733] send(47, "250-crystal\r\n250-AUTH CRAM-MD5 P"..., 160, 0) = 160
> [pid 12733] recv(47, "MAIL FROM: <xxxrew@ledge.co.za>\r"..., 4096, 0) = 33
> [pid 12733] sendto(48, "\0\1\1\0\0\1\0\0\0\0\0\0\5ledge\2co\2za\0\0\17\0\1", 29, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, 16) = 29
> [pid 12733] recvfrom(48, "\0\1\205\200\0\1\0\1\0\1\0\2\5ledge\2co\2za\0\0\17\0\1"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, [16]) = 103
> [pid 12733] send(47, "250 2.1.0 Sender <xxxrew@ledge.c"..., 42, 0) = 42
> [pid 12733] recv(47, "RCPT TO: <test@sgroup.co.za>\r\n", 4096, 0) = 30
> [pid 12733] send(47, "250 2.1.5 Recipient <test@sgroup"..., 53, 0) = 53
> [pid 12733] recv(47, "DATA\r\n", 4096, 0) = 6
> [pid 12733] send(47, "354 Enter mail, end with CRLF.CR"..., 36, 0) = 36
> [pid 12733] recv(47, "Subject: first time\r\n", 4096, 0) = 21
> [pid 12733] recv(47, "\r\n", 4096, 0)   = 2
> [pid 12733] recv(47, "Fri Apr 15 16:02:49 SAST 2005\r\n", 4096, 0) = 31
> [pid 12733] recv(47, ".\r\n", 4096, 0)  = 3
> [pid 12733] recvfrom(32, 0x414a5e20, 512, 0, 0x414a6020, 0x414a5e1c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12733] send(47, "250 2.0.0 425fcb3c-00000000 Mess"..., 59, 0) = 59
> [pid 12733] recv(47, Process 12734 attached
>  <unfinished ...>

Kerio looks up the MX record for sgroup.co.za, and then looks up
dddns.sgroup.co.za:
> [pid 12734] sendto(48, "\0\2\1\0\0\1\0\0\0\0\0\0\6sgroup\2co\2za\0\0\17\0\1", 30, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, 16) = 30
> [pid 12734] recvfrom(48, "\0\2\201\200\0\1\0\2\0\7\0\7\6sgroup\2co\2za\0\0\17\0\1"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, [16]) = 381
> [pid 12734] sendto(48, "\0\3\1\0\0\1\0\0\0\0\0\0\5dddns\6sgroup\2co\2za\0"..., 36, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, 16) = 36
> [pid 12733] <... recv resumed> "quit\r\n", 4096, 0) = 6
> [pid 12733] send(47, "221 2.0.0 SMTP closing connectio"..., 35, 0) = 35
> Process 12733 detached
> [pid 12734] recvfrom(48, "\0\3\201\200\0\1\0\1\0\2\0\0\5dddns\6sgroup\2co\2za\0"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, [16]) = 88
The answer is an address in 165.146.*.  The TTL is 10 seconds.
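
The DNS payloads in the trace can be decoded by hand.  As a rough sketch
(Python; decode_question is a name invented here, and it handles only an
uncompressed, single-question query), this unpacks the first sendto() from
pid 12734 above and shows that it asks for the MX record (type 15, which
strace prints as octal \17) of sgroup.co.za:

```python
import struct

# Payload of the first sendto() from pid 12734, copied from the trace.
# strace and Python use the same octal escapes, so it can be pasted as-is.
QUERY = b"\0\2\1\0\0\1\0\0\0\0\0\0\6sgroup\2co\2za\0\0\17\0\1"


def decode_question(packet):
    """Decode the header and the first question of a DNS query packet."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", packet[:12])
    labels = []
    pos = 12
    # The name is a sequence of length-prefixed labels ending in a zero byte.
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    pos += 1  # skip the terminating zero byte
    qtype, qclass = struct.unpack("!2H", packet[pos:pos + 4])
    return {
        "id": ident,
        "flags": flags,
        "questions": qdcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 15 is MX, 1 is A
        "qclass": qclass,  # 1 is IN
    }


print(decode_question(QUERY))
```

The same function applied to the ledge.co.za payloads shows that those are MX
queries as well.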

The mail is delivered to the address:
> [pid 12734] connect(47, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("165.146.101.200")}, 16) = 0
> [pid 12734] recv(47, "220 default-gw.internal.sgroup.c"..., 4096, 0) = 52
> [pid 12734] send(47, "EHLO crystal\r\n", 14, 0) = 14
> [pid 12734] recv(47, "250-default-gw.internal.sgroup.c"..., 4096, 0) = 107
> [pid 12734] send(47, "MAIL FROM:<xxxrew@ledge.co.za> S"..., 41, 0) = 41
> [pid 12734] recv(47, "250 Ok\r\n", 4096, 0) = 8
> [pid 12734] send(47, "RCPT TO:<test@sgroup.co.za>\r\n", 29, 0) = 29
> [pid 12734] recv(47, "250 Ok\r\n", 4096, 0) = 8
> [pid 12734] send(47, "DATA\r\n", 6, 0)  = 6
> [pid 12734] recv(47, "354 End data with <CR><LF>.<CR><"..., 4096, 0) = 37
> [pid 12734] send(47, "Received: from first-time.test ("..., 201, 0) = 201
> [pid 12734] recv(47, "250 Ok: queued as ED6F166F4\r\n", 4096, 0) = 29
> [pid 12734] send(47, "QUIT\r\n", 6, 0)  = 6
> [pid 12734] recv(47, "221 Bye\r\n", 4096, 0) = 9
> Process 12734 detached
> Process 12735 attached

We wait 1 minute and then send another mail:
> [pid 12735] send(47, "220 crystal Kerio MailServer 6.0"..., 48, 0) = 48
> [pid 12735] recv(47, "EHLO second-time\r\n", 4096, 0) = 18
> [pid 12735] send(47, "250-crystal\r\n250-AUTH CRAM-MD5 P"..., 160, 0) = 160
> [pid 12735] recv(47, "MAIL FROM: <xxxrew+secondtime@le"..., 4096, 0) = 44
> [pid 12735] sendto(48, "\0\4\1\0\0\1\0\0\0\0\0\0\5ledge\2co\2za\0\0\17\0\1", 29, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, 16) = 29
> [pid 12735] recvfrom(48, "\0\4\205\200\0\1\0\1\0\1\0\2\5ledge\2co\2za\0\0\17\0\1"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.12")}, [16]) = 103
Even though the ledge.co.za MX record has not timed out, it is rechecked.

> [pid 12735] send(47, "250 2.1.0 Sender <xxxrew+secondt"..., 53, 0) = 53
> [pid 12735] recv(47, "RCPT TO: <test@sgroup.co.za>\r\n", 4096, 0) = 30
> [pid 12735] send(47, "250 2.1.5 Recipient <test@sgroup"..., 53, 0) = 53
> [pid 12735] recv(47, "DATA\r\n", 4096, 0) = 6
> [pid 12735] send(47, "354 Enter mail, end with CRLF.CR"..., 36, 0) = 36
> [pid 12735] recv(47, "Subject: again - second time\r\n", 4096, 0) = 30
> [pid 12735] recv(47, "\r\n", 4096, 0)   = 2
> [pid 12735] recv(47, "blah\r\n", 4096, 0) = 6
> [pid 12735] recv(47, ".\r\n", 4096, 0)  = 3

Now Kerio has the second mail.  Note that it sends no further DNS requests:
> [pid 12735] recvfrom(32, 0x416a6e20, 512, 0, 0x416a7020, 0x416a6e1c) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 12735] send(47, "250 2.0.0 425fcbad-00000001 Mess"..., 59, 0) = 59
> [pid 12735] recv(47, Process 12736 attached
>  <unfinished ...>

It simply re-uses the (expired) record:
> [pid 12736] connect(48, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("165.146.101.200")}, 16 <unfinished ...>
> [pid 12735] <... recv resumed> "quit\r\n", 4096, 0) = 6
> [pid 12735] send(47, "221 2.0.0 SMTP closing connectio"..., 35, 0) = 35
> Process 12735 detached
> [pid 12736] <... connect resumed> )     = 0
> [pid 12736] recv(48, "220 default-gw.internal.sgroup.c"..., 4096, 0) = 52
> [pid 12736] send(48, "EHLO crystal\r\n", 14, 0) = 14
> [pid 12736] recv(48, "250-default-gw.internal.sgroup.c"..., 4096, 0) = 107
> [pid 12736] send(48, "MAIL FROM:<xxxrew+secondtime@led"..., 52, 0) = 52
> [pid 12736] recv(48, "250 Ok\r\n", 4096, 0) = 8
> [pid 12736] send(48, "RCPT TO:<test@sgroup.co.za>\r\n", 29, 0) = 29
> [pid 12736] recv(48, "250 Ok\r\n", 4096, 0) = 8
> [pid 12736] send(48, "DATA\r\n", 6, 0)  = 6
> [pid 12736] recv(48, "354 End data with <CR><LF>.<CR><"..., 4096, 0) = 37
> [pid 12736] send(48, "Received: from second-time ([10."..., 181, 0) = 181
> [pid 12736] recv(48, "250 Ok: queued as 701D36933\r\n", 4096, 0) = 29
> [pid 12736] send(48, "QUIT\r\n", 6, 0)  = 6
> [pid 12736] recv(48, "221 Bye\r\n", 4096, 0) = 9
> Process 12736 detached
> Process 12728 detached
> Process 12727 detached
> Process 12729 detached
> Process 12730 detached
> Process 12731 detached
> Process 12732 detached

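How long was the gap between the two mails?  The queue IDs in Kerio's
"250 2.0.0" replies (425fcb3c-00000000 and 425fcbad-00000001) look like a
hexadecimal Unix timestamp followed by a counter.  That is a guess from the
values, not documented Kerio behaviour, but under that assumption a few lines
of Python give the gap:

```python
from datetime import datetime, timezone


def accepted_at(queue_id):
    """Read the part before the dash as a hex Unix timestamp (an assumption)."""
    seconds = int(queue_id.split("-")[0], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


first = accepted_at("425fcb3c-00000000")   # queue ID of the first mail
second = accepted_at("425fcbad-00000001")  # queue ID of the second mail
gap_seconds = (second - first).total_seconds()

print(first.isoformat(), second.isoformat(), gap_seconds)
```

If the assumption holds, the second mail was accepted 113 seconds after the
first, long after the 10 second TTL had run out.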




NSCD WITH "ENABLE-CACHE HOSTS YES" (er... affects what?)

Summary: Linux software resolving through nscd may cache host records too long
Mitigation: Records are kept for 1 hour, not forever
Workaround: Disable hosts caching
Fix available: Maybe (some say yes and some say no)

NSCD, the name services cache daemon, caches information about users, groups,
and network hosts.

  http://archives.neohapsis.com/archives/postfix/2001-03/1346.html says

  Last I heard, direct from a developer, nscd did not do many things 
  properly with DNS lookups either (including negative caching, but also 
  other things w.r.t. TTL, etc.). Word was that it was unlikely to ever 
  be fixed, but that was a release or so ago when I heard that.... Things 
  may have gotten worse since then! :-)
  
  So, nscd should *NEVER* be used for hostnames on a machine that 
  interacts with the public DNS. ...
  
If software trying to contact you is using nscd to resolve your host name, then
their settings probably look like this in /etc/nscd.conf:
        enable-cache            hosts           yes   # DEFAULT IS "no"
        positive-time-to-live   hosts           3600

So the record's own TTL is discarded, and a fixed value of 1 hour (3600
seconds) is used instead.

Enabling the hosts cache is somewhat rare, and should be treated as a
configuration error, since the feature has never been fixed.

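To check a machine for this setting, something like the following Python
sketch could be used.  It only understands the two directives shown above, and
the function name hosts_cache_settings is invented for this example:

```python
def hosts_cache_settings(conf_text):
    """Return (cache enabled?, positive TTL in seconds or None) for "hosts"."""
    enabled = False          # this file notes that the default is "no"
    positive_ttl = None
    for raw_line in conf_text.splitlines():
        # Drop comments, then split the directive into its fields.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) != 3 or fields[1] != "hosts":
            continue
        if fields[0] == "enable-cache":
            enabled = fields[2] == "yes"
        elif fields[0] == "positive-time-to-live":
            positive_ttl = int(fields[2])
    return enabled, positive_ttl


SAMPLE = """
        enable-cache            hosts           yes   # DEFAULT IS "no"
        positive-time-to-live   hosts           3600
"""

print(hosts_cache_settings(SAMPLE))
```

On a real system the text would come from /etc/nscd.conf, for example via
open("/etc/nscd.conf").read().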




SUN JAVA (unconfirmed)

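Summary: Java apps using the standard APIs keep DNS records forever (like Kerio)
Mitigation: unconfirmed
Workaround: Restart the app, or set the cache TTL as a system property on JVM
	startup (according to the quote below -- untested)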
Fix: If you're lucky

http://ask.slashdot.org/article.pl?sid=05/04/18/198259 says ...

	The Sun JVM implementation implements it's own DNS caching for any name
	resolution done by the networking APIs. By default the TTL for cached
	entries is... FOREVER. Not only that, but they will cache NEGATIVE
	LOOKUPS, so that if your resolution fails the first time it will fail
	forever.

	The only solution is to restart your app (duh) or set the TTL as a
	system property on JVM startup.

	I personally spent a few minutes staring at the monitor in shock when I
	first found this behavior by debugging a problem all the way down to
	the Java API source. Boggles the mind. Everyone else I've read who've
	'discovered' this little known problem have had similar reactions.

	This is unrelated to the TTL issue discussed in the article, but I try
	to take every opportunity possible to scream 'WTF Sun!?!?'





MICROSOFT EXCHANGE 5.5

Summary: Ignores TTL
Mitigation: If the server's busy, it discards "older" DNS records too
Workaround: Hack the registry
Fix available: Yes (if you get a service pack)

	http://support.microsoft.com/default.aspx?scid=KB;en-us;285023&

	"In Exchange Server 5.5, the Internet Mail Service does not follow the
	Time to Live (TTL) value in Domain Name Service (DNS) records. When the
	Internet Mail Service queries DNS for connections, the Internet Mail
	Service caches up to 1,000 domains (by default), irrespective of the
	TTL for those records."

    i.e. mail delivery to DDDNS from MS Exchange 5.5 may be erratic if the
    Exchange server is mostly idle and the DDDNS IP changes frequently.  As
    usual for Microsoft's software, there is a special registry setting that
    enables the correct behaviour (while introducing other subtle bugs):

        HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MSExchangeIMC\Parameters
        EnableTTL = 1

