{"id":979,"date":"2015-04-07T15:49:34","date_gmt":"2015-04-07T13:49:34","guid":{"rendered":"\/\/www.mcgill.org.za\/stuff\/?p=979"},"modified":"2015-04-07T15:49:34","modified_gmt":"2015-04-07T13:49:34","slug":"five-diagnostic-questions-for-email","status":"publish","type":"post","link":"https:\/\/www.mcgill.org.za\/stuff\/archives\/979","title":{"rendered":"Five diagnostic questions for email"},"content":{"rendered":"<p>I used to face a lot of problems with e-mail of this sort:<\/p>\n<ul>\n<li>The e-mail is not working<\/li>\n<li>Person X is not getting an important mail<\/li>\n<li>There&#8217;s something wrong with the mail system<\/li>\n<li>Please check that the mail servers are working &#8211; people are saying something&#8217;s wrong<\/li>\n<li>Nobody can send mail<\/li>\n<li>Nobody is getting mail<\/li>\n<li>Person X is getting spam<\/li>\n<li>Company X says they can&#8217;t send mail to any of our servers<\/li>\n<\/ul>\n<p>Eventually I figured out what specific information actually helps to get to the bottom of the problem &#8211; and it is these five things:<\/p>\n<ol>\n<li><strong>Sender<\/strong> e-mail address<\/li>\n<li><strong>Recipient<\/strong> e-mail address<\/li>\n<li><strong>Time<\/strong> of sending the mail<\/li>\n<li>SMTP <strong>server<\/strong> that handled the mail<\/li>\n<li>What was the <strong>error<\/strong><\/li>\n<\/ol>\n<p>Experience says that without all of these bits of information, or a reasonable facsimile for each, you have a pretty good chance of being led a merry dance &#8211; fishing for a problem, and maybe finding it, but probably not.\u00a0 (The not-so handy acronym &#8220;STRES&#8221; is not in the dictionary, and really doesn&#8217;t help much, so don&#8217;t bother with it.)<\/p>\n<p>The reason for each of these questions is:<\/p>\n<ol>\n<li>Sender e-mail address: appears in log files, might be invalid in some way, might be blacklisted.\u00a0 Actually, the envelope sender appears in the logs, and the From: header does 
not.<\/li>\n<li>Recipient e-mail address: appears in log files, might be invalid in some way, might be affected by DNS, IP routing, SMTP server failure, local delivery failure (e.g. long queue), spam filtering failure or &#8220;incorrect&#8221; success.\u00a0 This is the most important part of the report: if you have to choose just one thing to know about the mail failure you are diagnosing, choose the recipient address.<\/li>\n<li>Time of sending: appears in the log files, tells you which particular log file to look in, tells you whether it&#8217;s possibly a transient failure, or related to an outage or an update.<\/li>\n<li>SMTP server that handled the mail: you have to start somewhere &#8211; the last place the mail was seen alive is a good place &#8211; sometimes the mail has not even left the sending computer, and the answer to this question tells you that.<\/li>\n<li>What was the error: it is amazing how many times I get asked to solve a problem with nobody telling me what the actual problem is.\u00a0 It&#8217;s obvious to the person asking the question, but somehow the telepathic message never gets through.\u00a0 Knowing the actual error makes such a difference.\u00a0 Sometimes the error is not even a failure &#8211; &#8220;the mail system is broken&#8221; can mean &#8220;I am getting my mail, and some of it is spam.&#8221;<\/li>\n<\/ol>\n<p>So what kind of errors mess up your mail?<\/p>\n<ul>\n<li>Sometimes mail disappears into the void &#8211; although if you know which <strong>SMTP server<\/strong> last handled it, the void is less formless &#8211; especially if that server is under your administration.<\/li>\n<li>Sometimes that server cannot deliver it because of blacklisting, greylisting, or load.\u00a0 Knowing the last <strong>SMTP server<\/strong> and the <strong>time<\/strong> can identify the problem.\u00a0 Often enough, the <strong>error<\/strong> says what the problem was.<\/li>\n<li>The weirdest failure is people who will not take mail from you, 
because they cannot verify the sender address, and that is because their system is misbehaving (e.g. it doesn&#8217;t speak SMTP properly, like exim&#8217;s verify_sender callout).<\/li>\n<li>The same mail is received over and over: at the bottom of the pile you find that either it was sent over and over (not often though), or it was forwarded in some kind of loop, or the end-user system is downloading it over and over.\u00a0 When you have the <strong>sender<\/strong> and <strong>recipient<\/strong> addresses, you can verify that there is only one of these mails in the end-user mailbox, and that it is marked as already-read.<\/li>\n<li>DNS failures: the worst DNS failures are the ones where someone thinks they have two DNS servers, but they have just one overloaded virtual machine with two IP addresses and a congested and contended network interface.\u00a0 When you have the <strong>sender<\/strong> and <strong>recipient<\/strong> addresses, you can test the DNS configuration for each of them.<\/li>\n<li>Sometimes the system administrator deletes your mail.\u00a0 Yep.\u00a0 If you&#8217;re sending spam, you can expect that.\u00a0 If there&#8217;s a large system failure (e.g. 
one server out of 17 fails for a week after building up a large backlog), then working through the backlog of mail can be impossible because of the limited CPU time and network bandwidth available on the sender and the recipients.\u00a0 Late delivery generates its own problems and queries in any case.\u00a0 It&#8217;s one of those &#8220;if it&#8217;s important they will phone again&#8221; moments.\u00a0 Anti-spam systems discriminate against old mail as well &#8211; if it spent its time sitting in a mail queue somewhere, it may be because it overwhelmed that system&#8217;s capacity by its sheer spamminess (spamosity?).\u00a0 When you have the last <strong>SMTP server<\/strong> you can properly assign blame.\u00a0 If you are to blame, you can confess.<\/li>\n<\/ul>\n<p>Additionally, since these are the <strong>correct<\/strong> questions for almost any mail problem, you can reject all mail problem reports that do not include these details.\u00a0 \u263a\u00a0 (Don&#8217;t try this at home, folks.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I used to face a lot of problems with e-mail of this sort: The e-mail is not working Person X is not getting an important mail There&#8217;s something wrong with the mail system Please check that the mail servers are &hellip; <a href=\"https:\/\/www.mcgill.org.za\/stuff\/archives\/979\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[179,180,190,181],"class_list":["post-979","post","type-post","status-publish","format-standard","hentry","category-stuff","tag-mail","tag-smtp","tag-stuff","tag-troubleshooting"],"_links":{"self":[{"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/posts\/979","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/comments?post=979"}],"version-history":[{"count":3,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/posts\/979\/revisions"}],"predecessor-version":[{"id":982,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/posts\/979\/revisions\/982"}],"wp:attachment":[{"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/media?parent=979"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/categories?post=979"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mcgill.org.za\/stuff\/wp-json\/wp\/v2\/tags?post=979"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}