Warmshowers Mandrill Mail Handling
We use Mandrill for mail handling, both incoming and outgoing. See Randy for the credentials (we have only one set).
Incoming mail is super-critical and a fundamental mission of the site. Most member communication now goes through this route, so if it breaks, things are in really bad shape. There have been times when a single large incoming email (with a huge attachment) blocked all other incoming mail. At one point, and perhaps still, a single failed hit on our incoming webservice could block all mail until it was resolved.
Incoming mail going to [email protected] (the reply-to on all email from the site) gets routed via SMTP to Mandrill, and Mandrill then shoots us a hit on the service at https://www.warmshowers.org/services/rest/mandrill_events with the full details about the message. (The Mandrill admin links for this are https://mandrillapp.com/inbound/routes?domain=reply.warmshowers.org and current events can be viewed at https://mandrillapp.com/settings/webhooks/batches?id=3). The mandrill_incoming module is configured at https://www.warmshowers.org/admin/build/services/mandrill_incoming, but I doubt you'd need to change it.
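For anyone debugging what Mandrill actually sends to the webhook, here is a minimal Python sketch of unpacking one of those hits. It assumes Mandrill's usual webhook format (a form-encoded POST with a single `mandrill_events` field holding a JSON array, and inbound events carrying a `msg` object); the function name and the returned keys are made up for illustration, and this is not the code in the mandrill_incoming module.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_inbound_events(form_body: str) -> list:
    """Summarize each inbound event found in a Mandrill webhook POST body."""
    fields = parse_qs(form_body)
    # Assumption: the whole batch arrives as JSON in one form field.
    events = json.loads(fields.get("mandrill_events", ["[]"])[0])
    summaries = []
    for event in events:
        if event.get("event") != "inbound":
            continue  # ignore non-inbound event types
        msg = event.get("msg", {})
        summaries.append({
            "to": msg.get("email"),            # address the mail was sent to
            "from": msg.get("from_email"),
            "subject": msg.get("subject"),
            "text": msg.get("text"),
        })
    return summaries


if __name__ == "__main__":
    # Hypothetical sample batch, built locally for demonstration only.
    sample = urlencode({"mandrill_events": json.dumps([
        {"event": "inbound",
         "msg": {"email": "to@example.org", "from_email": "member@example.org",
                 "subject": "Re: hello", "text": "Thanks!"}},
    ])})
    print(parse_inbound_events(sample))
```

A batch downloaded with "View batch" could be fed through something like this to see which message in it is causing trouble.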
If you get an alert about Mandrill not being able to get through to the webhook, you can visit
- The events link to see what the problem is; the options are to retry, stop trying, etc.
- The Warmshowers admin/reports/dblog (choose mandrill_incoming and perhaps other).
Note that Nagios monitoring is also set up for incoming mail, and after 2 hours of no mail received, there will be an email alert from Nagios.
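The 2-hour rule can be sketched as a Nagios-style freshness check. This is only an illustration of the idea, not the check that is actually installed: the function name is hypothetical, and how the "last mail received" timestamp is obtained is left to the caller. The exit codes are the standard Nagios plugin ones.

```python
from datetime import datetime, timedelta, timezone

OK, CRITICAL, UNKNOWN = 0, 2, 3      # standard Nagios plugin exit codes
MAX_SILENCE = timedelta(hours=2)     # threshold mentioned in the text


def check_incoming_mail(last_received, now=None):
    """Return (exit_code, status_line) for a freshness check on incoming mail."""
    if last_received is None:
        return UNKNOWN, "UNKNOWN: no incoming mail timestamp available"
    now = now or datetime.now(timezone.utc)
    silence = now - last_received
    if silence > MAX_SILENCE:
        return CRITICAL, f"CRITICAL: no incoming mail for {silence}"
    return OK, f"OK: last incoming mail {silence} ago"
```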
Successful incoming email automatically gets transformed into a private message on the site, and that automatically results in a notification to the recipient.
Unfortunately, Mandrill refuses to process additional inbound webhooks if one fails - it just keeps trying and trying that one, blocking all other mail. We have opened tickets and followed them extensively, but they don't see the problem here.
In this case, the solution is to get the misbehaving batch out of the way. To do that, go to https://mandrillapp.com/settings/webhooks/batches?id=3 and
- (optional) Click "View batch" on the misbehaving batch and download it, noting the curl recipe provided.
- Return to https://mandrillapp.com/settings/webhooks/batches?id=3 and click "Give up" on the misbehaving batch.
In most cases the type of Mandrill inbound failure can be diagnosed and worked around. For example, for a "413 Request Entity Too Large", the allowed request size can be increased in nginx (client_max_body_size) and in PHP under php5-fpm (post_max_size).
Incoming messages to the Mandrill webhook need to meet a number of criteria. They have to have a valid hash, have to be from a user that has privileges to post a message, and have to have a valid message to reply to. If any of these checks fails, a notification is sent to an admin. Most of the time it's just an invitation to LinkedIn or something stupid like that ("Ralph Schmage has invited you to check out Dropbox [MANDRILL FAILED]" or "I've lost all my money and I'm stuck in Manilla [MANDRILL FAILED]"). But on occasion a member has mangled an email somehow, and it has to be manually put together and sent on, as with "Mail clients that can't do reply-to etc." below.
Email from the site has a reply-to address composed of the routing hash and @reply.warmshowers.org, like "messages+mid.342659.1.eMCkb8MrYClHd4nq0LhGpE8HHXB=@reply.warmshowers.org". The same mid routing hash is also included at the end of the subject line.
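Since the routing hash can show up either in the address or in the subject, here is a small Python sketch for pulling it out of any piece of text. The pattern is inferred only from the examples on this page ("mid.", two numeric fields, then a base64-looking hash); what the two numbers mean is not stated here, so the field names are deliberately neutral, and all names in the code are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the examples on this page:
#   mid.<digits>.<digits>.<base64-like hash, possibly ending in "=">
TOKEN_RE = re.compile(r"mid\.(\d+)\.(\d+)\.([A-Za-z0-9+/]+=*)")


class RoutingToken(NamedTuple):
    first_id: int    # first numeric field (meaning not documented here)
    second_id: int   # second numeric field (meaning not documented here)
    digest: str      # trailing hash


def find_routing_token(text: str) -> Optional[RoutingToken]:
    """Return the first routing token found in an address or subject, if any."""
    match = TOKEN_RE.search(text)
    if match is None:
        return None
    return RoutingToken(int(match.group(1)), int(match.group(2)), match.group(3))
```

This only finds something that looks like a token; it says nothing about whether the hash is valid, which is checked on the site side.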
The vast majority of our mail is handled correctly: the member's email client replies to the special messages... reply-to address, and that to: address is then used for routing.
A few people have email clients that don't respect the reply-to header; those emails land in the [email protected] account. There is a filter rule on the [email protected] account which looks at the subject (which also has the routing hash) and if it finds the mid+routing hash there, it forwards it on to [email protected]. This all seems to work most of the time. There are a few cases where they reply-all and this results in two posts in private messaging.
Then there are just a few people with email clients which don't respect reply-to AND break the subject header in half, losing the routing hash. This happens every couple of days, and when it does I copy the body into an email, copy the routing hash into the subject, and send the whole thing to [email protected]. That just solves it.
It's easy enough to re-send problem mail; the key is stripping out any garbage of multiple replies, etc:
- Mail to [email protected]
- Change the subject to the mid hash you'll find somewhere in the message, like [mid.1157433.88386.8a5NbQ2XJOhQgaJWA+fQg/82tNN=]
- Leave in the body only the actual message, nothing else.
It doesn't matter who you send the mail as, it will be interpreted and dumped into the proper message thread.
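The re-send steps above can be sketched in Python with the standard library's email package. This only builds the message; sending it is left to whatever mail client or SMTP setup you use. The function name is hypothetical, and the addresses are parameters because the real inbound address is not reproduced here.

```python
from email.message import EmailMessage


def build_resend(routing_token: str, body: str, to_addr: str, from_addr: str) -> EmailMessage:
    """Build a clean re-send: routing token as the subject, only the real message as body."""
    msg = EmailMessage()
    msg["To"] = to_addr                      # the inbound routing mailbox
    msg["From"] = from_addr                  # any sender works, per the text
    msg["Subject"] = f"[{routing_token}]"    # e.g. [mid.<...>.<...>.<hash>]
    # Strip stray whitespace; quoted replies must already be removed by hand.
    msg.set_content(body.strip() + "\n")
    return msg
```

For example, `build_resend("mid.1157433.88386.8a5NbQ2XJOhQgaJWA+fQg/82tNN=", text, inbound_address, your_address)` produces a message whose subject carries the token in brackets, as in the step above.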
Outgoing mail is super easy - The site just gives outgoing mail to postfix locally, which is set up to send all mail to Mandrill using SMTP.
The great thing about Mandrill is that it gives us delivery reports about every single email, so it's really easy to debug what's going on with a particular user (at least to the point of delivery to their provider... no way to find out why it went into spam folder, etc.)
If you log into http://mandrillapp.com you get a dashboard with a number of search options. Just putting an email address into the search box is usually good enough.
On occasion, a user will mark mail as spam. I just visit the Rejection blacklist (Gear Icon->Rejection Blacklist) and make sure that anything that is on the list because it was marked as spam is removed.
If a user was bouncing but the bounce is resolved, you might need to also remove them from the rejection blacklist. This is unusual though. Most bouncers never recover.
2014-01-15 Notification from Mandrill: [Mandrill Alert] Webhook Failing: https://www.warmshowers.org/services/rest/mandrill_events (403, signature validation)
This type of notification requires immediate action, because all reply email for Warmshowers.org is blocked by an event of this type.
- The Inbound routes page and view events show the story from Mandrill's side. They're getting a 403 denied from us.
- I visited the dblog and filtered by "mandrill_incomin" to see what the situation was.
- I note that one message is failing signature validation, resulting in a 403 and blocking all other messages. I believe this can only be a bug on Mandrill's side.
- Immediately prior to the signature validation watchdog error is a full transcription of the message.
- I created and sent an email with the body being nothing but the relevant part of the body copied from the message transcription, the to: being [email protected], and the subject being the mid string (from the email: Routing: mid.365583.39325.JBRemXilquJyjjl/k0PVqL8stmQ=). This will then go through with the same content. I could also have used the message link in the email and pasted the info into the message on the site, explaining who I was and why I was pasting the message.
- I used the view/download option on Mandrill's site to save away the message for possible future debugging with them.
- I then clicked the "Give up" button on the view events page on Mandrill for that message, hoping that the next one doesn't have this problem.
- Processing then continued just fine on incoming emails.
- I sent the problem email to [email protected] by getting the body and mid information from the https://www.warmshowers.org/admin/reports/dblog. See this example fixed message. Just sending the message with the correct routing subject, a body, and to:[email protected] does the job.
Note that this may have been the same problem described in the body truncation issue, #325, which was apparently a problem on our side, although I have no idea why it happens.
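For reference when chasing a signature validation failure, here is a sketch of the signature scheme as I understand it from Mandrill's webhook documentation: take the webhook URL, append every POST key and value in key order with no delimiters, HMAC-SHA1 that with the webhook's key, and base64 the result; Mandrill sends its version in the X-Mandrill-Signature header. Whether mandrill_incoming does exactly this is an assumption, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def mandrill_signature(webhook_key: str, url: str, params: dict) -> str:
    """Compute the expected signature for a webhook POST (assumed Mandrill scheme)."""
    signed = url
    for key in sorted(params):           # POST variables in alphabetical key order
        signed += key + params[key]      # key and value appended with no delimiter
    digest = hmac.new(webhook_key.encode(), signed.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def signature_is_valid(header_value: str, webhook_key: str, url: str, params: dict) -> bool:
    """Compare the header value against the expected signature in constant time."""
    expected = mandrill_signature(webhook_key, url, params)
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

Under this scheme the signature covers the full POST body, so a body that is altered or truncated anywhere between Mandrill and the check would fail validation even though the message itself was legitimate.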
[MANDRILL FAILED] means that we weren't able to process the incoming message. Today we got one of these. Some pieces:
Hi theoldestonehouse, you have received a message on Warmshowers.org from Allan in Florida:
> Routing email: messages+mid.366051.36052.KYpy9cKC/[email protected]
> Site link: https://www.warmshowers.org/user/36052/messages/view/364790
> Sender: https://www.warmshowers.org/user/44652
- The dblog filtered by mandrill_incomin shows right away what happened. The message id was invalid, which is often a result of a deleted user.
- Clicking the user link in the message (user/44652) shows page not found.
- Clicking the message link shows page not found.
- Searching recent messages we see that Allan in Florida deleted his account today.
No further action is required. Had this been an important message we might have either tried to get it to Allan or notified the sender that it could not be delivered.
Refer to this email example
PM email notifications are from [email protected] and Reply-To: a special hash with full routing information. Example: messages+mid.966823.54051.b/3cqkb/rggngp/[email protected]. Normal, working mail clients will reply to that address, which is routed to Mandrill and all is well. Then there are some mail clients which fail to respect the reply-to, and then email ends up at [email protected]. For those, we have a gmail filter which forwards the mail to [email protected], where it's routed based on the Subject header. However, there exist mail clients which manage to both not reply to the reply-to AND break the subject header. Then we have no routing, and it just lands in the [email protected] mailbox.
The email example referenced above is an example of that. To deliver it, we just have to send a valid message. See this example fixed message. Just sending the message with the correct routing subject, a body, and to:[email protected] does the job.
It would also be perfectly acceptable to follow the message link in this message, paste the body, and explain that you're an admin delivering a message that was malformed.
Note that there is yet another case where people reply-all to a message, which results in an email coming into [email protected] (but it's also addressed to the full routing address). In this case, the recipient will get two copies - the first delivered by normal mandrill routing, the second as a result of [email protected] email forwarding it to [email protected].
No action is required on this type of problem.