Password scanning
2008-05-09 00:11

AJ's idea of scanning email for passwords has provoked a lot of strange suggestions, online and in person: elaborate password-cracking clusters, phishing my own users, hardware acceleration, and so on. (When AJ suggested the idea, my first thought was to use a Bloom filter to find passwords, since you can't recover the contents of a Bloom filter without brute force - but it's very hard to delete items from a Bloom filter, which in this scenario must be done whenever any user changes their password. No, that too is not going to work.)
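To make the deletion problem concrete, here is a toy Bloom filter in C (purely illustrative: the hash, sizes and password are made up). Each entry sets a handful of shared bits, so there is no safe way to remove one entry without risking erasing others:

    /* Toy Bloom filter: shows why deleting an entry is awkward. */
    #include <stdint.h>
    #include <stdio.h>

    #define FILTER_BITS (1u << 20)      /* 1 Mbit filter */
    #define NUM_HASHES  4

    static uint8_t filter[FILTER_BITS / 8];

    /* FNV-1a, salted per hash function; any decent hash would do. */
    static uint32_t hash(const char *s, uint32_t salt) {
        uint32_t h = 2166136261u ^ salt;
        while (*s) {
            h ^= (uint8_t)*s++;
            h *= 16777619u;
        }
        return h % FILTER_BITS;
    }

    static void bloom_add(const char *word) {
        for (uint32_t i = 0; i < NUM_HASHES; i++) {
            uint32_t bit = hash(word, i);
            filter[bit / 8] |= 1u << (bit % 8);
        }
    }

    static int bloom_maybe_contains(const char *word) {
        for (uint32_t i = 0; i < NUM_HASHES; i++) {
            uint32_t bit = hash(word, i);
            if (!(filter[bit / 8] & (1u << (bit % 8))))
                return 0;               /* definitely not in the filter */
        }
        return 1;                       /* probably in the filter */
    }

    int main(void) {
        bloom_add("hunter2");           /* one user's (invented) password */
        printf("%d\n", bloom_maybe_contains("hunter2"));      /* 1 */
        printf("%d\n", bloom_maybe_contains("correcthorse")); /* almost always 0 */
        /* There is no safe bloom_remove(): clearing hunter2's four bits
         * might also clear bits belonging to other users' passwords. */
        return 0;
    }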
The whole idea is very borderline: is it worth spending significant effort when the phishing success rate is much less than 1% per incident, and the current fashion for phishing universities is probably short-lived? (This week we got about 2000 messages from the phishers and 5 users replied.) On the other hand it would be very interesting to find out what the detection rate would be. Would there be any false positives, i.e. unintentional password appearances? What is the legitimate positive rate, e.g. senior staff sending their passwords to their PAs? (The latter is against our AUP but it is common practice.) How much password sharing is there outside the anticipated scenarios?
It's worth making the point that it isn't hard for me to get my users' plaintext passwords: I could just instrument our various SASL implementations. But we (my colleagues and I) don't do that, because sysadmins are safer not knowing their users' secrets. This is why we don't log message subjects, and why our accidental-deletion recovery tools don't require seeing any message contents. We don't look at the contents of a user's account in any detail without permission from that user - and even then we'd very much prefer not to know that we're recovering email that was deleted by their jilted lover, who obtained their password from a shared browser history.
From my point of view, the interesting thing is that it is feasible to detect when a user is sending their own password in a message, using just a standard Unix encrypted password file and some simple code: crypt every word the user sends and compare the result with that user's crypted password. This is just a few hundred lines of C, including the hooks into our authentication database and email content scanner, and the logic for choosing which words to check. My prototype code can check 2000 MD5-crypted words per second per CPU, and should be able to skip most words in a message since they fall outside our password rules.
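To sketch what that per-word check might look like (illustrative only: the password rules, names and hash below are invented, not the real prototype), something along these lines would do, using the standard crypt() routine and linking with -lcrypt:

    #include <crypt.h>      /* crypt(); link with -lcrypt */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Skip words that can't be passwords under some hypothetical rules:
     * 8-16 characters, at least one letter and one digit. */
    static int plausible_password(const char *word) {
        size_t len = strlen(word);
        int has_alpha = 0, has_digit = 0;
        if (len < 8 || len > 16)
            return 0;
        for (size_t i = 0; i < len; i++) {
            if (isalpha((unsigned char)word[i])) has_alpha = 1;
            if (isdigit((unsigned char)word[i])) has_digit = 1;
        }
        return has_alpha && has_digit;
    }

    /* stored is the sender's crypted password, e.g. "$1$salt$hash" for
     * MD5 crypt; crypt() reuses the salt and algorithm embedded in it. */
    static int word_matches_password(const char *word, const char *stored) {
        if (!plausible_password(word))
            return 0;
        const char *c = crypt(word, stored);
        return c != NULL && strcmp(c, stored) == 0;
    }

    int main(void) {
        const char *stored = "$1$abcdefgh$................";  /* placeholder hash */
        printf("%d\n", word_matches_password("hello", stored));        /* 0: too short */
        printf("%d\n", word_matches_password("s3kr1tpass99", stored)); /* 1 only on a match */
        return 0;
    }

A scanner built this way would loop over every word of each outgoing message, looking up the sender's crypt string once per message; skipping implausible words before calling crypt() is what keeps the cost down.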
There has been a lot of traffic on various mailing lists about these phishing attacks, especially notifications of new reply addresses. But we don't want to be in the business of maintaining blacklists of addresses our users mustn't send to. Password scanning seems like a simple way of avoiding that tar pit, which I think is the main attraction. So why do I think it's absurd?
no subject
Date: 2008-05-09 20:51 (UTC)

We want latency to be a few seconds typically. At the moment the delay can be quite variable because of the way the batch-mode scanner works, but this summer I'm replacing it with an SMTP-time scanner which I hope should improve that. (Though it has the disadvantage of handling overload much less gracefully... I hope we never see that in action!)
Cyrus (our message store software) supports ACLs and shared folders. However, our current partitioning and replication architecture prevents us from using those features.
This is partly because Cyrus's usual cluster architecture relies on a separate mailbox location server (mapping mailbox names to message store machines), which we don't like because it's a single point of failure. We have our own proxy which maps users to message stores, and is much simpler and more resilient than Cyrus's. However we might be able to adapt the Cyrus proxy to use a more static mailbox->server map so we could avoid the SPOF and enable users to access mailboxes that aren't on their home servers.
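To illustrate the sort of thing a static map means here (hypothetical names and a made-up scheme, not the real proxy nor anything from Cyrus itself): each mailbox belongs to its owning user, and each user is pinned to a fixed backend, so the proxy can route a request without consulting a location server at all.

    #include <stdio.h>
    #include <string.h>

    static const char *backends[] = {
        "store-0.example.org",
        "store-1.example.org",
        "store-2.example.org",
    };

    /* Extract the owner from a mailbox name like "user.fred.sent-mail". */
    static void mailbox_owner(const char *mailbox, char *owner, size_t size) {
        size_t i = 0;
        if (strncmp(mailbox, "user.", 5) == 0)
            mailbox += 5;
        while (mailbox[i] != '\0' && mailbox[i] != '.' && i + 1 < size) {
            owner[i] = mailbox[i];
            i++;
        }
        owner[i] = '\0';
    }

    /* Hash the owner to pick a backend; the mapping never changes at
     * run time, so there is no location server to be a point of failure. */
    static const char *backend_for(const char *mailbox) {
        char owner[64];
        unsigned h = 5381;
        mailbox_owner(mailbox, owner, sizeof owner);
        for (const char *p = owner; *p != '\0'; p++)
            h = h * 33 + (unsigned char)*p;
        return backends[h % (sizeof backends / sizeof backends[0])];
    }

    int main(void) {
        printf("%s\n", backend_for("user.fred.sent-mail"));
        printf("%s\n", backend_for("user.fred"));   /* same backend as above */
        printf("%s\n", backend_for("user.anne"));
        return 0;
    }

The point is that the proxy can work out, from the mailbox name alone, which server to contact - including for a mailbox the requesting user doesn't own - without reintroducing a single point of failure.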
The other aspect of the problem is that our original replication protocol had built-in assumptions about how mailboxes relate to accounts, e.g. with respect to the database that keeps track of which user has read which messages. The replication code was reworked when it was incorporated into the official Cyrus codebase, so this might have been fixed, but I am not certain.
no subject
Date: 2008-05-10 04:59 (UTC)

Fortunately, usually over an SSL/TLS tunnel :-)