more HELO statistics
2004-12-02 14:29
Counting all offered messages (rejected or not), we saw 1 447 252 different HELO names in the last month. If I count the number of dots in each name, the resulting histogram is as follows. The small end (0-2 dots) is inflated by incompetence and forgery. The big end (>10 dots) is 99.99% abuse.
25765
450511 .
218188 ..
432343 ...
197647 ....
33647 ..... 5
28485 ......
19790 .......
4582 ........
2040 .........
3069 .......... 10
7005 ...........
9483 ............
7722 .............
4390 ..............
1840 ............... 15
568 ................
150 .................
23 ..................
3 ...................
1 .................... 20
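For anyone wanting to reproduce this kind of table, here is a minimal Python sketch of the counting. It is not the script used for this post; the function names `dot_histogram` and `format_histogram` are made up for illustration, and it assumes the HELO names are already available as a list of strings (for example, pulled out of the mail logs).
```python
from collections import Counter
from typing import Iterable, List


def dot_histogram(names: Iterable[str]) -> Counter:
    """Map dot count -> number of distinct HELO names with that many dots."""
    # set() deduplicates, since the post counts *different* HELO names.
    return Counter(name.count(".") for name in set(names))


def format_histogram(hist: Counter) -> List[str]:
    """Render one row per dot count, labelling every fifth row."""
    if not hist:
        return []
    rows = []
    for dots in range(max(hist) + 1):
        row = f"{hist.get(dots, 0)} {'.' * dots}".rstrip()
        if dots and dots % 5 == 0:
            row += f" {dots}"
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = ["localhost", "example.com", "mail.example.com", "example.com"]
    print("\n".join(format_histogram(dot_histogram(sample))))
```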
Of the messages we accept, 274 902 different HELO names were used (19% of the total). If I count the number of dots in each name, the resulting histogram looks like this:
5723
69182 .
84906 ..
75131 ...
26182 ....
4723 ..... 5
4436 ......
2686 .......
279 ........
123 .........
123 .......... 10
317 ...........
447 ............
320 .............
211 ..............
87 ............... 15
21 ................
4 .................
1 ..................
A lot of these are clearly bogus, for example 80 characters of random
words concatenated with an IP address, like
Antigone.meter.ernet.ne.jpsouthparkmail.comnetlane.comlouiskoo.comjpopmail.comtw60.186.213.104
or a random collection of concatenated domain names, like
cave.ngs.ouse.hello.nlsammail.compcmail.com.twsouthparkmail.com
(These should obviously be added to my HELO heuristics!) After removing them, there are 272 890 HELO names. If I count the number of dots in each name, the resulting histogram looks like this:
5723
69182 .
84905 ..
75130 ...
26176 ....
4688 ..... 5
4334 ......
2521 .......
179 ........
47 .........
0 .......... 10
2 ...........
3 ............
This still includes various stupidities. 26631 of the 37272 single dot names ending in com|net|org have no name servers so are invalid. Of the unfiltered list, 208323 of the 288884 com|net|org names are invalid.
Edit: Actually, if you use less-strict DNS validity checking those numbers are 22015 (instead of 26631) and 206556 (instead of 208323).
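A tally like the one above could be sketched as follows. This is an illustration under assumptions, not the actual method: `tally_invalid` is a hypothetical name, and the DNS check is passed in as a caller-supplied function `has_name_servers`, because how strictly to judge "has name servers" is exactly what changes the numbers in the edit above.
```python
import re
from typing import Callable, Iterable, Tuple

# Exactly one dot, ending in com, net or org (case-insensitive).
SINGLE_DOT_CNO = re.compile(r"[^.]+\.(?:com|net|org)", re.IGNORECASE)


def tally_invalid(
    names: Iterable[str],
    has_name_servers: Callable[[str], bool],
) -> Tuple[int, int]:
    """Return (invalid, total) for single-dot com|net|org names.

    has_name_servers is an assumed callback that performs the NS lookup;
    a name counts as invalid when the callback returns False.
    """
    candidates = [n for n in set(names) if SINGLE_DOT_CNO.fullmatch(n)]
    invalid = sum(1 for n in candidates if not has_name_servers(n))
    return invalid, len(candidates)
```
Keeping the lookup as a parameter means the strict and less-strict validity checks can be compared on the same list of names.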
no subject
Date: 2004-12-03 03:00 (UTC)
As a result of my recent data mining :-) I've added three new HELO blocking rules:
block if the HELO name contains a double dot
block if the HELO name is 55 characters or more
(this deals with a large proportion of class D)
The final one is based on DNS lookups. Take the final two components of the name. If the top level component has name servers but the second level component does not, it's a bogus name. If the top level component has no name servers it's incompetence rather than malice so we give them the benefit of the doubt. This should deal with a large proportion of the abusive names.
The other large component of class D is consumer Internet access addresses, which I prefer to leave to blacklist maintainers to deal with.
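The three rules can be sketched in Python roughly as follows. This is only an illustration of the logic, not the mail server configuration actually in use: `should_block_helo` and `has_name_servers` are hypothetical names, the DNS lookup is left to a caller-supplied function, and passing names with fewer than two components through to other checks is an assumption, since the rules above do not cover that case.
```python
from typing import Callable

MAX_HELO_LENGTH = 55  # names of 55 characters or more are blocked


def should_block_helo(name: str, has_name_servers: Callable[[str], bool]) -> bool:
    """Apply the three HELO rules; True means the name should be blocked."""
    # Rule 1: a double dot anywhere in the name.
    if ".." in name:
        return True
    # Rule 2: overly long names.
    if len(name) >= MAX_HELO_LENGTH:
        return True
    # Rule 3: DNS check on the final two components.
    labels = name.rstrip(".").split(".")
    if len(labels) < 2:
        return False  # assumption: not covered by these rules
    top_level = labels[-1]
    second_level = ".".join(labels[-2:])
    if not has_name_servers(top_level):
        # Incompetence rather than malice: benefit of the doubt.
        return False
    return not has_name_servers(second_level)
```
With a stub lookup that knows only `com` and `example.com`, `mail.example.com` passes, `mail.no-such-zone.com` is blocked, and `host.bogustld` passes because its top level component has no name servers.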