Discussion:
[Sqlgrey-users] warn_if_reject to populate the db
/dev/rob0
2005-06-26 13:01:12 UTC
Permalink
I've got it going now; it was rather easy. I started in test mode, with
"warn_if_reject check_policy_service inet:127.0.0.1:2501" in main.cf.
Will this really populate my database? Since nothing is actually being
rejected, senders never retry, so everything looks like non-returning
triplets. Nothing is autowhitelisted until the first appearance after the
greylist period, correct?

Other things I'm wondering, and please forgive me if they're in the
archives[1]:

1. localhost:2501 vs. Unix socket
Wouldn't a socket be slightly faster than TCP?

2. Running under control of master(8)
That would be convenient, start and stop with Postfix; are there other
benefits? Why the standalone choice?

3. Database population commands
I'm totally lost with SQL (hence the poor choice of mysql), can someone
help with the manual commands I'd use to add to the database?

4. Database population scripts
Is there something I could run against user's maildirs which would add
entries to the AWL? If not should I commission such a project from my
private farm of Perl coders[2]; I mean, would there be interest?

5. External files vs. database tables
$conf_dir/clients_*_whitelist* - why flat files vs. having additional
database tables?

6. dyn_fqdn.regexp
That's quite an expression. I didn't figure the whole thing out, but I
did look for a string "\.res\." which is commonly used for dynamic
space, e.g., *.res.rr.com. (residential customers.) Perhaps the second
dot should be any non-alpha character (-, _, ., digit), and to be safe
there should be at least 2 domain segments following and at least one
segment preceding (implied by the leading dot.)

7. Coordination with infidels
Greylisters, regardless of their MTA and choice of implementation, are
all in this together. We're all going to run into the same issues with
stupid and/or big providers which have problems getting real mail
through greylisting. I didn't see a list or forum at greylisting.org.
What is being done to coordinate with outsiders? I personally have
subscribed to the postgrey list, where just this morning a thread of
general grey interest was started (well, just one post so far.)

8. Beyond grey
This is a biggie which probably warrants its own thread. This is all
about spam abatement. What about integrating other antispam strategies
under the roof of the same policy service? Yes, this belongs in its own
thread. I'll write more of my thoughts about that later.

Thanks, Lionel, this looks good so far. I went live with a small but
heavily-spammed domain yesterday evening, and no spam has been seen
there since. (The sqlgrey is last in a long list of restrictions with
numerous RBL checks.)



[1] Is it just me, or are Sourceforge list archives atrocious?

[2] a/k/a /dev/wife. I might need some help with #3 above to get her
started, but OTOH she has some PostgreSQL knowledge.
--
mail to this address is discarded unless "/dev/rob0"
or "not-spam" is in Subject: header
Michel Bouissou
2005-06-26 13:37:25 UTC
Permalink
Post by /dev/rob0
I've got it going now, was rather easy. I started in test mode, with
"warn_if_reject check_policy_service inet:127.0.0.1:2501" in main.cf.
Will this really populate my database? Since nothing is being rejected,
it looks like non-returning triplets. Nothing is autowhitelisted until
the first appearance after the greylist period, correct?
You're right, it won't populate much. Only senders that come back several
times (because they send several messages to the same recipient) within a
24h period will end up in the AWL tables.

But I'm not sure this "warn_if_reject" temporary strategy is of much
interest unless you operate a _very_ high volume server. You can just as
well start from scratch with an empty DB and let it populate by itself.
Remember that only the 1st mail from any given sender will be delayed...
Post by /dev/rob0
1. localhost:2501 vs. Unix socket
Wouldn't a socket be slightly faster than TCP?
Probably, but it only works if both run on the same machine. TCP is more
general, so to support sockets Lionel would need to implement both methods.
Given the small amount of data transferred between Postfix and sqlgrey for
a given connection, using sockets would probably make a difference only on
a very high volume system.
Post by /dev/rob0
2. Running under control of master(8)
That would be convenient, start and stop with Postfix; are there other
benefits? Why the standalone choice?
Running under the control of master would be interesting to me as well.
Post by /dev/rob0
3. Database population commands
I'm totally lost with SQL (hence the poor choice of mysql), can someone
help with the manual commands I'd use to add to the database?
Reading the fine MySQL doc would probably help with the basic SQL commands,
but you'd be better off letting the DB populate by itself.
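If you just want to have a look at what ends up in there, something like
this should do (a sketch only: "sqlgrey" is the default database name and
connect / from_awl / domain_awl are the tables sqlgrey creates, so adjust
to your own setup):

USE sqlgrey;
SELECT COUNT(*) FROM connect;       -- triplets still waiting for a retry
SELECT * FROM from_awl LIMIT 20;    -- per-sender auto-whitelist
SELECT * FROM domain_awl LIMIT 20;  -- per-domain auto-whitelist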

BTW, why state that MySQL is a poor choice ?
Post by /dev/rob0
4. Database population scripts
Is there something I could run against user's maildirs which would add
entries to the AWL? If not should I commission such a project from my
private farm of Perl coders[2]; I mean, would there be interest?
I don't think there's any interest in trying to artificially populate the DB.
Just let it run ;-)
Post by /dev/rob0
6. dyn_fqdn.regexp
That's quite an expression.
;-)
Post by /dev/rob0
I didn't figure the whole thing out, but I
did look for a string "\.res\." which is commonly used for dynamic
space, e.g., *.res.rr.com. (residential customers.) Perhaps the second
dot should be any non-alpha character (-, _, ., digit), and to be safe
there should be at least 2 domain segments following and at least one
segment preceding (implied by the leading dot.)
I don't pretend the regexp is perfect, it's only a heuristic, but it would
only be worth adding your modification if you find samples of existing
hostnames that don't get properly classified. (i.e. your example may
already be classified correctly if the last byte of the IP address is part
of the name.) The more things you add to the regexp, the longer it will
take to process, and the higher the chance it collides with other,
non-dynamic names in an undesired manner.

But feel free to experiment with your own copy and suggest improvements
that prove useful...
Post by /dev/rob0
7. Coordination with infidels
Greylisters, regardless of their MTA and choice of implementation, are
all in this together. We're all going to run into the same issues with
stupid and/or big providers which have problems getting real mail
through greylisting.
So far, I don't have any problem running SQLgrey, and problematic servers'
IPs/names can be reported to this list for inclusion in the provided (and
auto-updated) whitelists...
Post by /dev/rob0
8. Beyond grey
This is a biggie which probably warrants its own thread. This is all
about spam abatement. What about integrating other antispam strategies
under the roof of the same policy service? Yes, this belongs in its own
thread. I'll write more of my thoughts about that later.
I think about it the Unix way. I prefer to use several distinct tools of my
choice, each one doing _one_ thing and doing it well. I wouldn't like a
bigger system that integrates different methods; I personally prefer to do
my own cooking.

Postfix can call several "policy services" without any problem...
Post by /dev/rob0
Thanks, Lionel, this looks good so far. I went live with a small but
heavily-spammed domain yesterday evening, and no spam has been seen
there since. (The sqlgrey is last in a long list of restrictions with
numerous RBL checks.)
My 2 cents: Put SQLgrey _before_ the RBLs. You'll save external network
calls so your system will run faster, and you'll spare the RBL servers
unnecessary load as well...
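For instance (just an illustration of the ordering; the RBL is only an
example, keep whatever restrictions you already have):

smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:2501,
    reject_rbl_client sbl-xbl.spamhaus.org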
Post by /dev/rob0
[2] a/k/a /dev/wife. I might need some help with #3 above to get her
started, but OTOH she has some PostgreSQL knowledge.
About /dev/wife, please see below...
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E

[***@blonde etc]$ su
Password: zyva
[***@blonde etc]# modprobe mariage
WARNING: Module "mariage.o" not under GPL license. Inserting "mariage.o"
will taint the kernel.
Module "mariage.o" inserted with 18712 warnings. Please check system log.
blonde kernel: Discovering new devices...
blonde kernel: Found device: belle_mere
blonde kernel: Activated device: belle_mere. Have a nice day.
blonde kernel: Found device tree for /dev/mari
devfsd: Creating symlink /dev/coiffeur => /dev/mari/portefeuille
devfsd: Creating symlink /dev/estheticienne => /dev/mari/portefeuille
devfsd: Creating symlink /dev/robes => /dev/mari/portefeuille
blonde kernel: Activated device: uterus. Please wait until completion.
blonde kernel: Found child device: lardon_#1
blonde kernel: Found child device: lardon_#2
devfsd: Device nibards owner changed to lardon_#2, group lardons,
mode 660
blonde kernel: Removing inactive device: foufoune
devfsd: Creating symlink /dev/foufoune => /dev/migraine
devfsd: Modifying symlink /dev/brain => /dev/random
WARNING: Superuser privileges removed from user: mari
WARNING: You will be logged out. Have a nice day, anyway try to.
[***@blonde etc]$ ls /sex
Permission denied
[***@blonde etc]$ rmmod mariage
bash: rmmod: command not found
[***@blonde etc]$ /sbin/rmmod mariage
rmmod: Operation not permitted
[***@blonde etc]$ su
Password: zyva
su: Incorrect password.
[***@blonde etc]$ damned je suis refait !
bash: damned: command not found
[***@blonde etc]$ /sbin/insmod maitresse
maitresse: Operation not permitted

-+- GSM in topless - Configuring your marriage properly -+-
Michel Bouissou
2005-06-26 14:01:11 UTC
Permalink
Post by Michel Bouissou
Post by /dev/rob0
I didn't figure the whole thing out, but I
did look for a string "\.res\." which is commonly used for dynamic
space, e.g., *.res.rr.com. (residential customers.) Perhaps the second
dot should be any non-alpha character (-, _, ., digit), and to be safe
there should be at least 2 domain segments following and at least one
segment preceding (implied by the leading dot.)
I don't pretend that the regexp is perfect, it's only heuristic, but it
would be interesting adding your modification only if you find samples of
existing hostnames that don't get properly classified. (i.e. your example
may already get classified correctly if the last byte of the IP address is
part of the name) The more things you add in the regexp, the longer it will
take to process, and the higher chances it could collide with other
non-dynamic names in an undesired manner.
But feel free to experiment with your own copy and suggest improvements
that show useful...
All the samples of *.res.rr.com that I can find in my logs also include the
IP address in their hostname, so SQLgrey's "smart" algo will already
consider them "dynamic/enduser" without the need to make the regexp more
complex.

Some examples at random:
108-64.200-68.tampabay.res.rr.com
160-35.26-24.tampabay.res.rr.com
180-17.26-24.se.res.rr.com
193-135.207-68.elmore.res.rr.com
2-85.207-68.tampabay.res.rr.com
61.186.204.68.cfl.res.rr.com
67.147.204.68.cfl.res.rr.com
cpe-24-164-130-13.si.res.rr.com
cpe-24-165-149-211.midsouth.res.rr.com
cpe-24-166-0-41.indy.res.rr.com
cpe-24-166-232-151.columbus.res.rr.com
cpe-67-11-214-14.satx.res.rr.com
cpe-67-49-227-231.dc.res.rr.com
cpe-67-49-62-53.socal.res.rr.com
cpe-67-9-163-159.austin.res.rr.com

etc.
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michel Bouissou
2005-06-26 14:06:45 UTC
Permalink
Post by Michel Bouissou
Post by Michel Bouissou
Post by /dev/rob0
I didn't figure the whole thing out, but I
did look for a string "\.res\." which is commonly used for dynamic
space, e.g., *.res.rr.com. (residential customers.) Perhaps the second
dot should be any non-alpha character (-, _, ., digit), and to be safe
there should be at least 2 domain segments following and at least one
segment preceding (implied by the leading dot.)
I don't pretend that the regexp is perfect, it's only heuristic, but it
would be interesting adding your modification only if you find samples of
existing hostnames that don't get properly classified. (i.e. your example
may already get classified corresctly if the last byte of the IP address
is part of the name) The more things you add in the regexp, the longer it
will take to process, and the higher chances it could collide with other
non-dynamic names in an undesired manner.
But feel free to experiment with your own copy and suggest improvements
that show useful...
All the samples of *.res.rr.com that I can find in my logs also include the
IP address in their hostname, so SQLgrey's "smart" algo will already
consider them as "dynamic/enduser" without the need for making the regexp
more complex.
Moreover, checking my logs, I don't see any ISP besides rr.com and Verizon
that uses a ".res." part in their hostnames, and both rr.com and Verizon
also put the IP address in... Verizon addresses also begin with "pool-",
which the regexp already catches.

Verizon samples:

pool-151-200-35-11.res.east.verizon.net[151.200.35.11]
pool-138-88-28-49.res.east.verizon.net[138.88.28.49]

etc.
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michael Storz
2005-06-27 08:23:57 UTC
Permalink
Hi Michel,
Post by Michel Bouissou
More, checking my logs, I don't see any other ISP besides rr.com and Verizon
that uses a ".res." part in their hostnames, and both rr.com and Verizon also
put the IP address in... Verizon addresses also begin with "pool-", that the
regexp already catches.
pool-151-200-35-11.res.east.verizon.net[151.200.35.11]
pool-138-88-28-49.res.east.verizon.net[138.88.28.49]
etc.
FYI, a grep through my logs shows some other ISPs as well:

host641191066a.pnpl.res.tor.fcibroadband.com
host6614614859.dsl.res.tor.fcibroadband.com

r63h83.res.gatech.edu

152-130-1-16.res.net.va.gov

216-201-133-129.res.logixcom.net
216-201-134-162.res.logixcom.net
216-201-149-243.res.logixcom.net

63-110-249-163.res.evv.cable.sigecom.net
65-218-56-81.res.evv.cable.sigecom.net
63-87-215-131.res.nb.cable.sigecom.net
65-195-96-103.res.nb.cable.sigecom.net
65-195-96-253.res.nb.cable.sigecom.net

host-se80.res.openband.net
host-se87.res.openband.net
host-si112.res.openband.net
host-si198.res.openband.net
host-sj22.res.openband.net
host-sk219.res.openband.net
host-sk249.res.openband.net
host-sk87.res.openband.net
host-sl48.res.openband.net
host487.res.openband.net
user308.res.openband.net
user353.res.openband.net
user414.res.openband.net
user445.res.openband.net

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Lionel Bouton
2005-06-28 14:05:25 UTC
Permalink
Post by Michel Bouissou
Post by /dev/rob0
1. localhost:2501 vs. Unix socket
Wouldn't a socket be slightly faster than TCP?
Probably, but works only if on the same machine. TCP is more general, so to
use sockets, Lionel would need to implement both methods. Given the little
amount of data transferred between Postfix and sqlgrey for a given
connection, using sockets would probably make a difference only on very high
volume system.
Even there the difference would probably be unnoticeable. There are far
more complex computations involved than a TCP/IP connection. Putting Unix
sockets back would be easy, but it would be one more parameter to set up...
Post by Michel Bouissou
Post by /dev/rob0
2. Running under control of master(8)
That would be convenient, start and stop with Postfix; are there other
benefits? Why the standalone choice?
Running under the control of master would be interesting to me as well.
Being standalone allows SQLgrey to be put anywhere the admin sees fit,
although running under the control of master would be a definite plus.
I'll add it to my TODO.
Post by Michel Bouissou
Post by /dev/rob0
3. Database population commands
I'm totally lost with SQL (hence the poor choice of mysql), can someone
help with the manual commands I'd use to add to the database?
Reading the fine MySQL doc would probably help with the basic SQL commands,
but you'd better let the DB populate by itself.
BTW, why state that MySQL is a poor choice ?
Because it's often buggy and doesn't follow the standard?
Post by Michel Bouissou
Post by /dev/rob0
4. Database population scripts
Is there something I could run against user's maildirs which would add
entries to the AWL? If not should I commission such a project from my
private farm of Perl coders[2]; I mean, would there be interest?
I don't think there's any interest in trying to artificially populate the DB.
Just let it run ;-)
One could imagine various ways of (dynamically) modifying the DB to
fine-tune the greylisting process. Everyone is welcome to experiment with
this, but I have no guidelines on the subject; the keyword is 'experiment'.
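For instance, something along these lines would pre-seed the domain AWL by
hand (only a sketch: sender_domain and src are the columns you'll see
queried elsewhere in this thread, but the two timestamp columns are from
memory and may differ between versions, so check the table definition
shipped with your release first):

INSERT INTO domain_awl (sender_domain, src, first_seen, last_seen)
VALUES ('example.org', '192.0.2', NOW(), NOW());
-- src is the source host as sqlgrey stores it (possibly only the
-- first three octets of the IP address)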
Post by Michel Bouissou
Post by /dev/rob0
6. dyn_fqdn.regexp
That's quite an expression.
;-)
I know Michel is proud of this one :-)
Post by Michel Bouissou
Post by /dev/rob0
I didn't figure the whole thing out, but I
did look for a string "\.res\." which is commonly used for dynamic
space, e.g., *.res.rr.com. (residential customers.) Perhaps the second
dot should be any non-alpha character (-, _, ., digit), and to be safe
there should be at least 2 domain segments following and at least one
segment preceding (implied by the leading dot.)
I don't pretend that the regexp is perfect, it's only heuristic, but it would
be interesting adding your modification only if you find samples of existing
hostnames that don't get properly classified. (i.e. your example may already
get classified correctly if the last byte of the IP address is part of the
name) The more things you add in the regexp, the longer it will take to
process, and the higher chances it could collide with other non-dynamic names
in an undesired manner.
But feel free to experiment with your own copy and suggest improvements that
show useful...
Yep, we can even redistribute the modifications you make to clients
running sqlgrey_update_config.
Post by Michel Bouissou
Post by /dev/rob0
8. Beyond grey
This is a biggie which probably warrants its own thread. This is all
about spam abatement. What about integrating other antispam strategies
under the roof of the same policy service? Yes, this belongs in its own
thread. I'll write more of my thoughts about that later.
I think about it the Unix way. I prefer to use several distinct tools of my
choice, each one doing _one_ thing and doing it well. I wouldn't like any
bigger system that would integrate different methods, I personally prefer
doing my own cooking.
Postfix can call several "policy services" without any problem...
Combining results of various checks at the Postfix level is rather
cumbersome. This is why I added a reference to SPF in my TODO: my idea
being that there isn't much benefit in greylisting MTAs that are already
known to be good. We could combine a domain whitelist with SPF checks: if
the source domain is in the whitelist and the SPF checks are OK, don't
greylist. This is far in the future, though; I'll need to break SQLgrey
into modules first.
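Just to sketch the idea (nothing like this exists in SQLgrey today, and the
%trusted_domains lookup is imaginary):

use Mail::SPF::Query;

# skip greylisting only when the sender domain is trusted *and* SPF
# confirms the connecting IP is allowed to send for that domain
if ($trusted_domains{$sender_domain}) {
    my $query = Mail::SPF::Query->new(ip     => $client_ip,
                                      sender => $sender,
                                      helo   => $helo_name);
    my ($result) = $query->result;
    return 'dunno' if $result eq 'pass';  # hand the decision back to Postfix
}
# otherwise fall through to the normal greylisting logic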
Post by Michel Bouissou
Post by /dev/rob0
Thanks, Lionel, this looks good so far. I went live with a small but
heavily-spammed domain yesterday evening, and no spam has been seen
there since. (The sqlgrey is last in a long list of restrictions with
numerous RBL checks.)
My 2 cents: Put SQLgrey _before_ RBLs. You'll save external network calls so
your system will run faster, and you'll save unnecessary load onto the RBL
servers as well...
That would be my recommendation too. Only put the RBLs first if you want
statistics on how much more greylisting brings you when RBLs are already in
place.

Lionel
Michel Bouissou
2005-06-28 14:24:36 UTC
Permalink
Post by Lionel Bouton
Combining results of various checks at the Postfix level is rather
cumbersome.
I wouldn't say "cumbersome", I would say "flexible". It allows you to
configure exactly what you want, how you want it, with the tools of your
choice.

SPF and greylisting have nothing to do with each other. If you start by
integrating SPF into SQLgrey, then why not integrate DNSBLs as well? Then
RHBLs... And then...

Furthermore, my current traffic shows that SPF actually stops very _little_
mail, so its efficiency is still very marginal compared to greylisting,
which stops the vast majority of the junk...
Post by Lionel Bouton
This is why I added a reference to SPF in my TODO: my idea
was that it wouldn't bring much benefit to greylist already known good
MTAs.
SPF doesn't define any "good" MTA by itself. It only lists "domain-approved"
MTAs. If spammerdomain.com defines spammermachine as authorized for the
domain, then it's OK for SPF. Mr. Joe Spammer can also put a "+all" SPF
record for his spammerdomain.com, and then any open proxy out there will be
"SPF approved" for relaying his spam...
Post by Lionel Bouton
We could combine a domain whitelist with SPF checks: if the source
domain is in the whitelist and the SPF checks are OK, don't greylist.
Yes, it can be useful in this way _only_ with a manual whitelist. But why
bother, when the greylisting system will build its AWL automatically with
much less effort than maintaining a manual WL?
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Lionel Bouton
2005-06-28 14:45:21 UTC
Permalink
Post by Michel Bouissou
Post by Lionel Bouton
We could combine a domain whitelist with SPF checks: if the source
domain is in the whitelist and the SPF checks are OK, don't greylist.
Yes, it can be useful in this way _only_ with a manual whitelist. But why
bother, as the greylisting system will create its AWL automatically with much
less effort than having to maintain a manual WL ?
The whitelist could be dynamically loaded from a central repository...

But this example is probably not what would get coded; it doesn't bring
enough benefit for the trouble. The idea is there, though: some
combinations are better done in Postfix, some others in the policy server.

Lionel.
Michel Bouissou
2005-06-28 14:55:35 UTC
Permalink
Post by Lionel Bouton
Post by Michel Bouissou
Yes, it can be useful in this way _only_ with a manual whitelist. But why
bother, as the greylisting system will create its AWL automatically with
much less effort than having to maintain a manual WL ?
The whitelist could be dynamically loaded from a central repository...
I don't much like this central repository thing. The nice thing about
greylisting is that it is a purely dynamic system that doesn't need any
external input, and doesn't have to rely on other people's judgment (which
is what you do when you use RBLs...).

SQLgrey currently has a central repository only for the whitelists, which
are "exceptions that don't work with greylisting" (so this is
understandable), and for the regexps, which haven't changed at all so far
and are expected to change very little (maybe once in the future, if the
.res. thing proves useful?).

Even now, people are under no obligation to use the auto-update script, so
sqlgrey can continue to be used purely locally, at the admin's choice.
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michael Storz
2005-06-28 14:59:55 UTC
Permalink
Post by Michel Bouissou
SPF and greylisting have nothing to do together. If you first integrate SPF
into SQLgrey, then why not integrate DNSBLs as well ? Then RHBLs... And
then...
Exactly, this is one of the next steps that need to be done. The only
problem at the moment is the delay DNS requests can introduce into the
handling of incoming connections.

The DNSBL check would be inserted before a connection is put into the
connect table. This means all established communication relations will
still work if an MTA gets onto a blacklist, because of the AWLs; only new
communication relations will be refused. We have already tested this with a
cronjob which deletes entries from the connect table if the IP address is
on a DNSBL, and it works.
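To give an idea, the cronjob boils down to something like this (a
simplified sketch, not our production script; adapt the DSN, credentials
and DNSBL, and note that depending on the greylisting method sqlgrey may
store only a /24 prefix in src rather than a full IP):

use DBI;
use Net::DNS;

my $dbh = DBI->connect('dbi:mysql:sqlgrey', 'sqlgrey', 'secret',
                       { RaiseError => 1 });
my $res = Net::DNS::Resolver->new;

# walk the pending triplets and drop those whose source IP is blacklisted
my $ips = $dbh->selectcol_arrayref('SELECT DISTINCT src FROM connect');
for my $ip (@$ips) {
    next unless $ip =~ /^\d+\.\d+\.\d+\.\d+$/;            # skip non-IP values
    my $name = join('.', reverse split(/\./, $ip)) . '.sbl-xbl.spamhaus.org';
    if ($res->query($name, 'A')) {                        # listed -> drop it
        $dbh->do('DELETE FROM connect WHERE src = ?', undef, $ip);
    }
}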

We do not use it at the moment, because

- I want the correct error text to be given back (not possible with a
cronjob implementation)
- I would like to have opt-in/opt-out for this feature
- I have not yet set up all the different DNSBLs I want to use as local
zones for fast queries

But by the end of this year I will be running sqlgrey with DNSBLs, because
I think more and more spammers will be able to get through greylisting.

We already have one spammer who gets through very often. It is a porn
spammer with an advertisement in German. You can check whether this spammer
is included in your tables with the following SELECT:

SELECT * FROM domain_awl
 WHERE sender_domain IN ('amuro.net', 'dbzmail.com', 'fastermail.com',
                         'glay.org', 'indiatimes.com', 'jaydemail.com',
                         'kittymail.com', 'operamail.com', 'outgun.com',
                         'surfy.net', 'wongfaye.com', 'yyhmail.com')
 ORDER BY src;

If several domains are reported for the same IP address, then you have the
same problem we have.
Post by Michel Bouissou
Furthermore, my current traffic show that SPF actually stops very _little_
mail, so its efficiency is still very marginal, compared to greylisting that
stops the vast majority of junk...
Post by Lionel Bouton
This is why I added a reference to SPF in my TODO: my idea
was that it wouldn't bring much benefit to greylist already known good
MTAs.
SPF doesn't define any "good" MTA by itself. It only lists "domain-approved"
MTAs. If spammerdomain.com defines spammermachine as authorized for the
domain, then it's OK for SPF. Mr. Joe Spammer can also put a "+all" SPF
record for his spammerdomain.com, and then any open proxy out there will be
"SPF approved" for relaying his spam...
Post by Lionel Bouton
We could combine a domain whitelist with SPF checks: if the source
domain is in the whitelist and the SPF checks are OK, don't greylist.
Yes, it can be useful in this way _only_ with a manual whitelist. But why
bother, as the greylisting system will create its AWL automatically with much
less effort than having to maintain a manual WL ?
SPF should only be used as a hint. Every MTA has to prove that it is a
well-behaved MTA, meaning that it retries correctly. But once you have that
proof, SPF is a good hint for moving the domain to domain_awl instead of
waiting until enough entries exist for grouping. And yes, you will move
spammer domains too. But remember, greylisting has nothing to do with
sorting MTAs into good or bad in relation to spam; it sorts them into good
or bad in relation to retries. If you have a spammer MTA you need other
tools to stop that spammer.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Michael Storz
2005-06-28 14:39:13 UTC
Permalink
Post by Lionel Bouton
Combining results of various checks at the Postfix level is rather
cumbersome. This is why I added a reference to SPF in my TODO: my idea
was that it wouldn't bring much benefit to greylist already known good
MTAs. We could combine a domain whitelist with SPF checks: if the source
domain is in the whitelist and the SPF checks are OK, don't greylist.
This is far in the future, though, I'll need to break SQLgrey into
modules first.
We have SPF running together with sqlgrey (cron_job). The basic idea is:

if ($count_from_awl >= $X) {
    my $result = check_for_spf();
    if ($result eq 'pass') {
        move_domain_from_mail_to_domain_awl();
    }
}

Lionel, if you program the forked process which will handle all the
propagations, moves and inserts, then I will augment this process with our
algorithms:

- SPF-check
- MX-check
- A-check

And it would be nice to have a field on every table entry recording the
algorithm that created it (grouping or one of the above), as I explained
before.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Michel Bouissou
2005-06-28 14:55:07 UTC
Permalink
Post by Michael Storz
We have SPF running together with sqlgrey (cron_job). The basic idee is,
[...]
Post by Michael Storz
Lionel, if you program the forked process which will handle all the
propagations, moves, inserts then I will augment this process with our
algorithms
- SPF-check
- MX-check
- A-check
I'm afraid SQLgrey could become bloated with a lot of features that
wouldn't be useful to the vast majority of its users, and whose usefulness
would need to be carefully studied before implementation, as they would
make SQLgrey far more complicated to configure and understand...

We have to be careful to avoid the M$ Word syndrome ;-)
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michael Storz
2005-06-28 15:09:47 UTC
Permalink
We have SPF running together with sqlgrey (cron_job). The basic idea is:
[...]
Lionel, if you program the forked process which will handle all the
propagations, moves and inserts, then I will augment this process with our
algorithms:
- SPF-check
- MX-check
- A-check
I'm afraid SQLgrey could become bloated with a lot of features that
wouldn't be useful to the vast majority of its users, and whose usefulness
would need to be carefully studied before implementation, as they would
make SQLgrey far more complicated to configure and understand...
We have to be careful to avoid the M$ Word syndrome ;-)
These algorithms have been running at our site for about 4 months and they
have proven very successful. But it will always be Lionel's choice whether
to incorporate our patches into sqlgrey and thereby gain the combined power
to push sqlgrey further. BTW, I have a bunch of patches sitting in the
queue to be sent to Lionel. I am still waiting to see whether one of these
patches is working correctly now.

If by "users" you mean the people who benefit from these additions, then we
have a lot of users in that category :-)

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Michel Bouissou
2005-06-28 15:25:13 UTC
Permalink
Post by Michael Storz
These algorithms run for about 4 months at our site and they have proven
to be very successful.
I have no doubt about it. But to me the question is about the vocation of
SQLgrey. Is SQLgrey supposed to be a (very efficient) greylisting system,
or is it evolving into an exhaustive MTA anti-spam policy server
implementing each and every possible way of filtering spam (before queue)?

I am personally interested in the 1st option, and not in the 2nd, as many
good solutions already exist for the "non-greylisting" methods, and because
IMHO it doesn't make much sense to mix all this together in a single
system, unless you want to end up with something as heavy and complex as
swiss-army-knife tools such as amavisd-new (for example).

I'm much more in favour of the Unix philosophy: build with bricks, where
each brick does one thing and does it well. In such a vision, IMHO, the
role of SQLgrey is to be one of the bricks of an antispam solution, not to
try to be the complete wall on its own.

Let the MTA be the place where the bricks are integrated and the wall is
built according to its admin's choices.

But that's just my 2c...
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Lionel Bouton
2005-06-28 21:00:54 UTC
Permalink
Post by Michel Bouissou
Post by Michael Storz
These algorithms run for about 4 months at our site and they have proven
to be very successful.
I have no doubt about it. But to me the question is about the vocation of
SQLgrey. Is SQLgrey supposed to be a (very efficient) greylisting system, or
is it evolving into an exhaustive MTA anti-spam policy server
implementing each and every possible way of filtering spam (before queue)?
SQLgrey already does more than pure greylisting. It uses a specific
whitelist and AWLs, and 1.7.x even does throttling. Whatever gets added
to SQLgrey will try to help fine-tune the greylisting process in more or
less the same way:
- either avoid greylisting "known good" clients, as is done today,
- or maybe discriminate among senders, enforcing different reconnect delays
or reconnect tries before they enter the AWLs and/or are allowed to pass.

The keyword here is *combination*: if we can combine other information
about the connection to help fine-tune the greylisting process *and* it is
impossible or really difficult to do with Postfix alone (Michel, I don't
want to reawaken another thread, but you are the one asking for "MAIL
FROM:" whitelists in SQLgrey although it is rather easy to do them with
Postfix; your position here surprises me...), then adding code to SQLgrey
makes sense (especially once SQLgrey is modular).

Lionel.
Michel Bouissou
2005-06-29 15:24:15 UTC
Permalink
Please don't take this as the start of a troll war, Lionel; I just want to
answer your points...
Post by Lionel Bouton
SQLgrey already does more than pure greylisting. It uses a specific
whitelist and AWLs, 1.7.x even does throttling.
All this _is_ pure greylisting. It is intelligent greylisting, but it is
based only upon whether or not hosts are able to reconnect to resend
temporarily rejected messages, with the AWLs "memorizing" this
automatically, and throttling preventing a machine that does not retry from
filling the connect table with hundreds or thousands of dummy "first
tries".

And the manual whitelists are just there to let through some rare "known
good" hosts which couldn't pass greylisting otherwise because they don't
behave as they should (they are "good" hosts, but they violate the RFCs).

This is only greylisting. Good and intelligent one.
Post by Lionel Bouton
Whatever will be added
to SQLgrey will try to help fine-tune the greylisting process more or
- either try to avoid greylisting "known good" clients as is done today,
- maybe discriminate among senders, enforcing different reconnect delay
or reconnect tries before they enter AWLs and/or are allowed to pass.
Then whatever you use for this "discrimination" clearly makes you step
outside of greylisting, because you will be taking into account information
of other kinds that has nothing to do with greylisting itself.
Post by Lionel Bouton
The keyword here is *combination*: if we can combine other informations
about the connection to help fine-tune the greylisting process
I don't believe that those kinds of "combinations" will make the greylisting
any better. Only heavier, more complicated, more bug-prone...
Post by Lionel Bouton
*and* it is impossible to do with Postfix alone or really difficult (Michel,
I don't want to awaken another thread but you are the one asking for "MAIL
FROM:" whitelists in SQLgrey
That's true. When you build a filter of any kind, making provision for
(manual) exceptions that should be allowed through seems to me to stay
within the scope of that filter.
Post by Lionel Bouton
although it is rather easy to use them with Postfix,
Nope. Say you have a series of 10 Postfix restrictions and SQLgrey comes
5th, but you want to skip greylisting for a given sender. If you use a
Postfix table to say "OK" for that sender, you don't just skip greylisting,
you also skip all the following rules, which you might still want to
enforce. That's why it's impractical.

Yes, I know you can create specific chains of "smtpd_restriction_classes"
as well, but the number of possible combinations grows exponentially with
the number of rules or filters you use.
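To make it concrete (an illustration only, with a single RBL standing in
for "everything that normally comes after sqlgrey"):

smtpd_restriction_classes = skip_greylist
# the class has to repeat every check that normally follows sqlgrey,
# then end with permit so the policy service is never reached
skip_greylist = reject_rbl_client sbl-xbl.spamhaus.org, permit

smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_sender_access hash:/etc/postfix/greylist_exceptions,
    check_policy_service inet:127.0.0.1:2501,
    reject_rbl_client sbl-xbl.spamhaus.org

with /etc/postfix/greylist_exceptions mapping each excepted sender to
skip_greylist. Add a second optional filter and you need another class,
and that's where the combinations explode.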

That's why I think it makes sense for each filter to make provision for its
own exceptions...
Post by Lionel Bouton
your position here surprises me...)
I hope I made it clearer...
Post by Lionel Bouton
then adding code to SQLgrey makes sense (especially when SQLgrey will be
modular).
Now your position surprises me ;-) I've always seen you very reluctant to
add new features or algorithms to SQLgrey, and very hard to convince that a
suggestion may be useful. It takes time and effort to convince you that a
given idea could be good and worth implementing. I think this is good,
because this way of working allows you to keep the software simple and
efficient, and keeps it from becoming bloated with tons of more or less
useful(-less) features...

So, well, I admit this readily, even if you refuse to integrate some
features I would like, such as a sender regexp whitelist.

And now I see you considering adding tons of new features that have
strictly nothing to do with greylisting, and whose usefulness in
combination with greylisting still needs to be precisely demonstrated
(which I doubt very much).

So now I am surprised by your own positions ;-)

Mine is clear and simple: let SQLgrey be the best and most complete
greylisting tool out there. Let it do everything that relates to
greylisting, and do it well. And don't do anything that doesn't directly
relate to greylisting...

But of course, this is your "child", so you're the boss ;-)
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michel Bouissou
2005-06-28 15:35:26 UTC
Permalink
Post by Lionel Bouton
This is why I added a reference to SPF in my TODO
And if you add SPF, then tomorrow you'll have to consider adding M$
CallerID/SenderID as well, and then Yahoo! DomainKeys, then SRS validity
check, then the same for SES, then for BATV, then for the new algorithm of
the day ;-)
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michael Storz
2005-06-29 08:10:44 UTC
Permalink
Post by Michel Bouissou
Post by Lionel Bouton
This is why I added a reference to SPF in my TODO
And if you add SPF, then tomorrow you'll have to consider adding M$
CallerID/SenderID as well, and then Yahoo! DomainKeys, then SRS validity
check, then the same for SES, then for BATV, then for the new algorithm of
the day ;-)
If these additions strengthen the greylisting process, then they should be
implemented.


BTW, de-VERPing BATV addresses will be in the patch I am testing at the
moment. We need this because abuse.net uses BATV and we do not want to
delay their emails.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Who Knows
2005-06-29 14:27:34 UTC
Permalink
Post by Michael Storz
Post by Michel Bouissou
Post by Lionel Bouton
This is why I added a reference to SPF in my TODO
And if you add SPF, then tomorrow you'll have to consider adding M$
CallerID/SenderID as well, and then Yahoo! DomainKeys, then SRS validity
check, then the same for SES, then for BATV, then for the new algorithm of
the day ;-)
If these additions will strengthen the greylisting process, then they
should be implemented.
Okay, here is my $0.02. They DO NOT strengthen the greylisting process as
they have nothing to do with greylisting. They are simply other methods for
combatting SPAM.

Any utility that attempts to be the omniscient "do all" ultimately fails.
Therefore my vote is that SQLgrey should do what it does best...
greylisting.

Jim
Michel Bouissou
2005-06-29 15:05:01 UTC
Permalink
Post by Who Knows
Post by Michael Storz
If these additions will strengthen the greylisting process, then they
should be implemented.
Okay, here is my $0.02. They DO NOT strengthen the greylisting process
as they have nothing to do with greylisting. They are simply other methods
for combatting SPAM.
Absolutely.
Post by Who Knows
Any utility that attempts to be the omniscient "do all" ultimately fails.
Therefore my vote is that SQLgrey should do what it does best...
greylisting.
<AOL>
Me too !
</AOL>
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Ray Booysen
2005-06-29 15:09:17 UTC
Permalink
Post by Michel Bouissou
Post by Who Knows
Post by Michael Storz
If these additions will strengthen the greylisting process, then they
should be implemented.
Okay, here is my $0.02. They DO NOT strengthen the greylisting process
as they have nothing to do with greylisting. They are simply other methods
for combatting SPAM.
Absolutely.
Post by Who Knows
Any utility that attempts to be the omniscient "do all" ultimately fails.
Therefore my vote is that SQLgrey should do what it does best...
greylisting.
<AOL>
Me too !
</AOL>
<ENRON>
Me too !
</ENRON>
Michael Storz
2005-06-30 08:35:01 UTC
Permalink
Post by Who Knows
Okay, here is my $0.02. They DO NOT strengthen the greylisting process as
they have nothing to do with greylisting. They are simply other methods for
combatting SPAM.
Any utility that attempts to be the omniscient "do all" ultimately fails.
Therefore my vote is that SQLgrey should do what it does best...
greylisting.
Jim
I do not think there was a request to "enhance" sqlgrey into a "do all"
daemon. I agree with you that every new feature must be related to
greylisting. I suppose you do not want to run a pure greylisting server,
because that would mean switching off the from_awl and domain_awl. Pure
greylisting is done with the connect table and a simple whitelist for MTAs
that do not behave well. And this list can be very short, as I have found
in the meantime.

The moment you use AWLs, you are not using pure greylisting anymore. And I
think you ARE using sqlgrey because of its excellent AWLs.

The goal of all future enhancements is twofold:

- they should put as many entries as possible into the AWLs to reduce the
delay of legitimate messages
- they should reduce the possibility that spam makes use of entries in the
whitelists

You must always see these two goals together, they cannot be separated.

Coming to SPF: if you want to use SPF to combat spam directly, then you
should use a program/policy server outside of sqlgrey. The way I am using
SPF is to get entries into domain_awl faster than with group_domain_level
alone. SPF is a hint that the sending IP is authorized to send emails with
this originator. But it is only a hint. The MTA still has to prove it is a
well-behaved MTA (at the moment we require 3 retries = 3 entries in
from_awl, whereas group_domain_level is set to 10). This kind of usage of
SPF is a way to strengthen greylisting/sqlgrey.

The same is true for the rcpt_awl, which we need for forwarded emails. We
have a lot of employees who separate their email addresses for business and
private use. For their private email they use an address with a freemail
provider. All emails arriving at the freemail provider are forwarded to
their business address, which is allowed in our university environment.
This way they have separated their addresses but still have all their email
together in one mailbox, where it can be accessed immediately.

The other case is that a lot of employees of the universities come from
other universities. When they moved, they set up a forward to an email
address here.

In both cases we want to accept such forwarded emails immediately, without
any delay. But with from_awl and domain_awl alone, that is not possible.
The sending MTA may have an entry in domain_awl, but normally only for the
domains it is responsible for. The forwarded emails, however, often have an
originator with a different domain. Therefore every new originator will go
through greylisting, which makes no sense, because we often know that the
sending MTA is well behaved and we trust it.

With rcpt_awl this is different. In this table we include the IP address of
the MTA and the recipient address, but no information about the originator.
That means we need only one entry per forward, and all emails coming this
way will be accepted immediately. And this means greylisting is
strengthened again.
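Just to illustrate, the table can be as simple as something like this (a
sketch, not the exact definition from our patch):

CREATE TABLE rcpt_awl (
  src        VARCHAR(15)  NOT NULL,  -- IP address of the sending MTA
  rcpt       VARCHAR(255) NOT NULL,  -- local recipient the forward points to
  first_seen TIMESTAMP    NOT NULL,
  last_seen  TIMESTAMP    NOT NULL,
  PRIMARY KEY (src, rcpt)
);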

I hope it is now clear what I meant with:

"If these additions will strengthen the greylisting process, then they
should be implemented."

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
Michel Bouissou
2005-06-30 09:23:46 UTC
Permalink
Post by Michael Storz
I suppose you do not want to run a pure greylisting server, because that
means to switch off the from_awl und domain_awl. Pure greylisting is done
with connect and a simple whitelist for not well behaving MTAs. And this
list can be very short, as I have experienced in the meantime.
At the moment you use AWLs, you are not using pure greylisting anymore.
And I think you ARE using sqlgrey because of its excellent AWLs.
AWLs are just "greylisting with a memory". I don't agree that using AWLs isn't
"pure greylisting".
Post by Michael Storz
[...] The way I am using SPF is to get entries faster in domain_awl than
using group_domain_level alone. SPF is a hint, that the sending IP is
authorized to send emails with this originator. But it is only a hint. The
MTA has still to prove it is a well behaved MTA [...]
The major objections that I see to integrating any kind of SPF check into
sqlgrey are:

1/ SPF has nothing to do with greylisting, so it wouldn't be pure
greylisting at all anymore.

2/ SPF is a complex algorithm, so it requires a complex implementation in
SQLgrey, which is both heavy and bug-prone. And as the SPF standard might
evolve, SQLgrey would need to follow its evolutions -- we're adding a lot
of foreseeable trouble there.
Furthermore, there are other proposed standards competing with SPF, so if
you consider adding one, then you'll have to consider adding the others,
and this is endless.

3/ SPF cannot be done using only the information that Postfix provides to the
SQLgrey policy server. SPF needs at least one, possibly several, DNS calls,
which will have a very noticeable negative impact on SQLgrey performance. I
object to the idea that SQLgrey would need to perform any kind of network
request to be able to make a decision. One could say that DNS calls are
fast, but that's not always the case. The remote DNS server you query can
sometimes be very slow, unreachable or unresponsive...
Post by Michael Storz
The same is true for the rcpt_awl, we need for forwarded emails. We have a
lot of employees, which separate their email addresses for business and
private use. For their privat emails they use an address with a freemail
provider. All emails arriving at the freemail provider are forwarded to
their business address, which is allowed in our university environment.
This way they have separated their addresses, but still have all emails
together in one mailbox, where they can be accessed immediately.
The other case is, a lot of employees of the universities come from other
universities. When they moved, they set up a forward to an email address
here.
In both cases we want to accept such forwarded emails immediately without
any delay. But with from_awl and domain_awl alone, it is not possible. The
sending MTA may have an entry in domain_awl, but normally only for the
domains it is responsible. But often the forwarded emails have an
originator with a different domain. Therefore every new originator will go
through greylisting, which makes no sense, because often we now that the
sending MTA is a well behaved MTA and we trust this MTA.
With rcpt_awl this is different. In this table we will include the ip
address of the MTA and the recipient address, but no information about the
originator. That means we need only one entry per forward and all emails
coming this way will be accepted immediately. And this means greylisting
is strengthened again.
Regarding email forwarded from other universities, I believe the number of
such universities isn't that high and doesn't grow that fast. You'd
probably be better off adding the known MTAs of those universities to an
existing manual whitelist, since you know those servers are true,
well-behaved MTAs.

The same could be done rather easily for the 4 or 5 major "free email
providers and forwarders" you're talking about. That makes what? Hotmail,
MSN, Yahoo, a couple of the biggest ISPs in your country, and you'd
probably cover more than 80% of your forwarding greylisting issues (if one
believes the 80/20 law ;-)

The remaining 20% would mainly concern "private forwarded mail", and in any
case the from_awl would ensure that only the first mail from A to B through
"forwarder" gets greylisted. And if "forwarder" forwards a noticeable
amount of email from any given domain, it would end up in domain_awl for
that domain anyway...

So I'm really not sure that your forwarding issue justifies the added
complexity and performance cost that new tables would bring to SQLgrey.

For example, rcpt_awl seems plain useless to me: if you manually put an
entry in rcpt_awl for an MTA_IP / recipient pair, then you mean that MTA_IP
is well behaved. If it's well behaved for "recipient", it's also well
behaved for any other recipient, and MTA_IP can perfectly well be added to
the existing manual whitelists without the need for a supplementary table,
can't it?

Cheers.
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Michael Storz
2005-06-30 10:50:16 UTC
Permalink
Post by Michel Bouissou
The major objections that I see to integrating any kind of SPF check into
sqlgrey are:
1/ SPF not having anything to do with greylisting, it isn't pure greylisting
at all anymore.
This is just a statement without any reasons behind it. Therefore I just
state the opposite: it has to do with greylisting.
Post by Michel Bouissou
2/ SPF being a complex algorithm, it necessitates a complex implementation in
SQLgrey, which is both heavy and bug-prone. And as the SPF standard might
evolve, it would need SQLgrey to follow its evolutions -- we're adding a lot
of foreseeable trouble there.
Wrong. Here is the relevant code out of my program:

use Mail::SPF::Query;

my %query_options = (
    ipv4   => $conn_ip,
    sender => $sender_domain,
    helo   => $sender_domain,  # should be the HELO domain, but it is not yet available
);
my $query = new Mail::SPF::Query(%query_options);
my ($result, $smtp_comment, $header_comment) = $query->result;
if ($result eq 'pass') {
...

I think these few lines of code cannot be called a complex implementation.
Post by Michel Bouissou
Furthermore there are other proposed standards competing with SPF, so if you
consider adding one, then you'll have to consider adding others, and this is
endless.
As long as the relevant information is outside the envelope, e.g.
information in the headers, these standards cannot be used directly by
greylisting. If, however, other standards exist which operate on the
envelope, we will see whether they can be used by greylisting. But as long
as I do not know them, I cannot say anything about them.
Post by Michel Bouissou
3/ SPF cannot be done using only the information that Postfix provides to the
SQLgrey policy server. SPF needs at least one, possibly several, DNS calls,
which will have a very noticeable negative impact on SQLgrey performance. I
object to the idea that SQLgrey would need to perform any kind of network
request to be able to make a decision. One could say that DNS calls are fast,
but it's not always the case. The remote DNS server you call can sometimes be
very slow, unreachable or unresponsive...
This is true, the current implementation of sqlgrey does not allow any
algorithm which uses DNS requests that go out onto the Internet. If we want
to use such algorithms, then either sqlgrey must go from multiplex to
prefork mode, or the propagation algorithms must use a sideline process,
similar to the cleanup process. I have been aware of this problem since the
beginning.
Post by Michel Bouissou
Post by Michael Storz
The same is true for the rcpt_awl, we need for forwarded emails. We have a
lot of employees, which separate their email addresses for business and
private use. For their privat emails they use an address with a freemail
provider. All emails arriving at the freemail provider are forwarded to
their business address, which is allowed in our university environment.
This way they have separated their addresses, but still have all emails
together in one mailbox, where they can be accessed immediately.
The other case is, a lot of employees of the universities come from other
universities. When they moved, they set up a forward to an email address
here.
In both cases we want to accept such forwarded emails immediately without
any delay. But with from_awl and domain_awl alone, it is not possible. The
sending MTA may have an entry in domain_awl, but normally only for the
domains it is responsible. But often the forwarded emails have an
originator with a different domain. Therefore every new originator will go
through greylisting, which makes no sense, because often we now that the
sending MTA is a well behaved MTA and we trust this MTA.
With rcpt_awl this is different. In this table we will include the ip
address of the MTA and the recipient address, but no information about the
originator. That means we need only one entry per forward and all emails
coming this way will be accepted immediately. And this means greylisting
is strengthened again.
Regarding email forwarded from other universities, I believe the number of
such universities isn't that high and doesn't grow that fast. You'd probably
be better adding the know MTAs of those universities to an existing manual
whitelist, as you know those servers are true MTAs, well behaved.
Same could be done rather easily for the 4 or 5 major "free email providers
and forwarders" you're talking about. That makes what ? Hotmail, MSN, Yahoo,
a couple of biggest ISPs in your country, and you'd probably cover more than
80% of your forwarding greylisting issues (if one believes the 80/20 law ;-)
For the remaining 20%, it would concern mainly "private forwarded mail", and
in any case, the from_awl would take care that only the first mail from A to
B thru "forwarder" would be greylisted. And if "forwarder" forwards a
noticeable amount of email from any given domain, it would end in domain_awl
for this domain anyway...
So I'm really not sure that your forwarding issue necessitates the added
complexity and performance cost that adding new tables to SQLgrey would
necessitate.
For example, rcpt_awl seems plain useless to me, as if you manually put an
entry in rcpt_awl for a couple MTA_IP / recipient, then you mean that MTA_IP
is well behaved. If it's well behaved for "recipient", it's also well behaved
for any other recipient, and MTA_IP can perfectly be added to the existing
manual whitelists without the need of adding a supplementary table, isn't
it ?
The most expensive thing at our computer centre is person power, whereas
computers are cheap. Therefore manual maintenance must be avoided at all
costs; instead, whitelists must be maintained automatically. This is the
reason why we chose sqlgrey, as sqlgrey implements AWLs! rcpt_awl is an
automatic whitelist, as the name suggests. The manual whitelist for IP
addresses should only be used as a last resort.

Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum ! <mailto:***@lrz.de>
Barer Str. 21 ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840

Michel Bouissou
2005-06-30 07:08:07 UTC
Permalink
Post by Lionel Bouton
Post by Michel Bouissou
BTW, why state that MySQL is a poor choice ?
It's often buggy and doesn't follow the standard ?
I didn't use MySQL until a few weeks ago, so I couldn't tell for older
versions, but MySQL 4.1.12 seems to work perfectly for me.

And it's 15 times faster (!!!) than PostgreSQL for the very DB-intensive DSPAM
(this nice crowd of bugs ;-)

15 times faster, that was a good reason for me to migrate...

Of course for sqlgrey and less intensive DB applications, PostgreSQL was
satisfactory to me, but I didn't want to have several DB servers running on
my system, so I migrated all my DBs from PostgreSQL to MySQL.

Cheers.
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E
Lionel Bouton
2005-06-30 08:25:33 UTC
Permalink
Post by Michel Bouissou
Post by Lionel Bouton
Post by Michel Bouissou
BTW, why state that MySQL is a poor choice ?
It's often buggy and doesn't follow the standard ?
I didn't use MySQL until a few weeks ago, so I couldn't tell for older
versions, but MySQL 4.1.12 seems to work perfectly for me.
And it's 15 times faster (!!!) than PostgreSQL for the very DB-intensive DSPAM
(this nice crowd of bugs ;-)
DSPAM is a special case. I've had a quick look into its backend drivers:
they use proprietary features of MySQL to speed things up. Their
PostgreSQL driver is known to be less worked on too.

For pure *transactional* SQL (MySQL's MyISAM tables aren't in the same
competitive field), MySQL and PostgreSQL are roughly at the same
performance level, although PostgreSQL is often reported to scale better
under concurrent access.

On the stability front, of all the database admins I've had personal
contact with, only the MySQL ones witnessed crashes (and chronic ones at
that) and data loss not related to hardware failures. It was each time on
heavily loaded, very big databases (tables in the GB or tens-of-GB range)
running on multiple CPUs...
Stability seems better in the 4.1.x branch over the last 6 months, though.
But in the 6 years I've personally worked with PostgreSQL, under various
kinds of loads and queries, I've *never* had a crash. In fact I have a
PostgreSQL 7.2 server that has been in production for 3 years now with
around 500 days of uptime, which is the time since the last kernel upgrade
(I don't know the exact uptime as it wrapped at around 450 days and now
shows 42 days :-) ), hosting 10 databases with a constant total load of
around 10 queries/second.

In the quirkiness department, setting aside the now() function declaration
for SQLite, all the special cases in SQLgrey are for MySQL, even though I
originally developed SQLgrey with it. From the top of my head:
- the "I update the timestamp because I think it's better for you" bugfix,
- the need to tell the client to autoreconnect because the server drops
the connection without being told to,
- the heavy SQL involved with timestamp differences.

The first two introduced bugs in SQLgrey; the last one made me spend quite
some time looking for a working SQL syntax...
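To give an idea of the kind of thing I mean (illustrative queries only, not
the ones actually used in SQLgrey):

-- PostgreSQL: age of an entry, in seconds
SELECT extract(epoch FROM now() - first_seen) FROM connect;
-- MySQL equivalent
SELECT UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(first_seen) FROM connect;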
Post by Michel Bouissou
15 times faster, that was a good reason for me to migrate...
<OT> Yep. In my case, the lack of performance is one reason I'm considering
evaluating other statistical filters. Even with MySQL, DSPAM doesn't seem
to scale well enough for thousands of users. Too bad, as DSPAM comes with a
nice web interface out of the box. I'll have to do some php/perl web
hacking, and I lack time for this :-(</OT>

Lionel.
Michel Bouissou
2005-06-30 09:45:09 UTC
Permalink
Post by Lionel Bouton
Post by Michel Bouissou
Post by Lionel Bouton
Post by Michel Bouissou
BTW, why state that MySQL is a poor choice ?
It's often buggy and doesn't follow the standard ?
I didn't use MySQL until a few weeks ago, so I couldn't tell for older
versions, but MySQL 4.1.12 seems to work perfectly for me.
And it's 15 times faster (!!!) than PostgreSQL for the very DB-intensive
DSPAM (this nice crowd of bugs ;-)
they use proprietary features of MySQL to speed things up. Their
PostgreSQL driver is known to be less worked on too.
That's true. But in my case, I had to focus on the actual result ;-)

It also seems that the "proprietary features of MySQL" that allow it to
speed things up are missing in PostgreSQL.
Post by Lionel Bouton
For pure *transactional* (MySQL's MyISAM tables aren't in the same
competitive field) SQL, MySQL and PostgreSQL are roughly at the same
performance level although PostgreSQL is often reported to scale better
under concurrent accesses.
I'm using "InnoDB" tables in MySQL, which are transactional (although I
configured MySQL not in "completely atomic" mode, as I didn't need full
transactional features, and to speed things up). In this configuration, and
on the same machine, DSPAM still is 15 times faster using MySQL than using
PostgreSQL...
Post by Lionel Bouton
On the stability front, of all the database admins I've had personnal
contacts with, only the MySQL ones witnessed crashes (and chronic ones
at that) and data loss not related to hardware failures.
InnoDB tables are reported to be very safe and robust (and that's why I
chose to use them).
Post by Lionel Bouton
But for 6 years I've personnaly worked with PostgreSQL in various kinds
of loads and queries, I've *never* had a crash.
I don't have any doubt that PostgreSQL is very good and stable. I've used it
for years as well, and have never lost data, nor have I seen any PostgreSQL
crash.

Only this speed issue with DSPAM made me shift from Pg to MySQL.
Post by Lionel Bouton
In the quirkiness department, in SQLgrey, setting aside the now()
function declaration for SQLite, although I originally developped
SQLgrey with MySQL, all the special cases are for it, from the top of my
- the "I update the timestamp because I think it's better for you" bugfix,
- the need to tell the client to autoreconnect because the server drops
the connection without being told to,
- the heavy SQL involved with timestamp differences.
The first two introduced bugs in SQLgrey, the last one made me spent
quite some time looking for a working SQL syntax...
Well, I don't know about "the server drops the connection without being
told to", but the fact that 2 different SQL servers behave differently on
some details isn't surprising, and wouldn't make me say one is better than
the other, as long as it is documented -- which is the case for timestamp
fields, for example.
Post by Lionel Bouton
<OT> Yep. In my case, the lack of performance is one reason I consider
evaluating other statistical filters. Even with MySQL, DSPAM doesn't
seem to scale well enough for thousands of users. Too bad, DSPAM comes
with a nice web interface out of the box. I'll have to do some php/perl
web hacking and I lack time for this :-(</OT>
The thing that mostly pisses me off with DSPAM is the number of bugs it
contains, the fact that long-reported bugs aren't fixed (the developer
prefers adding new features to fixing known bugs, and quite often doesn't
seem to want to hear that his software may have bugs...), plus the fact
that half of the time, fixing a bug introduces another new bug.

I have more than serious doubts about the quality of this software and its
development.

OTOH, it "mostly works" (surprisingly enough ;-) and when it works, it
usually works very well.

But for example, I've noticed that on my system, since I upgraded to the
latest (supposedly stable) DSPAM 3.4.8, retraining missed spam and false
positives doesn't work anymore :-((

DSPAM says it has done it and doesn't complain, but the tokens in the DB
aren't actually reversed :-((
</OT>
--
Michel Bouissou <***@bouissou.net> OpenPGP ID 0xDDE8AC6E