Discussion:
[Sqlgrey-users] Sqlgrey-users Digest, Vol 14, Issue 1
Philippe Chaintreuil
2008-03-28 12:51:08 UTC
Permalink
Dec 15 19:51:51 hobbes sqlgrey: dbaccess: warning: couldn't do
query: UPDATE from_awl SET last_seen = NOW(), first_seen =
first_seen WHERE sender_name = 'schuhb?ckoubut' AND
Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '=',
reconnecting to DB
Date: Tue, 18 Dec 2007 23:36:43 +0100
Wow. Digest subscription to this list is *slow*. I just got your
response today March 28th, 2008! I had almost forgotten I wrote that
original e-mail.

List owner: You might want to turn on mailman's "digest_send_periodic"
option (on the Admin->Digest Options page of the web interface).
Hum, is your database configured to use UTF-8?
I think it is. I'm using MySQL, and I'm not that well versed in it's
deep inner-workings. Probably about a year ago, I read something that
said I should get that utf8_general_ci,COERCIBLE in there. I did that
and it obviously didn't seem to do much.
If so you can get such errors, as email adresses are sent from Postfix
as pure ASCII which can (and does regularly) use invalid UTF-8
sequences. You'll have to convert it to ASCII (latin-1 will do, it's
fine when fed with ASCII data).
I should really add an entry related to non-ANSI characters
(accentuated mainly) to the FAQ :-)
I second the FAQ suggestion on this front -- it should also probably
have the answer to my next question as well:

Ummm... How do I fix this in MySQL?

It should also probably get added to the INSTALL instructions as well.


Sorry if this has all been addressed since Lionel replied back in December.
--
.----------. ,----------== Peep ==----------,
/ .-. .-. \ `-== (Philippe Chaintreuil) ==-`
/ | | | | \
\ `-' `-' _/ FidoNet Netmail Address:
/\ .--. / | 1:2613/118
\ | / / / / InterNet Address:
/ | `--' /\ \ ***@parallaxshift.com
/`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST--
"We're just two lost souls swimming in a fish bowl..."
Lionel Bouton
2008-03-28 13:49:16 UTC
Permalink
Post by Philippe Chaintreuil
Date: Tue, 18 Dec 2007 23:36:43 +0100
Wow. Digest subscription to this list is *slow*. I just got your
response today March 28th, 2008! I had almost forgotten I wrote that
original e-mail.
List owner: You might want to turn on mailman's "digest_send_periodic"
option (on the Admin->Digest Options page of the web interface).
It was already set. I changed the period to daily instead of monthly.
But I suspect a bug...

Lionel
Philippe Chaintreuil
2008-03-28 13:51:28 UTC
Permalink
Post by Lionel Bouton
It was already set. I changed the period to daily instead of monthly.
But I suspect a bug...
Hmmm.... Strange. Anyway, seems like the list isn't high-traffic, so
I just dropped the digest mode for myself. That fixes it for me.
--
.----------. ,----------== Peep ==----------,
/ .-. .-. \ `-== (Philippe Chaintreuil) ==-`
/ | | | | \
\ `-' `-' _/ FidoNet Netmail Address:
/\ .--. / | 1:2613/118
\ | / / / / InterNet Address:
/ | `--' /\ \ ***@parallaxshift.com
/`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST--
"We're just two lost souls swimming in a fish bowl..."
Philippe Chaintreuil
2008-04-02 18:00:37 UTC
Permalink
Dec 15 19:51:51 hobbes sqlgrey: dbaccess: warning: couldn't do
query: UPDATE from_awl SET last_seen = NOW(), first_seen =
first_seen WHERE sender_name = 'schuhb?ckoubut' AND
Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '=',
reconnecting to DB
Okay, here's some more detail.

Fresh log entry:

-----------------------------------------------------------------------
Apr 2 12:48:42 hobbes sqlgrey: warning: Use of uninitialized value in
concatenation (.) or string at /usr/sbin/sqlgrey line 1139.
Apr 2 12:48:42 hobbes sqlgrey: dbaccess: error: couldn't access
from_awl table:

Apr 2 12:48:42 hobbes sqlgrey: grey: from awl match: updating
212.56.148.57(212.56.148.57),
il<E4>***@fastmail.fm(il<E4>***@fastmail.fm)
Apr 2 12:48:42 hobbes sqlgrey: dbaccess: warning: couldn't do query:
UPDATE from_awl SET last_seen = NOW(), first_seen = first_seen WHERE
sender_name = 'il<E4>nderkebym' AND sender_domain = 'fastmail.fm' AND
src = '212.56.148.57': Illegal mix of collations
(latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for
operation '=', reconnecting to DB
-----------------------------------------------------------------------

MySQL information:

-----------------------------------------------------------------------
mysql> SHOW VARIABLES LIKE '%character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE '%collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
-----------------------------------------------------------------------

So in general, my system is setup to use UTF-8. Keep in mind that
client and connection are referring to the mysql command line client here.

-----------------------------------------------------------------------
mysql> SHOW CHARACTER SET LIKE 'latin1';
+---------+----------------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+----------------------+-------------------+--------+
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
+---------+----------------------+-------------------+--------+
1 row in set (0.00 sec)
-----------------------------------------------------------------------

This shows that latin1 causes the latin1_swedish_ci collation to be set
as default.

-----------------------------------------------------------------------
mysql> SHOW FULL COLUMNS FROM from_awl ;
+---------------------------------------------+
| Field: sender_name |
| Type: varchar(64) |
| Collation: latin1_swedish_ci |
| Null: NO |
| Key: PRI |
| Default: NULL |
| Extra: |
| Privileges: select,insert,update,references |
| Comment: |
+---------------------------------------------+
| Field: sender_domain |
| Type: varchar(255) |
| Collation: latin1_swedish_ci |
| Null: NO |
| Key: PRI |
| Default: NULL |
| Extra: |
| Privileges: select,insert,update,references |
| Comment: |
+---------------------------------------------+
| Field: src |
| Type: varchar(39) |
| Collation: latin1_swedish_ci |
| Null: NO |
| Key: PRI |
| Default: NULL |
| Extra: |
| Privileges: select,insert,update,references |
| Comment: |
+---------------------------------------------+
| Field: first_seen |
| Type: timestamp |
| Collation: NULL |
| Null: NO |
| Key: |
| Default: CURRENT_TIMESTAMP |
| Extra: |
| Privileges: select,insert,update,references |
| Comment: |
+---------------------------------------------+
| Field: last_seen |
| Type: timestamp |
| Collation: NULL |
| Null: NO |
| Key: MUL |
| Default: 0000-00-00 00:00:00 |
| Extra: |
| Privileges: select,insert,update,references |
| Comment: |
+---------------------------------------------+
5 rows in set (0.01 sec)
-----------------------------------------------------------------------
[I reformatted that output so it wouldn't have to wrap.]

So the Swedish collation is on all the text fields....

And here are a few of the relevant lines from an mysqldump of my
sqlgrey database:
-----------------------------------------------------------------------
--
-- Table structure for table `from_awl`
--

DROP TABLE IF EXISTS `from_awl`;
SET @saved_cs_client = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `from_awl` (
`sender_name` varchar(64) NOT NULL,
`sender_domain` varchar(255) NOT NULL,
`src` varchar(39) NOT NULL,
`first_seen` timestamp NOT NULL default CURRENT_TIMESTAMP on update
CURRENT_TIMESTAMP,
`last_seen` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`src`,`sender_domain`,`sender_name`),
KEY `from_awl_lseen` (`last_seen`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SET character_set_client = @saved_cs_client;
-----------------------------------------------------------------------

2nd line from the bottom has "DEFAULT CHARSET=latin1;" which shows that
I followed the HOWTO's instructions when creating the databases. (I
checked another database and it has "DEFAULT CHARSET=utf8" like I'd
expect if I didn't set something explicitly.)



So I have two questions:

1.) What's wrong?
2.) How can I fix it?


Right now, my leaning is that the connection from sqlgrey is UTF-8
since that's my system's default and that's causing the error about utf8
and swedish fighting.
Now I know next to nothing about non-ASCII encodings, MySQL's deep dark
secrets, and Perl. But I did find this page about setting the
connection's encoding for mysql:

http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
--
.----------. ,----------== Peep ==----------,
/ .-. .-. \ `-== (Philippe Chaintreuil) ==-`
/ | | | | \
\ `-' `-' _/ FidoNet Netmail Address:
/\ .--. / | 1:2613/118
\ | / / / / InterNet Address:
/ | `--' /\ \ ***@parallaxshift.com
/`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST--
"We're just two lost souls swimming in a fish bowl..."
Lionel Bouton
2008-04-02 18:22:00 UTC
Permalink
Post by Philippe Chaintreuil
1.) What's wrong?
MySQL ? I'm wondering why it defaults to a charset different than the
database one. It seems counterintuitive at best. If you set your
database charset to a non-default value it's for a damn good reason...
Post by Philippe Chaintreuil
2.) How can I fix it?
Without patching SQLgrey, you might have luck changing the my.cnf file
to change the default client charset (assuming you don't have other
clients needing to connect with a different charset to the same system).
If you want to hack SQLgrey, look for the end of the connectdb method,
it already has a special case for MySQL (auto-reconnect as MySQL
routinely drops conections by default...).
You can then send the SQL query setting the character set you need at
this point :

$self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1")

The MySQL doc isn't clear about what happens when the client
auto-reconnects... so your mileage may vary. It would be great if the
DBD driver would allow to set the character set at connection-time, but
I had no luck browsing it's documentation either.

Lionel.
Philippe Chaintreuil
2008-05-31 11:12:45 UTC
Permalink
Post by Lionel Bouton
Without patching SQLgrey, you might have luck changing the my.cnf file
to change the default client charset (assuming you don't have other
clients needing to connect with a different charset to the same
system). If you want to hack SQLgrey, look for the end of the
connectdb method, it already has a special case for MySQL
(auto-reconnect as MySQL routinely drops conections by default...).
You can then send the SQL query setting the character set you need at
$self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1")
The MySQL doc isn't clear about what happens when the client
auto-reconnects... so your mileage may vary. It would be great if the
DBD driver would allow to set the character set at connection-time,
but I had no luck browsing it's documentation either.
So I made this change back on April 3rd:

------------------------------------------------------------------------
# mysql drops the connection, we have some glue code
# to reinit the connection, but better use mysql DBD code
if ($self->MySQL()) {
$self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1")
or $self->mylog('dbaccess', 0,
"can't set connection character set to LATIN1: $DBI::errstr");

$self->{sqlgrey}{dbh}->{mysql_auto_reconnect} = 1;
}
------------------------------------------------------------------------

and yesterday I got this:

------------------------------------------------------------------------
May 30 14:23:24 hobbes postfix/smtpd[28381]: warning: 190.172.137.78:
address not listed for hostname 190-172-137-78.speedy.com.ar
May 30 14:23:24 hobbes postfix/smtpd[28381]: connect from
unknown[190.172.137.78]
May 30 14:23:26 hobbes postfix/pickup[28070]: 73E491AFEA9: uid=102
from=<sqlgrey>
May 30 14:23:26 hobbes postfix/cleanup[28424]: 73E491AFEA9:
message-id=<***@hobbes.parallaxshift.com>
May 30 14:23:26 hobbes postfix/qmgr[15197]: 73E491AFEA9:
from=<***@hobbes.parallaxshift.com>, size=490, nrcpt=1 (queue active)

May 30 14:23:26 hobbes sqlgrey: warning: Use of uninitialized value in
concatenation (.) or string at /usr/sbin/sqlgrey line 1143.
May 30 14:23:26 hobbes sqlgrey: dbaccess: error: couldn't access
from_awl table:
May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating
190.172.137.78(190.172.137.78),
m<FC>***@lycosmail.com(m<FC>***@lycosmail.com)
------------------------------------------------------------------------

So I got a new warning about a bad concatenation/string when I got a
non [A-Za-z] letter (the "<FC>"). The line referenced above is the
mylog() line below from is_in_from_awl():

------------------------------------------------------------------------
if (!defined $sth or !$sth->execute($sender_name, $sender_domain,
$host)) {
$self->db_unavailable();
$self->mylog('dbaccess', 0, "error: couldn't access $from_awl
table: $DBI::errstr");
return 1; # in doubt, accept
} else {
------------------------------------------------------------------------

I went back through my logs trying to see if I could find any other non
[A-Za-z] characters in e-mail addresses in sqlgrey lines since April and
I couldn't find any. (Note that does not mean that they're weren't any
-- I had 26 megs of sqlgrey lines from that time and I don't quite know
what to grep for to find 'em.)

So the two questions I have at this point are:

1.) Lionel, any suggestions what to do to debug this warning so I can
get the real error?
2.) Anybody know a grep regexp for pulling out e-mail addresses with
those non [A-Za-z] characters?
--
.----------. ,----------== Peep ==----------,
/ .-. .-. \ `-== (Philippe Chaintreuil) ==-`
/ | | | | \ On The Other Side Of The Wall!
\ `-' `-' _/ FidoNet Netmail Address:
/\ .--. / | 1:2613/118
\ | / / / / InterNet Address:
/ | `--' /\ \ ***@parallaxshift.com
/`-------' \ \ --PiCTuRe By aN uNKNoWN aSCii aRTiST--
"We're just two lost souls swimming in a fish bowl..."
Dan Faerch
2008-05-31 11:48:01 UTC
Permalink
Is it even legal to use international chars in an email-address?


Well anyway. I assume its this line you want to grep out:

May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating
190.172.137.78(190.172.137.78),
m<FC>***@lycosmail.com(m<FC>***@lycosmail.com)


If so, try this:
$ grep "from awl match:" mail.log | grep -viE "\([[:graph:]]+@"

If it fails, use this as a "less accurate" alternative:
$ grep "from awl match:" mail.log | grep -viE "\([a-z?~*=.#%&+/_0-9-]+@"

(replace "mail.log" with whatever file youre grepping)

- Dan
Post by Philippe Chaintreuil
Post by Lionel Bouton
Without patching SQLgrey, you might have luck changing the my.cnf file
to change the default client charset (assuming you don't have other
clients needing to connect with a different charset to the same
system). If you want to hack SQLgrey, look for the end of the
connectdb method, it already has a special case for MySQL
(auto-reconnect as MySQL routinely drops conections by default...).
You can then send the SQL query setting the character set you need at
$self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1")
The MySQL doc isn't clear about what happens when the client
auto-reconnects... so your mileage may vary. It would be great if the
DBD driver would allow to set the character set at connection-time,
but I had no luck browsing it's documentation either.
------------------------------------------------------------------------
# mysql drops the connection, we have some glue code
# to reinit the connection, but better use mysql DBD code
if ($self->MySQL()) {
$self->{sqlgrey}{dbh}->do("SET CHARACTER SET LATIN1")
or $self->mylog('dbaccess', 0,
"can't set connection character set to LATIN1: $DBI::errstr");
$self->{sqlgrey}{dbh}->{mysql_auto_reconnect} = 1;
}
------------------------------------------------------------------------
------------------------------------------------------------------------
address not listed for hostname 190-172-137-78.speedy.com.ar
May 30 14:23:24 hobbes postfix/smtpd[28381]: connect from
unknown[190.172.137.78]
May 30 14:23:26 hobbes postfix/pickup[28070]: 73E491AFEA9: uid=102
from=<sqlgrey>
May 30 14:23:26 hobbes sqlgrey: warning: Use of uninitialized value in
concatenation (.) or string at /usr/sbin/sqlgrey line 1143.
May 30 14:23:26 hobbes sqlgrey: dbaccess: error: couldn't access
May 30 14:23:26 hobbes sqlgrey: grey: from awl match: updating
190.172.137.78(190.172.137.78),
------------------------------------------------------------------------
So I got a new warning about a bad concatenation/string when I got a
non [A-Za-z] letter (the "<FC>"). The line referenced above is the
------------------------------------------------------------------------
if (!defined $sth or !$sth->execute($sender_name, $sender_domain,
$host)) {
$self->db_unavailable();
$self->mylog('dbaccess', 0, "error: couldn't access $from_awl
table: $DBI::errstr");
return 1; # in doubt, accept
} else {
------------------------------------------------------------------------
I went back through my logs trying to see if I could find any other non
[A-Za-z] characters in e-mail addresses in sqlgrey lines since April and
I couldn't find any. (Note that does not mean that they're weren't any
-- I had 26 megs of sqlgrey lines from that time and I don't quite know
what to grep for to find 'em.)
1.) Lionel, any suggestions what to do to debug this warning so I can
get the real error?
2.) Anybody know a grep regexp for pulling out e-mail addresses with
those non [A-Za-z] characters?
Dan Faerch
2008-05-31 13:35:35 UTC
Permalink
Hey all..

During "bounce storms" like backscatter and such, we startet getting
timeouts between postfix and sqlgrey. After a little poking, i loosly
concluded that sql hangs for a bit and thus sqlgrey hans for a bit.

The sql-hang occurs, afaik, when the sql-master (we run in clustering
mode) gets hammered with inserts and updates.
Im gonna upgrade the master sql-server at some point, but i also looked
a bit at what "write" statements that hammers the master.

Of course inserts into the connect (new connection triplet) is by far
the top hitter when it comes to writes. But during a bounce storm,
UPDATES to From_AWL actually makes out around 20-25% of all writes.
But its not necessary to update EVERY time, since in a bounce scenario,
most sender adresses will be blank. (ie NULL senders) and many IPs will
return often.

So yesterday i whipped up a way of not updating every time for the same
record. And it turns out it made a huge difference.
From_AWL writings are now reduced by 97%

What ive done:
is_in_from_awl normally SELECT's "1" if record exists in AWL.
I now select the "last_seen" timestamp. The function now also returns
this timestamp, instead of 1 for all "true" scenarios.
It will still return 0 on false, thus no logic needs changing elsewhere
due to this change.

The timestamp is compared to time() to see how long ago last update was
done. If enough time has passed, it makes the UPDATE to the table.

Right now the UPDATE delay is configured as $dflt{awl_age}(in seconds) *
0.004167.
This makes ~6 hours, if awl_age=60 days and ~3 hours if awl_age=30 days
and so forth.

I dont really see a downside to this approach. All From_AWL entries will
still be updated on the first hit within the time period. so worst case
scenario, would be that a from_awl entry would be cleaned out 3 hours
early. Compared to a period of 30 days, it doenst really matter.
In the log it will say: "from awl match: updating" (as before) when
actually updating the table and when NOT updating it will simply say
"from awl match:" (without the word "updating")


Oh yeah, and the feature can be enabled/disabled with the flag
"db_load_reduction".

So.. This is a "request for comments" to make sure i didnt miss any
pitfalls and such. A patch for consideration is attached.


- Dan Faerch

Loading...