Discussion:
Queue bugs, attn Pat :)
Alex Hudson
2008-10-13 11:10:16 UTC
Bugfinder general Jur found these messages in his queue log:

Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1
Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1
Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1
Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1
Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1
Oct 12 11:31:38 y104 syslog[17710]: DEBUG queue - Handled command from 127.0.0.1

From looking at the code, this seems to be the path in the command loop
that handles commands it doesn't recognise. Because it's spinning
around in this big loop, I'm assuming there is a bug - probably in
SMTP - where the agent and the queue are getting "out of sync"
protocol-wise, and the queue is trying to interpret commands out of a
message body or something.

So, there are two things here:

a. I wonder if we can work out somehow what's actually going on here and
causing stuff to fail badly. Jur had run out of disk space, so
potentially something is ignoring an 'out of space' error condition and
sending the mail on anyway (this seems likely to me).
b. Why are we trying to tolerate command protocol errors? It seems
pretty likely to me that one error can easily be compounded by another.
Why not just quit the connection? At worst, the agent will reconnect and
make the same mistake.
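Point b can be illustrated with a minimal Python sketch, assuming a simple line-based protocol. The command names (`NOOP`, `QUIT`), the reply strings and the `run_session` helper are hypothetical and are not the real Bongo queue protocol. With `strict=True` the first unrecognised line ends the session; with `strict=False` it is answered and skipped, roughly the tolerant behaviour described above.

```python
# Hypothetical command set; NOT the real Bongo queue protocol.
KNOWN_COMMANDS = {"NOOP", "QUIT"}


def run_session(lines, strict=True):
    """Process client lines; return (replies, closed_on_error)."""
    replies = []
    for line in lines:
        words = line.split()
        verb = words[0].upper() if words else ""
        if verb == "QUIT":
            replies.append("OK bye")
            return replies, False
        if verb in KNOWN_COMMANDS:
            replies.append("OK")
        else:
            replies.append("ERR unknown command")
            if strict:
                # Probable protocol desync: stop here instead of
                # interpreting whatever the client sends next.
                return replies, True
    return replies, False
```

Fed the "." and "" lines seen in Jur's log, the strict loop replies once and disconnects, while the tolerant one keeps answering every stray line.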

Cheers,

Alex.
Alex Hudson
2008-10-15 12:06:07 UTC
More bugs :)

I've just fixed a couple of obvious bugs in smtpc on trunk. One of them
was the fact we weren't closing connections to remote SMTP servers when
delivery failed, only on success - so on my machine I had loads of
connections open to the same SMTP server.
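The general pattern behind the smtpc fix (close the remote connection whether delivery succeeds or fails) can be sketched in Python with `try`/`finally`. smtpc itself is C; `FakeConnection`, `send_message` and `deliver` here are hypothetical stand-ins, not smtpc's actual code.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection to a remote SMTP server."""

    def __init__(self, fail):
        self.fail = fail
        self.closed = False

    def send_message(self, msg):
        if self.fail:
            raise OSError("remote server rejected message")

    def close(self):
        self.closed = True


def deliver(conn, msg):
    """Return True on success, False on failure; always close conn."""
    try:
        conn.send_message(msg)
        return True
    except OSError:
        return False
    finally:
        # Runs on both the success and the failure path, so a failed
        # delivery no longer leaves the connection open.
        conn.close()
```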

One problem I have noticed: when we do DELIVER_TRY_LATER for temporary
errors, the "later" appears to be "one or two seconds" - it retries
delivery almost immediately, which is pretty clearly wrong.

I think this is a queue bug too?

Thanks

Alex.
Patrick Felt
2008-10-16 03:17:47 UTC
Permalink
This is most definitely a queue bug. I'll check into that one, as I'm not sure that the queue runner checks the timestamp on the mail envelope. If not, that is where the retry is occurring.
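A minimal Python sketch of the check Pat describes, assuming each envelope carries a "next attempt" timestamp that the queue runner compares against the current time. The `Envelope` fields, the `try_later` and `due` helpers, and the exponential backoff with a one-minute base and four-hour cap are all illustrative assumptions, not Bongo's actual envelope format or retry policy.

```python
from dataclasses import dataclass

BASE_DELAY = 60.0       # hypothetical: first retry after one minute
MAX_DELAY = 4 * 3600.0  # hypothetical cap: four hours


@dataclass
class Envelope:
    msg_id: str
    attempts: int = 0
    next_attempt: float = 0.0  # Unix time before which we must not retry


def try_later(env, now):
    """Record a temporary failure (DELIVER_TRY_LATER) and reschedule."""
    env.attempts += 1
    # Double the delay on each failure, up to the cap.
    delay = min(BASE_DELAY * 2 ** (env.attempts - 1), MAX_DELAY)
    env.next_attempt = now + delay


def due(queue, now):
    """Envelopes whose retry time has arrived; the rest are skipped."""
    return [env for env in queue if env.next_attempt <= now]
```

Without the `due` filter, a runner that simply loops over the queue would retry a deferred message on its next pass, which matches the "one or two seconds" behaviour Alex saw.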

As for the other email, are we sure that we want to kill the connection if an invalid queue command comes through? That seems like it would break the protocol model (which is based on SMTP). As for why there are so many on Jur's machine, I don't know, other than this question for Jur: is the queue port (8670) blocked at the firewall? We don't want untrusted people connecting to 8670 or 689.


-----Original Message-----
From: Alex Hudson <alex-b4STkJ5ddZioClj4AeEUq9i2O/***@public.gmane.org>
To: bongo-devel-8nu/***@public.gmane.org
Date: Wed, 15 Oct 2008 13:06:07 +0100
Subject: Re: [Bongo-devel] Queue bugs, attn Pat :)

Alex Hudson
2008-10-16 07:05:14 UTC
Post by Patrick Felt
This is most definitely a queue bug. I'll check into that one, as I'm not sure that the queue runner checks the timestamp on the mail envelope. If not, that is where the retry is occurring.
Cool. I looked briefly into what smtpc was doing, and it was some QRAW
command which was meaningless to me :D
Post by Patrick Felt
As for the other email, are we sure that we want to kill the connection if an invalid queue command comes through? That seems like it would break the protocol model (which is based on SMTP).
Well, I think there is an upside and a downside here. The downside, as
you rightly say, is that it breaks forward compatibility. If you enter a
command which isn't recognised, that could be recoverable with a smart
client.

The problem, for me, is that it seems relatively easy for the
protocol to get "out of sync" - that is, when a client makes a mistake,
like not detecting an error, it may end up sending mail content when it
shouldn't, and the queue then tries to interpret that content as
commands. At best, that makes things buggy; at worst, it's actually a
security hole, because we can potentially inject commands into the
session.
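A toy Python simulation of that desync, assuming (as Alex guessed for Jur's full disk) that the server rejects `DATA` but a careless client ignores the error and sends the body anyway. The `interpret` function, the state labels and the line sequence are hypothetical; the point is only that a body line such as `QUIT` ends up being treated as a command.

```python
def interpret(client_lines, disk_full):
    """Label each client line the way a toy server would see it."""
    in_data = False
    labels = []
    for line in client_lines:
        if in_data:
            if line == ".":
                in_data = False
                labels.append("end of data")
            else:
                labels.append("message body")
        elif line == "DATA":
            if disk_full:
                # Server refuses; a careless client carries on regardless.
                labels.append("rejected: out of space")
            else:
                in_data = True
                labels.append("start of data")
        else:
            labels.append("COMMAND")
    return labels
```

With `disk_full=False` the lines `DATA`, `hi`, `QUIT`, `.` are read as body text; with `disk_full=True` every line after `DATA`, including `QUIT`, is parsed as a command.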

I think we can address forward compatibility with other means; the
synchronisation problem - which is actually the exact same issue I just
fixed on smtpc :) - is a bit of a worry for me though.
Post by Patrick Felt
As for why there are so many on Jur's machine, I don't know, other than this question for Jur: is the queue port (8670) blocked at the firewall? We don't want untrusted people connecting to 8670 or 689.
We probably should be making those connections listen on localhost only,
too, to be fair.
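Listening on localhost only comes down to binding the socket to the loopback address instead of all interfaces. A minimal Python sketch follows; the `listen_local` helper is hypothetical, and the real agents are C, where the equivalent is calling `bind()` with `INADDR_LOOPBACK` rather than `INADDR_ANY`.

```python
import socket


def listen_local(port):
    """Listen on the loopback interface only, not on 0.0.0.0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Only processes on this machine can connect to 127.0.0.1.
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock
```

For example, `listen_local(8670)` for the queue port; the firewall rule then becomes a second line of defence rather than the only one.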

When I looked at the code, the commands were "." and "" - it looks like
some kind of SMTP leftover or something. Ah, maybe SMTP is passing
through the .\r\n\r\n from the end of a DATA command? In fact, that's
probably it...
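A quick Python check of that theory: splitting `.\r\n\r\n` on CRLF yields exactly the two commands seen in the log, "." and "". The `split_commands` helper is hypothetical and is not the queue's actual line reader.

```python
def split_commands(buf):
    """Split a CRLF-delimited byte buffer into command lines."""
    lines = buf.split(b"\r\n")
    # A trailing CRLF leaves an empty final element, which is not a command.
    if lines and lines[-1] == b"":
        lines.pop()
    return [line.decode("ascii") for line in lines]
```

`split_commands(b".\r\n\r\n")` gives `[".", ""]`, i.e. the end-of-data marker plus one empty line, both unrecognised queue commands.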

Cheers,

Alex.
