Discussion:
How to send with charset=iso-8859-1 instead of unknown-8bit?
m***@raf.org
2018-06-20 02:16:12 UTC
Hi,

I have some software that invokes mutt (non-interactively) to
send email with iso-8859-1 body text.

I've noticed that emails with accented characters are being sent
with charset=unknown-8bit instead of charset=iso-8859-1.

The muttrc manpage says that the default value for send_charset
is "us-ascii:iso-8859-1:utf-8" and that, "In case the text
cannot be converted into one of these exactly, mutt uses
$charset as a fallback". The default charset value is utf-8 (but
the body text is not being entered via the terminal). There is
no mention of unknown-8bit.

I don't understand what conversion is referred to here. I would
have thought (incorrectly, no doubt) that mutt would use the
first character set in send_charset that could be the character
set of the body (i.e. just detection, not conversion).

But if that were the case, the default send_charset would almost
always result in us-ascii or iso-8859-1 being used since most 8
bit characters are valid iso-8859-1. If my understanding were
right, it would make more sense for the default send_charset to
be "us-ascii:utf-8:iso-8859-1" (or "us-ascii:utf-8:unknown-8bit").
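
The detection-only reading described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how mutt works; the helper names decodes_as and first_plausible are hypothetical. It shows why pure detection with iso-8859-1 early in the list would almost always pick iso-8859-1: every byte sequence decodes as iso-8859-1.

```python
def decodes_as(raw: bytes, charset: str) -> bool:
    """Return True if the bytes are a valid sequence in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def first_plausible(raw: bytes, candidates: str):
    """Detection only: return the first charset in a colon-separated list
    that the bytes could be, or None if none fits."""
    for name in candidates.split(":"):
        if decodes_as(raw, name):
            return name
    return None


utf8_bytes = "é".encode("utf-8")    # two bytes: 0xC3 0xA9
latin1_bytes = "é".encode("iso-8859-1")  # one byte: 0xE9

# With iso-8859-1 listed before utf-8, real UTF-8 text is misdetected.
print(first_plausible(utf8_bytes, "us-ascii:iso-8859-1:utf-8"))
# With utf-8 listed first, both inputs are classified sensibly.
print(first_plausible(utf8_bytes, "us-ascii:utf-8:iso-8859-1"))
print(first_plausible(latin1_bytes, "us-ascii:utf-8:iso-8859-1"))
```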

So I'm clearly not understanding how it works. But I'm only
thinking this way because that's how vim works with its
fileencodings variable.

What am I not understanding? And how do I make mutt set the
charset of outgoing mail to iso-8859-1 when it detects accented
(iso-8859-1) characters?

Thanks,
raf
Ian Zimmerman
2018-06-20 17:49:11 UTC
Post by m***@raf.org
I have some software that invokes mutt (non-interactively) to
send email with iso-8859-1 body text.
My guess is that mutt looks at the locale environment (LANG and LC_*) to
set the encoding of the source data, and tries to recode it into one
of the encodings in send_charset.

If you _know_ your data is iso-8859-1 but your LANG etc. is something
else, try changing LANG locally in your driver script/program.

BTW, running mutt non-interactively has always seemed strange to me.
Why not use something simpler like mailx, or even /usr/sbin/sendmail?
--
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.
m***@raf.org
2018-06-21 00:42:58 UTC
Post by Ian Zimmerman
Post by m***@raf.org
I have some software that invokes mutt (non-interactively) to
send email with iso-8859-1 body text.
My guess is that mutt looks at the locale environment (LANG and LC_*) to
set the encoding of the source data, and tries to recode it into one
of the encodings in send_charset.
If you _know_ your data is iso-8859-1 but your LANG etc. is something
else, try changing LANG locally in your driver script/program.
Thanks. I'll try that. Mutt probably detects that it isn't valid utf-8,
and so doesn't match the system locale, and so can't be converted.
That would make sense.

I wonder if setting charset to iso-8859-1 would also fix it.
That's for the terminal so it's probably not wise to change that.
Maybe assumed_charset? (maybe that's only for incoming messages).

I ran some tests and setting LANG=en_AU.iso88591 fixes it.
So does setting charset=iso-8859-1 in .muttrc.
But setting assumed_charset or send_charset doesn't fix it.

Strangely, when I perform these tests and it gets it "wrong",
it uses utf-8 as the character set rather than unknown-8bit.
I don't understand that, but it's OK. The tests were run as a
different user, which might have something to do with it. The
user producing the unknown-8bit emails did have
send_charset=us-ascii:iso-8859-1 in their .muttrc (I'd
forgotten), but the user performing these tests didn't.

The main thing is that I now have two ways to fix the problem.

I think setting LANG when sending the mail as you suggest is the
best method (and leaving charset as it is so that it matches the
terminal for when reading mail later).
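
For a driver program written in Python, overriding LANG for the mutt child process only might look roughly like the sketch below. The function names are made up for illustration, and it assumes the en_AU.iso88591 locale is installed and that the body is passed to mutt on standard input. LC_ALL and LC_CTYPE are dropped because they take precedence over LANG.

```python
import os
import subprocess


def mutt_invocation(subject, recipient, lang="en_AU.iso88591", base_env=None):
    """Build the argv and environment for a non-interactive mutt run.

    Only the child's environment is changed; the caller's locale is untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = lang
    # LC_ALL and LC_CTYPE override LANG for character handling, so remove them.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    argv = ["mutt", "-s", subject, recipient]
    return argv, env


def send(body: bytes, subject: str, recipient: str) -> None:
    """Pipe an iso-8859-1 encoded body into mutt (requires mutt to be installed)."""
    argv, env = mutt_invocation(subject, recipient)
    subprocess.run(argv, input=body, env=env, check=True)
```

Calling send("Café".encode("iso-8859-1"), "Test", "someone@example.com") would then run mutt with the overridden locale; the address is a placeholder.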

Thanks again.
Post by Ian Zimmerman
BTW, running mutt non-interactively has always seemed strange to me.
Why not use something simpler like mailx, or even /usr/sbin/sendmail?
Because they don't know about ~/.muttrc or mutt's -e option :-)

Mainly, I want mutt to keep a record of outgoing mail in an mbox
that I might need to examine later, and I'll be using mutt when I do.

If I ever need to send encrypted mail programmatically, I'd probably
want to do that via mutt as well.

cheers,
raf
Derek Martin
2018-06-24 03:04:27 UTC
Post by m***@raf.org
Post by Ian Zimmerman
My guess is that mutt looks at the locale environment (LANG and LC_*) to
set the encoding of the source data, and tries to recode it into one
of the encodings in send_charset.
If you _know_ your data is iso-8859-1 but your LANG etc. is something
else, try changing LANG locally in your driver script/program.
Thanks. I'll try that. Mutt probably detects that it isn't valid utf-8,
and so doesn't match the system locale, and so can't be converted.
That would make sense.
Yes. Mutt assumes that its input is correctly encoded according to
the system's configured locale. The conversion you're not clear about
is that when it sends a message, it tries to convert from your
configured locale, whatever it is, to each of the character sets in
the send_charset variable, in order, until it finds one for which the
conversion does not fail. It sends that converted version. The
default setting therefore should usually guarantee, for all
English-speaking users and many non-English-speaking European users,
that the outgoing e-mail will be encoded in the simplest encoding
possible, given its contents; i.e. it will send US-ASCII if it can,
then iso-8859-1, unless full Unicode support is required to represent
the data.
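
As a rough model of the procedure just described (not mutt's actual code), the following Python sketch decodes the input according to the locale's charset and then tries each send_charset entry in order. The function name pick_send_charset is hypothetical, and what mutt labels the message when everything fails is deliberately not modelled; the sketch just returns None with the untouched bytes.

```python
def pick_send_charset(raw: bytes, locale_charset: str,
                      send_charset: str = "us-ascii:iso-8859-1:utf-8"):
    """Return (charset, converted_bytes), or (None, raw) if no conversion works."""
    try:
        # Step 1: trust the locale and interpret the input accordingly.
        text = raw.decode(locale_charset)
    except UnicodeDecodeError:
        # The input is not valid in the locale's charset, so nothing can be
        # converted and the original bytes go out as they are.
        return None, raw
    # Step 2: take the first send_charset entry that can represent the text.
    for candidate in send_charset.split(":"):
        try:
            return candidate, text.encode(candidate)
        except UnicodeEncodeError:
            continue
    return None, raw


# UTF-8 locale, valid UTF-8 input: converted down to the simplest charset.
print(pick_send_charset("café".encode("utf-8"), "utf-8"))
# UTF-8 locale, iso-8859-1 input: decoding fails, so every conversion fails.
print(pick_send_charset(b"caf\xe9", "utf-8"))
# iso-8859-1 locale, iso-8859-1 input: sent as iso-8859-1.
print(pick_send_charset(b"caf\xe9", "iso-8859-1"))
```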

Your problem most likely is exactly what you guessed: Your locale is
UTF-8, but the data is not valid Unicode, so all conversions failed.
Mutt just sends the bytes you fed it, and it appears in your case it
failed to default to a reasonable character set (rather than
$charset), for whatever reason related to your combination of locale
and Mutt settings.
Post by m***@raf.org
I wonder if setting charset to iso-8859-1 would also fix it.
This might work, but it's not the "right" fix... The right fix is to
make sure that the data you're sending actually matches your system's
locale settings. Note that given the default send_charset settings,
it should be possible for you to actually use UTF-8, and have mutt
convert the e-mail to iso-8859-1, if you really want that for some
reason, since it will try to use the first matching charset in your
send_charset to which it can successfully convert the data, as I
described above. I used to do this for Korean that I drafted in UTF-8,
since at the time a lot of Koreans still had systems (Win98) that only
supported EUC-KR (WinXP had been out for years, but some people are
extremely slow to update their systems)...

But if you're using Unicode locally, why not just send it as UTF-8 and
be done with it? These days it should be just about impossible to
find people using e-mail on systems that can't handle Unicode. And if
the only reason is that the data is already in ISO-8859-1 and you
don't know how to convert it, that's easy to fix: Just use the iconv
command (iconv is both a C library and a system command). You can
just convert it from iso-8859-1 to UTF-8 once and stop mucking
with incompatible character set settings. See the man page for
details, but it's pretty simple... you just specify the input
character set and the output character set.
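
On the command line this would be something like iconv -f ISO-8859-1 -t UTF-8. If the driver program happens to be written in Python, the same one-off conversion could be sketched as below; the function name is made up for illustration.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as UTF-8.

    Decoding as iso-8859-1 never fails, so this only makes sense when the
    input really is iso-8859-1.
    """
    return data.decode("iso-8859-1").encode("utf-8")


body = b"caf\xe9"  # "café" in iso-8859-1
print(latin1_to_utf8(body))
```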
Post by m***@raf.org
That's for the terminal so it's probably not wise to change that.
Maybe assumed_charset? (maybe that's only for incoming messages).
You should really never set charset explicitly. If your system is
configured properly, there's virtually never any need to do it, as
Mutt will correctly use your system's locale settings, which would be
the preferred way to make sure things are set up correctly. The main
exception is if you have a large pile of pre-existing data that's in
some other charset besides the one you use, which you'll use in some
fashion other than typing it in manually, and converting it would be
prohibitively costly.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.