Discussion:
How to convert maildir to mbox format
Larry Alkoff
2004-06-19 02:59:53 UTC
There are many mbox to maildir format conversion programs
but I have not found one going the other way.

Does anyone know of such?


It doesn't seem that it would be so hard to make:

touch file.mbx to get the thing started.

Then a script which adds using >>

for *.msg do
blank line
From "Return-Path" "day-date-time"
message

one at a time to file.mbx using the >> append.

I really want to convert a gig of Maildir email to mbox format.

Larry Alkoff



Larry Alkoff N2LA - Austin TX
Charles Cazabon
2004-06-19 03:32:39 UTC
Post by Larry Alkoff
There are many mbox to maildir format conversion programs
but I have not found one going the other way.
Does anyone know of such?
Mutt does such. Open an mbox, tag every message, and tag-save them to a
pre-existing maildir.

qmail also includes "maildir2mbox".
[...]
Post by Larry Alkoff
for *.msg do
blank line
From "Return-Path" "day-date-time"
message
one at a time to file.mbx using the >> append.
Not quite. You'll need to do From escaping as well.
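Larry's outline plus the escaping Charles mentions can be sketched in POSIX sh and awk. This is a minimal sketch, not a replacement for formail. It assumes one message per file, takes the envelope sender from Return-Path: (falling back to MAILER-DAEMON, an arbitrary choice), stamps every From_ line with the current time rather than the delivery time, strips Windows CR line endings, and uses the mboxrd convention of prefixing ">" to any line matching ^>*From . The function name maildir_to_mbox is hypothetical.

```shell
#!/bin/sh
# Sketch: append every message in a Maildir's new/ and cur/ to an mbox file.
maildir_to_mbox() {
	maildir=$1 mbox=$2
	# asctime-style date for the From_ line, e.g. "Sat Jun 19 08:55:03 2004"
	stamp=$(LC_ALL=C date '+%a %b %e %H:%M:%S %Y')
	: > "$mbox"                                  # start with an empty mbox
	for file in "$maildir"/new/* "$maildir"/cur/*; do
		[ -f "$file" ] || continue               # skip unmatched globs
		# Envelope sender: first Return-Path: in the header, minus the <>
		sender=$(awk '
			{ sub(/\r$/, "") }                   # tolerate CRLF files
			/^$/ { exit }                        # end of header, give up
			tolower($0) ~ /^return-path:/ {
				addr = $0
				sub(/^[^:]*:[ \t]*/, "", addr)   # drop the field name
				gsub(/[<> \t]/, "", addr)        # drop angle brackets
				print addr
				exit
			}' "$file")
		[ -n "$sender" ] || sender=MAILER-DAEMON  # no Return-Path, or <>
		{
			printf 'From %s %s\n' "$sender" "$stamp"
			# mboxrd escaping: "From " -> ">From ", ">From " -> ">>From "
			awk '{ sub(/\r$/, "") } /^>*From / { $0 = ">" $0 } { print }' "$file"
			echo                                 # blank line ends the message
		} >> "$mbox"
	done
}
```

Usage would be something like maildir_to_mbox ~/Maildir ~/converted.mbox. Escaping every line (not just body lines) is harmless here, because header fields such as "From:" have a colon rather than a space after the name.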

Charles
--
-----------------------------------------------------------------------
Charles Cazabon <***@discworld.dyndns.org>
GPL'ed software available at: http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------
Charles Cazabon
2004-06-19 15:59:58 UTC
Open an mbox, tag every message, and tag-save them to a pre-existing
maildir.
My thinko: other way around, obviously.

Charles
David Champion
2004-06-19 03:31:59 UTC
Post by Larry Alkoff
There are many mbox to maildir format conversion programs
but I have not found one going the other way.
Does anyone know of such?
You can use formail (from the procmail package) for this.

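cd maildir || exit 1       # run from the top of the Maildir
: > ../mbox                # create (or empty) the output mbox
# formail prepends a "From " line to each message as it copies it
for file in new/*; do      # new/ holds unread mail: drop any Status:
  [ -f "$file" ] || continue   # new/ may be empty
  formail -I Status: <"$file" >>../mbox
done
for file in cur/*; do      # cur/ holds mail already seen: mark it read
  [ -f "$file" ] || continue
  formail -a "Status: RO" <"$file" >>../mbox
done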
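The loop above marks every message in cur/ as read ("Status: RO"). A Maildir actually records read state in each file name: single-letter flags follow a ":2," suffix, and S means seen. If that distinction matters, a small helper could pick the Status: value per file. This is a sketch that assumes unread messages in cur/ should become "O" (old but unread) in mbox terms; maildir_status is a hypothetical name, not part of formail or procmail.

```shell
#!/bin/sh
# Sketch: choose an mbox Status: value for one Maildir message file.
# Maildir keeps flags after ":2," in the file name, e.g. "...host:2,RS";
# the S flag means the message has been seen (read).
maildir_status() {
	dir=${1%/*}                      # directory that holds the message
	case $dir in
		new|*/new) echo ""; return ;;    # still new: no Status: header
	esac
	case ${1##*/} in                 # look only at the file name itself
		*:2,*S*) echo "RO" ;;            # seen: read and old
		*)       echo "O" ;;             # in cur/ but unread: old only
	esac
}
```

In the cur/ loop, the fixed header could then become formail -I "Status: $(maildir_status "$file")"; the -I form removes any existing Status: field before adding the new one.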


Alternatively, mutt will save a mailbox in any supported format to
any other supported format. Set $mbox_type in muttrc to control the
preferred save format. You can use mutt to batch-convert a bunch of
mailboxes if you set various options to disable prompting and use the -e
option to mutt to feed it commands at startup.
--
-D. ***@uchicago.edu NSIT::ENSS
No money, no book. No book, no study. No study, no pass.
No pass, no graduate. No graduate, no job. No job, no money.
T h e U n i v e r s i t y o f C h i c a g o
Larry Alkoff
2004-06-19 14:06:26 UTC
Post by David Champion
Post by Larry Alkoff
There are many mbox to maildir format conversion programs
but I have not found one going the other way.
Does anyone know of such?
You can use formail (from the procmail package) for this.
[...]
Hello David.

I tried your script and it worked to create an mbox like file.
Amazing that it's so easy to do and that all my googling didn't turn up "formail".


One problem.
In the created mbox file, each "From " line that starts a message looks like this example:
From sentto-2577139-187-985364714-labradley=***@returns.onelist.com Sat Jun 19 08:55:03 2004


Is this a proper mbox format?
I've looked at a few mbox files and the From lines I've seen all start with "From " then the email address, then date-time.
In the above case it would have been:
From labradley=***@returns.onelist.com Sat Jun 19 08:55:03 2004

Is this a format that Mutt would understand?

Thanks for your help,
Larry
Patrick Shanahan
2004-06-19 15:08:33 UTC
Post by Larry Alkoff
Is this a proper mbox format?
I've looked at a few mbox files and the From lines I've seen all start
with "From " then the email address, then date-time.
Is this a format that Mutt would understand?
yes
--
Patrick Shanahan Registered Linux User #207535
http://wahoo.no-ip.org @ http://counter.li.org
HOG # US1244711 Photo Album: http://wahoo.no-ip.org/photos
David Champion
2004-06-19 21:39:46 UTC
Post by Larry Alkoff
I tried your script and it worked to create an mbox like file.
Amazing that it's so easy to do and that all my googling didn't turn up "formail".
Formail/procmail are sort of a swiss army knife for mail. They can be
employed to do a lot of things they weren't specifically designed for.
:)
Post by Larry Alkoff
One problem.
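In the created mbox file, each
"From " line that starts a message contains a line like for example:
From sentto-2577139-187-985364714-labradley=***@returns.onelist.com Sat Jun 19 08:55:03 2004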
Is this a proper mbox format?
I've looked at a few mbox files and the From lines I've seen all start with "From " then the email address, then date-time.
I think that the difference you're asking about is just the
"sentto-2577139-187-985364714-" part, and that the ">From" and the
newline in the date are just artifacts. (I mention this just to
check that I'm right.) In that case, this looks OK to me.

The "From " line contains the envelope address -- the address used in
the SMTP transaction. This address can differ from the From: address in
the header. That's permissible and useful.

The complicated return address is associated with a bounce processor on
the sending computer. The method is known as "VERP" -- variable envelope
return paths. It assigns a unique tag to each outbound
message, so that on a bounce, the list server can identify precisely
which outbound address triggered the bounce, through arbitrary layers of
forwarding.

So, if you forwarded ***@mindspring.com to, say,
***@earthlink.net, and then your earthlink account expired,
the mail would bounce at Earthlink's server. But because of the VERP
address, the list server would know that the address on its list
is ***@mindspring.com, even though the bounce message says
that ***@earthlink.net is the bad address. Then it could take
appropriate action on your list membership despite having inaccurate
information in the bounce.
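The VERP idea can be sketched in a few lines. This assumes the common convention popularized by qmail, where the recipient user@domain is folded into the envelope sender's local part as user=domain after a list-specific prefix. The prefix and host names below are hypothetical, and onelist.com's actual scheme (the numeric fields in sentto-2577139-187-985364714-...) is not known from this thread. Decoding the address a bounce was sent to recovers the subscribed address, even when the bounce text names a forwarding target.

```shell
#!/bin/sh
# Sketch of the common VERP convention: prefix-user=domain@bounce.host
verp_encode() {   # $1 = list prefix, $2 = recipient, $3 = bounce host
	user=${2%@*} domain=${2##*@}
	printf '%s-%s=%s@%s\n' "$1" "$user" "$domain" "$3"
}
verp_decode() {   # $1 = list prefix, $2 = address the bounce came back to
	local_part=${2%@*}                        # drop "@bounce.host"
	rest=${local_part#"$1"-}                  # drop "prefix-"
	[ "$rest" != "$local_part" ] || return 1  # prefix did not match
	user=${rest%=*} domain=${rest##*=}        # domains never contain "="
	printf '%s@%s\n' "$user" "$domain"
}
```

In the forwarding example, the bounce comes back to the address that verp_encode built for the mindspring.com subscriber, so verp_decode names that subscriber rather than the earthlink.net address quoted in the bounce.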

You'll see this for many list memberships. Formail didn't invent it out
of nothing; it just found that information in the Maildir and created a
"From " line that replicates it. For direct mail from person to person,
you probably won't see that kind of thing unless the sender's mail
system is trying to be very clever.

(I'm supposing a little bit here about how onelist.com's list software
works, but it seems reasonable. Full headers for a message would tell
for sure.)

It's also worth noting that the "From " line probably doesn't matter,
anyway, as long as there's something address-like there. Generally these
are ignored once they're in the mbox file; they're mostly useful just
for tracing a message's path through SMTP. For mutt's purposes, only the
From: header matters.

Does that answer your question?
Larry Alkoff
2004-06-20 13:56:33 UTC
Post by David Champion
[...]
Does that answer your question?
I looked at the original .msg files and compared them to the created mbox file.
It seems that _every_ message in the file was created by formail with that "funny" From line.
None of the original messages seem to have anything like that so I'm wondering if and why formail
created the data and where it came from.

I've attached one of the original messages (ga20xs0.msg)
and cut out the same message from the mbox file (formail_msg).

Mutt is reading the mbox file ok but I'm quite curious.

Larry
David Champion
2004-06-20 22:10:34 UTC
Your messages came through fine -- maybe it was just your own incoming
copy that was blocked or mangled by Mindspring?

Anyway, I'm referring here to your first message, because it has
more complete content for the mbox message. Your resend uses a
message/rfc822 attachment, so it lacks the V7 From_ line. Since that's
what we're looking at, I went with the other one. :) The first copy
you sent included the full, literal mbox message, base64-encoded as an
octet-stream.
Post by Larry Alkoff
I looked at the original .msg files and compared them to the created
mbox file. It seems that _every_ message in the file was created by
formail with that "funny" From line. None of the original messages
seem to have anything like that so I'm wondering if and why formail
created the data and where it came from.
I think you'll see it if you look at the message with full headers.
(Press "h" to toggle "header weeding".)
Post by Larry Alkoff
I've attached one of the original messages (ga20xs0.msg) and cut out
the same message from the mbox file (formail_msg).
The header in Maildir and MH folders that's analogous to the V7 ("mbox")
"From " line is Return-Path:. Here's the Return-Path: from your maildir message:
[...]
I think formail will construct a From_ line from the From: header if
Return-Path: is missing, but since Return-Path: and From_ both represent
the envelope sender, Return-Path: is the ideal source.
Larry Alkoff
2004-06-21 11:06:16 UTC
Post by David Champion
Your messages came through fine -- maybe it was just your own incoming
copy that was blocked or mangled by Mindspring?
Glad it came through. Probably only my incoming blocked as you say.
Post by David Champion
Anyway, I'm referring here to your first message, because it has
more complete content for the mbox message. Your resend uses a
message/rfc822 attachment, so it lacks the V7 From_ line. Since that's
what we're looking at, I went with the other one. :) The first copy
you sent included the full, literal mbox message, base64-encoded as an
octet-stream.
What I did was load the entire mbox file into an editor.
I selected the first message in its entirety, pasted it to a file and attached it to the first message to you.
For the second message to you, I simply edited formail.msg and ga20xs0.msg and removed only the message bodies.
Post by David Champion
Post by Larry Alkoff
I looked at the original .msg files and compared them to the created
mbox file. It seems that _every_ message in the file was created by
formail with that "funny" From line. None of the original messages
seem to have anything like that so I'm wondering if and why formail
created the data and where it came from.
I think you'll see it if you look at the message with full headers.
(Press "h" to toggle "header weeding".)
I'm not looking at the message in Mutt but using less or an editor on the mbox file.
Post by David Champion
Post by Larry Alkoff
I've attached one of the original messages (ga20xs0.msg) and cut out
the same message from the mbox file (formail_msg).
The header in Maildir and MH folders that's analogous to the V7 ("mbox")
"From " line is Return-Path:. Here's the Return-Path: from your maildir
I think formail will construct a From_ line from the From: header, if
Return-Path: is missing, but since Return-Path: represents the envelope
sender, and so does From_, that's the ideal.
I think I understand now.
The address at the start of the "From " line is literally the contents of the Return-Path: header.
That is what I would expect; I just didn't look closely enough at Return-Path: before.

It looks like formail did its job very nicely and now I have a method of easily converting all the .msg files
in a directory to mbox format. This gives me the mbox option.


This is another subject; if we start discussing it, I'll change the Subject: line.
That's what you get for being so helpful - more questions <g>

I'm still stuck on how to organize the 250+ subdirectories of files into a structure Mutt will be happy with.
A lot of those subdirs are rarely used, like trxmanager/save for messages I want to save but hardly ever need to see.

When I simply copy them over from Windows and make them maildirs by adding new, cur and tmp, I can read the messages in
each folder if I can get to them, but Mutt doesn't navigate to them easily. Usually I have to specify the full path, which is a PITA.

Does Mutt have a way to selectively hide mail directories that clutter up the folder view?
Alternatively is there a way for Mutt to hide entire sections of the folder view?

Most graphical MUAs have a little plus sign you can click on to expand and see the tree it represents.
I don't think Mutt can do that but Sylpheed can. However, Sylpheed uses MH format which I don't like.
Larry Alkoff
2004-06-20 17:28:04 UTC
Post by David Champion
[...]
Does that answer your question?
I looked at the original .msg files and compared them to the created mbox file.
It seems that _every_ message in the file was created by formail with that "funny" From line.
None of the original messages seem to have anything like that so I'm wondering if and why formail
created the data and where it came from.

I've attached one of the original messages (ga20xs0.msg)
and cut out the same message from the mbox file (formail.msg).

I edited out the bodies of the messages because Mindspring blocked my first reply to you,
believing that it contained a virus (it did not).

Mutt is reading the mbox file ok but I'm quite curious about the "funny" addition to the From line.

Larry