How to use this Python script to extract emails from a Gmail inbox Takeout, for when you have a lot of email and other, simpler imports fail.
./mbox_split.py --infile google_mbox.mbox
You may need to run chmod 755 mbox_split.py first to make the script executable.
Alternatively, you can just export by label from Google Takeout directly.
The script generates several mailboxes in mbox format in the current working directory:
one for each of your Gmail labels, plus "Sent", "Archive" and "INBOX".
You can prefix the output files using the -p <prefix> parameter to the script: all output filenames will be prepended with <prefix>.
Messages are stored only once: a message that has several labels will be stored in only one target mbox, usually the one corresponding to its first valid label.
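To illustrate the "first valid label" rule, here is a minimal sketch and not the actual code of mbox_split.py. It assumes the labels come from the X-Gmail-Labels header that Takeout writes as a comma-separated list; the function name pick_target_label, the META_LABELS set and the "Archive" fallback are hypothetical.

```python
from email.message import Message

# Meta labels are skipped; the exact set used by the script is an assumption.
META_LABELS = {"Important", "Unread", "Starred", "Newsletters"}


def pick_target_label(msg, default="Archive"):
    """Return the first non-meta label of a message, or a fallback mailbox."""
    raw = msg.get("X-Gmail-Labels", "")
    # strip() also removes the whitespace left behind by folded headers.
    for label in (part.strip() for part in raw.split(",")):
        if label and label not in META_LABELS:
            return label
    return default


if __name__ == "__main__":
    demo = Message()
    demo["X-Gmail-Labels"] = "Important,Unread,Work,Receipts"
    print(pick_target_label(demo))
```

In this sketch a message labelled "Work" and "Receipts" ends up only in the "Work" mailbox, which matches the store-once behaviour described above.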
The script will generate output in the form of:
Storing <message-id> from "sender" to mbox "label"
It also prints an initial count of messages in the source mbox and a final tally of messages stored and ignored. It is thus recommended to redirect the script output to a file.
Note: Meta labels such as "Important", "Unread", "Starred", or "Newsletters" are ignored. However, the script attempts to preserve "Unread" and "Starred" status by setting the corresponding Status and X-Status flags.
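The flag handling can be pictured roughly as follows. This is a sketch using the standard library's mailbox.mboxMessage, not necessarily what the script does internally: it assumes a message without the "Unread" label counts as read (flag R, kept in Status) and a "Starred" message is flagged (flag F, kept in X-Status). The helper name apply_gmail_status is hypothetical.

```python
import mailbox
from email.message import Message


def apply_gmail_status(msg, labels):
    """Translate Gmail's Unread/Starred labels into mbox status flags."""
    mbox_msg = mailbox.mboxMessage(msg)
    if "Unread" not in labels:
        mbox_msg.add_flag("R")  # R (read) is written to the Status header
    if "Starred" in labels:
        mbox_msg.add_flag("F")  # F (flagged) is written to the X-Status header
    return mbox_msg


if __name__ == "__main__":
    flagged = apply_gmail_status(Message(), {"Starred"})
    print(flagged["Status"], flagged["X-Status"])
```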
If you use Dovecot, you can stop there: dsync will happily process the mailboxes generated by this script.
An attempt was made to convert to Maildir on the fly; unfortunately, it seems the Python Maildir backend does not set the filenames according to the messages' delivery dates. A script is nevertheless provided as mbox_split_tomaildir.py for the brave: it appears that all email will be successfully converted, but it will look as if it was all received at the time the script was run.
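One possible mitigation, untested against mbox_split_tomaildir.py, is to copy each message's Date header into the delivery date of a mailbox.MaildirMessage before adding it to the Maildir. The sketch below shows only that step; the helper name to_maildir_message is hypothetical. This does not change the filename the Maildir backend generates, so whether it helps depends on how your mail server determines delivery time.

```python
import mailbox
from email.message import Message
from email.utils import parsedate_to_datetime


def to_maildir_message(msg):
    """Wrap a message for Maildir storage, carrying over its Date header."""
    md = mailbox.MaildirMessage(msg)
    date_header = msg.get("Date")
    if date_header:
        try:
            # set_date() expects seconds since the epoch.
            md.set_date(parsedate_to_datetime(date_header).timestamp())
        except (TypeError, ValueError):
            pass  # unparsable Date header: keep the default (current time)
    return md


if __name__ == "__main__":
    demo = Message()
    demo["Date"] = "Thu, 01 Jan 2015 00:00:00 +0000"
    print(to_maildir_message(demo).get_date())
```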