Uploading all my email into GMail
Combining and uploading all mailboxes into GMail requires a small amount of trickery. Have a look at the end of this page to read the conclusions first, then read the whole document from the start to get an idea of the problems I encountered.
In brief, don't worry about duplicated messages, be aware that GMail will thread messages into "Conversations", and remember to check your "Spam" folder regularly. GMail will perform very good Spam filtering on your mail.
Begin
There are several parts to this. I tried merging all my files beforehand, but had problems with mysteriously truncated emails, so I'm processing the raw mailboxes first to make sure the headers are clean and intact.
mkdir sorting_out_mail
cd sorting_out_mail/
tar zxvf ~/Desktop/0.\ Inbox/copy\ on\ disk/allmail_mattk.tgz
I then looked through all the directories and uncompressed all tarball and gzip files I could find. The end result is a complex deep directory structure ending at Mailbox files.
I want to move all the mailbox files into the root directory, each with an ascending number prefixed to its name.
Select all the files below this by using:
find . -type f -name "*" -print
This loops through the files and prints each one next to the unique name it would get in the root directory (a dry run, nothing is moved yet):
j=1;for i in `find . -type f -name "*" -print`; do echo $i `printf "%03d" $j``basename $i`;j=$(( j+1 )); done
This actually does the file move for you:
j=1;for i in `find . -type f -name "*" -print`; do mv $i `printf "%03d" $j``basename $i`;j=$(( j+1 )); done
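The loops above split on whitespace, so a mailbox path containing a space would be mangled. Here is a minimal sketch of a space-tolerant version. The function name flatten_mailboxes and the explicit destination argument are assumptions for illustration, not part of the original commands, and it still assumes no file name contains a newline.

```shell
# flatten_mailboxes SRC DEST (hypothetical helper)
# Moves every regular file below SRC into DEST as NNN<basename>.
flatten_mailboxes() (
    src=$1
    dest=$2
    list=$(mktemp "${TMPDIR:-/tmp}/mailboxes.XXXXXX") || exit 1
    mkdir -p "$dest" || exit 1
    # Snapshot the file list first so files already moved are never revisited.
    find "$src" -type f > "$list"
    n=1
    while IFS= read -r path; do
        # ${path##*/} is the basename; printf pads the counter to three digits.
        mv -- "$path" "$dest/$(printf '%03d' "$n")${path##*/}"
        n=$((n + 1))
    done < "$list"
    rm -f "$list"
)
```

Calling flatten_mailboxes . . from the top of the unpacked tree should give the same in-place result as the loop above.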
Compiling mutt from scratch
curl -O ftp://ftp.mutt.org/mutt/devel/mutt-1.5.19.tar.gz
tar zxvf mutt-1.5.19.tar.gz
cd mutt-1.5.19
./configure --prefix=/sw --with-curses --with-regex --enable-locales-fix --enable-pop --enable-imap --enable-smtp --with-sasl=/sw --mandir=/sw/share/man
Concatenating all mailboxes
I opened all these individual mailboxes in Thunderbird 2.0 on my Mac and Linux box, and they were read badly.
Thunderbird has an error where any line in the body of a message that starts with "From " (From followed by a space) splits the message there and then.
I edited the mailboxes directly, added a single space, then resaved them. Thunderbird parsed them correctly.
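Rather than editing by hand, the same repair could be scripted. The sketch below uses the common mbox convention of prefixing such body lines with ">" rather than the single space described above. The function name quote_from_lines and the date pattern used to recognise genuine separator lines are assumptions. Real mailboxes may use other date layouts, so check the output before trusting it.

```shell
# quote_from_lines MBOX (hypothetical helper)
# Prints MBOX with body lines that start with "From " changed to ">From ".
# A line is kept as a message separator only if it follows a blank line
# (or starts the file) and looks like: From addr Tue Apr  8 08:50:29 2008
quote_from_lines() {
    awk '
        BEGIN { prev_blank = 1 }
        /^From / {
            is_sep = prev_blank && $0 ~ /^From [^ ]+ +[A-Z][a-z][a-z] [A-Z][a-z][a-z] [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]/
            if (!is_sep) $0 = ">" $0
        }
        { print; prev_blank = ($0 == "") }
    ' "$1"
}
```

For example, quote_from_lines 001Inbox > 001Inbox.fixed writes a repaired copy and leaves the original untouched.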
I activated IMAP on my Gmail account, connected to it, then dragged mailboxes from local mail into Gmail.
Small mailboxes work, but larger ones (more than 100 messages) do not. I've tried Thunderbird on Mac and Linux, and Mail.app on Mac. All crap out after about 100 messages.
So, I'm going to install dovecot, a POP3 mail server, activate it on the local network, then get gmail to pull all the messages over to its servers.
Installing dovecot on a Mac
Steward machines have all ports blocked, except for 22 and 80 (ssh and http).
I cannot switch over my desktop to mmto without losing mmtao.org for a few days, but I can connect to a locally open mmto machine and temporarily use that to run a dovecot POP3 server and get Gmail to pull from there.
On a mac:
sudo port install dovecot
...installs openssl, dovecot onto my laptop.
As prompted by the install text, do:
sudo launchctl load -w /Library/LaunchDaemons/org.macports.dovecot.plist
After running these, I could see that smtp had opened up by running nmap on an external network:
nmap 128.196.100.x
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
6000/tcp open X11
I then added these lines to .bashrc so that dovecot was picked up in the path:
# paths for MacPorts
# http://guide.macports.org/#using
export PATH=/opt/local/bin:/opt/local/sbin:$PATH
export MANPATH=/opt/local/share/man:$MANPATH
Now sudo edit the dovecot.conf and apply pop3 settings
sudo vim /opt/local/etc/dovecot/dovecot.conf
Add a PAM configuration at /etc/pam.d/dovecot by copying /etc/pam.d/login and tweaking it a bit.
OK, now test using
telnet limey.mmto.arizona.edu 110
as 110 is the standard unencrypted port for POP3. The connection works, but logins yield an error:
(mattk@mmtao:~)$ telnet limey.mmto.arizona.edu 110
Trying 128.196.100.88...
Connected to limey.mmto.arizona.edu.
Escape character is '^]'.
+OK Dovecot ready.
USER mattk
-ERR Plaintext authentication disallowed on non-secure connections.
OK, so have to install and configure SSL.
ps aux | grep dovecot
dovecot 54479 0.0 0.0 603044 756 ?? S 2:34PM 0:00.01 pop3-login
root 54477 0.0 0.0 600996 572 ?? S 2:34PM 0:00.02 dovecot-auth
root 54476 0.0 0.0 599820 332 ?? Ss 2:34PM 0:00.09 /opt/local/sbin/dovecot
root 54474 0.0 0.0 599740 736 ?? Ss 2:34PM 0:00.01 /opt/local/bin/daemondo --label=dovecot --start-cmd /opt/local/sbin/dovecot ; --pid=fileauto --pidfile /opt/local/var/run/dovecot/master.pid
mattk 54600 0.0 0.0 599700 380 s000 R+ 3:07PM 0:00.00 grep dovecot
(mattk@limey:/opt/local/var/macports/software/dovecot/1.1.3_0+darwin_9/opt/local/share/doc/dovecot/wiki)$ sudo kill -HUP 54476
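After allowing plaintext authentication in dovecot.conf (in Dovecot 1.x this should be the disable_plaintext_auth = no setting) and sending the master process a HUP, we can log in on port 110 with cleartext passwords. It's dodgy, but I can at least log in, do this upload, then remove it.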
+OK Server ready
USER yiming
+OK
PASS foobar
+OK Logged in.
Server commands
Several commands are useful here.
- LIST - lists the messages available in the user’s account, returning a status message and list with each row containing a message number and the size of that message in bytes
- STAT - returns a status message, the number of messages in the mailbox, and the size of the mailbox in bytes
- RETR [message_num] - returns the message identified by the message number, which is the same as the message number shown in the LIST command output
- TOP [message_num] [n] - returns the top n lines of the message denoted by message number.
When finished, the QUIT command will end the session.
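The same session can be scripted instead of typed into telnet. This is only a sketch: the function name pop3_commands, the one-second pacing between commands, and the use of nc (netcat) to carry the text to the server are assumptions, not something from the original notes. POP3 lines end in CRLF, which is why printf emits \r\n.

```shell
# pop3_commands USER PASS [DELAY] (hypothetical helper)
# Prints a login, STAT, LIST and QUIT, one command per DELAY seconds,
# so the server has time to answer each command before the next arrives.
pop3_commands() (
    user=$1
    pass=$2
    delay=${3:-1}
    for cmd in "USER $user" "PASS $pass" STAT LIST QUIT; do
        printf '%s\r\n' "$cmd"
        sleep "$delay"
    done
)

# Possible usage, assuming nc is installed and the server allows cleartext logins:
#   pop3_commands matt secret | nc 192.168.1.3 110
```

The password travels in the clear and is visible in the process list, so this is as dodgy as the telnet login and only suitable for a temporary setup.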
Installing Dovecot on a slug
My NSLU2 ('slug') Linux box at home uses Debian, so use apt-get:
apt-get update
apt-get install dovecot-common
apt-get install dovecot-pop3d
Configuration is stored in /etc/dovecot/dovecot.conf
I needed to add pop3 as a service, disable SSL authentication, edit the PAM configuration in /etc/pam.d/dovecot, and test using telnet 127.0.0.1 110. After each configuration change, reload the dovecot server with:
ps aux | grep dovecot
kill -HUP <id number from ps command above>
This is what it looks like running as a service:
fishpool:~# ps aux | grep dovecot
root 2486 0.0 2.2 1860 668 ? Ss Apr07 0:00 /usr/sbin/dovecot
root 2977 0.0 7.2 9392 2188 ? S 07:54 0:00 dovecot-auth
dovecot 2979 0.0 3.2 3136 964 ? S 07:54 0:00 pop3-login
dovecot 2982 0.0 3.2 3136 964 ? S 07:55 0:00 pop3-login
dovecot 2986 0.0 3.2 3136 964 ? S 07:56 0:00 pop3-login
Now I have to open the firewall to allow outside hosts to connect.
Also, add a user "matt" with useradd, create a directory called mymail/ and add your mail as one big mbox file called "inbox".
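The one big mbox file could be built from the numbered mailboxes with something like the sketch below. The function name concat_mboxes is hypothetical. The only thing it does beyond cat is make sure a blank line separates the end of one mailbox from the "From " line that starts the next.

```shell
# concat_mboxes OUT FILE... (hypothetical helper)
# Joins the given mbox files into OUT, overwriting OUT if it exists.
concat_mboxes() (
    out=$1
    shift
    : > "$out" || exit 1
    for f in "$@"; do
        cat "$f" >> "$out"
        # If the file did not end with a newline, finish its last line first.
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '\n' >> "$out"
        fi
        # Blank line before the next mailbox's first "From " line.
        printf '\n' >> "$out"
    done
)
```

For example, concat_mboxes mymail/inbox [0-9][0-9][0-9]* would join the numbered files in name order.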
Dovecot also logged this complaint about the mbox file:
==> /var/log/mail.err <==
Apr 8 08:50:29 fishpool dovecot: POP3(matt2): UIDs broken with partial sync in mbox file /home/matt2/mymail/inbox
So, on the slug we have /sbin/ifconfig reporting that we are on
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
... but from
curl -s myip.dk |grep '"Box"' | egrep -o '[0-9.]+'
we have a different address - we are behind a cheap netgear router.
70.171.203.43
So in the router we need to forward port 110 to 192.168.1.3 port 110.
I tried it by telnetting from mmtao to the home router, and 110 is not blocked.
Furthermore, Google was able to connect through my home router and start to pick up my mail.
Issues with uploading email to gmail
Gmail pulls 200 messages at a time via POP3, and retries about every half hour. Its initial connection attempt times out after 25 seconds: if the POP3 server cannot complete a LIST within that time, Google abandons the pull.
With the slug box and my all-mail 2007 inbox, it takes 35 seconds to return a LIST. So, either split the boxes into smaller sizes, or use a faster computer.
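Splitting a box into smaller pieces could look like the sketch below. The function name split_mbox and the output naming (prefix plus a three-digit part number) are assumptions. It treats any "From " line that follows a blank line as the start of a message, so body lines beginning with "From " should be quoted first or they will be miscounted.

```shell
# split_mbox MBOX PER_FILE PREFIX (hypothetical helper)
# Writes PREFIX001, PREFIX002, ... each holding at most PER_FILE messages.
split_mbox() (
    mbox=$1
    per_file=$2
    prefix=$3
    awk -v per="$per_file" -v prefix="$prefix" '
        BEGIN { prev_blank = 1; count = 0; part = 0 }
        prev_blank && /^From / {
            # Start a new output file every "per" messages.
            if (count % per == 0) {
                if (out != "") close(out)
                part++
                out = sprintf("%s%03d", prefix, part)
            }
            count++
        }
        { if (out != "") print > out; prev_blank = ($0 == "") }
    ' "$mbox"
)
```

For example, split_mbox inbox 1000 inbox_part would produce files of 1000 messages each, which a slow POP3 server can LIST more quickly than one huge box.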
So, off we go to my home desktop computer, and install....
dovecot on an ubuntu computer
Third time's the charm.
sudo apt-get install openssh-server
sudo apt-get install dovecot-common dovecot-pop3d
sudo apt-get install nmap
nmap is for testing of the ubuntu firewall.
Success! And conclusions
It took about 4 days to import 45,500 emails, totalling about 3.5 GB.
Things that gmail servers do:
- They run the spam filter on all imported mail, so it's good to check Spam every so often and see if certain emails were mistakenly moved there.
- They silently delete any emails that are exact duplicates, so you think you're losing emails when you're probably not.
- They count CONVERSATIONS and NOT individual emails - so you think you're missing about 30% of your imported emails, but in reality emails are threaded into 'conversations', which result in a lower count.
- Gmail is the limiting factor here, not your upload speed.
IMAP import seems to be irregular, so POP3 seems to be the more reliable way to pull in your emails.
Gmail can import up to 5 email accounts at a time. The import process consists of logging into the POP3 server, pulling 200 emails at one time, then waiting between 30 and 60 minutes before pulling the next 200. With mailboxes consisting of 5000 emails, this adds up to about 24 hours to pull the whole set! The way around this is to split the mail into 5 roughly equal mailboxes, put them into separate accounts, then get Gmail to pull all 5 simultaneously.
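The arithmetic behind that estimate can be sketched as follows. The function name pull_minutes is hypothetical, and the model is deliberately crude: it counts only the waits between pulls and ignores the transfer time itself.

```shell
# pull_minutes MESSAGES [BATCH] [WAIT_MINUTES] (hypothetical helper)
# Estimates the minutes spent waiting between pulls for one mailbox.
pull_minutes() (
    messages=$1
    batch=${2:-200}
    wait_min=${3:-60}
    # Number of pulls needed, rounded up.
    batches=$(( (messages + batch - 1) / batch ))
    # There is one wait between each pair of consecutive pulls.
    echo $(( (batches - 1) * wait_min ))
)

pull_minutes 5000   # prints 1440, i.e. 24 hours at 60 minutes per wait
pull_minutes 1000   # prints 240, i.e. 4 hours for a fifth-size mailbox
```

With 30-minute waits the figures halve, so a 5000-message box lands somewhere between 12 and 24 hours, while five 1000-message boxes pulled in parallel should finish in a few hours.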