Uploading all my email into GMail


Combining and uploading all mailboxes into GMail requires a small amount of trickery. Have a look at the end of this page to read the conclusions first, then read the whole document from the start to get an idea of the problems I encountered.

In brief, don't worry about duplicated messages, be aware that GMail will thread messages into "Conversations", and remember to check your "Spam" folder regularly. GMail will perform very good Spam filtering on your mail.

Begin

Several parts here - I've tried merging all my files beforehand, but had problems with mysterious truncated emails. I'm processing the raw mail boxes first to make sure the headers are clean and intact.

mkdir sorting_out_mail
cd sorting_out_mail/
tar zxvf ~/Desktop/0.\ Inbox/copy\ on\ disk/allmail_mattk.tgz

I then looked through all the directories and uncompressed all tarball and gzip files I could find. The end result is a complex deep directory structure ending at Mailbox files.

I want to move all the mailbox files into the root directory, giving each an ascending numeric prefix so that the names stay unique.

Select all the files below this by using:

find . -type f -name "*" -print

This loops through the files and prints each one next to a unique name for it in the root directory, as a dry run (it assumes no file names contain whitespace, since the for loop splits on it):

j=1; for i in $(find . -type f -name "*" -print); do echo "$i" "$(printf "%03d" "$j")$(basename "$i")"; j=$(( j+1 )); done

This actually does the file move for you:

j=1; for i in $(find . -type f -name "*" -print); do mv "$i" "$(printf "%03d" "$j")$(basename "$i")"; j=$(( j+1 )); done

Compiling mutt from scratch

curl -O ftp://ftp.mutt.org/mutt/devel/mutt-1.5.19.tar.gz
tar zxvf mutt-1.5.19.tar.gz
cd mutt-1.5.19
./configure --prefix=/sw --with-curses --with-regex --enable-locales-fix --enable-pop --enable-imap --enable-smtp --with-sasl=/sw --mandir=/sw/share/man

Concatenating all mailboxes

I installed Thunderbird 2.0 on my Mac and Linux box and opened all these individual mailboxes in it, but they were read badly.

Thunderbird has an error where any line in the body of a message that starts with "From " (From followed by a space) splits the message there and then.

I edited the mailboxes directly, added a single space in front of each such line, then resaved them. Thunderbird then parsed them correctly.
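The hand edit described above could be automated along these lines. This is only a sketch: the function name escape_from_lines is made up for illustration, and it assumes that a genuine mbox separator follows a blank line (or starts the file) and looks like "From sender weekday ...". Any other line starting with "From " gets a leading space.

```shell
# Hypothetical helper: indent body lines that would be mistaken for
# mbox separators. A real separator is assumed to follow a blank line
# (or start the file) and to look like "From <sender> <weekday> ...".
escape_from_lines() {
  awk '
    BEGIN { prev_blank = 1 }
    /^From / {
      is_separator = prev_blank && ($0 ~ /^From [^ ]+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) /)
      if (!is_separator) $0 = " " $0
    }
    { print; prev_blank = ($0 == "") }
  ' "$1"
}
```

For example, escape_from_lines 001inbox > 001inbox.fixed writes a cleaned copy and leaves the original untouched (001inbox is a placeholder name).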

I activated IMAP on my GMail account, connected to it, then dragged mailboxes from local mail into GMail.

Small mailboxes work, but larger ones (more than 100 messages) do not. I've tried Thunderbird on Mac and Linux, and Mail.app on Mac. All crap out after about 100 messages.

So, I'm going to install dovecot, a POP3 mail server, activate it on the local network, then get gmail to pull all the messages over to its servers.

Installing dovecot on a Mac

Steward machines have all ports blocked, except for 22 and 80 (ssh and http).

I cannot switch my desktop over to mmto without losing mmtao.org for a few days, but I can connect to a locally open mmto machine and temporarily use that to run a dovecot POP3 server and get GMail to pull from there.

On a mac:

sudo port install dovecot

...installs openssl, dovecot onto my laptop.

As prompted by the install text, do:

sudo launchctl load -w /Library/LaunchDaemons/org.macports.dovecot.plist

After running these, I could see that smtp had opened up by running nmap on an external network:

nmap 128.196.100.x
PORT STATE SERVICE
22/tcp  open ssh
25/tcp  open smtp
6000/tcp open X11

Now added these to .bashrc so that dovecot was picked up in the path:

# paths for MacPorts
# http://guide.macports.org/#using
export PATH=/opt/local/bin:/opt/local/sbin:$PATH
export MANPATH=/opt/local/share/man:$MANPATH

Now edit dovecot.conf with sudo and apply the POP3 settings:

sudo vim /opt/local/etc/dovecot/dovecot.conf

Add a PAM configuration at /etc/pam.d/dovecot by copying /etc/pam.d/login and twiddling it a bit.

OK, now test using

telnet limey.mmto.arizona.edu 110

as 110 is the open, unsecured port for POP3. This connects, but logins yield an error:

(mattk@mmtao:~)$ telnet limey.mmto.arizona.edu 110
Trying 128.196.100.88...
Connected to limey.mmto.arizona.edu.
Escape character is '^]'.
+OK Dovecot ready.
USER mattk
-ERR Plaintext authentication disallowed on non-secure connections.

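OK, so I have to either install and configure SSL, or allow plaintext logins in dovecot.conf. After editing the configuration, find the master dovecot process and send it a HUP so that it reloads: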

ps aux | grep dovecot
dovecot  54479   0.0  0.0   603044    756   ??  S     2:34PM   0:00.01 pop3-login
root     54477   0.0  0.0   600996    572   ??  S     2:34PM   0:00.02 dovecot-auth
root     54476   0.0  0.0   599820    332   ??  Ss    2:34PM   0:00.09 /opt/local/sbin/dovecot
root     54474   0.0  0.0   599740    736   ??  Ss    2:34PM   0:00.01 /opt/local/bin/daemondo --label=dovecot --start-cmd     /opt/local/sbin/dovecot ; --pid=fileauto --pidfile /opt/local/var/run/dovecot/master.pid
mattk    54600   0.0  0.0   599700    380 s000  R+    3:07PM   0:00.00 grep dovecot
(mattk@limey:/opt/local/var/macports/software/dovecot/1.1.3_0+darwin_9/opt/local/share/doc/dovecot/wiki)$ sudo kill -HUP 54476

Now we can log in to port 110 with cleartext passwords. It's dodgy, but I can at least log in, do this upload, then remove it.

+OK Server ready
USER yiming
+OK
PASS foobar
+OK Logged in.

Server commands

Several commands are useful here.

When finished, the QUIT command will end the session.
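The original notes do not list the commands. As a sketch, the standard POP3 commands from RFC 1939 that are handy for this kind of check are STAT (message count and total size), LIST (one line per message), RETR n (fetch message n), DELE n (mark message n for deletion) and QUIT. The helper name pop3_check_script below is made up, and feeding it to nc is an assumption about what is installed.

```shell
# Print a POP3 command sequence (RFC 1939) for a quick mailbox check.
# The user name and password are placeholders supplied by the caller.
pop3_check_script() {
  printf 'USER %s\r\n' "$1"
  printf 'PASS %s\r\n' "$2"
  printf 'STAT\r\n'   # message count and total size
  printf 'LIST\r\n'   # one line per message: number and size
  printf 'QUIT\r\n'   # end the session
}

# Example (assumes nc is installed and the server accepts plaintext logins):
#   pop3_check_script mattk secret | nc limey.mmto.arizona.edu 110
```

Typing the same commands by hand into the telnet session works just as well and lets you read each reply before sending the next command.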

Installing Dovecot on a slug

My NSLU2 ('slug') Linux box at home uses Debian, so use apt-get:

apt-get update
apt-get install dovecot-common
apt-get install dovecot-pop3d

Configuration is stored in /etc/dovecot/dovecot.conf

You need to add pop3 as a service, disable the SSL requirement for authentication, and edit the PAM configuration in /etc/pam.d/dovecot. Test using telnet 127.0.0.1 110, reloading the dovecot server after each change with:

ps aux | grep dovecot
kill -HUP <id number from ps command above>
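The notes do not record the exact lines that were changed. In Dovecot 1.x the relevant settings are, as far as I know, protocols (which should include pop3), disable_plaintext_auth (set to no to allow cleartext logins) and mail_location (where the mbox lives). Treat those names as an assumption for your version. A quick way to see what is currently active is a small helper like this one (show_pop3_settings is a made-up name):

```shell
# Print the uncommented lines for the settings discussed above.
# The setting names are assumed from Dovecot 1.x; check your version.
show_pop3_settings() {
  grep -E '^[[:space:]]*(protocols|disable_plaintext_auth|mail_location)[[:space:]]*=' "$1"
}

# Example:
#   show_pop3_settings /etc/dovecot/dovecot.conf
```

Commented-out defaults are skipped, so an empty result means the setting is still at its built-in default.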

This is what it looks like running as a service:

fishpool:~# ps aux | grep dovecot
root      2486  0.0  2.2   1860   668 ?        Ss   Apr07   0:00 /usr/sbin/dovecot
root      2977  0.0  7.2   9392  2188 ?        S    07:54   0:00 dovecot-auth
dovecot   2979  0.0  3.2   3136   964 ?        S    07:54   0:00 pop3-login
dovecot   2982  0.0  3.2   3136   964 ?        S    07:55   0:00 pop3-login
dovecot   2986  0.0  3.2   3136   964 ?        S    07:56   0:00 pop3-login

Now I have to open the firewall to allow outside hosts to connect.

ALSO, add a user matt with useradd, create a directory called mymail/ in that user's home directory, and add your mail as one big mbox file called inbox.
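Building the one big mbox file can be sketched as below. The function name concat_mboxes is made up, and it assumes every input file is already a clean mbox. A blank line is written after each input so that the next "From " separator is always preceded by one.

```shell
# Hypothetical helper: concatenate several mbox files into one.
# Usage: concat_mboxes OUTPUT INPUT...
concat_mboxes() {
  out=$1
  shift
  : > "$out"                 # start with an empty output file
  for f in "$@"; do
    cat "$f" >> "$out"
    printf '\n' >> "$out"    # blank line before the next "From " separator
  done
}

# Example, run from the directory holding the numbered mailboxes:
#   concat_mboxes ~/mymail/inbox [0-9][0-9][0-9]*
```

The extra blank lines are harmless to mbox readers, which is why the sketch adds them unconditionally rather than checking how each file ends.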

ALSO: complaints of:

==> /var/log/mail.err <==
Apr  8 08:50:29 fishpool dovecot: POP3(matt2): UIDs broken with partial sync in mbox file /home/matt2/mymail/inbox

SO, on the slug we have /sbin/ifconfig reporting that we are on

inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0

... but from

curl -s myip.dk |grep '"Box"' | egrep -o '[0-9.]+'

we have a different address - we are behind a cheap netgear router.

70.171.203.43

so in the router we need to set port forwarding of 110 to 192.168.1.3 port 110.

Tried it by telnetting from mmtao into the home router, and 110 is not blocked.

Furthermore, Google was able to connect through my home router and start to pick up my mail.

Issues with uploading email to gmail

GMail pulls 200 messages at a time via POP3, and retries about every half hour. Its initial connection attempt times out after 25 seconds: if the POP3 server cannot complete a LIST within that time, Google abandons the pull.

With the slug box and my "all mail 2007" inbox, it takes 35 seconds to return a LIST. So, either split the boxes into smaller sizes, or use a faster computer.
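Splitting a large mbox into smaller ones can be sketched with awk. The function name split_mbox and the numbered output names are made up for illustration. The sketch treats any line starting with "From " that follows a blank line (or starts the file) as the beginning of a new message, so body lines of that shape should be escaped first.

```shell
# Hypothetical helper: split an mbox into pieces of at most N messages.
# Usage: split_mbox INPUT N PREFIX   (writes PREFIX001, PREFIX002, ...)
split_mbox() {
  awk -v n="$2" -v prefix="$3" '
    /^From / && (prev_blank || NR == 1) {
      if (count % n == 0) {          # time to start a new output file
        if (out != "") close(out)
        part++
        out = sprintf("%s%03d", prefix, part)
      }
      count++
    }
    out != "" { print > out }
    { prev_blank = ($0 == "") }
  ' "$1"
}

# Example:
#   split_mbox ~/mymail/inbox 1000 ~/mymail/part
```

Smaller pieces keep the time for a LIST down, at the cost of needing one POP3 account per piece.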

So, off we go to my home desktop computer, and install....

Installing dovecot on an Ubuntu computer

Third time's the charm.

sudo apt-get install openssh-server
sudo apt-get install dovecot-common dovecot-pop3d
sudo apt-get install nmap

nmap is for testing of the ubuntu firewall.

Success! And conclusions

It took about 4 days to import 45,500 emails, totalling about 3.5 GB.

Things that gmail servers do:

IMAP import seems to be irregular, so POP3 seems to be the more reliable way to pull in your emails.

GMail can import from up to 5 email accounts at a time. The import process consists of logging into the POP3 server, pulling 200 emails at one time, then waiting between 30 and 60 minutes before pulling the next 200. With mailboxes consisting of 5000 emails, this adds up to about 24 hours to pull the whole set! The way to speed this up is to split the mail into 5 roughly equal mailboxes, put them into separate accounts, then get GMail to pull all 5 simultaneously.
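The arithmetic behind that estimate can be sketched as follows, using the 200 messages per pull and the 30 to 60 minute wait observed above. The function name estimate_pull_minutes is made up, and charging one full wait per pull is a simplification.

```shell
# Rough estimate of how long GMail needs to pull a mailbox, in minutes.
# Usage: estimate_pull_minutes MESSAGES WAIT_MINUTES
estimate_pull_minutes() {
  messages=$1
  interval=$2
  pulls=$(( (messages + 199) / 200 ))   # 200 messages per pull, rounded up
  echo $(( pulls * interval ))
}

# estimate_pull_minutes 5000 30   prints 750  (12.5 hours)
# estimate_pull_minutes 5000 60   prints 1500 (25 hours)
```

A 5000 message mailbox therefore takes somewhere between half a day and a full day, and five such mailboxes pulled in parallel take no longer than one.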