Migrate blog from WordPress to Nikola


CREATED: <2014-04-22>

UPDATED: <2017-06-28 Wed>

If you are only interested in web page optimization, skip to the second section, "Optimization".

Migrate from WordPress to Nikola

# you might need `sudo apt-get install python-gdbm` on Debian
# install python2 and make sure sqlite is supported
sudo USE="sqlite" emerge -a =python-2* # Gentoo Linux

# on Debian/Ubuntu, you need: `apt-get install libxml2-dev libxslt1-dev python-dev libjpeg-dev` for lxml

# avoid installing anything as root: user-level tools go into ~/.local/bin
export PATH=$PATH:$HOME/.local/bin

# Best way to get latest pip,
# see https://packaging.python.org/installing/#install-pip-setuptools-and-wheel
python ~/bin/get-pip.py --user # ~/.local/bin/pip

# install dependencies (requests is required by zen theme)
# sudo pip install markdown webassets phpserialize nikola requests
# "sudo pip install" could break the Python setup on Gentoo Linux
# @see https://forums.gentoo.org/viewtopic-t-1006044-view-next.html?sid=931f7be2c16ac99fd85eb2940c0bf82b
# so installing the Python packages into my HOME directory is safer
# @see http://stackoverflow.com/questions/2915471/install-a-python-package-into-a-different-directory-using-pip
pip install --user markdown webassets phpserialize nikola requests

# create root directory of nikola
mkdir -p ~/.config/nikola;cd ~/.config/nikola

# import from wordpress dump
nikola import_wordpress my_wordpress_dump.xml

# since I use the zen theme, I need to install lessc,
# which requires Node.js
npm install -g less # a portable Node.js installed in $HOME is better

# before installing the zen theme, remove any old copy of it first
rm -rf themes/zen/;nikola install_theme zen
# or, through a proxy in mainland China:
# rm -rf themes/zen/; http_proxy=http://127.0.0.1:8087 nikola install_theme zen

# build the web site
nikola build

Use the commands below to fix embedded code blocks in the imported posts (the *.wp files):

find . -name '*.wp' -exec grep -l "\[sourcecode.*\<diff\>.*\]" {} \; | xargs sed -i 's/\[sourcecode.*\<diff\>.*\]/<pre class="brush: diff;">/g'
find . -name '*.wp' -exec grep -l "~~~~~~~~~~~~" {} \; | xargs sed -i "s%~~~~~~~~~~~~%</pre>%g"
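
To check what these two substitutions do before running them over every post, the same rewrite can be sketched in Python. This is only an illustration: the function name and the sample text are made up, and `\b` stands in for sed's `\<` and `\>` word boundaries.

```python
import re

# Same pattern as the first sed command; \b replaces sed's \< and \>.
OPEN_TAG = re.compile(r'\[sourcecode.*\bdiff\b.*\]')
# Same literal as the second sed command.
CLOSE_MARK = '~~~~~~~~~~~~'


def fix_embedded_diff(text):
    """Turn diff [sourcecode ...] openers and tilde closers into <pre> tags."""
    text = OPEN_TAG.sub('<pre class="brush: diff;">', text)
    return text.replace(CLOSE_MARK, '</pre>')


# Hypothetical snippet of an imported post, for illustration only.
sample = '[sourcecode language="diff"]\n-old line\n+new line\n~~~~~~~~~~~~\n'
print(fix_embedded_diff(sample))
```

Openers for other languages, such as `[sourcecode language="python"]`, are left untouched because the pattern requires the whole word `diff`.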

I manually fixed the entries in url_map.csv for articles with Chinese titles.

Use the script below to fix the XML dumped from WordPress:

#!/usr/bin/python
import getopt, sys, csv
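def usage():
    print '''
NAME
    fix url mapping when migrating a WordPress blog to Nikola
Usage
    python fix-url-map.py -f url_map.csv -x my_wordpress_dump.xml
Options
    -h, --help    show this help
    -f, --file    CSV file; column 1 is replaced by column 2
    -x, --xml     XML file dumped from WordPress
'''[1:-1]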

if __name__ == '__main__':
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hf:x:", ["help", "file=","xml="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)

    file=""
    xml=""

    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-f", "--file"):
            file= a
        elif o in ("-x", "--xml"):
            xml=a
        else:
            assert False, "unhandled option"

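    # read the whole WordPress dump into memory
    with open(xml, 'r') as content_file:
        content = content_file.read()

    # each CSV row is (old, new): replace ">old<" with ">new<" in the dump
    with open(file, 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            content = content.replace(">" + row[0] + "<", ">" + row[1] + "<")

    # write the fixed dump to stdout
    print content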
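
To make the replacement step concrete, here is a small self-contained sketch of the same loop on made-up data. It is written for Python 3, unlike the Python 2 script above. The function name `apply_url_map`, the sample URLs, and the guard against short rows are assumptions added for illustration.

```python
import csv
import io


def apply_url_map(content, rows):
    """Replace each ">old<" with ">new<", as the script's loop does."""
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed rows (added safeguard)
        content = content.replace(">" + row[0] + "<", ">" + row[1] + "<")
    return content


# Hypothetical data, for illustration only.
csv_text = "http://old.example.com/?p=1,http://new.example.com/posts/hello.html\n"
xml_text = "<item><link>http://old.example.com/?p=1</link></item>"

rows = csv.reader(io.StringIO(csv_text), delimiter=',')
print(apply_url_map(xml_text, rows))
```

Wrapping the old value in `>` and `<` means only text that fills an entire XML element is rewritten. The same URL appearing in the middle of a post body is left alone.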

Import the fixed XML into http://disqus.com.

You can use JavaScript to redirect the old URLs, so links to your old articles stay valid. Ask at http://stackoverflow.com or contact your local JavaScript developers about how to do it. It's a simple task, but a little boring.
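
One possible way to prepare for such a redirect is to turn the URL map into a lookup table that a client-side script could consult. The sketch below is an assumption, not the method used for this blog. It expects a two-column CSV (old URL, new URL) like the one the fix script reads, and the function name, the sample URLs, and the choice to key on path plus query string are all hypothetical.

```python
import csv
import io
import json
from urllib.parse import urlsplit


def build_redirect_table(csv_text):
    """Map each old URL's path (plus query string) to its new URL."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        parts = urlsplit(row[0])
        key = parts.path + ("?" + parts.query if parts.query else "")
        table[key] = row[1]
    return table


# Hypothetical data, for illustration only.
csv_text = "http://old.example.com/?p=1,http://new.example.com/posts/hello.html\n"
print(json.dumps(build_redirect_table(csv_text), indent=2))
```

The resulting JSON could be shipped with the theme, and a few lines of JavaScript could look up the current location in it and redirect on a match.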

Optimization

I developed org2nikola to convert an Org subtree into a Nikola page.

Instead of hacking conf.py, I tweak the theme's JavaScript/HTML/CSS directly. It's simpler and more flexible.

For example, the Zen theme uses jQuery plugins to format time. I replaced jQuery with Moment.js v1.0, which is much smaller.

Since the Zen theme uses only a few icons from Font Awesome, we can use the Node.js tool font-spider to trim down the font file.

Other conventional front-end tricks, such as concatenating and minifying static assets, can also be used. These tricks are covered everywhere, so I won't go into the details.

To see a real example of the optimization, please visit http://blog.binchen.org. In the browser's developer tools, you can see that my web page is 90% smaller than pages without optimization.

highlight.js is the best solution for code syntax highlighting. I customized highlight.js to include only the programming languages I use.

My static blog is hosted on GitHub Pages.
