Migrate blog from wordpress into nikola

  |   Source

CREATED: <2014-04-22>

UPDATED: <2017-03-12 Sun>

Here are the steps.

# you might need need `sudo apt-get install python-gdbm` on Debian
# install python2 and make sure sqlite is supported
sudo USE="sqlite" emerge -a =python-2* # Gentoo Linux

# one debian/ubuntu, you need: `apt-get install libxml2-dev libxslt1-dev python-dev libjpeg-dev` for lxml

# Not to mess up with root
export PATH=$PATH:$HOME/.local/bin

# Best way to get latest pip,
# see https://packaging.python.org/installing/#install-pip-setuptools-and-wheel
python ~/bin/get-pip.py --user # ~/.local/bin/pip

# install dependencies (requests is required by zen theme)
# sudo pip install markdown webassets phpserialize nikola requests
# "sudo pip install" could screw up my python setup on Gentoo Linux
# @see https://forums.gentoo.org/viewtopic-t-1006044-view-next.html?sid=931f7be2c16ac99fd85eb2940c0bf82b
# so install the python packages in my HOME directory might be better
# @see http://stackoverflow.com/questions/2915471/install-a-python-package-into-a-different-directory-using-pip
pip install --user markdown webassets phpserialize nikola requests

# create root directory of nikola
mkdir -p ~/.config/nikola;cd ~/.config/nikola

# import from wordpress dump
nicola import_wordpress my_wordpress_dump.xml

# since I use zen theme, I need install lessc
# obviously NodeJS is required
npm install -g less # use portable nodejs in $HOME is better

# I use zen theme, before intalling new theme, clean the legacy theme at first
rm -rf themes/zen/;nikola install_theme zen
# or rm -rf themes/zen/; http_proxy=http://127.0.0.1:8087 nikola install_theme zen at mainland China

# build the web site
nikola build

Use below command to fix embedded code in HTML files:

find -name '*.wp' -exec grep -l "\[sourcecode.*\<diff\>.*\]" {} \; |xargs sed -i 's/\[sourcecode.*\<diff\>.*\]/<pre class="brush: diff;">/g
find -name '*.wp' -exec grep -l "~~~~~~~~~~~~" {} \;|xargs sed -i "s%~~~~~~~~~~~~%</pre>%g"

Manually fixed those articles with Chinese title in url_map.csv

Use below script to fixed the xml dumped from wordpress:

#!/usr/bin/python
import getopt, sys, csv
def usage():
    print '''
NAME
    fix url mapping when migrate wordpress blog into nikola
Usage
    python fix-url-map.py [options]
'''[1:-1]

if __name__ == '__main__':
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hf:x:", ["help", "file=","xml="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)

    file=""
    xml=""

    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-f", "--file"):
            file= a
        elif o in ("-x", "--xml"):
            xml=a
        else:
            assert False, "unhandled option"

    with open(xml, 'r') as content_file:
        content = content_file.read()

    with open(file, 'rb') as csvfile:
         spamreader = csv.reader(csvfile, delimiter=',')
         for row in spamreader:
             content=content.replace(">"+row[0]+"<",">"+row[1]+"<")

    print content

Import the xml into disqus.com.

Install syntaxhighlighter and google analytics, insert below code into conf.py:

BODY_END = """
<script type='text/javascript' src='//cdnjs.cloudflare.com/ajax/libs/SyntaxHighlighter/3.0.83/scripts/shCore.min.js'></script>
<script type='text/javascript' src='//cdnjs.cloudflare.com/ajax/libs/SyntaxHighlighter/3.0.83/scripts/shAutoloader.min.js'></script>
<script type='text/javascript'>
function path()
{
  var args = arguments,
      result = [];

  for(var i = 0; i < args.length; i++){
    result.push(args[i].replace('@', '//cdnjs.cloudflare.com/ajax/libs/SyntaxHighlighter/3.0.83/scripts/'));
  }

  return result;
};

SyntaxHighlighter.autoloader.apply(null, path(
  'applescript            @shBrushAppleScript.min.js',
  'actionscript3 as3      @shBrushAS3.min.js',
  'bash shell             @shBrushBash.min.js',
  'coldfusion cf          @shBrushColdFusion.min.js',
  'cpp c c++              @shBrushCpp.min.js',
  'clojure                @shBrushScala.min.js',
  'c# c-sharp csharp      @shBrushCSharp.min.js',
  'css                    @shBrushCss.min.js',
  'delphi pascal          @shBrushDelphi.min.js',
  'diff patch pas         @shBrushDiff.min.js',
  'erl erlang             @shBrushErlang.min.js',
  'groovy                 @shBrushGroovy.min.js',
  'java                   @shBrushJava.min.js',
  'jfx javafx             @shBrushJavaFX.min.js',
  'js jscript javascript  @shBrushJScript.min.js',
  'perl pl                @shBrushPerl.min.js',
  'php                    @shBrushPhp.min.js',
  'text plain             @shBrushPlain.min.js',
  'py python              @shBrushPython.min.js',
  'ruby rails ror rb      @shBrushRuby.min.js',
  'sass scss              @shBrushSass.min.js',
  'scala                  @shBrushScala.min.js',
  'sql                    @shBrushSql.min.js',
  'vb vbnet               @shBrushVb.min.js',
  'xml xhtml xslt html    @shBrushXml.min.js'
));
SyntaxHighlighter.all();
</script>
<script type='text/javascript'>
var _gaq = _gaq || [];

_gaq.push(['_setAccount', 'UA-29850823-2']);
_gaq.push(['_addDevId', 'i9k95']); // Google Analyticator App ID with Google
_gaq.push(['_trackPageview']);

(function() {
  var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
"""

EXTRA_HEAD_DATA = """
<link rel='stylesheet' type='text/css' href='//cdnjs.cloudflare.com/ajax/libs/SyntaxHighlighter/3.0.83/styles/shCore.min.css'>
<link rel='stylesheet' type='text/css' href='//cdnjs.cloudflare.com/ajax/libs/SyntaxHighlighter/3.0.83/styles/shCoreEmacs.css'>
"""

In above code, shCoreEmacs.css is the color theme for syntaxhighlighter.

I use FileZilla to upload files to FTP. ncftp is a command line alternative.

Comments powered by Disqus