What's the best spell check setup in emacs

  |   Source

UPDATED: <2016-05-13 Fri>

CREATED: <2014-04-26>

I will show you the minimum setup and explain why.

Topics covered in official manual (flyspell-mode-predicate, for example) are NOT discussed here.

But you can see my complete setup HERE.

1 Suggestion for non-programmers

Emacs will find the right dictionary by querying your locale.

Run command "locale" in your shell to get current locale.

If you want to force Emacs use the dictionary "en_US", insert below code into your ~/.emacs:

;; find aspell and hunspell automatically
(cond
 ;; try hunspell at first
  ;; if hunspell does NOT exist, use aspell
 ((executable-find "hunspell")
  (setq ispell-program-name "hunspell")
  (setq ispell-local-dictionary "en_US")
  (setq ispell-local-dictionary-alist
        ;; Please note the list `("-d" "en_US")` contains ACTUAL parameters passed to hunspell
        ;; You could use `("-d" "en_US,en_US-med")` to check with multiple dictionaries
        '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)
          )))

 ((executable-find "aspell")
  (setq ispell-program-name "aspell")
  ;; Please note ispell-extra-args contains ACTUAL parameters passed to aspell
  (setq ispell-extra-args '("--sug-mode=ultra" "--lang=en_US"))))

That's it!

Some people prefer hunspell to aspell because hunspell gives better suggestions for typo fix. We can run both in shell to demonstrate that,

echo htink | aspell -a --sug-mode=ultra --lang=en_US
echo htink | hunspell -a -d en_US

Please run command "man aspell" or "man hunspell" in shell if you have more questions. I've nothing more to say.

2 Suggestion for programmers

I strongly recommend aspell instead of hunspell (Though hunspell is fine).

Please insert below code into your ~/.emacs:

;; if (aspell installed) { use aspell}
;; else if (hunspell installed) { use hunspell }
;; whatever spell checker I use, I always use English dictionary
;; I prefer use aspell because:
;; 1. aspell is older
;; 2. looks Kevin Atkinson still get some road map for aspell:
;; @see http://lists.gnu.org/archive/html/aspell-announce/2011-09/msg00000.html
(defun flyspell-detect-ispell-args (&optional run-together)
  "if RUN-TOGETHER is true, spell check the CamelCase words."
  (let (args)
    (cond
     ((string-match  "aspell$" ispell-program-name)
      ;; Force the English dictionary for aspell
      ;; Support Camel Case spelling check (tested with aspell 0.6)
      (setq args (list "--sug-mode=ultra" "--lang=en_US"))
      (if run-together
          (setq args (append args '("--run-together" "--run-together-limit=5" "--run-together-min=2")))))
     ((string-match "hunspell$" ispell-program-name)
      ;; Force the English dictionary for hunspell
      (setq args "-d en_US")))
    args))

(cond
 ((executable-find "aspell")
  ;; you may also need `ispell-extra-args'
  (setq ispell-program-name "aspell"))
 ((executable-find "hunspell")
  (setq ispell-program-name "hunspell")

  ;; Please note that `ispell-local-dictionary` itself will be passed to hunspell cli with "-d"
  ;; it's also used as the key to lookup ispell-local-dictionary-alist
  ;; if we use different dictionary
  (setq ispell-local-dictionary "en_US")
  (setq ispell-local-dictionary-alist
        '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8))))
 (t (setq ispell-program-name nil)))

;; ispell-cmd-args is useless, it's the list of *extra* arguments we will append to the ispell process when "ispell-word" is called.
;; ispell-extra-args is the command arguments which will *always* be used when start ispell process
;; Please note when you use hunspell, ispell-extra-args will NOT be used.
;; Hack ispell-local-dictionary-alist instead.
(setq-default ispell-extra-args (flyspell-detect-ispell-args t))
;; (setq ispell-cmd-args (flyspell-detect-ispell-args))
(defadvice ispell-word (around my-ispell-word activate)
  (let ((old-ispell-extra-args ispell-extra-args))
    (ispell-kill-ispell t)
    (setq ispell-extra-args (flyspell-detect-ispell-args))
    ad-do-it
    (setq ispell-extra-args old-ispell-extra-args)
    (ispell-kill-ispell t)
    ))

(defadvice flyspell-auto-correct-word (around my-flyspell-auto-correct-word activate)
  (let ((old-ispell-extra-args ispell-extra-args))
    (ispell-kill-ispell t)
    ;; use emacs original arguments
    (setq ispell-extra-args (flyspell-detect-ispell-args))
    ad-do-it
    ;; restore our own ispell arguments
    (setq ispell-extra-args old-ispell-extra-args)
    (ispell-kill-ispell t)
    ))

(defun text-mode-hook-setup ()
  ;; Turn off RUN-TOGETHER option when spell check text-mode
  (setq-local ispell-extra-args (flyspell-detect-ispell-args)))
(add-hook 'text-mode-hook 'text-mode-hook-setup)

3 Why

3.1 Aspell

aspell is recommended because its option "–run-together". That option could check the camel case word. Variable name often uses camel case naming convention these days. Read my Effective spell check in Emacs for advanced tips (spell check HTML files).

If Emacs start a aspell process with "–run-together" option, that process is not closed so it can be re-used by other commands.

This behavior will be a problem if you want to let Emacs/aspell correct the typo by running the command "ispell-word" because a aspell process with "–run-together" will produce much noise.

For example, for a typo "helle" Emacs will give you too many candidates. It's hard to find the desired word "hello": aspell-camelcase-suggest-nq8.png

The better solution is before running "M-x ispell-word", we'd better restart a aspell process without the argument "–run-together".

Here is the screen shot after we applying this fix: aspell-normal-suggest-nq8.png

As I mentioned, the global variable "ispell-extra-args" contains arguments Emacs will always append to a spell checker process (aspell or hunspell). That's the only variable you need care about.

There is another variable named "ispell-cmd-args". It is actually some extra arguments Emacs could send to an existing spell checker process when you "M-x ispell-word". In my opinion, it's useless. I mention it because the naming are confusing. "ispell-extra-args" is actually command line arguments the spell checker will always use. The "ispell-cmd-args" are the extra arguments optionally be used.

3.2 Hunspell

I cannot find hunspell option to check camel case words. Please enlighten me if you know the option.

Hunspell has some design flaw. It will always check the environment variable LC_ALL, LC_MESSAGES and LANG at first to find the default dictionary unless you specify the dictionary in the command line. If it cannot find the default dictionary, the spell checker process won't start. Aspell does not have this issue, if it cannot find the zh_CN dictionary, it will fall back into English.

Specify the ispell-extra-args won't stop hunspell to search for the default dictionary at the beginning because ispell-extra-args is NOT used.

For example, I am a Chinese and my locale is "zh_CN.utf-8". So hunspell will always search the dictionary zh_CN. Even I'm only interested in English spell checking.

To specify the dictionary explicitly, I need hack the Emacs code which is kind of mess.

Finally, I figure out,

(setq ispell-program-name "hunspell")
;; below two lines reset the the hunspell to it STOPS querying locale!
(setq ispell-local-dictionary "en_US") ; "en_US" is key to lookup in `ispell-local-dictionary-alist`
(setq ispell-local-dictionary-alist
      '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))

As you can see, you can pass the extra arguments to the hunspell by tweak ispell-local-dictionary-alist.

4 FAQ

4.1 How to setup Hunspell

The easiest way is to set up environment variable DICPATH,

export DICPATH=/Applications/LibreOffice.app/Contents/Resources/extensions/dict-en:/Applications/LibreOffice.app/Contents/Resources/extensions/dict-es
hunspell -D # list available/loaded dictionaries

You can also run hunspell -D in shell to figure out where hunspell searches dictionaries.

Comments powered by Disqus