What's the best spell check setup in emacs

  |   Source

UPDATED: <2016-05-13 Fri>

CREATED: <2014-04-26 Sat>

I will show you the minimum setup and explain details.

Topics already covered in official manual (flyspell-mode-predicate, for example) are NOT discussed here.

But you can see my complete setup HERE.

Suggestion for non-programmers

Emacs finds the right dictionary by querying your locale.

You can run the command locale in the shell to get current locale.

If you want to force Emacs use the dictionary "en_US", insert below code into your ~/.emacs:

;; find aspell and hunspell automatically
(cond
 ;; try hunspell at first
  ;; if hunspell does NOT exist, use aspell
 ((executable-find "hunspell")
  (setq ispell-program-name "hunspell")
  (setq ispell-local-dictionary "en_US")
  (setq ispell-local-dictionary-alist
        ;; Please note the list `("-d" "en_US")` contains ACTUAL parameters passed to hunspell
        ;; You could use `("-d" "en_US,en_US-med")` to check with multiple dictionaries
        '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8))))

 ((executable-find "aspell")
  (setq ispell-program-name "aspell")
  ;; Please note ispell-extra-args contains ACTUAL parameters passed to aspell
  (setq ispell-extra-args '("--sug-mode=ultra" "--lang=en_US"))))

That's it!

Some people prefer hunspell to aspell because hunspell gives better suggestions for typo fix. We can run both in shell to demonstrate that,

echo htink | aspell -a --sug-mode=ultra --lang=en_US
echo htink | hunspell -a -d en_US

Run man aspell or man hunspell in shell if you have more questions. I've nothing more to say.

Suggestion for programmers

I strongly recommend aspell instead of hunspell (Though hunspell is fine).

Please insert below code into your ~/.emacs:

;; if (aspell installed) { use aspell}
;; else if (hunspell installed) { use hunspell }
;; whatever spell checker I use, I always use English dictionary
;; I prefer use aspell because:
;; 1. aspell is older
;; 2. looks Kevin Atkinson still get some road map for aspell:
;; @see http://lists.gnu.org/archive/html/aspell-announce/2011-09/msg00000.html
(defun flyspell-detect-ispell-args (&optional run-together)
  "if RUN-TOGETHER is true, spell check the CamelCase words."
  (let (args)
    (cond
     ((string-match  "aspell$" ispell-program-name)
      ;; Force the English dictionary for aspell
      ;; Support Camel Case spelling check (tested with aspell 0.6)
      (setq args (list "--sug-mode=ultra" "--lang=en_US"))
      (when run-together
        (cond
         ;; Kevin Atkinson said now aspell supports camel case directly
         ;; https://github.com/redguardtoo/emacs.d/issues/796
         ((string-match-p "--camel-case"
                          (shell-command-to-string (concat ispell-program-name " --help")))
          (setq args (append args '("--camel-case"))))

         ;; old aspell uses "--run-together". Please note we are not dependent on this option
         ;; to check camel case word. wucuo is the final solution. This aspell options is just
         ;; some extra check to speed up the whole process.
         (t
          (setq args (append args '("--run-together" "--run-together-limit=16")))))))
     ((string-match "hunspell$" ispell-program-name)
      ;; Force the English dictionary for hunspell
      (setq args "-d en_US")))
    args))

(cond
 ((executable-find "aspell")
  ;; you may also need `ispell-extra-args'
  (setq ispell-program-name "aspell"))
 ((executable-find "hunspell")
  (setq ispell-program-name "hunspell")

  ;; Please note that `ispell-local-dictionary` itself will be passed to hunspell cli with "-d"
  ;; it's also used as the key to lookup ispell-local-dictionary-alist
  ;; if we use different dictionary
  (setq ispell-local-dictionary "en_US")
  (setq ispell-local-dictionary-alist
        '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8))))
 (t (setq ispell-program-name nil)))

;; ispell-cmd-args is useless, it's the list of *extra* arguments we will append to the ispell process when "ispell-word" is called.
;; ispell-extra-args is the command arguments which will *always* be used when start ispell process
;; Please note when you use hunspell, ispell-extra-args will NOT be used.
;; Hack ispell-local-dictionary-alist instead.
(setq-default ispell-extra-args (flyspell-detect-ispell-args t))
;; (setq ispell-cmd-args (flyspell-detect-ispell-args))
(defadvice ispell-word (around my-ispell-word activate)
  (let ((old-ispell-extra-args ispell-extra-args))
    (ispell-kill-ispell t)
    (setq ispell-extra-args (flyspell-detect-ispell-args))
    ad-do-it
    (setq ispell-extra-args old-ispell-extra-args)
    (ispell-kill-ispell t)))

(defadvice flyspell-auto-correct-word (around my-flyspell-auto-correct-word activate)
  (let ((old-ispell-extra-args ispell-extra-args))
    (ispell-kill-ispell t)
    ;; use emacs original arguments
    (setq ispell-extra-args (flyspell-detect-ispell-args))
    ad-do-it
    ;; restore our own ispell arguments
    (setq ispell-extra-args old-ispell-extra-args)
    (ispell-kill-ispell t)))

(defun text-mode-hook-setup ()
  ;; Turn off RUN-TOGETHER option when spell check text-mode
  (setq-local ispell-extra-args (flyspell-detect-ispell-args)))
(add-hook 'text-mode-hook 'text-mode-hook-setup)

There is only one minor issue in this setup. If a camel case word contains correct two character sub-word. Aspell will regard the word as typo. So aspell might produce some noise (Future aspell has new option --camel-case which solves this problem completely).

See https://github.com/redguardtoo/emacs.d/blob/master/lisp/init-spelling.el for the solution.

Why

Aspell

aspell is recommended because its option --run-together. That option could check the camel case word. Variable name often uses camel case naming convention these days. Read my Effective spell check in Emacs for advanced tips.

If Emacs starts an aspell process with "–run-together" option, that process is not closed so it can be re-used by other commands.

This behavior will be a problem if you want Emacs/aspell correct the typo by running the command "ispell-word" because an aspell process with "–run-together" will produce too much noise.

For example, for a typo "helle" Emacs will give you extra candidates. It's hard to find the desired word "hello": aspell-camelcase-suggest-nq8.png

The better solution is before running "M-x ispell-word", we'd better restart the aspell process without the argument "–run-together".

Here is the screen shot after we applying this fix: aspell-normal-suggest-nq8.png

As I mentioned, the global variable ispell-extra-args contains arguments Emacs will always append to a spell checker process (aspell or hunspell). That's the only variable you need care about.

There is another variable ispell-cmd-args. It is actually some extra arguments Emacs could send to the existing spell checker process when you "M-x ispell-word". In my opinion, it's useless. I mention it because its name is confusing. ispell-extra-args contains command line arguments the spell checker always uses. The ispell-cmd-args contains the extra arguments which might be used.

Hunspell

I cannot find hunspell option to check camel case words. Please enlighten me if you know the option.

Hunspell has some design flaw. It always checks the environment variable LC_ALL, LC_MESSAGES and LANG at first to find the default dictionary unless you give it the dictionary in the command line. If it cannot find the default dictionary, the spell checker process won't start. Aspell does not have this issue, if it cannot find the zh_CN dictionary, it will fall back into English dictionary.

Specify the ispell-extra-args won NOT stop hunspell to search for the default dictionary at the beginning because ispell-extra-args is NOT used by hunspell.

For example, I am a Chinese and my locale is zh_CN.utf-8. So hunspell always searches the dictionary zh_CN. Even I'm only interested in English spell checking.

To specify the dictionary explicitly, I need hack the Emacs code which is kind of mess.

Finally, I figured out,

(setq ispell-program-name "hunspell")
;; below two lines reset the the hunspell to it STOPS querying locale!
(setq ispell-local-dictionary "en_US") ; "en_US" is key to lookup in `ispell-local-dictionary-alist`
(setq ispell-local-dictionary-alist
      '(("en_US" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))

You can pass the extra arguments to the hunspell by tweaking ispell-local-dictionary-alist.

FAQ

How to setup Hunspell

The easiest way is to set up environment variable DICPATH,

export DICPATH=/Applications/LibreOffice.app/Contents/Resources/extensions/dict-en:/Applications/LibreOffice.app/Contents/Resources/extensions/dict-es
hunspell -D # list available/loaded dictionaries

You can also run hunspell -D in shell to figure out where hunspell actually searches dictionaries.

Comments powered by Disqus