This tutorial is adapted from a collection of other tutorials in both English and Japanese. It assumes you have Python 3 installed. Links to the original source pages are provided periodically.

Installing Mecab


On Windows

On Windows you have several options. One is to use apt-get to install MeCab, which requires a Debian/Ubuntu environment such as the Windows Subsystem for Linux:

sudo apt-get install libmecab-dev
sudo apt-get install mecab mecab-ipadic-utf8
pip3 install mecab-python3

If you get the following error:

Command "python egg_info" failed with error code 1 in /private/var/folders/bg/7wyn3chj2m3bhmxrglv6m0wr0000gn/T/pip-install-_wqpyjn5/mecab-python3/
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Make sure you upgrade setuptools (and, as the message suggests, pip), then retry the installation:

pip3 install --upgrade setuptools
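
To see which versions of pip and setuptools the active Python 3 interpreter actually has, something like the following sketch may help. It uses only the standard library; the helper name `installed_version` is made up for illustration, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages relevant to the installation steps above.
versions = {
    name: installed_version(name)
    for name in ("pip", "setuptools", "mecab-python3")
}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

If mecab-python3 shows as not installed after the upgrade, rerun the pip3 install command above.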

You can also use the .exe (executable) file for a user-friendly installation.


  • You will get mojibake (e.g. “æ–‡å—化㠒”) after you first install MeCab. To fix this, you must change the region settings/locale to Japan.
  • You can use MeCab from the command prompt. Navigate to the directory with MeCab and type “MeCab.lnk”, or “MeCab.lnk -h” for help.
  • If you’re still getting mojibake, you may not have the correct dictionary loaded. You can check this by typing “MeCab.lnk -D”, which prints information about the dictionary.
    • The charset has to be “SHIFT-JIS”.
    • You can change it by going into the Windows menu, navigating to MeCab, and recompiling the dictionary with SHIFT-JIS or another charset.
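
Mojibake like the example above is what correctly encoded bytes look like when they are decoded with the wrong charset. The sketch below reproduces the effect with Python's built-in codecs. Latin-1 stands in for the "wrong" charset purely for illustration; the charset actually involved on a given Windows console may differ.

```python
text = "文字化け"  # "mojibake" written in Japanese

utf8_bytes = text.encode("utf-8")      # 3 bytes per character here
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character here

# Decoding UTF-8 bytes with the wrong charset yields garbage characters.
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))

# The bytes themselves are intact, so the right charset recovers the text.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == text)

# The same text is a different byte sequence in each encoding, which is why
# the dictionary charset and the input charset have to agree.
print(len(utf8_bytes), len(sjis_bytes))
```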

There is a YouTube tutorial for Mecab, which is pretty helpful, showing you how to use wakati and chasen (parser and part of speech tagger).

On Mac

First, download the source file for MeCab from:

In terminal, go to the directory where the mecab tar file is. Do the following:

$ tar xvfz mecab-0.996.tar.gz  #to extract the files (untar it)
$ cd mecab-0.996               #go to the directory with the configure files
$ ./configure --enable-utf8-only --prefix=/usr/local/mecab
$ make
$ sudo make install

Note: make sure you move the downloaded source file somewhere outside Dropbox before building. If you don’t, you may get an error like the one below:

test -z "/usr/local/lib" || ./install-sh -c -d "/usr/local/lib"
 /bin/sh ./libtool   --mode=install /usr/bin/install -c '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/libcrfpp.0.dylib /usr/local/lib/libcrfpp.0.dylib
libtool: install: (cd /usr/local/lib && { ln -s -f libcrfpp.0.dylib libcrfpp.dylib || { rm -f libcrfpp.dylib && ln -s libcrfpp.0.dylib libcrfpp.dylib; }; })
libtool: install: /usr/bin/install -c .libs/libcrfpp.lai /usr/local/lib/
libtool: install: /usr/bin/install -c .libs/libcrfpp.a /usr/local/lib/libcrfpp.a
libtool: install: chmod 644 /usr/local/lib/libcrfpp.a
libtool: install: ranlib /usr/local/lib/libcrfpp.a
/bin/sh: /Users/auroratsai/Dropbox: No such file or directory
make[1]: *** [install-libLTLIBRARIES] Error 127
make: *** [install-am] Error 2
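
The log stops at /Users/auroratsai/Dropbox, which would be consistent with a build path containing a space being split by the shell. That reading is an assumption, not something the original sources state. Under that assumption, a small check like this one (the helper name `safe_build_dir` is hypothetical) can flag a risky directory before you run ./configure and make.

```python
import os


def safe_build_dir(path):
    """Return True if the absolute path contains no whitespace characters."""
    absolute = os.path.abspath(path)
    return not any(ch.isspace() for ch in absolute)


# Check the directory the build would run from.
current = os.getcwd()
print(current, "->", "ok" if safe_build_dir(current) else "contains whitespace")
```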

Installing the IPA dictionary (IPA辞書のインストール)

1) Download the IPA dictionary file (IPA辞書のソースをダウンロード)

2) In terminal, go to the directory where the downloaded tar file is located.

$ tar xvfz mecab-ipadic-2.7.0-20070801.tar.gz   #untar the file
$ cd mecab-ipadic-2.7.0-20070801
$ ./configure --prefix=/usr/local/mecab --with-mecab-config=/usr/local/mecab/bin/mecab-config --with-charset=utf8
$ make
$ sudo make install

Source: This page provides suggestions for fixing errors that come up when installing the IPA dictionary:

Install the “easy” way with Homebrew (MacOS) or LinuxBrew (Debian/Ubuntu)

This is the method I used, since I encountered a number of errors with my installation. Molly DesJardin provides the following documentation.

1) Paste into terminal:

/usr/bin/ruby -e "$(curl -fsSL"

or get LinuxBrew

#prepare environment for download
$ sudo apt-get update
$ sudo apt-get upgrade -y
$ sudo apt-get install -y build-essential make cmake scons curl git \
                               ruby autoconf automake autoconf-archive \
                               gettext libtool flex bison \
                               libbz2-dev libcurl4-openssl-dev \
                               libexpat-dev libncurses-dev

#clone linuxBrew
$ git clone ~/.linuxbrew

# Until LinuxBrew is fixed, the following is required.
# See:
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/local/lib64/pkgconfig:/usr/lib64/pkgconfig:/usr/lib/pkgconfig:/usr/lib/x86_64-linux-gnu/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig:$PKG_CONFIG_PATH

## Setup linux brew and update environmental variables
$ export LINUXBREWHOME=$HOME/.linuxbrew

#test installation
$ which brew
#(path of installation displays)
#(path of config displays)

2) Then install mecab

$ brew install mecab
$ brew install mecab-ipadic

3) Optional: Change Your MeCab dictionary
You can find all of the NINJAL dictionaries for unidic here:

You can’t use the -Ochasen option with unidic; it has to be -Owakati. See those posts for sample code.

Open the file /usr/local/etc/mecabrc and change the dicdir line from the first form below to the second:

dicdir =  /usr/local/lib/mecab/dic/ipadic
dicdir =  /usr/local/lib/mecab/dic/unidic

Make sure you move your preferred dictionary to the unidic directory. Just copy and paste everything that you downloaded from the NINJAL site in there, with the directory structure intact. Now unidic will be your default dictionary and you can use it in MeCab Python (remember to use -Owakati as your output option, not -Ochasen). It also works in RMeCab, but as I’m not an R programmer, I won’t cover that here.
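
The dicdir edit can also be scripted. The sketch below rewrites the dicdir line in the text of a mecabrc file. The function name `set_dicdir` is made up for this example, and it works on a string so you can try it without touching /usr/local/etc/mecabrc.

```python
import re


def set_dicdir(mecabrc_text, new_dicdir):
    """Return mecabrc_text with its dicdir line pointing at new_dicdir."""
    pattern = re.compile(r"^dicdir\s*=.*$", flags=re.MULTILINE)
    replacement = f"dicdir = {new_dicdir}"
    if pattern.search(mecabrc_text):
        # Use a function so backslashes in the path are not treated as escapes.
        return pattern.sub(lambda _match: replacement, mecabrc_text, count=1)
    # No dicdir line yet: append one.
    return mecabrc_text.rstrip("\n") + "\n" + replacement + "\n"


sample = "; sample mecabrc\ndicdir =  /usr/local/lib/mecab/dic/ipadic\n"
updated = set_dicdir(sample, "/usr/local/lib/mecab/dic/unidic")
print(updated)
```

To apply it for real, you would read the file, pass its contents through the function, and write the result back with sufficient permissions, ideally after making a backup copy.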

Adapted from:

4) Check your installation

Check that it works by typing mecab into your terminal and then entering some Japanese text.
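
The same check can be run from Python without the wrapper by calling the mecab command directly. This is a rough sketch: the function name `run_mecab` is hypothetical, and it assumes the installed dictionary uses UTF-8. If mecab is not on your PATH it returns None instead of raising an error.

```python
import shutil
import subprocess


def run_mecab(text, output_format=None):
    """Run the mecab CLI on text; return its output, or None if mecab is absent."""
    executable = shutil.which("mecab")
    if executable is None:
        return None
    command = [executable]
    if output_format:
        command.append(f"-O{output_format}")  # e.g. -Owakati or -Ochasen
    completed = subprocess.run(
        command,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    return completed.stdout.decode("utf-8", errors="replace")


result = run_mecab("すもももももももものうち", output_format="wakati")
print("mecab not found on PATH" if result is None else result)
```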

Use Python/Mecab in R

If you are interested in running Python chunks in R Markdown, the reticulate package provides easy interoperability between the two.

Warning: Communication between R and Python chunks (the pieces of code in an R Markdown document) is only supported from the RStudio v1.2 preview release onward. In earlier versions it only works when you knit the document, not when you run chunk by chunk.


1) Set your default python in R
It is helpful to make sure your default python version is set up in your .Rprofile. You can edit your .Rprofile by typing the following into your Console:

file.edit(file.path("~", ".Rprofile"))

Then paste the following into it:

Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python3")  #Put this in your .Rprofile
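
If you are not sure which path to put in RETICULATE_PYTHON, you can ask Python itself. Run the following sketch with the interpreter you want reticulate to use. It prints that interpreter's path and version, plus whatever python3 resolves to on your PATH. The path /usr/local/bin/python3 above is only one possibility.

```python
import shutil
import sys

interpreter_path = sys.executable          # path of the running interpreter
python3_on_path = shutil.which("python3")  # first python3 on PATH, or None

print("Running interpreter:", interpreter_path)
print("python3 on PATH:    ", python3_on_path)
print("Version:            ", sys.version.split()[0])
```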

2) Load reticulate and Python3 in your R setup chunk

Python chunks work similarly to R chunks within R Markdown, providing text or graphical outputs. The two languages have full access to each other’s objects if you convert them in your code, including NumPy arrays and Pandas data frames.
By default, reticulate uses the version of Python found on your PATH (i.e. Sys.which("python")). If you want to use an alternate version, add one of the use_python() family of functions to your R Markdown setup chunk. If you want to use Python 3, make sure you indicate that in your setup.

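knitr::opts_chunk$set(echo = TRUE)
library(reticulate)  # needed for use_python(), py_discover_config(), and import()

Sys.which("python3")  # locate python3 on your PATH
##                  python3 
## "/usr/local/bin/python3"
# use_python(python = "/usr/local/bin/python3", required = T) or
use_python(python = Sys.which("python3"), required = T)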

py_discover_config()  # discover which Python will be used without actually loading Python
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/config-3.7m-darwin/libpython3.7.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.7:/Library/Frameworks/Python.framework/Versions/3.7
## version:        3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)  [Clang 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy
## numpy_version:  1.15.4
## NOTE: Python version was forced by use_python function

If you want to import packages such as “MeCab”, you can do that using the “import” function

mecab <- import("MeCab")  
## [1] "0.996"
# py_help(mecab)
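
Before calling import("MeCab") from R, it can help to confirm from Python that the module is visible to the interpreter reticulate is using. A minimal sketch with the standard library follows; the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# MeCab is the wrapper module; numpy appears in the config output above.
for module in ("MeCab", "numpy"):
    status = "found" if module_available(module) else "missing"
    print(f"{module}: {status}")
```

If MeCab shows as missing here but pip3 reports it as installed, the two are probably using different Python installations.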

See what paths for Python are being referenced

import sys
for p in sys.path:
    print(p)
## /Library/Frameworks/Python.framework/Versions/3.7/bin
## /Library/Frameworks/Python.framework/Versions/3.7/lib/
## /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7
## /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload
## /Users/auroratsai/Library/Python/3.7/lib/python/site-packages
## /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
## /Library/Frameworks/R.framework/Versions/3.5/Resources/library/reticulate/python

Now Python chunks should work in R Markdown. Keep reading to see examples of how the Python chunks look.


Install the mecab python wrapper

This is pretty straightforward.
1) In Terminal:

pip3 install mecab-python3
# if you want to use the NEologd dictionary with a tagger, this prints the path where it is expected
echo `mecab-config --dicdir`"/mecab-ipadic-neologd"
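
The same path can be built from Python, which is handy if you later want to hand it to a tagger. This is only a sketch: the function names are made up, it assumes mecab-config is on your PATH, and it does not install the NEologd dictionary, which has to be set up separately.

```python
import os
import shutil
import subprocess


def mecab_dicdir():
    """Return the output of `mecab-config --dicdir`, or None if unavailable."""
    executable = shutil.which("mecab-config")
    if executable is None:
        return None
    completed = subprocess.run(
        [executable, "--dicdir"], stdout=subprocess.PIPE, check=False
    )
    path = completed.stdout.decode("utf-8").strip()
    return path or None


def neologd_dir(dicdir):
    """Join a dictionary root with the mecab-ipadic-neologd folder name."""
    return os.path.join(dicdir, "mecab-ipadic-neologd")


root = mecab_dicdir()
if root is None:
    print("mecab-config not found on PATH")
else:
    path = neologd_dir(root)
    # MeCab's -d option selects a dictionary directory, e.g. Tagger("-d " + path).
    print(path, "(exists)" if os.path.isdir(path) else "(not installed)")
```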

2) Test out Mecab in Python

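# -*- coding: utf-8 -*-
import sys
import MeCab

# tagger built from any command-line arguments (not used below)
t = MeCab.Tagger(" ".join(sys.argv))
# sample text, reconstructed from the token output shown below
ex = "見た人たちがみんな携帯に話していること。誰でも連絡が取らない、仕事や他のことから離れて、心を留守になること。誰でも連絡が取れない、自分の時間があること。"
# using the mecabrc tagger
tagger = MeCab.Tagger('mecabrc')
mecab_result = tagger.parse(ex)
print(mecab_result)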
## 見    動詞,自立,*,*,一段,連用形,見る,ミ,ミ
## た    助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
## 人    名詞,一般,*,*,*,*,人,ヒト,ヒト
## たち   名詞,接尾,一般,*,*,*,たち,タチ,タチ
## が    助詞,格助詞,一般,*,*,*,が,ガ,ガ
## みんな  名詞,代名詞,一般,*,*,*,みんな,ミンナ,ミンナ
## 携帯   名詞,サ変接続,*,*,*,*,携帯,ケイタイ,ケイタイ
## に    助詞,格助詞,一般,*,*,*,に,ニ,ニ
## 話し   動詞,自立,*,*,五段・サ行,連用形,話す,ハナシ,ハナシ
## て    助詞,接続助詞,*,*,*,*,て,テ,テ
## いる   動詞,非自立,*,*,一段,基本形,いる,イル,イル
## こと   名詞,非自立,一般,*,*,*,こと,コト,コト
## 。    記号,句点,*,*,*,*,。,。,。
## 誰    名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ
## でも   助詞,副助詞,*,*,*,*,でも,デモ,デモ
## 連絡   名詞,サ変接続,*,*,*,*,連絡,レンラク,レンラク
## が    助詞,格助詞,一般,*,*,*,が,ガ,ガ
## 取ら   動詞,自立,*,*,五段・ラ行,未然形,取る,トラ,トラ
## ない   助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
## 、    記号,読点,*,*,*,*,、,、,、
## 仕事   名詞,サ変接続,*,*,*,*,仕事,シゴト,シゴト
## や    助詞,並立助詞,*,*,*,*,や,ヤ,ヤ
## 他    名詞,一般,*,*,*,*,他,タ,タ
## の    助詞,連体化,*,*,*,*,の,ノ,ノ
## こと   名詞,非自立,一般,*,*,*,こと,コト,コト
## から   助詞,格助詞,一般,*,*,*,から,カラ,カラ
## 離れ   動詞,自立,*,*,一段,連用形,離れる,ハナレ,ハナレ
## て    助詞,接続助詞,*,*,*,*,て,テ,テ
## 、    記号,読点,*,*,*,*,、,、,、
## 心    名詞,一般,*,*,*,*,心,ココロ,ココロ
## を    助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
## 留守   名詞,サ変接続,*,*,*,*,留守,ルス,ルス
## に    助詞,格助詞,一般,*,*,*,に,ニ,ニ
## なる   動詞,自立,*,*,五段・ラ行,基本形,なる,ナル,ナル
## こと   名詞,非自立,一般,*,*,*,こと,コト,コト
## 。    記号,句点,*,*,*,*,。,。,。
## 誰    名詞,代名詞,一般,*,*,*,誰,ダレ,ダレ
## でも   助詞,副助詞,*,*,*,*,でも,デモ,デモ
## 連絡   名詞,サ変接続,*,*,*,*,連絡,レンラク,レンラク
## が    助詞,格助詞,一般,*,*,*,が,ガ,ガ
## 取れ   動詞,自立,*,*,一段,未然形,取れる,トレ,トレ
## ない   助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ
## 、    記号,読点,*,*,*,*,、,、,、
## 自分   名詞,一般,*,*,*,*,自分,ジブン,ジブン
## の    助詞,連体化,*,*,*,*,の,ノ,ノ
## 時間   名詞,副詞可能,*,*,*,*,時間,ジカン,ジカン
## が    助詞,格助詞,一般,*,*,*,が,ガ,ガ
## ある   動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
## こと   名詞,非自立,一般,*,*,*,こと,コト,コト
## 。    記号,句点,*,*,*,*,。,。,。
## EOS

Yay, it works! You can now segment Japanese into morphemes with part-of-speech tags for natural language processing.
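
For further processing you will usually want the output as data rather than as one long string. The sketch below splits output of the kind shown above into small records. It assumes the IPADIC field order seen in that output (part of speech first, dictionary form seventh) and a tab between the surface form and the features. The names `Token` and `parse_mecab_output` are made up for this example.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str  # the text as it appears in the sentence
    pos: str      # top-level part of speech (first feature field)
    base: str     # dictionary form (seventh feature field in IPADIC)


def parse_mecab_output(output: str) -> List[Token]:
    """Turn MeCab's default output into a list of Token tuples."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature_text = line.partition("\t")
        features = feature_text.split(",")
        # Unknown words may have fewer fields, so fall back to the surface.
        has_base = len(features) > 6 and features[6] != "*"
        tokens.append(Token(surface, features[0], features[6] if has_base else surface))
    return tokens


sample = (
    "見\t動詞,自立,*,*,一段,連用形,見る,ミ,ミ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "人\t名詞,一般,*,*,*,*,人,ヒト,ヒト\n"
    "EOS\n"
)
tokens = parse_mecab_output(sample)
verbs = [t.base for t in tokens if t.pos == "動詞"]
print(len(tokens), len(verbs))
```

With the wrapper installed, the string returned by tagger.parse() can be passed to the same function in place of the sample.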