Does using OpenBLAS make multiprocessing unusable? - 病みつきエンジニアブログ

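numpy/scipy do not implement every operation in Python; internally they call out to BLAS and similar libraries (probably). Normally a BLAS implementation such as ATLAS would be used, but that is slow, so you may want to use a BLAS implementation like OpenBLAS instead. (Reference: Atsushi TATSUMA Web Page » OpenBLAS を使った Numpy/Scipy のビルド [Building Numpy/Scipy with OpenBLAS])

And OpenBLAS does make some matrix operations faster, because it takes advantage of multiple cores.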

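The trouble is that merely running

import scipy.sparse.linalg

was enough to stop parallel processing with multiprocessing from working. The core count, checked with

import multiprocessing
multiprocessing.cpu_count() # 16

still looks multi-core, but when the code actually runs, all of the parallel work ends up on a single CPU. Searching around, I found several statements to the effect that OpenBLAS makes multiprocessing unusable.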
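A rough way to check the "everything runs on one CPU" symptom is to ask each worker process which CPUs it is allowed to be scheduled on. The sketch below is only a diagnostic under assumptions: the post does not confirm that CPU affinity is the cause, the helper names `report_allowed_cpus` and `allowed_cpus_per_worker` are made up for illustration, and `os.sched_getaffinity` is only available on some platforms such as Linux.

```python
import multiprocessing
import os


def report_allowed_cpus(_):
    """Return the sorted CPU ids this process may be scheduled on, or None."""
    # os.sched_getaffinity only exists on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None  # affinity cannot be queried on this platform


def allowed_cpus_per_worker(n_workers=4):
    """Run the affinity check inside a multiprocessing pool."""
    with multiprocessing.Pool(n_workers) as pool:
        return pool.map(report_allowed_cpus, range(n_workers))


if __name__ == "__main__":
    print("cpu_count:", multiprocessing.cpu_count())
    # To compare, run once as-is and once with the import from the post
    # enabled before the pool is created:
    # import scipy.sparse.linalg
    for cpus in allowed_cpus_per_worker():
        print("worker may run on:", cpus)
```

If the workers report only a single CPU id after the import while `cpu_count()` still reports 16, that would match the behavior described above; if they report all CPUs, the cause lies elsewhere.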

openblas uses openmp for parallization. that does not work well when you are forking like python multiprocessing does.

I don't think it can be solved besides disabling parallelization in either openblas or python.

Bug #1186274 “openblas, multiprocessing and numpy freeze python...” : Bugs : “openblas” package : Ubuntu

and

Using OpenBLAS can give speedups in some scikit-learn modules, but it doesn’t play nicely with joblib/multiprocessing, so using it is not recommended unless you know what you’re doing.

Installing scikit-learn — scikit-learn 0.14 documentation

So it's a trade-off... I guess. Yikes.
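If the trade-off tips toward multiprocessing, the first quote's suggestion of disabling parallelization on the OpenBLAS side can be sketched as follows. This is a minimal sketch, not something tested in this post: it assumes the installed OpenBLAS honors the `OPENBLAS_NUM_THREADS` environment variable, and whether it removes the single-CPU symptom depends on how OpenBLAS was built. The functions `sum_of_squares` and `run_in_pool` are placeholders for real work.

```python
import os

# Assumption: this must be set before numpy/scipy (and therefore OpenBLAS)
# is imported for the first time in this process.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import multiprocessing

# numpy / scipy imports would go here, after the environment is set, e.g.:
# import scipy.sparse.linalg


def sum_of_squares(n):
    """Placeholder CPU-bound task standing in for real numerical work."""
    return sum(i * i for i in range(n))


def run_in_pool(inputs, n_workers=2):
    """Spread the placeholder task over worker processes."""
    with multiprocessing.Pool(n_workers) as pool:
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    print(run_in_pool([10, 100, 1000]))
```

The cost is the other side of the trade-off: with OpenBLAS limited to one thread, individual matrix operations lose the multi-core speedup, and parallelism comes only from the worker processes.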