SARIMAXモデルの高速ベイズ推定¶

はじめに¶

このノートブックでは、SARIMAX（外生変数を持つ季節自己回帰統合移動平均）モデルを推定するための高速ベイズ法の使用方法を示します。これらの手法は、複数のコアにわたって並列化することもできます。

ここで言う「高速な手法」とは、ホフマンとゲルマン（Hoffmann and Gelman）によって開発された、ハミルトニアンモンテカルロ法の一種であるNo-U-Turn Sampler（NUTS）のことです。詳細は Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623. をご覧ください。彼らによると、「ターゲット分布からの独立標本ごとの HMC のコストは、次元 \(D\) に対しておおよそ \(\mathcal{O}(D^{5/4})\) であり、ランダムウォークメトロポリス法の \(\mathcal{O}(D^{2})\) のコストと鋭く対照的です。」したがって、大きな次元の問題に対して、HMCの時間短縮効果は非常に大きいです。しかし、これはモデルの勾配またはヤコビアンを提供する必要があります。

このノートブックでは、 statsmodels （計量経済学を行うライブラリ）と PyMC3 （ベイズ推定用のライブラリ）を組み合わせて、簡単な SARIMAX モデル、ここでは米国 CPI の ARMA(1, 1) モデルの高速ベイズ推定を行います。

簡単なモデル（例えば AR(p) モデル）に関しては、 PyMC3 のベースを使用する方がモデルのフィットが速いですが、こちらに例があります。statsmodels を使用する利点は、広範囲な状態空間モデルを解決するためのメソッドにアクセスできることです。

解くべきモデルは次のように与えられます：

\[y_t = \phi y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}, \qquad \varepsilon_t \sim N(0, \sigma^2)\]

自己回帰項が一つ、移動平均項が一つのモデルです。状態空間形式では次のように書かれます：

\[\begin{split}\begin{align} y_t & = \underbrace{\begin{bmatrix} 1 & \theta_1 \end{bmatrix}}_{Z} \underbrace{\begin{bmatrix} \alpha_{1,t} \\ \alpha_{2,t} \end{bmatrix}}_{\alpha_t} \\ \begin{bmatrix} \alpha_{1,t+1} \\ \alpha_{2,t+1} \end{bmatrix} & = \underbrace{\begin{bmatrix} \phi & 0 \\ 1 & 0 \\ \end{bmatrix}}_{T} \begin{bmatrix} \alpha_{1,t} \\ \alpha_{2,t} \end{bmatrix} + \underbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}_{R} \underbrace{\varepsilon_{t+1}}_{\eta_t} \\ \end{align}\end{split}\]

コードは次のステップに従います：

外部依存関係をインポート
米国CPIデータのダウンロードとプロット
最大尤度推定（MLE）の簡単な例
ベイズ推定を行うためのテンソルをライブラリに提供する補助関数の定義
NUTSによるベイズ推定
米国　CPI　時系列への適用

最後に、付録Aでは、ステップ（4）の補助関数を再利用して、異なる状態空間モデルである UnobservedComponents を同じベイズ法で推定する方法を示します。

1. 外部依存性をインポートする¶

[ ]:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import statsmodels.api as sm
import theano
import theano.tensor as tt
from pandas.plotting import register_matplotlib_converters
from pandas_datareader.data import DataReader

plt.style.use("seaborn")
register_matplotlib_converters()

2. 米国 CPI のデータをダウンロードしてプロットする¶

データは FRED から取得します：

[ ]:

cpi = DataReader("CPIAUCNS", "fred", start="1971-01", end="2018-12")
cpi.index = pd.DatetimeIndex(cpi.index, freq="MS")

# Define the inflation series that we'll use in analysis
inf = np.log(cpi).resample("QS").mean().diff()[1:] * 400
inf = inf.dropna()
print(inf.head())

[ ]:

# Plot the series
fig, ax = plt.subplots(figsize=(9, 4), dpi=300)
ax.plot(inf.index, inf, label=r"$\Delta \log CPI$", lw=2)
ax.legend(loc="lower left")
plt.show()

3. 最尤推定によるモデルのフィット¶

Statsmodels はこの作業をすべて代わりに行ってくれます。モデルの作成とフィットは、たった 2 行のコードで完了します。モデルの順序パラメータは、それぞれ自己回帰、差分、移動平均の順序に対応します。

[ ]:

# Create an SARIMAX model instance - here we use it to estimate
# the parameters via MLE using the `fit` method, but we can
# also re-use it below for the Bayesian estimation
mod = sm.tsa.statespace.SARIMAX(inf, order=(1, 0, 1))

res_mle = mod.fit(disp=False)
print(res_mle.summary())

適合度は良好です。また、1ステップ先の予測の系列を取得し、それを実際のデータとともに、信頼区間を加えてプロットすることもできます。

[ ]:

predict_mle = res_mle.get_prediction()
predict_mle_ci = predict_mle.conf_int()
lower = predict_mle_ci["lower CPIAUCNS"]
upper = predict_mle_ci["upper CPIAUCNS"]

# Graph
fig, ax = plt.subplots(figsize=(9, 4), dpi=300)

# Plot data points
inf.plot(ax=ax, style="-", label="Observed")

# Plot predictions
predict_mle.predicted_mean.plot(ax=ax, style="r.", label="One-step-ahead forecast")
ax.fill_between(predict_mle_ci.index, lower, upper, color="r", alpha=0.1)
ax.legend(loc="lower left")
plt.show()

4. ベイズ推定を行うライブラリにテンソルを提供するための補助関数¶

もうすぐ魔法のような部分に入りますが、いくつかの前提があります。技術的な詳細に興味がない場合は、このセクションを飛ばしても構いません。

技術的詳細¶

PyMC3 は、ベイズ推定ライブラリ（「Pythonによる確率的プログラミング： Theano によるベイズモデリングと確率的機械学習」）で、a) 高速で、b) ベイズ機械学習に最適化されています。例えば、ベイズニューラルネットワークなどです。このライブラリは、非常に効率的にテンソルを評価し、シンボリック微分（深層学習には必要不可欠）を提供することを目的とした Theano というライブラリの上に構築されています。このシンボリック微分によって、 PyMC3 は PyMC3 内で定式化された任意の問題に対して NUTS を使用することができます。

私たちはPyMC3で直接問題を定式化しているわけではなく、 statsmodels を使用して状態空間モデルを指定し、カルマンフィルタを使って解いています。したがって、 statsmodels と PyMC3 の接続部分を構築する必要があり、そのためには、 statsmodels の SARIMAX モデルオブジェクトを Theano 風のラッパーでラップしてから、 PyMC3 に情報を渡して推定を行います。

このため、 Theano の自動微分を直接使用することはできません。しかし、幸いにも、 statsmodels の SARIMAX オブジェクトにはパラメータ値で評価されたヤコビ行列を返すメソッドがあります。これを利用して勾配を提供し、 NUTS を使用できるようにします。

モデルを PyMC3 に適した形式に変換するための補助関数の定義¶

まず、 Theano ラッパーを作成します。これらは「Ops」と呼ばれる操作オブジェクトの形式で、特定のタスクを「実行」します。これらは statsmodels の model インスタンスで初期化されます。

このコードはやや難解に見えるかもしれませんが、 statsmodels の任意の状態空間モデルに対して汎用的なものです。

[ ]:

class Loglike(tt.Op):

    itypes = [tt.dvector]  # expects a vector of parameter values when called
    otypes = [tt.dscalar]  # outputs a single scalar value (the log likelihood)

    def __init__(self, model):
        self.model = model
        self.score = Score(self.model)

    def perform(self, node, inputs, outputs):
        (theta,) = inputs  # contains the vector of parameters
        llf = self.model.loglike(theta)
        outputs[0][0] = np.array(llf)  # output the log-likelihood

    def grad(self, inputs, g):
        # the method that calculates the gradients - it actually returns the
        # vector-Jacobian product - g[0] is a vector of parameter values
        (theta,) = inputs  # our parameters
        out = [g[0] * self.score(theta)]
        return out


class Score(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, model):
        self.model = model

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = self.model.score(theta)

5. NUTS を用いたベイズ推定¶

次のステップは、ベイズ推定のためのパラメータを設定し、事前分布を指定して実行することです。

[ ]:

# Set sampling params
ndraws = 3000  # number of draws from the distribution
nburn = 600  # number of "burn-in points" (which will be discarded)

さあ、楽しい部分です！推定すべきパラメータは3つあります： \(\phi\) 、 \(\theta_1\) 、そして \(\sigma\) です。最初の2つには情報を与えない一様事前分布を、最後の一つには逆ガンマ分布を使用します。その後、可能であれば私が持っているだけのコンピュータコアを使って推定を実行します。

[ ]:

# Construct an instance of the Theano wrapper defined above, which
# will allow PyMC3 to compute the likelihood and Jacobian in a way
# that it can make use of. Here we are using the same model instance
# created earlier for MLE analysis (we could also create a new model
# instance if we preferred)
loglike = Loglike(mod)

with pm.Model() as m:
    # Priors
    arL1 = pm.Uniform("ar.L1", -0.99, 0.99)
    maL1 = pm.Uniform("ma.L1", -0.99, 0.99)
    sigma2 = pm.InverseGamma("sigma2", 2, 4)

    # convert variables to tensor vectors
    theta = tt.as_tensor_variable([arL1, maL1, sigma2])

    # use a DensityDist (use a lamdba function to "call" the Op)
    pm.DensityDist("likelihood", loglike, observed=theta)

    # Draw samples
    trace = pm.sample(
        ndraws,
        tune=nburn,
        return_inferencedata=True,
        cores=1,
        compute_convergence_checks=False,
    )

NUTS サンプラーは勾配を提供したため、自動的に割り当てられたことに注意してください。 PyMC3 は、勾配が利用できない場合、 Metropolis サンプラーや Slicing サンプラーを使用します。「ブラックボックス」スタイルの計算において、1秒あたりのサンプル数は非常に印象的です！しかし、モデルが PyMC3 で直接表現できる場合（上記の AR(p) モデルのように）、計算はかなり速くなることに注意してください。

推論は完了しましたが、結果は良いのでしょうか？確認する方法はいくつかあります。最初の方法は、事後分布を確認することです（MLE値を示すライン付きで）。

[ ]:

plt.tight_layout()
# Note: the syntax here for the lines argument is required for
# PyMC3 versions >= 3.7
# For version <= 3.6 you can use lines=dict(res_mle.params) instead
_ = pm.plot_trace(
    trace,
    lines=[(k, {}, [v]) for k, v in dict(res_mle.params).items()],
    combined=True,
    figsize=(12, 12),
)

推定された事後分布は、明確に最尤推定（MLE）で得られたパラメータ付近でピークを示しています。また、推定された値の概要も確認できます。

[ ]:

pm.summary(trace)

ここで、 \(\hat{R}\) は Gelman-Rubin 統計量です。これは、複数のチェーン間の分散と各チェーン内の分散を比較することで収束の有無をテストします。収束が達成されていれば、チェーン間の分散とチェーン内の分散は一致するはずです。もし、すべてのモデルパラメータについて \(\hat{R}<1.2\) であれば、収束が達成されたとある程度確信できます。

さらに、最も高い事後密度区間（表における HPD の2つの値の間隔）は、各変数について小さいです。

6. パラメータのベイズ推定の適用¶

次に、ベイズ推定から得たパラメータを使用してモデルのバージョンを再実行し、再度、1ステップ先の予測をプロットします。

[ ]:

# Retrieve the posterior means
params = pm.summary(trace)["mean"].values

# Construct results using these posterior means as parameter values
res_bayes = mod.smooth(params)

predict_bayes = res_bayes.get_prediction()
predict_bayes_ci = predict_bayes.conf_int()
lower = predict_bayes_ci["lower CPIAUCNS"]
upper = predict_bayes_ci["upper CPIAUCNS"]

# Graph
fig, ax = plt.subplots(figsize=(9, 4), dpi=300)

# Plot data points
inf.plot(ax=ax, style="-", label="Observed")

# Plot predictions
predict_bayes.predicted_mean.plot(ax=ax, style="r.", label="One-step-ahead forecast")
ax.fill_between(predict_bayes_ci.index, lower, upper, color="r", alpha=0.1)
ax.legend(loc="lower left")
plt.show()

付録 A. `UnobservedComponents` モデルへの適用¶

上記で定義した Loglike と Score ラッパーを再利用して、別の状態空間モデルを考えることができます。例えば、インフレをランダムウォークのトレンドと自己回帰誤差項の組み合わせとしてモデル化したい場合、次のようなモデルを考えます：

\[\begin{split}\begin{aligned} y_t & = \mu_t + \varepsilon_t \\ \mu_t & = \mu_{t-1} + \eta_t \\ \varepsilon_t &= \phi \varepsilon_t + \zeta_t \end{aligned}\end{split}\]

このモデルは、 UnobservedComponents クラスを使用し、 rwalk と autoregressive 仕様を指定することでStatsmodelsで構築できます。前回と同様に、 fit メソッドを使用して最尤推定でモデルをフィットさせることができます。

[ ]:

# Construct the model instance
mod_uc = sm.tsa.UnobservedComponents(inf, "rwalk", autoregressive=1)

# Fit the model via maximum likelihood
res_uc_mle = mod_uc.fit()
print(res_uc_mle.summary())

前述のように、上記で作成したTheanoラッパー（LoglikeとScore）は汎用的であるため、ベイズ法を用いてモデルを探求するために基本的に同じコードを再利用することができます。

[ ]:

# Set sampling params
ndraws = 3000  # number of draws from the distribution
nburn = 600  # number of "burn-in points" (which will be discarded)

[ ]:

# Here we follow the same procedure as above, but now we instantiate the
# Theano wrapper `Loglike` with the UC model instance instead of the
# SARIMAX model instance
loglike_uc = Loglike(mod_uc)

with pm.Model():
    # Priors
    sigma2level = pm.InverseGamma("sigma2.level", 1, 1)
    sigma2ar = pm.InverseGamma("sigma2.ar", 1, 1)
    arL1 = pm.Uniform("ar.L1", -0.99, 0.99)

    # convert variables to tensor vectors
    theta_uc = tt.as_tensor_variable([sigma2level, sigma2ar, arL1])

    # use a DensityDist (use a lamdba function to "call" the Op)
    pm.DensityDist("likelihood", loglike_uc, observed=theta_uc)

    # Draw samples
    trace_uc = pm.sample(
        ndraws,
        tune=nburn,
        return_inferencedata=True,
        cores=1,
        compute_convergence_checks=False,
    )

以前と同様に、周辺事後分布をプロットすることができます。SARIMAX の例とは対照的に、ここでは事後モードが最尤推定（MLE）の推定値とは多少異なります。

[ ]:

plt.tight_layout()
# Note: the syntax here for the lines argument is required for
# PyMC3 versions >= 3.7
# For version <= 3.6 you can use lines=dict(res_mle.params) instead
_ = pm.plot_trace(
    trace_uc,
    lines=[(k, {}, [v]) for k, v in dict(res_uc_mle.params).items()],
    combined=True,
    figsize=(12, 12),
)

[ ]:

pm.summary(trace_uc)

[ ]:

# Retrieve the posterior means
params = pm.summary(trace_uc)["mean"].values

# Construct results using these posterior means as parameter values
res_uc_bayes = mod_uc.smooth(params)

このモデルの利点の一つは、 \(\mu_t\) の平滑化された推定値を使用して、インフレの基礎となる「レベル」を推定できることです。この推定値は、結果オブジェクトの states.smoothed 属性の「level」列としてアクセスできます。この場合、レベルの分散のベイズ事後平均がMLE推定よりも大きいため、推定されたレベルは少しボラティリティが高くなります。

[ ]:

# Graph
fig, ax = plt.subplots(figsize=(9, 4), dpi=300)

# Plot data points
inf["CPIAUCNS"].plot(ax=ax, style="-", label="Observed data")

# Plot estimate of the level term
res_uc_mle.states.smoothed["level"].plot(ax=ax, label="Smoothed level (MLE)")
res_uc_bayes.states.smoothed["level"].plot(ax=ax, label="Smoothed level (Bayesian)")

ax.legend(loc="lower left");

最終更新日: 2025年01月28日