财务管理理论与实务

class: center, middle, inverse, title-slide

.title[
# 财务管理理论与实务
]
.subtitle[
## Time series
]
.author[
### 唐润宇
]
.date[
### 2023-10-26
]

---

# 时间序列

- 时间序列（time series）:是按照一定的时间区间进行索引的随机变量序列。

![](lecture4_files/figure-html/unnamed-chunk-1-1.png)

---
## 时间序列的构成因素

![](Figs/TS_component.png)

---
## 时间序列的分解模型

时间序列分析需要把**趋势(T)、季节变动(S)、周期波动(C)和随机波动(R)**这几种成分从时间序列中有目的的分离出来，或者所对数据进行分解、整理，并将它们的关系用一定的数学关系式进行表达，然后分别进行分析，即建立时间序列的分解模型。

按照四种成分对时间序列影响方式的不同，时间序列可分解为多种模型，比如加法模型，乘法模型等，其中比较常用的是乘法模型。

.pull-left[
乘法模型：

`$$Y_t = T_t \times S_t \times C_t \times R_t$$`
]
.pull-right[
加法模型：

`$$Y_t = T_t + S_t + C_t + R_t$$`
]

---
## 平稳时间序列的预测

**平稳序列（stationary series）**指的是不含趋势、季节变动和循环波动的序列，即其通常只包含随机成分。

平稳序列的数学定义(宽平稳)： `$x_1, x_2, \ldots, x_T$`

- `$E[X_t^2]<\infty, \forall t \in T$`,

- `$E[X_t] = \mu, \forall t \in T$` 其中 `$\mu$`为常数

- `$\gamma(t, s) = \gamma(k, k+s-t)$`, `$\forall t,s,k, k+s-t \in T$`.

其中 `$\gamma(t,s) = E[(X_t-\mu_t)(X_s - \mu_s)]$` 被称为自协方差.

**白噪声序列**: 纯随机序列 `$x_t \sim WN(0, \sigma^2)$`

- `$E[X_t] = \mu, \forall t \in T$` 其中 `$\mu$`为常数

- `$\gamma(t, s) = 0$`, if `$\forall t\neq s \in T$` and `$\gamma(t, s) = \sigma^2$`, if `$\forall t=s \in T$`.

???
白噪声序列一定是平稳序列，但是一般认为这种序列没有预测的必要。

---
## 平稳时间序列的预测

**常见方法:**

- 移动平均法(Moving Average):

`$$F_{t+1} = \frac{Y_{t-d+1} + Y_{t-d+2} + \ldots + Y_t}{d}$$`

- 加权移动平均法
$$ F_{t+1} = \omega_1 Y_1 + \omega_2 Y_2 + \ldots + \omega_t Y_t $$
其中  `$\sum_t \omega_t = 1$`.

- 指数平滑法:
$$ F_{t+1} = \alpha Y_t +  (1-\alpha) F_t$$

---
## 移动平均法

R中常见的处理时间序列的包： zoo, xts

```r
setwd("~/XJTU/课程/财务管理理论与实务/Codes")
library(tidyverse)
library(zoo)
data <- read.csv("../Data/002594.SZ.csv")
new_dat <- data |>
  mutate(Date=as.Date(Date, "%Y-%m-%d")) |>
  select(Date, Adj.Close)
ts <- read.zoo(new_dat)
```

---
##移动平均法

```r
ma5 <- rollmean(ts, 5, align = "right")
new <- merge(ma5, ts)
autoplot(new, facets=NULL)
```

```
## Warning: Removed 4 rows containing missing values (`geom_line()`).
```

![](lecture4_files/figure-html/unnamed-chunk-3-1.png)

---
##移动平均法

```r
ma50 <- rollmean(ts, 50, align = "right")
new <- merge(ma50, new)
autoplot(new, facets=NULL)
```

```
## Warning: Removed 53 rows containing missing values (`geom_line()`).
```

![](lecture4_files/figure-html/unnamed-chunk-4-1.png)

---
## 指数平滑法

```r
library(fpp2)
ses2 <- ses(new_dat$Adj.Close, alpha = 0.2)
autoplot(ses2)
```

![](lecture4_files/figure-html/unnamed-chunk-5-1.png)

---
## 评价预测方法

- 均方误差 Mean Squared Error(MSE)
$$ \frac{\sum_{t=1}^n (Y_t - F_t)^2}{n}$$
> or RMSE = `$\sqrt{\text{MSE}}$`

- 平均绝对误差 Mean Absolute Error(MAE)
$$ \frac{\sum_{t=1}^n |Y_t - F_t|}{n}$$

> or MAPE: `$$\frac{\sum_{t=1}^n |(Y_t - F_t)/Y_t|}{n}$$`

---
## 评价预测方法

使用RSME标准找到合适的 `$\alpha$`

```r
# identify optimal alpha parameter
alpha <- seq(.01, .99, by = .01)
RMSE <- NA
for(i in seq_along(alpha)) {
  fit <- ses(new_dat$Adj.Close, alpha = alpha[i], h = 100)
  RMSE[i] <- accuracy(fit$fitted, new_dat$Adj.Close)[2]
}

# convert to a data frame and idenitify min alpha value
alpha.fit <- tibble(alpha, RMSE)
alpha.min <- filter(alpha.fit, RMSE == min(RMSE))
```

---

使用RSME标准找到合适的 `$\alpha$`

```r
# plot RMSE vs. alpha
ggplot(alpha.fit, aes(alpha, RMSE)) +
  geom_line() +
  geom_point(data = alpha.min, aes(alpha, RMSE), size = 2, color = "blue")  
```

![](lecture4_files/figure-html/unnamed-chunk-7-1.png)

---
## 平稳序列的判断

- 白噪声检验

Box-Pierce or Ljung-Box检验：原假设是延迟阶数小于等于m期的序列值之间没有相关性

```r
Box.test(new_dat$Adj.Close, type="Box-Pierce")
```

```
## 
## 	Box-Pierce test
## 
## data:  new_dat$Adj.Close
## X-squared = 210.18, df = 1, p-value < 2.2e-16
```

```r
Box.test(new_dat$Adj.Close, type="Ljung-Box")
```

```
## 
## 	Box-Ljung test
## 
## data:  new_dat$Adj.Close
## X-squared = 212.79, df = 1, p-value < 2.2e-16
```
???
推荐LB检验，因为BP不适合小样本场合 （30以内）

---
## ARMA

Linear combination of white noise `$\epsilon\sim N(0, \sigma^2)$`

- MA(q) 
 `$$x_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$$`

- AR(p)
 `$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \ldots + \phi_p x_{t-p} + \epsilon_t$$`

- ARMA(p, q)
  `$$x_t = \phi_1 x_{t-1} + \ldots + \phi_p x_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$$`
---
## ARMA模型的确定

- autocorrelation function (ACF)

```r
acf(new_dat$Adj.Close, main="ACF of BYD")
```

![](lecture4_files/figure-html/unnamed-chunk-9-1.png)

???
ACF explains how the present value of a given time series is correlated with the past (1-unit past, 2-unit past, …, n-unit past) values. In the ACF plot, the y-axis expresses the correlation coefficient whereas the x-axis mentions the number of lags. Assume that, y(t-1), y(t), y(t-1),….y(t-n) are values of a time series at time t, t-1,…,t-n, then the lag-1 value is the correlation coefficient between y(t) and y(t-1), lag-2 is the correlation coefficient between y(t) and y(t-2) and so on.

---
## ARMA模型的确定

- partial autocorrelation function (PACF)

```r
pacf(new_dat$Adj.Close, main="PACF of BYD")
```

![](lecture4_files/figure-html/unnamed-chunk-10-1.png)

???
PACF就是控制了Lag=1…k-1的那部分影响

---
## ARMA模型的确定

|模型 |ACF|PACF|
|------|------|------|
|AR(P) | 拖尾 |P阶截尾|
|MA(q) |q阶截尾|拖尾|
|ARMA(p,q) |拖尾 |拖尾|

&nbsp;

所以对于比亚迪的数据我们应该选择AR(1)模型~

**注意：** ARMA模型只适用于平稳序列！

**注意：** 画图只是一种辅助手段，我们也可以使用AIC/BIC准则辅助判断
---
## 平稳序列的检测

- Dickey-Fuller Test: 原假设是该序列不平稳(有趋势性)

```r
library(tseries)
adf.test(new_dat$Adj.Close)
```

```
## 
## 	Augmented Dickey-Fuller Test
## 
## data:  new_dat$Adj.Close
## Dickey-Fuller = -2.9462, Lag order = 6, p-value = 0.1778
## alternative hypothesis: stationary
```

- KPSS TEST: 原假设是该序列trend stationary

```r
kpss.test(new_dat$Adj.Close, null="Trend")
```

```
## 
## 	KPSS Test for Trend Stationarity
## 
## data:  new_dat$Adj.Close
## KPSS Trend = 0.18388, Truncation lag parameter = 4, p-value = 0.02205
```

---
## ARIMA(p, d, q)

相比于ARMA，多考虑了d阶差分。

通过d阶差分后一般可以将非平稳序列转化为平稳序列。

```r
library(forecast)
byd.arima <- auto.arima(new_dat$Adj.Close)
byd.arima
```

```
## Series: new_dat$Adj.Close 
## ARIMA(4,1,1) 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ma1
##       1.0179  -0.1395  0.0868  -0.0368  -0.9887
## s.e.  0.0657   0.0922  0.0941   0.0666   0.0187
## 
## sigma^2 = 22.15:  log likelihood = -716.1
## AIC=1444.21   AICc=1444.57   BIC=1465.14
```

---
## ARIMA(p, d, q)

```r
checkresiduals(byd.arima)
```

![](lecture4_files/figure-html/unnamed-chunk-14-1.png)

```
## 
## 	Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,1)
## Q* = 3.2524, df = 5, p-value = 0.6611
## 
## Model df: 5.   Total lags used: 10
```

---
## ARIMA 预测

```r
byd.fc <- forecast(byd.arima)
autoplot(byd.fc)
```

![](lecture4_files/figure-html/unnamed-chunk-15-1.png)

---
## 时间序列数据的处理

```r
library(Ecdat)
data(AirPassengers)
ts_air = ts(AirPassengers, frequency = 12, start = 1949)
decompose_air = decompose(ts_air, "multiplicative") # plotable
adjust_air = ts_air / decompose_air$seasonal
# decompose_air = decompose(ts_air, "additive")
# adjust_air = ts_air - decompose_air$seasonal
plot(ts_air, col="grey")
lines(adjust_air, col = "blue", lwd = 2)
```

![](lecture4_files/figure-html/unnamed-chunk-16-1.png)

---
## 时间序列数据的预测
Holt-Winters 方法

```r
forecast = HoltWinters(ts_air)
plot(forecast)
```

![](lecture4_files/figure-html/unnamed-chunk-17-1.png)

---
## 时间序列数据的预测

也可以使用机器学习的方法进行预测~

![](Figs/forecast.png)

---
# 面板数据 (Panel data)

经济数据有截面数据（cross sectional data）、时序数据（time series data）和面板数据（panel data）三种类型。截面数据是A和B比，时序数据是以前的A和现在的A比，这两种数据都是一维的。而面板数据是二维数据，既有截面维度（n个个体），也有时间维度（T个时期）https://zhuanlan.zhihu.com/p/356250433

![](Figs/panel.png)

---
## 面板回归模型设定

- 截面数据的线性回归
 $$ y_i = \mathbf{x_i' \beta} + \epsilon_i$$
 
- 加入时间维度的**个体效应模型**

`$$y_{it} = \mathbf{x_{it}' \beta} + u_i + \epsilon_{it}$$`
其中 `$u_i$`为不随时间变化但是随个体变化的因素
 
常见的三种估计策略

> 混合效应模型(Pooled)： 忽略个体效应

> 固定效应模型(Fixed-effect)：假设异质性截距是非随机的, `$u_i$` 与某个解释变量相关

> 随机效应模型(Random-effect): 假设异质性截距是随机的, `$u_i$` 与所有解释变量都不相关

---
## 面板回归 in R
混合效应

.scroll-output[

```r
data <- read.csv("../Data/FS_Comins_small.csv")
pool_lm <- lm(OpeRev~RDCost, data=data)
summary(pool_lm)
```

```
## 
## Call:
## lm(formula = OpeRev ~ RDCost, data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -9.724e+11 -1.412e+09 -1.825e+08  3.365e+08  2.671e+12 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.596e+08  8.558e+07  -4.202 2.65e-05 ***
## RDCost       5.072e+01  1.301e-01 389.964  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.439e+10 on 167247 degrees of freedom
## Multiple R-squared:  0.4762,	Adjusted R-squared:  0.4762 
## F-statistic: 1.521e+05 on 1 and 167247 DF,  p-value: < 2.2e-16
```
]

---
## 面板回归 in R
固定效应

.scroll-output[

```r
library(plm)
data |>
  select(-X) |>
  group_by(Name, Date) |>
  filter(row_number() == 1) -> new_data
fe_lm <- plm(OpeRev~RDCost, data=new_data,
             index=c("Name", "Date"), model="within")
summary(fe_lm)
```

```
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = OpeRev ~ RDCost, data = new_data, model = "within", 
##     index = c("Name", "Date"))
## 
## Unbalanced Panel: n = 5954, T = 1-26, N = 99887
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -7.26e+11 -3.76e+08  3.56e+07  0.00e+00  4.60e+08  1.94e+12 
## 
## Coefficients:
##        Estimate Std. Error t-value  Pr(>|t|)    
## RDCost 38.84013    0.13938  278.67 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    6.5036e+25
## Residual Sum of Squares: 3.5602e+25
## R-Squared:      0.45257
## Adj. R-Squared: 0.41788
## F-statistic: 77656.8 on 1 and 93932 DF, p-value: < 2.22e-16
```
]

---
## 面板回归 in R
随机效应
.scroll-output[

```r
re_lm <- plm(OpeRev~RDCost, data=new_data,
             index=c("Name", "Date"), model="random",
             random.method = "ht")
summary(re_lm)
```

```
## Oneway (individual) effect Random Effect Model 
##    (Hausman-Taylor's transformation)
## 
## Call:
## plm(formula = OpeRev ~ RDCost, data = new_data, model = "random", 
##     random.method = "ht", index = c("Name", "Date"))
## 
## Unbalanced Panel: n = 5954, T = 1-26, N = 99887
## 
## Effects:
##                     var   std.dev share
## idiosyncratic 3.790e+20 1.947e+10 0.506
## individual    3.697e+20 1.923e+10 0.494
## theta:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8052  0.8052  0.8052  0.8052  0.8052  0.8052 
## 
## Residuals:
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -6.10e+11 -6.63e+08 -2.38e+08  0.00e+00  1.38e+08  2.10e+12 
## 
## Coefficients:
##               Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept) 1.2171e+09 3.1666e+08   3.8437 0.0001212 ***
## RDCost      3.9396e+01 1.3664e-01 288.3270 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    6.9356e+25
## Residual Sum of Squares: 3.7852e+25
## R-Squared:      0.45423
## Adj. R-Squared: 0.45423
## Chisq: 83132.5 on 1 DF, p-value: < 2.22e-16
```
]

---
## RE or FE

**When to use which?**

固定效应允许 `$u_i$` 和 `$x$` 任意相关，而随机效应不然， 通常认为在其他条件不变时 FE 是更令人信服的工具。

如果使用随机效应，在解释变量中包含的非时变控制变量应尽量多（FE则不必如此）。

常见的情况是，研究者应同时使用随机效应和固定效应，然后规范地检验解释变量系数的统计显著性差别。

.pull-right[
--- 伍德里奇《计量经济学导论》
]

---
## PLM

![](Figs/panel_relation.jpg)