# Penalised spline regression

24 May 2016 | web: Sam Clifford | github: samcliffordtags: r, bayes, JAGS, regression |

Sometimes you don’t know the functional form of a regression relationship. In such an instance, the use of a penalised spline regression can help you model it without having a ridiculously wiggly smooth function.

# Motivation

Consider some simulated data for which it is not obvious what the functional form of the relationship is.

```
N <- 100
x <- sort(runif(n=N, min = 0, max = 1))
y <- sin(2*pi*x) - 5*x^2 + 3*x + rnorm(n=N, sd=0.25)
dat <- data.frame(y=y, x=x)
library(ggplot2)
gg.data <- ggplot(data=dat, aes(x=x, y=y)) +
geom_point( alpha=0.5) + theme_bw()
```

# Splines

In order to model the effect of *x* on *y* we may wish to fit a regression model. If we aren’t explicitly interested writing down a parametric equation, we can use a spline to flexibly model this relationship (Eilers and Marx 2010).

The choice of what kind of spline to use and how many basis splines can be important. Too many basis splines and we end up with a fitted smooth that is very wiggly; too few and we may not be able to capture the variability.

## B-splines

- The B-spline is a common choice for producing smooth functions (de Boor 1977)
- The P-spline (Eilers and Marx 1996) penalises changes in the second derivative of the B-spline
- Ruppert, Wand, and Carroll (2003) discuss choice of knots
- Lang and Brezger (2004) formulated this in a Bayesian framework
- Rue and Held (2005) discuss this as an intrinsic CAR prior

## The basis matrix

The basis matrix is calculated according to the formulation given by Eilers and Marx (1996). Here we choose a number of basis splines given by the advice in section 3 of Wand (2000).

```
d <- max(4, floor(N/35))
K <- floor(N/d - 1)
bspline <- function(x, K, bdeg=3, cyclic=FALSE, xl=min(x), xr=max(x)){
x <- as.matrix(x,ncol=1)
ndx <- K - bdeg
# as outlined in Eilers and Marx (1996)
dx <- (xr - xl) / ndx
t <- xl + dx * (-bdeg:(ndx+bdeg))
T <- (0 * x + 1) %*% t
X <- x %*% (0 * t + 1)
P <- (X - T) / dx
B <- (T <= X) & (X < (T + dx))
r <- c(2:length(t), 1)
for (k in 1:bdeg){
B <- (P * B + (k + 1 - P) * B[ ,r]) / k;
}
B <- B[,1:(ndx+bdeg)]
if (cyclic == 1){
for (i in 1:bdeg){
B[ ,i] <- B[ ,i] + B[ ,K-bdeg+i]
}
B <- B[ , 1:(K-bdeg)]
}
return(B)
}
X <- bspline(x, K, xl=0, xr=1)
```

We also want to predict, so need to re-use the same number of knots and endpoints from the original spline to set up a prediction matrix.

```
x.pred <- seq(0, 1, length.out=1001)
X.pred <- bspline(x.pred, K, xl=0, xr=1)
```

What does our basis look like?

## Formulating penalty matrix

For a B-spline with *K* basis splines, we need to differentiate a *K* × *K* identity matrix a number of times equal to the order of the penalty

We add a small value to the diagonal to ensure the matrix is symmetric, positive, definite

- we know the matrix is symmetric
- we
**need**eigenvalues real and greater than 0

```
makeQ = function(degree, K, epsilon=1e-3){
x <- diag(K)
E <- diff(x, differences=degree)
return( t(E) %*% E + x*epsilon)
}
Q <- makeQ(2, K)
```

Without the small amount on the diagonal

```
round(eigen(makeQ(2, K))$values, 5)
## [1] 15.85603 15.42770 14.73537 13.80996 12.69206 11.42942 10.07411
## [8] 8.67940 7.29673 5.97289 4.74756 3.65150 2.70540 1.91948
## [15] 1.29383 0.81951 0.48020 0.25428 0.11730 0.04434 0.01241
## [22] 0.00251 0.00100 0.00100
round(eigen(makeQ(2, K, epsilon=0))$values, 5)
## [1] 15.85503 15.42670 14.73437 13.80896 12.69106 11.42842 10.07311
## [8] 8.67840 7.29573 5.97189 4.74656 3.65050 2.70440 1.91848
## [15] 1.29283 0.81851 0.47920 0.25328 0.11630 0.04334 0.01141
## [22] 0.00151 0.00000 0.00000
```

we see eigenvalues of 0 (or close to) which leads to, at best, and ill-conditioned system or, at worst, a degenerate precision matrix that can’t be factorised

# Fitting the model

For our simple univariate spline, we fit the model

The following model code was used in JAGS to fit the above model:

```
model{
for (i in 1:n){
y[i] ~ dnorm(mu[i], tau.y)
mu[i] = X[i, ] %*% beta[1:K]
}
beta[1:K] ~ dmnorm(beta.0[1:K,1] + beta.00, lambda*Q[1:K,1:K])
gamma <- beta - beta.00
tau.y ~ dgamma(0.001, 0.001)
lambda ~ dgamma(0.001, 0.001)
beta.00 ~ dnorm(0, 1e-6)
for (j in 1:m){
y.rep[j] ~ dnorm(mu.rep[j], tau.y)
mu.rep[j] = X.pred[j, ] %*% gamma[1:K] + beta.00
}
}
```

## Results

If we didn’t penalise our spline, we would have the model

where *I*_{K} is the *K* × *K* identity matrix, and the resulting smooth would look like

This fitted smooth is much more wiggly than the penalised smooth

# Conclusion

B-splines are great for non-linear relationships without strict *a priori* assumptions about the functional form.

Penalisation smooths your smooths even further

- not as susceptible to local changes
- posterior distribution of
*λ*informed by data, not fixed and cross-validated as in Frequentist penalised likelihood smoothing

# References

de Boor, Carl. 1977. “Package for Calculating with B-Splines.” *SIAM Journal on Numerical Analysis* 14: 441–72.

Eilers, Paul H. C., and Brian D. Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” *Statistical Science* 11: 89–121.

———. 2010. “Splines, Knots and Penalties.” *Wiley Interdisciplinary Reviews: Computational Statistics* 2: 637–53. doi:10.1002/wics.125.

Lang, Stefan, and Andreas Brezger. 2004. “Bayesian P-Splines.” *Journal of Computational and Graphical Statistics* 13 (1): 183–212.

Rue, Havard, and Leonhard Held. 2005. *Gaussian Markov Random Fields: Theory and Applications*. CRC Press.

Ruppert, David, Matt P. Wand, and Raymond J. Carroll. 2003. *Semiparametric Regression*. Cambridge University Press.

Wand, Matt P. 2000. “A Comparison of Regression Spline Smoothing Procedures.” *Computational Statistics* 15: 443–62.