3D surface plots of the strikezone - Carson's blog on R, RStudio, plotly, shiny, data visualization, statistics, etc

Over the past month, I’ve been working on plotly’s R package, and in particular on a new interface for creating plotly visualizations from R. I’m really excited about this project, and I think it’s one of the most elegant, straightforward ways to create interactive graphics that are easy to share. In this post, I’ll show you just how easy it is to create 3D surface plots of the strikezone using plotly.

Kernel Densities

The MASS package in R has a function called kde2d() which performs 2D density estimation (with a bivariate normal kernel):

data(pitches, package = "pitchRx")
dens <- with(pitches, MASS::kde2d(px, pz))

# plotly isn't available on CRAN, but you can install it from GitHub
# devtools::install_github("ropensci/plotly@carson-dsl")
library(plotly)
with(dens, plot_ly(x = x, y = y, z = z, type = "surface"))

Although this plot is cool, we can’t perform any interesting statistical inference with it. All we can see is the estimated density of pitch locations.
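To get a feel for what kde2d() hands back before plotting it, here is a minimal sketch on simulated locations. The data, the variable names (sim_px, sim_pz, sim_dens) and the grid limits are all made up for illustration and are not the pitchRx data. It shows that the result is a list with grid vectors x and y plus a matrix z, and that the estimated density sums to roughly one over a grid that comfortably covers the data.

```r
library(MASS)

set.seed(42)
# Simulated pitch locations (NOT the pitchRx data): horizontal and vertical
# coordinates in feet, loosely centred on the strikezone.
sim_px <- rnorm(500, mean = 0, sd = 1)
sim_pz <- rnorm(500, mean = 2.5, sd = 0.75)

# Evaluate the density on a 50 x 50 grid whose limits extend well past the data.
sim_dens <- kde2d(sim_px, sim_pz, n = 50, lims = c(-5, 5, -2, 7))

# A list with components x and y (grid points) and z (density matrix).
str(sim_dens)

# Riemann-sum approximation of the integral of the estimated density.
cell_area <- diff(sim_dens$x[1:2]) * diff(sim_dens$y[1:2])
total_mass <- sum(sim_dens$z) * cell_area
total_mass  # should be close to 1
```

Because z is a density rather than a count, its height depends on the units of the axes, which is one reason the surface alone says little about event probabilities.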

Probabilistic Surfaces

Brian Mills and I have a number of posts/papers on using generalized additive models (GAMs) to model event probabilities over the strikezone. To keep things simple, we’ll stick with the example data, and model the probability of a called strike by allowing it to vary by location and batter stance.

# condition on umpire decisions
noswing <- subset(pitches, des %in% c("Ball", "Called Strike"))
noswing$strike <- as.numeric(noswing$des %in% "Called Strike")
library(mgcv)
m <- bam(strike ~ s(px, pz, by = factor(stand)) + factor(stand), 
         data = noswing, family = binomial(link = 'logit'))

## Warning in attr(pterms[tind[j]], "term.label"): partial match of
## 'term.label' to 'term.labels'

Now we use the predict() method to compute fitted response values (for right-handers) over a strikezone grid.

px <- round(seq(-2, 2, length.out = 20), 2)
pz <- round(seq(1, 4, length.out = 20), 2)
dat <- expand.grid(px = px, pz = pz, stand = "R")
dat$fit <- as.numeric(predict(m, dat, type = "response"))
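If you don't have pitchRx handy, the fit-then-predict steps above can be tried end to end on simulated data. This is only a sketch: the data frame sim, the made-up "true" probability surface, and the 9 x 9 grid are all assumptions for illustration, not anything estimated from real pitches. The model formula mirrors the one above, with a separate location smooth for each stance plus a stance main effect.

```r
library(mgcv)

set.seed(1)
n <- 2000
# Simulated taken pitches (hypothetical, not pitchRx data).
sim <- data.frame(
  px = runif(n, -2, 2),
  pz = runif(n, 1, 4),
  stand = factor(sample(c("L", "R"), n, replace = TRUE))
)

# Made-up truth: strike probability peaks near the middle of the zone,
# with the peak shifted slightly for left-handed batters.
centre <- ifelse(sim$stand == "R", 0, -0.3)
eta <- 2 - 2 * (sim$px - centre)^2 - 2 * (sim$pz - 2.5)^2
sim$strike <- rbinom(n, 1, plogis(eta))

# One smooth over location per stance, plus a stance main effect.
fit <- bam(strike ~ s(px, pz, by = stand) + stand,
           data = sim, family = binomial(link = "logit"))

# Predict called-strike probabilities for right-handers over a coarse grid.
grid <- expand.grid(
  px = seq(-2, 2, length.out = 9),
  pz = seq(1, 4, length.out = 9),
  stand = factor("R", levels = levels(sim$stand))
)
grid$fit <- as.numeric(predict(fit, grid, type = "response"))

p_centre <- grid$fit[grid$px == 0 & grid$pz == 2.5]
p_corner <- grid$fit[grid$px == -2 & grid$pz == 1]
c(centre = p_centre, corner = p_corner)
```

Using type = "response" returns values on the probability scale; the default would return them on the logit scale instead.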

plotly’s z argument likes numeric matrices, so we need to change the data structure accordingly.

z <- Reduce(rbind, split(dat$fit, dat$px))
plot_ly(x = px, y = pz, z = z, type = "surface")
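It can help to check how that reshaping lays out the grid. The sketch below uses stand-in values rather than model predictions (the fit column here is a made-up smooth function) and compares the Reduce(rbind, split(...)) approach with a plain matrix() call. Because expand.grid() varies its first argument fastest, both give a matrix whose rows follow px and whose columns follow pz.

```r
px <- round(seq(-2, 2, length.out = 20), 2)
pz <- round(seq(1, 4, length.out = 20), 2)
grid <- expand.grid(px = px, pz = pz)

# Stand-in for model predictions (hypothetical values, not a fitted model).
grid$fit <- with(grid, exp(-px^2 - (pz - 2.5)^2))

# The reshaping used above: one row per px value.
z_split <- Reduce(rbind, split(grid$fit, grid$px))

# Same layout via matrix(): column-major filling matches expand.grid()'s order.
z_matrix <- matrix(grid$fit, nrow = length(px), ncol = length(pz))

# The two agree once the row names added by rbind() are dropped.
same_layout <- isTRUE(all.equal(unname(z_split), z_matrix))

# Rows index px and columns index pz: spot-check one cell.
cell_ok <- z_matrix[3, 7] == grid$fit[grid$px == px[3] & grid$pz == pz[7]]
c(same_layout = same_layout, cell_ok = cell_ok)
```

If a rendered surface looks flipped across the diagonal relative to what you expect, t(z) swaps which axis the rows follow. Which orientation a given plotly version expects for a surface is worth checking against its documentation rather than assuming.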

It’s more interesting to look at the difference in fitted values between left- and right-handed batters:

dat2 <- expand.grid(px = px, pz = pz, stand = "L")
dat2$fit <- as.numeric(predict(m, dat2, type = "response"))
z <- Reduce(rbind, split(dat2$fit - dat$fit, dat2$px))
plot_ly(x = px, y = pz, z = z, type = "surface")
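A difference surface is easier to talk about with a number or two attached. As a rough sketch, here is one way to find the grid location where the two stances differ most. The z_diff matrix below is a hypothetical stand-in built from a made-up function, with rows following px and columns following pz as in the reshaping above; with the real model you would use the matrix of differences instead.

```r
px <- round(seq(-2, 2, length.out = 20), 2)
pz <- round(seq(1, 4, length.out = 20), 2)

# Hypothetical difference surface (left minus right), not model output.
z_diff <- outer(px, pz, function(x, y) 0.1 * x * exp(-x^2 - (y - 2.5)^2))

# Row/column position of the largest absolute difference.
idx <- arrayInd(which.max(abs(z_diff)), dim(z_diff))

# Translate the position back into strikezone coordinates.
largest <- c(px = px[idx[1]], pz = pz[idx[2]], diff = z_diff[idx])
largest
```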