Upgrading to plotly 4.0 (and above)

I’m excited to announce that plotly’s R package just sent its first CRAN update in nearly four months. To install the update, run install.packages("plotly").

This update has breaking changes, enables new features, fixes numerous bugs, and takes us from version 3.6.0 to 4.5.2. To see all the changes, I encourage you to read the NEWS file. In this post, I’ll highlight the most important changes, explain why they needed to happen, and provide some tips for fixing errors brought about by this update. As you’ll see, this update is mostly about improving the plot_ly() interface, so ggplotly() users won’t notice much (if any) change. I’ve also started a plotly for R book, which provides more narrative than the documentation on https://plot.ly/r (which is now updated to 4.0), more recent examples, and features exclusive to the R package. The first three chapters are nearly finished and replace the package vignettes. The later chapters are still in their beginning stages; they discuss features that are still under development, but I plan on adding stability and more documentation in the coming months.

Formula mappings

In the past, you could use an expression to reference variable(s) in a data frame, but this no longer works. Consequently, you might see an error like this when you update:

library(plotly)
plot_ly(mtcars, x = mpg, y = sqrt(wt))
## Error in plot_ly(mtcars, x = mpg, y = sqrt(wt)): object 'wt' not found

plot_ly() now requires a formula (which is basically an expression, but with a ~ prefixed) when referencing variables in a data frame. You do not have to use a formula to reference objects that exist in the calling environment, but I recommend it, since it helps populate sensible axis/guide title defaults (e.g., compare the output of plot_ly(z = volcano) with plot_ly(z = ~volcano)).

plot_ly(mtcars, x = ~mpg, y = ~sqrt(wt))

There are a number of technical reasons why imposing this change from expressions to formulas is a good idea. If you’re interested in the details, I recommend reading Hadley Wickham’s notes on non-standard evaluation, but here’s the gist of the situation:

  1. Since formulas capture the environment in which they are created, we can be confident that evaluation rules are always correct, no matter the context.
  2. Compared to expressions/symbols, formulas are easier to program with, which makes it easier to write custom functions around plot_ly():

myPlot <- function(x, y, ...) {
  plot_ly(mtcars, x = x, y = y, color = ~factor(cyl), ...)
}
myPlot(~mpg, ~disp, colors = "Dark2")
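
The first point, that a formula carries its environment with it, can be seen in base R alone. This is only a minimal sketch: make_mapping() and multiplier are hypothetical names invented for illustration, and the manual eval() call is not necessarily how plot_ly() evaluates formulas internally.

```r
# make_mapping() is a hypothetical helper used only for illustration
make_mapping <- function() {
  multiplier <- 1000     # local variable, not visible from the global environment
  ~ wt * multiplier      # the formula remembers where `multiplier` lives
}

f <- make_mapping()

# f[[2]] is the expression on the right-hand side of the one-sided formula;
# look up `wt` in mtcars and everything else in the formula's own environment
lbs <- eval(f[[2]], mtcars, environment(f))
head(lbs)
```

Because the environment travels with the formula, the expression can be evaluated correctly long after make_mapping() has returned.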

Also, it’s fairly easy to convert a string to a formula (e.g., as.formula("~sqrt(wt)")). This trick can be quite useful when programming in shiny (and a variable mapping depends on an input value).
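
As a rough sketch of that trick, the snippet below builds the formula from a string. Here yvar is a hypothetical stand-in for something like a shiny input value; no actual shiny app is involved.

```r
library(plotly)

# `yvar` is a hypothetical stand-in for a value such as input$yvar in shiny
yvar <- "sqrt(wt)"

# build a one-sided formula from the string
y_formula <- as.formula(paste0("~", yvar))

plot_ly(mtcars, x = ~mpg, y = y_formula)
```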

Smarter defaults

Instead of always defaulting to a “scatter” trace, plot_ly() now infers a sensible trace type (and other attribute defaults) based on the information provided. These defaults are determined by inspecting the vector type (e.g., numeric/character/factor/etc) of positional attributes (e.g., x/y). For example, if we supply a discrete variable to x (or y), we get a vertical (or horizontal) bar chart:

subplot(
  plot_ly(diamonds, y = ~cut, color = ~clarity),
  plot_ly(diamonds, x = ~cut, color = ~clarity),
  margin = 0.07
) %>% hide_legend() 

Or, if we supply discrete variables to both x and y, we get a 2D histogram:

plot_ly(diamonds, x = ~cut, y = ~clarity)
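
If you are curious which trace type was inferred for a given plot, one way to check (a sketch, assuming the first trace is the one of interest) is to build the plot and look at its spec. inferred_type() is a hypothetical helper written for this post's examples, not a plotly function.

```r
library(plotly)

# inferred_type() is a hypothetical helper, not part of plotly
inferred_type <- function(p) {
  # plotly_build() evaluates the plot; the spec lives in the "x" element
  plotly_build(p)$x$data[[1]]$type
}

inferred_type(plot_ly(mtcars, x = ~mpg, y = ~wt))          # two numeric variables
inferred_type(plot_ly(diamonds, x = ~cut))                 # one discrete variable
inferred_type(plot_ly(diamonds, x = ~cut, y = ~clarity))   # two discrete variables
```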

Also, the order of categories on a discrete axis, by default, is now either alphabetical (for character strings) or matches the ordering of factor levels. This makes it easier to sort categories according to something meaningful, rather than the order in which the categories appear (the old default). If you prefer the old default, set the axis’s categoryorder attribute to "trace" (e.g., layout(xaxis = list(categoryorder = "trace"))).

library(dplyr)
# order the clarity levels by their median price
d <- diamonds %>%
  group_by(clarity) %>%
  summarise(m = median(price)) %>%
  arrange(m)
diamonds$clarity <- factor(diamonds$clarity, levels = d[["clarity"]])
plot_ly(diamonds, x = ~price, y = ~clarity, type = "box")

plot_ly() now initializes a plot

Previously plot_ly() always produced at least one trace, even when using add_trace() to add on more traces (if you’re familiar with ggplot2 lingo, a trace is similar to a layer). From now on, you’ll have to specify the type in plot_ly() if you want it to always produce a trace:

subplot(
  plot_ly(economics, x = ~date, y = ~psavert, type = "scatter") %>% 
    add_trace(y = ~uempmed) %>%
    layout(yaxis = list(title = "Two Traces")),
  plot_ly(economics, x = ~date, y = ~psavert) %>% 
    add_trace(y = ~uempmed) %>% 
    layout(yaxis = list(title = "One Trace")),
  titleY = TRUE, shareX = TRUE, nrows = 2
) %>% hide_legend()

Why enforce this change? Often, when composing a plot with multiple traces, you have attributes that are shared across traces (i.e., global) and attributes that are not. Allowing plot_ly() to simply initialize the plot and define global attributes makes for a much more natural way to describe such a plot. Consider the next example, where we declare x/y (longitude/latitude) attributes and alpha transparency globally, but alter trace-specific attributes in add_trace()-like functions. This example also takes advantage of a few other new features:

  1. The group_by() function, which defines “groups” within a trace (described in more detail in the next section).
  2. New add_*() functions, which behave like add_trace() but are higher-level: they assume a trace type, might set some attribute values (e.g., add_markers() sets the scatter trace mode to "markers"), and might trigger other data processing (e.g., add_lines() is essentially the same as add_paths(), but guarantees values are sorted along the x-axis).
  3. Scaling is avoided for “AsIs” values (i.e., values wrapped with I()), which makes it easier to directly specify a constant value for a visual attribute (as opposed to mapping data values to visuals).
  4. More support for R’s graphical parameters, such as pch for symbols and lty for linetypes.

map_data("world", "canada") %>%
  group_by(group) %>%
  plot_ly(x = ~long, y = ~lat, alpha = 0.1) %>%
  add_polygons(color = I("black"), hoverinfo = "none") %>%
  add_markers(color = I("red"), symbol = I(17),
              text = ~paste(name, "<br />", pop), hoverinfo = "text",
              data = maps::canada.cities) %>%
  hide_legend()

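To make the difference between add_paths() and add_lines() from the list above more concrete, here is a small sketch using toy data that is not part of the original examples. The data frame d is made up, and its rows are deliberately not sorted by x.

```r
library(plotly)

# toy data (an assumption for illustration) whose rows are not sorted by x
d <- data.frame(x = c(3, 1, 2, 4), y = c(1, 2, 3, 4))

p_compare <- subplot(
  # add_paths() connects the points in row order
  plot_ly(d, x = ~x, y = ~y) %>% add_paths(color = I("black")),
  # add_lines() connects the points in order of x
  plot_ly(d, x = ~x, y = ~y) %>% add_lines(color = I("red"))
) %>% hide_legend()

p_compare
```

Wrapping the colors in I() keeps them from being treated as data values to be mapped through a color scale.
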
New interpretation of group

The group argument in plot_ly() has been removed in favor of the group_by() function. In the past, the group argument incorrectly created multiple traces. If you want that same behavior, use the new split argument, but groups are now used to define “gaps” within a trace. This is more consistent with how ggplot2’s group aesthetic is translated in ggplotly(), and is much more efficient than plotting a trace for each group.

txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.3)

The default hovermode (compare data on hover) isn’t super useful here since we have only one trace to compare, so you may want to add layout(hovermode = "closest") when using group_by(). If your group sizes aren’t that large, you may want to use split to generate one trace per group, then set a constant color (using the I() function to avoid scaling).

txhousing %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(split = ~city, color = I("steelblue"), alpha = 0.3)
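
For completeness, the hovermode suggestion for the group_by() version might look like the following sketch (p_grouped is just a name chosen here):

```r
library(plotly)

p_grouped <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.3) %>%
  # show the tooltip for the nearest point rather than comparing traces
  layout(hovermode = "closest")

p_grouped
```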

In the coming months, we will have better ways to identify/highlight groups to help combat overplotting (see here for example). This same interface can be used to coordinate multiple linked plots, which is a powerful tool for exploring multivariate data and presenting multivariate results (see here and here for examples).

New plotly object representation

Prior to version 4.0, plotly functions returned a data frame with special attributes attached (needed to track the plot’s attributes). At the time, I thought this was the right way to enable a “data-plot-pipeline” where a plot is described as a sequence of visual mappings and data manipulations. For a number of reasons, I’ve been convinced otherwise, and decided the central plotly object should inherit from an htmlwidget object instead. This change does not destroy our ability to implement a “data-plot-pipeline”, but it does, in a sense, constrain the set of manipulations we can perform on a plotly object. Below is a simple example of transforming the data underlying a plotly object using dplyr’s mutate() and slice() verbs (the plotly book has a whole section on the data-plot-pipeline, if you’d like to learn more).

library(dplyr)
economics %>%
  plot_ly(x = ~date, y = ~unemploy / pop, showlegend = FALSE) %>%
  add_lines(linetype = I(22)) %>%
  mutate(rate = unemploy / pop) %>% 
  slice(which.max(rate)) %>%
  add_markers(symbol = I(10), size = I(50)) %>%
  add_annotations("peak")

In this context, I’ve often found it helpful to inspect the (most recent) data associated with a particular plot, which you can do via plotly_data():

diamonds %>%
  group_by(cut) %>%
  plot_ly(x = ~price) %>%
  plotly_data()
## Source: local data frame [53,940 x 10]
## Groups: cut [5]
## 
##    carat       cut color clarity depth table price     x     y     z
## 1   0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2   0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3   0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4   0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5   0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6   0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
## 7   0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
## 8   0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
## 9   0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39
## # ... with 53,930 more rows

To keep up to date with currently supported data manipulation verbs, please consult the help(reexports) page, and for more examples, check out the examples section under help(plotly_data).

This change in the representation of a plotly object also has important implications for folks using plotly_build() to “manually” access or modify a plot’s underlying spec. Previously, this function returned the JSON spec as an R list, but it now returns more “meta” information about the htmlwidget, so in order to access that same list, you have to grab the “x” element. The new as_widget() function (different from the now deprecated as.widget() function) is designed to turn a plotly spec into an htmlwidget object.

# the style() function provides a more elegant way to do this sort of thing,
# but I know some people like to work with the list object directly...
pl <- plotly_build(qplot(1:10))[["x"]]
pl$data[[1]]$hoverinfo <- "none"
as_widget(pl)
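
For comparison, the style() route mentioned in the comment above might look roughly like this. It is a sketch that assumes the histogram is the first (and only) trace; p and p_quiet are names chosen here.

```r
library(plotly)

# convert the ggplot2 object to a plotly object first
p <- ggplotly(qplot(1:10))

# turn off hover info for the first trace without touching the list directly
p_quiet <- style(p, hoverinfo = "none", traces = 1)
p_quiet
```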

Conclusion

The latest CRAN release upgrades plotly’s R package from version 3.6.0 to 4.5.2. This upgrade includes a number of breaking changes, as well as a ton of new features and bug fixes. The time spent upgrading your code will be worth it, since the update also provides a better foundation for advancing the plot_ly() interface (not to mention the linked highlighting stuff we have on tap). This post should provide the information necessary to fix these breaking changes, but if you have any trouble upgrading, please let us know on http://community.plot.ly. Happy plotting!