.data: an unfortunate name for a column

Categories: R, tidyverse

Author: Richard J. Telford

Published: February 21, 2024

A colleague came to ask for help today with some code that worked perfectly when run with base R, but failed with an impenetrable error when run through tidyverse.

Here is a minimal example of the setup: a tibble with a column called .data.

library(tidyverse)

df <- tibble(a = LETTERS[1:5], .data = rnorm(5))
df
# A tibble: 5 × 2
  a      .data
  <chr>  <dbl>
1 A     -0.700
2 B      0.892
3 C      1.50 
4 D      0.731
5 E     -0.440

.data is a legal name for a column according to R’s rules for naming objects, and everything appears to be fine.
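One quick way to check this claim is base R's make.names(), which converts a string into a syntactically valid name and returns names that are already valid unchanged. This is only a small sketch: make.names() checks R's syntax rules and says nothing about clashes with tidyverse pronouns.

```r
# make.names() returns its input unchanged when it is already a valid R name
candidates <- c(".data", ".env", "data", "2data")
valid <- make.names(candidates) == candidates
names(valid) <- candidates

# .data, .env and data are valid; 2data is not (it starts with a digit)
print(valid)
```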

What could possibly go wrong? Let’s try filtering the column .data to keep just the positive values, first with base R.

df[df$.data > 0, ]
# A tibble: 3 × 2
  a     .data
  <chr> <dbl>
1 B     0.892
2 C     1.50 
3 D     0.731

Perfect. And now with dplyr::filter():

df |> filter(.data > 0)
Error in `filter()`:
ℹ In argument: `.data > 0`.
Caused by error:
! 'list' object cannot be coerced to type 'double'

List object cannot be coerced to type double? But the column .data is already a double.

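It took me a while to work out what was going on, but eventually I remembered that .data is a pronoun in the tidyverse (see ?rlang::.data), used mainly when writing functions with tidyverse verbs. Inside filter(), the name .data refers to this pronoun rather than to the column, so the comparison .data > 0 is attempted on the pronoun object instead of on the numbers in the column, hence the baffling error. Changing the column name to data fixed the problem.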
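A minimal sketch of that fix, assuming the column can simply be renamed. Renaming with base R's names() avoids writing .data inside a dplyr verb at all. The set.seed() call is an addition for reproducibility, so the random values will not match the output shown earlier.

```r
library(dplyr)
library(tibble)

set.seed(1) # added for reproducibility; values differ from those above
df <- tibble(a = LETTERS[1:5], .data = rnorm(5))

# Rename with base R so `.data` is never seen by tidy evaluation
names(df)[names(df) == ".data"] <- "data"

# With an ordinary name, filter() finds the column as expected
positive <- df |> filter(data > 0)
positive
```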

So what does .data do? Consider this code

b <- "fish"
df <- tibble(a = letters[1:3], b = 1:3)

df <- df |> mutate(c = b)

What will column c contain? The values 1 to 3 from the column b, or the word “fish”? Let’s have a peek.

df
# A tibble: 3 × 3
  a         b     c
  <chr> <int> <int>
1 a         1     1
2 b         2     2
3 c         3     3

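It took the values from the column: inside dplyr verbs, columns in the data take precedence over variables with the same name in the environment. If we wanted to be explicit, we could write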

# take b from the column
df |> mutate(c = .data$b)
# A tibble: 3 × 3
  a         b     c
  <chr> <int> <int>
1 a         1     1
2 b         2     2
3 c         3     3
# take b from the environment with the embrace operator {{ }}
df |> mutate(c = {{b}})
# A tibble: 3 × 3
  a         b c    
  <chr> <int> <chr>
1 a         1 fish 
2 b         2 fish 
3 c         3 fish 
# take b from the environment with the .env pronoun 
df |> mutate(c = .env$b)
# A tibble: 3 × 3
  a         b c    
  <chr> <int> <chr>
1 a         1 fish 
2 b         2 fish 
3 c         3 fish 
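The {{ }} operator is designed for forwarding function arguments. For injecting the value of an ordinary variable, rlang's injection operator !! is the more conventional choice. A sketch, recreating the objects from the example above so it runs on its own:

```r
library(dplyr)
library(tibble)

b <- "fish"
df <- tibble(a = letters[1:3], b = 1:3)

# !! injects the value of the environment variable `b` before dplyr
# evaluates the expression, so the column called b is never consulted
out <- df |> mutate(c = !!b)
out
```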

It is also useful to use the .data pronoun when writing packages; otherwise R CMD check gives notes about “no visible binding for global variable” for bare column names.
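A sketch of how this looks in package code, using a hypothetical function keep_positive(). A bare column name such as value inside filter() looks like an undefined global variable to R CMD check, whereas .data[[col]] (column name held in a string) or .data$value (fixed column name) refers explicitly to the data. In a real package the pronoun would also need importing, for example with the roxygen tag shown in the comment.

```r
library(dplyr)
library(tibble)

# Hypothetical package function; in a package, add the roxygen tag
# #' @importFrom rlang .data
keep_positive <- function(df, col) {
  # .data[[col]] looks up the column whose name is stored in `col`
  df |> filter(.data[[col]] > 0)
}

# Fixed column name: .data$value instead of a bare `value`
keep_positive_value <- function(df) {
  df |> filter(.data$value > 0)
}

df <- tibble(a = letters[1:3], value = c(-1, 2, 3))
keep_positive(df, "value")
keep_positive_value(df)
```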

In short, while .data and .env are legal names, columns with these names clash with the tidyverse pronouns and break tidyverse code, so don’t call data frame columns .data or .env if you ever want to use tidyverse functions.
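If data arrive from elsewhere and you cannot control the column names, one option is to check for the clash before any tidyverse code runs. The function below is a hypothetical helper, not part of any package; it tests the names with base R only.

```r
library(tibble)

# Hypothetical helper: stop early if a data frame has a column whose name
# clashes with the tidy evaluation pronouns
check_pronoun_names <- function(df, reserved = c(".data", ".env")) {
  clashes <- intersect(names(df), reserved)
  if (length(clashes) > 0) {
    stop("Column name(s) clash with tidyverse pronouns: ",
         paste(clashes, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

check_pronoun_names(tibble(a = 1:3, data = 1:3))         # passes silently
try(check_pronoun_names(tibble(a = 1:3, .data = 1:3)))   # reports the clash
```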