Variables
Overview
A variable is a named reference to a value. Variables have types that define what type of data they hold.
In R, there 5 basic variable types: character, integer, double, logical, and complex. Additionally, you might run into the structures raw, list, closure, special, builtin, environment, and S4.
Basic Variables
Character
In R, a string is technically just a vector of characters. Characters start and end with double quotes ("
), or single quotes ('
). Strings aren’t as heavily utilized in R, as its functionality focuses more on statistical analysis.
typeof('my_character_vector')
typeof("my_character_vector")
'character' 'character'
Integer
An integer is a numeric value without a fractional part (no decimal). In many dynamic languages, when you create a variable to hold a number without a decimal point, it will be an integer. For example, when using Python, the line variable = 5
will result in variable
having type integer, while the line variable = 5.0
will result in variable
having type float (Python’s version of a decimal number).
In R, however, this is not the case. In R, 5 and 5.0 will both be stored as doubles unless explicitly set as integers using as.integer()
.
my_variable <- 5
typeof(my_variable)
my_variable <- 5.0
typeof(my_variable)
'double' 'double'
my_variable <- as.integer(5)
typeof(my_variable)
'integer'
A key exception to this rule is when creating a vector of integers.
test <- 1:5
typeof(test)
test <- 1.2:4
typeof(test)
'integer' 'double'
Double
A double is a numeric value with a fractional part. As shown above, R will cast all numeric inputs to doubles, and you can verify this using typeof
. To try and convert a variable to a double, you can use the as.double()
function — notably, as.double()
will convert a character vector to a double vector if possible.
char_to_double <- as.double("4.0")
typeof(char_to_double)
'double'
Logical
Logical constants refer to the boolean values of TRUE
, FALSE
, and the bonus NA
.
You might see |
Logical values are critical in R. They are constantly used when indexing vectors and data.frames
. Remember that the result of a comparison using logical operators is a vector of logical values. For example, the following code will return all values in our vector that are greater than 5.
vec <- 1:10
vec[vec > 5]
The inner statement vec > 5
will only give you a logical vector, indicating TRUE
or FALSE
depending on the statement’s truth at each index:
vec <- 1:10
vec > 5
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
So, vec[vec > 5]
is really equivalent to the following.
vec[c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE)]
What if we wanted to count the number of even values in our vector? One may consider the following code:
vec <- 1:10
length(vec[vec %% 2 == 0])
5
However, a more succint version of this code is as follows.
vec <- 1:10
vec[vec %% 2 == 0]
[1] 2 4 6 8 10
As you can see, because R coerces the logical values to be 1 for TRUE
and 0 for FALSE
, we can easily discern the even values in our vector.
Other Structures
Complex
While it is less likely that you will need to use complex numbers in R, they have first-class support. Complex numbers are a pair of real numbers, the real and imaginary parts of which are separated by a comma. For example, the following code shows a few ways to create a complex number.
0i ^ (-3:3)
1i^2
R (and other programming languages) consider any variables that start with a number to be a number, unless surrounded by quotes. As such, inputting 1i
tells the program that i
is an imaginary number.
Be careful, however, as code that you may expect to produce a complex number might not do that:
sqrt(-1)
Warning message: In sqrt(-1) : NaNs produced
You can read more about complex numbers in R here.
Coercion
Coercion is the process of changing the type of a variable, either explicitly by using a special function or implicitly by performing an operation on a variable of one type, when the operation was meant for another type. The following is an example of coercion:
typeof(paste("test", 5.0))
'character'
Here, 5.0 is a double, and "test" is a character vector. paste
is a function expecting character vector(s) as input, and returns the concatenation of the input vectors. We instead passed a character vector and a double, so R intelligently coerced the double to be a character so the operation will completed successfully.
In general, R will coerce types from more to less specific. In the above example, the coercion of 5.0 makes sense — it’s easy to consider 5.0 as the string "5.0", while it’s hard to turn "test" into a double. Another example is the following:
my_integer <- as.integer(5)
my_double <- 7.0
my_result <- my_integer + my_double
typeof(my_result)
'double'
This logic is important for preventing the loss of data — the number 5.12345 cannot be stored as an integer without losing information.
Factors
A factor is R’s way of representing a categorical variable. There are elements in a factor (just like there are elements in a vector), but they are constrained to only be chosen from a specific set of values, called "levels". They are useful when a vector has only a few different values — "Male"/"Female" or "A"/"B"/"C".
There is the factor()
function that is used to cast variables as factors, the is.factor()
function to test if a variable is a factor, and the levels()
function to list all of the factors for a variable.
Examples
How do I test whether or not a vector is a factor?
Click to see solution
test_factor <- factor("Male")
is.factor(test_factor)
[1] TRUE
List the levels we have in vec
.
Click to see solution
vec <- factor(c("Male", "Female", "Female"))
levels(vec)
[1] "Female" "Male"
How can I rename the levels of a factor?
Click to see solution
vec <- factor(c("Male", "Female", "Female"))
levels(vec)
[1] "Female" "Male"
levels(vec) <- c("F", "M")
vec
[1] M F F Levels: F M
# be careful! Order matters, this is wrong:
vec <- factor(c("Male", "Female", "Female"))
levels(vec)
[1] "Female" "Male"
# here we incorrectly rename "Female"'s to "M" instead of "F"
levels(vec) <- c("M", "F")
vec
[1] F M M Levels: M F
Dates
Date
is a class which allows you to perform special operations like subtraction, where the number of days between dates are returned. Or addition, where you can add 30 to a Date and a Date is returned where the value is 30 days in the future.
You will usually need to specify the "format" argument based on the format of your date strings.
For example, if you had a string "07/05/1990", the format would be: %m/%d/%Y
, where %m
matches a zero-padded month value, /’s match literal `/’s, `%d
matches a zero-padded day value, and %Y
matches a 4 digit year in the format YYYY. If your string was 31-12-90
, the format string would be %d-%m-%y
. Replace %d, %m, %Y, and %y according to your date strings. A full list of formats can be found here.
Working with dates can be difficult and confusing. See here for more information about a package called lubridate
which provides a much easier interface to working with dates.
Examples
How do I convert a string "07/05/1990" to a Date
?
Click to see solution
my_string <- "07/05/1990"
my_date <- as.Date(my_string, format="%m/%d/%Y")
my_date
[1] "1990-07-05"
How do I convert a string "31-12-1990" to a Date
?
Click to see solution
my_string <- "31-12-1990"
my_date <- as.Date(my_string, format="%d-%m-%Y")
my_date
[1] "1990-12-31"
NA
, NaN
, and NULL
NA
-
NA
stands for not available. In general, this represents a missing value or a lack of data. Technically,NA
is a logical value. You can test this with the following code.
class(NA)
NaN
-
NaN
stands for not a number. This is a special value that is used to indicate that there is a result, it just cannot be represented as a number (for example the result of 0/0). Technically,NaN
is a double value. You can test this with the following code.
class(NaN)
NULL
-
If you have an understanding of
NULL
from other programming languages, you can carry it over to R. Otherwise, it is safe to think ofNULL
as something that is neitherTRUE
norFALSE
. Technically,NULL
is its own thing. It is not a logical value, double value, etc.NULL
is commonly used to represent an empty object or something that exists but isn’t really defined. When trying to distinguish betweenNA
andNULL
, think ofNA
as a missing value, andNULL
as an undefined value.
Examples
How do I tell if a value is NA
?
Click to see solution
# test if a value is NA.
value <- NA
is.na(value)
[1] TRUE
# does is.nan return TRUE for NA?
is.nan(value)
[1] FALSE