Data types

by ellen

In this installment of the Learn R series, we’re going to start to have a look at data types, dataframes and what on earth you do with them!

Have a watch of this short video and then consolidate everything we cover by reading through the post and playing along. You can take the code blocks included in the post and try them out in your own script - you should be all set up with R and RStudio, so there’s no time like the present!

R data types

When we think of different bits of data, some of it might be numbers, text, dates, and more. R has its own set of these data types. Before we get into the data types, let’s see how we can get R to tell us what something is.

R uses functions to take some inputs and give back an output. A function is basically an inbuilt bit of code that you can call on to do things, optionally passing it some data to work with. The function that takes a value and tells you what data type it is, is the class() function.

Have a look at these three examples - what classes do you get? Can you see why they are different?

class(1)
class(1.1)
class("1")

You can use this class() function if you’re ever unsure what data type something is. This is great for when you’re getting unexpected results!

Numbers

Numbers are split into a few different types:

  • integers are whole numbers like 1 or 42[1]
  • numerics are numbers that have a decimal portion associated with them like 1.0 or 3.133[2]
  • complex numbers are numbers that make use of the imaginary number i like 4i[3]
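Here is a quick sketch of the three number types so you can see them side by side. The variable names are made up for illustration, and the L suffix is not covered above: it is how you ask R to store a whole number as an integer, because a plain 1 or 42 is stored as a numeric by default.

```r
# The L suffix asks R for an integer; without it you get a numeric
whole   <- 42L
decimal <- 3.133
imag    <- 4i

class(whole)    # "integer"
class(decimal)  # "numeric"
class(imag)     # "complex"
class(42)       # "numeric", because there is no L suffix
```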

Converting to numbers

The functions as.numeric() and as.integer() allow you to convert something stored as text into a number.

These functions will give you some red text as a warning if you attempt to convert something to a number that can’t be safely converted. They will still attempt to perform the conversion, but return missings (NA) instead of actual values.

  • Which of the three datatypes above would you apply this to? Add the code as a new line to your script and see if you are right.
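If you want to see what a safe and an unsafe conversion look like, here is a small sketch. The values and variable names are invented for the example, so swap in your own.

```r
price_text   <- "3.14"
count_text   <- "42"
not_a_number <- "forty-two"

price <- as.numeric(price_text)   # text that looks like a number converts fine
count <- as.integer(count_text)

# This text cannot be read as a number, so R warns you and returns NA
failed <- as.numeric(not_a_number)

class(price)   # "numeric"
class(count)   # "integer"
is.na(failed)  # TRUE
```

The warning does not stop your script, so it is worth checking for NAs after a conversion rather than relying on spotting the red text.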

Checking numbers

Now here’s something we didn’t cover in the video and is especially helpful if something just WILL NOT work and you’ve spent all morning panic eating biscuits.

You can write checks to see if something is numeric, or an integer, with is.numeric() or is.integer().

The general is.*() family of functions covers many of the data types we cover here and more, and can be a real time/life saver.
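As a rough sketch of how these checks behave, try the lines below. The variable names are just for illustration, and 1L (with the L suffix) is how you tell R you want an integer.

```r
a <- 1     # a plain number
b <- 1L    # an integer
d <- "1"   # text that happens to look like a number

is.numeric(a)  # TRUE
is.integer(a)  # FALSE, a plain 1 is not stored as an integer
is.numeric(b)  # TRUE, integers count as numeric for this check
is.integer(b)  # TRUE
is.numeric(d)  # FALSE, this is text
```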

We could also use class() here and inspect the result. (You might recall that class(1) had the result of “numeric” - R was not by default considering 1 as an integer for the purpose of the class() function.)

Special numbers

As well as i to denote imaginary numbers, there are some additional symbols you might encounter or want to use.

  • pi = 3.1415927
  • Inf represents positive infinity. You’ll often see this if you divide a positive number by zero
  • -Inf represents negative infinity. You’ll often see this if you divide a negative number by zero
  • NaN is what happens when you really screw up a calculation and do something like 0/0. It means the result is not a number!
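You can produce each of these special values yourself. The sketch below uses made-up variable names, plus the checking functions is.finite(), is.infinite() and is.nan(), which follow the same is.* pattern as the other checks.

```r
pos  <- 1 / 0    # Inf
neg  <- -1 / 0   # -Inf
oops <- 0 / 0    # NaN

is.finite(pos)    # FALSE
is.infinite(neg)  # TRUE
is.nan(oops)      # TRUE
class(oops)       # "numeric", these special values still count as numbers
round(pi, 2)      # 3.14
```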

Text

Text, also known as strings, is split up into two core types:

  • characters are text as we typically think of it like “red”
  • factors (and the subtype ordered factors) are a text type where the number of unique values is constrained e.g. the values are selected from a dropdown. Factors work by storing numbers that correspond to our values and then printing these values. This is much more space efficient when the number of unique values is low.[4]
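Although we are focusing on characters, it can help to see the "numbers behind the values" idea once. This is only a sketch with invented example data: factor() builds a factor, levels() lists the allowed values, and as.integer() reveals the stored numbers.

```r
colours <- c("red", "blue", "red", "green", "blue")
colour_factor <- factor(colours)

levels(colour_factor)      # "blue" "green" "red", sorted alphabetically by default
as.integer(colour_factor)  # 3 1 3 2 1, the numbers R actually stores
class(colour_factor)       # "factor"

# An ordered factor also knows which values come first
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]        # TRUE, small comes before large
```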

We are going to focus on characters. This translates much better in written form than in video - so this is another new thing.

In R, you can’t just type some text as it will be construed as an object or function name. To delimit a string you can use speech marks (") or apostrophes (') at the beginning and end of it to show where it starts and ends. These are the text delimiters in R.

Note you can’t mix the two delimiters on the same string e.g. "red', but you can use them together to enable you to have speech marks or apostrophes inside a string e.g. 'They said "Read this"' or "It's mine now".

If you need to have both inside a string, you can escape the ones on the inside to say they don’t count as text delimiters. To escape a delimiter, put a backslash (\) in front of it e.g. "They said \"Read this\"".
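Here are those delimiter rules as a small sketch you can run. The variable names are made up, and cat() is used simply because it prints the text itself rather than the escaped version.

```r
single_outside <- 'They said "Read this"'    # apostrophes outside, speech marks inside
double_outside <- "It's mine now"            # speech marks outside, apostrophe inside
escaped        <- "They said \"Read this\""  # backslashes escape the inner speech marks

# cat() shows the text without the escaping backslashes
cat(escaped, "\n")

identical(single_outside, escaped)  # TRUE, both store exactly the same text
class(double_outside)               # "character"
```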

Beware the copy and paste here - sometimes this can really mess your code up, and if it’s not working for no obvious reason, try tidying up all your speech marks and apostrophes.

Converting to strings

Converting to characters and factors works the same way as converting to numbers. You swap “numeric” in the function name for “character” or “factor” and you’re done!

Add another line to your script and try it out.

Similarly to checking numbers, the same rule applies to checking strings - I’m not going to give it away - have a go!

Logical values

Whilst we’ve been testing our datatypes, we’ve created a lot of logical or boolean values. Boolean values are TRUE and FALSE. R is case-sensitive so these have to be typed in upper-case, otherwise R will read them as something different, such as the name of an object.
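To see where those logical values have been coming from, and what happens when the case is wrong, here is a short sketch. The variable names are invented, and the tryCatch() wrapper is only there so the deliberate mistake doesn’t stop the script; it assumes you have not created an object called true yourself.

```r
is_number <- is.numeric(1)  # the checks we wrote earlier return logical values
class(is_number)            # "logical"

bigger <- 5 > 3             # comparisons return logical values too
bigger                      # TRUE

# Lower-case true is read as an object name, which R cannot find
lower_case <- tryCatch(true, error = function(e) "not found")
lower_case
```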

I bet you can totally guess how to convert and check logicals. Add it to your script!

We quickly touched on dates in the video, but refer to this below for a deeper explanation of date handling in R.

Dates

Dates are one of the hardest parts of programming! This is a very brief introduction to dates so if you want more (and there’s lots) - get searching!

Dates in R split into:

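  • dates, which do not have any time component
  • POSIX date-times, which come in two flavours:
      • POSIXct stores a date-time as a single number, the count of seconds since the start of 1970
      • POSIXlt stores a date-time as a set of named components (year, month, day, hour and so on)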

You might be looking at the two POSIX times and thinking to yourself “ZOMG how am I meant to choose?”. Most people use the POSIXct format[5], which is the default for many of R’s functions.
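If you are curious about what the difference looks like in practice, this sketch stores the same moment both ways. The date and variable names are just examples, and unclass() is used here only to peek at what is stored underneath.

```r
moment_ct <- as.POSIXct("2017-12-31 12:00", tz = "UTC")
moment_lt <- as.POSIXlt("2017-12-31 12:00", tz = "UTC")

# POSIXct: one number, the seconds since the start of 1970
unclass(moment_ct)

# POSIXlt: separate components you can pull out by name
moment_lt$hour         # 12
moment_lt$mday         # 31
moment_lt$year + 1900  # 2017, years are stored as an offset from 1900
moment_lt$mon + 1      # 12, months are counted from 0
```

The component style is handy when you need one piece of a date-time, but watch out for those offsets on the year and month.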

Converting to dates

You can convert to dates and date-times with as.Date(), as.POSIXct(), and as.POSIXlt(). Ideally, you’ll provide a string with the date(time) in an ISO8601 format e.g. “YYYY-MM-DD hh:mm”.

  • Use your birthday and turn it into a date in your script. You guessed it. Use that as. function again. (Hint - date needs a capital D!) If you don’t know the specific minute of your arrival onto earth, you can just leave hh:mm blank and R will still understand!

Note that R assumes a time zone based on your device if you haven’t provided one. It’s prudent to set the time zone in order to avoid the results of your code changing based on where the code is run or when[6].

  • Add a timezone to your birthday by including tz = "GMT" inside the brackets.
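So you can compare your attempt against something, here is a sketch using a different date. The date, the time and the variable names are all invented for the example.

```r
day_only  <- as.Date("2017-12-31")                        # no time component
with_time <- as.POSIXct("2017-12-31 23:59", tz = "GMT")   # date-time with an explicit time zone

class(day_only)   # "Date"
class(with_time)  # "POSIXct" "POSIXt"

# format() turns the date-time back into text in whatever layout you ask for
format(with_time, "%Y-%m-%d %H:%M")  # "2017-12-31 23:59"
```

Setting tz explicitly means the result is the same whichever computer runs the code.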

Checking dates

Unfortunately, R does not provide functions for checking whether the class of something is a date-time type without extending its functionality. We have to use class() as a consequence.

class(as.Date("2017-12-31"))
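If you would rather get a TRUE or FALSE back than read the class yourself, one option is inherits(), which asks whether a given class name appears in the result of class(). This is a suggestion rather than something covered in the video, and the variable names are made up.

```r
x <- as.Date("2017-12-31")
y <- as.POSIXct("2017-12-31 12:00", tz = "UTC")

inherits(x, "Date")             # TRUE
inherits(y, "POSIXct")          # TRUE
inherits(y, "POSIXt")           # TRUE, a class shared by POSIXct and POSIXlt
inherits("2017-12-31", "Date")  # FALSE, this is still just text
```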

Missings

Every data type has an NA, an identifier for a missing value.

If you use an NA in an object it will take on the data type used in the object. You can, however, make NAs directly.

NA
NA_integer_
NA_character_
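To see an NA take on the data type of the object it sits in, here is a small sketch. The vectors are invented example data, and c() is the function that combines values into a single object.

```r
numbers <- c(1.5, NA, 3)
words   <- c("red", NA)

class(numbers)  # "numeric", the NA fits in with the numbers around it
class(words)    # "character", the same NA now fits in with the text

numbers[2]      # NA
```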

Checking NAs

You can check what data type an NA is, using the class() function. Add this to your script now.

Want to check if something is NA? You know how to check things by now! Add another line to your script, just to be sure.

Summary

There are a few more datatypes out in the wild but numbers, strings, booleans, and dates are the core types you’ll encounter.

There are normally as.* and is.* functions for converting to a datatype or checking if something is a given datatype. You can use class() to uncover the datatype too.

Just in case you want to play around with the code used in the video, I’ve included the code below.

Happy testing!

Ellen :)

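# Copy the built-in PlantGrowth dataset into your own workspace
PlantGrowth <- PlantGrowth

View(PlantGrowth)     # opens the data in a viewer tab in RStudio
summary(PlantGrowth)

# Convert the weight column to text and check its class
PlantGrowth$weight <- as.character(PlantGrowth$weight)
class(PlantGrowth$weight)

# Take a copy and convert the weight column back to numbers
PlantGrowth1 <- PlantGrowth
summary(PlantGrowth1)
PlantGrowth1$weight <- as.numeric(PlantGrowth1$weight)
class(PlantGrowth1$weight)

# A date typed inside quote marks is just text until it is converted
my.birthday <- '1991-12-12'
class(my.birthday)
birthday.date <- as.Date(my.birthday)
class(birthday.date)

str(PlantGrowth)
summary(PlantGrowth)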