R Merge Two Datasets


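In R you use the merge() function to combine data frames. By default, merge() looks for columns that have the same name in both data frames and matches up rows whose values in those columns agree; the by argument lets you name the key columns yourself, and by = 0 matches on row names instead.

How to use merge to find the intersection of data

The simplest form of merge finds the intersection of two sets of data: with the default setting all = FALSE, only rows whose key values appear in both data frames are kept.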
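A minimal sketch of the intersection case. The two data frames below are made up for illustration, and the column names Route and Habitat are assumptions (only SpeciesTotal appears elsewhere in this tutorial).

```r
# Two small, made-up data frames that share a "Route" column
routes <- data.frame(Route = c(1, 2, 3, 4),
                     Habitat = c("forest", "farm", "forest", "urban"))
counts <- data.frame(Route = c(2, 3, 5),
                     SpeciesTotal = c(4, 1, 7))

# With no `by` argument, merge() joins on the columns the two data frames
# share by name (here, Route). The default all = FALSE keeps only routes
# present in both data frames, i.e. the intersection.
both <- merge(routes, counts)
both
```

Routes 1 and 4 (only in routes) and route 5 (only in counts) are dropped, leaving routes 2 and 3.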

Dealing with NAs


If we look at our new data frame, we see that some of its rows now contain NAs.

This occurred because the original BBS (Breeding Bird Survey) data only contain a record when a species is observed: if it isn't seen, nothing is entered. So each NA in the new data frame represents a route on which the SCTA was not observed in 2006.
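One way NAs like these arise is a merge that keeps every route with all.x = TRUE. The tutorial doesn't show its original merge call, so this is a sketch under that assumption; the data frames all_routes and scta_seen and their values are hypothetical.

```r
# Hypothetical data: every route surveyed in 2006, and the Scarlet
# Tanager records, which only exist for routes where the bird was seen
all_routes <- data.frame(Route = c(101, 102, 103, 104))
scta_seen  <- data.frame(Route = c(101, 103),
                         Year = c(2006, 2006),
                         Aou = c(6080, 6080),
                         SpeciesTotal = c(3, 5))

# all.x = TRUE keeps every route from all_routes; routes with no
# matching SCTA record get NA in the columns that come from scta_seen
merged <- merge(all_routes, scta_seen, by = "Route", all.x = TRUE)
merged
```

Routes 102 and 104 had no SCTA record, so their Year, Aou and SpeciesTotal values are NA.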

It's easy to fix the Year and Aou columns because every row should have the same values in them: every Year is 2006, and every Aou value is 6080, the AOU code for the Scarlet Tanager. The following code fills in the missing values.
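Because every value in these two columns is the same, one simple approach (a sketch, not necessarily the code the original tutorial used) is to overwrite each column with a single value. The data frame scta and its Route column are hypothetical.

```r
# Hypothetical merged data: the route where SCTA was not seen has NAs
scta <- data.frame(Route = c(101, 102, 103),
                   Year = c(2006, NA, 2006),
                   Aou = c(6080, NA, 6080),
                   SpeciesTotal = c(3, NA, 5))

# Every row is from 2006 and every row refers to the Scarlet Tanager
# (AOU code 6080), so each whole column can be overwritten; the single
# value is recycled to every row
scta$Year <- 2006
scta$Aou <- 6080
```

SpeciesTotal is deliberately left alone here, since its missing values need a different fix.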

Actually, since the code "6080" isn't very meaningful, let's add a column containing the letters "SCTA" so it's easy to remember what we are looking at. We'll call the new column "name" and put "SCTA" in every row.

We'll make this a factor variable.
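A minimal sketch of adding the name column and converting it to a factor, using a small hypothetical data frame in place of the tutorial's BBS data.

```r
# Hypothetical data frame standing in for the merged BBS data
scta <- data.frame(Route = c(101, 102, 103),
                   Year = 2006,
                   Aou = 6080,
                   SpeciesTotal = c(3, NA, 5))

# Add a human-readable species code; a single value is recycled to every row
scta$name <- "SCTA"

# Convert the new column to a factor
scta$name <- factor(scta$name)

# summary() reports counts per level for the factor column name
summary(scta)
```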

summary() shows what we've done: there are now no NAs in Year or Aou, and there's a new column called name.

Filling in the NAs in the SpeciesTotal column (the number of birds counted) requires a new function: is.na(). is.na() checks each value in a column and returns TRUE where the value is NA and FALSE where it isn't.
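A quick illustration of what is.na() returns, using a made-up vector of counts.

```r
counts <- c(3, NA, 5, NA)

# is.na() returns a logical vector of the same length:
# TRUE where the value is NA, FALSE otherwise
is.na(counts)      # FALSE  TRUE FALSE  TRUE

# Because TRUE counts as 1, sum() gives the number of NAs
sum(is.na(counts)) # 2
```

The logical vector can also be used as an index, which is what makes it useful for replacing NAs.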

We can use is.na with a function from dplyr called mutate() to change these NAs to 0s.
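A minimal sketch of combining is.na() with mutate(), assuming the dplyr package is installed. This is one common pattern (ifelse() choosing between 0 and the original count), not necessarily the exact code from the original tutorial; the data are hypothetical.

```r
library(dplyr)

# Hypothetical data with missing counts
scta <- data.frame(Route = c(101, 102, 103, 104),
                   SpeciesTotal = c(3, NA, 5, NA))

# mutate() overwrites SpeciesTotal; ifelse() uses is.na() to pick 0
# where the count is missing and keeps the original count otherwise
scta <- scta %>%
  mutate(SpeciesTotal = ifelse(is.na(SpeciesTotal), 0, SpeciesTotal))

scta$SpeciesTotal  # 3 0 5 0
```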

This code is fairly complex, so don't worry if you don't get it on the first try. Since it's tricky for beginners, I've made a new function that does it more simply (see the next chunk of code).
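To show the idea behind such a helper, here is a hypothetical base-R function. It is NOT the author's NA_to_zero() from wildlifeR, whose arguments are not shown here; the name replace_na_with_zero() and its (df, column) signature are invented for illustration.

```r
# Hypothetical helper (not wildlifeR's NA_to_zero()):
# replaces every NA in the named column of a data frame with 0
replace_na_with_zero <- function(df, column) {
  missing <- is.na(df[[column]])   # TRUE where the value is NA
  df[[column]][missing] <- 0       # overwrite only those entries
  df                               # return the modified copy
}

scta <- data.frame(Route = c(101, 102, 103),
                   SpeciesTotal = c(3, NA, 5))
scta_filled <- replace_na_with_zero(scta, "SpeciesTotal")
scta_filled$SpeciesTotal  # 3 0 5
```

Because R copies the data frame inside the function, the original scta still contains its NA.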

If you compare BBS_PA_SCTA_3 with BBS_PA_SCTA_4, which was made with NA_to_zero(), you can see that the NAs are gone.
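Rather than comparing the two data frames by eye, you can count the NAs in each column. A sketch with small made-up "before" and "after" data frames standing in for BBS_PA_SCTA_3 and BBS_PA_SCTA_4:

```r
# Hypothetical "before" data with a missing count
before <- data.frame(Year = c(2006, 2006, 2006),
                     SpeciesTotal = c(3, NA, 5))

# "After": the NA in SpeciesTotal replaced with 0
after <- before
after$SpeciesTotal[is.na(after$SpeciesTotal)] <- 0

# colSums(is.na(x)) counts the NAs in each column
colSums(is.na(before))  # Year: 0, SpeciesTotal: 1
colSums(is.na(after))   # Year: 0, SpeciesTotal: 0
```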

Since the code above is rather long, the following function from the wildlifeR package does the same thing.