Example 2: Keep Data of Unmatched Rows. The merge function provides the options all.x and all.y. These two options can be used to retain certain rows of your input data tables, even when no match is found for the merging. With the following R code, we can keep all rows of our first input data frame (i.e.
- Table 1: Merging Two Data Frames by Row Names. Table 1 shows the output of our previous R code. As you can see, the merge function retained all rows where the row names were available in both data sets. This is also called an inner join. That’s basically it. However, there is much more to learn about the merge function.
- Patching?merge to allow the user to keep the order of one of the two data.frame objects merged. Hello dear R-devel list members. Following an old (2002) thread from R-help (and having myself needing.
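The merge() behaviour described in the snippets above can be sketched in base R. This is a minimal illustration, not the original page's code: the data frames data1 and data2 and their values are invented for the example.

```r
# Toy data frames (hypothetical; values invented for illustration)
data1 <- data.frame(x1 = 1:4, row.names = c("a", "b", "c", "d"))
data2 <- data.frame(x2 = c(10, 20, 30), row.names = c("c", "d", "e"))

# Default behaviour: inner join, only row names found in both inputs are kept
inner <- merge(data1, data2, by = "row.names")

# all.x = TRUE keeps every row of the first data frame;
# rows without a match get NA in the columns coming from data2
left <- merge(data1, data2, by = "row.names", all.x = TRUE)

# all.y = TRUE does the same for the second data frame
right <- merge(data1, data2, by = "row.names", all.y = TRUE)

print(inner)
print(left)
print(right)
```

When merging by row names, merge() stores the former row names in a column called Row.names.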
dplyr::filter() to conditionally subset by rows
Use filter() to let R know which rows you want to keep or exclude, based on whether or not their contents match conditions that you set for one or more variables.
Some examples in words that might inspire you to use filter():
- “I only want to keep rows where the temperature is greater than 90°F.”
- “I want to keep all observations except those where the tree type is listed as unknown.”
- “I want to make a new subset with only data for mountain lions (the species variable) in California (the state variable).”
When we use filter(), we need to let R know a couple of things:
- What data frame we’re filtering from
- What condition(s) we want observations to match and/or not match in order to keep them in the new subset
Here, we’ll learn some common ways to use filter().
8.3.1 Filter rows by matching a single character string
Let’s say we want to keep all observations from the fish data frame where the common name is “garibaldi” (fun fact: that’s California’s official marine state fish, protected in California coastal waters!).
Here, we need to tell R to only keep rows from the fish data frame when the common name (common_name variable) exactly matches garibaldi.
Use == to ask R to look for exact matches:
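A minimal sketch of an exact-match filter with ==. The fish data frame built here is a small invented stand-in for the tutorial's data (same column names, made-up values), so the code runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# Keep only rows where common_name exactly matches "garibaldi"
fish_garibaldi <- fish %>%
  filter(common_name == "garibaldi")

print(fish_garibaldi)
```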
Check out the fish_garibaldi object to ensure that only garibaldi observations remain.
Task: Create a subset starting from the fish data frame, stored as object fish_mohk, that only contains observations from Mohawk Reef (site entered as “mohk”).
Explore the subset you just created to ensure that only Mohawk Reef observations are returned.
8.3.2 Filter rows based on numeric conditions
Use expected operators (>, <, >=, <=, ==) to set conditions for a numeric variable when filtering. For this example, we only want to retain observations when the total_count column value is >= 50:
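A sketch of the numeric condition, again on an invented stand-in for fish. The object name fish_over50 is an assumption chosen for this example.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# Keep rows where total_count is 50 or more (a count of exactly 50 is kept)
fish_over50 <- fish %>%
  filter(total_count >= 50)

print(fish_over50)
```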
8.3.3 Filter to return rows that match this OR that OR that
What if we want to return a subset of the fish df that contains garibaldi, blacksmith OR black surfperch?
There are several ways to write an “OR” statement for filtering, which will keep any observations that match Condition A or Condition B or Condition C. In this example, we will create a subset from fish that only contains rows where the common_name is garibaldi or blacksmith or black surfperch.
Way 1: You can use the vertical line operator | to indicate “OR”:
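A sketch of an “OR” filter written with the | operator. The fish data frame is an invented stand-in and the object name fish_3sp is an assumption for this example.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# A row is kept if it satisfies at least one of the three conditions
fish_3sp <- fish %>%
  filter(common_name == "garibaldi" |
           common_name == "blacksmith" |
           common_name == "black surfperch")

print(fish_3sp)
```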
Alternatively, if you’re looking for multiple matches in the same variable, you can use the %in% operator instead. Use %in% to ask R to look for any matches within a vector:
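A sketch of the same kind of subset written with %in%, using an invented stand-in for fish. The object name fish_3sp_in is an assumption for this example.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# Keep rows whose common_name appears anywhere in the vector of names
fish_3sp_in <- fish %>%
  filter(common_name %in% c("garibaldi", "blacksmith", "black surfperch"))

print(fish_3sp_in)
```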
Notice that the two methods above return the same thing.
Critical thinking: In what scenario might you NOT want to use %in% for an “or” filter statement? Hint: What if the “or” conditions aren’t different outcomes for the same variable?
Task: Create a subset from fish called fish_gar_2016 that keeps all observations if the year is 2016 OR the common name is “garibaldi.”
8.3.4 Filter to return observations that match this AND that
In the examples above, we learned to keep observations that matched any of a number of conditions (or statements).
Sometimes we’ll only want to keep observations that satisfy multiple conditions (e.g., to keep this observation it must satisfy this condition AND that condition). For example, we may want to create a subset that only returns rows from fish where the year is 2018 and the site is Arroyo Quemado (“aque”).
In filter(), add a comma (or ampersand ‘&’) between arguments for multiple “and” conditions:
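A sketch of an “and” filter using a comma between conditions. The fish data frame is an invented stand-in and the object name aque_2018 is an assumption for this example.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# A row is kept only if BOTH conditions are TRUE
aque_2018 <- fish %>%
  filter(year == 2018, site == "aque")

print(aque_2018)
```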
Check it out to see that only observations where the site is “aque” in 2018 are retained:
Like most things in R, there are other ways to do the same thing. For example, you could do the same thing using & (instead of a comma) between “and” conditions:
Or you could just do two filter steps in sequence:
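Both alternatives, & and two filter() steps in sequence, can be sketched side by side. The fish data frame is an invented stand-in, and the object names aque_2018_amp and aque_2018_seq are assumptions for this example.

```r
library(dplyr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# Option A: join the two conditions with &
aque_2018_amp <- fish %>%
  filter(year == 2018 & site == "aque")

# Option B: apply one condition per filter() step
aque_2018_seq <- fish %>%
  filter(year == 2018) %>%
  filter(site == "aque")

# Both subsets contain the same observations
print(aque_2018_amp)
print(aque_2018_seq)
```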
8.3.5 Activity: combined filter conditions
Challenge task: Create a subset from the fish data frame, called low_gb_wr that only contains:
- Observations for garibaldi or rock wrasse
- AND the total_count is less than or equal to 10
stringr::str_detect() to filter by a partial pattern
Sometimes we’ll want to keep observations that contain a specific string pattern within a variable of interest.
For example, consider the fantasy data below:
|4||royal blue fish|
There might be a time when we would want to use observations that:
- Contain the string “fish,” in isolation or within a larger string (like “rockfish”)
- Contain the string “blue”
In those cases, it would be useful to detect a string pattern, and potentially keep any rows that contain it. Here, we’ll use stringr::str_detect() to find and keep observations that contain our specified string pattern.
Let’s detect and keep observations from fish where the common_name variable contains the string pattern “black.” Note that there are two fish, blacksmith and black surfperch, that would satisfy this condition.
We can also use filter() and str_detect() in combination to find and keep observations where the site variable contains the pattern “sc”:
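Both partial-match filters described above can be sketched as follows. The fish data frame is an invented stand-in; the site codes "scdi" and "sctw" and the object names fish_bl and fish_sc are assumptions made for this example.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the fish data frame (values and site codes invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "scdi", "aque", "sctw", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# Keep rows where common_name contains "black" anywhere in the string
fish_bl <- fish %>%
  filter(str_detect(common_name, pattern = "black"))

# Keep rows where site contains "sc"
fish_sc <- fish %>%
  filter(str_detect(site, pattern = "sc"))

print(fish_bl)
print(fish_sc)
```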
What str_detect() returns is a series of TRUE/FALSE values, one for each row, based on whether or not the row contains the specified pattern. In that example, any row that does contain “black” returns TRUE, and any row that does not contain “black” returns FALSE.
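To see the TRUE/FALSE values directly, we can call str_detect() on a plain character vector. The vector below is invented for illustration.

```r
library(stringr)

# Hypothetical vector of common names
common_name <- c("garibaldi", "blacksmith", "black surfperch", "rock wrasse")

# One logical value per element: TRUE where "black" appears in the string
has_black <- str_detect(common_name, pattern = "black")

print(has_black)
# FALSE TRUE TRUE FALSE
```

filter() keeps exactly the rows for which this logical vector is TRUE.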
Task: Create a new object called fish_it, starting from fish, that only contains observations if the common_name variable contains the string pattern “it.” What species remain?
We can also exclude observations that contain a set string pattern by adding the negate = TRUE argument within str_detect().
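A sketch of excluding a pattern with negate = TRUE (available in stringr 1.4.0 and later). The fish data frame is an invented stand-in and the object name fish_no_black is an assumption for this example.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the fish data frame (values invented)
fish <- data.frame(
  year = c(2016, 2016, 2017, 2018, 2018, 2018),
  site = c("mohk", "aque", "mohk", "aque", "aque", "carp"),
  common_name = c("garibaldi", "blacksmith", "garibaldi",
                  "black surfperch", "garibaldi", "rock wrasse"),
  total_count = c(12, 60, 5, 50, 75, 3)
)

# negate = TRUE flips the result: rows that do NOT contain "black" are kept
fish_no_black <- fish %>%
  filter(str_detect(common_name, pattern = "black", negate = TRUE))

print(fish_no_black)
```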
Sync your local project to your repo on GitHub.
When a single column header is split across cells, merge the cells with merge_rows() or merge_cols(). E.g. if a column header 'Mean GDP' is split over two cells, where the top cell has the value 'Mean' and the bottom cell has the value 'GDP', then merge_rows() will combine them into a single cell with the value 'Mean GDP'.
merge_rows() keeps the top cell, and merge_cols() keeps the left-most cell. When there are several columns of headers, merge_rows() aligns the output cells so that they are all in the same row, and similarly merge_cols() aligns to the same column.
These functions apply only to cells with character values because it doesn't make sense to concatenate non-character values. Convert cell values to characters first if you need to merge non-character cells.
Data frame. The cells of a pivot table, usually the output of
The numbers of the rows to be merged.
The column of
A character string to separate the values of each cell.
The numbers of the columns to be merged.
A data frame