Merge 2 Datasets



Say you have two data files that have the same columns in them (for example, two months' worth of data from a database), and you want to combine them into one object in R so you can more easily visualise differences or trends.

At a high level, there are two ways to merge datasets: you can add information by adding more rows or by adding more columns. In general, when datasets have the same set of columns you can concatenate them vertically, and when they have the same set of observations you can concatenate them horizontally. These operations range from very straightforward concatenation of two datasets to more complicated database-style joins and merges that correctly handle any overlaps between the datasets. A one-to-many merge is an example of the more complicated kind: one dataset holds information on dads (family id, family name, dad's name and status), another holds records of their kids, and the two are matched on family id. The rest of this section deals with the simpler case of adding rows.
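As a rough illustration of the one-to-many case, the sketch below joins a dads table to a kids table on family id. The data, the column names (famid, famname, dadname, kidname) and the choice of dplyr's left_join are assumptions made for this example, not something the text above prescribes.

```r
library(dplyr)

# Hypothetical sample data: one row per dad, one or more rows per family of kids
dads <- data.frame(
  famid   = c(1, 2, 3),
  famname = c("Smith", "Jones", "Lee"),
  dadname = c("Bill", "Art", "Paul"),
  stringsAsFactors = FALSE
)
kids <- data.frame(
  famid   = c(1, 1, 2, 3, 3, 3),
  kidname = c("Beth", "Bob", "Ann", "Pete", "Pam", "Phil"),
  stringsAsFactors = FALSE
)

# One-to-many merge: each dad row is repeated once for every matching kid row
family <- left_join(dads, kids, by = "famid")
```

Because left_join matches on the values of famid, the rows do not need to be sorted by family id beforehand.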

Let’s set up a simple example to show how this works, using two data frames, jan and feb, that each hold one month of data. The function call rpois(31, 50) generates 31 random integers in the vicinity of the number 50 (draws from a Poisson distribution with mean 50). What we end up with in jan is 2017 repeated in the year column, 1 repeated down the month column, the numbers 1:31 in the day column and some random integers representing fictional head counts in the head column.
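A minimal sketch of that setup, consistent with the description above. The seed is an arbitrary addition so the random numbers are reproducible, and feb is assumed to be built the same way with 28 days.

```r
set.seed(2017)  # arbitrary seed (an assumption) so the head counts are reproducible

# One row per day; the single values for year and month are recycled down each column
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))
```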

We can take a quick look at the data in each of those data frames using the glimpse function from the dplyr package:
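For instance, assuming jan and feb are built as described above (the seed is an arbitrary choice for this sketch):

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))

# glimpse() prints the dimensions, then one line per column with its type and first values
glimpse(jan)
glimpse(feb)
```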

To join two data frames (datasets) vertically we can use the bind_rows function.
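A short sketch of the call, with jan and feb rebuilt here so the example runs on its own (the seed is an arbitrary assumption):

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))

# Stack the rows of feb underneath the rows of jan
combo <- bind_rows(jan, feb)
dim(combo)
```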

The object combo now has 59 observations but the same 4 columns as the original jan and feb objects.

Columns in different orders

What if the columns in the two data sets are in different orders? Not a problem! When you use bind_rows the columns in the two data frames do not have to be in the same order.
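To see this, the sketch below reverses the column order of feb before binding. The name feb_reordered is made up for this example, and the seed is an arbitrary assumption.

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))

# Same columns as feb, but in the opposite order
feb_reordered <- feb[, c("head", "day", "month", "year")]

# bind_rows matches columns by name, not by position
combo <- bind_rows(jan, feb_reordered)
names(combo)
```

The result keeps the column order of the first data frame.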

More than two objects to bind rows

Say there was a third (or fourth or fifth) month of data that you wanted to combine. It’s reasonably intuitive:
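A sketch with a third month, mar, assumed to be built the same way as jan and feb (the seed is again arbitrary):

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))
mar <- data.frame(year = 2017, month = 3, day = 1:31, head = rpois(31, 50))

# Pass as many data frames as you like, separated by commas
combo <- bind_rows(jan, feb, mar)

# A list of data frames works as well
combo_from_list <- bind_rows(list(jan, feb, mar))
```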

Different column names

What if the data sets are the same but the column names aren’t identical?

This is a big issue, and is a good reason to run the clean_names function from the janitor package on your data as soon as you import it. For example:
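The sketch below reproduces the problem with a copy of feb whose column names are capitalised. The name feb_caps and the capitalised headers are assumptions made for illustration.

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))

# Hypothetical variant of feb with capitalised column names
feb_caps <- feb
names(feb_caps) <- c("Year", "Month", "Day", "Head")

# year and Year are treated as different columns, so we get eight columns padded with NA
combo <- bind_rows(jan, feb_caps)
names(combo)
```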

The columns haven’t merged; bind_rows has put them in separate columns because capitalisation matters. Running clean_names() from janitor on each data frame before binding fixes this.
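A sketch of that approach, using a hypothetical feb_caps data frame whose column names are capitalised (the seed is an arbitrary assumption):

```r
library(dplyr)
library(janitor)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb_caps <- data.frame(Year = 2017, Month = 2, Day = 1:28, Head = rpois(28, 50))

# clean_names() converts the column names to lower-case snake_case,
# so Year becomes year and the columns line up again
combo <- bind_rows(clean_names(jan), clean_names(feb_caps))
names(combo)
```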

Note that this won’t help if the variable names have differences other than capitalisation and the other things that the clean_names function tidies up (e.g. changing . to _). For example:
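In the sketch below, a hypothetical feb_alt data frame calls its last column head_count rather than head. Both names are already clean, so clean_names leaves them alone and the columns still fail to line up.

```r
library(dplyr)
library(janitor)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))

# Hypothetical variant of feb with a differently named count column
feb_alt <- data.frame(year = 2017, month = 2, day = 1:28, head_count = rpois(28, 50))

# head and head_count stay separate columns, each padded with NA
combo <- bind_rows(clean_names(jan), clean_names(feb_alt))
names(combo)
```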

In this case you would have to rename your columns so that they match:
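One way to do that is dplyr's rename function, which takes new_name = old_name pairs. The data frame feb_alt and its head_count column are hypothetical names used for this sketch.

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))

# Hypothetical variant of feb with a differently named count column
feb_alt <- data.frame(year = 2017, month = 2, day = 1:28, head_count = rpois(28, 50))

# Rename head_count to head so that it matches jan, then bind
feb_fixed <- rename(feb_alt, head = head_count)
combo <- bind_rows(jan, feb_fixed)
```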

More variables in one data frame than the other data frame

What if there are more variables in one data frame than the other data frame(s)? This might happen if you start measuring a new trait in one month, but never had a column for that trait in previous months. As you may have noticed above, the bind_rows function just fills any missing values with NA.
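A sketch of this situation, where a hypothetical mar data frame carries an extra weight column that jan and feb never had. The column name and its values are invented for illustration, and the seed is arbitrary.

```r
library(dplyr)

set.seed(2017)  # arbitrary seed, assumed for reproducibility
jan <- data.frame(year = 2017, month = 1, day = 1:31, head = rpois(31, 50))
feb <- data.frame(year = 2017, month = 2, day = 1:28, head = rpois(28, 50))

# Hypothetical third month with a newly measured trait
mar <- data.frame(year = 2017, month = 3, day = 1:31, head = rpois(31, 50),
                  weight = rnorm(31, mean = 70, sd = 5))

# Rows from jan and feb get NA in the weight column
combo <- bind_rows(jan, feb, mar)
summary(combo$weight)
```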

Before using any of the above methods, make sure all the column names in your data frame are unique! Using clean_names from the janitor package will help here.


Merge raster datasets in ArcMap

Esri recommends preserving the original raster datasets wherever possible, so the Mosaic tool and the Mosaic To New Raster tool with an empty raster dataset as the target dataset are the best choices to merge raster datasets. The Mosaic tool is used to mosaic multiple input rasters into an existing raster dataset. The existing raster dataset can be empty or it can contain data. The tool is used to merge rasters that are adjacent and have the same cell resolution and coordinate system. Similar to the Mosaic tool, the Mosaic To New Raster tool is used to mosaic multiple input raster datasets. However, unlike the Mosaic tool, the Mosaic To New Raster tool saves the output mosaic in a new empty raster dataset that it creates on the fly.

This article provides instructions to merge multiple raster datasets into a new raster dataset using the Mosaic To New Raster tool in ArcMap. The following image demonstrates the raster datasets to be merged into one raster dataset.


To merge two or more raster files using the Mosaic To New Raster tool, follow the steps below.

  1. Determine the number of bands and the pixel type of each raster file. (In the Table Of Contents, right-click the raster layer, click Properties, and open the Source tab.) The inputs must have the same number of bands and the same bit depth.
  2. Open the Mosaic To New Raster tool by navigating to ArcToolbox > Data Management Tools > Raster > Raster Dataset.
    1. Insert the raster files.
    2. Select the output location.
    3. Specify a name and extension for the output.
    4. Specify the pixel type.
    5. Specify the number of bands.
  3. Run the tool.

The following image shows the output of a merged raster:

Last Published: 4/9/2021

Article ID: 000015258