Connecting Datasets: Joins & Advanced Filtering

Real-world analyses rarely rely on a single dataset. We might need to pair each Landsat image with the closest temperature reading, or attach county names to satellite scenes. Think of it like matching rows between two spreadsheets on a shared column: that's exactly what joins do, but at scale.

Learning objectives

  • Use ee.Join to combine collections by a shared key.
  • Apply ee.Filter.maxDifference() for temporal matching and ee.Filter.equals() for matching on a shared property.
  • Join ImageCollections with FeatureCollections to attach spatial attributes.
  • Build complex multi-condition filters using ee.Filter.and() and ee.Filter.or().

Why it matters

Climate studies combine satellite imagery with weather station records. Land cover maps need census boundaries. Without joins, we'd have to manually match records one by one (and nobody wants that).

Joins automate this matching, making your workflow scalable and reproducible. Once you understand them, you'll use them everywhere.

Key vocabulary

Join
An operation that combines two collections by matching elements based on a shared condition (property, date, or location).
Primary collection
The collection that receives matched elements from the secondary collection.
Secondary collection
The collection whose elements are matched and attached to the primary.
Join condition
The filter that defines how elements from the two collections should be matched (for example, by date or by shared property).

Your first join: Landsat meets MODIS temperature

Let's see it in action. This script joins each Landsat 8 image with the closest MODIS land surface temperature composite acquired within 7 days. Each Landsat scene gets a matching MODIS image attached as a property, so we can analyze surface reflectance alongside temperature for approximately the same date.

// Define study area near Gainesville, FL
var point = ee.Geometry.Point([-82.35, 29.65]);

// Load Landsat 8 SR for 2023
var landsat = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(point)
  .filterDate('2023-01-01', '2024-01-01')
  .filter(ee.Filter.lt('CLOUD_COVER', 30));

// Load MODIS 8-day LST for 2023
var modisLST = ee.ImageCollection('MODIS/061/MOD11A2')
  .filterDate('2023-01-01', '2024-01-01');

// Define the join: keep the closest MODIS image within 7 days
var maxDiff = 7 * 24 * 60 * 60 * 1000; // 7 days in milliseconds
var timeFilter = ee.Filter.maxDifference({
  difference: maxDiff,
  leftField: 'system:time_start',
  rightField: 'system:time_start'
});

var join = ee.Join.saveBest({
  matchKey: 'closest_modis',
  measureKey: 'time_diff'
});

// Apply the join
var joined = join.apply(landsat, modisLST, timeFilter);
print('Joined collection size:', joined.size());

// Inspect the first result
var first = ee.Image(joined.first());
var matchedMODIS = ee.Image(first.get('closest_modis'));
print('Landsat date:', first.date().format('YYYY-MM-dd'));
print('Matched MODIS date:', matchedMODIS.date().format('YYYY-MM-dd'));

What you should see

The console prints the size of the joined collection (roughly 10 to 20 images depending on cloud cover) and the dates of the first Landsat/MODIS pair. Because MOD11A2 is an 8-day composite, the two dates will usually differ, but the MODIS date should be within 7 days of the Landsat date. Congratulations, you just paired two completely different satellite missions!

Think of it like VLOOKUP for satellite data

A join works like VLOOKUP in a spreadsheet. We have two tables (collections) and a shared column (property). For every row in the primary table, the join scans the secondary table, finds the matching entry, and attaches it.

In GEE, the "tables" are ImageCollections or FeatureCollections, and the "column" is typically a timestamp or a shared ID. Every join in GEE requires three ingredients:

  1. A join type that defines what to keep (saveFirst, saveBest, saveAll, or inner).
  2. A filter condition that specifies how elements match (by date, property, or spatial overlap).
  3. The .apply() call that combines the primary collection, secondary collection, and filter.
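
To make the VLOOKUP analogy concrete, here is a minimal sketch in plain JavaScript, with arrays standing in for collections. It is a conceptual model only, not how Earth Engine executes joins on its servers; the function name saveFirstModel and the toy scene and county rows are made up for illustration. The function plays the role of the join type, the matches argument is the filter condition, and the call at the bottom corresponds to .apply().

```javascript
// Conceptual model only: plain arrays stand in for Earth Engine collections.
// saveFirstModel is a hypothetical helper, not part of the Earth Engine API.
function saveFirstModel(primary, secondary, matches, matchKey) {
  var out = [];
  primary.forEach(function (p) {
    // Scan the secondary "table" and take the first row that satisfies the condition.
    var hit = secondary.filter(function (s) { return matches(p, s); })[0];
    if (hit !== undefined) {
      var copy = Object.assign({}, p); // leave the input row untouched
      copy[matchKey] = hit;            // attach the match under the chosen key
      out.push(copy);
    }
    // Rows without a hit are dropped in this model.
  });
  return out;
}

// Toy rows (illustrative values, not real scene metadata).
var scenes = [{id: 'A', county: '001'}, {id: 'B', county: '999'}];
var counties = [{fips: '001', name: 'Alachua'}, {fips: '003', name: 'Baker'}];

var joinedRows = saveFirstModel(scenes, counties, function (p, s) {
  return p.county === s.fips; // the "shared column"
}, 'county_match');

console.log(joinedRows.length);               // 1
console.log(joinedRows[0].county_match.name); // Alachua
```

Scene B has no county with a matching code, so it does not appear in the result. Keep that in mind when a joined collection turns out smaller than the primary collection.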

Which join type do I need?

GEE offers three main join types. Each answers a different question about how to handle matches. Let's break them down:

Join type | What it does | Use when | Unmatched primary elements
ee.Join.saveFirst() | Attaches the first matching element as a property | You want exactly one match per element and the first match (optionally by a chosen ordering) is good enough | Dropped by default; kept without the match property if outer: true
ee.Join.saveAll() | Attaches all matching elements as a list property | You want every match (all images within a time window) | Dropped by default; kept if outer: true
ee.Join.inner() | Creates pairs of matching elements | You only want elements that have a match in both collections | Removed from output

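There's also a fourth option: ee.Join.saveBest(). Like saveFirst(), it attaches a single match, but it picks the match with the best join measure (for ee.Filter.maxDifference(), the smallest difference) and can record that measure under measureKey. That's what we used in the first example above.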
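
The idea of a "best" match with a recorded distance can be sketched in plain JavaScript as follows. This is a conceptual model under stated assumptions, not the Earth Engine implementation: saveBestModel and the toy timestamps are hypothetical, the window is treated as inclusive, and the measure is stored on the matched element, which is how I understand measureKey to behave.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000; // one day in milliseconds

// Hypothetical helper: for each primary row, keep the secondary row with the
// smallest absolute time difference, provided it is within maxDiffMs.
function saveBestModel(primary, secondary, maxDiffMs, matchKey, measureKey) {
  var out = [];
  primary.forEach(function (p) {
    var best = null;
    secondary.forEach(function (s) {
      var diff = Math.abs(p.time - s.time);
      // Assumption: the window is inclusive (diff <= maxDiffMs).
      if (diff <= maxDiffMs && (best === null || diff < best[measureKey])) {
        best = Object.assign({}, s);
        best[measureKey] = diff; // record the "distance" on the match
      }
    });
    if (best !== null) {
      var copy = Object.assign({}, p);
      copy[matchKey] = best;
      out.push(copy);
    }
  });
  return out;
}

// Toy acquisition times (illustrative only). Month index 5 is June.
var landsatTimes = [{id: 'L1', time: Date.UTC(2023, 5, 10)}];
var modisTimes = [
  {id: 'M1', time: Date.UTC(2023, 5, 2)},  // 8 days earlier: outside the window
  {id: 'M2', time: Date.UTC(2023, 5, 7)},  // 3 days earlier
  {id: 'M3', time: Date.UTC(2023, 5, 12)}  // 2 days later: closest
];

var paired = saveBestModel(landsatTimes, modisTimes, 7 * DAY_MS,
                           'closest_modis', 'time_diff');

console.log(paired[0].closest_modis.id);                 // M3
console.log(paired[0].closest_modis.time_diff / DAY_MS); // 2
```

A first-match strategy would have returned M2 here, because it appears earlier in the list. That is the practical difference between taking the first match and taking the best one.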

Matching by date: the most common join

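The most common join in remote sensing is temporal: pairing images from two sensors that were acquired near the same date. Here's a shorter example using saveFirst(). Note that saveFirst() attaches the first MODIS image that satisfies the filter, which is not necessarily the closest in time; use saveBest() when you need the closest one.

// Join each Landsat 8 image with a MODIS LST image acquired within 7 days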
var point = ee.Geometry.Point([-82.35, 29.65]);

var landsat = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(point)
  .filterDate('2023-06-01', '2023-09-01');

var modisLST = ee.ImageCollection('MODIS/061/MOD11A2')
  .filterDate('2023-06-01', '2023-09-01');

// Match within 7 days
var timeFilter = ee.Filter.maxDifference({
  difference: 7 * 24 * 60 * 60 * 1000,
  leftField: 'system:time_start',
  rightField: 'system:time_start'
});

var joinedCol = ee.Join.saveFirst({
  matchKey: 'modis_match'
}).apply(landsat, modisLST, timeFilter);

print('Joined count:', joinedCol.size());
print('First element properties:', ee.Image(joinedCol.first()).propertyNames());

After the join, each Landsat image carries a new property called modis_match containing the matched MODIS image. To retrieve it, just use ee.Image(landsatImage.get('modis_match')).

When one filter isn't enough

Sometimes we need images that are both cloud-free and within a certain date range, or scenes from sensor A or sensor B. GEE gives us logical combinators for exactly these situations.

// Combine multiple conditions with ee.Filter.and()
var strictFilter = ee.Filter.and(
  ee.Filter.lt('CLOUD_COVER', 10),
  ee.Filter.gte('SUN_ELEVATION', 30),
  ee.Filter.date('2023-06-01', '2023-09-01')
);

var clean = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(ee.Geometry.Point([-82.35, 29.65]))
  .filter(strictFilter);

print('Strict filter count:', clean.size());

// Use ee.Filter.or() to accept images from multiple paths
var multiPath = ee.Filter.or(
  ee.Filter.eq('WRS_PATH', 17),
  ee.Filter.eq('WRS_PATH', 18)
);

var twoPaths = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterDate('2023-01-01', '2024-01-01')
  .filterBounds(ee.Geometry.Point([-82.35, 29.65]))
  .filter(multiPath);

print('Images from two paths:', twoPaths.size());
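
Combinators can also be nested, for example "low cloud cover AND (path 17 OR path 18)". The sketch below models that logic in plain JavaScript, treating each filter as a function that returns true or false for one image's metadata. The helper names (lt, eq, and, or) only mimic the Earth Engine filters of the same name, and the metadata rows are invented.

```javascript
// Conceptual model: a "filter" is a function from a metadata object to true/false.
function lt(name, value) {
  return function (img) { return img[name] < value; };
}
function eq(name, value) {
  return function (img) { return img[name] === value; };
}
function and() {
  var filters = Array.prototype.slice.call(arguments);
  return function (img) {
    return filters.every(function (f) { return f(img); });
  };
}
function or() {
  var filters = Array.prototype.slice.call(arguments);
  return function (img) {
    return filters.some(function (f) { return f(img); });
  };
}

// Nested condition: low cloud cover AND (path 17 OR path 18).
var lowCloudOnEitherPath = and(
  lt('CLOUD_COVER', 10),
  or(eq('WRS_PATH', 17), eq('WRS_PATH', 18))
);

// Invented metadata rows for illustration.
var sceneMetadata = [
  {id: 's1', CLOUD_COVER: 5, WRS_PATH: 17},
  {id: 's2', CLOUD_COVER: 5, WRS_PATH: 16},
  {id: 's3', CLOUD_COVER: 40, WRS_PATH: 18}
];

var kept = sceneMetadata
  .filter(function (img) { return lowCloudOnEitherPath(img); })
  .map(function (img) { return img.id; });

console.log(kept); // [ 's1' ]
```

Only s1 passes: s2 is on the wrong path and s3 is too cloudy.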

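We can also match on exact property values. ee.Filter.eq() compares a property against a constant, as in the examples above, while ee.Filter.equals() with leftField and rightField compares a property of the primary element with a property of the secondary element. That makes ee.Filter.equals() particularly useful in join conditions where we want to match on a shared ID rather than a timestamp.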
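
The difference between filtering one collection on a constant and matching two collections on a shared field can be modeled in a few lines of plain JavaScript. This is a sketch, not Earth Engine code: eqModel and equalsModel are hypothetical stand-ins, and county_fips is an invented property used only for illustration.

```javascript
// Property vs. constant: a one-argument test, used to filter a single collection.
function eqModel(name, value) {
  return function (element) { return element[name] === value; };
}

// Field vs. field: a two-argument test, used as a join condition.
function equalsModel(leftField, rightField) {
  return function (primary, secondary) {
    return primary[leftField] === secondary[rightField];
  };
}

var isFlorida = eqModel('STATEFP', '12');
var sameCounty = equalsModel('county_fips', 'GEOID');

// Invented rows: county_fips is a hypothetical property on the scene.
var county = {GEOID: '12001', STATEFP: '12'};
var sceneA = {id: 'A', county_fips: '12001'};
var sceneB = {id: 'B', county_fips: '13001'};

console.log(isFlorida(county));          // true
console.log(sameCounty(sceneA, county)); // true
console.log(sameCounty(sceneB, county)); // false
```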

Pairing images with vector data

Here's where it gets interesting: joins aren't limited to image-to-image matching. We can attach region properties (state name, ecoregion type) to images by joining an ImageCollection with a FeatureCollection. The key is finding a shared property or using a spatial filter.

// Join images with county boundaries using spatial overlap
var counties = ee.FeatureCollection('TIGER/2018/Counties')
  .filter(ee.Filter.eq('STATEFP', '12')); // Florida

var landsat = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(counties.geometry())
  .filterDate('2023-06-01', '2023-07-01')
  .filter(ee.Filter.lt('CLOUD_COVER', 20));

// Use intersects filter for spatial join
var spatialFilter = ee.Filter.intersects({
  leftField: '.geo',
  rightField: '.geo'
});

var spatialJoin = ee.Join.saveAll({
  matchesKey: 'overlapping_counties'
});

var withCounties = spatialJoin.apply(landsat, counties, spatialFilter);
print('Images with county info:', withCounties.size());

// Check how many counties overlap the first image
var firstImg = ee.Image(withCounties.first());
var matchedCounties = ee.List(firstImg.get('overlapping_counties'));
print('Counties overlapping first image:', matchedCounties.size());

Pro tips

  • Name your match keys clearly. Use descriptive names like 'closest_modis' or 'overlapping_counties' instead of generic names like 'match'. Future-you will thank you.
  • Filter before joining. Reduce both collections to only the elements you need. Joins on large collections can be slow.
  • Use saveBest() for "closest" matching. It picks the single best match and records the distance, which is ideal for temporal pairing.
  • Cast after retrieval. The matched element is stored as a generic object. Always cast it back: ee.Image(element.get('matchKey')).

Try it: join Sentinel-2 with ERA5 temperature

Let's pair Sentinel-2 images with ERA5-Land daily mean 2 m air temperature for the same location and time period. Use a point of your choice, a 3-month window, and a 1-day maxDifference.

// Starter code
var point = ee.Geometry.Point([-82.35, 29.65]);

var s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
  .filterBounds(point)
  .filterDate('2023-06-01', '2023-09-01')
  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20));

var era5 = ee.ImageCollection('ECMWF/ERA5_LAND/DAILY_AGGR')
  .filterDate('2023-06-01', '2023-09-01')
  .select('temperature_2m');

// YOUR CODE HERE:
// 1. Define a maxDifference filter (1 day)
// 2. Create a saveFirst or saveBest join
// 3. Apply the join
// 4. Print the matched date pair for the first element

Common mistakes

  • Forgetting the join condition. A join without a filter will fail. Always provide ee.Filter.maxDifference(), ee.Filter.equals(), or another filter to .apply().
  • Using wrong property names. The leftField and rightField must exist in the respective collections. Use .propertyNames() to verify before joining.
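  • Not handling unmatched elements. By default, saveFirst(), saveBest(), and saveAll() drop primary elements that have no match, so the output can be smaller than the input. Set outer: true to keep them; they will then lack a usable match, and casting the missing property (for example ee.Image(element.get('matchKey'))) will cause an error. Filter unmatched elements out first (for example with ee.Filter.notNull(['matchKey'])), or use inner() to keep only pairs.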
  • Confusing milliseconds and days. system:time_start is stored in milliseconds, so the difference argument of ee.Filter.maxDifference() must be in milliseconds too. One day = 24 * 60 * 60 * 1000 = 86,400,000 ms.
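
One way to avoid the unit mix-up is to name the conversion once and reuse it. The helpers below are plain JavaScript with hypothetical names (daysToMillis, millisToDays); they are a convenience sketch, not part of the Earth Engine API.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000

// Convert a window expressed in days to milliseconds.
function daysToMillis(days) {
  return days * MS_PER_DAY;
}

// Convert a recorded difference in milliseconds back to days for reporting.
function millisToDays(ms) {
  return ms / MS_PER_DAY;
}

console.log(daysToMillis(1));         // 86400000
console.log(daysToMillis(7));         // 604800000
console.log(millisToDays(172800000)); // 2
```

The value of daysToMillis(7) could be passed as the difference argument in the examples above in place of the inline 7 * 24 * 60 * 60 * 1000.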

Quick self-check

  1. What are the three ingredients every GEE join requires?
  2. When would you use saveAll() instead of saveFirst()?
  3. How do you retrieve the matched element after a saveFirst() join?
  4. What happens to unmatched elements in an inner() join?
  5. Why should you filter collections before applying a join?

Going deeper

This module introduces joins at a foundational level, focusing on temporal and spatial matching. For advanced join techniques and large-scale filtering workflows, see:

  • EEFA Book - Chapter F4.0: Filter, Map, Reduce
  • EEFA Book - Chapter F5.3: Advanced Vector Operations

What's next?