Chapter 11

Image Classification

Turning pixels into information. Learn how computer algorithms categorize satellite data to track urbanization, deforestation, and water resources.

At a Glance

Prereqs: Chapters 09-10
Time: 30 min read + 35 min practice
Deliverable: Confusion matrix + short interpretation

Learning outcomes

  • Explain the difference between training and validation data.
  • Describe what a confusion matrix measures and how to read it.
  • Identify one likely source of classification error and propose a fix.

Key terms

training data, validation, classifier, features, confusion matrix, overfitting

Stop & check

  1. Why should training and validation samples be separate?

    Answer: To test generalization honestly.

    Why: Re-using training data inflates accuracy and hides overfitting.

    Common misconception: A higher accuracy number automatically means a better map; the quality of the evaluation matters as much as the score.

  2. What does high confusion between two classes usually mean?

    Answer: Their spectral/feature signatures overlap in your data.

    Why: The model cannot separate them with the current features/training.

    Common misconception: That the algorithm is bad; often it is the labels or features that need improvement.

Try it (5 minutes)

  1. Name two classes that might be confused in your region (e.g., bare soil vs urban).
  2. List one extra feature you would add (SWIR, NDVI, texture, elevation).

Lab (Two Tracks)

Both tracks produce the same deliverable: a classified map plus a brief accuracy assessment statement.

Desktop GIS Track (ArcGIS Pro / QGIS)

Create training polygons, run a supervised classification, and evaluate accuracy with a validation sample.

Remote Sensing Track (Google Earth Engine)

Train a classifier (e.g., random forest), classify an image, and compute an error matrix. Interpret two key errors.
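If you want to rehearse the Remote Sensing Track workflow offline before touching Earth Engine, the same train-classify-evaluate loop can be sketched with scikit-learn. Everything below is synthetic and hypothetical: the band values, class names, and sample counts are made up for illustration, not taken from this chapter's lab data.

```python
# Offline sketch of the "train RF -> classify -> error matrix" workflow.
# All band reflectances and classes here are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(42)

# Fake "pixels": 4 spectral bands for three classes (water, forest, urban)
n = 100
water  = rng.normal([0.05, 0.04, 0.03, 0.02], 0.01, (n, 4))
forest = rng.normal([0.03, 0.06, 0.05, 0.30], 0.02, (n, 4))
urban  = rng.normal([0.15, 0.14, 0.15, 0.20], 0.03, (n, 4))

X = np.vstack([water, forest, urban])
y = np.repeat([0, 1, 2], n)          # 0 = water, 1 = forest, 2 = urban

# Train on half the samples, validate on the other half (no overlap)
idx = rng.permutation(len(y))
train, val = idx[:150], idx[150:]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[train], y[train])

# Error matrix: rows = reference (truth), columns = classified result
cm = confusion_matrix(y[val], clf.predict(X[val]))
print(cm)
overall = np.trace(cm) / cm.sum()    # correct pixels / all validation pixels
print(f"Overall accuracy: {overall:.2f}")
```

In Earth Engine the equivalent steps are sampling training regions, calling a classifier such as random forest, and requesting an error matrix; the logic is the same even though the API differs.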

Common mistakes

  • Train/validation leakage (same polygons used for both).
  • Class imbalance (too few samples for minority classes).
  • Mixing seasons/dates so the same class looks different.
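The first mistake above (train/validation leakage) is easy to commit accidentally when several pixels come from the same training polygon: a random per-pixel split can put near-identical neighbors on both sides. A polygon-aware split avoids this. The sketch below uses scikit-learn's `GroupShuffleSplit` on synthetic data; the polygon IDs and pixel counts are invented for illustration.

```python
# Sketch: polygon-aware train/validation split to avoid leakage.
# Splitting by polygon (not by pixel) keeps all pixels from one
# training polygon on the same side of the split. Data is synthetic.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))              # 60 pixels, 4 bands
polygon_id = np.repeat(np.arange(12), 5)  # 5 pixels per polygon

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=polygon_id))

# No polygon contributes pixels to both sets
assert set(polygon_id[train_idx]).isdisjoint(polygon_id[val_idx])
print(len(train_idx), "training pixels /", len(val_idx), "validation pixels")
```

The same principle applies in a desktop GIS: digitize separate polygons for training and validation rather than splitting one polygon's pixels.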

Further reading: https://gistbok-topics.ucgis.org/UCGIS

🤖 The Challenge of Interpretation

A satellite image is just a grid of numbers. Image Classification is the process of organizing those pixels into meaningful groups—turning a picture of a forest into a digital "Forest" class that we can use for calculation.

Key Lesson: We classify based on spectral similarities. If a rooftop and a highway reflect light the same way, the computer will likely put them in the same class unless we provide extra context.

Critical GIS: The Power of Categories

Classification is an act of power. When an algorithm labels a neighborhood as "Slum" versus "Informal Settlement," it affects property rights and policy. A computer doesn't know "context"—it only knows the samples we teach it. If our training data is biased (e.g., only sampling wealthy neighborhoods), the resulting map will codify that bias into "data."

Supervised vs. Unsupervised

There are two primary ways to train a computer to recognize land cover:

  • Unsupervised: The computer groups similar pixels into clusters automatically. The analyst later identifies what those clusters actually represent.
  • Supervised: The analyst provides Training Samples (examples) for each class first, and the computer maps the rest of the image based on those samples.
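The two approaches above can be contrasted in a few lines of code. In this sketch (synthetic data; the "dark" and "bright" pixel values are invented), the unsupervised route clusters first and leaves labeling to the analyst, while the supervised route starts from analyst-provided labels.

```python
# Sketch: unsupervised clustering vs. supervised classification
# on the same synthetic "pixels". Values are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
dark   = rng.normal(0.05, 0.01, (50, 4))   # e.g., water-like pixels
bright = rng.normal(0.30, 0.02, (50, 4))   # e.g., urban-like pixels
X = np.vstack([dark, bright])

# Unsupervised: the computer groups pixels into clusters; the analyst
# must still decide what each cluster label means afterwards.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the analyst supplies class labels (training samples) first,
# then the classifier assigns the rest of the image.
y = np.repeat([0, 1], 50)                  # analyst-provided labels
pred = RandomForestClassifier(random_state=0).fit(X, y).predict(X)
```

Note that the cluster numbers (0, 1) carry no meaning until you interpret them, whereas the supervised labels mean exactly what you taught.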

Teaching the Machine: The "Shape Sorter" Analogy

We often talk about "AI" as if it's magic. It's not. It's just a baby with a shape-sorter toy.

Supervised Classification

The Method: You (the parent) hold up a square block and say "This is a SQUARE." You hold up a circle and say "This is a CIRCLE." Then you let the baby (AI) sort the rest based on your rules.

The Risk: If you accidentally call a pentagon a "square," the baby will force every pentagon into the square hole forever. This is Human Bias.

Unsupervised Classification

The Method: You dump the toys on the floor and walk away. The baby looks at them and says, "These look similar, I'll put them in a pile. These others look weird, I'll put them in another pile."

The Result: The computer groups pixels purely by Spectral Similarity.


Summary of Big Ideas

  • Land Use is how humans use the land; Land Cover is what is physically there.
  • Object-Based classification looks at shapes and textures, not just individual pixels.
  • Confusion Matrices are used to measure the accuracy of a classified map.
  • Multi-temporal analysis allows us to track changes over time (Land Change Science).
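To make the confusion-matrix idea concrete, the sketch below reads a hypothetical 3-class matrix (the numbers are invented, not from this chapter's lab). Beyond overall accuracy, remote sensing practice also reports per-class producer's accuracy (how much of each reference class was found) and user's accuracy (how trustworthy each mapped class is); the chapter does not define these terms, so treat this as supplementary.

```python
# Sketch: reading a confusion matrix (rows = reference, cols = classified).
# The matrix values below are hypothetical, chosen for illustration.
import numpy as np

classes = ["water", "forest", "urban"]
cm = np.array([[48,  1,  1],    # reference water
               [ 2, 40,  8],    # reference forest (8 mislabeled urban)
               [ 0,  9, 41]])   # reference urban  (9 mislabeled forest)

overall   = np.trace(cm) / cm.sum()        # all correct / all pixels
producers = np.diag(cm) / cm.sum(axis=1)   # per reference class (omission)
users     = np.diag(cm) / cm.sum(axis=0)   # per mapped class (commission)

print(f"Overall accuracy: {overall:.2f}")
for name, p, u in zip(classes, producers, users):
    print(f"{name}: producer's {p:.2f}, user's {u:.2f}")
```

Here the forest/urban confusion (8 and 9 off-diagonal pixels) is exactly the kind of "two key errors" the lab asks you to interpret: those classes' feature signatures overlap in this made-up dataset.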

Chapter 11 Checkpoint

1. Which classification method requires the user to provide "Training Samples"?

Supervised Classification
Unsupervised Classification

2. After classification, you compare your map to high-resolution data to find the "Accuracy." The table used for this is called a:

Attribute Table
Confusion Matrix

📚 Chapter Glossary

Training Sample: A polygon drawn by the analyst over a known land cover type (e.g., "Water") to teach the computer what that class "looks like" spectrally.
Confusion Matrix: A table used to assess the accuracy of a classification. It compares the "Classified" result against "Ground Truth" data.
Land Cover vs. Land Use: Land Cover is the physical material on the surface (e.g., Grass). Land Use is how humans utilize it (e.g., Golf Course).

BoK Alignment

Topics in the UCGIS GIS&T Body of Knowledge that support this chapter.