Chapter 20

Research Data Management

Architecture for science. Learn how to structure your GIS projects to ensure that your analysis is reproducible, authoritative, and built to last.

At a Glance

Prereqs: Chapters 16-19
Time: 25 min read + 35 min cleanup
Deliverable: Project README + metadata

Learning outcomes

  • Organize a GIS/RS project for reproducibility.
  • Write a minimal README and metadata record.
  • Choose an appropriate license and citation format.

Key terms

metadata, provenance, DOI, licensing, reproducibility, versioning

Stop & check

  1. Why is CRS information part of metadata?

    Answer: Because coordinates and measurements depend on CRS.

    Why: Without CRS, data can be misaligned or mis-measured.

    Common misconception: "A shapefile knows its CRS." In practice the CRS lives in a separate .prj sidecar file that is easily lost, so it often does not travel reliably without metadata.

  2. What is a good alternative to 'final_v7_reallyfinal'?

    Answer: Semantic versioning and a changelog (or dated releases).

    Why: Clear versioning supports replication and comparison.

    Common misconception: "I will remember later." Project history becomes unreadable fast.
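
Both answers above can be captured in one small metadata record. The sketch below is a minimal illustration, not a standard: the field names (title, source, crs, acquired, version) and all values are hypothetical placeholders, and the record does not follow ISO 19115 or any other schema. It only shows the idea of stating the CRS and a version explicitly and checking that nothing required is missing.

```python
import json

# Hypothetical field names chosen for illustration; not taken from any metadata standard.
REQUIRED_FIELDS = ("title", "source", "crs", "acquired", "version")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values throughout; replace them with your own project details.
record = {
    "title": "Example flood zone layer",
    "source": "Placeholder data provider",
    "crs": "EPSG:4326",       # stated explicitly rather than assumed
    "acquired": "2026-05-12",
    "version": "1.0.0",       # a semantic version instead of 'final_v7_reallyfinal'
}

print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

Saving a record like this next to the dataset means the CRS and version travel with the data even if a sidecar file is lost.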

Try it (5 minutes)

  1. Create a folder structure: data/raw, data/processed, outputs, scripts, docs.
  2. Write 5 README bullets describing data sources and processing steps.
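
If you prefer to script step 1, the following is one possible sketch using only the Python standard library. The function name scaffold, the file name README.md, and the five stub bullets are assumptions made for illustration; adapt them to your own project.

```python
from pathlib import Path

# Folder names taken from the exercise above.
FOLDERS = ["data/raw", "data/processed", "outputs", "scripts", "docs"]

# Hypothetical README stub: five bullets to fill in by hand.
README_STUB = """Project title

- Data sources:
- Acquisition dates:
- CRS:
- Processing steps:
- Software and versions:
"""


def scaffold(root):
    """Create the project folders under root and add a README stub if none exists."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True builds nested paths; exist_ok=True makes reruns harmless.
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Never overwrite a README that someone has already edited.
        readme.write_text(README_STUB, encoding="utf-8")
    return root
```

Calling scaffold("my_gis_project") creates the folders in the current directory. Running it a second time changes nothing, so it is safe to keep in the scripts folder.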

Lab (Two Tracks)

Both tracks produce the same deliverable: a clean project structure plus a README that another student can follow to rerun the work.

Desktop GIS Track (ArcGIS Pro / QGIS)

Package a small project with inputs, outputs, and a methods log. Export one final figure.

Remote Sensing Track (Google Earth Engine)

Document your script inputs (AOI, dates, bands), export outputs, and write a methods + provenance note.
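
One way to keep such a note is to write the inputs and outputs to a small JSON file alongside the export. The sketch below is plain Python and does not call the Earth Engine API. The function name provenance_note, the key names, and every value (paths, dates, band names, version) are hypothetical placeholders.

```python
import json
from datetime import datetime, timezone


def provenance_note(aoi, start_date, end_date, bands, outputs, script_version):
    """Bundle script inputs and outputs into one dictionary that can be saved as JSON."""
    return {
        "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "script_version": script_version,
        "inputs": {
            "aoi": aoi,
            "date_range": [start_date, end_date],
            "bands": list(bands),
        },
        "outputs": list(outputs),
    }


# All values below are placeholders for illustration only.
note = provenance_note(
    aoi="docs/aoi_placeholder.geojson",
    start_date="2025-06-01",
    end_date="2025-08-31",
    bands=["B4", "B8"],
    outputs=["outputs/composite_placeholder.tif"],
    script_version="0.1.0",
)
print(json.dumps(note, indent=2))
```

A note like this answers the questions a reader will ask first: which area, which dates, which bands, and which version of the script produced the file.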

Common mistakes

  • Editing raw data instead of writing processed outputs.
  • Not recording acquisition dates or processing level.
  • Sharing outputs without listing parameters and versions.
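
The first mistake is easier to catch if you record a checksum for every raw file when you download it. The following is a minimal sketch using the Python standard library. The function names (sha256_of, build_manifest, changed_files) are hypothetical, and the approach is one option among several, not a required workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of one file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(raw_dir):
    """Map each file under raw_dir (as a relative path) to its checksum."""
    raw_dir = Path(raw_dir)
    return {
        path.relative_to(raw_dir).as_posix(): sha256_of(path)
        for path in sorted(raw_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(raw_dir, manifest):
    """List files that were edited, added, or removed since the manifest was built."""
    current = build_manifest(raw_dir)
    names = set(current) | set(manifest)
    return sorted(name for name in names if current.get(name) != manifest.get(name))
```

Build the manifest once, right after download, and store it in the docs folder. An empty list from changed_files later on is evidence that the raw data is still untouched.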

Further reading: https://www.ucgis.org/site/gis-t-body-of-knowledge

🔬 GIS is Research

Professional GIS analysts spend up to 80% of their time on data management. If your folder structure is a mess, your analysis will likely be a mess too. Research Data Management (RDM) is the practice of architecting your files and metadata to survive for the long term.

🎨 GIS as an Art: Elegant Architecture

"Code is poetry," and a well-structured file system is a form of functional art. When you open a project folder and everything is perfectly named, nested, and documented, it creates a sense of calm and clarity. This "elegance" isn't just aesthetic; it reduces cognitive load and makes the science better.

Geographic Inquiry: Asking Questions of Where

Before ever opening software, a GIS analyst starts with a question. "Where" is not just a coordinate; it is a relationship.

  • Concentration: Where is the phenomenon clustered?
  • Boundary: Where does it change sharply vs. gradually?
  • Uncertainty: Where is the data missing or biased?
  • Verification: Where would you stand on the ground to prove it?

📂 The Standard Project Structure

A professional GIS project should flow in one direction, from raw data to the final product. Numbered folders make that order visible:

  • 01_Original_Data: Read-only downloads. Never edit these!
  • 02_Scratch: Temporary files such as clipped layers or buffer results.
  • 03_Analysis: Your geodatabases and processing models.
  • 04_Final_Map: Only the authoritative maps for the client.

Critical GIS: The Cost of "Open"

The Open Data movement has democratized science, but it also creates risks. Poachers use open tracking data to hunt endangered species. Looters use open archaeological maps to find sites. "Open by Default" is not always ethical. Sometimes, data must be closed or obfuscated to protect the vulnerable.

🀝 Interdisciplinary GIS: Library Science

Research Data Management is where GIS meets Library Science. We act as digital librarians, curating spatial data, assigning DOIs (Digital Object Identifiers), and ensuring metadata standards (like ISO 19115) are met. Without these library skills, our maps would be unfindable in the vast digital ocean.

Summary of Big Ideas

  • Metadata: Data about data. It explains the 5 Ws: Who, What, When, Where, Why.
  • File Naming Conventions: Avoid spaces and generic names like "final_v2.shp". Use dates and snake_case (e.g., 2026_05_12_austin_flood_zone.gdb).
  • Data Provenance: The ability to trace a piece of data back to its original source and lineage.
  • Longevity: Open, standard formats like GeoPackage (.gpkg) make it far more likely that your data can still be read in 20 years.
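
The file naming convention above can be checked automatically. The sketch below assumes one strict reading of it: a YYYY_MM_DD date prefix, lowercase snake_case words, and a single extension. The pattern and the function name follows_convention are illustrative assumptions, and the check does not verify that the date is a real calendar date.

```python
import re

# Assumed pattern: date prefix, snake_case words, one lowercase extension.
NAME_PATTERN = re.compile(r"\d{4}_\d{2}_\d{2}_[a-z0-9]+(?:_[a-z0-9]+)*\.[a-z0-9]+")


def follows_convention(filename):
    """Return True if the whole file name matches the assumed naming pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in ["2026_05_12_austin_flood_zone.gdb", "final_v2.shp", "Flood Zones FINAL.gpkg"]:
    print(name, "->", follows_convention(name))
```

Only the first name passes. The second has no date prefix, and the third contains spaces and capital letters.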

Chapter 20 Checkpoint

1. Why should the "Original Data" folder be kept separate from the analysis?

  • To ensure the raw sources remain clean and can be re-accessed if a mistake is made.
  • To save space on the computer.

2. In a professional GIS report, the section explaining who created the data and when is called:

  • Annotation
  • Metadata

📚 Chapter Glossary

Metadata: Structured information that describes, explains, locates, or makes it easier to retrieve, use, or manage an information resource (data about data).

Reproducibility: The ability of an entire analysis to be duplicated, either by the same researcher or by someone else working independently, by following the same steps.

Data Provenance: The documentation of where a piece of data comes from and the processes and methodology by which it was produced (lineage).

BoK Alignment

Topics in the UCGIS GIS&T Body of Knowledge that support this chapter.