How LLMs Work: Free AI APIs for Web Developers
Understanding the technology behind Large Language Models and the free tools you will use to build AI-powered Earth Observation applications.
The Evolution of the Web: Read, Write, Understand
The web has evolved through three fundamental paradigms. You are building applications for the third one.
Why AI Matters for Earth Observation
Satellite programs like Copernicus Sentinel produce over 12 terabytes of data every day. No human can manually browse, filter, and interpret this volume. AI changes the interface from "search and download" to "ask and understand."
- Natural Language Access: Instead of writing complex API queries, ask: "Show me NDVI changes in the Alsace region over the last 3 months."
- Automated Interpretation: LLMs can explain what spectral band combinations mean, describe land cover changes, and summarize trends in plain language.
- Multi-modal Reasoning: Modern models (like Gemini) can look at a satellite image and describe what they see: urban areas, water bodies, vegetation health.
- Democratization: Non-specialists (policymakers, farmers, journalists) gain access to insights previously requiring GIS expertise.
What You Will Build Today
By the end of this morning session, you will understand how LLMs work and which free APIs are available. This afternoon, you will build your first AI chat interface using these tools.
Your First AI-Powered App Architecture
User types a question → app formats the prompt → Gemini / Groq / Puter → response displayed to user
Today's learning path:
- This morning: Theory. How LLMs work, which APIs to use, prompt engineering techniques.
- This afternoon: Practice. Build a working AI chat interface in a single HTML file.
- Rest of the course: Integrate AI with your EO projects (satellite data analysis, map interactions, report generation).
What is a Neural Network?
At the core of every LLM is a neural network: a mathematical function that transforms input data through layers of computation to produce an output.
Simplified Neural Network Architecture
Raw text tokens → Pattern detection → Abstract reasoning → Next token prediction
- Each layer contains neurons (mathematical units) connected by weights (learned parameters).
- During training, the network adjusts millions (or billions) of weights to minimize prediction errors.
- The "deep" in "deep learning" refers to the many hidden layers stacked between input and output.
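As a rough illustration of layers, neurons, and weights, the sketch below runs a forward pass through a tiny two-layer network. The weights are made-up numbers chosen for illustration (a real network learns them during training), and `relu`, `denseLayer`, and `forward` are hypothetical helper names.

```javascript
// ReLU activation: negative values become 0, positive values pass through.
function relu(x) {
  return Math.max(0, x);
}

// One dense layer: each neuron computes a weighted sum of the inputs plus a bias,
// then applies an activation function.
function denseLayer(inputs, weights, biases, activation) {
  return weights.map((neuronWeights, i) => {
    const sum = neuronWeights.reduce((acc, w, j) => acc + w * inputs[j], biases[i]);
    return activation(sum);
  });
}

// Illustrative weights only (assumption): 2 inputs -> 2 hidden neurons -> 1 output.
const hiddenWeights = [[0.5, -0.2], [0.1, 0.8]];
const hiddenBiases = [0.0, 0.1];
const outputWeights = [[1.0, -1.0]];
const outputBiases = [0.2];

// Forward pass: input -> hidden layer (ReLU) -> output layer (linear).
function forward(inputs) {
  const hidden = denseLayer(inputs, hiddenWeights, hiddenBiases, relu);
  return denseLayer(hidden, outputWeights, outputBiases, (x) => x);
}

console.log(forward([1.0, 2.0]));
```

Training would adjust the numbers in `hiddenWeights` and `outputWeights` so that the output moves closer to the desired value; an LLM does the same thing with billions of weights.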
The Transformer Architecture
In 2017, researchers at Google introduced the Transformer, a new neural network architecture that revolutionized natural language processing [1]. Its key innovation: self-attention.
- Before Transformers: Models processed text sequentially, one word at a time (RNNs, LSTMs). This was slow and struggled with long-range dependencies.
- The Transformer Breakthrough: Process all words in parallel. Use "attention" to let every word look at every other word simultaneously.
- Why it matters: The sentence "The satellite captured an image of the river bank" requires understanding that "bank" means "shoreline," not "financial institution." Attention enables this contextual understanding.
The paper that started it all
📰 "Attention Is All You Need"
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin (2017)
Every modern LLM (GPT, Gemini, Llama, Mistral) is built on this architecture.
Key Concepts: Tokens, Embeddings, Attention
🎲 Tokens
LLMs do not read words. They read tokens: sub-word units. The word "understanding" might become ["under", "stand", "ing"]. Roughly 4 characters = 1 token.
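To make sub-word splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a simplification: production tokenizers use learned schemes such as byte-pair encoding, and the vocabulary below is a tiny hypothetical one invented for this example.

```javascript
// Greedy longest-match tokenizer over a fixed vocabulary (toy illustration).
function tokenize(word, vocabulary) {
  const tokens = [];
  let start = 0;
  while (start < word.length) {
    let end = word.length;
    // Shrink the candidate until it matches a vocabulary entry.
    while (end > start && !vocabulary.has(word.slice(start, end))) {
      end--;
    }
    // Unknown character: fall back to a single-character token.
    if (end === start) {
      end = start + 1;
    }
    tokens.push(word.slice(start, end));
    start = end;
  }
  return tokens;
}

// Hypothetical vocabulary (assumption, not taken from any real model).
const toyVocabulary = new Set(["under", "stand", "ing", "sat", "ellite"]);

console.log(tokenize("understanding", toyVocabulary)); // [ 'under', 'stand', 'ing' ]
```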
📈 Embeddings
Each token is converted into a high-dimensional vector (a list of numbers). Similar words have similar vectors. "Satellite" and "spacecraft" will be close in embedding space; "satellite" and "banana" will be far apart.
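"Close in embedding space" is usually measured with cosine similarity. The sketch below uses three-dimensional toy vectors invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated, -1 means opposite.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" (assumption: values made up for this example).
const toyEmbeddings = {
  satellite: [0.9, 0.8, 0.1],
  spacecraft: [0.85, 0.75, 0.2],
  banana: [0.1, 0.05, 0.95],
};

const near = cosineSimilarity(toyEmbeddings.satellite, toyEmbeddings.spacecraft);
const far = cosineSimilarity(toyEmbeddings.satellite, toyEmbeddings.banana);
console.log(near > far); // true
```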
🎯 Attention Mechanism
The model computes how relevant each token is to every other token. In "The NDVI index measures vegetation health," attention links "NDVI" strongly to "vegetation" and "health."
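The relevance scores can be sketched as scaled dot products passed through a softmax. This is a deliberately reduced version: a real Transformer applies learned query, key, and value projections and runs many attention heads in parallel. The token vectors are invented for illustration.

```javascript
// Softmax turns raw scores into positive weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

function dot(a, b) {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

// Attention weights of one query token over a list of key tokens:
// softmax(q . k / sqrt(d)) for every key k.
function attentionWeights(query, keys) {
  const scale = Math.sqrt(query.length);
  return softmax(keys.map((k) => dot(query, k) / scale));
}

// Toy token vectors (assumption: hand-picked so related words point the same way).
const tokens = ["NDVI", "measures", "vegetation", "health"];
const vectors = [
  [1.0, 0.9, 0.0],
  [0.0, 0.1, 1.0],
  [0.9, 1.0, 0.1],
  [0.8, 0.7, 0.2],
];

const weights = attentionWeights(vectors[0], vectors); // query = "NDVI"
tokens.forEach((t, i) => console.log(t, weights[i].toFixed(3)));
```

With these toy vectors, "NDVI" puts more weight on "vegetation" and "health" than on "measures", which mirrors the linking described above.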
📂 Context Window
The maximum number of tokens a model can process at once. This includes both your input (prompt) and the model's output (response).
| Model | Context Window |
|---|---|
| GPT-4o | 128K tokens |
| Gemini 2.0 Flash | 1M tokens |
| Llama 3 (Groq) | 8K - 128K tokens |
| Claude 4 | 200K tokens |
How Training Works: From Raw Text to Intelligence
LLMs learn in stages. Understanding these stages helps you understand why they behave the way they do.
Pre-training (predict the next token on trillions of words from the internet) → Fine-tuning (train on curated Q&A pairs and instruction data) → RLHF (human feedback ranks outputs; the model learns preferences)
- Pre-training: The model reads billions of web pages, books, and code. It learns grammar, facts, reasoning patterns, and (unfortunately) biases. This is the most expensive stage, costing millions of dollars in compute.
- Fine-tuning (Instruction Tuning): The model is trained on examples of "given this instruction, produce this response." This is what makes it follow your prompts instead of just completing random text.
- RLHF (Reinforcement Learning from Human Feedback): Humans rate model outputs. The model learns to prefer helpful, harmless, and honest responses [2].
Why LLMs Can "Understand" Instructions
A raw pre-trained model is like a student who has read every book in the library but has never been taught how to answer questions. Instruction tuning teaches the model to be a helpful assistant.
Before instruction tuning:
// Output: "NDVI is a vegetation index
// that was first described by Rouse
// et al. in 1973 and has been
// widely used in remote sensing..."
// (continues rambling indefinitely)
After instruction tuning + RLHF:
// Output: "NDVI (Normalized Difference
// Vegetation Index) measures plant
// health using the formula:
// (NIR - Red) / (NIR + Red)
// Values range from -1 to +1."
The Scaling Law: Bigger = More Capable
Research has shown a predictable relationship: as you increase model size (parameters), training data, and compute, model capability improves smoothly, roughly linearly in the logarithm of scale [2].
| Model | Parameters | Release | Key Capability |
|---|---|---|---|
| GPT-2 | 1.5 billion | 2019 | Coherent paragraphs |
| GPT-3 | 175 billion | 2020 | Few-shot learning |
| Llama 3 | 8B - 405B | 2024 | Open-source, competitive with GPT-4 |
| Gemini 2.0 | Undisclosed | 2025 | Multimodal (text + images + video) |
| Mixtral 8x7B (MoE) | 46.7B (8 experts) | 2023 | Mixture-of-Experts efficiency |
Google Gemini API
The Gemini API is Google's multimodal AI, capable of understanding text, images, audio, and video in a single request. This makes it ideal for Earth Observation tasks where you need to analyze satellite imagery alongside textual data.
- Multimodal: Send a satellite image + text prompt, get analysis back
- 1M token context window: Process entire documents or long conversations
- Free tier: 1,500 requests/day (Gemini 2.0 Flash)
- Best for: Image understanding, long-document analysis, complex reasoning
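A multimodal request can be sketched as a single JSON body that carries both the text prompt and a base64-encoded image. The endpoint path, the `x-goog-api-key` header, and the `inline_data` field names below reflect the public REST documentation as best understood at the time of writing; treat them as assumptions and check ai.google.dev before relying on them. `buildGeminiRequest` and `extractGeminiText` are hypothetical helper names.

```javascript
// Assumed base URL for the Gemini REST API (verify against ai.google.dev).
const GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models";

// Build the URL and fetch options for a text (+ optional PNG image) request.
function buildGeminiRequest(apiKey, promptText, imageBase64, model = "gemini-2.0-flash") {
  const parts = [{ text: promptText }];
  if (imageBase64) {
    parts.push({ inline_data: { mime_type: "image/png", data: imageBase64 } });
  }
  return {
    url: `${GEMINI_BASE_URL}/${model}:generateContent`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({ contents: [{ parts }] }),
    },
  };
}

// Pull the generated text out of a response; returns "" if the shape is unexpected.
function extractGeminiText(responseJson) {
  const parts = responseJson?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("");
}

// Usage sketch (browser or Node 18+, inside an async function):
//   const { url, options } = buildGeminiRequest(API_KEY, "Describe the land cover.", base64Png);
//   const reply = extractGeminiText(await (await fetch(url, options)).json());
```

Keeping request building separate from the network call makes it easy to log or inspect exactly what is sent, which helps when debugging quota or formatting errors.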
Groq: Ultra-Fast Inference
Groq runs open-source models (Llama 3, Mixtral) on custom LPU hardware, delivering responses up to 10x faster than typical cloud providers. Speed matters for interactive applications.
- Speed: 500+ tokens/second (vs. ~80 for GPT-4)
- Free tier: 14,400 requests/day, 6,000 tokens/minute
- Models: Llama 3 (8B, 70B), Mixtral 8x7B, Gemma 2
- Best for: Real-time chat, quick lookups, high-volume requests
- OpenAI-compatible API: Easy to swap between providers
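Because the API follows the OpenAI chat-completions format, switching providers can be as small as changing a base URL and a model name. The sketch below assumes the base URL `https://api.groq.com/openai/v1` and the standard `choices[0].message.content` response shape; the model identifier in the usage comment is a placeholder, so check the provider's current model list. `chatCompletion` is a hypothetical helper, and `fetchImpl` exists only so the function can be exercised without a network call.

```javascript
// Assumed base URLs for two OpenAI-compatible providers.
const GROQ_BASE_URL = "https://api.groq.com/openai/v1";
const OPENAI_BASE_URL = "https://api.openai.com/v1";

// Send a chat request to any OpenAI-compatible endpoint and return the reply text.
async function chatCompletion({
  baseUrl,
  apiKey,
  model,
  messages,
  temperature = 0.7,
  fetchImpl = globalThis.fetch,
}) {
  const response = await fetchImpl(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, temperature }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// Usage sketch (inside an async function; the model id is a placeholder):
//   const reply = await chatCompletion({
//     baseUrl: GROQ_BASE_URL,
//     apiKey: GROQ_API_KEY,
//     model: "your-model-id",
//     messages: [{ role: "user", content: "What is NDVI?" }],
//   });
```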
Puter.js: Zero-Config AI
Puter.js is the simplest way to add AI to any HTML file. No API key needed, no server required, no signup. Just include the script tag and call the function.
- Zero configuration: No API key, no account, no billing
- Works from file://: Open your HTML file directly in a browser
- GPT-4o mini under the hood: Powered by OpenAI through Puter's free proxy
- Best for: Rapid prototyping, classroom demos, first AI experiments
- Limitation: No image input, rate limits are shared across all users
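A minimal sketch of the call, assuming the script is loaded from `https://js.puter.com/v2/` and exposes a global `puter` object with `puter.ai.chat()`. The exact shape of the resolved value is an assumption here, so the helper accepts either a plain string or an object with `message.content`. `askPuter` is a hypothetical wrapper; passing the client in as a parameter keeps it testable outside the browser.

```javascript
// In the HTML page, load the library first:
//   <script src="https://js.puter.com/v2/"></script>
// This is assumed to define the global `puter` object used below.

// Ask a question through a Puter-style client and return the reply as text.
async function askPuter(puterClient, question) {
  const response = await puterClient.ai.chat(question);
  if (typeof response === "string") {
    return response;
  }
  // Assumption: object responses carry the text in message.content.
  return response?.message?.content ?? String(response);
}

// Usage sketch in the browser (inside an async function):
//   const reply = await askPuter(puter, "Explain NDVI in one sentence.");
//   document.body.textContent = reply;
```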
Hugging Face Inference API
Hugging Face hosts over 500,000 models, many of which are free to use via their Inference API. Unlike general-purpose LLMs, these are specialized models for specific tasks.
API Comparison: Speed vs. Capability vs. Setup
| API | Speed | Capability | Setup | Free Limit | Best For |
|---|---|---|---|---|---|
| Gemini | ★★★ | ★★★★★ | API key | 1,500 req/day | Images + complex reasoning |
| Groq | ★★★★★ | ★★★★ | API key | 14,400 req/day | Fast chat, real-time apps |
| Puter.js | ★★★ | ★★★ | None! | Shared pool | Quick prototypes, demos |
| Hugging Face | ★★ | ★★★★ | API token | Varies by model | Specialized tasks |
Live Example: Your First AI Call
Here is a complete, working HTML file that calls an AI API. You can copy this, save it as my-first-ai.html, and open it in your browser.
System Prompt vs. User Prompt
Every interaction with an LLM involves at least two types of prompts. Understanding the distinction is essential for building reliable applications.
🛠️ System Prompt
Hidden instructions that define the AI's behavior, personality, and constraints. The user never sees this. It persists across the entire conversation.
👤 User Prompt
The actual question or instruction from the user. This changes with every message in the conversation.
🤖 AI Response
The model's reply, generated from the system prompt together with the conversation so far.
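In OpenAI-compatible APIs, these roles map onto a `messages` array with the role names `system`, `user`, and `assistant`. The sketch below assumes that convention (other APIs, such as Gemini's native format, name things differently); `buildMessages` and the example prompt text are hypothetical.

```javascript
// Assemble the message list for one turn of a conversation.
// history: earlier { role, content } pairs, oldest first.
function buildMessages(systemPrompt, history, userText) {
  return [
    { role: "system", content: systemPrompt }, // hidden instructions, sent every time
    ...history,                                // earlier user/assistant turns
    { role: "user", content: userText },       // the new question
  ];
}

const exampleMessages = buildMessages(
  "You are an Earth Observation assistant. Answer in plain language.",
  [
    { role: "user", content: "What is NDVI?" },
    { role: "assistant", content: "NDVI is a vegetation index computed from NIR and red bands." },
  ],
  "What range of values can it take?"
);

console.log(exampleMessages.map((m) => m.role).join(" > "));
// system > user > assistant > user
```

The API itself is stateless: the system prompt "persists" only because your application resends it, along with the history, on every request.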
The Anatomy of an Effective Prompt
Effective prompts have five components. Not every prompt needs all five, but knowing them helps you debug when the AI gives poor results [4].
Zero-Shot vs. Few-Shot Prompting
Zero-Shot Prompting
Give the model an instruction with no examples. Works well for simple, well-defined tasks.
Description: Dense grid of buildings with roads and parking lots visible."
👉 The model must figure out the format and reasoning entirely from the instruction.
Few-Shot Prompting
Provide 2-5 examples of input/output pairs before asking the real question. Dramatically improves accuracy for nuanced tasks [2].
👉 The examples teach the model the exact format and reasoning pattern you expect.
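One way to assemble a few-shot prompt is to format each example identically and leave the final answer slot empty for the model to complete. The land-cover descriptions and labels below are hypothetical examples written for this sketch, as is the `buildFewShotPrompt` helper.

```javascript
// Build a few-shot prompt: instruction, worked examples, then the real query
// in the same format with the label left blank.
function buildFewShotPrompt(instruction, examples, query) {
  const shots = examples.map((ex) => `Description: ${ex.input}\nLabel: ${ex.output}`);
  return [instruction, ...shots, `Description: ${query}\nLabel:`].join("\n\n");
}

// Hypothetical examples (assumption: invented for illustration).
const landCoverExamples = [
  { input: "Dense grid of buildings with roads and parking lots visible.", output: "Urban" },
  { input: "Continuous dark green canopy with no visible structures.", output: "Forest" },
  { input: "Regular rectangular plots in varying shades of green and brown.", output: "Agriculture" },
];

const fewShotPrompt = buildFewShotPrompt(
  "Classify each land cover description with a single label.",
  landCoverExamples,
  "Smooth dark surface with a meandering shape and sandy edges."
);

console.log(fewShotPrompt);
```

Calling the same function with an empty `examples` array yields the zero-shot version of the prompt, which makes it easy to compare the two approaches on the same query.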
Chain-of-Thought: "Let's Think Step by Step"
Chain-of-Thought (CoT) prompting asks the model to show its reasoning process before giving a final answer. This simple technique significantly improves accuracy for multi-step problems [3].
Without CoT:
// Model might skip to: "Healthy vegetation"
// without showing the calculation
With CoT:
// Model shows:
// Step 1: NDVI = (NIR-Red)/(NIR+Red)
// Step 2: NDVI = (0.45-0.10)/(0.45+0.10)
// Step 3: NDVI = 0.35/0.55 = 0.636
// Step 4: 0.636 indicates healthy vegetation
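A benefit of visible reasoning is that each step can be checked. The short sketch below recomputes the NDVI value from the trace using the same reflectance inputs (NIR = 0.45, Red = 0.10); the `ndvi` function name is our own.

```javascript
// NDVI = (NIR - Red) / (NIR + Red); returns NaN when both bands are zero.
function ndvi(nir, red) {
  const denominator = nir + red;
  if (denominator === 0) {
    return NaN; // the ratio is undefined with no signal in either band
  }
  return (nir - red) / denominator;
}

const value = ndvi(0.45, 0.10);
console.log(value.toFixed(3)); // 0.636
```

If the model's Step 3 had reported a different number, this check would expose the error before it reached a user.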
Understanding Context Windows & Token Limits
The context window is the total number of tokens a model can process in a single request. This includes everything: your system prompt, the conversation history, and the model's response. When a conversation grows past the limit, the API rejects the request or the application must drop the oldest messages, so the model effectively forgets earlier content.
Token Budget Breakdown
All three components share the same token budget
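A shared budget means the application has to decide what to keep. The sketch below reserves room for the system prompt and the response, then keeps as many of the most recent messages as fit. It uses the rough 4-characters-per-token estimate, which is only an approximation (exact counts require the model's own tokenizer); `estimateTokens` and `fitToBudget` are hypothetical helpers.

```javascript
// Rough token estimate: about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit in the context window after reserving
// space for the system prompt and for the model's response.
function fitToBudget(systemPrompt, history, contextWindow, reservedForResponse) {
  const budget = contextWindow - reservedForResponse - estimateTokens(systemPrompt);
  const kept = [];
  let used = 0;
  // Walk from the newest message back to the oldest.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budget) {
      break; // everything older than this is dropped
    }
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

// Example with a deliberately tiny window so the trimming is visible.
const demoHistory = [
  { role: "user", content: "a".repeat(40) },      // ~10 tokens, oldest
  { role: "assistant", content: "b".repeat(40) }, // ~10 tokens
  { role: "user", content: "c".repeat(40) },      // ~10 tokens, newest
];
const trimmed = fitToBudget("s".repeat(20), demoHistory, 40, 10);
console.log(trimmed.length); // 2
```

Dropping whole messages from the oldest end is the simplest policy; summarizing old turns into a short note is a common alternative when early context still matters.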
- Token Estimation: ~4 characters = 1 token, or about 0.75 English words per token. A 500-word prompt ≈ 650-700 tokens. A full academic paper ≈ 10,000 tokens.
- Rate Limits: Free APIs limit requests per minute/day. Build retry logic with exponential backoff.
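Retry with exponential backoff can be sketched as a small wrapper around any async call. The delay doubles after each failure, with a little random jitter so that many clients do not retry at the same instant. The default values and the `withRetry` name are assumptions for illustration; tune them to the limits of the API you use.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `operation`, retrying on failure with exponentially growing delays.
// shouldRetry lets the caller skip retries for errors that will never succeed.
async function withRetry(
  operation,
  { maxAttempts = 5, baseDelayMs = 500, shouldRetry = () => true } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === maxAttempts - 1;
      if (isLastAttempt || !shouldRetry(error)) {
        break;
      }
      // 500 ms, 1 s, 2 s, 4 s ... plus random jitter of up to baseDelayMs.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
      await sleep(delay);
    }
  }
  throw lastError;
}

// Usage sketch (callApi is a placeholder for your own request function):
//   const reply = await withRetry(() => callApi(prompt), { maxAttempts: 4 });
```

Rate-limited requests are commonly signalled with HTTP status 429, so a typical `shouldRetry` returns true for that status and false for errors such as an invalid API key.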
Hallucination: When AI Makes Things Up
LLMs generate text that is statistically plausible, not necessarily factually correct. They can fabricate data, invent citations, and present false information with complete confidence. This is called hallucination.
Why it happens:
- The model predicts the most likely next token, not the most truthful one
- Training data contains errors, outdated information, and contradictions
- The model has no real-time access to databases or sensors
- Ambiguous prompts increase hallucination risk
Common EO hallucinations:
- Fabricated satellite band specifications
- Invented NDVI values for specific locations
- Non-existent Copernicus data products
- Fake DOIs and paper citations
Mitigation strategies:
- Grounding: Provide real data in the prompt for the AI to reference
- Verification prompts: Ask "Are you sure? What is your source?"
- Structured output: Request JSON with source fields
- Temperature = 0: Reduce randomness for factual queries
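Grounding and structured output work well together: supply the facts in the prompt, ask for JSON that cites them, and reject any reply that cites something you did not provide. The sketch below is one possible shape, not a standard; the JSON schema, the helper names, and the wording of the instructions are assumptions. The example facts reuse the reflectance values from the chain-of-thought section.

```javascript
// Build a prompt that restricts the model to numbered facts and asks for JSON.
function buildGroundedPrompt(question, facts) {
  const factLines = facts.map((fact, i) => `[${i + 1}] ${fact}`).join("\n");
  return [
    "Answer using ONLY the numbered facts below.",
    'If the facts are insufficient, set "answer" to "unknown".',
    'Reply as JSON: {"answer": string, "sources": number[]}',
    "",
    "Facts:",
    factLines,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Validate the model's reply: it must be JSON, have the expected fields,
// and cite only fact numbers that were actually supplied.
function parseGroundedAnswer(rawText, factCount) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (parsed === null || typeof parsed !== "object" ||
      typeof parsed.answer !== "string" || !Array.isArray(parsed.sources)) {
    return { ok: false, reason: "missing fields" };
  }
  const validSources = parsed.sources.every(
    (n) => Number.isInteger(n) && n >= 1 && n <= factCount
  );
  if (!validSources) {
    return { ok: false, reason: "cites a fact that was not provided" };
  }
  return { ok: true, answer: parsed.answer, sources: parsed.sources };
}

const groundingFacts = ["NIR reflectance = 0.45", "Red reflectance = 0.10"];
const groundedPrompt = buildGroundedPrompt("What is the NDVI for this pixel?", groundingFacts);
console.log(groundedPrompt);
// When sending the request, also set temperature to 0 in the request body.
```

Validation of this kind catches malformed or unsupported citations, but it cannot confirm that the answer follows from the cited facts, so numeric results are still worth recomputing in code.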
References & Resources
Academic References:
- [1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. DOI: 10.48550/arXiv.1706.03762
- [2] Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33. DOI: 10.48550/arXiv.2005.14165
- [3] Wei, J., Wang, X., Schuurmans, D., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35. DOI: 10.48550/arXiv.2201.11903
- [4] White, J., Fu, Q., Hays, S., et al. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv preprint. DOI: 10.48550/arXiv.2302.11382
API Documentation & Tools:
- 🎯 Google Gemini API: ai.google.dev - Multimodal AI with 1M token context window
- ⚡ Groq Cloud: console.groq.com - Ultra-fast LLM inference on custom LPU hardware
- 💻 Puter.js: docs.puter.com - Zero-config AI for any HTML file
- 🤗 Hugging Face Inference API: huggingface.co/inference-api - 500K+ specialized models
Geoffrey Hinton
Godfather of AI
His pioneering work on artificial neural networks and deep learning laid the foundation for modern LLMs.
Global Data, Local Impact
Applying EO to Community Challenges
Earth Observation provides a macroscopic view of environmental trends, but its true power lies in downscaling this data to affect local policy and design, such as urban planning and sustainable workplaces.
Regional Decisions Scenario
Scenario: Automating Spatial Analysis
Your team needs to process thousands of unstructured reports on workplace well-being and map them to physical office locations.
Your Task:
- Design an LLM prompt to extract location entities.
- Map the sentiment to physical coordinates.
Big Ideas & Glossary
Summary of Big Ideas
- Data is only as valuable as its application.
- Space technology has direct terrestrial benefits.
Glossary of Terms
Daily Reflection
What was your biggest takeaway from this session, and how does it apply to the TERRA project? Write your response below. Your instructor will review this to track your progress.