AS26 Day 2 Morning

How LLMs Work: Free AI APIs for Web Developers

Understanding the technology behind Large Language Models and the free tools you will use to build AI-powered Earth Observation applications.

📅 Tuesday, June 9, 2026
📍 ISU, Strasbourg
⏲ 10:00 - 12:00 (2h)
🎓 Day 2 of 10
1. From Search to Synthesis

The Evolution of the Web: Read, Write, Understand

The web has evolved through three fundamental paradigms. You are building applications for the third one.

📖
1990s - 2000s
Web 1.0: Read
Static pages. Humans browse and read content published by webmasters. Information flows in one direction.
✍️
2005 - 2020
Web 2.0: Read + Write
User-generated content, social media, APIs. Everyone contributes data. Platforms aggregate and monetize.
🧠
2023 - Present
Web 3.0: Read + Write + Understand
AI agents synthesize, reason, and generate. Natural language replaces complex queries. Machines understand context.
💡 Your LeafletJS maps and Firebase apps are Web 2.0 tools. Adding an LLM transforms them into Web 3.0 applications that can answer questions about the data they display.

Why AI Matters for Earth Observation

Satellite programs like Copernicus Sentinel produce over 12 terabytes of data every day. No human can manually browse, filter, and interpret this volume. AI changes the interface from "search and download" to "ask and understand."

  • Natural Language Access: Instead of writing complex API queries, ask: "Show me NDVI changes in the Alsace region over the last 3 months."
  • Automated Interpretation: LLMs can explain what spectral band combinations mean, describe land cover changes, and summarize trends in plain language.
  • Multi-modal Reasoning: Modern models (like Gemini) can look at a satellite image and describe what they see: urban areas, water bodies, vegetation health.
  • Democratization: Non-specialists (policymakers, farmers, journalists) gain access to insights previously requiring GIS expertise.
🌎 This course bridges the gap: you already know how to display maps and data. Now you will teach AI to reason about that data for your users.

What You Will Build Today

By the end of this morning session, you will understand how LLMs work and which free APIs are available. This afternoon, you will build your first AI chat interface using these tools.

Your First AI-Powered App Architecture

👤 User (types a question) → 💻 Your HTML/JS (formats prompt) → 🤖 LLM API (generates answer) → 💬 Response (displayed to user)

Today's learning path:

  • This morning: Theory. How LLMs work, which APIs to use, prompt engineering techniques.
  • This afternoon: Practice. Build a working AI chat interface in a single HTML file.
  • Rest of the course: Integrate AI with your EO projects (satellite data analysis, map interactions, report generation).
2. How LLMs Work

What is a Neural Network?

At the core of every LLM is a neural network: a mathematical function that transforms input data through layers of computation to produce an output.

Simplified Neural Network Architecture

📤 Input Layer (raw text tokens) → hidden layers → 📥 Output Layer (next-token prediction)
  • Each layer contains neurons (mathematical units) connected by weights (learned parameters).
  • During training, the network adjusts millions (or billions) of weights to minimize prediction errors.
  • The "deep" in "deep learning" refers to the many hidden layers stacked between input and output.
🔎 Think of it like a factory assembly line: raw materials (words) enter, pass through specialized processing stations (layers), and a finished product (the next predicted word) comes out.
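
As a rough illustration of "layers of weighted sums", here is a minimal sketch of a forward pass through one hidden layer. The weights are hand-picked for the example (a real network learns them during training), and the names `relu`, `dense`, and `forward` are made up for this slide.

```javascript
// Toy forward pass: 2 inputs -> 2 hidden neurons -> 1 output.
// All weights are hand-picked for illustration; a real network learns them.
const relu = (x) => Math.max(0, x);

// One layer: each neuron computes a weighted sum of the inputs plus a bias,
// then applies an activation function.
function dense(inputs, weights, biases, activation = (x) => x) {
  return weights.map((row, i) =>
    activation(row.reduce((sum, w, j) => sum + w * inputs[j], biases[i]))
  );
}

function forward(inputs) {
  const hidden = dense(inputs, [[1, -1], [0.5, 0.5]], [0, 0], relu);
  return dense(hidden, [[1, 2]], [0.1]);
}

console.log(forward([2, 1])[0].toFixed(2)); // "4.10"
```

An LLM does the same thing at a vastly larger scale: billions of weights, and an output layer that scores every token in the vocabulary.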

The Transformer Architecture

In 2017, researchers at Google introduced the Transformer, a new neural network architecture that revolutionized natural language processing [1]. Its key innovation: self-attention.

  • Before Transformers: Models processed text sequentially, one word at a time (RNNs, LSTMs). This was slow and struggled with long-range dependencies.
  • The Transformer Breakthrough: Process all words in parallel. Use "attention" to let every word look at every other word simultaneously.
  • Why it matters: The sentence "The satellite captured an image of the river bank" requires understanding that "bank" means "shoreline," not "financial institution." Attention enables this contextual understanding.

The paper that started it all

📰

"Attention Is All You Need"

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin (2017)

Every modern LLM (GPT, Gemini, Llama, Mistral) is built on this architecture.


Key Concepts: Tokens, Embeddings, Attention

🎲 Tokens

LLMs do not read words. They read tokens: sub-word units. The word "understanding" might become ["under", "stand", "ing"]. Roughly 4 characters = 1 token.
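
The rule of thumb above can be turned into a quick estimator. This is only a sketch for budgeting: real tokenizers differ by model and by language, and `estimateTokens` is a hypothetical helper name, not part of any API.

```javascript
// Rough token estimate from the "~4 characters per token" rule of thumb.
// Real tokenizers vary by model and language; use this for budgeting only.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("What is NDVI?")); // 13 characters -> 4
```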

📈 Embeddings

Each token is converted into a high-dimensional vector (a list of numbers). Similar words have similar vectors. "Satellite" and "spacecraft" will be close in embedding space; "satellite" and "banana" will be far apart.
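
"Close in embedding space" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Invented toy vectors, NOT real model embeddings.
const toyEmbeddings = {
  satellite: [0.9, 0.8, 0.1],
  spacecraft: [0.8, 0.9, 0.2],
  banana: [0.1, 0.2, 0.9],
};

console.log(cosineSimilarity(toyEmbeddings.satellite, toyEmbeddings.spacecraft).toFixed(2)); // "0.99"
console.log(cosineSimilarity(toyEmbeddings.satellite, toyEmbeddings.banana).toFixed(2));     // "0.30"
```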

🎯 Attention Mechanism

The model computes how relevant each token is to every other token. In "The NDVI index measures vegetation health," attention links "NDVI" strongly to "vegetation" and "health."
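
A simplified sketch of how those relevance scores are computed: take the dot product of one token's vector with every token's vector, scale it, and pass the scores through softmax so they sum to 1. Real Transformers first project tokens through learned query, key, and value matrices and then mix the value vectors; that step is omitted here, and the vectors are invented.

```javascript
// Turn raw scores into weights that are positive and sum to 1.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Attention weights of one query token over all tokens: softmax(q·k / sqrt(d)).
function attentionWeights(query, keys) {
  const scale = Math.sqrt(query.length);
  const scores = keys.map(
    (k) => k.reduce((sum, kj, j) => sum + kj * query[j], 0) / scale
  );
  return softmax(scores);
}

// Invented 2-dimensional token vectors, for illustration only.
const tokens = {
  the: [0.0, 0.1],
  NDVI: [1.0, 0.2],
  vegetation: [0.9, 0.3],
};
const weights = attentionWeights(tokens.NDVI, Object.values(tokens));
console.log(weights.map((w) => w.toFixed(2)));
```

With these toy vectors, "NDVI" puts far more weight on "vegetation" than on "the".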

📂 Context Window

The maximum number of tokens a model can process at once. This includes both your input (prompt) and the model's output (response).

Model | Context Window
GPT-4o | 128K tokens
Gemini 2.0 Flash | 1M tokens
Llama 3 (Groq) | 8K - 128K tokens
Claude 4 | 200K tokens

How Training Works: From Raw Text to Intelligence

LLMs learn in stages. Understanding these stages helps you understand why they behave the way they do.

📚 Stage 1
Pre-training
Predict next token on trillions of words from the internet
🎯 Stage 2
Fine-tuning
Learn from instruction/response examples
👍 Stage 3
RLHF
Human feedback ranks outputs; model learns preferences
  • Pre-training: The model reads billions of web pages, books, and code. It learns grammar, facts, reasoning patterns, and (unfortunately) biases. This is the most expensive stage, costing millions of dollars in compute.
  • Fine-tuning (Instruction Tuning): The model is trained on examples of "given this instruction, produce this response." This is what makes it follow your prompts instead of just completing random text.
  • RLHF (Reinforcement Learning from Human Feedback): Humans rate model outputs. The model learns to prefer helpful, harmless, and honest responses [2].

Why LLMs Can "Understand" Instructions

A raw pre-trained model is like a student who has read every book in the library but has never been taught how to answer questions. Instruction tuning teaches the model to be a helpful assistant.

Before instruction tuning:

Raw Model
// Prompt: "What is NDVI?"
// Output: "NDVI is a vegetation index
// that was first described by Rouse
// et al. in 1973 and has been
// widely used in remote sensing..."
// (continues rambling indefinitely)

After instruction tuning + RLHF:

Tuned Model
// Prompt: "What is NDVI?"
// Output: "NDVI (Normalized Difference
// Vegetation Index) measures plant
// health using the formula:
// (NIR - Red) / (NIR + Red)
// Values range from -1 to +1."
🔐 The system prompt you write acts as a permanent instruction overlay. It shapes every response the model generates during a conversation. This is your primary tool for controlling AI behavior.

The Scaling Law: Bigger = More Capable

Research has shown a predictable relationship: as you increase model size (parameters), training data, and compute, model capability improves smoothly, with prediction error falling roughly as a power law (a straight line on a log-log plot) [2].

Model | Parameters | Release | Key Capability
GPT-2 | 1.5 billion | 2019 | Coherent paragraphs
GPT-3 | 175 billion | 2020 | Few-shot learning
Llama 3 / 3.1 | 8B - 405B | 2024 | Open-source, competitive with GPT-4
Gemini 2.0 | Undisclosed | 2025 | Multimodal (text + images + video)
Mixtral 8x7B (MoE) | 46.7B (8 experts) | 2023 | Mixture-of-Experts efficiency
For this course, model size matters less than cost and speed. You will use free-tier APIs where smaller, faster models (Llama 3 8B on Groq) often outperform slower, larger ones for simple tasks.

Knowledge Check #1

🎯 What is the "attention mechanism" in a Transformer?
3. Your Free AI Toolkit

Google Gemini API

The Gemini API is Google's multimodal AI, capable of understanding text, images, audio, and video in a single request. This makes it ideal for Earth Observation tasks where you need to analyze satellite imagery alongside textual data.

  • Multimodal: Send a satellite image + text prompt, get analysis back
  • 1M token context window: Process entire documents or long conversations
  • Free tier: 1,500 requests/day (Gemini 2.0 Flash)
  • Best for: Image understanding, long-document analysis, complex reasoning
🔑 Requires an API key from ai.google.dev. Free, takes 30 seconds to set up.
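JavaScript
    // Call Gemini from your HTML file
    // (top-level await needs <script type="module"> or an async function)
    const API_KEY = 'YOUR_KEY';
    const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${API_KEY}`;

    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: [{ parts: [{ text: "What is NDVI?" }] }]
      })
    });

    // The generated text is nested inside the first candidate
    const data = await response.json();
    console.log(data.candidates[0].content.parts[0].text);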
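
For the multimodal case, the request body gains an image part next to the text part. The sketch below only builds the body; it assumes the `inline_data` / `mime_type` field names from the Gemini REST documentation as we understand them, so check ai.google.dev before relying on it. `buildImageRequest` is a hypothetical helper name.

```javascript
// Builds the JSON body for a generateContent call with an image plus a text prompt.
// base64Image must be the raw base64 string (no "data:image/png;base64," prefix).
// Field names are an assumption based on the Gemini REST docs; verify before use.
function buildImageRequest(base64Image, mimeType, prompt) {
  return {
    contents: [{
      parts: [
        { inline_data: { mime_type: mimeType, data: base64Image } },
        { text: prompt },
      ],
    }],
  };
}

const body = buildImageRequest('iVBORw0KGgo...', 'image/png',
  'Describe the land cover visible in this satellite image.');
console.log(JSON.stringify(body, null, 2));
```

Pass `JSON.stringify(body)` as the `body` of the same `fetch` call used for text-only requests.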

Groq: Ultra-Fast Inference

Groq runs open-source models (Llama 3, Mixtral) on custom LPU hardware, delivering responses 10x faster than typical cloud providers. Speed matters for interactive applications.

  • Speed: 500+ tokens/second (vs. ~80 for GPT-4)
  • Free tier: 14,400 requests/day, 6,000 tokens/minute
  • Models: Llama 3 (8B, 70B), Mixtral 8x7B, Gemma 2
  • Best for: Real-time chat, quick lookups, high-volume requests
  • OpenAI-compatible API: Easy to swap between providers

🔗 console.groq.com

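JavaScript
    // Groq uses the OpenAI chat-completions format
    const GROQ_KEY = 'YOUR_KEY';

    const response = await fetch(
      'https://api.groq.com/openai/v1/chat/completions',
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${GROQ_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          // Model IDs are retired over time; check console.groq.com for the current list
          model: 'llama3-8b-8192',
          messages: [
            { role: 'user', content: 'What is NDVI?' }
          ]
        })
      }
    );

    // The reply text is in the first choice
    const data = await response.json();
    console.log(data.choices[0].message.content);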

Puter.js: Zero-Config AI

Puter.js is the simplest way to add AI to any HTML file. No API key needed, no server required, no signup. Just include the script tag and call the function.

  • Zero configuration: No API key, no account, no billing
  • Works from file://: Open your HTML file directly in a browser
  • GPT-4o mini under the hood: Powered by OpenAI through Puter's free proxy
  • Best for: Rapid prototyping, classroom demos, first AI experiments
  • Limitation: No image input, rate limits are shared across all users

🔗 docs.puter.com

HTML
    <!-- Add to your HTML file -->
    <script src="https://js.puter.com/v2/"></script>
    <script>
      // That's it. Call the AI:
      async function askAI(question) {
        const response = await puter.ai.chat(question);
        console.log(response);
      }
      askAI('What is NDVI?');
    </script>
This is the fastest path from "nothing" to "working AI." 3 lines of code.

Hugging Face Inference API

Hugging Face hosts over 500,000 models, many of which are free to use via their Inference API. Unlike general-purpose LLMs, these are specialized models for specific tasks.

📈
Sentiment Analysis
Classify text as positive, negative, or neutral. Useful for analyzing social media posts about environmental events.
Free Tier
🌐
Translation
Translate text between 100+ languages. Useful for international EO reports and multilingual interfaces.
Free Tier
📸
Image Classification
Classify satellite images by land use type (urban, forest, water, agriculture) using pre-trained vision models.
Free Tier
📝
Text Summarization
Condense long scientific papers or reports into key findings. Ideal for processing EO literature.
Free Tier

🔗 huggingface.co/inference-api


API Comparison: Speed vs. Capability vs. Setup

API | Speed | Capability | Setup | Free Limit | Best For
Gemini | ★★★ | ★★★★★ | API key | 1,500 req/day | Images + complex reasoning
Groq | ★★★★★ | ★★★★ | API key | 14,400 req/day | Fast chat, real-time apps
Puter.js | ★★★ | ★★★ | None! | Shared pool | Quick prototypes, demos
Hugging Face | ★★ | ★★★★ | API token | Varies by model | Specialized tasks
💡 You are NOT limited to one API. The best projects combine multiple providers. Use Puter.js for the quickest prototype, switch to Gemini when you need image analysis, and use Groq when speed is critical. Choose the right tool for each task.

Live Example: Your First AI Call

Here is a complete, working HTML file that calls an AI API. You can copy this, save it as my-first-ai.html, and open it in your browser.

Complete HTML File
    <!DOCTYPE html>
    <html>
    <head>
      <title>My First AI Chat</title>
    </head>
    <body>
      <h1>Ask the AI</h1>
      <input id="q" placeholder="Ask something..." style="width:80%;padding:10px">
      <button onclick="ask()">Ask</button>
      <div id="answer"></div>
      <script src="https://js.puter.com/v2/"></script>
      <script>
        async function ask() {
          const question = document.getElementById('q').value;
          document.getElementById('answer').innerText = 'Thinking...';
          const resp = await puter.ai.chat(question);
          document.getElementById('answer').innerText = resp;
        }
      </script>
    </body>
    </html>
🏁 This is a fully functional AI application in about 20 lines of code. No server, no API key, no build tools. This is what we mean by "AI for web developers."
4. Prompt Engineering

System Prompt vs. User Prompt

Every interaction with an LLM involves at least two types of prompts. Understanding the distinction is essential for building reliable applications.

🛠️ System Prompt

Hidden instructions that define the AI's behavior, personality, and constraints. The user never sees this. It persists across the entire conversation.

System Prompt "You are an Earth Observation assistant for ISU students. Answer questions about satellite data, spectral indices, and remote sensing. Always cite your data sources. If you are unsure, say so."

👤 User Prompt

The actual question or instruction from the user. This changes with every message in the conversation.

User Prompt "What Sentinel-2 bands should I use to detect water bodies in Strasbourg?"

🤖 AI Response

Response
// Follows system prompt rules:
// domain-specific, cites sources,
// appropriate for ISU students
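
In the OpenAI-style chat format (the one Groq accepts), the two prompt types are simply messages with different roles. A minimal sketch, with `buildMessages` as a hypothetical helper name:

```javascript
// The system prompt always goes first; earlier turns follow; the new question goes last.
function buildMessages(systemPrompt, history, userPrompt) {
  return [
    { role: 'system', content: systemPrompt },
    ...history,
    { role: 'user', content: userPrompt },
  ];
}

const messages = buildMessages(
  'You are an Earth Observation assistant for ISU students. If you are unsure, say so.',
  [], // no earlier turns yet
  'What Sentinel-2 bands should I use to detect water bodies in Strasbourg?'
);
console.log(messages.map((m) => m.role)); // [ 'system', 'user' ]
```

Because the API is stateless, the system prompt "persists" only because your code sends it again with every request.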

The Anatomy of an Effective Prompt

Effective prompts have five components. Not every prompt needs all five, but knowing them helps you debug when the AI gives poor results [4].

🎭 Role "You are a remote sensing scientist specializing in Sentinel-2 data."
📍 Context "The user is an ISU student building a web app to monitor vegetation health in Alsace, France."
📝 Instruction "Explain which spectral bands to use and why."
📄 Format "Respond as a numbered list with band names, wavelengths, and use cases."
🚫 Constraints "Do not recommend paid data sources. Keep the explanation under 200 words."
💡 When the AI gives a bad answer, check which component is missing. Usually the problem is missing context or missing format instructions.
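
One way to keep the five components visible in code is to assemble the prompt from named parts, so a missing part is easy to spot when debugging. This is an illustrative sketch; `buildPrompt` and its field names are hypothetical, not from any library.

```javascript
// Joins whichever of the five components are present, one per line.
function buildPrompt({ role, context, instruction, format, constraints }) {
  return [role, context, instruction, format, constraints]
    .filter((part) => part && part.trim() !== '')
    .join('\n');
}

console.log(buildPrompt({
  role: 'You are a remote sensing scientist specializing in Sentinel-2 data.',
  context: 'The user is an ISU student building a web app to monitor vegetation health in Alsace, France.',
  instruction: 'Explain which spectral bands to use and why.',
  format: 'Respond as a numbered list with band names, wavelengths, and use cases.',
  constraints: 'Do not recommend paid data sources. Keep the explanation under 200 words.',
}));
```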

Zero-Shot vs. Few-Shot Prompting

Zero-Shot Prompting

Give the model an instruction with no examples. Works well for simple, well-defined tasks.

Zero-Shot "Classify this satellite image description as: urban, agricultural, forest, or water.
Description: Dense grid of buildings with roads and parking lots visible."

👉 The model must figure out the format and reasoning entirely from the instruction.

Few-Shot Prompting

Provide 2-5 examples of input/output pairs before asking the real question. Dramatically improves accuracy for nuanced tasks [2].

Few-Shot
"Classify land cover type:
Example 1: 'Green fields in rows' -> agricultural
Example 2: 'Blue area surrounded by land' -> water
Example 3: 'Tall trees covering hills' -> forest
Now classify: 'Dense grid of buildings with roads'"

👉 The examples teach the model the exact format and reasoning pattern you expect.
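
In an application, few-shot prompts are usually generated from a list of labeled examples rather than typed by hand. A minimal sketch that reproduces the prompt layout above; `buildFewShotPrompt` is a hypothetical helper.

```javascript
// Builds a few-shot prompt: task line, numbered examples, then the real query.
function buildFewShotPrompt(task, examples, query) {
  const shots = examples.map(
    (ex, i) => `Example ${i + 1}: '${ex.input}' -> ${ex.label}`
  );
  return [task, ...shots, `Now classify: '${query}'`].join('\n');
}

const fewShot = buildFewShotPrompt(
  'Classify land cover type:',
  [
    { input: 'Green fields in rows', label: 'agricultural' },
    { input: 'Blue area surrounded by land', label: 'water' },
    { input: 'Tall trees covering hills', label: 'forest' },
  ],
  'Dense grid of buildings with roads'
);
console.log(fewShot);
```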


Chain-of-Thought: "Let's Think Step by Step"

Chain-of-Thought (CoT) prompting asks the model to show its reasoning process before giving a final answer. This simple technique significantly improves accuracy for multi-step problems [3].

Without CoT:

Standard Prompt
"A Sentinel-2 pixel has NIR=0.45 and Red=0.10. What is the NDVI value and what does it indicate?"

// Model might skip to: "Healthy vegetation"
// without showing the calculation

With CoT:

CoT Prompt
"A Sentinel-2 pixel has NIR=0.45 and Red=0.10. Calculate the NDVI value and interpret it. Let's think step by step."

// Model shows:
// Step 1: NDVI = (NIR-Red)/(NIR+Red)
// Step 2: NDVI = (0.45-0.10)/(0.45+0.10)
// Step 3: NDVI = 0.35/0.55 = 0.636
// Step 4: 0.636 indicates healthy vegetation
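📚 Wei et al. (2022) showed that prompting with worked reasoning examples markedly improves accuracy on multi-step arithmetic and reasoning benchmarks [3]. The zero-shot variant, simply appending "Let's think step by step," was reported by Kojima et al. (2022) to raise accuracy on the MultiArith math benchmark from 17.7% to 78.7% with the same model. For EO applications involving calculations (NDVI, area estimation, coordinate conversion), use CoT so that each step can be checked.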
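
Because the model's arithmetic can still be wrong, it is worth computing the number in code and comparing it with the model's answer. A minimal sketch of the NDVI formula from the example; the function name `ndvi` is our own.

```javascript
// NDVI = (NIR - Red) / (NIR + Red), using surface reflectance values.
function ndvi(nir, red) {
  const sum = nir + red;
  if (sum === 0) return NaN; // undefined when both reflectances are zero
  return (nir - red) / sum;
}

console.log(ndvi(0.45, 0.10).toFixed(3)); // "0.636"
```

One option is to pass the computed value to the LLM and ask only for the interpretation, which leaves the arithmetic to code.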

Knowledge Check #2

🎯 Which prompting technique asks the AI to show its reasoning process before giving a final answer?
5. Context Windows & Limitations

Understanding Context Windows & Token Limits

The context window is the total number of tokens a model can process in a single request. This includes everything: your system prompt, the conversation history, and the model's response. When a conversation grows past it, the API rejects the request or the oldest messages have to be dropped, so the model no longer sees that earlier content.

Token Budget Breakdown

System Prompt
Conversation History
New Response

All three components share the same token budget
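
A chat app therefore has to decide which messages to send. One common approach, sketched here under the 4-characters-per-token approximation, keeps the system prompt, reserves room for the reply, and drops the oldest turns first. `trimHistory` and its parameters are hypothetical names.

```javascript
// Rough estimate; real token counts depend on the model's tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the newest messages that fit into the budget left after the
// system prompt and the space reserved for the model's reply.
function trimHistory(systemPrompt, history, maxTokens, reservedForReply) {
  let budget = maxTokens - reservedForReply - estimateTokens(systemPrompt);
  const kept = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (cost > budget) break; // this and all older messages are dropped
    budget -= cost;
    kept.unshift(history[i]); // preserve chronological order
  }
  return kept;
}
```

For example, with an 8K-token model you might call `trimHistory(systemPrompt, history, 8000, 1000)` before every request.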

  • Token Estimation: ~4 characters = 1 token, or about 0.75 English words per token. A 500-word prompt ≈ 670 tokens. A full academic paper ≈ 10,000 tokens.
  • Rate Limits: Free APIs limit requests per minute/day. Build retry logic with exponential backoff.
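
A minimal sketch of retry with exponential backoff. It assumes the provider signals rate limiting with HTTP 429 (common, but check each API's documentation); `backoffDelay` and `fetchWithRetry` are hypothetical helper names.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `requestFn` (any async function returning a fetch Response) and retries
// on HTTP 429 or 5xx responses, waiting longer before each new attempt.
async function fetchWithRetry(requestFn, maxRetries = 4) {
  for (let attempt = 0; ; attempt++) {
    const response = await requestFn();
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || attempt >= maxRetries) return response;
    await sleep(backoffDelay(attempt));
  }
}

// Usage (inside an async function):
//   const response = await fetchWithRetry(() => fetch(url, options));
```

Production code often adds random jitter to the delay so that many clients do not retry at the same instant.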

Hallucination: When AI Makes Things Up

LLMs generate text that is statistically plausible, not necessarily factually correct. They can fabricate data, invent citations, and present false information with complete confidence. This is called hallucination.

Why it happens:

  • The model predicts the most likely next token, not the most truthful one
  • Training data contains errors, outdated information, and contradictions
  • The model has no real-time access to databases or sensors
  • Ambiguous prompts increase hallucination risk

Common EO hallucinations:

  • Fabricated satellite band specifications
  • Invented NDVI values for specific locations
  • Non-existent Copernicus data products
  • Fake DOIs and paper citations
⚠️ Critical Rule: AI-generated scientific claims MUST be verified against real data. Never present AI output as ground truth in your EO applications without cross-referencing actual satellite data or peer-reviewed sources.

Mitigation strategies:

  • Grounding: Provide real data in the prompt for the AI to reference
  • Verification prompts: Ask "Are you sure? What is your source?" and then check the cited source yourself, since models can invent sources too
  • Structured output: Request JSON with source fields
  • Temperature = 0: Reduce randomness for factual queries
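
Grounding and structured output can be combined: put real measurements into the prompt, ask for JSON with a source field, and reject replies that do not comply. The sketch below is illustrative only; the JSON shape and the names `buildGroundedPrompt` and `parseGroundedAnswer` are assumptions, not a standard.

```javascript
// Grounding: embed real data in the prompt and request a machine-checkable reply.
function buildGroundedPrompt(question, observations) {
  return [
    'Answer using ONLY the data below. If the data is insufficient, say so.',
    'Respond as JSON: {"answer": string, "source": string}',
    `Data: ${JSON.stringify(observations)}`,
    `Question: ${question}`,
  ].join('\n');
}

// Returns the parsed reply, or null if it is not valid JSON
// or lacks a non-empty "source" field.
function parseGroundedAnswer(replyText) {
  try {
    const parsed = JSON.parse(replyText);
    const ok = typeof parsed.answer === 'string' &&
      typeof parsed.source === 'string' && parsed.source.trim() !== '';
    return ok ? parsed : null;
  } catch {
    return null;
  }
}
```

A `null` result is a signal to retry or to show the user a warning instead of an unverified claim. A present source field still needs to be checked against the data you supplied.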

Knowledge Check #3

🎯 Why is "hallucination" particularly dangerous in Earth Observation applications?
6. Summary & Glossary

Summary of Big Ideas

🧠
LLMs Predict Tokens
Large Language Models are next-token prediction engines built on the Transformer architecture. They process all tokens in parallel using attention mechanisms.
🔧
Free APIs Are Powerful
Gemini (multimodal), Groq (speed), Puter.js (zero-config), and Hugging Face (specialized models) give you production-grade AI at zero cost.
📝
Prompts Are Programs
System prompts, few-shot examples, and chain-of-thought reasoning are your primary tools for controlling AI behavior. The quality of your prompt determines the quality of the output.
⚠️
Always Verify AI Output
LLMs hallucinate. In Earth Observation applications, fabricated data can lead to bad scientific conclusions. Always cross-reference AI claims with real data sources.
🏁 This afternoon: You will build a working AI chat interface in a single HTML file. Come with your API keys ready (Gemini or Groq) or use Puter.js for zero-setup.

Glossary of Key Terms

🧠 LLM
Large Language Model. A neural network with billions of parameters trained on massive text corpora to generate and understand natural language.
⚙️ Transformer
The neural network architecture (Vaswani et al., 2017) that powers all modern LLMs. Uses self-attention to process tokens in parallel.
🎲 Token
The smallest unit of text an LLM processes. Typically a sub-word fragment (~4 characters). Models read and produce tokens, not words.
📈 Embedding
A high-dimensional vector representation of a token. Similar meanings produce similar vectors, enabling the model to understand semantic relationships.
🎯 Attention
A mechanism that allows each token to dynamically weigh the relevance of every other token in the sequence, enabling contextual understanding.
📂 Context Window
The maximum number of tokens (input + output) a model can process in a single request. Ranges from 8K to 1M tokens across models.
📝 Prompt Engineering
The practice of crafting effective instructions (prompts) to guide LLM behavior. Includes techniques like few-shot examples and chain-of-thought.
👀 Zero-Shot
Prompting a model to perform a task with no examples, relying entirely on the instruction and the model's pre-trained knowledge.
📚 Few-Shot
Providing 2-5 examples of input/output pairs in the prompt to teach the model the desired pattern before asking the real question.
🔗 Chain-of-Thought
A prompting technique that asks the model to show intermediate reasoning steps before giving a final answer, improving accuracy on multi-step problems.
👻 Hallucination
When an LLM generates plausible but factually incorrect information. Especially dangerous in scientific applications where false data looks credible.
👍 RLHF
Reinforcement Learning from Human Feedback. A training stage where human raters evaluate model outputs, teaching it to prefer helpful, safe responses.
🎯 Fine-Tuning
Additional training on curated datasets after pre-training. Adapts a general model to follow instructions or specialize in a domain.

References & Resources

Academic References:

  • [1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. DOI: 10.48550/arXiv.1706.03762
  • [2] Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33. DOI: 10.48550/arXiv.2005.14165
  • [3] Wei, J., Wang, X., Schuurmans, D., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35. DOI: 10.48550/arXiv.2201.11903
  • [4] White, J., Fu, Q., Hays, S., et al. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv preprint. DOI: 10.48550/arXiv.2302.11382

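API Documentation & Tools:

  • Google Gemini API: ai.google.dev
  • Groq: console.groq.com
  • Puter.js: docs.puter.com
  • Hugging Face Inference API: huggingface.co/inference-api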

🌟 Pioneer Profile
👤

Geoffrey Hinton

Godfather of AI

His pioneering work on artificial neural networks and deep learning laid the foundation for modern LLMs.

🌍 Local to Global

Global Data, Local Impact

Applying EO to Community Challenges

Earth Observation provides a macroscopic view of environmental trends, but its true power lies in downscaling this data to affect local policy and design, such as urban planning and sustainable workplaces.

📍
Texas Connection: In Texas, EO data is used to monitor the Edwards Aquifer depletion and track the expansion of urban heat islands across the Dallas-Fort Worth metroplex.
🗺️
🤔 Geographic Inquiry

Regional Decisions Scenario

Scenario: Automating Spatial Analysis

Your team needs to process thousands of unstructured reports on workplace well-being and map them to physical office locations.

Your Task:

  • Design an LLM prompt to extract location entities.
  • Map the sentiment to physical coordinates.
📚 Summary

Big Ideas & Glossary

Summary of Big Ideas

  • Data is only as valuable as its application.
  • Space technology has direct terrestrial benefits.

Glossary of Terms

Earth Observation
Gathering information about Earth via remote sensing.
πŸ“ Knowledge Check

Auto-Graded Quiz

What does an LLM API primarily return when given a prompt?
A. A direct SQL database query
B. A probabilistic prediction of the next tokens (text)
C. A rendered HTML web page
✅ Correct! LLMs generate text by predicting the next most likely tokens based on the prompt.
❌ Incorrect. The right answer was B. LLMs generate text by predicting the next most likely tokens based on the prompt.

πŸ“ Daily Reflection

What was your biggest takeaway from this session, and how does it apply to the TERRA project? Write your response below. Your instructor will review this to track your progress.
