Generative AI

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

March 17, 2026

Agenda

Warm-up 7 min
LLMs: from predictor to assistant 20 min
Diffusion & the landscape 13 min
Agentic AI: from generation to action 14 min
Wrap-up 6 min

Warm-up

Name one task you did this week that a generative model could have done.

Which pillar does it belong to?

Text: a language model (LLM)
Image, audio, video: a diffusion model
A multi-step job: an agent

Think alone 1 min, then discuss with your neighbour 2 min.

03:00

LLMs: from predictor to assistant

Recap: from transformer to LLM

An LLM is the transformer you already built, scaled up and trained on internet-scale text.

Same mechanism: tokenize, embed, attend, predict the next token
New scale: GPT-3 has 175 billion parameters
New behaviour: at this scale, generation becomes coherent and useful
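
The last step of that mechanism, predicting the next token, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any production model is implemented: the three-word vocabulary and the scores (logits) are invented for the example, and a real LLM would compute the logits with its transformer layers over a vocabulary of tens of thousands of tokens.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_next_token(vocab, logits):
    """Return the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Hypothetical scores after the prompt "The cat sat on the".
vocab = ["mat", "moon", "idea"]
logits = [3.0, 1.0, 0.0]

token, prob = greedy_next_token(vocab, logits)
print(token, round(prob, 2))
```

Greedy selection is only one decoding choice; sampling from the distribution instead is what makes repeated generations differ.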

Recap: three training phases

Next-token prediction is only the first of three phases (Ouyang et al., 2022).

Pretraining: next-token prediction on massive unlabelled text; gives broad capability (the part you saw last unit)
Fine-tuning: supervised learning on curated input-output pairs; makes the model task-appropriate
RLHF: humans rank outputs, a reward model learns their preferences, and the LLM is then optimised against that reward model; this is where alignment happens
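
The reward-model step of the RLHF phase can be made concrete with the pairwise comparison loss used by Ouyang et al. (2022): for one pair of answers, the loss is -log(sigmoid(r_preferred - r_rejected)). The sketch below covers a single comparison only; the reward values are made-up numbers, and the later step that optimises the LLM against the reward model is left out.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    Uses the identity -log(sigmoid(d)) = log(1 + exp(-d)).
    """
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))


# Hypothetical reward scores for two answers to the same prompt.
agrees_with_rater = pairwise_preference_loss(2.0, 0.0)     # model ranks like the human
disagrees_with_rater = pairwise_preference_loss(0.0, 2.0)  # model ranks the other way
undecided = pairwise_preference_loss(1.0, 1.0)             # model cannot tell them apart

print(agrees_with_rater, undecided, disagrees_with_rater)
```

The loss is small when the reward model scores the human-preferred answer higher and grows when it disagrees, which is what pushes the reward model toward the raters' preferences.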

Recap: what still breaks

The failure modes from last unit do not disappear; deployment of LLMs raises the stakes.

Carried over

Hallucination (structural, not a bug)
Bias from training data
No genuine understanding

Newly created

Legal & privacy risk
Static knowledge (a training cutoff)
Resource intensive

Exercise A: train a legal assistant

A startup has a general-purpose LLM (pretraining done). They want to deploy it as a legal-document assistant for law firms.

Tasks (pairs)

What fine-tuning data would you use, and why?
Who would you recruit as RLHF raters, and which criteria should they score?
Fine-tuning risks catastrophic forgetting of general knowledge. How would you mitigate it?

10:00

Expected answers (debrief in ~4 minutes):

Fine-tuning data: verified legal briefs, contracts, court opinions, regulatory filings (e.g., from LexisNexis or Westlaw); diverse jurisdictions and domains; human-annotated “ideal” responses for common drafting tasks. Stress the licensing point: using copyrighted legal databases without authorisation is itself a legal risk.
RLHF raters: practising lawyers or senior paralegals with domain expertise. Criteria: legal accuracy (most important), citation correctness, appropriate hedging (legal advice must acknowledge uncertainty), professional tone, and absence of fabricated case law.
Catastrophic forgetting: mix a small share of general-domain data into the fine-tuning set; use parameter-efficient fine-tuning (e.g., LoRA), which freezes the pretrained weights and trains only small low-rank adapter matrices; monitor a general benchmark (e.g., MMLU) during training to catch degradation early.
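
If students ask how LoRA protects general knowledge, a toy calculation can help. The sketch below assumes the usual LoRA formulation, W_eff = W + (alpha / rank) * B A, with the pretrained matrix W frozen and only the small matrices A and B trained. All numbers, including the 4096-wide layer, are illustrative assumptions rather than values from any specific model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha, rank):
    """W_eff = W + (alpha / rank) * B @ A; W itself is never modified."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d_out, d_in, rank):
    """Number of adapter parameters: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Toy 2x2 layer with a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight
a = [[1.0, 2.0]]              # trainable, shape rank x d_in
b_init = [[0.0], [0.0]]       # trainable, shape d_out x rank, starts at zero
b_trained = [[1.0], [0.0]]    # hypothetical values after some training

w_start = lora_effective_weight(w, a, b_init, alpha=2.0, rank=1)
w_after = lora_effective_weight(w, a, b_trained, alpha=2.0, rank=1)

full = 4096 * 4096
adapter = trainable_params(4096, 4096, rank=8)
print(w_start, w_after, full // adapter)
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and removing the adapter always restores the original weights.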

Draw out the core message: pretraining gives capability, fine-tuning gives specialism, and RLHF gives the judgment and hedging that a legal context demands. Connect “fabricated case law” back to hallucination on the previous slide.

Diffusion & the landscape

Recap: the generative landscape

LLMs are one family. Foundation models also include diffusion models, and agents are built on top of both (Urbach et al., 2026).

LLMs: generate text (GPT, Claude, Gemini, LLaMA)
Diffusion models: generate images, audio, video (DALL-E, Stable Diffusion, Midjourney)
Agentic AI: combines these with planning, memory, and tools to act

Recap: diffusion in one idea

LLMs generate token by token. Diffusion generates by iterative denoising (Ho et al., 2020).

Forward: take a real image and gradually add noise until it is pure noise
Reverse: train a network to undo one step of noise at a time
Generate: start from pure noise and denoise, guided by a text embedding

Same recipe for video and audio: embed the prompt, then denoise toward structure.
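
The forward (noising) half of this recipe has a closed form in Ho et al. (2020): x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, where abar_t is the running product of (1 - beta). The sketch below uses the linear beta schedule from that paper (1e-4 to 0.02 over 1000 steps) and a four-value list standing in for an image; the pixel values and the random seed are arbitrary. The reverse, learned half is not shown.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Noise added per step, rising linearly from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta): the share of signal left after each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, alpha_bar_t, rng):
    """Jump straight from the clean sample x0 to its noised version at step t."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


bars = alpha_bars(linear_betas(1000))
rng = random.Random(0)
image = [0.5, -0.5, 1.0, -1.0]  # stand-in for pixel values

slightly_noisy = add_noise(image, bars[0], rng)
almost_pure_noise = add_noise(image, bars[-1], rng)
print(bars[0], bars[-1])
```

By the final step almost none of the original signal remains, which is why generation can start from pure noise.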

Exercise B: map the media company

A media company produces articles, photos, video segments, podcasts. For each:

Tasks (pairs)

Which technology fits: LLM, diffusion, or a combination?
What is the single biggest risk to manage?
If they can pilot only one in year one, which, and why?

07:00

Expected answers (debrief in ~3 minutes):

Content	Technology	Biggest risk
Articles	LLM (+ RAG)	hallucination, false facts in print
Photos	diffusion	copyright, bias, deepfake potential
Video	text-to-video diffusion	deepfakes mistaken for real footage
Podcasts	text-to-audio / voice cloning	impersonation, consent

Table 1: Technology and risk by content type

Pilot recommendation: LLM-assisted article drafting with RAG. It offers the clearest productivity gain, the risk is manageable with editorial review, and the legal landscape for text is more settled than for synthetic image, video, or voice. It also builds prompt and review skills the company can reuse for the other modalities later.

This exercise is where diffusion gets applied: tasks 1 and 2 force students to pick diffusion for the visual and audio modalities and to name the deepfake and copyright risks that are specific to it.

Agentic AI: from generation to action

Recap: four building blocks

Four components turn an LLM into an agent (Urbach et al., 2026).

Reasoning-augmented LLM: chain-of-thought makes the problem-solving steps visible and checkable
Retrieval-Augmented Generation (RAG): real-time access to external knowledge
Conversational agent: keeps context across a long, multi-turn task
Multi-agent system (MAS): specialised agents divide the labour and check each other

Recap: RAG in three steps

RAG addresses the static-knowledge limitation from the first block (Lewis et al., 2020).

Retrieve: search an external knowledge base for passages relevant to the query
Augment: inject those passages into the context alongside the question
Generate: the LLM answers grounded in the retrieved evidence

Payoff: current knowledge without retraining, and a source the user can verify.
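
The retrieve and augment steps can be sketched without any model at all. The example below is a toy under stated assumptions: word-count vectors with cosine similarity stand in for learned embeddings, the two passages are invented, and the generate step is omitted because it would require a real LLM call.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very rough stand-in for an embedding: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=1):
    """Step 1: return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def augment(query, retrieved):
    """Step 2: place the retrieved passages in the prompt next to the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return f"Answer using only the sources below.\n{context}\nQuestion: {query}"


# Invented knowledge base for illustration.
passages = [
    "The exam registration deadline is two weeks before the exam period.",
    "The library opens at eight in the morning.",
]
query = "When is the exam registration deadline"

top = retrieve(query, passages, k=1)
prompt = augment(query, top)
print(prompt)
```

The numbered sources in the prompt are what later allow the answer to cite evidence the user can check.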

Exercise C: design a student-support RAG

Your university wants a chatbot that answers questions on study and exam regulations, which change every semester.

Tasks (pairs)

Sketch the RAG architecture: what are the components, and what goes in the knowledge base?
Name one way it could harm students and propose a mitigation.

08:00

Expected answers (debrief in ~4 minutes):

Architecture: knowledge base of exam regulations (Prüfungsordnung), module handbooks, process guides, FAQs, stored as chunked, embedded passages in a vector database; a retrieval module that embeds the query and fetches the top-k passages; the LLM generator that answers from those passages; source attribution that cites the specific regulation and section.
Harm + mitigation (any one):
- Hallucinated regulation: require every factual claim to be backed by a retrieved passage; show the source alongside the answer.
- Stale knowledge: tag documents with their effective semester, filter retrieval to the current one, and re-ingest each semester.
- Out-of-scope confidence: have the model abstain and escalate to a human advisor when retrieval similarity is below a threshold.
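
The stale-knowledge and out-of-scope mitigations can be combined in one small guard, sketched below. Everything specific in it is an assumption for illustration: the `Hit` record, the semester labels, the source names, and the 0.6 threshold, which in practice would have to be tuned on real student questions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str        # regulation and section shown to the student
    semester: str      # semester in which the document is effective
    similarity: float  # retrieval score, assumed to lie in [0, 1]


def answer_or_escalate(hits, current_semester, threshold=0.6):
    """Return ("answer", source) or ("escalate", None) for a human advisor."""
    # Stale-knowledge guard: ignore documents from other semesters.
    valid = [h for h in hits if h.semester == current_semester]
    if not valid:
        return ("escalate", None)
    best = max(valid, key=lambda h: h.similarity)
    # Out-of-scope guard: abstain when even the best match is weak.
    if best.similarity < threshold:
        return ("escalate", None)
    return ("answer", best.source)


# Hypothetical retrieval results.
hits = [
    Hit("Exam regulations, section 12", "2026S", 0.82),
    Hit("Exam regulations, section 12", "2025W", 0.91),
    Hit("Module handbook, page 4", "2026S", 0.40),
]

in_scope = answer_or_escalate(hits, "2026S")
weak_match = answer_or_escalate([hits[2]], "2026S")
outdated_only = answer_or_escalate([hits[1]], "2026S")
print(in_scope, weak_match, outdated_only)
```

Note that the outdated document has the highest similarity score, so filtering by semester before ranking is what keeps it out of the answer.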

Close the loop out loud: this is the static-knowledge limitation from Block 1, now addressed through retrieval. The recurring pattern across all three blocks is grounding and human oversight, not raw model power.

Wrap-up

Key takeaways

Three pillars, one engine

LLMs: pretraining gives capability; fine-tuning and RLHF turn a predictor into a useful, aligned assistant
Diffusion: a different generative recipe; embed the prompt, then denoise from noise toward image, audio, or video
Agentic AI: wrap an LLM with reasoning, retrieval, memory, and other agents so it can act, not just generate

The recurring theme

More capability brings new failure modes; grounding (RAG) and human oversight are how you deploy responsibly

Bridge

An agentic system takes actions in the world, not just text on a screen.

When an agent makes a consequential mistake, who is accountable: the user, the deploying organisation, or the model developer?

Q&A

Literature

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems (NeurIPS), 33, 6840–6851. https://arxiv.org/abs/2006.11239

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-T., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459–9474. https://arxiv.org/abs/2005.11401

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L. E., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 35, 27730–27744. https://arxiv.org/abs/2203.02155

Urbach, N., Feulner, D., Feulner, S., Guggenberger, T., & Mayer, V. (2026). Introduction to generative artificial intelligence. In N. Urbach & D. Feulner (Eds.), Managing artificial intelligence (pp. 71–95). Springer Nature Switzerland. https://doi.org/10.1007/978-3-032-13308-3_4