Generative AI

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

March 17, 2026

Agenda

  • Warm-up (7 min)
  • LLMs: from predictor to assistant (20 min)
  • Diffusion & the landscape (13 min)
  • Agentic AI: from generation to action (14 min)
  • Wrap-up (6 min)

Warm-up

Name one task you did this week that a generative model could have done.

Which pillar does it belong to?

  • Text: a language model (LLM)
  • Image, audio, video: a diffusion model
  • A multi-step job: an agent

Think alone for 1 min, then discuss with your neighbour for 2 min.

LLMs: from predictor to assistant

Recap: from transformer to LLM

An LLM is the transformer you already built, scaled up and trained on internet-scale text.

  • Same mechanism: tokenize, embed, attend, predict the next token
  • New scale: GPT-3 has 175 billion parameters
  • New behaviour: at this scale, generation becomes coherent and useful
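
The four steps above can be sketched in a few lines. This is a toy illustration, not a real model: the vocabulary, the embedding size and the random, untrained weights are all assumptions made for the example, and a real transformer adds learned query/key/value projections, multiple heads, feed-forward layers and many stacked blocks.

```python
import math
import random

random.seed(0)

VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary
DIM = 8                                              # hypothetical embedding size

# Embed: one random vector per token (untrained, for illustration only).
EMBED = [[random.gauss(0, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenize(text):
    """Tokenize: map whitespace-separated words to vocabulary ids (0 = unknown)."""
    return [VOCAB.index(w) if w in VOCAB else 0 for w in text.lower().split()]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors):
    """Attend: the last position looks back at every position up to itself."""
    query = vectors[-1]
    weights = softmax([dot(query, k) / math.sqrt(DIM) for k in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]


def next_token_distribution(text):
    """Predict: score every vocabulary entry against the attended context."""
    vectors = [EMBED[i] for i in tokenize(text)]
    context = attend(vectors)
    return softmax([dot(context, e) for e in EMBED])


probs = next_token_distribution("the cat sat on the")
prediction = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
```

Because the weights are random, the predicted token is meaningless here; pretraining is what turns this same pipeline into a useful predictor.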

Recap: three training phases

Next-token prediction is only the first of three phases (Ouyang et al., 2022).

  1. Pretraining: next-token prediction on massive unlabelled text; gives broad capability (the part you saw last unit)
  2. Fine-tuning: supervised learning on curated input-output pairs; makes the model task-appropriate
  3. RLHF: humans rank outputs, a reward model learns their preference, the LLM is optimised toward it; this is where alignment happens
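
To make "a reward model learns their preference" concrete, here is a minimal sketch, assuming a pairwise logistic loss of the kind described in Ouyang et al. (2022). The function name and the reward values are made up for illustration; in practice the rewards come from a neural network that scores a prompt together with a candidate output.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the output humans preferred gets the higher reward,
    and large when the reward model ranks the pair the wrong way round.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for one ranked pair of outputs.
agrees_with_humans = preference_loss(2.0, 0.0)
disagrees_with_humans = preference_loss(0.0, 2.0)
```

Minimising this loss over many ranked pairs pushes the reward model to score preferred outputs higher; the LLM is then optimised against that learned reward.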

Recap: what still breaks

The failure modes from last unit do not disappear; deployment of LLMs raises the stakes.

Carried over

  • Hallucination (structural, not a bug)
  • Bias from training data
  • No genuine understanding

New with deployment

  • Legal & privacy risk
  • Static knowledge (a training cutoff)
  • Resource intensive

Diffusion & the landscape

Recap: the generative landscape

LLMs are one family. Foundation models also include diffusion models, and agents are built on top (Urbach et al., 2026).

  • LLMs: generate text (GPT, Claude, Gemini, LLaMA)
  • Diffusion models: generate images, audio, video (DALL-E, Stable Diffusion, Midjourney)
  • Agentic AI: combines these with planning, memory, and tools to act

Recap: diffusion in one idea

LLMs generate token by token. Diffusion generates by iterative denoising (Ho et al., 2020).

  • Forward: take a real image and gradually add noise until it is pure noise
  • Reverse: train a network to undo one step of noise at a time
  • Generate: start from pure noise and denoise, guided by a text embedding

Same recipe for video and audio: embed the prompt, then denoise toward structure.
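
The forward step can be sketched as follows. This is a minimal sketch under stated assumptions: a linear noise schedule with 1000 steps in the style of Ho et al. (2020), and a short list of numbers standing in for image pixels. The names BETAS, ALPHA_BAR and add_noise are chosen for this example only.

```python
import math
import random

T = 1000  # number of noise steps (assumed)
# Linear schedule: a little noise per step early on, more later (assumed values).
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# ALPHA_BAR[t] is the fraction of the original signal variance left at step t.
ALPHA_BAR = []
running = 1.0
for beta in BETAS:
    running *= 1.0 - beta
    ALPHA_BAR.append(running)


def add_noise(x0, t, rng):
    """Forward process in one jump: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(ALPHA_BAR[t])
    noise_scale = math.sqrt(1.0 - ALPHA_BAR[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, eps)]
    return xt, eps


rng = random.Random(0)
pixels = [1.0, -1.0, 0.5, 0.0]           # stand-in for a real image
slightly_noisy, _ = add_noise(pixels, 10, rng)
almost_pure_noise, eps = add_noise(pixels, T - 1, rng)
```

The pair (x_t, eps) is what the reverse network trains on: given the noisy input and the step number, it learns to predict the noise that was added, which is what lets it remove noise one step at a time during generation.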

Exercise B: map the media company

A media company produces articles, photos, video segments, podcasts. For each:

Tasks (pairs)

  1. Which technology fits: LLM, diffusion, or a combination?
  2. What is the single biggest risk to manage?
  3. If they can pilot only one in year one, which, and why?
Time: 7 min

Agentic AI: from generation to action

Recap: four building blocks

Four components turn an LLM into an agent (Urbach et al., 2026).

  • Reasoning-augmented LLM: chain-of-thought makes the problem-solving steps visible and checkable
  • Retrieval-Augmented Generation (RAG): real-time access to external knowledge
  • Conversational agent: keeps context across a long, multi-turn task
  • Multi-agent system (MAS): specialised agents divide the labour and check each other

Recap: RAG in three steps

RAG is the fix for the static-knowledge limitation from the first block (Lewis et al., 2020).

  1. Retrieve: search an external knowledge base for passages relevant to the query
  2. Augment: inject those passages into the context alongside the question
  3. Generate: the LLM answers grounded in the retrieved evidence

Payoff: current knowledge without retraining, and a source the user can verify.
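
The three steps can be sketched as a tiny pipeline. Everything here is a hypothetical stand-in: the passages are invented placeholders, word overlap replaces the vector search a real system would use, and echo_llm replaces an actual model call.

```python
import re

# Invented placeholder passages; a real system would index actual documents.
KNOWLEDGE_BASE = [
    ("handbook.txt", "The library is open from 8 am to 10 pm on weekdays."),
    ("handbook.txt", "Printing costs five cents per page in the library."),
    ("it-faq.txt", "Reset your password in the campus account portal."),
]


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, k=2):
    """Step 1: rank passages by word overlap with the query and keep the top k."""
    query_words = words(query)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda passage: len(query_words & words(passage[1])),
        reverse=True,
    )
    return ranked[:k]


def augment(query, passages):
    """Step 2: place the retrieved passages in the prompt next to the question."""
    evidence = "\n".join(
        f"[{i + 1}] ({source}) {text}" for i, (source, text) in enumerate(passages)
    )
    return f"Answer using only the sources below and cite them.\n{evidence}\nQuestion: {query}"


def generate(prompt, llm):
    """Step 3: hand the augmented prompt to a language model."""
    return llm(prompt)


def echo_llm(prompt):
    # Hypothetical stand-in for a model: returns the top-ranked source line unchanged.
    return prompt.splitlines()[1]


question = "When is the library open?"
passages = retrieve(question)
answer = generate(augment(question, passages), echo_llm)
```

Updating the system means editing KNOWLEDGE_BASE, not retraining the model, and the source name carried through the prompt is what lets a user check the answer.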

Exercise C: design a student-support RAG

Your university wants a chatbot that answers questions on study and exam regulations, which change every semester.

Tasks (pairs)

  1. Sketch the RAG architecture: what are the components, and what goes in the knowledge base?
  2. Name one way it could harm students and propose a mitigation.
Time: 8 min

Wrap-up

Key takeaways

Three pillars, one engine

  • LLMs: pretraining gives capability; fine-tuning and RLHF turn a predictor into a useful, aligned assistant
  • Diffusion: a different generative recipe; embed the prompt, then denoise from noise toward image, audio, or video
  • Agentic AI: wrap an LLM with reasoning, retrieval, memory, and other agents so it can act, not just generate

The recurring theme

  • More capability brings new failure modes; grounding (RAG) and human oversight are how you deploy responsibly

Bridge

An agentic system takes actions in the world; it does not just put text on a screen.

When an agent makes a consequential mistake, who is accountable: the user, the deploying organisation, or the model developer?

Q&A

Literature

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems (NeurIPS), 33, 6840–6851. https://arxiv.org/abs/2006.11239
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459–9474. https://arxiv.org/abs/2005.11401
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L. E., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 35, 27730–27744. https://arxiv.org/abs/2203.02155
Urbach, N., Feulner, D., Feulner, S., Guggenberger, T., & Mayer, V. (2026). Introduction to generative artificial intelligence. In N. Urbach & D. Feulner (Eds.), Managing artificial intelligence (pp. 71–95). Springer Nature Switzerland. https://doi.org/10.1007/978-3-032-13308-3_4