Generative AI

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

March 17, 2026

Introduction

Discussion

When you hear “Generative AI”, what comes to mind?

And what do you think it actually means for a machine to create something?

From recognition to generation

Neural networks learn to recognize patterns.
Generative AI learns to create them.

This shift from recognition to generation is fundamental (Goodfellow et al., 2016; Urbach et al., 2026):

  • Discriminative AI asks:
    “What is this?” (i.e., classifying, predicting, deciding)
  • Generative AI asks:
    “What could this be?” (i.e., creating text, images, audio, video)
  • Instead of mapping inputs to labels, generative models learn the underlying distribution of data
  • They can then sample new instances from that distribution (i.e., producing new content)
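The distinction can be made concrete with a toy sketch (an illustration, not from the lecture): for simple 1-D data, "learning the underlying distribution" means estimating its parameters, and "sampling new instances" draws fresh values from that estimate. The Gaussian assumption and the body-height example are illustrative choices.

```python
import random
import statistics

def fit_gaussian(data):
    """Learn the underlying distribution: estimate mean and standard deviation."""
    return statistics.mean(data), statistics.stdev(data)

def sample(mean, std, n, rng):
    """Generate n new instances from the learned distribution."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)
observed = [rng.gauss(170.0, 8.0) for _ in range(5000)]  # e.g. body heights in cm
mean, std = fit_gaussian(observed)
new_instances = sample(mean, std, 3, rng)
```

A discriminative model would instead learn a mapping from each value to a label; here the model captures the data itself and can produce plausible new samples.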

The generative AI landscape

Generative AI has rapidly transitioned from a niche research domain to a significant driver of innovation across industries (Urbach et al., 2026).

Two major families of foundational models dominate today:

  • Large Language Models (LLMs): generate coherent, contextually relevant text (examples: GPT-4, Gemini, Claude, LLaMA)
  • Diffusion Models: generate high-quality visual and audio content from noise (examples: DALL-E, Midjourney, Stable Diffusion, AudioLDM)

Beyond standalone models, Agentic AI combines these capabilities with planning, memory, and tool use, enabling AI to act, not just generate.


The catalyst: ChatGPT

The introduction of ChatGPT by OpenAI in November 2022 marked a turning point:

  • Built on the GPT architecture (Generative Pre-trained Transformer)
  • Its simple, user-friendly interface made advanced AI accessible to a mass audience
  • Within two months it attracted over 100 million users, making it one of the fastest-growing consumer applications in digital history (Reuters, 2023)
  • Major tech companies (Microsoft, Google) immediately intensified their generative AI investments (The Verge, 2024)

ChatGPT is a catalyst, not the full picture.

Large Language Models

What are LLMs?

LLMs are neural networks trained on vast amounts of text, capable of generating coherent, contextually appropriate language (Brown et al., 2020; Vaswani et al., 2017).

Key characteristics

  • Built on the Transformer architecture with self-attention
  • Trained via next-token prediction on internet-scale text corpora
  • Process all tokens in a sequence simultaneously (unlike sequential RNNs)
  • Represent words as high-dimensional embedding vectors capturing semantic meaning
  • Scale dramatically: GPT-3 has 175 billion parameters; GPT-4 is estimated at over 1 trillion

How LLMs generate text

The generation process in an LLM follows a clear probabilistic pipeline (Sanderson, 2024):

  1. Tokenization: input text is split into tokens; each token becomes a numerical ID
  2. Embedding: token IDs are mapped to high-dimensional vectors capturing semantic meaning
  3. Transformer layers: self-attention and feed-forward layers refine contextual representations
  4. Output projection: final vector is mapped to scores over the entire vocabulary
  5. Softmax & sampling: scores become probabilities; the next token is sampled
  6. Repeat: the generated token is appended to the context and the process continues
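The loop in steps 4–6 can be sketched in a few lines. The transformer itself (steps 1–4) is replaced here by a made-up scoring function over a toy vocabulary, an assumption purely for illustration; only the softmax-and-sample loop mirrors the real pipeline.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def softmax(scores):
    """Step 5: turn raw scores into a probability distribution."""
    exps = [math.exp(s - max(scores)) for s in scores]  # numerically stable form
    total = sum(exps)
    return [e / total for e in exps]

def toy_scores(context):
    """Stand-in for steps 1-4: toy logits that depend only on context length."""
    return [float((i * 3 + len(context)) % 5) for i in range(len(VOCAB))]

def generate(prompt_tokens, n_steps, rng):
    context = list(prompt_tokens)
    for _ in range(n_steps):
        probs = softmax(toy_scores(context))               # step 5: softmax
        next_token = rng.choices(VOCAB, weights=probs)[0]  # step 5: sampling
        context.append(next_token)                         # step 6: repeat
    return context

rng = random.Random(42)
tokens = generate(["the"], n_steps=4, rng=rng)
```

Because the next token is sampled rather than picked deterministically, the same prompt can yield different continuations, which is why LLM output varies between runs.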

Training phases

LLMs are not trained in a single step; they go through three distinct phases (Ouyang et al., 2022):

  1. Pretraining: self-supervised learning on massive, unlabelled text corpora (internet archives, books, scientific articles); the model learns linguistic patterns, factual associations, and reasoning structures through next-token prediction
  2. Fine-tuning: supervised learning on smaller, high-quality, labelled datasets; adapts the general model to specific tasks (summarisation, Q&A, coding) with more precise, contextually relevant outputs
  3. Reinforcement Learning from Human Feedback (RLHF): human evaluators score model outputs; a reward model is trained on these scores; the LLM is then optimised to produce outputs humans prefer (i.e., aligning behaviour with expectations and ethical considerations)
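The next-token prediction objective used in phase 1 is typically optimised via cross-entropy loss, the negative log-probability the model assigns to the actual next token. A minimal sketch, with hand-picked probability vectors as illustrative assumptions:

```python
import math

def cross_entropy(predicted_probs, target_index):
    """Loss is low when the model puts high probability on the true next token."""
    return -math.log(predicted_probs[target_index])

# A confident, correct prediction versus an uncertain one (token 1 is correct):
loss_confident = cross_entropy([0.05, 0.90, 0.05], target_index=1)
loss_uncertain = cross_entropy([0.40, 0.20, 0.40], target_index=1)
```

Pretraining repeats this over billions of tokens, nudging parameters so that the probability mass shifts toward the tokens that actually follow in the corpus.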

Discussion

Consider the tasks you do in a typical working day. Where could an LLM genuinely help? And where might it do more harm than good?

Application scenarios

LLMs are applied across a broad spectrum of domains (Gimpel et al., 2023, 2024):

  • Content creation: drafting emails, blog posts, reports, code, and creative writing
  • Text summarisation: condensing lengthy documents into concise, actionable summaries
  • Knowledge dissemination: explaining complex concepts accessibly for varied audiences
  • Research support: structuring literature reviews, drafting paper sections, suggesting methodology
  • Customer service: powering conversational agents that handle routine queries at scale
  • Code generation: writing, explaining, and debugging software across programming languages
  • Translation: converting documents between languages while preserving context and register

Limitations of LLMs

Despite remarkable capabilities, LLMs have fundamental limitations that any responsible deployment must address (Riemer & Peter, 2023; Verma & Oremus, 2023).

  • Hallucination: Generates plausible but factually incorrect information.
  • No genuine understanding: Relies on statistical patterns rather than logic or reasoning.
  • Training data bias: Reflects and amplifies prejudices found in its source material.
  • Legal & privacy risks: Can leak sensitive data or infringe on intellectual property.
  • Static knowledge: Limited by a “cutoff date” and cannot learn post-training.
  • Resource intensive: Demands massive energy and expensive infrastructure.

Diffusion Models

A different generative paradigm

While LLMs generate text token by token, diffusion models generate images, video, and audio through an iterative denoising process inspired by physics (Ho et al., 2020; Urbach et al., 2026).

The core intuition:

  • Forward diffusion: Take a real image and gradually add Gaussian noise over many timesteps until it becomes pure random noise
  • Reverse diffusion: Train a neural network to reverse this process (to predict and remove the noise at each step)
  • At generation time: start from pure noise and apply the learned denoising process to produce coherent, realistic content
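The forward process can be sketched on a toy 1-D "image" of four pixel values. The linear blend of signal and Gaussian noise is a simplification of the actual variance schedules used in DDPMs, and no learned network appears here, only the corruption process the reverse model would be trained to undo.

```python
import random

def forward_diffusion(x0, timesteps, rng):
    """Blend the signal with noise until, at the final step, only noise remains."""
    trajectory = [list(x0)]
    for t in range(1, timesteps + 1):
        keep = 1.0 - t / timesteps  # signal weight shrinks linearly to zero
        noisy = [keep * v + (1.0 - keep) * rng.gauss(0.0, 1.0) for v in x0]
        trajectory.append(noisy)
    return trajectory

rng = random.Random(7)
image = [0.2, 0.8, 0.5, 0.1]  # stand-in for pixel values
trajectory = forward_diffusion(image, timesteps=10, rng=rng)
```

Training then amounts to showing the network intermediate steps of such trajectories and asking it to predict the noise that was added, so that at generation time it can run the trajectory backwards from pure noise.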

Text-to-image generation

The most prominent application of diffusion models is generating images from text descriptions (Rombach et al., 2022):

  1. Text embedding: the text prompt is encoded into a high-dimensional semantic vector
  2. Noise-to-image: a diffusion model starts from random noise and iteratively denoises it, conditioned on the text embedding
  3. Super-resolution: the rough initial image (e.g., 64×64 pixels) is progressively upscaled through further diffusion passes to high resolution (e.g., 1024×1024)

Beyond images

Diffusion models extend naturally to other modalities:

  • Text-to-video (Runway Gen-2, Imagen Video)
    Temporal coherence between frames is learned from large video datasets; enables visualisation of storyboards, animation, and promotional content directly from text
  • Text-to-audio (AudioLDM)
    Musical pieces and soundscapes from descriptive prompts; learns from mood, instrumentation, and genre cues embedded in text; applications in entertainment scoring, advertising jingles, and rapid audio prototyping

All three modalities share the same fundamental mechanism: prompt embedding, then iterative denoising, then structured output (Liu et al., 2023; Singh, 2023).

Limitations of diffusion models

  • Limited controllability: precise steering of outputs (exact positions, specific faces, fine typography) remains difficult; users often resort to trial and error for nuanced requirements (Peng, 2024)
  • Embedded bias: models trained on internet imagery inevitably reflect and can amplify societal biases relating to gender, race, and culture (“data mirror effect”) (Milne, 2023)
  • Copyright and IP concerns: outputs may closely resemble training images; legal questions about ownership, attribution, and infringement are unresolved (Brittain, 2023)
  • Deepfakes and misinformation: realistic images and videos can be used to fabricate convincing false narratives, threatening trust in media
  • Safety vs. freedom trade-offs: content filters introduce new complexities; defining universally acceptable generation criteria is culturally contested

Agentic AI

From generation to action

Definition

Agentic AI is an emerging paradigm in AI that refers to autonomous systems designed to pursue complex goals with minimal human intervention (Acharya et al., 2025, p. 18912).

Core characteristics

  • Autonomy & goal complexity: handles multiple complex goals simultaneously; operates independently over extended periods
  • Adaptability: functions in dynamic and unpredictable environments; makes decisions with incomplete information
  • Independent decision-making: learns from experience; reconceptualizes approaches based on new information

Agentic AI vs. Traditional AI

Comparison of traditional AI and agentic AI based on Acharya et al. (2025)
  Feature                  Traditional AI                 Agentic AI
  Primary purpose          Task-specific automation       Goal-oriented autonomy
  Human intervention       High (predefined parameters)   Low (autonomous adaptability)
  Adaptability             Limited                        High
  Environment interaction  Static or limited context      Dynamic and context-aware
  Learning type            Primarily supervised           Reinforcement and self-supervised
  Decision-making          Data-driven, static rules      Autonomous, contextual reasoning

Building blocks

Four key components transform LLMs into agents (Urbach et al., 2026):

  1. Reasoning-augmented LLMs: chain-of-thought prompting and multi-path reasoning enable systematic, verifiable problem-solving rather than surface-level pattern matching
  2. Retrieval-Augmented Generation (RAG): integrates real-time access to external knowledge bases, addressing the static knowledge limitation of standard LLMs
  3. Conversational agents: maintain context over extended dialogues; bridge human intent and machine execution; manage conversation history within token limits
  4. Multi-agent systems (MAS): multiple specialised agents collaborate and delegate tasks, enabling scalable, modular architectures for complex domains

Workflow patterns in agentic systems

Anthropic (2024) discusses five key patterns for designing agentic AI workflows:

  1. Prompt chaining: output of one step becomes input to the next; creates complex multi-step reasoning flows
  2. Routing: directs tasks to specialised components based on type; improves efficiency through targeted processing
  3. Parallelisation: processes independent subtasks simultaneously; increases throughput
  4. Orchestrator-workers: central orchestrator delegates to specialised worker agents; manages coordination and integration
  5. Evaluator-optimizer: separate components generate, evaluate, and refine; enables iterative quality improvement
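Pattern 1, prompt chaining, can be sketched with a stub in place of real model calls; the `call_llm` function and the three step instructions are illustrative assumptions, not part of any specific framework.

```python
def call_llm(prompt):
    """Stub LLM: wraps its input so the chain of calls stays traceable."""
    return f"[LLM output for: {prompt}]"

def chain(steps, user_input):
    """Output of one step becomes input to the next."""
    result = user_input
    for instruction in steps:
        result = call_llm(f"{instruction}\n---\n{result}")
    return result

steps = [
    "Extract the key claims from the text below.",
    "Check each extracted claim for internal consistency.",
    "Summarise the verified claims in one paragraph.",
]
final = chain(steps, "Quarterly revenue rose 12% while costs fell.")
```

In a production system each `call_llm` would hit a model API, and routing, parallelisation, or an evaluator step could be layered onto the same loop.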

Retrieval-Augmented Generation (RAG)

RAG combines the generative power of LLMs with dynamic access to external, up-to-date knowledge (Lewis et al., 2020).

The mechanism:

  1. Retrieval: a retrieval module searches an external knowledge base (databases, documents, web APIs) for passages relevant to the query
  2. Augmentation: retrieved passages are injected into the context alongside the original query
  3. Generation: the LLM generates a response grounded in the retrieved evidence, not just its training data

Key advantages: factual accuracy, updatability without retraining, and interpretability (users can inspect which sources were used)
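The retrieval and augmentation steps can be sketched as follows. Bag-of-words vectors with cosine similarity stand in for the dense embeddings real RAG systems use, and the mini knowledge base is an invented example; only the retrieve-then-inject structure mirrors the mechanism above.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (real systems use dense neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=1):
    """Step 1: rank passages by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def augment(query, passages):
    """Step 2: inject retrieved passages into the prompt context."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using the context."

kb = [
    "The thesis registration deadline is 1 June each year.",
    "The cafeteria serves lunch from 11:30 to 14:00.",
    "Exam retakes require registration via the student portal.",
]
query = "When is the thesis registration deadline?"
passages = retrieve(query, kb)
prompt = augment(query, passages)
```

Step 3 would pass `prompt` to the LLM; because the answer is grounded in the retrieved passage, updating the knowledge base changes the system's answers without any retraining.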

Multi-agent systems

Multi-agent systems (MAS) represent the frontier of agentic AI: multiple specialised agents collaborating to solve problems beyond any single agent’s capabilities (Doran et al., 1997; Hoek & Wooldridge, 2008).

Architecture:

  • A manager agent orchestrates the overall workflow
  • Specialist agents handle specific subtasks (analysis, reasoning, writing, validation)
  • Agents share a common knowledge base and communicate through structured protocols

Benefits:

  • Division of labour: tasks broken into components matched to agent strengths
  • Scalability: add specialised agents without redesigning the whole system
  • Robustness: multiple agents can check each other’s outputs
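The architecture and benefits above can be sketched with plain functions standing in for LLM-backed agents; the role names and the shared-dict "knowledge base" are illustrative assumptions.

```python
def analyst(task, shared):
    """Specialist: analysis subtask, writing results to shared knowledge."""
    shared["analysis"] = f"key points of '{task}'"

def writer(task, shared):
    """Specialist: drafting subtask, building on the analyst's output."""
    return f"Draft on {task}, covering {shared['analysis']}"

def validator(draft, shared):
    """Specialist: checks another agent's output (robustness)."""
    return shared["analysis"] in draft

def manager(task):
    """Manager agent: orchestrates the workflow across specialists."""
    shared = {}                       # common knowledge base
    analyst(task, shared)             # division of labour
    draft = writer(task, shared)
    if not validator(draft, shared):  # agents check each other's outputs
        raise ValueError("validation failed")
    return draft

report = manager("renewable energy adoption")
```

Scalability follows from the same shape: a new specialist (say, a fact-checker) can be added to the manager's workflow without redesigning the existing agents.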

The road ahead

  • Agentic AI is not a distant future; systems like AutoGPT, Microsoft Copilot, and Anthropic’s Claude already exhibit agentic behaviour in production deployments (Plaat et al., 2025)
  • Governance challenges grow with agency: autonomous systems that take actions in the world require more robust oversight than text-generation systems
  • Human-AI collaboration models are evolving rapidly: from AI as assistant, to AI as collaborator, toward AI as (supervised) autonomous actor
  • Responsibility questions intensify: when an agentic system makes a consequential mistake, who is accountable (e.g., the user, the deploying organisation, or the model developer)?

Exercises

GenAI landscape mapping

You are advising a media company that produces news articles, photographs, video segments, and podcast episodes.

  1. Map the technologies: For each content type, identify which generative AI technology (LLM, diffusion model, or a combination) would be most relevant and explain why.
  2. Identify risks: For each application, identify the most significant risk that would need to be managed (e.g., hallucination, bias, copyright, deepfake potential).
  3. Prioritise: If the company can only pilot one generative AI application in the first year, which would you recommend and why?

Training phase analysis

A startup has built a general-purpose LLM through pretraining on a large web corpus. They now want to deploy it as a legal document assistant for law firms.

  1. Fine-tuning: What kind of training data would you recommend for fine-tuning, and why?
  2. RLHF: Who would you recruit as human raters, and what specific quality criteria would you have them evaluate?
  3. Trade-offs: Fine-tuning makes the model more specialised, but it risks “catastrophic forgetting” of general knowledge. How would you design the training process to mitigate this?
  4. Evaluation: Before deploying to law firms, what evaluation would you run to assess the model’s readiness?

Diffusion model design

A pharmaceutical company wants to use generative AI to visualise molecular structures and simulate how proposed drug compounds might interact with target proteins. They are considering adapting diffusion models for this scientific domain.

  1. Analogy: The text-to-image pipeline uses (i) text embedding, (ii) denoising conditioned on the embedding, (iii) super-resolution. Design an analogous pipeline for text-to-molecule generation.
  2. Training data: What would a training dataset look like, and where might it come from?
  3. Evaluation challenge: Unlike image generation (where human aesthetic judgement provides a useful signal), how would you evaluate whether a generated molecule is “good”?
  4. Limitations: What specific limitations of diffusion models become especially problematic in this scientific context?

RAG system design

Your university wants to deploy a student support chatbot that can answer questions about study regulations, course requirements, examination procedures, and administrative processes. The university updates its regulations each semester.

  1. Architecture: Sketch a RAG architecture for this system. What are the components, and what documents belong in the knowledge base?
  2. Retrieval quality: A student asks: “Can I take my bachelor’s thesis exam if I still have one failed elective from last semester?” Describe the retrieval steps and explain what could go wrong.
  3. Knowledge base maintenance: How would you ensure the knowledge base stays up to date as regulations change each semester?
  4. Failure modes: Identify three ways this system could fail in ways that harm students, and propose mitigations for each.

Agentic AI evaluation

Consider the following agentic AI scenario: a university deploys an AI research assistant agent that, given a research question, autonomously searches academic databases, reads and synthesises relevant papers, identifies gaps in the literature, and produces a structured literature review draft.

  1. Agent architecture: Identify which building blocks from the lecture (reasoning-augmented LLM, RAG, conversational agent, MAS) are present in this system, and describe the role of each.
  2. Goal specification: How would you specify the agent’s goal precisely enough that it produces useful output without over- or under-shooting? What could go wrong with vague goal specification?
  3. Human oversight: At which points in the agent’s workflow should a human researcher be consulted or able to intervene? Design a human-in-the-loop workflow.
  4. Ethical considerations: Identify two ethical issues this deployment raises for academic integrity, and discuss how they might be addressed.

Literature

Acharya, D. B., Kuppan, K., & Divya, B. (2025). Agentic AI: Autonomous intelligence for complex goals–a comprehensive survey. IEEE Access.
Anthropic. (2024). Building effective agents. Anthropic Research Team; https://www.anthropic.com/engineering/building-effective-agents.
Berente, N., Gu, B., Recker, J., & Santhanam, R. (2021). Managing artificial intelligence. MIS Quarterly, 45(3), 1433–1450. https://doi.org/10.25300/MISQ/2021/16274
Bhatia, A. (2023). We need to talk about how good A.I. Is getting. https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html
Brittain, B. (2023). Getty images lawsuit says Stability AI misused photos to train AI. Reuters. https://www.reuters.com/legal/getty-images-lawsuit-says-stability-ai-misused-photos-train-ai-2023-02-06/
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 33, 1877–1901. https://arxiv.org/abs/2005.14165
Cao, H., Tan, C., Gao, Z., Xu, Y., Chen, G., Heng, P.-A., & Li, S. Z. (2023). A survey on generative diffusion model. arXiv Preprint arXiv:2209.02646. https://arxiv.org/abs/2209.02646
Doran, J., Franklin, S., Jennings, N., & Norman, T. (1997). On cooperation in multi-agent systems. The Knowledge Engineering Review, 12(3), 309–314. https://doi.org/10.1017/s0269888997003111
Gimpel, H., Gutheil, N., Mayer, V., Bandtel, M., Büttgen, M., Decker, S., et al. (2024). (Generative) AI competencies for future-proof graduates: Inspiration for higher education institutions [Hohenheim Discussion Papers in Business, Economics and Social Sciences]. University of Hohenheim.
Gimpel, H., Ruiner, C., Schoch, M., Schoop, M., Hall, K., Eymann, T., Röglinger, M., Vandrik, S., Lämmermann, L., Urbach, N., Mädche, A., & Decker, S. (2023). Unlocking the power of generative AI models and systems such as GPT-4 and ChatGPT for higher education: A guide for students and lecturers (Hohenheim Discussion Papers in Business, Economics and Social Sciences 02-2023). University of Hohenheim. https://hohpublica.uni-hohenheim.de/items/fe53b2bb-ab75-463c-9383-ec74416fd940
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems (NeurIPS), 33, 6840–6851. https://arxiv.org/abs/2006.11239
Hoek, W. van der, & Wooldridge, M. (2008). Multi-agent systems. In Handbook of knowledge representation (Vol. 3, pp. 887–928). Elsevier. https://doi.org/10.1016/S1574-6526(07)03024-6
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., & Riedel, S. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS), 33, 9459–9474. https://arxiv.org/abs/2005.11401
Liu, H., Chen, Z., Yuan, Y., et al. (2023). AudioLDM: Text-to-audio generation with latent diffusion models. https://arxiv.org/abs/2301.12503
Milne, S. (2023). AI image generator Stable Diffusion perpetuates racial and gendered stereotypes, study finds. https://www.washingtonpost.com/news/2023/11/29/ai-image-generator-stable-diffusion-perpetuates-racial-and-gendered-stereotypes-bias/
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L. E., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 35, 27730–27744. https://arxiv.org/abs/2203.02155
Peng, Y. (2024). A comparative analysis between GAN and diffusion models in image generation. Transactions on Computer Science and Intelligent Systems Research, 5(1).
Plaat, A., D’Ascoli, S., Bubeck, S., Chan, B., Chen, D., Chi, E. H., et al. (2025). Agentic large language models. https://arxiv.org/abs/2503.23037
Reuters. (2023). ChatGPT sets record for fastest-growing user base—Analyst note. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/
Riemer, K., & Peter, S. (2023). What the lone banana problem reveals about the nature of generative AI. ACIS 2023 Proceedings.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10684–10695. https://doi.org/10.1109/CVPR52688.2022.01042
Sanderson, G. (2024). But what is a GPT? Visual intro to transformers. https://www.3blue1brown.com/lessons/gpt
Singh, A. (2023). A survey of AI text-to-image and AI text-to-video generators. https://arxiv.org/abs/2311.06329
The Verge. (2024). Inside the launch—and future—of ChatGPT. https://www.theverge.com/23610427/chatgpt-openai-history-two-year-anniversary
Urbach, N., Feulner, D., Feulner, S., Guggenberger, T., & Mayer, V. (2026). Introduction to generative artificial intelligence. In N. Urbach & D. Feulner (Eds.), Managing artificial intelligence (pp. 71–95). Springer Nature Switzerland. https://doi.org/10.1007/978-3-032-13308-3_4
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 30, 5998–6008. https://arxiv.org/abs/1706.03762
Verma, P., & Oremus, W. (2023). ChatGPT invented a sexual harassment scandal and named a real law prof as the accused. https://www.washingtonpost.com/technology/2023/04/05/chatgpt-lies/
Wang, L., Liu, Z., Wang, Z., & Li, L. (2024). A survey on large language model based autonomous agents. Journal of Computer Science and Technology. https://doi.org/10.1109/JOCST.2024.10849561