Generative AI

Introduction to AI (I2AI)

Andy Weeger

Neu-Ulm University of Applied Sciences

March 17, 2026

Introduction

Discussion

When you hear “Generative AI”, what comes to mind?

And what do you think it actually means for a machine to create something?

From recognition to generation

Neural networks learn to recognize patterns. Generative AI learns to create them.

The shift is fundamental (Goodfellow et al., 2016; Urbach et al., 2026):

  • Discriminative AI asks:
    “What is this?” (i.e., classifying, predicting, deciding)
  • Generative AI asks:
    “What could this be?” (i.e., creating text, images, audio, video)
  • Instead of mapping inputs to labels, generative models learn the underlying distribution of data
  • They can then sample new instances from that distribution (i.e., producing new content)

The generative AI landscape

Generative AI has rapidly transitioned from a niche research domain to a significant driver of innovation across industries (Urbach et al., 2026).

Two major families of foundational models dominate today:

  • Large Language Models (LLMs): generate coherent, contextually relevant text
    Examples: GPT-4, Gemini, Claude, LLaMA
  • Diffusion Models: generate high-quality visual and audio content from noise
    Examples: DALL-E, Midjourney, Stable Diffusion, AudioLDM

Beyond standalone models, Agentic AI combines these capabilities with planning, memory, and tool use and thus enables AI to act, not just generate.

The catalyst: ChatGPT

The introduction of ChatGPT by OpenAI in November 2022 marked a turning point:

  • Built on the GPT architecture (Generative Pre-trained Transformer)
  • Its simple, user-friendly interface made advanced AI accessible to a mass audience
  • Within 2 months it attracted over 100 million users, one of the fastest-growing applications in digital history (Reuters, 2023)
  • Major tech companies (Microsoft, Google) immediately intensified their generative AI investments (The Verge, 2024)

ChatGPT is a catalyst, not the full picture.

Large Language Models

What are LLMs?

LLMs are neural networks trained on vast amounts of text, capable of generating coherent, contextually appropriate language (Brown et al., 2020; Vaswani et al., 2017).

Key characteristics

  • Built on the Transformer architecture with self-attention
  • Trained via next-token prediction on internet-scale text corpora
  • Process all tokens in a sequence simultaneously (unlike sequential RNNs)
  • Represent words as high-dimensional embedding vectors capturing semantic meaning
  • Scale dramatically: GPT-3 has 175 billion parameters; GPT-4 is estimated at over 1 trillion
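The self-attention mechanism behind these characteristics can be sketched in a few lines. This is a toy illustration in pure Python, not a real implementation: it uses tiny 2-dimensional "embeddings" and omits the learned query/key/value projection matrices that actual Transformers apply first.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a toy token sequence.

    X is a list of token vectors. Queries, keys, and values are all
    taken to be X itself here; real models first apply learned
    projections W_Q, W_K, W_V.
    """
    d = len(X[0])
    out = []
    for q in X:  # one output vector per token
        # Similarity of this token to every token, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)  # attention weights sum to 1
        # Output is the attention-weighted mix of all token vectors
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

# Three toy 2-d token embeddings, processed simultaneously
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
```

Note that every output vector is computed from all tokens at once, which is exactly why Transformers can process a sequence in parallel rather than step by step like an RNN.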

How LLMs generate text

The generation process in an LLM follows a clear probabilistic pipeline (Sanderson, 2024):

  1. Tokenization: input text is split into tokens; each token becomes a numerical ID
  2. Embedding: token IDs are mapped to high-dimensional vectors capturing semantic meaning
  3. Transformer layers: self-attention and feed-forward layers refine contextual representations
  4. Output projection: final vector is mapped to scores over the entire vocabulary
  5. Softmax & sampling: scores become probabilities; the next token is sampled
  6. Repeat: the generated token is appended to the context and the process continues
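The loop above can be sketched end to end. This toy in pure Python collapses steps 1-4 into a hand-written lookup table of scores (a real LLM computes logits with billions of Transformer parameters); the vocabulary, logits, and sentence are invented for illustration.

```python
import math
import random

random.seed(0)

# Toy vocabulary; each token's "model output" is a hand-written list
# of raw scores (logits) over the whole vocabulary, indexed by the
# previous token. A real LLM computes these with transformer layers.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]
LOGITS = {
    "the": [0.1, 2.0, 0.1, 0.1, 1.5, 0.1],
    "cat": [0.1, 0.1, 2.5, 0.3, 0.1, 0.1],
    "sat": [0.2, 0.1, 0.1, 2.5, 0.1, 0.1],
    "on":  [2.5, 0.1, 0.1, 0.1, 0.5, 0.1],
    "mat": [0.1, 0.1, 0.1, 0.1, 0.1, 2.5],
    ".":   [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]

def generate(prompt, steps=5):
    context = list(prompt)
    for _ in range(steps):            # step 6: repeat
        logits = LOGITS[context[-1]]  # steps 1-4, collapsed into a lookup
        probs = softmax(logits)       # step 5: scores -> probabilities
        token = random.choices(VOCAB, weights=probs)[0]  # sampling
        context.append(token)
        if token == ".":
            break
    return context

sentence = generate(["the"])
```

Because the next token is sampled rather than chosen deterministically, running the loop twice with different random seeds can yield different continuations of the same prompt.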

Training phases

LLMs are not trained in a single step; they go through three distinct phases (Ouyang et al., 2022):

  1. Pretraining: self-supervised learning on massive, unlabelled text corpora (internet archives, books, scientific articles); the model learns linguistic patterns, factual associations, and reasoning structures through next-token prediction
  2. Fine-tuning: supervised learning on smaller, high-quality, labelled datasets; adapts the general model to specific tasks (summarisation, Q&A, coding) with more precise, contextually relevant outputs
  3. Reinforcement Learning from Human Feedback (RLHF): human evaluators score model outputs; a reward model is trained on these scores; the LLM is then optimised to produce outputs humans prefer (i.e., aligning behaviour with expectations and ethical considerations)
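The pretraining objective in phase 1 can be stated in miniature: the loss for one position is the negative log-probability the model assigned to the token that actually followed in the training text (cross-entropy). The probability values below are made up for illustration.

```python
import math

def next_token_loss(predicted_probs, true_token, vocab):
    """Cross-entropy loss for a single next-token prediction.

    predicted_probs: the model's probability distribution over vocab.
    true_token: the token that actually appeared in the training text.
    """
    p = predicted_probs[vocab.index(true_token)]
    return -math.log(p)

vocab = ["the", "cat", "mat"]
# A confident, correct prediction is penalised little ...
low = next_token_loss([0.8, 0.1, 0.1], "the", vocab)
# ... while a confident, wrong one is penalised heavily.
high = next_token_loss([0.05, 0.05, 0.9], "the", vocab)
```

Minimising this loss over internet-scale text is what drives the model to absorb linguistic patterns and factual associations; fine-tuning and RLHF then reuse the same network with different training signals.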

Discussion

Consider the tasks you do in a typical working day. Where could an LLM genuinely help? And where might it do more harm than good?

Application scenarios

LLMs are applied across a broad spectrum of domains (Gimpel et al., 2023, 2024):

  • Content creation: drafting emails, blog posts, reports, code, and creative writing
  • Text summarisation: condensing lengthy documents into concise, actionable summaries
  • Knowledge dissemination: explaining complex concepts accessibly for varied audiences
  • Research support: structuring literature reviews, drafting paper sections, suggesting methodology
  • Customer service: powering conversational agents that handle routine queries at scale
  • Code generation: writing, explaining, and debugging software across programming languages
  • Translation: converting documents between languages while preserving context and register

Limitations of LLMs

Despite remarkable capabilities, LLMs have fundamental limitations that any responsible deployment must address (Riemer & Peter, 2023; Verma & Oremus, 2023).

  • Hallucination: Generates plausible but factually incorrect information.
  • No genuine understanding: Relies on statistical patterns rather than logic or reasoning.
  • Training data bias: Reflects and amplifies prejudices found in its source material.
  • Legal & privacy risks: Can leak sensitive data or infringe on intellectual property.
  • Static knowledge: Limited by a “cutoff date” and cannot learn post-training.
  • Resource intensive: Demands massive energy and expensive infrastructure.

Diffusion Models

A different generative paradigm

While LLMs generate text token by token, diffusion models generate images, video, and audio through an iterative denoising process inspired by physics (Ho et al., 2020; Urbach et al., 2026).

The core intuition:

  • Forward diffusion: Take a real image and gradually add Gaussian noise over many timesteps until it becomes pure random noise
  • Reverse diffusion: Train a neural network to reverse this process (to predict and remove the noise at each step)
  • At generation time: start from pure noise and apply the learned denoising process to produce coherent, realistic content
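The forward direction has a convenient closed form: the noisy sample at timestep t is a blend of the original signal and fresh Gaussian noise, with the signal's share shrinking as t grows. A minimal sketch in pure Python, using the standard linear beta schedule from Ho et al. (2020) on a toy four-value "image":

```python
import math
import random

random.seed(0)

def forward_diffusion(x0, t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise a data point x0 forward to timestep t in one shot.

    alpha_bar is the cumulative product of (1 - beta); it tracks how
    much of the original signal survives at timestep t.
    """
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1)
             for i in range(T)]
    alpha_bar = 1.0
    for b in betas[:t]:
        alpha_bar *= (1.0 - b)
    noise = [random.gauss(0.0, 1.0) for _ in x0]
    # Blend: sqrt(alpha_bar) of signal plus sqrt(1 - alpha_bar) of noise
    x_t = [math.sqrt(alpha_bar) * xi + math.sqrt(1.0 - alpha_bar) * n
           for xi, n in zip(x0, noise)]
    return x_t, alpha_bar

pixel_row = [0.9, 0.1, 0.5, 0.7]  # toy "image"
slightly_noisy, a_early = forward_diffusion(pixel_row, t=10)
almost_noise, a_late = forward_diffusion(pixel_row, t=990)
```

Early timesteps leave the signal almost intact (alpha_bar near 1), while late timesteps are essentially pure noise (alpha_bar near 0). The denoising network is trained to predict the added noise at every timestep, which is what makes the reverse, generative direction possible.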

Text-to-image generation

The most prominent application of diffusion models is generating images from text descriptions (Rombach et al., 2022):

  1. Text embedding: the text prompt is encoded into a high-dimensional semantic vector
  2. Noise-to-image: a diffusion model starts from random noise and iteratively denoises it, conditioned on the text embedding
  3. Super-resolution: the rough initial image (e.g., 64×64 pixels) is progressively upscaled through further diffusion passes to high resolution (e.g., 1024×1024)

Beyond images: video and audio

Diffusion models extend naturally to other modalities:

  • Text-to-video (Runway Gen-2, Imagen Video)
    Temporal coherence between frames is learned from large video datasets; enables visualisation of storyboards, animation, and promotional content directly from text
  • Text-to-audio (AudioLDM)
    Musical pieces and soundscapes from descriptive prompts; learns from mood, instrumentation, and genre cues embedded in text; applications in entertainment scoring, advertising jingles, and rapid audio prototyping

All three modalities share the same fundamental mechanism: the prompt is embedded, iterative denoising is conditioned on that embedding, and structured output emerges (Liu et al., 2023; Singh, 2023).

Limitations of diffusion models

  • Limited controllability: precise steering of outputs (exact positions, specific faces, fine typography) remains difficult; users often resort to trial and error for nuanced requirements (Peng, 2024)
  • Embedded bias: models trained on internet imagery inevitably reflect and can amplify societal biases relating to gender, race, and culture (“data mirror effect”) (Milne, 2023)
  • Copyright and IP concerns: outputs may closely resemble training images; legal questions about ownership, attribution, and infringement are unresolved (Brittain, 2023)
  • Deepfakes and misinformation: realistic images and videos can be used to fabricate convincing false narratives, threatening trust in media
  • Safety vs. freedom trade-offs: content filters introduce new complexities; defining universally acceptable generation criteria is culturally contested

Agentic AI

From generation to action

Definition

Agentic AI is an emerging paradigm in AI that refers to autonomous systems designed to pursue complex goals with minimal human intervention (Acharya et al., 2025, p. 18912).

Core characteristics

  • Autonomy & goal complexity: handles multiple complex goals simultaneously; operates independently over extended periods
  • Adaptability: functions in dynamic and unpredictable environments; makes decisions with incomplete information
  • Independent decision-making: learns from experience; reconceptualizes approaches based on new information

Agentic AI vs. Traditional AI

Comparison of traditional AI and agentic AI based on Acharya et al. (2025)
Feature                  Traditional AI                 Agentic AI
Primary purpose          Task-specific automation       Goal-oriented autonomy
Human intervention       High (predefined parameters)   Low (autonomous adaptability)
Adaptability             Limited                        High
Environment interaction  Static or limited context      Dynamic and context-aware
Learning type            Primarily supervised           Reinforcement and self-supervised
Decision-making          Data-driven, static rules      Autonomous, contextual reasoning

Building blocks

Four key components transform LLMs into agents (Urbach et al., 2026):

  1. Reasoning-augmented LLMs: chain-of-thought prompting and multi-path reasoning enable systematic, verifiable problem-solving rather than surface-level pattern matching
  2. Retrieval-Augmented Generation (RAG): integrates real-time access to external knowledge bases, addressing the static knowledge limitation of standard LLMs
  3. Conversational agents: maintain context over extended dialogues; bridge human intent and machine execution; manage conversation history within token limits
  4. Multi-agent systems (MAS): multiple specialised agents collaborate and delegate tasks, enabling scalable, modular architectures for complex domains
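The RAG idea in component 2 can be sketched compactly: retrieve relevant snippets from an external knowledge base, then prepend them to the prompt so the model answers from fresh, grounded context. This is a toy illustration: real systems use vector embeddings and a proper LLM, whereas here retrieval is plain word overlap and the knowledge base is hard-coded.

```python
# Hypothetical knowledge base standing in for a document store
KNOWLEDGE_BASE = [
    "The 2026 budget was approved in January 2026.",
    "Diffusion models generate images by iterative denoising.",
    "Office hours are Tuesdays from 14:00 to 16:00.",
]

def retrieve(query, documents, k=1):
    # Score each document by how many query words it shares;
    # real RAG systems compare embedding vectors instead
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query):
    # Stuff the retrieved context into the prompt that the LLM
    # would then complete
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("When was the 2026 budget approved?")
```

Because the retrieved text is injected at query time, the answer can reflect information added to the knowledge base long after the model's training cutoff, which is precisely how RAG addresses the static-knowledge limitation.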