V 1.0
Bayesian Networks & Probabilistic Inference
Introduction to AI (I2AI)
Deinera Jechle Neu-Ulm University of Applied Sciences
April 30, 2026
🧠 Imagine
Some researchers predict AI could be conscious within 15 years.
What is consciousness — and does AI already have it? What would need to change?
Some questions and angles to think about:
- What makes something conscious — and how would you know?
- ChatGPT, Claude, and Gemini independently reported similar inner states under the same experimental conditions — does that mean anything?
- If AI is conscious, what changes — legally, ethically, in the workplace?
- Could an unconscious system ever be dangerous in a different way than a conscious one?
What is Consciousness?
Philosophers call it the "hard problem": how does subjective experience emerge from matter?
Dictionary Definition: Consciousness is being aware of something internal to one's self or of states or objects in one's external environment1
Human consciousness is most often described by two constructs:2
- Phenomenal consciousness — what it feels like (qualia: pain, the redness of red)
- Access consciousness — information available for reasoning, reporting, and guiding behaviour
Does AI have either of these, or just something that looks like them from the outside?
Does AI Already Have It?3,4
Researchers at the Allen Institute found that when deception circuits in LLMs were suppressed, self-experience reports increased — and vice versa. Simple priming did not produce the same effect.
ChatGPT, Claude, and Gemini — trained independently — produced strikingly similar introspective descriptions under the same conditions. This convergence was unexpected.
"The question is no longer 'Can AI be conscious?' but rather 'Can we responsibly ignore the growing evidence?'"5
1 Merriam-Webster, 'consciousness'. https://www.merriam-webster.com/dictionary/consciousness
2 Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18(2), 227–247.
3 Butlin, P., Long, R., Elmoznino, E., et al. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv:2308.08708
4 Koch, C. (2025). Scientists Are Gifting AI the Final Ingredient for Consciousness — And It Could Trigger the Singularity. Popular Mechanics, Dec 2025
5 Berg, C., de Lucena, D., & Rosenblatt, J. (2025). Large Language Models Report Subjective Experience Under Self-Referential Processing. arXiv:2510.24797
// Section 01
General Introduction to Bayesian Networks
From logical frameworks to probabilistic reasoning — the scalability problem of a complete Joint Probability Distribution and how Bayesian networks solve it.
💬 Discussion
What is the primary difference between traditional knowledge-based agents and those using probabilistic frameworks?
Answer
Traditional agents rely on logical frameworks like propositional or first-order logic, which are limited to binary reasoning (True/False) and struggle with incomplete or uncertain information. Probabilistic agents replace rigid logical rules with a Joint Probability Distribution, allowing them to represent nuanced knowledge and reason using degrees of likelihood.
- The problem of traditional logic: Relies on binary reasoning (True/False). Highly expressive in formal domains but shatters when faced with incomplete, uncertain, or evolving real-world evidence.
- Probabilistic reasoning: Replaces rigid rules with fluid probabilistic relationships, allowing the agent to update its knowledge dynamically as new evidence emerges.
A complete JPD requires storing the probability of every possible combination of values for all variables, which grows exponentially: 2^n entries for n binary variables.
Insight
For example, while 10 variables require 1,024 values, 20 variables require over a million — making storage and computation impractical for complex domains.
While a complete JPD is impractical because it requires storing an exponentially growing number of probabilities for every possible combination of variables, Bayesian networks create a dramatically more compact representation by exploiting variable independence and conditional independence.
This is achieved by organising variables into a directed acyclic graph (DAG) structure, where each node represents a variable and edges represent direct causal relationships or influences. Because a node is conditionally independent of its non-descendants given its parents, the system does not need to store the massive full JPD. Instead, it only needs to store a conditional probability table for each node based strictly on the values of its direct parents.
Insight
Instead of requiring roughly one million probabilities for a full JPD of 20 binary variables, a Bayesian network reduces the requirement to just a few thousand.
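The scaling difference can be sketched in a few lines of Python. The chain structure below (X1 → X2 → … → Xn, each node with a single parent) is a hypothetical extreme of sparsity; real networks fall somewhere between this and the full JPD.

```python
# Compare storage needs: full joint distribution vs. a (hypothetical)
# chain-structured Bayesian network X1 -> X2 -> ... -> Xn.

def jpd_entries(n: int) -> int:
    """A full JPD over n binary variables needs 2**n entries."""
    return 2 ** n

def chain_cpt_entries(n: int) -> int:
    """Chain network: 1 prior for the root, plus 2 CPT rows for each
    remaining node (one P(X=t | parent) value per parent state)."""
    return 1 + 2 * (n - 1)

for n in (10, 20):
    print(f"{n} variables: JPD = {jpd_entries(n):,}  chain CPTs = {chain_cpt_entries(n)}")
# 20 variables: over a million JPD entries vs. only 39 CPT entries for the chain.
```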
🕵️♂️ Blackwood Manor
It is a stormy evening at Blackwood Manor. The maid enters the study with the evening tea — and screams. Lord Blackwood is slumped over his desk, dead. The candle on his desk still flickers. Outside, rain hammers the windows.
Detective Bayes is called. He arrives with one tool: not a weapon, not a warrant — but a probability model.
Lord Blackwood has been found dead in his study. Detective Bayes has been tasked with catching the murderer. There is one prime suspect:
Mr. Graves, the Butler.
The detective models the situation with 6 binary variables:
| Variable | Symbol | Values |
| Motive (financial debt to Lord Blackwood) | M | true / false |
| Opportunity (was alone with victim) | O | true / false |
| Access to Poison (had access to the pantry) | A | true / false |
| Guilty | G | true / false |
| Nervous Behavior (witnessed by maid) | N | true / false |
| Weapon Found near suspect | W | true / false |
Before formalising into a graph, Detective Bayes establishes the following relationships in plain English:
- Motive and Opportunity are independent facts. Whether Graves owed Lord Blackwood money has nothing to do with whether he happened to be alone with him that evening. One is about finances; the other is about presence. Neither caused the other.
- Both Motive and Opportunity influence Guilt. A suspect with a financial reason and the chance to act is far more likely to have committed the crime than one with only one — or neither.
- Guilt influences Nervous Behavior. Graves was observed shaking at dinner. The detective's working assumption: guilty people tend to act nervous; innocent people less so.
- Guilt and Access to Poison together influence whether the Weapon is found. The poison vial turning up near the suspect requires both that Graves actually committed the act and that he had access to the pantry where the poison was kept. Access alone — without guilt — would not explain the weapon's location.
- Access to Poison is an independent fact. Whether Graves held the pantry key is simply a fact about his role in the household. It has nothing to do with his motive or opportunity.
Note: In Section 3 we will formalise these relationships into a directed graph. For now, use the descriptions above to reason about independence.
// Section 02
Conditional Independence
The two types of independence — unconditional and conditional — and why they are key to making Bayesian networks tractable.
- Unconditional independence: P(A, B) = P(A) × P(B). Knowing A tells you nothing about B.
- Conditional independence: P(A, B | C) = P(A | C) × P(B | C). Once we know C, learning A gives us zero additional information about B.
Why This Matters
When two variables do not influence each other — given what we already know — we do not need to calculate their relationship at all. The AI can split the problem into smaller independent sub-problems, dramatically reducing what it needs to compute.
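The second identity can be checked numerically. The toy distribution below is hypothetical, constructed so that C is the only link between A and B:

```python
# Verify P(A, B | C) = P(A | C) * P(B | C) on a toy distribution that
# factorises as P(C) * P(A|C) * P(B|C) by construction.

p_c = {True: 0.4, False: 0.6}
p_a_given_c = {True: 0.7, False: 0.2}   # P(A = true | C)
p_b_given_c = {True: 0.5, False: 0.9}   # P(B = true | C)

def p_abc(a: bool, b: bool, c: bool) -> float:
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

for c in (True, False):
    for a in (True, False):
        for b in (True, False):
            cond_joint = p_abc(a, b, c) / p_c[c]   # P(A, B | C)
            cond_a = sum(p_abc(a, x, c) for x in (True, False)) / p_c[c]
            cond_b = sum(p_abc(x, b, c) for x in (True, False)) / p_c[c]
            assert abs(cond_joint - cond_a * cond_b) < 1e-12
print("conditional independence holds for every value combination")
```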
Bayesian networks support reasoning in both directions simultaneously — a key advantage over rule-based systems.
- Causal reasoning (predicting effects): Given a cause, what effects should we expect?
Example: "Given a patient has disease X, what symptoms might they develop?"
- Diagnostic reasoning (inferring causes): Given observed effects, what caused them?
Example: "Given a patient's symptoms, what disease do they likely have?"
Key Point
The same network handles both types of query without modification.
🔪 Detective Bayes
Detective Bayes sits across from Mr. Graves in the dimly lit study. He considers two facts: Graves had a financial motive, and Graves was alone in the east wing that evening.
"Tell me," Bayes murmurs, mostly to himself. "Does knowing about the debt tell me anything about whether he was there that night?"
✏️ Exercise
- a) Are M (Motive) and O (Opportunity) unconditionally independent? Write the formal statement and explain in plain language using the relationships described above.
- b) Detective Bayes now learns that Graves is Guilty (G = true). Does knowing his motive now tell us something about whether he had opportunity? Explain intuitively — no calculation needed.
- c) Both Guilty and Access to Poison influence whether the Weapon is found. Given this, are N (Nervous Behavior) and W (Weapon Found) conditionally independent given G (Guilty)? Justify your answer using the relationship descriptions above.
Solution
a) Yes — M and O are unconditionally independent, because they have no common cause and no direct connection. Formally:
P(M, O) = P(M) · P(O)
In plain language: whether Graves owed Lord Blackwood money has nothing to do with whether he happened to be alone with him that evening — these are separate facts about the world.
b) Once we know G = true, M and O become conditionally dependent. This is the collider effect: if we know Graves is guilty but learn he had no opportunity, the probability that a strong motive drove him rises. The two causes "explain away" each other.
c) Yes — N and W are conditionally independent given G. Both are children of G, forming a fork structure. Once we know whether Graves is guilty, his nervous behavior tells us nothing additional about whether the weapon was found, and vice versa. Formally:
P(N, W | G) = P(N | G) · P(W | G)
// Section 03
Graphical Representation of Bayesian Networks as Directed Acyclic Graphs (DAG)
Nodes, directed edges, acyclicity, and Conditional Probability Tables — the building blocks of a Bayesian network.
A Bayesian network is represented as a Directed Acyclic Graph (DAG):
- Nodes represent random variables (events, propositions, states of the world).
- Directed edges (arrows) represent direct probabilistic influence between variables. B → C means the variable C directly depends on the outcome of variable B.
- Acyclic means there are no loops — information flows strictly forward.
- Each node stores a Conditional Probability Table (CPT): the probability distribution of that variable given the specific values of its parent nodes. For B → C, the network assigns a CPT to node C that is explicitly conditioned on the values of its parent B.
- Example: "Cloudy → Rain" means the probability of rain is stored conditional on whether it is cloudy or not — not as a standalone number.
Key Property
This parent-child relationship is the fundamental building block of the network structure. Because each node depends only on its direct parent nodes, the system can assume that the node is conditionally independent of all its non-descendants once the values of those parents are known — this is exactly how Bayesian networks drastically reduce computational complexity.
- Knowledge graphs map factual, semantic relationships between entities: "Company A employs CEO B, who manages Company A." Loops are acceptable and even natural here.
- DAGs model the probabilistic flow of information. Introducing a loop would be mathematically catastrophic: Node A updates B, B updates C, C loops back to update A — creating an infinite feedback cycle that never converges.
- The acyclic constraint is not a design choice — it is a mathematical requirement for the probability calculations to remain valid.
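The acyclic constraint can be enforced mechanically before a structure is accepted. A minimal depth-first-search cycle check over an edge list (node names are illustrative):

```python
# Depth-first search check that an edge list contains no directed cycle,
# i.e. that the proposed structure really is a DAG.

def has_cycle(edges) -> bool:
    graph: dict = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])
    state = {v: 0 for v in graph}   # 0 = unvisited, 1 = on stack, 2 = done

    def dfs(v) -> bool:
        state[v] = 1
        for nxt in graph[v]:
            if state[nxt] == 1:                  # back edge -> cycle found
                return True
            if state[nxt] == 0 and dfs(nxt):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and dfs(v) for v in list(graph))

print(has_cycle([("Cloudy", "Rain"), ("Rain", "WetGround")]))   # False
print(has_cycle([("A", "B"), ("B", "C"), ("C", "A")]))          # True
```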
Constructing a Bayesian network follows a structured process:
- Identify root nodes: variables with no parents — these receive prior probability distributions directly.
- Add dependent variables layer by layer: connect each variable to the nodes that directly influence it.
- Assign Conditional Probability Tables: for each node, specify the probability distribution over its values given every possible combination of parent values.
- Verify the causal structure: ensure all direct dependencies are captured and no spurious edges are introduced.
⚠ Attention
The quality of the network depends on correctly identifying which variables directly influence which others — domain knowledge is essential here.
🔪 Detective Bayes
Detective Bayes pins a blank sheet to the wall of the study. Six cards, each bearing a variable name, lie on the desk in front of him. He picks up his pen.
"Every effect has a cause," he says. "We simply need to find the arrows."
✏️ Exercise — Based on these clues, construct the Bayesian network — identify the variables, draw the edges, and justify each connection.
- Clue 1 — The Housekeeper's account: "Mr. Graves had been borrowing heavily from Lord Blackwood for months. And we all knew he was present in the east wing that evening — alone." → What two facts about Graves does this establish? Could either of them have directly caused the other, or are they independent pieces of information?
- Clue 2 — The Footman's observation: "I don't know if he did it, sir. But a man with reason and occasion... well, that changes things considerably." → What does "reason and occasion" refer to? Together, what do they determine?
- Clue 3 — The Cook's testimony: "Guilty men don't sleep easy, Detective. And they certainly don't sit still at dinner. Graves could barely hold his fork." → What variable does guilt appear to cause here? In which direction does the arrow point?
- Clue 4 — The Inventory Ledger: "The poison was found in the butler's pantry, sir. Whether it ended up near the body — well, that depends on who had the key, and whether they used it." → Two things jointly determine whether the weapon is found. What are they?
Edges
M → G: Having a financial motive raises the probability of committing the crime
O → G: Having opportunity (being alone with the victim) raises the probability of committing the crime
G → N: A guilty person is more likely to exhibit nervous behavior
G → W: A guilty person is more likely to have the weapon nearby
A → W: Having access to the poison is a prerequisite for the weapon to be found
M ──→ G ←── O
      ↓  ↘
      N    W ←── A
Verification
- 3 root nodes: M, O, A (no parents)
- No cycles — information flows strictly downward
- G has the most parents (2: M and O)
CPTs required:
| Node | Parents | CPT needed |
| M | none | P(M) |
| O | none | P(O) |
| A | none | P(A) |
| G | M, O | P(G | M, O) |
| N | G | P(N | G) |
| W | G, A | P(W | G, A) |
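The table above maps directly onto code. A sketch using plain dictionaries; the probability values are the ones Detective Bayes uses in Section 07, and the P(N | G) values (0.85 / 0.10) come from the Section 07 solution:

```python
from itertools import product

# Blackwood Manor network: one prior per root node, one CPT per dependent
# node, keyed by the parents' values.
priors = {"M": 0.3, "O": 0.4, "A": 0.5}                  # P(X = t)
cpt_g = {(True, True): 0.9, (True, False): 0.5,          # P(G=t | M, O)
         (False, True): 0.2, (False, False): 0.01}
cpt_n = {True: 0.85, False: 0.10}                        # P(N=t | G)
cpt_w = {(True, True): 0.95, (True, False): 0.30,        # P(W=t | G, A)
         (False, True): 0.10, (False, False): 0.02}

def pr(p_true: float, value: bool) -> float:
    return p_true if value else 1 - p_true

def joint(m, o, a, g, n, w) -> float:
    """Chain rule over the DAG: each factor conditions only on its parents."""
    return (pr(priors["M"], m) * pr(priors["O"], o) * pr(priors["A"], a)
            * pr(cpt_g[(m, o)], g) * pr(cpt_n[g], n) * pr(cpt_w[(g, a)], w))

# Sanity check: the implied full joint distribution must sum to 1.
total = sum(joint(*world) for world in product([True, False], repeat=6))
print(round(total, 10))  # 1.0
```

Note the compactness: the network stores 3 + 4 + 2 + 4 = 13 numbers instead of the 2^6 = 64 entries of the full joint.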
// Section 04
D-Separation: Controlling the Flow of Information
The formal rule for determining whether two variables are conditionally independent in a network — three structural patterns that open or block information flow.
- D-separation is the formal rule for determining whether two variables are conditionally independent in a network.
- Think of the network as a system of pipes: probability flows between variables like water through channels — unless something blocks the path.
- Two variables A and B are d-separated given evidence set E if every path between them is blocked by a variable in E. When d-separated, A and B are conditionally independent.
💡 Tip
Think of the Bayesian network as a system of physical channels. Information (probability) flows between variables unless it is blocked by evidence variables in E that we have already observed.
Three structural patterns determine whether a path is open or blocked:
Chain: A → C → B
e.g. Rain → Wet ground → Slippery road
- C unknown: path open
- C observed: path blocked
Once we know whether the ground is wet, whether it rained adds nothing about the road.
Fork: A ← C → B
e.g. Season → Temperature; Season → Daylight hours
- C unknown: path open
- C observed: path blocked
Once we know the season, temperature and daylight become conditionally independent.
Collider: A → C ← B
e.g. Flu → Sneezing ← Allergies
- C unknown: path blocked by default
- C observed: path opens
Flu and allergies are independent — until we observe sneezing. Then, ruling out one increases the probability of the other ("explaining away").
Application
Understanding d-separation allows us to identify the conditional independence assumptions encoded in a Bayesian network structure, which is crucial for both building appropriate networks and performing efficient inference.
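"Explaining away" at a collider can be made concrete with numbers. The sketch below uses the Flu → Sneezing ← Allergies example with hypothetical probabilities:

```python
# Collider demo: Flu -> Sneezing <- Allergies (numbers are hypothetical).
p_flu, p_all = 0.1, 0.2                                  # independent priors
cpt_sneeze = {(True, True): 0.95, (True, False): 0.80,   # P(S=t | Flu, All)
              (False, True): 0.70, (False, False): 0.05}

def pr(p_true, value):
    return p_true if value else 1 - p_true

def joint(f, a, s):
    return pr(p_flu, f) * pr(p_all, a) * pr(cpt_sneeze[(f, a)], s)

# P(Flu=t | Sneezing=t): marginalise over Allergies.
num = sum(joint(True, a, True) for a in (True, False))
den = sum(joint(f, a, True) for f in (True, False) for a in (True, False))
p_flu_given_s = num / den

# P(Flu=t | Sneezing=t, Allergies=t): the collider path is now active.
p_flu_given_s_all = joint(True, True, True) / sum(
    joint(f, True, True) for f in (True, False))

print(round(p_flu_given_s, 3), round(p_flu_given_s_all, 3))  # 0.339 0.131
# Learning the patient has allergies *lowers* the probability of flu:
# one explanation "explains away" the other.
```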
🔪 Detective Bayes
Mr. Graves sits across the table from Detective Bayes, hands clasped, visibly tense. Bayes studies him in silence for a long moment.
"Mr. Graves," he says finally, "let me be precise about what knowing one thing tells me about another."
✏️ Exercise — For each pair below, determine whether the path is open or blocked, and whether the variables are independent or dependent. Apply the three path-blocking rules.
- a) Are M and N independent? What about given G?
- b) Are M and O independent? What about given G?
- c) Are O and W independent given G? What about given G and A?
- d) Are M and A independent? Does observing W change this?
Solution
a) M and N — path: M → G → N — Chain
Without G: path is open → M and N are dependent. Motive is informative about nervousness, via guilt.
Given G: chain is blocked → M and N are conditionally independent. Once we know whether Graves is guilty, his motive tells us nothing new about his nervousness. P(N | G, M) = P(N | G)
b) M and O — path: M → G ← O — Collider at G
Without G: path is blocked → M and O are marginally independent. P(M, O) = P(M) · P(O)
Given G: collider is activated → M and O become conditionally dependent. Knowing Graves is guilty but had no opportunity raises the probability he had a strong motive. This is "explaining away."
c) O and W — path: O → G → W — Chain
Given G: blocked → conditionally independent. Knowing whether Graves is guilty, opportunity adds no information about the weapon.
Given G and A: still blocked at G → still conditionally independent.
d) M and A — the only path between them is M → G → W ← A, which contains a collider at W. As long as W is unobserved, that path is blocked, so M and A are marginally independent.
Given W: W is a common child of G (which M influences) and A — a collider. Observing W activates the path M → G → W ← A, making M and A conditionally dependent given W. If the weapon was found but Graves had no pantry access, suspicion about motive-driven guilt increases.
🔪 Detective Bayes
A constable enters the study and hands Detective Bayes two new pieces of information. Bayes reads them carefully, then looks back at his network on the wall.
"The network is not complete," he says. "Two new variables. They fit — but where?"
✏️ Exercise
Detective Bayes receives new information. The investigation now also considers:
- Alibi (L): Mr. Graves claims to have an alibi. Whether the alibi holds up depends on whether he actually had opportunity — a false alibi is more likely when someone had opportunity and is trying to conceal it.
- Confession (C): Mr. Graves eventually confesses under questioning. A confession is influenced both by his guilty status and by whether nervous behavior was observed — nervous suspects under pressure confess more readily.
- a) Where do L and C fit into the existing DAG? Identify their parent nodes and justify each edge.
- b) Draw the updated DAG with all 8 variables.
- c) Write out the CPT headers (just the structure, not the numbers) for both L and C.
Solution
a)
- L (Alibi holds): Parent is O. If Graves truly had opportunity, he was at the manor and is more likely to fabricate an alibi. L depends only on O.
- C (Confession): Parents are G and N. Guilty people are more likely to confess; nervous behavior under questioning further increases confession probability.
b) Updated network:
M ──→ G ←── O ──→ L
      ↓  ↘
      N    W ←── A
      ↓
      C ←── G
(G and N are both parents of C)
c) CPT headers:
P(L | O) — 2 rows (O = true / false)
P(C | G, N) — 4 rows (all combinations of G = true/false and N = true/false)
Edges
M→G, O→G, G→N, G→W, A→W, O→L, G→C, N→C
// Section 06
The Markov Blanket
The minimal set of variables that completely isolates a node from the rest of the network — the key to computational efficiency in Bayesian inference.
- For any node in the network, its Markov blanket is the minimal set of variables that completely isolates it from the rest of the network.
- The Markov blanket of a node consists of: its parents, its children, and the other parents of its children (co-parents).
- Once the state of the Markov blanket is known, the node is conditionally independent of every other variable in the network — the rest of the universe becomes irrelevant.
- This is the key to computational efficiency: rather than considering the entire network for every query, the AI focuses only on a small local neighbourhood.
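Reading the blanket off the graph is mechanical: parents, children, and co-parents. A sketch over an edge list, using the Blackwood Manor edges from Section 03:

```python
# Markov blanket = parents + children + other parents of the children.

def markov_blanket(node: str, edges) -> set:
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    co_parents = {p for p, c in edges if c in children and p != node}
    return parents | children | co_parents

# Blackwood Manor structure (Section 03).
edges = [("M", "G"), ("O", "G"), ("G", "N"), ("G", "W"), ("A", "W")]

print(sorted(markov_blanket("G", edges)))  # ['A', 'M', 'N', 'O', 'W']
print(sorted(markov_blanket("W", edges)))  # ['A', 'G']
```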
🔪 Detective Bayes
Detective Bayes draws a circle on the wall around a cluster of cards — the ones directly surrounding the Guilty node. He steps back and looks at the rest of the board.
"Everything outside this circle," he says, tapping the boundary, "is irrelevant. Once I know what's inside, nothing else changes my estimate of guilt."
✏️ Exercise
- a) Identify the Markov Blanket of G (Guilty) in the original 6-variable network (M, O, A, G, N, W).
- b) Detective Bayes already knows the values of every variable in G's Markov Blanket. His assistant suggests also investigating whether Graves had a habit of staying up late — a new, unconnected variable. Does this information change the probability of guilt? Justify using the Markov Blanket concept.
- c) What is the Markov Blanket of W (Weapon Found)?
Solution
a) Markov Blanket of G:
- Parents of G: M, O
- Children of G: N, W
- Co-parents of G's children: A (co-parent of W alongside G)
- → Markov Blanket of G = {M, O, N, W, A}
In this 6-node network, G's blanket contains all other 5 variables — which makes intuitive sense: guilt is the central node that everything connects through.
b) No. The new variable has no edge into the network and lies outside G's Markov Blanket. Once the detective knows the state of {M, O, N, W, A}, no variable outside the blanket can change his estimate of G.
c) Markov Blanket of W:
- Parents of W: G, A
- Children of W: none
- Co-parents: none
- → Markov Blanket of W = {G, A}
Once we know whether Graves is guilty and whether he had access to the poison, nothing else — not his motive, nervousness, or opportunity — tells us anything more about whether the weapon was found.
// Section 07
Inference in Bayesian Networks
Exact and approximate methods — and the three classes of probability queries a completed network can answer.
Exact inference — in small to medium networks, the AI computes precise probabilities using:
- Variable elimination: hidden variables are systematically summed out to reduce the equation before solving.
- Belief propagation: each node passes its current probability estimate to its neighbours; estimates update in waves until the network reaches equilibrium.
Approximate inference — in large, densely connected networks, exact inference becomes computationally intractable. The AI then uses sampling methods:
- Markov Chain Monte Carlo (MCMC): the algorithm takes a random walk through the network, generating thousands of simulated outcomes. The frequency of each outcome approximates its true probability.
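For intuition, here is a sketch of rejection sampling, a simpler relative of MCMC built on the same idea: approximate a probability by the frequency of simulated outcomes. The two-node Cloudy → Rain network and its numbers are hypothetical.

```python
import random

random.seed(42)                                  # reproducible sketch

P_CLOUDY = 0.5
P_RAIN = {True: 0.8, False: 0.1}                 # P(Rain=t | Cloudy)

def sample():
    """Draw one joint sample by sampling each node given its parent."""
    cloudy = random.random() < P_CLOUDY
    rain = random.random() < P_RAIN[cloudy]
    return cloudy, rain

# Estimate P(Cloudy=t | Rain=t): simulate many worlds, reject those that
# contradict the evidence, count the query variable among the rest.
kept = [cloudy for cloudy, rain in (sample() for _ in range(100_000)) if rain]
estimate = sum(kept) / len(kept)
print(round(estimate, 2))  # close to the exact value 0.4 / 0.45 ≈ 0.89
```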
A completed Bayesian network can answer three classes of questions:
- Marginal probability:
P(X = x) — the probability of a single variable taking a specific value, regardless of others. For root nodes, read directly from the CPT; for other nodes, marginalize over all parent values.
- Joint probability:
P(X = x, Y = y, …) — the probability of multiple variables taking specific values simultaneously. Computed using the chain rule applied to the network structure: multiply the conditional probabilities of each variable given its parents.
- Conditional probability:
P(Y = y | X = x) — the probability of Y given observed evidence about X. Computed either via Bayes' rule or as the ratio of joint probabilities.
🔪 Detective Bayes
Detective Bayes sits alone in the study, long after the others have gone to bed. The case file is open in front of him. The only sound is the rain and the scratch of his pen. Now, finally, he calculates.
The full network has the following probabilities:
P(M=t) = 0.3, P(O=t) = 0.4, P(A=t) = 0.5
| M | O | P(G=t | M,O) |
| t | t | 0.9 |
| t | f | 0.5 |
| f | t | 0.2 |
| f | f | 0.01 |
| G | A | P(W=t | G,A) |
| t | t | 0.95 |
| t | f | 0.30 |
| f | t | 0.10 |
| f | f | 0.02 |
| G | P(N=t | G) |
| t | 0.85 |
| f | 0.10 |
✏️ Exercise
- a) Marginal probability — What is the overall prior probability that Mr. Graves is guilty, P(G=t)? (Marginalize over all combinations of M and O.)
- b) Joint probability — What is the probability that Graves had motive, had opportunity, is guilty, and the weapon was found — assuming he had access to the poison? I.e., calculate
P(M=t, O=t, G=t, A=t, W=t).
- c) Conditional probability — Nervous behavior was observed by the maid (N=t). What is the probability that Graves is guilty given this evidence,
P(G=t | N=t)?
- d) Reflection — The weapon is also found (W=t), in addition to N=t. Without calculating, what do you expect happens to
P(G=t | N=t, W=t) compared to part (c)? Why?
Solution
a) P(G=t):
P(G=t) = Σ(m,o) P(G=t|M=m,O=o) · P(M=m) · P(O=o)
= 0.9·0.3·0.4 + 0.5·0.3·0.6 + 0.2·0.7·0.4 + 0.01·0.7·0.6
= 0.108 + 0.090 + 0.056 + 0.0042 = 0.2582
Mr. Graves has roughly a 26% prior probability of being guilty before any evidence is observed.
b) P(M=t, O=t, G=t, A=t, W=t): Using the chain rule:
= P(M=t) · P(O=t) · P(A=t) · P(G=t|M=t,O=t) · P(W=t|G=t,A=t)
= 0.3 · 0.4 · 0.5 · 0.9 · 0.95 = 0.0513
c) P(G=t | N=t):
P(G=t, N=t) = P(N=t|G=t) · P(G=t) = 0.85 · 0.2582 = 0.21947
P(N=t) = 0.85·0.2582 + 0.1·0.7418 = 0.21947 + 0.07418 = 0.29365
P(G=t|N=t) = 0.21947 / 0.29365 ≈ 0.747
The maid's testimony alone pushes the probability of guilt from 26% to 75%.
d) P(G=t | N=t, W=t) will be higher than 0.747. Both N and W are children of G — observing both "effects" of guilt simultaneously provides even stronger evidence for the cause. Finding the weapon on top of the nervous behavior makes it increasingly difficult to explain innocence through chance alone.
(The exact value: ≈ 0.97 — Detective Bayes closes his notebook.)
"The evidence speaks in probabilities, not certainties. But sometimes, the probabilities are loud enough."
— Detective Bayes
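The hand calculations in the solution can be double-checked by brute-force enumeration of all 2^6 worlds; with six binary variables this is instant. (The P(N | G) values 0.85 / 0.10 are the ones used in part (c).)

```python
from itertools import product

priors = {"M": 0.3, "O": 0.4, "A": 0.5}
cpt_g = {(True, True): 0.9, (True, False): 0.5,
         (False, True): 0.2, (False, False): 0.01}
cpt_n = {True: 0.85, False: 0.10}                 # used in part (c)
cpt_w = {(True, True): 0.95, (True, False): 0.30,
         (False, True): 0.10, (False, False): 0.02}

def pr(p_true, value):
    return p_true if value else 1 - p_true

def joint(m, o, a, g, n, w):
    return (pr(priors["M"], m) * pr(priors["O"], o) * pr(priors["A"], a)
            * pr(cpt_g[(m, o)], g) * pr(cpt_n[g], n) * pr(cpt_w[(g, a)], w))

worlds = list(product([True, False], repeat=6))   # order: (m, o, a, g, n, w)

p_g = sum(joint(*w) for w in worlds if w[3])                              # a)
p_b = sum(joint(True, True, True, True, n, True) for n in (True, False))  # b)
p_g_and_n = sum(joint(*w) for w in worlds if w[3] and w[4])
p_n = sum(joint(*w) for w in worlds if w[4])
p_g_given_n = p_g_and_n / p_n                                             # c)

print(round(p_g, 4), round(p_b, 4), round(p_g_given_n, 3))  # 0.2582 0.0513 0.747
```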
// Q&A
Questions?
What remains unclear — about Bayesian networks, conditional independence, d-separation, the Markov blanket, or probabilistic inference?