Introduction to ML

🧠 Introduction to AI

Andy Weeger

Neu-Ulm University of Applied Sciences

April 15, 2025

Introduction

Characteristics

Learning agents are those that can improve their behavior through diligent study of past experiences and predictions of the future. Russell and Norvig (2022, 668)

At its core, a learning agent (LA):

  • Uses machine learning (ML) when realized as a computer system
  • Improves performance based on experience (observations)
  • Is necessary when designers lack complete knowledge of environments
  • Solves problems that are difficult to program explicitly (e.g., face recognition)

Definition

A computer is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Mitchell (1997, 2)

Any ML project needs to clearly specify:

  • The task T (what problem are we solving?)
  • The experience E (what data will the system learn from?)
  • The performance measure P (how will we evaluate success?)

Exercise

Select one of the specific applications for ML below and define T, E, and P.

  1. Face recognition system
  2. Language translation service
  3. Credit card fraud detection

Why learning works

How can we be sure that our learned hypothesis will predict well for previously unseen inputs? I.e., how do we know that the hypothesis \(h\) is close to the target function \(f\) when \(f\) is unknown?

The underlying principle of computational learning theory is that any hypothesis that is seriously wrong will almost certainly be “found out” after a small number of examples, because it will make an incorrect prediction on at least some of them.

Thus, any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong: that is, it must be probably approximately correct (PAC).
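
For a finite hypothesis space \(\mathcal{H}\), this intuition can be made precise with a standard PAC sample-complexity bound, where \(\epsilon\) is the allowed error and \(\delta\) the allowed failure probability:

\[N \geq \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln|\mathcal{H}|\right)\]

If the learner sees at least \(N\) examples, then with probability at least \(1-\delta\) any hypothesis that is consistent with all of them has error at most \(\epsilon\).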

ML x AI

ML constitutes one of the four main categories of AI and is currently the dominant approach in AI applications.

flowchart TD
    AI[Artificial Intelligence] --> ML[Machine Learning]
    AI --> SP[Search and Planning]
    AI --> KI[Knowledge and Inference]
    AI --> MU[Modeling of Uncertainty]
    
    ML --> SL[Supervised Learning]
    ML --> UL[Unsupervised Learning]
    ML --> RL[Reinforcement Learning]
    
    style AI fill:#000,stroke:#000,stroke-width:1px,color:#fff
    style ML fill:#0333ff,stroke:#0333ff,stroke-width:1px,color:#fff

LA architecture

Visualization

Figure 1: A learning agent based on Russell and Norvig (2022, 74)

Building blocks

Performance element: Processes percepts and chooses actions (relates to the basics of AI we have studied so far)

Learning element: Carries out improvements. Requires awareness and feedback on how the agent is doing in the environment

Critic: Evaluates the agent’s behavior against a given external performance standard and provides feedback

Problem generator: Suggests explorative actions that lead the agent to new experiences
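
To make the interplay concrete, a structural sketch in Python follows; the class and callable names are hypothetical and heavily simplified relative to a real agent:

# Structural sketch of the learning-agent loop (hypothetical names, heavily simplified).
class LearningAgent:
    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element  # maps percepts to actions
        self.learning_element = learning_element        # improves the performance element
        self.critic = critic                            # scores behavior against an external standard
        self.problem_generator = problem_generator      # proposes exploratory actions

    def step(self, percept):
        feedback = self.critic(percept)                             # how well is the agent doing?
        self.learning_element(self.performance_element, feedback)   # use feedback to improve
        action = self.performance_element(percept)                  # choose the next action
        return self.problem_generator(action)                       # possibly replace it with an exploratory one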

The learning element

The design of the learning element is influenced by four important aspects:

  • Which component of the performance element is to be improved?
  • What representation should be chosen?
    (i.e., model type)
  • What prior information is available?
    (i.e., prior knowledge that influences the model)
  • What form of feedback is available?

Supervised learning

Visualization

Figure 2: Training phase of supervised learning
Figure 3: Application phase of supervised learning

Key challenges

  • Getting enough labeled data
  • Ensuring labels are accurate
  • Dealing with imbalanced data classes
  • Feature selection and engineering

Practical applications

  • Sentiment analysis
  • Spam detection
  • Loan approval prediction
  • Image classification
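
A minimal sketch of the training and application phases, assuming scikit-learn; the synthetic dataset, the model choice (logistic regression), and the split ratio are illustrative assumptions:

# Minimal supervised-learning sketch (assumes scikit-learn is installed).
# Task T: classify points into two classes; experience E: labeled examples;
# performance measure P: accuracy on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# E: synthetic labeled data (features X, labels y)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Training phase: fit the model to labeled examples
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Application phase: predict labels for unseen inputs
y_pred = model.predict(X_test)

# P: evaluate how well the predictions match the true labels
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2f}")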

Unsupervised learning

Visualization

Figure 4: Training phase of unsupervised learning
Figure 5: Application phase of unsupervised learning

Key challenges

  • Evaluating the quality of results without ground truth
  • Determining the optimal number of groups or patterns
  • Interpreting the discovered patterns meaningfully
  • Dealing with high-dimensional data

Practical applications

Computer vision: when shown millions of images, a computer vision system can identify large clusters of similar images (without “knowing” what these images depict).

  • Customer segmentation
  • Anomaly detection
  • Topic modeling in text
  • Recommender systems
  • Image compression
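
A minimal clustering sketch, again assuming scikit-learn; k-means, the choice of three clusters, and the silhouette score as a stand-in for the missing ground truth are illustrative assumptions:

# Minimal unsupervised-learning sketch (assumes scikit-learn is installed).
# There are no labels: the algorithm groups similar points on its own.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# E: unlabeled data only (the true group memberships are discarded)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Training phase: discover structure (assuming 3 clusters here)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Without ground truth, evaluation relies on internal criteria such as the
# silhouette score (higher means tighter, better-separated clusters).
print(f"Silhouette score: {silhouette_score(X, labels):.2f}")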

Reinforcement learning

Visualization

Figure 6: Reinforcement learning

Key challenges

  • Designing appropriate reward functions
  • Balancing exploration vs. exploitation
  • Dealing with delayed rewards and credit assignment
  • Sample efficiency in real-world applications
  • Transferring learning across different environments

Practical applications

Game playing: imagine an agent that is told only at the end of a game whether it has won (a reward) or lost (a punishment). Based on that feedback, it must determine which of the actions taken before the reinforcement were most responsible for the outcome and adjust its behavior to earn more rewards in the future.

  • Game playing (Chess, Go, video games)
  • Robotics and control systems
  • Resource management and scheduling
  • Recommendation systems
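
As an illustration, a minimal tabular Q-learning sketch on a hypothetical one-dimensional corridor; the environment, reward scheme, and hyperparameters are all illustrative assumptions, not a production setup:

# Minimal tabular Q-learning sketch on a hypothetical one-dimensional corridor.
# The agent starts in state 0 and is rewarded only upon reaching the goal state,
# so it must work out which earlier actions were responsible for the delayed reward.
import numpy as np

n_states, goal = 5, 4                       # states 0..4, reward only in state 4
actions = [-1, +1]                          # move left or move right
Q = np.zeros((n_states, len(actions)))      # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.2       # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != goal:
        if rng.random() < epsilon:          # exploration: try a random action
            a = int(rng.integers(len(actions)))
        else:                               # exploitation: best known action, random tie-break
            a = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == goal else 0.0
        # Q-learning update: nudge the estimate towards reward + discounted future value
        Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
        state = next_state

print(np.argmax(Q, axis=1))                 # learned policy: "move right" (index 1) for states 0-3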

The learning process

Phases

  • Training phase
  • Validation phase
  • Test phase
  • Operational phase
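
A sketch of how the first three phases map onto data splits in practice (assuming scikit-learn; the 60/20/20 ratios are purely illustrative):

# Sketch of splitting data for the phases listed above (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First hold out a test set, then carve a validation set out of the remainder
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Training phase:    fit candidate models on (X_train, y_train)
# Validation phase:  compare models and tune hyperparameters on (X_val, y_val)
# Test phase:        estimate generalization once, on (X_test, y_test)
# Operational phase: deploy the chosen model and monitor it on live data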

Model complexity

The bias-variance tradeoff

In ML, selecting the appropriate model complexity is a fundamental challenge. Here, this challenge is demonstrated through polynomial curve fitting, one of the simplest yet most illustrative examples of the bias-variance tradeoff.

When building a machine learning model, we must balance two competing concerns:

  1. Bias: The error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss relevant relations between features and outputs (underfitting).
  2. Variance: The error from sensitivity to small fluctuations in the training set. High variance models can fit the training data very well but perform poorly on new, unseen data (overfitting).

Polynomial curve fitting example

In this visualization, we’ll see how polynomials of different degrees fit a dataset generated from a true quadratic function with some added noise:

Figure 7: Comparison of linear underfitting, quadratic good fit, and high-degree overfitting to noisy data generated from a quadratic function.
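
A short NumPy sketch can reproduce this kind of comparison; the true function, noise level, and polynomial degrees below are illustrative assumptions rather than the exact settings behind Figure 7:

# NumPy sketch: fit polynomials of increasing degree to noisy samples of a
# quadratic function and compare training vs. test error.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 1.0 + 2.0 * x - 1.5 * x**2                       # the "true" quadratic function

x_train = np.linspace(0, 1, 15)
y_train = f(x_train) + rng.normal(scale=0.1, size=x_train.size)   # noisy training data
x_test = np.linspace(0, 1, 200)
y_test = f(x_test) + rng.normal(scale=0.1, size=x_test.size)

for degree in (1, 2, 12):                                   # underfit, good fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)           # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

The typical outcome is that the high-degree fit has the lowest training error but a clearly higher test error than the quadratic fit, which is the overfitting pattern described below.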

Underfitting

The linear model (blue line) is too simple to capture the underlying pattern. It has:

  • High bias: The model makes strong assumptions about the data structure (linearity)
  • Low variance: Different training sets would produce similar models
  • High training and test error: The model fails to capture the fundamental pattern

Good fit

The quadratic model (purple line) captures the underlying pattern well. It has:

  • Balanced bias and variance: The model makes appropriate assumptions
  • Low training and test error: Performs well on both seen and unseen data
  • Good generalization: Likely to predict new data points accurately

Overfitting

The high-degree polynomial (red line) fits training data closely but wiggles excessively. It has:

  • Low bias: Makes few assumptions about the data structure
  • High variance: Would change dramatically with different training sets
  • Low training error but high test error: Memorizes the training data rather than learning the pattern

The learning principle

This example illustrates Ockham’s razor in action:

The simplest model that adequately explains the data is likely to have the best predictive power. While we could create a complex polynomial that passes through every training point perfectly, such a model would likely perform poorly on new data.

Conclusion

  • Machine learning fundamentally changes how we approach problem-solving with computers
  • Instead of explicit programming, we design systems that learn from data and experience
  • The field combines statistics, optimization, and domain knowledge
  • Understanding the core concepts helps in developing effective learning systems
  • Key tradeoffs include:
    • Bias vs. variance
    • Model complexity vs. generalization
    • Exploration vs. exploitation
    • Accuracy vs. interpretability

Literature

Mitchell, T. 1997. Machine Learning (McGraw-Hill International Editions). McGraw-Hill Education. https://books.google.de/books?id=dMp2uwEACAAJ.
Russell, Stuart, and Peter Norvig. 2022. Artificial Intelligence: A Modern Approach. Harlow: Pearson Education.

Footnotes

  1. Examples of Goodhart’s Law in ML: A recommendation system optimized solely for clicks might discover that clickbait titles and thumbnails maximize this metric, even if content quality suffers and user satisfaction decreases in the long term. If a content filter is optimized only to minimize false negatives (letting harmful content through), it might become overly restrictive and block large amounts of legitimate content.

  2. To illustrate the curse of dimensionality: imagine a unit hypercube (with sides of length 1) in different dimensions. In 2D, it has area 1. In 3D, volume 1. In 100D, to capture just 1% of the hypercube’s volume, you’d need to extend 0.955 units along each dimension, meaning 99% of the volume is in the “corners.” This explains why data points become increasingly distant from each other and distance metrics become less useful as dimensions increase. When implementing unsupervised methods, success depends on careful feature selection, appropriate distance metrics, and clear alignment with the underlying questions you’re trying to answer.
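
    A quick check of that number: a sub-cube capturing a fraction \(p\) of a unit hypercube’s volume in \(d\) dimensions needs side length \(p^{1/d}\); with \(p = 0.01\) and \(d = 100\), this gives \(0.01^{1/100} \approx 0.955\).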

  3. Lift measures how much more likely the consequent (Y) is when the antecedent (X) is present, compared to when the antecedent is absent.

    \[\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}\]

    Where:

    \[\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \text{ and } Y)}{\text{Support}(X)}\] \[\text{Support}(Y) = \frac{\text{Count}(Y)}{\text{Total Transactions}}\]

    Interpretation:
    Lift > 1: Positive correlation (products appear together more than expected by chance)
    Lift = 1: No correlation (independence)
    Lift < 1: Negative correlation (products appear together less than expected by chance)
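
    As a worked example with hypothetical numbers: if \(\text{Support}(X \text{ and } Y) = 0.2\), \(\text{Support}(X) = 0.4\), and \(\text{Support}(Y) = 0.25\), then \(\text{Confidence}(X \rightarrow Y) = 0.2 / 0.4 = 0.5\) and \(\text{Lift}(X \rightarrow Y) = 0.5 / 0.25 = 2\), i.e., \(Y\) is twice as likely to appear when \(X\) is present than on average.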

  4. Example of reward hacking: a reinforcement learning agent tasked with playing a video game might discover an unintended bug that produces high scores without completing the actual objective. Rather than learning the intended gameplay strategy, it optimizes for exploiting this glitch.