Generative AI

🧠 Introduction to AI

Andy Weeger

Neu-Ulm University of Applied Sciences

April 13, 2024

Resources

In preparation for the lecture, read Stephen Wolfram’s article on what ChatGPT is doing and why it works (Wolfram 2023)

If you have only a limited understanding of what GenAI is, please go through the GeeksforGeeks article on the basics of generative AI

If you want to do a deep dive, consider working through Microsoft’s Artificial Intelligence for Beginners - A Curriculum

✏️ Exercises

Deep Learning

What are two things you have newly learned about deep learning?

Key concepts

Identify, list, and explain in your own words the key concepts discussed in the article (e.g., temperature).

Reflect on how these concepts contribute to ChatGPT’s functionality.
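
One of those concepts is temperature, which rescales the probabilities the model assigns to candidate next tokens. Below is a minimal sketch of softmax sampling with temperature; the logits are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into probabilities.
    Low temperature sharpens the distribution (more deterministic),
    high temperature flattens it (more surprising word choices)."""
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```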

Loss function

Explain in your own words what a “loss function”, sometimes also called “cost function”, is.

How does the loss function change over the course of training a neural network?
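
As a reference point for your answer, here is a minimal sketch of a mean-squared-error loss shrinking while gradient descent adjusts a single weight; the toy data and learning rate are made up for illustration:

```python
# Fit y = w * x to toy data with gradient descent and watch the
# mean-squared-error (MSE) loss shrink over the training steps.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # underlying relationship: y = 2x

w = 0.0    # initial weight
lr = 0.05  # learning rate

for step in range(5):
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
    print(f"step {step}: loss = {loss:.3f}, w = {w:.3f}")
```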

Learning

Analyze the following statements and determine whether each is true or false. Justify your answers.

  1. It can be easier to solve more complicated problems with neural nets than simpler ones.
  2. Optimizing neural network training relies heavily on trial-and-error approaches. Researchers have gradually built a collection of effective techniques through experimentation.
  3. The field of neural network training is shifting away from building models entirely from scratch. Instead, researchers are increasingly using two main approaches: transfer learning1 and data augmentation2.
  4. In the early days of neural networks, features were discovered through the training process, which allowed the network to identify patterns that might not be obvious to human experts. In modern neural networks, features are often hand-crafted by domain experts.
  5. Features are not directly stored within individual neurons. Instead, the network’s ability to identify features emerges from the collective behavior of many neurons and their connections.
  6. One technique for neural network training is to iterate through the entire dataset multiple times, allowing the network to learn from each example.
  7. A common approach for generating training data for Large Language Models (LLMs) involves a technique called “guessing”. This method takes an existing piece of text, masks out a portion of the ending, and presents the rest to the LLM. The LLM is then tasked with predicting the masked portion, essentially completing the text. By comparing its prediction to the original, unmasked text, the LLM learns the patterns and relationships within language (see the sketch after this list).
  8. Even with powerful hardware like GPUs, training large neural networks can be inefficient. This is because current computer architectures often have a separation between memory (where data is stored) and processing units (like CPUs or GPUs) that limits how much data can be accessed simultaneously. This separation forces the network to process information in small chunks, with most of the network waiting for the relevant data to be fetched from memory. This can significantly slow down the training process.
  9. The ability of LLMs to handle tasks like writing essays challenges our understanding of computational difficulty. It seems that these tasks, while complex for computers in the past, may be computationally simpler than we initially believed.
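
To make statement 7 concrete, here is a minimal sketch of how (context, target) training pairs can be generated by masking out the end of a text. It works word by word for readability; real LLMs operate on sub-word tokens:

```python
def make_training_pairs(text):
    """Split a text into (context, continuation) pairs: the model
    sees the context and is trained to predict the next word."""
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = " ".join(words[:i])
        target = words[i]
        pairs.append((context, target))
    return pairs

for context, target in make_training_pairs("the cat sat on the mat"):
    print(f"{context!r} -> {target!r}")
```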

Embeddings

  1. Form small groups of two to three students.
  2. Discuss the concept of embeddings within your group.
  3. Compile a list of 10 words relevant to a specific topic (e.g., technology, sports, food).
  4. Create a word embedding that uses a four-dimensional space to capture the semantic relationships between the 10 words (a sketch of how such embeddings can be compared follows this list).
  5. Reflect on how word embeddings capture semantic relationships between words and how they contribute to language understanding in AI systems like ChatGPT.
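
The sketch below shows how hand-crafted four-dimensional embeddings can be compared with cosine similarity; the words, dimensions, and values are made up for illustration:

```python
import math

# Hypothetical 4-dimensional embeddings; the dimensions could stand
# for e.g. "is food", "is sweet", "is a drink", "is served hot".
embeddings = {
    "coffee": [0.9, 0.2, 1.0, 0.9],
    "tea":    [0.9, 0.3, 1.0, 0.8],
    "cake":   [1.0, 0.9, 0.0, 0.1],
}

def cosine_similarity(a, b):
    """1.0 = vectors point the same way; near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

print(cosine_similarity(embeddings["coffee"], embeddings["tea"]))   # high
print(cosine_similarity(embeddings["coffee"], embeddings["cake"]))  # lower
```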

Transformers

Form groups of two and do additional research on the architecture and building blocks of the most notable feature of technologies like GPT, so-called transformers3. Prepare to explain the concept to the group.

Good read: Medium — Transformer Architecture Simplified
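
If it helps your preparation, here is a heavily simplified sketch of the core building block, scaled dot-product self-attention. It uses a single head, no learned query/key/value projections, and made-up token vectors, so it only illustrates the weighting idea:

```python
import math

def self_attention(vectors):
    """Each position builds a new representation as a weighted
    average of ALL positions; the weights come from dot-product
    similarity (queries = keys = values here, for simplicity)."""
    d = len(vectors[0])
    output = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        output.append([sum(w * v[i] for w, v in zip(weights, vectors))
                       for i in range(d)])
    return output

# Three token vectors (hypothetical 2-d embeddings)
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in self_attention(tokens):
    print([round(x, 2) for x in row])
```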

Limitations

Consider what limitations you have perceived and/or heard about when using Large Language Models (LLMs). Relate these limitations to what you have learned about how LLMs work and find explanations for them. Prepare a short presentation on the most interesting limitation and the explanation you found.

Mega prompts

Research “mega prompts” and create a mega prompt that turns ChatGPT into a research-question coach that guides you through multiple steps of finding a good research question on a topic that interests you.

Create a research question using the coach and reflect on whether using GPT as a guide is a meaningful strategy.

Ethical concerns

Identify ethical concerns related to AI and language models, choose one, and discuss how it applies to ChatGPT.

Literature

Wolfram, Stephen. 2023. What Is ChatGPT Doing ... And Why Does It Work? Sanage Publishing House.

Footnotes

  1. Transfer learning involves incorporating the knowledge of a pre-trained network into a new model, allowing the new model to learn faster and achieve better performance.

  2. Data augmentation refers to the use of pre-trained networks to generate new training examples, expanding the available data and potentially improving the performance of the new model.

  3. The output of the transformer encoder is a higher-dimensional representation of the entire input sequence. It captures not only the meaning of individual words but also the relationships and context between them. This representation is often much richer and more complex than a single embedding vector.