Table of Contents

Chapter 1 - Epic Introduction
Chapter 2 - Intelligent Agents
1. Agent
Chapter 3 - Solving Problems by Searching
Chapter 4 - Beyond Classical Search
Chapter 5 - Adversarial Search
1. MiniMax
2. Alpha-Beta Pruning
  1. Alpha
  2. Beta
  3. AI Critics
Chapter 6 - Constraint Satisfaction Problems
Chapter 7 - Logical Agents
1. Standard logical equivalence
  1. Resolution
Chapter 8 - First-Order Logic
1. Quantifiers
Chapter 9 - Inference in First-Order Logic
1. Skolemization
2. Conversion to CNF
Chapter 10 - Classical Planning
1. Stanford Research Institute Planning system (STRIPS)
  1. PDDL
2. Two basic approaches
  1. State-space planning
    1. Partial Order Planning
  2. Plan-space planning
    1. Planning graph:
Chapter 11 - Planning and Acting in the Real World
1. In Nondeterministic Domains
Chapter 12 - Knowledge Representation
1. Knowledge-based systems(KBS):
Chapter 22 - Natural Language Processing
Multiagent-Systems and Game Theory (not in book)
1. Game Theory
  1. Strategic normal form games

‹

TDT4136: Introduction to Artificial Intelligence

Tags:

"A computer would deserve to be called intelligent if it could deceive a human into believing that it was human." - Alan Turing

Chapter 1 - Epic Introduction

Since the dawn of time, humans have tried to define how we think, and this struggle has led us to create artificial intelligence. Historically, four approaches to artificial intelligence have been followed, each described below.

Acting Humanly

If we can't distinguish between a computer and a human, the computer is said to act humanly. The computer's capability to act humanly can be tested by performing a turing test. A computer passes the turing test if a human interrogator cannot tell whether he is communicating with a computer or a person. To pass a turing test, the computer would need to possess the following capabilities:

Natural language processing to enable it to communicate successfully.
Knowledge representation to store what it knows or hears.
Automated reasoning to use the stored information to draw conclusions.
Machine learning to adapt to new circumstances and to detect patterns.

Thinking humanly

To make a computer think like a human, we must know how humans think. The computer's ability to think humanly can be determined by comparing the computer's input-output mechanism by the corresponding human behaviour.

Acting Rationally

An agent is something that acts. A rational agent is an agent that does the right thing based on what it knows, its functions, and the surrounding environment; it acts so that it achieves the best expected outcome.

Thinking rationally

Using sound logic rules to reach the right conclusion.

A relevant quote, demonstrating the logical rule of modus ponens: "Socrates is a man; all men are mortal; therefore, Socrates is mortal."

Chapter 2 - Intelligent Agents

Agent

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

Agent function

Mathematically speaking, we say that an agent’s behaviour is described by the agent function that maps any given percept sequence to an action.

Rational agent

For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.

Task environment

PEAS - Performance, Environment, Actuators, Sensors.

Properties of task environments

Fully Observable vs. Partially Observable

If an agent's sensors has access to the complete state of the environment, at all times, we say that the environment is fully observable. An environment can be partially observable because of noise or inaccurate sensors. It can also be partially observable when sensors only give information about parts of the environment. If an agent doesn't have sensors, we call the environment unobservable.

Single agent vs. multiagent

Single agent:: Example: Crossword, sudoku
Multi agent:: Example: Chess, ludo
Competitive multiagent environment:: E.g. in chess, where two players are playing against each other. One contestant tries to maximize his performance by minimizing the other's.
Cooperative multiagent environment:: When driving cars, drivers cooperate not to crash into each other.

Deterministic vs. Stochastic

Deterministic:: Any action has a single guaranteed effect. There is no uncertainty about the state that will result from performing an action.
Strategic:: If the environment is deterministic except from the actions of other agents, we say that the environment is strategic.
Stochastic:: There is some quantifiable uncertainty about the outcome of an action. "Action X has a 25% chance of leading to Y."
Non-Deterministic:: There is non-quantifiable uncertainty about the outcome of an action. "Action X may lead to Y."

Episodic vs. Sequential

Episodic:: The agent's experience is divided into atomic episodes. Each episode consists of the agent perceiving and then performing a single action. The episodes are independent. The choice of action in each episode depends only on the episode itself.
Sequential:: The current decision could affect all future decisions.

Static vs. Dynamic

Static:: Everything stays unchanged until the agent makes a change.
Dynamic:: The environment might change while the agent is considering what to do.
Semidynamic:: The environment itself does not change with the passage of time, but the agents performance score does.

Discrete vs. Continous

A discrete-state environment such as chess game has a finite number of distinct states. Chess has a discrete set of percepts and actions.

Taxi driving: Continuous state and continuous-time. The speed and location of the taxi and of the other vehicles sweep through a range of continuous values. Taxi driving actions are also continuous (steering angles, etc).

Known vs. Unknown

This isn't so much about the environment as the agent's (or the programmer's) knowledge about the laws of physics in the environment. In a known environment, the outcomes for all actions are given. In an unknown environment you can't predict all outcomes, even if you have a full overview of the environment. This means that the agent must learn from its actions, and how the environment reacts to them.

Various Agent Types

Simple reflex agent

The simplest kind of agent. Selects actions on the basis of the current percept, ignoring the rest of the percept history. The agent will work only if the correct decision can be made on the basis of only the current percept - that is, only if the environment is fully observable (if it's not fully observable, the agent may also make a decision, but it might not be an optimal one). Infinite loops are often unavoidable when operating in partially observable environments. This can sometimes be escaped if the agent randomizes its actions.

Model-based reflex agents

Maintain internal state to track aspects of the world that are not evident in the current percept. Two kinds of knowledge needs to be represented/coded in the agent program to update the internal state info:

How the world evolves independently of the agent (e.g., an overtaking car generally will be closer behind than it was a moment ago)
How the agent's own actions affect the world (e.g., when the agent turns the steering wheel clockwise, the car turns to the right or that after driving for five minutes northbound on the freeway one is usually about five miles north of where one was five minutes ago)

Goal-based agents

Works as model-based, but also take the agent's goals into account.

Utility-based agents

Works like the other agents, in addition to trying to maximize their expected “happiness”. Needs a utility function to determine what action leads to the highest performance measure.

Learning agents

Learns from their actions. A learning agent can be divided into four conceptual components:

Learning element: Responsible for making improvements.
Performance element: Responsible for selecting external actions (In the other agents, this is considered the whole agent).
Critic: Gives feedback to the learning element on how the agent is doing and determines how the performance element should be modified to do better in the future. It tells the agent how well it is doing with respect to a fixed performance standard.
Problem generator: Responsible for suggesting actions that will lead to new and informative experiences.

Chapter 3 - Solving Problems by Searching

Search: is a (preferably methodical) process for finding something.

Uninformed vs. informed:

Do points in the search space give information that helps the searcher to determine the next step?

Uninformed search strategies use only the information available in the problem definition. Some uninformed search strategies are:

Breadth-first search
Depth-first search
Uniform-cost search
Depth-limited search
Iterative deepening depth-first search
Bidirectional search

Informed search takes into account the information gained throughout the search, not only the information given in the problem definition. Some informed search strategies:

Best-first search
A-Star search

Partial vs. Complete solutions:

Could the current state of the search always be considered a complete solution? Or is it often a partial state that must be incrementally enhanced to become a solution? Searches with a complete solution is often called local search.

Single-state problem formulation:

A problem is defined by five components:

Initial state: The initial state that the agents starts in
Actions: A description of the possible actions available to the agent
Transition model: A description of what each action does
Goal test: Determines whether a given state is a goal state
Path cost: A function that assigns a numeric cost to each path

A solution is a sequence of actions leading from the initial state to the goal state.

Breadth-first search

Nodes that are waiting to be explored are placed in a FIFO queue. New successors go at the end.

Complete: Yes
Time: O($b^d$) - Horrible
Space: O($b^d$) - Keeps every node in memory
Optimal: Yes, if cost=1 per step. Not optimal in general

Uniform-cost search

Like breadth-first, but the queue is ordered by path cost, lowest first.

Depth-first search

Unexplored nodes are placed in a LIFO queue. New successors are put at the front.

Complete: No, fails in infinite-depth spaces, and fails with loops
Time: O($b^d$) - Horrible
Space: O(V) - Linear in space (Only has to keep track of one ancestors children)
Optimal: No

Depth-limited search:

Depth-first search with a depth limit. Does not go further than the depth limit.

Iterative deepening search

You do depth-limited search, but you may be doing several of them. You start with a limit of 1, and increase the limit with +1 for every time you do the search.

Complete: Yes
Time: O($b^d$)
Space: O(bd)
Optimal: Yes, if step cost = 1.

Iterative deepening search uses only linear space, and not much more time than other uninformed algorithms.

Best first search

Idea: use an evaluation function for each node – estimate of “desirability”.

Expand the most desirable unexpanded node. Implementation: nodes are put in a queue sorted in decreasing order of desirability

Special cases: greedy search, A* search

Greedy search

Evaluation function h(n) (heuristic) = estimate of cost from n to the closest goal. Greedy search always expands the node that appears to be closest to the goal.

Complete: No – can get stuck in loops. But is complete in ﬁnite space with repeated-state checking.
Time: O($b^d$) - But a good heuristic can give dramatic improvement.
Space: O($b^d$) - Keeps all nodes in memory.
Optimal: No

A*

A* is a form of best-first search where the idea is not to expand "expensive" paths.

A* evaluates nodes with: $f(n) = g(n) + h(n)$

$g(n)$	The cost of getting to the node.
$h(n)$	Estimated cost of getting from the node to the goal. (heuristic)
$f(n) = g(n) + h(n)$	Estimated total cost of passing through n to reach the goal.

$h(n)$ may never over estimate, i.e. $h(n) ≤ h^*(n)$ where $h^*(n)$ is the actual cost from n to the goal.

When A* traverses the graph, it follows the path of nodes that gives the lowest value of $g(n)+h(n)$.

A* starts by looking at the root node. It's added to a closed set (of nodes already traversed). All its child nodes are added to the list of open nodes – the open set. Each node maintain a list of parent nodes, and the root node is added as a parent to each of its child nodes. It's also marked as "best parent" for each of them, as it is also the only node that may be their parent.

To continue the search, we pick the node in the open set that has the lowest value for $f(n)$. We then check if the node is a solution to the problem. If it is not, the search continues, by first moving it from the open set to the closed set. Then we check all of its neighbors, and add them to the open set, if they are not there already. Our current node is added to their list of parent nodes.

If the node was in the list from before, we compare their existing $g(n)$ value, with the value they'll get from using our current node as their parent. If our current node turns out to give a lower value for $g(n)$, we update the node's "best parent". Then we recalculate $f(n)$ and $g(n)$ values for the node, as well as any nodes that have it listed as their parent.

Complete: Yes, unless there are inﬁnitely many nodes with f ≤ f (G)
Time: Exponential in [relative error in h × length of solution.]
Space: Keeps all nodes in memory
Optimal: Yes—cannot expand fi+1 until fi is ﬁnished

Why A* is optimal

Because the A* algorithm always expands the node with the lowest f(n) value, and the h(n) value never overestimates, it will never be possible to reach the goal with a lower cost than the cost of the node we're expanding. Therefore, the first node we expand, and turns out to be a solution, will also be the optimal solution.

Heuristic:

A heuristic ranks alternatives in a search algorithm based on the available information. Some properties of a good heuristic are:

Admissible	Never overestimates the cost of reaching the goal. This guarantees that we will find the minimum cost path.
Consistency	The cost of moving is higher than (or equal to) the reduction in the heuristic. The final estimated cost never decreases.

In grid-search, a good heuristic may be:

Manhattan distance - may only move back/forth and sideways.
Euclidean - The length one would measure with a ruler (straight line distance)

Dominance:

A heuristic (h1) dominates another (h2) if h1(n) ≥ h2(n) for all n.

As long as both heuristics are admissible, we can expect h1 to generate less nodes than h2, and lead to a more efficient search towards the target node. Because of this, it's generally better to choose a heuristic with higher values (provided it's admissible).

Relaxed problems

Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem

Key point: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem

Chapter 4 - Beyond Classical Search

Local Search

All nodes in local search are complete solutions. As the search progresses, solutions will become gradually better.

The path is unimportant; only the final state matters. Local search doesn't use heuristics, but a similar concept called objective functions. The objective function answers the question “How optimal are you, solution?”.

Hill climbing

The hill climbing algorithm is Greedy meaning it always moves to states with immediate beneﬁts. It is therefore quick in smooth landscapes, but easily gets stuck in local maxima in rough landscapes.

Stochastic hill climbing:: Chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move.
First-choice hill climbing:: implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state. This is a good strategy when a state has many (e.g., thousands) of successors.
Random-restart hill climbing:: It conducts a series of hill-climbing searches from randomly generated initial states, until a goal is found. Is complete.

Simulated Annealing

Simulated annealing behaves like hill climbing, but is not greedy in the same sense. Simulated annealing uses a probability for not choosing the neighbor with immediate benefit - which is decreased every iteration. This probability decreases according to how bad the choice is, and as the temperature cools down.

Randomized search for a solution. The algorithm tries to avoid local maxima by allowing some bad solutions.

Simulated annealing starts with a node and an $F_{target}$ (defines how close to an optimal solution, we want our solution to be). Then it sets a temperature T (between 0 and 1).

The current state S is evaluated with the objective function, resulting in the value $F(S)$. If $F(S)$ is bigger than our $F_{target}$, the solution S is good enough.

If not:

Generate S's neighbors
For each neighbor, calculate its objective function
Pick the neighbor with the highest objective function score. $P_{max}$
Calculate: $q = \frac{F(P_{max}) - F(S)}{F(S)}$
$p = min(1, e^{-q/T})$, r = random number in range [0,1]
If r > p: set $S = P_{max}$ If not: Randomly pick another neighbor. S = <random neighbor>
T = T - dT
Check $F(S) \ge F_{target}$ (S is now a new node/state)

The lower the T, the fewer bad choices are allowed.

Local Beam Search: K parallel searches

A parallel search. Can jump to somebody else's neighbor. K parallel searches are not independent, since the best K neighbors chosen at each step, but some states may contribute 0 or 2+ neighbors to the next step.

Good search = efﬁcient distribution of resources (i.e., the K states).

Jiggle can be introduced via Stochastic Beam Search: pick some of the K neighbors randomly, with probabilities weighted by their evaluations.

Genetic Algorithms

A Genetic Algorithm (GA) is a variant of stochastic beam search in which the successor states are generated by combining two states instead of modifying one; an analogy to natural selection.

Genetic Algorithms begin with a set of k randomly generated states, represented by bit strings, called the population. Each state is rated by the fitness function, which returns higher values for better states. The successor states are then generated by combining two of the states to produce a child state. This is done by choosing a random crossover point x, where the first x bits are the corresponding bits from the first parent, and the rest are from the second parent. The new states are then subject to random mutations.

Stochastic Local Beam Search with (some of) the K neighbors generated by crossover.
Supports long jumps in search space, and combinations of the best of both parents.
Crossover more constructive than destructive...sometimes.

Chapter 5 - Adversarial Search

Classes of games:

Perfect info: state of playing arena and other players holdings known at all times.
Imperfect info: some info about arena or players not available.
Deterministic: outcome of any action is certain.
Stochastic: some actions have probabilities outcomes

MiniMax

Starting at a top node, each possible move is depicted down to a maximum depth, or a terminal node (win or lose position). For the leaf nodes at the bottom level, each node is assigned an evaluation function value. The maximum function assigns the values from the point of view of the players (called Max). The evaluation function must assign high values to win positions and low values to losing positions, and a value in between for non-terminal nodes. A draw is typically assigned a value of 0.

Then, all values are backed up so that a Max node (Max to move) gets the maximum of its daughter nodes (Min nodes), while each Min node gets the minimum value of its daughter nodes (Max nodes). It is only complete if the tree is finite, and it is only optimal against an optimal opponent (meaning the opponent always picks what is best for it self). The tree is explored as a depth-first search. Heuristic only used at the bottom.

Alpha-Beta Pruning

Alpha-Beta pruning is a way to make Min-Max search much more effective. It is a way of essentially saying “you don’t need to generate any more children, because there is no way we are going to take your path”. It is a method where a node is not evaluated further if it can be proved that the node will never influence the choice. This is done by assigning provisional values (alpha values at Max nodes, beta values at Min nodes) when a value is backed up from below. Only a max node can modify alpha, and only min nodes can modify beta.

The alpha and beta values are defined locally at different nodes. They can also be passed in from above. Pruning does not affect the final result but can improve the efficiency of the search

Alpha

A modifiable property of a MAX node. Indicates the lower bound on the evaluating of that MAX node. Updated whenever a larger evaluation is returned from a child MIN node. Used for pruning MIN nodes.

Beta

A modifiable property of a MIN node. Indicates the upper bound on the evaluation of that MIN node. Updated whenever a smaller evaluation is returned from a child MAX node. Used for pruning MAX nodes.

The values (alpha and beta) are used to see if further processing of a branch is necessary. For instance, if a Min node with beta value β is below a Max node with alpha value α, and β <= α then Max will never choose that node anyway, and any further analysis of the subnodes will not change that.

Alpha-beta pruning is order dependent.

AI Critics

Critic = a system that evaluates search states. Very similar to both heuristics and objective functions. Many AI applications involve standard AI search algorithms (A*, Minimax, etc.) plus problem-speciﬁc critics. Building the critic is the hard part. Actor = a system for mapping search states to actions. Actor-Critic systems use AI to both compute actions and evaluate states. In some versions, the best action is simply the one that leads to a state with the highest evaluation.

Chapter 6 - Constraint Satisfaction Problems

A Constraint satisfaction problem (CSP) is a type of state-search problem defined by:

a set of variables
a set of domains (legal values) for these variables
a set of constraint relations between the variables

A problem is solved when each variable has a value that satisfies all the constraints on the variable. A problem described this way is called a constraint satisfaction problem. Can be represented in a graph where nodes are variables and arcs show constraints, by doing this we can divide the problem into subproblems.

Common for all CSP solutions:

Initial state: the empty assignment, { }
Successor function: assign a value to an unassigned variable that does not conﬂict with current assignment. → fail if no legal assignments (not ﬁxable!)
Goal test: the current assignment is complete

This is common for all CSP algorithms. The intelligence comes from picking the right variables to assign values to, and then picking what value to give them.

Varieties of constraints:

Unary:: involve a single variable. Example: Variable A has to be 'green'
Binary:: involve pairs of variables. Example: Variable A has to differ from variable B
Higher-order:: 3 or more variables. Example: AllDiff – All variables need to have different values
Preferences:: “better than”-constraints - soft constraints. These constraints may be violated in a legal solution.

Consistency:

Node consistency:

A single variable is node-consistent if all the values in the variable's domain satisfy the variable's unary constraints. If variable A has the domain {1, 2, 3}, and there's a unary constraint saying variable A can't have even-numbered values, we can make it node-consistent by removing 2 from its domain.

We say that a network is node-consistent if every variable in the network is node-consistent.

Arc consistency:

A variable in CSP is arc-consistent if every value in its domain satisfies the variable's binary constraints. X is arc consistent with Y if for every value in the current domain $D_x$ there is some value in the domain $D_y$ that satisfies the binary constraint on the arc (X,Y). A network is arc-consistent if every variable is arc-consistent with every other variable.

We filter on constraints with binary constraints. We filter on one variable at a time. So we use a stack of constraints where we begin with the top constraint and work our way down.

The most popular algorithm for arc-consistency is called the AC-3. To make every variable arc-consistent, the AC-3 algorithms maintains a queue of arcs to consider. Initially the queue contains all the arcs in the CSP. AC-3 then pops off an arbitrary arc {X,Y} from the queue and makes X arc-consistent with Y. If this leaves $D_x$ unchanged, the algorithm just moves to the next arc. But if it revises $D_x$ (makes the domain smaller), then we add to the queue all arcs {Z,X} where Z is a neighbor of X. We need to do this because the change in $D_x$ might enable further reductions in the domains of $D_z$, even if we have previously considered Z. If $D_x$ is revised down to nothing, then we know the whole CSP has no consistent solution, and AC-3 can immediately return failure. Otherwise, we keep checking, trying to remove values from the domains of variables until no more arcs are in the queue. At that point, we are left with a CSP that is equivalent to the original CSP - they both have the same solutions - but the arc-consistent CSP will in most cases be faster to search because its variables have smaller domains.

Path consistency:

A two-variable set {X,Y} is path-consistent with respect to a third variable Z, if for every assignment {X=a, Y=b} consistent with the constraints on {X,Y}, there is an assignment to Z that satisfies constraints on {X,Z} and {Y,Z}.

Focus on two or more constraints at a time. So we filter with more than one constraint at a time (X>Z and Y<Z).

Forward checking

Forward checking means that each variable is assigned an apriori set of legal values, and that each assignment leads to a subsequent elimination of all other assignments that become impossible. For instance, whenever a variable X is assigned, for each variable Y that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with the value chosen for X

Backtracking Search for CSPs

The term backtracking search is used for a depth-first search that chooses values for one variable at a time and backtracks when a variable has no legal values left to assign. Keeping track on the variables domains to make it easier to detect fail. So when placing a variable on the board all other variables domains must be checked.

The most prominent strategies for solving CSP with backtracking search are:

Minimum Remaining Value (MRV) Heuristics: Select the variable with the fewest remaining legal values. Variables with the smallest domain.
Degree heuristics: Select the variable which is involved in the largest number of constraints. (It's a good choice when MRV gives a tie)
Least constraining value: Prefer the value that rules out the fewest choices of for the remaining variables in the constraint graph.
Forward checking: Keep track of remaining legal values for unassigned variables. Terminates when any variable has no legal values.

Local Search for CSPs

When the algorithm starts, the initial state is filled out (e.g. the board is populated with queens) and the search changes one variable at a time (i.e. one queen moves at a time). The point of the algorithm is to eliminate the constraints that are currently violated. Which variable to change is picked at random, but which change is made is determined by the heuristic. The heuristic chooses the move that will violate the fewest constraints (min-conflicts), from the current point of view.

Chapter 7 - Logical Agents

Knowledge base: set of sentences in a formal language.

Logics: Formal languages to represent information so that conclusions might be made.

Syntax: Defines how sentences are constructed to make sense. You can say "plane a Hannah is in sitting", but it doesn't make sense according to the English syntax rules. If the sentence is changed to "Hannah is sitting in a plane", however, the syntax is valid.

Semantics: Defines the meaning of the sentence.

World: A possible world (model) is a complete set of the truth values of all the logical variables.

Entailment: Means that one thing follows from another.

KB |= a if and only if $M(KB) \subseteq M(a)$

KB “entails” a if KB is a subset of a. That is, if for every world where KB is true, a is also true.

Satisfiable: A sentence is satisfiable if it is true in some model.

Valid: A sentence is valid if it is true for all models.

Number of models: The number of models is the number of models that are true.

Horn Clauses: At most one variable in a logical sentence is unnegated (positive).

Definite Clauses: Exactly one variable in a logical sentence is unnegated (positive).

Goal Clauses: No variable in a logical sentence is unnegated (positive).

Standard logical equivalence

$(\alpha \wedge \beta) \equiv (\beta \wedge \alpha)$	Commutativity of $\wedge$
$(\alpha \vee \beta) \equiv (\beta \vee \alpha)$	Commutativity of $\vee$
$((\alpha \wedge \beta) \wedge \gamma) \equiv (\alpha \wedge (\beta \wedge \gamma))$	Associativity of $\wedge$
$((\alpha \vee \beta) \vee \gamma) \equiv (\alpha \vee (\beta \vee \gamma))$	Associativity of $\vee$
$\neg(\neg \alpha) \equiv \alpha$	Double-negation elimination
$(\alpha \Rightarrow \beta) \equiv (\neg \beta \Rightarrow \neg \alpha)$	Contraposition
$(\alpha \Rightarrow \beta) \equiv (\neg \alpha \vee \beta)$	Implication elimination
$(\alpha \Leftrightarrow \beta) \equiv ((\alpha \Rightarrow \beta) \wedge (\beta \Rightarrow \alpha))$	Biconditional elimination
$\neg (\alpha \wedge \beta) \equiv (\neg \alpha \vee \neg \beta)$	De Morgan
$\neg (\alpha \vee \beta) \equiv (\neg \alpha \wedge \neg \beta)$	De Morgan
$(\alpha \wedge (\beta \vee \gamma)) \equiv ((\alpha \wedge \beta) \vee (\alpha \wedge \gamma))$	Distributivity of $\wedge$ over $\vee$
$(\alpha \vee (\beta \wedge \gamma)) \equiv ((\alpha \vee \beta) \wedge (\alpha \vee \gamma))$	Distributivity of $\vee$ over $\wedge$

CNF - Conjunction Normal Form

Rewrite logical expressions to a product of sums. In other words, it is an AND of ORs. Example: $(A \vee B) \wedge (\neg B \vee C \vee \neg D) \wedge (D \vee \neg E)$

Resolution

Cancel out a variable with negated variable. To do resolution first you have to get all your facts onto CNF and then take the negation of what you want to prove and cancel out facts based on what you know. When you can cancel out what you want to prove with what you can derive from what you know then you have solved the problem. It may happen that you derive facts that may be irrelevant to what you are trying to prove.

Chapter 8 - First-Order Logic

The world in first order logic contains:

Objects: people, houses, numbers...
Relations: red, round, prime...
Functions: father of, best friend...

$$ \forall x \neg P = \neg \exists x P$$

$$ \exists x \neg P = \neg \forall x P$$

"Two plus two equals four minus one"

Objects: one, two, three, four, one plus two, four minus one
Relation: equals
Function: plus, minus

In this example, "two plus two" and "four" are two different names for the same object.

The equals relation can be written as

$$ \{ \langle \text{one plus two, three} \rangle \} $$

The plus function can be expressed as plus(one, two). Note that this is just a way to denote a function symbol, and note some kind of subroutine that takes two inputs and return an answer.

Quantifiers

Universal Quantification(∀): For all
Existential Quantification(∃): There is at least one

Chapter 9 - Inference in First-Order Logic

Unification: Matching a free variable with a constant. θ = {x/John}

Forward chaining: Start with base facts and try to find out which facts fires which rule. Stop when you have reached the goal.

Backward chaining: Start with the goal and work your way backward. Apply rule to the goal and use unification to bind variables to prove the goal. As we apply a rule we get subproblems and try to solve these. If we can solve the subproblems we can solve the whole problem. (Basically it is a depth-first search).

Skolemization

Skolemization is the way of removing existential quantifiers by introducing new function symbols.

How: For each existentially quantified variable introduce a n-place function where n is the number of previously appearing universal quantifiers. Special case: introducing constants (trivial functions: no previous universal quantifier).

$\exists x \ P(x) \ \text{becomes} \ P(g)$

$\exists x \ \forall y \ P(x,y) \ \text{becomes} \ \forall y \ P(g,y)$

$\boldsymbol{\forall y} \ \exists x \ P(x,y) \ \text{becomes} \ \forall y \ P(g(\boldsymbol{y}),y) $

$\boldsymbol{\forall y} \ \boldsymbol{\forall z} \ \exists x \ P(x,y,z) \ \text{becomes} \ \forall y \ \forall z \ P(g(\boldsymbol{y},\boldsymbol{z}),y,z) $

$\boldsymbol{\forall y} \ \exists x \ \forall z \ P(x,y,z) \ \text{becomes} \ \forall y \ \forall z \ P(g(\boldsymbol{y}),y,z) $

Conversion to CNF

Eliminate biconditionals and implications (⇒ and ⇔)
Move ¬ inwards
Standardized variables: each quantifier should use a different one
Each existential variable is replaced by a Skolem function of the enclosing universally quantified variables. (F(x), G(x)...)
Drop universal quantifiers
Use laws to get it on CNF form
forward chaining in propositional logic (start at facts and apply rules)
backward chaining in propositional logic (start at goal and make subgoals)
resolution in propositional logic (negate goal and combine facts with rules)
forward chaining in first-order logic (same but with bindings)
backward chaining in first-order logic (same but with bindings)
resolution in first-order logic (same but with bindings)

Chapter 10 - Classical Planning

Plan: A plan is a collection of actions for performing some task.

Stanford Research Institute Planning system (STRIPS)

Domain: a set of typed objects; usually represented as propositions
States: are represented as first-order predicates over objects
Closed-world assumption: everything not stated is false; the only objects in the world are the ones defined
Operators/Actions: defined in terms of preconditions: when can the action be applied? Effects: what happens after the action

Effects are represented in terms of:

Add-list:: list of propositions that become true after action
Delete-list:: list of propositions that become false after action
Goals:: conjunction of literals

PDDL

PDDL (Planning Domain Definition Language) describes the four things we need to define a search problem: the initial state, the actions that are available in a state, the result of applying an action, and the goal test. PDDL accepts negative literals in preconditions and goals.

Two basic approaches

State-space planning:: works at the level of states and actions, Finding a plan is formulated as search through state space. Most similar to constructive search.
Plan-space planning:: works at the level of plans. Finding a plan is formulated as search through space of plans. Start with partial incorrect plan, then apply changes to correct it. Most similar to iterative improvement

State-space planning

Progression planners：: reason from the start state, trying to ﬁnd the actions that can be applied (match preconditions).
Regression planners：: reason from the goal state, trying to ﬁnd the actions that will lead to the start state (match eﬀects).

Partial Order Planning

Search in plan space and use least commitment whenever possible. Least commitment: only make choices that are relevant to solving the current part of the problem

Plan-space planning

Planning graph:

Is a directed graph organized into levels: S0 represents the first level which is also the initial state of the graph. The initial state consists of node representing each fluent that holds in S0. The next level A0 consists of nodes representing each ground action that might be applicable to S0. Then alternating levels Si followed by Ai until we reach a termination condition. A persistence action take place if no actions negates it (small empty boxs in the graph). Mutual exclusion links are drawn between states when they are opposite of each other or when the states can not appear together regardless of the choice of action. Meaning that only one of them could be the result. If we get to a point in our graph where two consecutive levels are identical, then the graph has leveled off.

A mutex for actions holds when:

Inconsistent effects: one action negates an effect of the other.
Interference: one of the effects of one action is the negation of a precondition of the other.
Competing needs: one of the preconditions of one action is mutually exclusive with a precondition of the other

A mutex for states holds when:

Negation: One state is the negation of another.
Unreachable: All actions leading to the states are mutex.

When a solution exists

For there to exist a plan in the planning graph, the goal states must not be in mutex with each other at the last level. This is not an existence guarantee for a valid solution however, as any states or actions along the path from the start to the goal states must also not be in any mutex relation with each other.

How to find a solution

To extract a solution, start at the level where the goal states are not in mutex with each other. Then search backwards (state search) to find a path where actions are not in mutex with each other. If one is able to arrive at the start state S0, a valid plan exist and that path can be extracted.

Chapter 11 - Planning and Acting in the Real World

As of H2019 this i s not part of the syllabus

Critical path method: the path whose total duration is longest.

In Nondeterministic Domains

Extends planning to cover nondeterministic environments (where outcomes of actions are uncertain).

Methods for planning that handle partially observable, nondeterministic and unknown environments include:

Sensorless planning	Environments with no observations
Contingency planning	Partially observable and nondeterministic environments
Online planning and replanning	Unknown environments

In sensorless and partially observable planning, we have to switch from the closed-world assumption to the open-world assumption in which states that are not mentioned have an unknown value (instead of false).

Chapter 12 - Knowledge Representation

Knowledge-based systems(KBS):

Is a model of something in the real world (outside the agent). There are 3 main players of modeling:

Knowledge engineer	designs, builds and tests the “expert system”
Domain expert	possesses the skill and knowledge to find a solution
End User	the final system should meet the needs of the end user.

The KB represent the knowledge using a KB language, which is a system for encoding knowledge. The inference engine has the ability to find implicit knowledge by reasoning over the explicit knowledge. Decides what kind of conclusions can be drawn.

Declarative knowledge	Expressed in declarative sentences or indicative propositions. (knowinf of).
Procedural knowledge	Knowledge exercised in the performance of some task (shopping list).
Domain knowledge	What we reason about
Strategic knowledge	How we reason

5 roles of knowledge representation:

Surrogate: A representation is a substitute for direct interaction with the world.
All representations are approximations to reality and they are invariably imperfect
Fragmentary theory of Intelligent Reasoning
Is a medium for efficient computation
Is a medium of human expression

Rule based systems (early KBs expert systems)

Working memory: contains facts about the world, observed directly or derived from a rule.
Rule base: contains rules, where each rule is a step in a problem solving process (if-then format)
Interpreter: match rules and the current contents of the working memory.

KR languages:

Syntactic: possible allowed constructions.
Semantic: What the representation mean, mapping from sentence to world.
Inferential: Interpreter, what conclusions can be drawn.

Requirements:

Representation adequacy: Should allow for representing all the required knowledge.
Inferential adequacy: Should allow inferring new knowledge.
Inferential efficiency: Inferences should be efficient.
Clear syntax and semantics.
Naturalness: Easy to read and use.

Semantic networks:

A semantic net is a structure that shows the relation between concepts, instances and values. The concepts can be categories in sets of entities ("bunches"). The relations can be memberships, subclasses and attributes or others.

Semantic networks store our knowledge in the form of a graph, with nodes representing objects in the world, and arcs representing relationships between those objects. It uses the term inheritance which is used when an object belong to a class and hence inherits all the properties of that class.

Semantic networks can be translated to frames. Nodes then become frame names, links become slot and the node at the other end becomes the slot value. Frames is a collection of attributes (slots) and associated values and constraints on those values. (F stands for false, and T stands for True in the image)

Inheritance in semantic networks

Inheritance of properties are done following the principles:

If a category has an attribute, then all subcategories have that attribute.
If a category has a property (e.g. attribute and value), then all subcategories and instances have that property.

The exception is when a semantic entity inherits a property or attribute from several superclasses. Then the following applies:

In a hierarchy (one parent for each semantic entity), it is the closest property that is valid.
In a heterarchy (different parents along different paths), categories blocked by interfering inheritance are excluded from inheritance.

Chapter 22 - Natural Language Processing

as of H2019 this is not part of the syllabus. It is replaced by "AI and Ethics"

Knowledge aquisition & language models

Language models

A language model is a model that predicts the probability distribution of language expressions. One of the simplest language models you can make is the probability distribution over a sequence of characters.

N-gram char models

Probability that some sequence of n charachters will appear in a given document.

This can be extended to sentences, and then you have markov chains. Bi-gram = markov chain of length 2 etc etc.

Def: Corpus is a body

Language identification

You take a given word and compute the probability that it will appear in each language -> select maximum

Smoothing n-gram models

Words not occuring in a language model will receive a low probability. But a language model could not always guarantee that all the words in the language is there. (Slang, languages evolving etc.) The process of adjusting the probability of low-frequency counts is called smoothing.

There are different ways of doing this smoothing.

Laplace smoothing: If a random Boolean variable $X$ has been false in all n observations so far then the estimate for $P (X = true )$ should be $\frac{1}{(n + 2)}$ (Performs poorly)
Backof model: The backoff model, in which we start by estimating n-gram counts, but for any particular sequence that has a low (or zero) count, we back off to (n − 1)-grams
Linear interpolation smoothing: Linear interpolation smoothing is a backoff model that combines trigram, bigram, and unigram models by linear interpolation.

Model evaluation

Use training sets to choose best model for current case. Use perplexity (roughly speaking average branching factor)

N-gram word models

Char model extended for words. Solve out of vocabulary using wildcard words when no match found.

Text classification

This is known as categorization. Use cases => Spam detection. Ex: Categorize text as spam or some other thing based on probabilities..

Use test sets to tune algorithm.

Information retrieval

Characterized by:

A corpus of documents
Queries posed in a query language
A result set
A presentation of the result set

Models and IR Scoring functions:

Boolean keyword model. Count matches. Return with most matches.
BM25 scoring function. For each word multiply the inverse frequency of the word in all documents with the frequency in the current document. Create index for each vocabulary word for faster lookup

Evaluation:

Precision measures the proportion of documents in the result set that are actually relevant.
Recall measures the proportion of all the relevant documents in the collection that are in the result set.
$F_1$ score is the summary of both precision (P) and recall (R). That is the harmonic mean: $\frac{2PR}{P + R}$

Refinements:

Case folding
Stemming (reduce to similar words)
Synonyms
Metadata

The HITS algorithm

Find pages relevant to query. Get all pages in the link neighbourhood. A page is considered an authority if other pages point to it, and a hub if it points to others. Iterate ground up and build list over authority and hubs. Normalize.

Question answering

Requires natural language processing.

Information Extraction.

Return useful information from documents.

We can use

Finite state automata (Regexes).
Relational extraction => Implement with cascaded finite-state transducers.
Process => Tokenize, match on word types (Basic groups) => Combine to complex phrases

So we introduce hidden markov. Tailored for specific queries.

Can have HMM for e.g. speaker of a sentence etc etc. Uses Bayesian propabilities. Train them on example datasets.
Improve with linear weights for last n segments of the HMM. Can reward specific words.

Ontology from large corpa: Use regexes with word-classes instead of variables. Example: NP such as NP (,NP)*(,)? ((and | or)NP)?

Automated template Construction:

Given a set of initial templates we can derive additional similar facts from some corpus.
If we combine many of these we can get a machine to learn properly. One template for each word type. Extract data from them in turn.

Summary:

PLM based on n-grams are pretty decent
Remember to preprocess due to noise et al.
Text classification can be done with Bayes n-gram.
Information retrieval uses simple language models based on bags of words, pretty decent.
Question answering can be handled by information retrieval. Bigger corpus, better results
Information extraction => Complex models with syntax etc.

Multiagent-Systems and Game Theory (not in book)

TODO: Add theory for this part of the curriculum.

Game Theory

Game theory studies the decisions of agents in a multiagent environment where the actions of each agent have an effect in the environment. The outcome of an action may depend on the the other agents' actions which leads to "strategical decision" making. As the agent is uncertain of what the other agents will do, it will try to predict the other's decisions, compute how they affect him, and in turn make its own decision based on the results.

There are different types of games in Classical Game Theory:

Sequential games
- Example: Chess
Repeated games
Strategic/simultaneous normal form games

Strategic normal form games

Strategic normal form games are "one-shot" games where all the agents choose only once and at the same time. The main issue of this game is to predict what the other agents will play. A strategic normal game is defined as:

More than one agent (N>1).
Each agent is rational, and each agent knows that the other agents are rational.
Each agent has a set A_i which contains all possible actions of the agent.
The set of outcomes Ω contains all outcomes ω. Where ω is an outcome where all agents have performed an action.

If an action gives the best outcome of all, independent of what the other agents do, it is a strictly dominant strategy. The outcomes (or states) of these games can be represented in a payoff matrix.

Written by

sile Stian Jensen sigveseb Assios CharlieTang aleksanb anna ingsollid duvholt runholm mariofrans Skarding forbord bujordet enstulen CarstenA jonasft hanne.olssen emre Jon Eide Johnsen Martin Hallén Vages iverjo eivindre hanskhe pbsds mortengk bvx89 bragelyt nina kasperrt aaursand sofiesandvik paal Ose edvardvb johakn erikskaar odin audunskj MatsMoll stianste vegard_hellem mjlading nikzy kjetiaun

Last updated: Fri, 28 Nov 2025 07:24:55 +0100 .