TDT4171: Artificial Intelligence Methods
Topics covered in these notes:

* Kalman filter
    * Assumptions
    * Usage
    * Generally how to calculate
* Strong/weak AI
* Agents
    * Rationality
* Bayesian networks
    * Syntax and semantics
    * Uses and problems
    * Conditional independence
* Decision networks/influence diagrams
* Reasoning
    * Case-based reasoning
    * Instance-based reasoning
* Markov processes
    * The Markov assumption
* Neural networks
* Decision trees
# Intelligent Agents and rationality
(See TDT4136 for more details)
An intelligent agent is an autonomous entity which observes through sensors and acts upon an environment using actuators, directing its activity towards achieving goals (Wikipedia). Part of this subject focuses on the rationality of agents.
# Bayesian Networks
A Bayesian network is a model of a set of random variables and their conditional dependencies through a directed acyclic graph. In the graph, the nodes represent the random variables and the edges represent conditional dependencies.
Much of the calculation in a Bayesian network is based on Bayes' theorem: $ P(A|B)=\frac{P(B|A)P(A)}{P(B)} $, where $P(B)$ is found from the law of total probability: $P(B)=\sum_{i=1}^{n} P(B|A_i)P(A_i)$.
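As a minimal sketch with hypothetical numbers (a disease test), the posterior can be computed directly from the theorem and the total probability:

```python
# Bayes' theorem with hypothetical numbers: P(A|B) = P(B|A) P(A) / P(B),
# where P(B) comes from the law of total probability.

# Hypothetical prior over the mutually exclusive events A_1 (disease) and A_2 (no disease).
prior = {"disease": 0.01, "no_disease": 0.99}
# Hypothetical likelihoods P(positive test | A_i).
likelihood = {"disease": 0.95, "no_disease": 0.05}

# Total probability P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(likelihood[a] * prior[a] for a in prior)

# Posterior P(disease | positive test)
posterior = likelihood["disease"] * prior["disease"] / p_b
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.161
```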
## Bayesian networks in use
* Good: Reason with uncertain knowledge. "Easy" to set up with domain experts (medicine, diagnostics).
* Worse: Many dependencies to set up for most meaningful networks.
* Ugly: Non-discrete (continuous) models.
Bayesian networks are, for example, bad at image analysis and at handling very large networks.
## Influence Diagram/Decision Network
An influence diagram, or a decision network, is a graphical representation of a decision situation. It is a directed acyclic graph (DAG), just like the Bayesian network (the Bayesian network and the influence diagram are closely related), and it consists of three types of nodes and three types of arcs between the nodes.
### Influence Diagram Nodes
- Decision nodes (drawn as rectangles) correspond to the decisions to be made.
- Uncertainty nodes (drawn as ovals) correspond to the uncertainties to be modeled.
- Value nodes (drawn as octagons or diamonds) correspond to a utility function over the resulting scenario.
### Influence Diagram Arcs
- Functional arcs end in a value node: the utility is a function of the nodes at their tails.
- Conditional arcs end in an uncertainty node: the uncertainty is conditioned on the nodes at their tails.
- Informational arcs end in a decision node: the outcomes of the nodes at their tails are known before the decision is made.
### Evaluating decision networks
The algorithm for evaluating decision networks is a straightforward extension of the standard Bayesian network inference algorithm. Actions are selected by evaluating the decision network for each possible setting of the decision node. Once the decision node is set, it behaves exactly like a chance node that has been set as an evidence variable.
The algorithm is as follows (a minimal code sketch follows the list):
1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
    1. Set the decision node to that value.
    2. Calculate the posterior probabilities for the parent nodes of the utility node, using a standard probabilistic inference algorithm.
    3. Calculate the resulting utility for the action.
3. Return the action with the highest utility.
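Below is a minimal sketch of this loop. The inference routine `posterior(evidence, decision)` (returning a distribution over the utility node's parents), the utility function, and the umbrella example numbers are all hypothetical stand-ins, not part of any particular library:

```python
def best_action(decision_values, evidence, posterior, utility):
    """Evaluate a decision network by enumerating the values of the decision node.

    `posterior` and `utility` are hypothetical stand-ins for a probabilistic
    inference routine and the utility table of the network.
    """
    best, best_eu = None, float("-inf")
    for d in decision_values:                     # 2.  for each decision value
        dist = posterior(evidence, d)             # 2.2 posterior over the utility node's parents
        eu = sum(p * utility(outcome, d)          # 2.3 expected utility of this action
                 for outcome, p in dist.items())
        if eu > best_eu:
            best, best_eu = d, eu
    return best                                   # 3.  action with the highest expected utility

# Toy usage with made-up numbers (umbrella decision, hypothetical weather posterior).
weather_posterior = lambda evidence, d: {"rain": 0.3, "sun": 0.7}
payoff = {("rain", "take_umbrella"): 5, ("rain", "leave_it"): -10,
          ("sun", "take_umbrella"): 0, ("sun", "leave_it"): 5}
print(best_action(["take_umbrella", "leave_it"], {}, weather_posterior,
                  lambda outcome, d: payoff[(outcome, d)]))   # -> "take_umbrella"
```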
# Markov Chains
A Markov chain is a random (and generally discrete) process where the next state only depends on the current state, and not on previous states. There also exist higher-order Markov chains, where the next state depends on the previous n states. Note that any k'th-order Markov process can be expressed as a first-order Markov process, for example by letting the state be the tuple of the k most recent states.
In a Markov chain used for inference (a hidden Markov model), you typically have observable variables (evidence) which tell us something about hidden variables that we want to know but cannot observe directly.
## Operations on Markov Chains
### Filtering
Computing the current distribution over the unobservable (hidden) variables, given all the evidence (the observable variables) so far.
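A minimal sketch of one filtering step (the forward algorithm) for a hypothetical two-state hidden Markov model; the transition and sensor matrices are made up:

```python
# Hypothetical two-state hidden Markov model; all numbers are made up.
T = [[0.7, 0.3],   # transition model: T[i][j] = P(X_t = j | X_{t-1} = i)
     [0.3, 0.7]]
O = [[0.9, 0.1],   # sensor model: O[j][e] = P(e | X_t = j)
     [0.2, 0.8]]

def forward_step(belief, evidence):
    """One filtering step: predict with the transition model, then weight by the evidence."""
    predicted = [sum(T[i][j] * belief[i] for i in range(2)) for j in range(2)]
    unnormalized = [O[j][evidence] * predicted[j] for j in range(2)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

belief = [0.5, 0.5]          # initial belief over the hidden state
for e in [0, 0, 1]:          # hypothetical observation sequence
    belief = forward_step(belief, e)
print(belief)                # posterior over the hidden state after the observations
```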
### Prediction
Trying to predict how the variables will behave in the future based on the current evidence.
### Smoothing
Estimating the distribution over past states, also taking into account evidence received after those states.
### Most likely explanation
Finding the most likely sequence of hidden states given all the evidence (typically computed with the Viterbi algorithm).
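A minimal Viterbi sketch for the same hypothetical two-state model as in the filtering example (all numbers are made up):

```python
# Hypothetical two-state model, same made-up numbers as in the filtering sketch.
T = [[0.7, 0.3], [0.3, 0.7]]      # transition model P(X_t | X_{t-1})
O = [[0.9, 0.1], [0.2, 0.8]]      # sensor model P(e | X_t)
prior = [0.5, 0.5]

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    n = len(prior)
    # Probability of the best path ending in each state, plus back-pointers per step.
    m = [prior[s] * O[s][observations[0]] for s in range(n)]
    back = []
    for e in observations[1:]:
        prev, step, m = m, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * T[i][j])
            m.append(prev[best_i] * T[best_i][j] * O[j][e])
            step.append(best_i)
        back.append(step)
    # Reconstruct the path by following back-pointers from the best final state.
    state = max(range(n), key=lambda s: m[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return list(reversed(path))

print(viterbi([0, 0, 1]))   # -> [0, 0, 1]
```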
## Bernoulli scheme
A special case of Markov chains where the next state is independent of the current state.
# Learning
## Decision trees
Decision trees are a way of making decisions when the different cases are describable by attribute-value pairs. The target function should be discretely valued. Decision trees may also perform well with noisy training data.
A decision tree is simply a tree where each internal node represents an attribute, and its edges represent the different values that attribute may hold. In order to reach a decision, you follow the tree downwards along the edges matching your case's attribute values until you reach a leaf node, which holds the decision.
### Building decision trees
The general procedure for building decision trees is a simple recursive algorithm: select an attribute to split the training data on, and then continue recursively until all the data is classified or there are no more attributes to split on. How big the decision tree becomes depends on how you select the attribute to split on.
For an optimal solution, you want to split on the attribute which holds the most information, i.e. the attribute which will reduce the entropy the most (the highest information gain).
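A minimal sketch of choosing the split attribute by information gain, using a small made-up training set:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Entropy reduction from splitting the examples on the given attribute."""
    before = entropy([ex["label"] for ex in examples])
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex["label"] for ex in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after

# Hypothetical training data: split on the attribute with the highest information gain.
data = [
    {"outlook": "sunny", "windy": False, "label": "no"},
    {"outlook": "sunny", "windy": True,  "label": "no"},
    {"outlook": "rain",  "windy": True,  "label": "yes"},
    {"outlook": "rain",  "windy": False, "label": "yes"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(data, a))
print(best)   # -> "outlook" (splitting on it separates the classes perfectly)
```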
#### Overfitting
With noisy training data, you may end up incorporating the noise into the tree, which increases its error rate on unseen data. A way of avoiding this (reduced-error pruning) is to measure how well the tree classifies a held-out validation set with each node (including the nodes below it) removed, and then greedily remove the node whose removal increases the accuracy the most.
# Case-based reasoning
Case-based reasoning assumes that similar problems have similar solutions and that what works today will still work tomorrow. It uses this as a framework to learn and deduce what to do in both known and unknown circumstances.
## Main components
* A case base of previous cases
* A method for retrieving relevant cases with solutions
* A method for adapting the retrieved solution to the current case
* A method for learning from the solved problem
## Approaches
### Instance-based
An approach based on classical machine learning for *classification* tasks: cases are represented as attribute-value pairs and compared with a similarity metric. The focus is on automated learning without human intervention, which requires a large case base.
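A minimal sketch, assuming cases are attribute-value vectors and the similarity metric is negative Euclidean distance (the case base and numbers are hypothetical):

```python
import math

# Hypothetical case base: each case is an (attribute-value vector, solution) pair.
case_base = [
    ((1.0, 0.5), "solution_A"),
    ((0.2, 0.9), "solution_B"),
    ((0.9, 0.4), "solution_A"),
]

def similarity(a, b):
    """Similarity as negative Euclidean distance between attribute vectors."""
    return -math.dist(a, b)

def solve(query):
    """Instance-based reasoning: reuse the solution of the most similar stored case."""
    _, solution = max(case_base, key=lambda case: similarity(case[0], query))
    return solution

print(solve((0.95, 0.45)))   # -> "solution_A"
```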
### Memory-based reasoning
Like instance-based reasoning, but massively parallel: the distance from the query to every case in the case base is computed.
### (Classic) Case-based reasoning
Motivated by psychological research. Uses background knowledge to specialise systems, and is able to adapt retrieved cases to a larger extent than instance-based approaches.
### Analogy-based
Similar to case-based, but with emphasis on solving cross-domain cases, reusing knowledge from other domains.