Wikipendium

This is an old version of the compendium, written May 17, 2014, 1:36 p.m. by trmd.

TDT4171: Artificial Intelligence Methods

Topics covered:

* Kalman filter: assumptions, usage, generally how to calculate
* Strong/weak AI
* Agents: rationality
* Bayesian networks: syntax and semantics, uses and problems, conditional independence
* Decision networks/influence diagrams
* Reasoning: case-based reasoning, instance-based reasoning
* Markov processes: the Markov assumption
* Neural networks
* Decision trees

# Artificial Intelligence

## Intelligent Agents and rationality

(See TDT4136 for more details.)

An intelligent agent is an autonomous entity which observes through sensors, acts upon an environment using actuators, and directs its activity towards achieving goals (Wikipedia). Part of this subject focuses on the rationality of agents. An agent is said to be rational if it has clear preferences for the goals it wants to achieve, and always acts in ways that are expected to lead to an optimal outcome among all feasible actions.

## Philosophical foundations

### Strong AI

When AI research first became popular, it was driven by a desire to create an artificial intelligence that could perform as well as, or better than, a human brain in every relevant aspect, be it intellectual capacity, a display of emotion, or even compassion. This is what we refer to as __Strong AI__. It can more concisely be described by the question _Can machines really think?_, equating the epitome of AI with human or superhuman intelligence.
The early view was that a strong AI had to actually be conscious, as opposed to just acting like it was conscious. Today, most AI researchers no longer make that distinction. A problem arising from holding the view that strong AI has to be conscious is describing what being conscious actually means. Many philosophers (e.g. John Searle) attribute human consciousness to humans' inherent biological nature, asserting that human neurons are more than just part of a "machinery".
Some key words in relation to Strong AI are:

* __Mind-body problem__: The question of whether the _mind_ and the _body_ are two distinct entities. The view that the mind and the body exist in separate "realms" is called a __dualist__ view. The opposite is called __monist__ (also referred to as __physicalism__); its proponents assert that the mind and its mental states are nothing more than a special category of _physical states_.
* __Functionalism__ is a theory stating that a mental state is just an intermediary state between __input__ and __output__. This theory strongly opposes any glorification of human consciousness, asserting that any _isomorphic_ systems (i.e. systems which produce the same outputs for a given input) are for all intents and purposes equivalent and have the same mental states. Note that this does not mean their "implementations" are equivalent.
* __Biological naturalism__ is in many ways the opposing view of functionalism, created by John Searle. It asserts that the neurons of the human brain possess certain (unspecified) features that generate consciousness. One of the main philosophical arguments for this view is __the Chinese room__: a system consisting of a human equipped with a Chinese rule book, who receives input in Chinese, translates it, and performs the required actions to produce output in Chinese. The point here is that the human inside the room _does not understand Chinese_, and as such the argument is that running a program does not necessarily _induce understanding_. A seemingly large challenge to biological naturalism is explaining how the features of the neurons came to be, as their inception seems unlikely in the face of _evolution by natural selection_.

### Weak AI

A __weak AI__ differs from a strong AI by not actually being intelligent, _just pretending to be_. In other words, we ignore all of the philosophical ramblings above, and simply request an agent that performs a given task as if it were intelligent.
Although weak AI seems like less of a stretch than strong AI, certain objections have been put forth to the question of whether one can really make an agent that _acts fully intelligently_:
* __An agent can never do X, Y, Z__. This argument postulates that there will always be some task that an agent can never simulate accurately. Typical examples are _being kind_, _showing compassion_, _enjoying food_, or _making a simple mistake_. Most of these points are plainly not true, and this argument can be considered _debunked_.
* __An agent is a formal system, limited by e.g. Gödel's incompleteness theorems__. This argument states that since machines are formal systems, limited by certain theorems, they are forever inferior to human minds. We're not going to go deep into this, but the main counterarguments are:
    * _Machines are not infinite Turing machines, so the theorems do not apply._
    * _It is not particularly relevant, as their limitations don't have any practical impact._
    * _Humans are, given functionalism, also vulnerable to these arguments._
* __The, sometimes subconscious, behaviour of humans is far too complex to be captured by a set of logical rules__. This seems like a strong argument, as a lot of human processing occurs subconsciously (ask any chess master, *cough Asbjørn*, for instance). However, it seems infeasible that this type of processing cannot itself be encompassed by logical rules. Saying something along the lines of _"When Asbjørn processes a chess position he doesn't consciously evaluate the position, he merely sees the correct move"_ seems to simply be avoiding the question: the processing might not occur in consciousness, but it is still processing that can, in principle, be described by rules.
* Finally, a problem with weak AI seems to be that there is currently no solid way of incorporating _current knowledge, or common background knowledge_ into decision making. In the field of neural networks, this point is something deserving future attention and research according to Russell and Norvig.

# Bayesian Networks

A Bayesian network is a model of a set of random variables and their conditional dependencies through a directed acyclic graph. In the graph, the nodes represent the random variables and the edges conditional dependencies. Much of the calculation in a Bayesian network is based on Bayes' theorem:

$ P(A|B)=\frac{P(B|A)P(A)}{P(B)} $

where $P(B)$ is the total probability: $P(B)=\sum_{i=1}^{n} P(B|A_i)P(A_i)$.

## Bayesian networks in use

* Good: Reasoning with uncertain knowledge; "easy" to set up with domain experts (medicine, diagnostics).
* Worse: Many dependencies to set up for most meaningful networks.
* Ugly: Non-discrete models; bad at, for example, image analysis; networks can become enormous.

## Influence Diagram/Decision Network

An influence diagram, or a decision network, is a graphical representation of a decision situation.
It is a directed acyclic graph (DAG), just like the Bayesian network (the Bayesian network and the influence diagram are closely related), and it consists of three types of nodes and three types of arcs between the nodes.

### Influence Diagram Nodes

- Decision nodes (drawn as rectangles) correspond to each decision to be made.
- Uncertainty nodes (drawn as ovals) correspond to each uncertainty to be modeled.
- Value nodes (drawn as octagons or diamonds) correspond to a utility function, or the resulting scenario.

### Influence Diagram Arcs

- Functional arcs end in the value nodes.
- Conditional arcs end in the uncertainty nodes.
- Informational arcs end in the decision nodes.

### Evaluating decision networks

The algorithm for evaluating decision networks is a straightforward extension of the Bayesian network algorithm. Actions are selected by evaluating the decision network for each possible setting of the decision node. Once the decision node is set, it behaves like a chance node that has been set as an evidence variable. The algorithm is as follows:

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
    1. Set the decision node to that value.
    2. Calculate the posterior probabilities for the parent nodes of the utility node, using a standard probabilistic inference algorithm.
    3. Calculate the resulting utility for the action.
3. Return the action with the highest utility.

# Markov Chains

A Markov chain is a random (and generally discrete) process where the next state only depends on the current state, and not on previous states. There also exist higher-order Markov chains, where the next state depends on the previous $n$ states. Note that any $n$'th-order Markov process can be expressed as a first-order Markov process. In a Markov chain, you typically have observable variables which tell us something about what we want to know but cannot observe directly.
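As a rough sketch of the Markov property, a first-order chain can be simulated from a transition matrix; the state names and probabilities below are made up for illustration:

```python
import random

# Hypothetical weather states with a made-up transition matrix:
# the next state depends only on the current one (first-order Markov).
TRANSITION = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.2, "sun": 0.8},
}

def next_state(current):
    """Sample the next state given only the current state."""
    r = random.random()
    cumulative = 0.0
    for state, p in TRANSITION[current].items():
        cumulative += p
        if r < cumulative:
            return state
    return state  # guard against floating-point rounding

def simulate(start, steps, seed=0):
    """Run the chain for `steps` transitions and return the state sequence."""
    random.seed(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1]))
    return chain
```

Note that the simulator never looks further back than `chain[-1]`; that is precisely the Markov assumption.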
## Operations on Markov Chains

### Filtering
Calculating the unobservable variables based on the evidence (the observable variables).

### Prediction
Trying to predict how the variables will behave in the future based on the current evidence.

### Smoothing
Estimating past states, including evidence from later states.

### Most likely explanation
Finding the most likely sequence of states given the evidence.

## Bernoulli scheme

A special case of Markov chains where the next state is independent of the current state.

# Learning

## Decision trees

Decision trees are a way of making decisions when the different cases are describable by attribute-value pairs. The target function should be discretely valued. Decision trees may also perform well with noisy training data.

A decision tree is simply a tree where each internal node represents an attribute, and its edges represent the different values that attribute may hold. In order to reach a decision, you follow the tree downwards along the edges matching the appropriate values until you reach a leaf node, at which point you have reached the decision.

### Building decision trees

The general procedure for building decision trees is a simple recursive algorithm, where you select an attribute to split the training data on, and then continue until all the data is classified or there are no more attributes to split on. How big the decision tree becomes depends on how you select which attribute to split on. For an optimal solution, you want to split on the attribute which holds the most information, i.e. the attribute which reduces the entropy the most.

#### Overfitting

With large trees, you may end up incorporating noise from the training data, which increases the error rate of the decision tree on unseen data. A way of avoiding this is pruning: calculate how good the tree is at classifying data with each node (including the nodes below it) removed, and then greedily remove the one that increases the accuracy the most.
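The entropy-based attribute selection described above can be sketched as follows; the toy rows and attribute names are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# Toy training data: split on the attribute with the highest gain.
rows = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "rainy", "wind": "weak"},
    {"outlook": "rainy", "wind": "strong"},
]
labels = ["play", "play", "stay", "stay"]
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
```

Here `outlook` perfectly separates the labels while `wind` tells us nothing, so the greedy procedure would split on `outlook` first.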
# Case-based reasoning

Case-based reasoning (CBR) assumes that similar problems have similar solutions, and that what works today will still work tomorrow. It uses this as a framework to learn and deduce what to do in both known and unknown circumstances. The classic CBR cycle consists of four main steps: retrieve, reuse, revise, and retain.

## Main components

* Case base
* Previous cases
* A method for retrieving relevant cases with solutions
* A method for adapting to the current case
* A method for learning from the solved problem

## Approaches

### Instance-based

An approach based on classical machine learning for *classification* tasks. Uses attribute-value pairs and a similarity metric. Focuses on automated learning without intervention, and requires a large knowledge base.

### Memory-based reasoning

As instance-based, only massively parallel: finds the distance to every case in the database.

### (Classic) Case-based reasoning

Motivated by psychological research. Uses background knowledge to specialise systems. Is able to adapt similar cases on a larger scale than instance-based.

### Analogy-based

Similar to case-based, but with emphasis on solving cross-domain cases, reusing knowledge from other domains.
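A minimal sketch of instance-based retrieval over attribute-value pairs, using a naive overlap similarity metric (the case base and attribute names are invented for illustration):

```python
def similarity(case_a, case_b):
    """Fraction of attributes on which two cases agree (a naive metric)."""
    keys = set(case_a) | set(case_b)
    matches = sum(1 for k in keys if case_a.get(k) == case_b.get(k))
    return matches / len(keys)

def retrieve(case_base, query):
    """Return the stored (case, solution) pair most similar to the query."""
    return max(case_base, key=lambda cs: similarity(cs[0], query))

# Hypothetical case base of (problem description, solution) pairs.
case_base = [
    ({"outlook": "sunny", "wind": "weak"},   "play outside"),
    ({"outlook": "rainy", "wind": "strong"}, "stay inside"),
]
query = {"outlook": "sunny", "wind": "strong"}
best_case, solution = retrieve(case_base, query)
```

A real CBR system would follow retrieval with reuse, revision, and retention of the adapted solution; this sketch covers only the retrieve step.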