Table of Contents

Model Human Processor
GOMS
Fuzzy reasoning
Artificial neural networks
Soar
Icarus
ACT-R
1. Architecture
BDI
1. Architecture
2. Functions and processes
Brooks' Subsumption Architecture
Brain / Neuronal Computer Interaction (BNCI)
Natural Language Processing (NLP)
Automatic Speech Recognition (ASR)
Multimodal interaction
1. Fusion

‹

TDT4137: Cognitive Architectures

Tags:

Model Human Processor

The Model Human Processor (MHP) is an abstraction of how a human percieves and interacts with the world, and gives a limited description of psychological knowledge about human performance.

Fig. 1: The Model Human Processor

MHP can be divided into three interacting subsystems:

Perceptual system
Cognitive system
Motor system

MHP also contains some important parameters:

$\delta$ - Half time for memory
$\mu$ - Memory capacity
$\kappa$ - Main code type (physical, acoustic, visual, semantic)
$\tau$ - Cycle time of a processor

The Perceptual System

The perceptual system is the system handling all of the perception. It does not do any cognitive processing of the percept (i.e. thinks and reflect about the percepts). It gets input from the sensors, encodes (processes) the inputs to a storable format and makes it available in a short period of time to the working memory (WM).

The most important components of the perceptual system are:

Perceptual Processor
Visual Image Store
Auditory Image Store

Perceptual Processor

The perceptual processor receives input from sensors (e.g. eyes and ears) and passes the received input to one of the perceptual stores (e.g. the visual image store). The time the processor uses to process input is described by the cycle time of the processor:

$\tau_{p} = 100\ [50\sim200]\ ms$

This means that the perceptual processor uses an average of 100 ms to process an input (a minimum of 50 ms and a maximum of 200 ms). This is an empirical estimation of the acual time a human uses (which of course varies from human to human). The consequences of this is that a perception (input) that lasts shorter than the cycle time will not be perceived.

Visual Image Store

The visual image store (VIS) is a buffer where the perceptual processor can store a physical code (hence the code type of VIS is physical) of a visual perception. All items in the buffer are transferred to the working memory (WM) after a negligible delay (as you can see from Fig. 1, VIS is actually inside the WM).

The empirical estimated parameter values of VIS are:

$\delta_{VIS} = 200\ [70\sim1000]\ ms$

$\mu_{VIS} = 17\ [7\sim17]\ letters$

$\kappa_{VIS} = Physical$

As we can se from $\kappa_{VIS}$, the code is physical, meaning that if your eyes perceives a cup, VIS will store a physical code representing the cup. The size of this physical code is limited in space by $\mu_{VIS}$ and in time by $\delta_{VIS}$. In a way $\delta_{VIS}$ represents how long the physical code of a perception is accessable in the buffer. If you perceive something while your working memory is full for more than $\delta_{VIS}$ time, you will perceive it, but never really think about what you've perceived, thus not remembering it.

Everyone who has watched TV and talked on the phone at the same time knows about this issue. Your eyes are seeing everything that is happening on Game Of Thrones, but when your mom asks you what you want for christmas and you're thinking really hard for a couple of seconds, everyone is suddenly dead, and you have no clue of what happened. Damn you mom!

Auditory Image Store

The Auditory Image Store (AIS) is a buffer where the perceptual processor can store a physical code (hence the code type of AIS is physical) of a auditory perception. All items in the buffer are transferred to the working memory after a negligible delay (as you can see from Fig. 1, AIS is actually inside the WM).

The empirical estimated parameter values of AIS are:

$\delta_{AIS} = 1500\ [900\sim3500]\ ms$

$\mu_{AIS} = 5\ [4.4\sim6.2]\ letters$

$\kappa_{AIS} = Physical$

As we can se from $\kappa_{AIS}$, the code is physical, meaning that if your ears perceive "christmas", AIS will store a physical code representing christmas. The size of this physical code is limited in space by $\mu_{AIS}$ and in time by $\delta_{AIS}$. In a way $\delta_{AIS}$ represents how long the physical code of a perception is accessable in the buffer. If you perceive something while your working memory is full for more than $\delta_{AIS}$ time, you will perceive it, but never really think about what you've percieved, thus not remembering it. As you might notice, AIS has a much larger half time ($\delta_{AIS}$) than VIS, which means that you actually might be able to process (cognitively) something you heard a couple of seconds ago.

Everyone who has watched TV and talked on the phone at the same time knows about this feature. Your mom is asking a question, but at the same time something thrilling is happening on Game Of Thrones, and you have to pay attention to that (after all, you have your priorities straight). A couple of seconds later, you're about to say "What?", but in the middle of the word you suddenly know that she asked about what you wanted for christmas. Way to go auditory image store!

The Cognitive System

The cognitive system receives coded data from the perceptual system's buffer into the working memory (WM). It can do cognitive processing (classifying, comparing etc.) of the data stored in the WM, and fetch data from the long term memory (LTM) into WM, using the cognitive processor.

The main components of the cognitive system are:

Cognitive Processor
Working Memory

Cognitive Processor

The cognitive processor is the processing unit of the cognitive part of the MHP. It can do various cognitive operations with data in the working memory.

Generally the the cycle time of the cognitive processor is:

$\tau_{C} = 70\ [25\sim170]\ ms$

This time will vary depending on the cognitive operation and the representation of the data involved. The following three basic operations takes one cycle for cognitive processor.

Match Content In WM

Data can be represented in many different ways (code types): physical code, visual, semantic (name or class). Comparing two data representations is an important cognitive process, and is for example the cognitive operation that lets you recognize a specific water bottle (object) as a water bottle (class). The result of a match is put into WM (f.ex. Yes/No).

Time to match content in the cognitve processor Fig. 2: Empirical time estimate of matching

Fetch Data From LTM

Often you don't have the information you need in the WM (which mainly consist of perceptions) and need to fetch them from LTM. The result is put into WM.

Generate Motor Command Into WM

Sometimes a specific data in the WM should result in your hand moving, head twisting or another motor command. Such commands are put into WM, and later read and processed by the motor processor.

Working Memory

The working memory (WM) holds the percepts from the perceptual system, fetched data from the LTM and results of cognitive operations. All mental operations happen in WM.

The empirical estimated parameter values of WM are:

$\delta_{WM} = 7\ [5\sim226]\ s$

$\mu_{WM} = 3\ [2.5\sim4.1]\ chunks$

$\mu_{WM^{*}} = 7\ [5\sim9]\ chunks$

$\kappa_{WM} = Acoustic\text{ or }Visual$

As we might expect, the half time ($\delta_{WM}$) is far greater than was the case in the buffers from the perceptual system.

Unlike VIS and AIS, WM's memory capacity has chunks as units. We see that the code type is acoustic or visual, so these "chunks" contain either acustic og visual data, but what the heck is a "chunk"?

A chunk is one unit of recognizable/sensible data. For example "DNRIAIKSS" would maybe need 9 chunks (one for each letter), since the word as a whole doesn't make any sense. The word "IDISASNRK" would maybe need three chunks because we can "chunk" the word into three sensible word/units (IDI, SAS, NRK), and therefore take up far less space in the working memory and be easier to remember.

You might still be confused, since there are two memory capacities ($\mu_{WM}$ and $\mu_{WM^{*}}$), but do not despair, the anwer is simple! The one with the star ($\mu_{WM^{*}}$) is representing data fetched from LTM. The one without the star ($\mu_{WM}$) is everything else (perceptions, motor commands and result from cognitive operations).

Actually, I lied when I told you the half time of the WM. You probably do not need to remember the exact numbers, but the half time actually depends on the number of chunks you have in WM. If you just think really hard about one single thing, you can remember it a little longer than if you are trying to remember three things at once.

$\delta_{WM}(1\ chunk) = 73\ [73\sim226]\ s$

$\delta_{WM}(3\ chunks) = 7\ [5\sim34]\ s$

The Motor System

The motor system consists only of the motor processor, which receives motor commands from WM (put there by the cognitive processor) and executes them.

Generally the cycle time of the motor processor is:

$\tau_{M} = 70\ [30\sim100]\ ms$

When the motor processor executes a motor command, it only tells the motor(s) to do the desired action. The movement itself might take longer to execute, depending on the type of movement. In motor commands like "push button", if the finger is already at the button, the time it takes for the finger to actually push the button is usually neglected. In more time consuming motor commands, like "move the finger to the button" (if the finger was not at the button) Fitts' Law or similar formulas are used to estimate the time.

Principles Of Operation

There are 10 principles of operation, P0 to P9.

P0: Recognize-Act Cycle of the Cognitive Processor

On each cycle of the Cognitive Processor, the contents of the WM initiate actions associatively linked to them in LTM; these actions in turn modify the contents of WM.

P1: Variable Perceptual Processor Rate Principle

The Perceptual Processor cycle time $\tau_{P}$ varies inversely with stimulus intensity.

P2: Encoding Specificity Principle

Specific encoding operations performed on what is perceived determine what is stored, and what is stored determines what retrieval cues are effective in providing access to what is stored.

P3: Discrimination Principle

The difficulty of memory retrieval is determined by the candidates that exist in the memory, relative to the retrieval clues.

P4: Variable Cognitive Processor Rate Principle

The Cognitive Processor cycle time $\tau_{C}$ is shorter when greater effort is induced by increased task demands or information loads; it also diminishes with practice.

P5: Fitts' Law

The time $T_{pos}$ to move the hand to a target of size $S$ which lies a distance $D$ away is given by:

$$T_{pos} = I_{m}*\log_{2}\left(\frac{D}{S} + 0.5\right)$$

Where $I_{m} \sim 100\ [70 - 120]\ ms/bit$

$\log_{2}\left(\frac{D}{S} + 0.5\right)$ is known as the Index of Difficulty (ID).

This is actually not the original, but a corrected version (Welford correction) due to its inaccuracy with small values of $\frac{D}{S}$.

A common version of Fitts' Law is Shannon's version:

$$T = a + b * \log_{2}\left(\frac{D}{S} + 1\right)$$

Where $a$ and $b$ are tweakable parameters and $\log_{2}\left(\frac{D}{S} + 1\right)$ is the ID

But what is this ID?

Fitt's Law Model Fig. 3

Fig. 3 shows some data points from experiments and a regression line that closely matches the points. Suitable values for a and b (53 and 148 respectively) were found based on the regression line.

P6: Power Law of Practice

The time $T_{n}$ to perform a task on the n'th trial follows a power law:

$$T_{n} = T_{1}*n^{-\alpha}$$

Where $\alpha = 0.4[0.2 - 0.6]$

P7: Uncertainty Principle

Decision time $T$ increases with uncertainty about the judgement or decision to be made:

$$T = I_{c}*H$$

Where $H$ is the information theoretic entropy of the decision $I_{c} = 150 [0 \sim 157]\ ms/bit$. For $n$ equally probable alternatives (Hick's Law):

$$H = log_{2}(n + 1)$$

Fig. 4: Hick's Law of choice reaction time

The figure above (Fig. 4) shows the reaction time when:
At the onset of one of $n$ lights, arranged in a row, the subject is to press the key located below the light.

P8: Rationality Principle

A person acts so as to attain his goals through rational action, given the structure of the task and his inputs of information and bounded by limitations of his knowledge and processing ability:

$$\text{Goals} + \text{Task} + \text{Operators} + \text{Inputs} + \text{Knowledge} + \text{Processing Limits}$$ $$\Downarrow$$ $$\text{Behavior}$$

P9: Problem Space Principle

The rational activity in which people engage to solve a problem can be described in terms of:

A set of states of knowledge
Operators for changing one state into another
Constraints on applying operators
Control knowledge for deciding which operator to apply next

GOMS

GOMS is a kind of specialized human information processor model for human-computer interaction observation. GOMS stands for:

Goals
Operators
Methods
Selection rules

GOMS reduces a user's interaction with a computer to the basic interactions (physical, cognitive and perceptual)

The Keystroke-Level Model (KLM)

To estimate execution time for a task, the analyst lists the sequence of operators and then totals the execution times for the individual operators. There are several operators:

K - Press a key or button on the keyboard
B - Press mouse button
H - Home hands to the keyboard or mouse
P - Point with a mouse to a target on a display
D - Draw a line segment on a grid
M - Mentally prepare to do an action or a closely related series of primitive actions
R - Represent the system response time during which the user has to wait for the system
CP - Cognitive/Perceptual operator (for example locate an object on the screen)

The KLM is based on a simple underlying cognitive architecture: a serial stage model of human information processing in which one activity is done at a time until the task is complete.

Example:

A Keystroke-Level Model for moving text

NGOMSL (Natural GOMS Language)

Goals
- Action object pair.
- Can be device dependent or device independent
Operators
- What basic actions can be performed?
- Keystroke level or High level
Methods
- Sequences of operators
Selection rules
- Which method should be used to accomplish goal

Syntax example

Selection rule set for the goal: call
If task is call by using favorites then accomplish: callByFavorites
If task is call by typing number accomplish: callByNumber
(...)
Return with goal accomplished

Method to accomplish goal of callByFavorites
Step 1. Accomplish goal of showFavorites
Step 2. Accomplish goal of phoneByName
Step 3. Return with goal accomplished

Method to accomplish goal of showFavorites
Step 1. Locate Favorites symbol (CP)
Step 2. (...)
(...)

CPM-GOMS

The CPM in CPM-GOMS stands for two things: Cognitive, Perceptual and Motor, and Critical Path Method. CPM-GOMS, like the other GOMS models, predicts execution time based on an analysis of component activities. However, CPM-GOMS requires a specific level of analysis where all its operators are simple perception, cognition and motor acts. CPM-GOMS does not make the assumption that operators are followed serially in contrast to the other GOMS models stated in this article. In CPM-GOMS, the operators can be performed in parallel. CPM-GOMS uses a schedule chart to represent operators and dependencies, and predicted execution time of tasks are calculated by finding critical path in this schedule chart.

Fuzzy reasoning

"Everything is a matter of degree" -Zadeh 1994

Membership functions

In fuzzy reasoning, an object can "partially belong" to a set. This is inspired by real life. For example, Britney Spears once sang the song "I'm not a girl, not yet a woman". She probably felt that she was somewhere between clearly being a girl and clearly being a woman. She partially belonged to both classes.

Britney Spears fuzzy reasoning example

Fuzzy rules

Examples:

IF height is tall THEN action is crouch
IF height is short OR height is medium THEN action is do_nothing

Mamdani Fuzzy Inference

Fuzzification of the (crisp) input variables
- Determine the degree to which the crisp inputs belong to each of the fuzzy sets
Rule evaluation (inference)
- Evaluate all fuzzy rules. "OR" means taking the maximum value, "AND" means taking the minimum value
- Clip the degree of membership to the resulting value
Aggregation of the rule outputs (composition)
- Combine the memberships to a single fuzzy set
Defuzzification
- Find a point representing the centre of gravity of the aggregated fuzzy set. This is done by performing numerical integration.

Sugeno Fuzzy Inference

In Sugeno Fuzzy Inference all consequent membership functions are represented by singleton spikes. This way, there is no need to integrate a two-dimensional shape to find the centroid. That makes the Sugeno method more computationally efficient than the Mamdani method. However, Mamdani method is more intuitive for humans.

Artificial neural networks

The brain

Neural networks are biologically inspired. The human brain has around ten billion neurons, and it has a high degree of parallel computation. A neuron is pretty complex. It has branching input (dendrite), a cell body and a branching output (axon). Electro-chemical signals are propagated from the input, through the cell body, and down the axon to other neurons. A neuron only fires if its input signal exceeds a threshold.

Nevron

Perceptrons

Computer scientists don't care much about the biological details of neurons, but instead use an abstraction: each neuron is a node with an activation function (mathematically defined), a number of inputs (that are summed) and a single output value. An edge is a connection from the output of a neuron to the input of another neuron. An edge has a weight, which is a multiplier that decides how much signal should be passed through. The whole mechanism that is a node with many inputs, an activation function, and an output is called a perceptron.

Perceptron

Activation functions

The mapping from the sum of the inputs to the output. Examples:

Linear, i.e. $f(x) = x$
Rectified linear unit
Hyperbolic tangent (tanh)
Sigmoid function, i.e. $f(x) = \frac{1}{1+e^{-x}}$

Supervised learning

You feed data into the ANN and tell what the expected (correct) output is. The ANN will learn from its errors (i.e. the difference between its actual output and the expected output). The concept of updating weights to gradually minimize errors is called backpropagation ("backward propagation of errors").

Linear separability

As illustrated above, at least two layers are required for the Exclusive OR-problem. A single-layer perceptron can be trained to perform the logical operations AND and OR, because they are linearly separable, but not XOR (exclusive or). XOR is not linearly separable, i.e. there is no single straight line that can separate all the black dots from all the white dots:

A perceptron cannot learn XOR

Perceptron learning algorithm

Initialization: Set initial weights $w_1, w_2, ..., w_n$ and threshold $\theta$ to random numbers in the range $[-0.5, 0.5]$
Activation: Activate the perceptron by applying inputs $x_1(p), x_2(p), ..., x_n(p)$ and desired output $Y_d(p)$. Calculate the actual output at iteration $p = 1$ $$Y(p) = step \left[\left(\displaystyle\sum_{i=1}^{n} x_i(p)w_i(p)\right) - \theta \right]$$ where $n$ is the number of the perceptron inputs, and step is a step activation function.
Weight training: Update the weights of the perceptron $$w_i(p+1)=w_i(p) + \Delta w_i(p)$$ where $\Delta w_i(p)$ is the weight correction at iteration p. The weight correction is computed by the delta rule: $$\Delta w_i(p) = \alpha \times x_i(p) \times e(p)$$ where $\alpha$ is the learning rate, and $e(p)$ is given by the difference between desired and actual output: $e(p) = Y_d(p) - Y(p)$
Iteration: Increase iteration $p$ by one, go back to step 2 and repeat the process until convergence.

Gradient descent

This is about gradually minimizing errors. You "walk down" the N-dimensional surface. The learning rate determines how big steps you take down the surface. Too big steps can be bad, because it might cause the algorithm to miss the global minimum, and in extreme cases the algorithm can diverge instead of converge. Also, another problem in gradient descent is local minima where the algorithm can get stuck. One popular solution to this problem is adding momentum, where you carry over some of the "velocity" of the change in the last iteration. When the gradient keeps changing direction, momentum will smooth out the variations.

Local minima vs global minimum in gradient descent

Backpropagation in multi-layer networks

TODO

Soar

Originally stood for State Operators And Result, but is no longer treated as an acronym.

BEHAVIOR = ARCHITECTURE + CONTENT

Soar decomposition:

Goals
Problem spaces
States
Operators

The Soar theory of cognitive behavior

Common characteristics of cognitive behavior:

Goal-oriented (you do actions to get closer to a goal)
Takes place in a rich, complex and detailed environment
Requires a large amount of knowledge
Requires use of symbols and abstractions
It is flexible, and a function of the environment
It requires learning from the environment and experience

Memory structure

Memory structure in SOAR

Impasse

Impasse is an event that arises when Soar cannot resolve operator preferences (i.e. it fails to choose a specific action). A substate that represents the information relevant to resolving the impasse is created. Soar may use different strategies to solve the impasse. Episodic memory can be used to help solve the problem. When a solution is found, Soar uses a learning technique called chunking to transform the actions taken into a new rule. This rule can be applied when Soar encounters the situation again.

Building Soar Programs

All of the knowledge in a Soar agent is basically represented as If ... Then ... rules (a.k.a. production rules).

The Soar Processing Cycle

Soar Processing Cycle

TODO: Explain what happens in each part of the process

Syntax

TODO: Explain the syntax of Soar code

Icarus

Icarus - A unified cognitive theory

The ICARUS Architecture

The ICARUS Architecture

ICARUS is largely based on two things: concepts (categories) and skills.
It has two different long-term memories: one for skills and one for concepts. Concepts are things it knows (the facts and ideas of the world) and skills are things it can do. It is important to notice that these two are not connected.

The ICARUS Modules

ICARUS can be viewed as four modules where, as stated above, concepts and skills are the two most basic. On top of them we first have problem solving which is explained in the Impasse section, and then learning.

Percepts and concepts

When ICARUS perceives something from the environment it gets stored in the perceptual buffer. The perception is then categorized and matched (through pattern matching) with known concepts in the long term conceptual memory (LTCM). Concepts in the LTCM are hierarchically organized, where complex concepts are defined in terms of simpler (less complex) concepts. Imagine a tree-structured graph where the leaf nodes are categorized perceptions (like "car"), and the internal nodes are more complex concepts. This gives a bottom-up structure (see the image below). Conceptual inference occurs from the bottom up. In other words, you start with feeding data about observed percepts into the bottom nodes, and then you go upwards until you meet the top nodes. This process produces high-level beliefs about the current state.

Concepts in ICARUS

After the percept is matched with a concept in the LTCM, the result is put into the short term conceptual memory (STCM or belief memory). A result of this is that everything in the STCM are instances of LTCM-elements.

Skills and goals

The only connection between skills and concepts are between the STCM and the skill retriever. The skill retriever gives the short term skill memory (STSM) the applicable skills, based on the current situation, made available to it through the STCM, and the known skills in the long term skill memory (LTSM).

All skills in the LTSM are a description of how ICARUS might interact with the environment. They describe changes in the conceptual structure (the current situation) as a result of their execution. They are, as the concepts in LTCM, stored organized as a hierarchy, but unlike LTCM, elements (skills) are indexed by the goals they achieve, giving a top-down stucture (see the image below). Skill execution occurs from the top down, starting from goals, to find applicable paths through the skill hierarchy. A skill clause is applicable if its goal is unsatisfied and if its conditions hold, given bindings from above.

Skills in ICARUS

The learning process is interleaved with problem solving, which provides it with experience. Problem solving operates top down, but Icarus acquires skills bottom up.

Execution cycle

On each successive execution cycle, the Icarus architecture:

Places descriptions of sensed objects in the perceptual buffer
Infers instances of concepts implied by the current situation
Finds paths through the skill hierarchy from top-level goals
Selects one or more applicable skill paths for execution
Invokes the actions associated with each selected path

Impasse

Impasse occurs when Icarus cannot find a path through the skill hierarchy. A problem solver is invoked. It uses means-ends analysis (MEA). Means-ends analysis is a problem solving technique where you search for a sequence of actions that lead to a desirable goal. Icarus chains backwards off a single skill (or concept) to create subgoals. One of those subgoals is selected and Icarus tries to achieve that instead. As a result of the problem solving process, Icarus may learn new skills. These skills can be used in the future.

To be more specific, here's what happens during problem solving in Icarus:

Find a skill that matches a goal that has not been achieved, and try to achieve that goal
If no skill matches, try to find a concept that is not fulfilled and create a subgoal about fulfilling that concept

More in-depth article: A Unified Cognitive Architecture for Physical Agents

Learning

When ICARUS carries out means-ends analysis in Impasse, it retains traces of successful decompositions. Each trace includes details about the goal, skill, and the initially satisfied and unsatisfied subgoals. As the system solves each subgoal, it generalizes the associated trace and stores it as a new skill. The skill hierarchy is expanded with new learned skills for later use when ICARUS learns to accomplish new goals or to accomplish goals in another way.

ACT-R

ACT-R is an integrated cognitive architecture. ACT-R stands for Adaptive Control of Thought - Rational. It is a framework. It is a theory about how human cognition works. It brings together perception, cognition and action. It runs in real time. It learns. It has robust behavior in the face of the unexpected and the unknown.

Architecture

ACT-R is made up of modules, buffers and a subsymbolic level

ACT-R Architecture

This architecture is inspired by the functional organization of the human brain. The brain has several regions that perform different tasks, and so does ACT-R.

Declarative module

Remembers knowledge/facts in the form of chunks. Neural computation can activate chunks so that they become available.

Procedural module

The procedural module is an important thing in ACT-R, because it is the module that has the knowledge about how to do something. It is made up of condition-action rules, a.k.a. production rules. The procedural module does not have its own buffer, but is part of the working brain of ACT-R and accesses the other modules trough their buffers.

Buffers

For each module (visual, declarative, etc.) there is a buffer that serves as the interface with that module. The contents of the buffers at any given time represent the state of ACT-R at that time.

Subsymbolic level

A set of processes run in parallel. The production system is symbolic. If several productions match the state of the buffers, a subsymbolic utility equation estimates the relative cost and benefit associated with each production and selects the production with the highest utility.

BDI

Originally developed as system that can reason and plan in dynamic environments, origin in computer science. BDI meets real-time constraints by reducing time used in planning and reasoning. Designed to be situated, goal directed, reactive and social. BDI rational agent able to react to changes and communicate in their embedded environment, while attempting to achieve goals.

Architecture

Main components: beliefs, desires and intention, interpreter. Adopts database for storing beliefs and plans. Beliefs = facts about world, inference rules that may lead to acquisition of new beliefs. Beliefs updated by perception of environment and execution of intentions.

Plans = sequence of actions that agent can perform to achieve intentions. Type of declarative knowledge containing ideas on how to accomplish set of given goals, or how to react to certain situations. Plan consists of body and invocation condition. Body of plan = possible courses of actions and procedures to achieve goal. Invocation condition = specifies prerequisities to be met for plan to be executed, or continue executing. Plan can also be subgoals to achieve.

Desires (goals) objectives aimed to accomplish. Conflicts among desires resolved using rewards and penalties. A goal is achieved successfully when behaviour satisfying goal description is executed.

Intentions = actions that agent committed to perform in order to achieve desires. Contains all tasks chosen by system for execution.

Functions and processes

System intepreter manipulates components of BDI architecture: in each cycle, update event queue by the perceptual input and internal actions to reflect events observed, then select an event. New possible desires generated by finding relevant plans in plan library for selected event. From set of relevant plans, an executable plan selected and instance plan is thus created. Instance plan pushed onto existing intention stack (when plan is triggered by internal event s.a. another plan) or new intention stack (when plan tribbered by external event). BDI interact with environment through database when new beliefs acquired or through actions performed during execution of intention. Interpreter cycle repeats after intention executed, but acquisition of new belief or goal may lead to alteration of plans → agent works on another intention.

BDI integrates means-ends reasoning with use of decision knowledge in reasoning mechanism in context of existing intentions. Once agent is committed to process of achieving goals, it does not consider other pathways although they might be better → cuts down decision time.

Active goal in BDI dropped once system recognizes goal as accomplished or cannot be accomplished (happens only when trying all applicable plans have failed).

Original BDI architecture do not have mechanism of learning.

Brooks' Subsumption Architecture

Based on a layered architecture where each layer is more intelligent than the one below. Modules in higher levels can override or subsume output from the lower layers. Behaviour layers operate concurrently and independently and hence need mechanism to handle potential conflicts. The winner is always the highest layer. This design is robust because if a higher level fails, the lower levels will still work.

The subsumtion layers

Subsumption architecture has no internal model of the real world because it has no free communication and no shared memory. Instead, the real world is used as model.

The higher levels in the Subsumption architecture can suppress or inhibit the lower layers in a one-way communication channel.

Brain / Neuronal Computer Interaction (BNCI)

EMG - muscle activity

Recording electrical activity created by muscles. Can be combined with eye tracking. For example, eye tracking can move a mouse cursor while moving a facial muscle can trigger a mouse click.

EOG - eye activity

Finding the angle of an eye by measuring electrical potential around the eye. When the eye moves, the potential changes. Electrodes are typically placed either above and below the eye or to the left and right of the eye.

EEG - brain activity

Recording electrical activity of the brain with the help of electrodes that are placed on the head. We generally focus on the spectral content of the EEG. In other words, we inspect the frequencies of the brain waves. We can obtain the amplitude of low frequencies, mid frequencies and high frequencies by applying Fourier Transform to the raw signal coming from a sensor.

The following figure shows the signal flow in an application where brain waves are used to control a wheelchair: Using EEG to control a wheelchair

A spectrogram is a useful tool for visualizing frequency changes over time, as brain activity changes. FFT is applied for each time frame:

Frequency bands

Delta: 0 - 4 hz. Deep sleep.
Theta: 4 - 8 hz. Idle.
Alpha: 8 - 13 hz. Relaxing, creativity, reflection.
Beta: 13 - 30 hz. Activity, concentration, general work.
Gamma: 30 - 100 hz. Treating multiple sensory input.

Natural Language Processing (NLP)

Why do we want to process natural language? So AI agents can communicate with people and understand text written by humans.

Ambiguity

NLP isn't easy: natural language is highly ambiguous. For example the following sentences have multiple meanings:

I made her duck

duck can be a verb and a noun. made can mean cook, but it can also mean create.

I saw a man in the park with a telescope

This can be either interpreted as a man in the park that has a telescope or it can mean that someone saw a man through a telescope.

Syntactic ambiguity occurs when a string of words can be assigned to more than one syntatic structure.

Redundancy

Another challenge in NLP is redundancy, i.e. the same information is expressed more than once.

Syntactic structure (parse tree)

Parse tree

General NLP system architecture

Phonetical analysis: Classification of speech sounds
Morphological analysis: Individual words are analyzed into their components and non word tokens such as punctuation are separated from the words
Syntactical analysis: Linear sequences of words are transformed into structures that show how the words relate to each other
Semantic analysis: The structures created by the syntactic analyzer are assigned meanings

Automatic Speech Recognition (ASR)

Signal analysis

For every 10 ms, do the following procedure with a frame size of 25 ms:

Do Fast Fourier Transform - FFT (to convert to the frequency domain)
Apply Mel scaling. This basically means taking the log of the frequencies (to approximate human perception of frequencies)
Do Discrete Cosine Transform (to get a single real value for each frequency bin)
Create a feature vector, which consists of:
- 12 MFCC features (how much of each frequency bin right now)
- "total energy" (how loud is the sound right now)
- derivative of 12 delta MFCC features (i.e. how fast is the volume in each frequency bin changing)
- derivative of energy (how fast is the loudness changing)
- second derivative of 12 delta MFCC features ("acceleration")
- second derivative of energy ("loudness acceleration")

In conclusion, we just took 25 ms of audio and transformed it into a more comprehensible feature vector with 39 entries.

Acoustic Scoring

The goal of acoustic scoring is to find out what kind of sound (vowel/phone) is spoken, given a feature vector. This is typically done by feeding the vector into an Artifical Neural Network (ANN) with softmax activation in the output layer or a Gaussian Mixture Model (GMM). An acoustic model and a pronounciation model is needed. The output is a set of scores; one for each kind of sound (vowel/phone).

Linguistic Scoring

How do we decode a set of consecutive phones? What phones can follow each other in a word? We need a pronunciation model. And what words do typically follow each other? We need a language model (n-gram). Typically, a Hidden Markov Model combined with the Viterbi algorithm (dynamic programming) works well for finding the most probable sequence of words based on the phones given by the acoustic scoring.

Multimodal interaction

Multiple modes of interacting with a system. Multimodal Interaction Loop Legg merke til i figuren at fusion kan skje på tre forskjellige nivåer: data, features og decision.

Fusion

The goal of fusion is to extract meaning from a set of input modalities and pass it to a human-machine dialog manager.

Last updated: Mon, 6 Dec 2021 13:37:39 +0100 .