TDT4215: Web-intelligence
# Curriculum
As of spring 2016 the curriculum consists of:
- __A Semantic Web Primer__, chapter 1-5, 174 pages
- __Sentiment Analysis and Opinion Mining__, chapter 1-5, 90 pages
- __Recommender Systems__, chapter 1-3, and 7, 103 pages
- _Kreutzer & Witte:_ [Opinion Mining Using SentiWordNet](http://stp.lingfil.uu.se/~santinim/sais/Ass1_Essays/Neele_Julia_SentiWordNet_V01.pdf)
- _Turney:_ [Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews](http://www.aclweb.org/anthology/P02-1053.pdf)
- _Liu, Dolan, & Pedersen:_ [Personalized News Recommendation Based on Click Behavior](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.3087&rep=rep1&type=pdf)
# Semantic Web
## The Semantic Web Vision / Motivation## Ontology## RDF (Resource Description Framework)### Basics of RDF### RDF model### Resources### Properties### Statements### Serialization formats## RDFS (Resource Description Framework Schema)### Basics of RDFS### Vocabulary### Classes### Properties### Literals
### Problems with RDFS
## SPARQL (SPARQL Protocol and RDF Query Language)### Basics of SPARQL### Graph traversal### Queries
## OWL (Web ontology language)### Basics of OWL### Language constructs#### Classes#### Properties#### Property characteristics#### Cardinality#### Individuals#### Others### Semantics and Reasoning#### Description Logics## Ontology guidelines
# Sentiment analysis
Sentiment analysis is about searching through data to get people's opinions about something. Sentiment analysis is also called opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysys, emotion analysis, or review mining.Businesses and organizations are interested in knowing what people think of their products. Individuals are interested in what others think of a product or service. We would like to be able to do a search like "How good is iPhone" and get a general opinion on how good it is.
With "explosions" of opinion data in social media and similar applications it is easy to gather data for use in decisionmaking. It is not always necessary to create questionnaires to get peoples opinions. It can, however, be difficult to extract meaning from long blogposts and arcticles, and make a summary. This is why we need _Automated Sentiment analysis systems_.### Document lifferent Level sentiment classification
s of Analysis
Research has mainly been done on three different levels.
#### Document-level
Document-level sentiment classification works with entire documents and tries to figure out of the document has a positive or negative view of the subject in question. An example of this is to look at many reviews and see what the authors mean, if they are positive, negative or neutral.
#### Sentence-level
Sentence-level sentiment classification works with opinion on a sentence level, and is about figuring out if a sentence is positive, negative or neutral to the relevant subject. The point of document leis is closely related to _subjectivel sentiment classification is to classify documents based on oity classification_, which distinguishes between objectiverall sentiment and subjective sentences.
#### Entity and aspect-level
Entity and aspect-level sentiment classification wants to discover opinions about an entity or it's aspects. For instance we have the following sentence, "The iPhone’s call quality is good, but its battery life is short", two aspects are evaluated: call quality and battery lifetime. The entity is the iPhone. The iPhone's call quality is positive, but the battery duration is negative. It's hard to find and classify these aspects, as there are many ways to expressed by opinion holder positive and negative opinions, metaphors, comparisons and so on.
## Sentiment Lexicon and Its Issues
Some words can be identified as either positive or negative immediately, like good, bad, excellent, terrible and so on. There are also subsentences/phrases that can be identified as positive or negative. An list of such words or sentences is called a sentiment lexample of this is to looicon.
Use of just this is not enough. Below are several k at many renown issues:
1. Words and phrases can haviee different meaning in different contexts. For example, "suck" usually has a negative meaning, but can be positive if put in the right context: "This vacuum cleaner really sucks".
2. Sentences containing sentiment ws and seeords sometimes do not reflect a sentiment. For example, "If this product X contain a _great_ feature Y, I'll buy it". _Great_ does not express a positive or negative opinion on the product X.
3. Sarcastic sentences are hard to deal what the authors meanith. They are mostly found in political discussions, if they are posie.g. "What an awesome product! It stopped working in two days".
4. Sentences without sentiment words can also express opinions. "This laptop consumes a lot of power.", reflects a partially negative opinion about the laptop, negabut the sentence is also objective or neutralas it states a fact.
T## ODpinion lexicon generation
## Aspect-based opinion mining
## O: Add steps from lecture notes
## Sentence lepinion mining of comparativel sentiment classification
Since document sentences
## Opinion spam detection
## Unsupervised search-lebased approach
## Unsupervel sentiment classification is too coarse for most applications we move to sentence level.
A lot of the work done here focuses on subjective sentences in news articles.
## Opinionised lexicon generation
## Aspect-based opinion mining
## Opinion mining of comparative sentences
## Opinion spam detection
## Unsupervised search-based approach
## Unsupervised lexicon-based approachapproach
## Semantic Vector space model
# Articles / Books
## Sentiment Analysis and Opinion Mining
Retrieved from the book _Sentiment Analysis and Opinion Mining_ by Bing Liu, published by Morgan & Claypool Publishers, May 2012.
### Preface
There has been an increase in opinion based data on net, in the form of blog posts, Twitter, Facebook and so on. Therefore there has been more focus on sentiment analysis of data, as this can significantly improve these products.
### Sentiment Analysis: A Fascinating problem
Sentiment analysis, or opinion mining, is the study of peoples opinions and attitudes in relation to products, organizations, people and so on.
Sentiment analysis has a lot of names: opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysys, emotion analysis, review mining.
### Sentiment Analysis Applications
Our opinions affect which decisions we make. This is why opinion mining is popular, since corporations are interested in getting their customers opinions on their products and services. If a company is able to map this they will have a competitive advantage.
With "explosions" of opinion data in social media and similar applications it is easy to gather data for use in decisionmaking. It is not always necessary to create questionnaires to get peoples opinions. It can, however, be difficult to extract meaning from long blogposts and arcticles, and make a summary. This is why we need _Automated Sentiment analysis systems_.
Sentiment analysis has spread to many different areas.
### Sentiment Analysis Research
#### Different Levels of Analysis
Research has mainly been done on three different levels.
__Document-level sentiment classification__ works with entire documents and tries to figure out of the document has a positive or negative view of the subject in question.
__Sentence-level sentiment classification__ works with opinion on a sentence level, and is about figuring out if a sentence is positive, negative or neutral to the relevant subject. This is closely related to _subjectivity classification_, which distinguishes between objective and subjective sentences.
__Entity and aspect-level sentiment classification__ wants to discover opinions about an entity or it's aspects. For instance we have the following sentence, "The iPhone’s call quality is good, but its battery life is short", two aspects are evaluated: call quality and battery lifetime. The entity is the iPhone. The iPhone's call quality is positive, but the battery duration is negative. It's hard to find and classify these aspects, as there are many ways to express positive and negative opinions, metaphors, comparisons and so on.
#### Sentiment Lexicon and Its Issues
Some words can be identified as either positive or negative immediately, like good, bad, excellent, terrible and so on. There are also subsentences/phrases that can be identified as positive or negative. A list of such words or sentences is called a sentiment lexicon.
Use of just this is not enough. Below are several known issues:
1. Words and phrases can have different meaning in different contexts. For example, "suck" usually has a negative meaning, but can be positive if put in the right context: "This vacuum cleaner really sucks".
2. Sentences containing sentiment words sometimes do not reflect a sentiment. For example, "If this product X contain a _great_ feature Y, I'll buy it". _Great_ does not express a positive or negative opinion on the product X.
3. Sarcastic sentences are hard to deal with. They are mostly found in political discussions, e.g. "What an awesome product! It stopped working in two days".
4. Sentences without sentiment words can also express opinions. "This laptop consumes a lot of power.", reflects a partially negative opinion about the laptop, but the sentence is also objective as it states a fact.
The Semantic Web is an extension of the normal Web, that promotes common data formats and exchange protocols. The point of Semantic Web is to make the internet machine readable.
Semantic Web uses languages specifically designed for data:
- Resource Description Framework (RDF)
- Web Ontology Language (OWL)
- Extensive Markup Language (XML)
Together these languages can describe arbitrary things like people, meetings or airplane parts.
An ontology is a formal and explicit definition of types, properties, functions, and relationship between the entities, which result in an abstract model of some phenomena in the world. Typically an ontology consist of classes arranged in a hierarchy. There are several languages that can describe an ontology, but the achieved semantics vary.
RDF is a data-model used to define ontologies. It's a model that is domain-neutral, application-neutral, and it supports internationalization. RDF is an abstract model, and therefore not language-specific. XML is often used to define RDF data, but it is not a necessity.
The core of RDF is to describe resources, which basically are "things" in the world that can be described in words og relations to other resources etc. RDF consists of a lot of triples that together describe something about a particular resource. A triple consist of:
* Definition: (Subject, Predicate, Object)
* Typical: (URI, type, value)
* Example: (lit:J.K.Rowling, lit:wrote, lit:HarryPotter)
A RDF statement maps a value to a resource with use of a predicate. In the figure you can see that _http://www.w3.org/People/EM/contact#me_ has the property _fullName_ defined in _http://www.w3.org/2000/10/swap/pim/contact_, which is set to the value of _Eric Miller_.

Resources are objects, informally referred to as "things". Examples are authors, apartments, people, hotels and search queries.
Every resource has a URI - a unique resource identifier. This could for example be an ISBN-number for a book, a URL for a web page or coordinates for a location.
The properties describe the relationships between the resources.
For example Anna is a _friend of_ Bruce, Harry Potter and the Philosopher's Stone was _written by_ J.K. Rowling or Oslo is _located in_ Norway.
The properties is marked with italic font. Properties are also described by a URI.
RDF statements assert the properties of resources and are in the form subject-predicate-object, a.k.a. a __triple__.
The subject denotes the resource, the predicate expresses a relationship between the subject and the object.
N-triples, Turle, RDF/XML, RDF/JSON.
N-triples
: It is plain text format for encoding an RDF graph.
Turtle
: Turtle can only serialize valid RDF graphs. It is generally recognized as being more readable and easier to edit manually than its XML counterpart.
RDF/XML
: It expresses RDF graph as XML document.
RDF/JSON
: It expresses RDF graph as JSON document.
RDF Schema (RDFS) enriches the RDF data model, adding vocabulary and associated semantics for:
- Classes and subclasses
- Properties and sub-properties
- Typing of properties
It supports describing simple ontologies, but uses logic-oriented approach and "open world" semantics, meaning anyone can make statements about any resource
RDF does not assume or define semantics, but it can be done in RDFS, using:
- Classes and Properties
- Class Hierarchies and Inheritance
- Property Hierarchies
RDFS adds taxonomies for classes and properties (subClass and subProperty) and restrictions (domain and range constraints on properties).
Following terms are introduced in RDFS (each has a
meaning with respect to the RDF data model):
- terms for classes
- terms for properties
- terms for collections
- special classes
- special properties
## Basics of Sentiment analysis
Sentinemt analysis is important because opinions drive decisions. People will seek other people's opinions when buying something. It is useful to be able to mine opinions on a product, service or topic.
There are two types of evaluation:
- Regular opinions ("This camera is great!")
- Comparisons ("This camera is better than this other camera.")
The basic components of an opinion are:
- Opinion holder (Whoever holds the opinion)
- Entity (The entity on which the opinion is expressed)
- Opinion (The actual opinion being held)
An entity can be divided into subcomponents, like a camera having both a lens and a battery. These subcomponents are called _aspects_.
## Sentiment analysis models
### Model of an entity
### Model of a review
### Model of an opinion
# Recommender systems
## Problem domain
Recommender systems are used to match users with items. This is done to avoid information overload and to assist in sales by guiding, advising and persuading individuals when they are looking to buy a product or a service.
Recommender systems elicit the interests and preferences of an individual, and make recommendations. This has the potential to support and improve the quality of a customers decision.
Different recommender systems require different designs and paradigms, based on what data can be used, implicit and explicit user feedback and domain characteristics.
## Purpose and success criteria
Different perspectives/aspects:
– Depends on domain and purpose
– No holistic evaluation scenario exists
Retrieval perspective:
– Reduce search costs
– Provide "correct" proposals
– Users know in advance what they want
Recommendation perspective:
– Serendipity – idendify items from the Long Tail
– Users did not know about existence
Prediction perspective:
– Predict to what degree users like an item
– Most popular evaluation scenario in research
Interaction perspective
– Give users a "good feeling"
– Educate users about the product domain
– Convince/persuade users - explain
Finally, conversion perspective
– Commercial situations
– Increase "hit", "clickthrough", "lookers to bookers" rates
– Optimize sales margins and profit
## Paradigms of recommender systems
There are many different ways to design a recommendation system. What they have in common is they take input into a _recommendation component_ and uses it to output a _recommendation list_. What varies in the different paradigms is what goes into the recommendation component. One of the inputs to get personalized recommendations is a user profile and contextual parameters.
_Collabarative recommender systems_, or "tell me what is popular among my peers", uses community data.
_Content-based recommender systems_, or "show me more of what I've previously liked", uses the features of products.
_Knowledge based recommender systems_, or "tell me what fits my needs", uses both the features of products and knowledge models to predict what the user needs.
_Hybrid recommender systems_ uses a combination of the above and/or compositions of different mechanics.
## Collaborative filtering
## Content-based flitering