User Models and Profiles (building)
- User Information Collection
  - 1.1 Explicit user information collection
  - 1.2 Implicit user information collection
  - 1.3 Techniques for implicit user information collection
- Step 1. User identification
- Step 2. User Model Construction
  - 3.1 Building keyword-based user model
  - 3.2 Building graph-based user model
  - 3.3 Building concept-based user model
- Summary
User Information Collection
Explicit user information collection
- Information entered by the user via HTML forms (self-report or self-assessment)
- Data contains:
  - Demographics such as birthday, marital status, hobbies, and personal status
  - Explicit feedback
    - The user rates some links on a page, and the system (e.g. Syskill & Webert) recommends other links they might be interested in
Pros | Cons |
---|---|
|
|
Implicit user information collection
- The model is constructed from implicitly collected information (implicit user feedback)
- The information is collected by the system
- It is collected on the user's client machine or on the application server
- Uses digital traces of user interaction: comments we write, news we read, items we click, videos we watch
- May add additional information about the user's device
- The user is not explicitly aware that information is being collected
Pros | Cons |
---|---|
|
|
Techniques for implicit user information collection
Technique | Information collected | Information breadth | Pros | Cons |
---|---|---|---|---|
Browser cache | Browsing history | Any websites | | |
Interaction/desktop agents | Interaction and user activity | Any personalised application | | |
Logs (web logs, search logs) | Browsing/search activity | Websites/search engine sites that are logged | | |
File transfer from one app to another | Previously stored information | Application/organisation specific | | Has to be something that already exists, so there may not be much of it |
Mobile/wearable sensors | Contextual, such as GPS; physiological and psychological states | Anywhere/anytime the user has the devices on | | |
Emerging: speech/comments on social media | Sentiments, viewpoints, interests | Social media platforms | | Email dictation: emails now reside somewhere on the server |
Step 1. User identification
- Once the information collection method is decided, who is the user? How do we identify the user we are building the model for?
- User identification is crucial for any system that constructs profiles representing individual users
- Methods for user identification
  - Software agents
    - Small programs that reside on the user's computer
    - Collect information about the user and share it with a server via some protocol
    - + Most reliable: more control over the implementation and the protocol used for identification
    - - Requires user participation to install the software
  - Logins
    - + Better accuracy and consistency: users can be tracked across sessions and between computers
    - + Can access information from different computers
    - + The system knows who the user is and can verify who they are
    - + Done with the user's consent
    - + Second most reliable
    - - The user must create an account via registration; logging in and out is a burden on the user
  - Cookies
    - + Easiest and most widely deployed; transparent to the user
    - - Poor accuracy when multiple users share a computer, which also becomes a privacy violation
    - - If the user uses more than one computer, a separate user profile is created for each
    - - If the user clears their cookies, the profile is reset
  - Session IDs
    - Activity during the visit is tracked
    - + Supported by all browsers
    - + Good for searches: look at the session for a short time and start recommending (adapting)
    - + Does not violate privacy: nothing needs to be recorded because only the current session is examined
    - - Does not provide a long-term user model
  - Enhanced proxy servers
    - - Require users to register their machines with the proxy server
    - - Generally able to identify users connecting from only one location, unless they bother to register their different computers with the same proxy
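The trade-offs between identification methods can be made concrete with a small sketch. The events below are invented for illustration, and the field names (`login`, `cookie`, `session`, `url`) and the helper `group_by` are hypothetical, not part of any particular system. The idea is only to show how the choice of identifier changes which activity ends up in the same profile.

```python
from collections import defaultdict

# Hypothetical interaction events; every field name and value is illustrative.
events = [
    {"login": "alice", "cookie": "c1", "session": "s1", "url": "/news/ai"},
    {"login": "alice", "cookie": "c1", "session": "s2", "url": "/news/ml"},
    # Alice moves to a second computer, so she receives a new cookie.
    {"login": "alice", "cookie": "c2", "session": "s3", "url": "/news/nlp"},
    # Bob shares Alice's first computer, so he shares cookie c1.
    {"login": "bob", "cookie": "c1", "session": "s4", "url": "/sport/golf"},
]


def group_by(events, key):
    """Group visited URLs by the chosen identifier (login, cookie or session)."""
    profiles = defaultdict(list)
    for event in events:
        profiles[event[key]].append(event["url"])
    return dict(profiles)


by_login = group_by(events, "login")      # one profile per real user
by_cookie = group_by(events, "cookie")    # mixes users, splits across machines
by_session = group_by(events, "session")  # short-term only, one profile per visit
```

Under these assumptions, grouping by login keeps all of Alice's activity together, grouping by cookie merges Bob's activity into Alice's first-machine profile and splits Alice across two profiles, and grouping by session yields only short-lived per-visit profiles.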
Step 2. User model construction
- The next step is constructing the user model, so we need to think about the techniques we are going to use
- We need to take input information about the user, which calls for data mining skills
- We also need to take into account not only what comes in, but also what we are going to look for in the user model (the user model representation): which part of the input is related to the user model?
  - E.g. when modelling emotional state, which of the captured information is related to it?
  - What is the model, what comes in, and which of the information will give the final model?
- Conduct appropriate processing: take the information and derive the processing needed to come up with a model
  - If the model is binary, e.g. whether the user is active or inactive, this could become a classifier
  - If you are looking for several parameters, other processing might be needed
  - One way is to overlay the user model (an aggregating user model), looking at frequencies or inferences
- Extract the user model: this is the final outcome
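As a minimal sketch of the two processing options mentioned above, the code below builds a frequency-based overlay and a trivial binary rule. The function names, the normalisation to relative frequencies, and the activity threshold are all assumptions made for illustration; a real binary model would more likely be a trained classifier.

```python
from collections import Counter


def build_overlay(domain_concepts, observed_concepts):
    """Overlay model: relative frequency of each domain concept in the observations.

    Observations that are not part of the domain model are ignored.
    """
    counts = Counter(c for c in observed_concepts if c in domain_concepts)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in domain_concepts}


def is_active(event_count, threshold=3):
    """Stand-in for a binary model: a fixed threshold rule (assumed value)."""
    return event_count >= threshold


overlay = build_overlay(
    ["python", "java", "sql"],
    ["python", "python", "sql", "cooking"],
)
```

Here `overlay` assigns a weight to every concept in the domain model, including those the user never touched, which is what distinguishes an overlay from a plain list of observed terms.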
Building keyword-based user model
- Initially created by extracting keywords from web pages collected from some information source (e.g. browsing history)
- The user is browsing the web
- Through a browsing agent, if the user clicks on a document, the system pulls that document
- From these positive feedback documents, look at the text they contain, which is what the user has possibly read
  - A positive feedback document represents the user's interest
- From these documents, extract the keywords and weight them using TF*IDF (term frequency × inverse document frequency)
Input and output
- Input: the unpacked documents, which are what the user has read
- Output: a list of keywords k1, k2, k3, …
Steps
- TF (term frequency): unpack the documents, then count the frequency of each word in each document
- Some terms appear only in specialised documents, which makes them more important
- IDF (inverse document frequency): for every term across all documents, count how many of the documents contain that term
  - This gives the weight of the term in the document space
- TF*IDF: multiply the two to get the TF*IDF weight
- Title and heading words are identified and weighted more highly
- The terms with the highest TF*IDF are the core terms. We then need smart ways of aggregating these core terms, e.g. based on similarity or on overlap in the language. The result is the user model
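The steps above can be sketched in plain Python. This is one possible reading, not a definitive implementation: it assumes TF is the term count divided by document length, IDF is `log(N / df)`, and that per-document scores are aggregated by summing them across documents. The extra weighting of title and heading words is left out, and the names `tokenize` and `tfidf_profile` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_profile(documents, top_k=3):
    """Return the top_k (term, score) pairs over all positive-feedback documents."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum TF * IDF over all documents (the aggregation rule is an assumption).
    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(tokens)) * idf
    return scores.most_common(top_k)


docs = ["python code python", "python music", "python travel"]
profile = tfidf_profile(docs, top_k=3)
```

With this definition a term that occurs in every document (here `python`) gets an IDF of zero and drops out of the profile, while terms that occur only in some documents rise to the top.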
Building graph-based user model
- Built by collecting explicit positive and negative feedback from users
- Input: graph
  - The documents we pulled tell us what the user's interests are; they are the POSITIVE examples of user interests
  - We rely on having a reliable enough method to identify that a document is a positive example, e.g. how long the user stayed on it, whether they shared it, etc.
- Reminder: entities are the nodes, and there are relationships between the nodes. We want to extract this graph
- Output: graph overlay, i.e. the graph overlaid with the entities the user is interested in
- From the documents, look for concepts from the graph. World knowledge is usually given, so rather than counting terms we look for concepts that are part of the graph, which needs a different approach
- The approach we use is semantic tagging
  - There are libraries and tools for this
  - Take the world knowledge from the world model, go through the textual documents, and identify which of the annotated tags/concepts are mentioned in each document
  - Once semantic tagging is done, count how often each concept appears in each document
  - Then you can decide on the overlay. This is the graph-based profile
- First you need the graph as an input; the smart part is how you do the tagging
  - How? The tagger looks through the text, cuts it into words or phrases (unigrams, bigrams), and maps them to the graph. A phrase may not match the graph exactly, so we do approximate tagging (similarities, synonyms, partial overlaps)
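A rough sketch of approximate semantic tagging follows, using only the Python standard library. The tiny `GRAPH`, the function names, and the similarity cutoff of 0.85 are assumptions for illustration. String similarity from `difflib` stands in for the richer matching (synonyms, partial overlaps) that a dedicated tagging tool would provide.

```python
import difflib
import re
from collections import Counter

# Hypothetical world-knowledge graph: concept -> related concepts.
GRAPH = {
    "machine learning": ["neural network", "data mining"],
    "neural network": ["machine learning"],
    "data mining": ["machine learning"],
    "football": [],
}


def ngrams(text, max_n=2):
    """Yield unigrams and bigrams of the lower-cased text."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def tag_document(text, concepts, cutoff=0.85):
    """Count graph concepts mentioned in the text, allowing near matches."""
    counts = Counter()
    for gram in ngrams(text):
        match = difflib.get_close_matches(gram, concepts, n=1, cutoff=cutoff)
        if match:
            counts[match[0]] += 1
    return counts


def graph_overlay(documents, graph):
    """Overlay: how often each graph concept appears in the positive documents."""
    overlay = Counter({concept: 0 for concept in graph})
    for doc in documents:
        overlay.update(tag_document(doc, list(graph)))
    return dict(overlay)


overlay = graph_overlay(
    ["Neural networks and machine learning", "I love machine learning"],
    GRAPH,
)
```

The plural "neural networks" is not an exact match for the node "neural network", but it is close enough to be tagged, which is the point of approximate tagging. The relationships stored in `GRAPH` are not used here; they would matter when propagating interest to neighbouring concepts.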
Building concept-based user model
- Nodes represent abstract topics considered interesting to the user, rather than specific words or sets of words
First method
- Take each document and do semantic tagging to get the overlay
- From there, come up with an aggregated list of concepts: look for the top concepts. The overlaid graph may be sparse or big
  - Might need to do pre-processing on the graph
  - E.g. use common categories or the most frequent concepts to come up with a list based on the counts on the graph
Second method
- Identify the positive documents, then identify what these documents have in common
- Cluster the most similar documents together, extract topics for each cluster, then take the top concepts as the user model
- The user modelling component corresponds to the red parts
- But the input (the positive examples) needs to be reliable
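The second method can be sketched as follows. This is a deliberately simple stand-in, assuming a greedy single-pass clustering on Jaccard similarity and using the most frequent terms in each cluster as its "topics". The stopword list, the threshold of 0.2, and all function names are hypothetical; a real system would more likely use a proper clustering and topic extraction method.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "with"}


def tokens(text):
    """Lower-case words with a small, assumed stopword list removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(documents, threshold=0.2):
    """Greedy clustering: join the first cluster that is similar enough."""
    clusters = []  # each cluster is a list of token lists
    for doc in documents:
        doc_tokens = tokens(doc)
        for cluster in clusters:
            vocabulary = set().union(*cluster)
            if jaccard(set(doc_tokens), vocabulary) >= threshold:
                cluster.append(doc_tokens)
                break
        else:
            clusters.append([doc_tokens])
    return clusters


def top_concepts(clusters, per_cluster=2):
    """Most frequent terms in each cluster, used here as its topics."""
    result = []
    for cluster in clusters:
        counts = Counter(term for doc in cluster for term in doc)
        result.append([term for term, _ in counts.most_common(per_cluster)])
    return result


positive_docs = [
    "deep learning for image recognition",
    "learning deep models with image data",
    "football league results",
    "football transfer news in the league",
]
clusters = cluster_documents(positive_docs)
concepts = top_concepts(clusters)
```

In this toy example the four positive documents fall into two clusters, one about deep learning and one about football, and the top terms of each cluster form the concept-based profile.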
Summary
- User Information Collection
  - Explicit: given by the user
  - Implicit: monitoring what the user is doing, collected by the system
- If we do implicit information collection:
  - Step 1: Identify the user
    - Depends on the data collection
  - Step 2: Construct the model
    - Keyword-based
    - Graph-based
    - Concept-based
    - We need to think about how the model is represented and what the input data is.