Recommender systems (Part 1) – Content-based Recommenders

9 minute read

Updated: October 26, 2020

CBF when user model is based on keywords
CBF when user model is based on Facets, values, ratings
CBF when user model is based on Graphs/ concepts

Recommender systems: Introduction

Problem: Too much content and choices
- You want to direct the user and narrow to one or two arrows and to not overload the user the simple view of what the recommender system is doing

Recommender systems

Families of the recommender system – to give an overview of algorithms of the recommender sys
Content-based recommenders
- Need to know about the content we are recommending Need representation of the content
- Either have to obtain content and have an algo to analyse the content
- E.g. email filer – need to know the content of the email, whether it should be filtered or starred
- E.g. clipping services – how to cut the part of the content – just need to read about the simple things
Knowledge based recommenders
- Also need to know where to direct the user to
- Also brings in external world knowledge – so that you can be more precise about recommending
- If I have additional knowledge of cars, that basis and the knowledge will become content and knowledge-based recommender
- E.g. buying guides for houses - If in addition to that you bring the world knowledge of types of objects, proximity to public services and etc,
Collaborative or social filtering
- It looks at social behaviours what other people has been doing – looks at attention of the others – and recommend what other ppl may like
- E.g. amazon’s suggestion lists, top n offers and promotions
Hybrid recommenders
- Clever way of hybridizing the algorithms
- If you have social data, you combine with content
- Combine content and knowledge etc

Content-based Filtering

So much content, and we want to narrow the choices of the user
If we want to develop content based filter, you need two things in advance to be existing so that you can develop
1. Content
  - Appropriate information about the content so that you can map to the content
2. User profile
  - Need to know who the user is – good description of the user so that you can map the content to the user
Then you can decide how each of the content is gonna map to the user
Not only the user model, it has to be related to the content I’m showing
- E.g. cars – and if you have the user model about the student grade, its irrelevant!! Need to know what kind of car, what brans they are royal to etc
Correspondences between the user profile and what you know about the content
Analogy: give me the content that I would like

Content-based filtering II

Recommendations based on the correlation between the content of the items and the user’s preferences
- E.g. recommend items similar to those I have bought or to my interests
Connection w the prev lectures – user models/ profiles
- Need to know how to extract the user models – explicit and implicit methods
- To come up with a more plausible user profiles or user stereotypes
Description of the content
- It has metadata of the description of the content
- Or think about ways of generating description from content information
  - AI algorithms – classify the content based on key facts – e.g. news about healthy living, politics, etc
  - May need to do that if the news don’t have predefined tags
- Key challenge – what if you have subjective opinions in the content without the metadata. How do we classify them?
  - E.g. post on sns – no metadata that describes what’s in the actual content. And you need to recommend the user the posts
  - E.g. how we book things – review – is subjective content. And recommend two or three reviews. Need to go into the reviews and see what makes a good review.
    - Processing that content and identifying textual features, then decide whether that’s a good review or not.
Need to think these before you go into the algorithm’s core part
Correlation btwn the content and the user

Architecture of CBF

1. When the user interacts through the interface, you have a clever way of obtaining the user model – through explicit by asking the user and implicit by analysing the interaction
2. Then you have to have a description of the content – do I have the description? Is this reliable? Anything else I need?
3. Content filtering - Do you have the areas of the content? There has to be a correspondence
4. Recommender then takes the description of the content and the user model then starts recommending – identifies the one that is most relevant to the user
5. May consider user feedback and to change it, and continue with the recommendation
- Q: HOW can we do the recommendation?

CBF when user model is based on keywords

User model - Assume that our user model is using the keywords (t1,t2,t3, … ) – have list of words from user profile that they are interested in, and we need to find the weights for each of the keywords
- For each of the terms, what are the weights
- what we know about that ONE user in terms of term #2?
  - Wishlist, clicking, etc then we have to identify the weights through TFIDF
- Match u1 with t1 and so on
Item vector
- For each item in the content we have to get a vector that corresponds to that term
- target terms, and how relevant those terms are to the item – for one
  - There are multiple items in one content
  - How close is this item form what the user wants similarity
- item, we have w1, w2 etc – how relevant this item is to the term
How to identify the key terms and come up with numeric values for users and for items how we come up with the weights
The weights the input for the algo for the Content based filtering
Calculating similarity
- The similarities between the two vectors of the same size
- How close two things are together
- Remember cosine similarities

CBF when user model is based on Keywords II

For the first term, it has weight w11, for second term, w12 etc
The items are now the input to the recommender
How close the user is to the items
Items to recommend: sort it by similarities and take top k items

E.g. How can we personalize student news feed? - Keywords

ratings
- 0 – term not relevant
- 1 – somehow relevant
- 2 – highly related

Identify the terms
- (study (t1), sport (t2), well-being(t3), volunteering (t4), IT Services(t5), union(t6), equality & inclusion(t7)
User – how interested the student is in mentioned terms
- U = (1, 0, 1, 2, 1, 0, 1)
Item – how related the news items are to the term; so we take one item PER item for each item vector
1. I1 = (1,0,2,0,0,1,1)
2. I2 = (2, 0, 1,0, 0, 0, 1,0)
3. I3 = ()
So we do similarities between User U and I1, U and I2, U and I3. Then we sort it, take top items that are the most similar match up user with each item vector
- Q: how do you come up with the terms? How do you come up with the user preference?
- A: we do explicit profiling – we ask the users to input the values
- And implicit profiling – what news user is clicking on
- You can improve the User preference that was given by the user, with results of implicit profiling

We start with the simple stereotype, and combine several stereotypes to come up with an inferred user model
User model
- User model is values, facets and ratings that are relevant to the content
- With facets AND values, we can specify it more now
- User model is obtained from stereotypes which gives some facets with values and ratings.
- U = (<u1v1r1>, <u2v2r1>, ..) - values and ratings related to the content
- U = (<study, UG, 0.7>, <study, full-time, 0.7>, <sport, team sport, 0.8>, <well-being, food, 0.5>)
Item representation
- For the item, we need to know whether specific values hold for the facets
- I = (<f1v1i1>, <f2v2i2>, …) - facet, values and item relevance
- I = (<study, UG, 1>, <Study, MSc,0>, <study, phD,0>, <study, full-time,1>)
Q: Now I have the user profile and description of each of the item. How do I match up these two?
- A: Relevance with user and item
Calculate Relevance
- Need some mechanism to calculate the relevance between you and I
- ! rating * item
- So calculate 0.7 *1 + 0.7*0 + …
- You have a representation of the user and the item, and you need a clever way of doing so

E.g. How can we personalize student news feed? – Facet, Values and ratings

We need to decide what the facet and the values are =

CBF when User Model is based on Facets, Values, Ratings (cont)

We have the user U, with user, value and ratings
We have items with facet, values and item relevance
Recommender calculates the relevance between the you and each of the items I1,I2, Im etc
Recommend top k

CBF when user model is based on Graphs/ concepts

There’s a graph representing the domain model
- Concepts and links related to the content
You can have more knowledge about the facets, more value tuned, it can be more spec
- Team sport – what kind of team sport?
It is a taxonomy, explaining various relationships – which gives a hierarchy and it allows to connect things
User model
- you = (u1,u2,un)
- We would know what is relevant to the user blue is relevant
  - We have derived the positive hit of the user
Item model
- I = (c1,c2,cn)
- What I know about the content; What item is the most relevant
I have overlaid the user model on the content model (item model), and now I need to decide what that I overlaid on the graph is related
Cleverness comes in here
- Can think about similarities, similarity could be the part in the graph – if they have the same parent, OR they are similar bc one is a parent and one is a child node
How similar the red dots are to the blue dots

CBF when User Model is based on Graphs/Concepts II

Red – items with concepts
- Could have overlaps between different items
Once you have both, calculate the relevance, sort it then recommend it

Pros and cons of CBF

Pros
- It focuses on just the user
- Fairly easy to implement – once you have managed to obtain user and content description, can easily compare
- ** Can explain to the user why they are recommending this – bc I have this facet and value, and that it is related to the item’s ratings
- Applicable in a range of contexts
  - News
  - YouTube – you watch one video and they show you more of them
  - Travel recommendations
limitations
- Computationally complex - there are too many checks; similarity / relevance
- Filter bubble – it might be too focused; it forces user into that one category
- Cold start – you don’t have a user model to begin with
- Reliability of user profiles – it relies on the user
- Requires content description – facets, and have to do initial tagging; have to extract it automatically somehow

Improve Efficiency: other factors

Novelty/ surprise of an item
- Existence of information that is new to the user (e.g. in learning, video streaming
Proximity
- The number of links it takes to navigate from the current page to the page with the item
Context relevance
- How relevant is the item to the current items the user is interacting with (e.g. in news recommender systems)

Summary: content-based filtering

CBF was the first recommender systems that is used in a number of practical applications
Focus on the user – builds on user profiling \&Need description of content
Main advantage – explanation of recommendations
Main limitation – filter bubble

CBF when user model is based on keywords

CBF when user model is based on Facets, values, ratings

CBF when user model is based on Graphs/ concepts

Leave a comment