Recommender systems: Introduction
- Problem: too much content and too many choices
- You want to direct the user, narrowing the options down to one or two arrows so the user is not overloaded; this is the simple view of what a recommender system does
Recommender systems
Families of recommender systems – an overview of the algorithms they use
- Content-based recommenders
- Need to know about the content we are recommending; need a representation of the content
- Have to obtain the content and have an algorithm to analyse it
- E.g. email filter – need to know the content of the email to decide whether it should be filtered or starred
- E.g. clipping services – deciding which part of the content to cut out; only simple things about the content need to be read
- Knowledge-based recommenders
- Also need to know where to direct the user
- Also bring in external world knowledge, so that recommendations can be more precise
- If I add knowledge about cars on top of the content description, the result is a content- and knowledge-based recommender
- E.g. buying guides for houses: in addition to the content, you bring in world knowledge about types of objects, proximity to public services, etc.
- Collaborative or social filtering
- It looks at social behaviour: what other people have been doing and paying attention to, and recommends what other people may like
- E.g. Amazon’s suggestion lists, top-n offers and promotions
- Hybrid recommenders
- Clever ways of hybridizing the algorithms
- If you have social data, you combine it with content
- Or combine content and knowledge, etc.
Content-based Filtering
- There is so much content that we want to narrow the user’s choices
- To develop a content-based filter, two things need to exist in advance
- Content
- Appropriate information about the content, so that you can map to the content
- User profile
- Need to know who the user is – a good description of the user, so that you can map the content to the user
- Content
- Then you can decide how each piece of content maps to the user
- A user model alone is not enough; it has to be related to the content I’m recommending
- E.g. cars – a user model about the student’s grades is irrelevant! Need to know what kind of car they want, which brands they are loyal to, etc.
- Correspondences between the user profile and what you know about the content
- Analogy: give me the content that I would like
Content-based filtering II
- Recommendations are based on the correlation between the content of the items and the user’s preferences
- E.g. recommend items similar to those I have bought or to my interests
- Connection with the previous lectures – user models / profiles
- Need to know how to extract the user models – explicit and implicit methods
- To come up with more plausible user profiles or user stereotypes
- Description of the content
- The content may come with metadata that describes it
- Or think about ways of generating a description from the content
- AI algorithms – classify the content based on key facts, e.g. news about healthy living, politics, etc.
- May need to do that if the news items don’t have predefined tags
- Key challenge – what if the content contains subjective opinions and has no metadata? How do we classify it?
- E.g. posts on social networking sites (SNS) – no metadata describes what is in the actual content, yet you need to recommend posts to the user
- E.g. reviews we read when booking things are subjective content. To recommend two or three reviews, you need to go into the reviews and see what makes a good review.
- Process that content and identify textual features, then decide whether it is a good review or not
- Need to think about these before going into the core part of the algorithm
- Correlation between the content and the user
Architecture of CBF
- When the user interacts through the interface, you need a clever way of obtaining the user model: explicitly, by asking the user, and implicitly, by analysing the interaction
- Then you need a description of the content: do I have the description? Is it reliable? Is there anything else I need?
- Content filtering: do you know the areas the content covers? There has to be a correspondence with the user model
- The recommender then takes the description of the content and the user model and starts recommending: it identifies the items most relevant to the user
- It may take user feedback into account, adjust, and continue with the recommendation
- Q: HOW can we do the recommendation?
CBF when user model is based on keywords
- User model: assume the user model uses keywords (t1, t2, t3, …), i.e. a list of words from the user profile that the user is interested in; we need to find a weight for each keyword
- For each term, what is the weight? What do we know about that ONE user in terms of term t2?
- Evidence comes from the wishlist, clicks, etc.; we then identify the weights through TF-IDF
- Weight u1 goes with term t1, and so on
- Item vector
- For each item in the content we need a vector over the same target terms, giving how relevant those terms are to the item: for one item we have w1, w2, etc.
- There are multiple items in the content
- How close is this item to what the user wants? That is the similarity
- How do we identify the key terms and come up with numeric values (weights) for users and for items?
- The weights are the input to the content-based filtering algorithm
- Calculating similarity
- The similarity between two vectors of the same size: how close the two are to each other
- Remember cosine similarity
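As a reminder, cosine similarity between a user vector and an item vector can be written as below. The notation is an assumption chosen to match the notes: u_j is the user's weight for term t_j, w_j is the item's weight for the same term, and n is the number of terms.

```latex
\[
\mathrm{sim}(U, I) = \cos\theta
  = \frac{U \cdot I}{\lVert U \rVert \, \lVert I \rVert}
  = \frac{\sum_{j=1}^{n} u_j \, w_j}
         {\sqrt{\sum_{j=1}^{n} u_j^{2}} \; \sqrt{\sum_{j=1}^{n} w_j^{2}}}
\]
```

With non-negative weights the value lies between 0 (no terms in common) and 1 (the two vectors point in the same direction).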
CBF when user model is based on Keywords II
- Item 1 has weight w11 for the first term, w12 for the second term, etc.
- The item vectors are now the input to the recommender
- It works out how close the user is to each item
- Items to recommend: sort by similarity and take the top k items
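The sort-and-take-top-k step can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function names, the item ids and the three-keyword weights are all hypothetical, and cosine similarity is assumed as the closeness measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat as no match
    return dot / (norm_u * norm_v)


def top_k(user, items, k):
    """Score every item against the user and return the k best (score, id) pairs."""
    scored = [(cosine_similarity(user, vec), item_id) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical weights over three keywords (not taken from the notes).
user = [1.0, 0.0, 2.0]
items = {
    "item_a": [1.0, 0.0, 2.0],  # same direction as the user vector
    "item_b": [0.0, 3.0, 0.0],  # shares no keyword with the user
    "item_c": [2.0, 1.0, 1.0],  # partial overlap
}
ranked = top_k(user, items, k=2)
print([item_id for _, item_id in ranked])
```

Here `item_a` ranks first and `item_c` second, while `item_b` is dropped because it has no keyword in common with the user.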
E.g. How can we personalize student news feed? - Keywords
- Ratings
- 0 – term not relevant
- 1 – somewhat relevant
- 2 – highly relevant
- Identify the terms
- study (t1), sport (t2), well-being (t3), volunteering (t4), IT Services (t5), union (t6), equality & inclusion (t7)
- User – how interested the student is in the terms above
- U = (1, 0, 1, 2, 1, 0, 1)
- Item – how related each news item is to the terms; there is one item vector PER item
- I1 = (1,0,2,0,0,1,1)
- I2 = (2, 0, 1,0, 0, 0, 1,0)
- I3 = ()
- So we compute the similarity between U and I1, U and I2, U and I3. Then we sort and take the top items, i.e. the item vectors that best match the user vector
- Q: How do you come up with the terms? How do you come up with the user preferences?
- A: explicit profiling – we ask the users to input the values
- And implicit profiling – which news items the user clicks on
- You can refine the preferences given by the user with the results of implicit profiling
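As a worked check, here is the cosine similarity between U and I1 from the example above, assuming cosine is the measure used. Only U and I1 are worked through, because they are the two vectors given in full with seven entries.

```latex
\[
U \cdot I_1 = 1\cdot1 + 0\cdot0 + 1\cdot2 + 2\cdot0 + 1\cdot0 + 0\cdot1 + 1\cdot1 = 4
\]
\[
\lVert U \rVert = \sqrt{1 + 0 + 1 + 4 + 1 + 0 + 1} = \sqrt{8},
\qquad
\lVert I_1 \rVert = \sqrt{1 + 0 + 4 + 0 + 0 + 1 + 1} = \sqrt{7}
\]
\[
\mathrm{sim}(U, I_1) = \frac{4}{\sqrt{8}\,\sqrt{7}} = \frac{4}{\sqrt{56}} \approx 0.53
\]
```

The same calculation is repeated for every item vector, and the items are then sorted by the resulting values.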
CBF when user model is based on Facets, values, ratings
We start with a simple stereotype, and combine several stereotypes to come up with an inferred user model
- User model
- The user model consists of facets, values and ratings that are relevant to the content
- With facets AND values, we can now be more specific
- The user model is obtained from stereotypes, which give facets with values and ratings
- U = (<u1, v1, r1>, <u2, v2, r2>, …) – facets, values and ratings related to the content
- U = (<study, UG, 0.7>, <study, full-time, 0.7>, <sport, team sport, 0.8>, <well-being, food, 0.5>)
- Item representation
- For the item, we need to know whether specific values hold for the facets
- I = (<f1, v1, i1>, <f2, v2, i2>, …) – facets, values and item relevance
- I = (<study, UG, 1>, <study, MSc, 0>, <study, PhD, 0>, <study, full-time, 1>)
- Q: Now I have the user profile and a description of each item. How do I match the two?
- A: Compute the relevance between user and item
- Calculate relevance
- Need some mechanism to calculate the relevance between U and I
- relevance(U, I) = $\sum_k r_k \cdot i_k$ (user rating × item relevance, summed over the facet-value pairs)
- So calculate 0.7 * 1 + 0.7 * 0 + …
- You have a representation of the user and of the item, and you need a clever way of combining them
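One way the rating × item relevance sum could be computed is sketched below, using the U and I tuples from the example. The `relevance` function and its matching rule are assumptions: each user entry is paired with the item entry that has the same (facet, value), and pairs missing from the item count as 0. The notes do not say how entries are paired, so a different pairing gives a different total.

```python
def relevance(user_model, item_model):
    """Sum of user rating * item relevance over matching (facet, value) pairs.

    Assumption: a (facet, value) pair absent from the item contributes 0.
    """
    item_lookup = {(facet, value): rel for facet, value, rel in item_model}
    return sum(
        rating * item_lookup.get((facet, value), 0.0)
        for facet, value, rating in user_model
    )


# User model and item representation taken from the notes' example.
user = [
    ("study", "UG", 0.7),
    ("study", "full-time", 0.7),
    ("sport", "team sport", 0.8),
    ("well-being", "food", 0.5),
]
item = [
    ("study", "UG", 1),
    ("study", "MSc", 0),
    ("study", "PhD", 0),
    ("study", "full-time", 1),
]

# 0.7*1 (study, UG) + 0.7*1 (study, full-time) + 0.8*0 + 0.5*0
score = relevance(user, item)
print(round(score, 2))
```

Because the score is a sum of named facet-value contributions, the same lookup can also be used to tell the user which facets produced a recommendation.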
E.g. How can we personalize student news feed? – Facet, Values and ratings
- We need to decide what the facets and the values are
CBF when User Model is based on Facets, Values, Ratings (cont)
- We have the user U, with facets, values and ratings
- We have items with facets, values and item relevance
- The recommender calculates the relevance between U and each of the items I1, I2, …, Im
- Recommend the top k
CBF when user model is based on Graphs/ concepts
- There is a graph representing the domain model
- Concepts and links related to the content
- You can hold more knowledge about the facets and more finely tuned values, so the model can be more specific
- E.g. team sport – what kind of team sport?
- It is a taxonomy that explains various relationships; it gives a hierarchy and allows things to be connected
- User model
- U = (u1, u2, …, un)
- We know which concepts are relevant to the user (blue marks what is relevant)
- We have derived the user’s positive hits
- Item model
- I = (c1, c2, …, cn)
- What I know about the content; which item is the most relevant
- I have overlaid the user model on the content model (item model); now I need to decide which of the things overlaid on the graph are related
- This is where the cleverness comes in
- Similarity can be based on the path in the graph: two nodes are similar if they have the same parent, OR because one is the parent and the other is its child
- How similar are the red dots to the blue dots?
CBF when User Model is based on Graphs/Concepts II
- Red – items with their concepts
- There can be overlaps between different items
- Once you have both, calculate the relevance, sort, then recommend
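The overlay idea can be sketched as below, under stated assumptions. The taxonomy, the concept names, the 1.0 / 0.5 / 0.0 scores and the averaging rule are all hypothetical; the notes only say that nodes sharing a parent, or in a parent-child relation, count as similar.

```python
# Hypothetical taxonomy: each concept maps to its parent (None for the root).
PARENT = {
    "sport": None,
    "team sport": "sport",
    "individual sport": "sport",
    "football": "team sport",
    "basketball": "team sport",
    "running": "individual sport",
}


def concept_similarity(a, b):
    """Assumed scoring: 1.0 if identical, 0.5 if parent/child or siblings, else 0.0."""
    if a == b:
        return 1.0
    parent_a, parent_b = PARENT.get(a), PARENT.get(b)
    if parent_a == b or parent_b == a:
        return 0.5  # one node is the parent of the other
    if parent_a is not None and parent_a == parent_b:
        return 0.5  # both nodes share the same parent
    return 0.0


def item_relevance(user_concepts, item_concepts):
    """Average, over the user's concepts, of the best match among the item's concepts."""
    if not user_concepts or not item_concepts:
        return 0.0
    best = [max(concept_similarity(u, c) for c in item_concepts) for u in user_concepts]
    return sum(best) / len(best)


def recommend(user_concepts, items, k):
    """Calculate relevance for every item, sort, and return the top-k item names."""
    scored = [(item_relevance(user_concepts, cs), name) for name, cs in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


user = ["football", "running"]        # the user's concepts (blue)
items = {                             # the items' concepts (red)
    "match report": ["football"],
    "league preview": ["basketball"],
    "library hours": [],
}
top = recommend(user, items, k=2)
print(top)
```

In this sketch "match report" scores 0.5 (an exact hit on one of the user's two concepts) and "league preview" scores 0.25 (a sibling of football), so they are recommended in that order.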
Pros and cons of CBF
- Pros
- It focuses on just the user
- Fairly easy to implement – once you have obtained the user and content descriptions, you can easily compare them
- ** Can explain to the user why an item is being recommended: because the user has this facet and value, and it is related to the item’s ratings
- Applicable in a range of contexts
- News
- YouTube – you watch one video and it shows you more like it
- Travel recommendations
- Limitations
- Computationally complex – many similarity / relevance checks are needed
- Filter bubble – it might be too focused; it forces the user into one category
- Cold start – you don’t have a user model to begin with
- Reliability of user profiles – it relies on the user
- Requires a content description – facets have to be defined and the content initially tagged, or the description has to be extracted automatically somehow
Improve Efficiency: other factors
- Novelty/ surprise of an item
- Existence of information that is new to the user (e.g. in learning, video streaming)
- Proximity
- The number of links it takes to navigate from the current page to the page with the item
- Context relevance
- How relevant is the item to the current items the user is interacting with (e.g. in news recommender systems)
Summary: content-based filtering
- CBF was the first type of recommender system and is used in a number of practical applications
- Focuses on the user – builds on user profiling & needs a description of the content
- Main advantage – explanation of recommendations
- Main limitation – filter bubble
