Evaluation of user-adaptive systems

At what stage is the system that you are evaluating?

  • System prototypes - Formative evaluation
    • Does the system work as intended? – algorithms and components of the system
    • Do the components perform adequately, i.e. as expected?
    • Which design to choose (from several prototypes)? – if we have several options, which one works best?
  • Once we have a fully developed, deployed system – summative evaluation
    • What are the benefits of personalisation?
    • Are there any potential drawbacks?

Layered evaluation

  • Consider each component of the system and check that each step works adequately → layered evaluation
  • Once all components are validated, we can evaluate the system as a whole (a toy code sketch follows the steps below)
  • Steps
    1. Collection of input data
    2. Interpretation of the collected data
    3. Modelling of the current state of the world
    4. Deciding upon adaptation
    5. Applying adaptation decisions
    6. Evaluating adaptation as a whole
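
A minimal sketch of the layered idea in Python, with invented stage functions (none of this comes from a real system): each layer is fed controlled input and checked in isolation, so a failure points at one layer rather than at "the system".

```python
# Toy layered-evaluation sketch: each stage of a hypothetical adaptation
# pipeline is tested separately before the chain is judged end to end.

def collect_input(raw_events):
    # Stage 1: keep only events that carry a user id (toy quality rule).
    return [e for e in raw_events if "user" in e]

def interpret(events):
    # Stage 2: interpret a "click" on a venue as interest in that venue.
    return {e["venue"] for e in events if e.get("action") == "click"}

def build_model(interests):
    # Stage 3: here the user model is just the set of inferred interests.
    return {"interests": interests}

def decide_adaptation(model, candidates):
    # Stage 4: recommend the candidate venues that match the model.
    return [c for c in candidates if c in model["interests"]]

events = [{"user": "u1", "action": "click", "venue": "museum"},
          {"action": "click", "venue": "bar"}]   # no user id -> dropped

stage1 = collect_input(events)
assert all("user" in e for e in stage1)          # layer 1: input quality
stage2 = interpret(stage1)
assert stage2 == {"museum"}                      # layer 2: interpretation
model = build_model(stage2)
recs = decide_adaptation(model, ["museum", "stadium"])
assert recs == ["museum"]                        # layer 4: adaptation decision
print("all layers pass on this toy input")
```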

Example: the 3Cixty app

  • Illustrates the main methods of evaluation → 3Cixty
  • An app that offers a personalised city guide – it holds information about events and places in the city in the form of a knowledge graph, and the guide uses this to make suggestions to users
  • The system collects information about the user in two ways (a toy sketch of both channels follows this list)
    • Explicit – before visiting the city, users can browse through the suggestions and add the places they want to visit to a wishlist → user preferences can be set up beforehand
    • Implicit – during system usage: based on the user profile, the system recommends nearby places related to the wishlist, and the recommendations are gradually filtered as the app is used
      • Where the user is, and the path from there (departure and destination)
      • The user can get the list of recommended items ordered by distance, or put the parameters that matter most to them first
    • As the user makes choices and adds items to the wishlist, the profile is gradually updated
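
A toy sketch of the two collection channels – not 3Cixty's actual implementation; the class, fields and ranking rule are all invented for illustration:

```python
# Hypothetical profile with an explicit channel (wishlist set up before the
# visit) and an implicit channel (signals gathered while the app is used).

class Profile:
    def __init__(self):
        self.wishlist = set()    # explicit: places chosen beforehand
        self.implicit = {}       # implicit: category -> click count

    def add_to_wishlist(self, place):      # explicit channel
        self.wishlist.add(place)

    def record_click(self, category):      # implicit channel
        self.implicit[category] = self.implicit.get(category, 0) + 1

    def rank(self, nearby):
        # Gradual filtering: wishlist places first, then by implicit interest.
        return sorted(nearby,
                      key=lambda p: (p["name"] not in self.wishlist,
                                     -self.implicit.get(p["category"], 0)))

p = Profile()
p.add_to_wishlist("Duomo")
p.record_click("restaurant"); p.record_click("restaurant")
nearby = [{"name": "Cafe X", "category": "restaurant"},
          {"name": "Duomo", "category": "monument"},
          {"name": "Shop Y", "category": "shopping"}]
print([x["name"] for x in p.rank(nearby)])   # ['Duomo', 'Cafe X', 'Shop Y']
```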

1. Collection of input data – information about the user

  • Goal – to check the quality of raw input data
  • Criteria
    • Accuracy – is what we record about the user accurate about who they are and what they are interested in?
    • Latency – there is a time lapse between collecting the data and applying the user model, so we need to consider how fresh the data is by the time it is used
    • Sampling rate – when we evaluate the input data, is it sampled often enough to be representative of the user?
  • Example gap: we don't know whether the user is alone → important data is missing (the social context of the user) → this can be identified through user tests
  • Implicit data collection methods
    • Data mining – if we have a lot of data, we can explore the input to see whether all possible restaurants are being picked, or only the most popular places
    • Cross validation – used when trying to predict (by combining signals into user interest) to check whether what we pick matches the user's actual interests
  • Explicit data collection methods
    • Explicit – the user selecting wishlist items and filters
    • *User testing – we can show the prototype to real users, or work with personas
    • *Simulated users – to test the system against such personas
  • In this case we would check accuracy (what users entered is what they got), latency (the app is used before they visit the city), and whether any other data is missing (two of these checks are sketched in code below)
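
A small sketch of two of these checks – latency and sampling/coverage – assuming an invented log format:

```python
# Hedged step-1 checks on (made-up) logged input data: latency between
# collecting an event and using it in the model, and whether the observed
# venues cover the catalogue or only the popular head.

from datetime import datetime

events = [
    {"venue": "Cafe X", "collected": datetime(2024, 5, 1, 10, 0),
     "used_in_model": datetime(2024, 5, 1, 10, 7)},
    {"venue": "Duomo", "collected": datetime(2024, 5, 1, 11, 0),
     "used_in_model": datetime(2024, 5, 1, 12, 30)},
]

# Latency: how stale is the data by the time the model applies it?
lags = [e["used_in_model"] - e["collected"] for e in events]
print("max latency:", max(lags))                 # 1:30:00 -> possibly too stale

# Sampling/coverage: are we only ever observing the most popular places?
catalogue = {"Cafe X", "Duomo", "Shop Y", "Museum Z"}
seen = {e["venue"] for e in events}
print("coverage:", len(seen) / len(catalogue))   # 0.5 -> half never observed
```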

2. User model acquisition

  • How the user model is acquired
  • Goal – check that input data is interpreted correctly
  • Criteria
    • Validity – whether the interpretation is valid, e.g. whether a click really indicates user interest
    • Predictability – when we look at the user model, can we predict a particular parameter from it, and how valid is that prediction?
    • Scrutability – the accountability and transparency of the algorithms. People want to know how user model acquisition works → challenging, but it shouldn't be ignored
  • Methods
    • **Data mining – most appropriate here because we collect a lot of data: explore the data we capture
    • Cross validation – checks whether we are correctly predicting what we want to know about the user (sketched in code after this list)
    • Heuristic evaluation – used for scrutability and validity: state what we want this component to do, and check that the heuristics are met
      • E.g. a heuristic could be fairness: information must be processed fairly so that it doesn't advantage certain people
    • User tests – similar to data mining, BUT with a clear indication of who the user is and what was predicted
    • Simulated users – we can hire users, give them scenarios and ask them to perform the tasks; a good compromise before going into the real world
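
A sketch of the cross-validation idea using scikit-learn on made-up features and labels (click rate and dwell time are assumptions, not from the lecture): if behaviour signals cannot predict the interest the interpretation layer claims they indicate, the interpretation is suspect.

```python
# Cross-validate a toy "clicks => interest" interpretation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 2))                  # columns: click rate, dwell time
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy ground-truth "interested" label

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("mean CV accuracy:", scores.mean())
# A score near chance would mean the interpretation is not predictive
# and this layer needs rethinking before evaluating anything downstream.
```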

3. User model

  • Goal – check that the constructed user model represents the users
  • Prime criteria
    • Validity
    • Predictability – does the model predict the user's behaviour?
    • Scrutability – can the user open the user model, and will they understand it well enough to validate the parameters the system uses?
  • Secondary criteria
    • Conscientiousness – are we collecting ethically appropriate data? Is it all relevant for the user model, or are we storing anything useless?
    • Comprehensibility – can the user understand the model?
    • Precision and sensitivity – close to validity: how accurate are we about the user's characteristics, and are there groups for which the system doesn't work well? (a per-group check is sketched after the methods list below)
  • Methods
    • Because we are looking at implicit user modelling, it is about the data collected about the user → hence data-driven evaluation (lecture 16)
      • Predictability: how accurate are we about what we predicted for the user?
    • Cross validation – allows us to reshuffle the data, training on some users and testing on different users
    • Heuristic evaluation – brings in the user aspect of the user model: we set criteria and inspect the model against them
      • E.g. the system should collect my restaurant choices and destinations – but if it is collecting all my travel, it is modelling my travel pattern, which is unnecessary. With heuristics you would identify that irrelevant data is being collected
    • User tests – real users are needed to test comprehensibility, especially users with impairments, to check whether they are disadvantaged in any way
    • Simulated users
    • Focus groups – similar to heuristics, but bringing people together to hear what they think – scrutability, comprehensibility, etc.
    • User as wizard – helps with the more technical evaluation: the user imagines being the system and says what the system should do in that situation
      • Based on this we check the actual system, or use the users' suggestions to inspire the algorithm
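
A per-group precision/sensitivity check with invented labels and groups: the point is that a model can look fine overall while under-serving one subgroup.

```python
# Compare model quality across (hypothetical) user groups.

from sklearn.metrics import precision_score, recall_score

# true interest vs. model prediction, split by user group (made-up data)
y_true = {"group_a": [1, 1, 0, 1, 0, 1], "group_b": [1, 0, 1, 1, 0, 0]}
y_pred = {"group_a": [1, 1, 0, 1, 0, 0], "group_b": [0, 0, 0, 1, 1, 0]}

for group in y_true:
    p = precision_score(y_true[group], y_pred[group])
    r = recall_score(y_true[group], y_pred[group])   # a.k.a. sensitivity
    print(f"{group}: precision={p:.2f} sensitivity={r:.2f}")
# A clearly lower score for one group flags users the model under-serves.
```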

4. User model application (deciding upon adaptation)

  • Goal – determine whether the adaptation decisions made are optimal and appropriate
  • Criteria
    • Appropriateness – was this an appropriate way to apply the user model?
    • Acceptance – is this acceptable to the user?
    • Predictability – can we predict how the user will respond to the adaptation?
    • Scrutability – is it easy to understand why the adaptation happened?
    • Breadth of experience – does the adaptation limit what the user gets to see?
  • Methods
    • Data mining, cross validation → check for appropriateness and predictability: in which cases did the adaptation work? (a log-mining sketch follows this list)
  • To check acceptance, scrutability and breadth of experience → the human aspect is needed
    • Heuristic evaluation / walkthrough
    • User tests, simulated users
  • Taking user perspective
    • Focus groups
    • User as wizard
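
A log-mining sketch with an invented log format: the acceptance rate of adaptation decisions, which can then be sliced by context to see in which cases adaptation worked.

```python
# How often were the adaptation decisions "accepted", i.e. a recommended
# place actually visited or wishlisted? (log entries are made up)

log = [
    {"recommended": "Cafe X", "outcome": "visited"},
    {"recommended": "Shop Y", "outcome": "ignored"},
    {"recommended": "Duomo",  "outcome": "wishlisted"},
    {"recommended": "Bar Z",  "outcome": "ignored"},
]

accepted = sum(1 for r in log if r["outcome"] in ("visited", "wishlisted"))
print(f"acceptance rate: {accepted / len(log):.0%}")   # 50%
# Slicing this rate by context (time of day, distance, user group) shows
# *in which cases* the adaptation worked, not just how often.
```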

5. Applying adaptation decisions (user interface)

  • Goal – determine whether the implementation of the adaptation decisions is optimal
  • We would consider the wishlist, nearby events and places, paths from where the user is, and the list of suggested places of interest
  • Criteria – more from the user perspective; typically choose one or two
    • Usability – mainly this
    • Timeliness – is the adaptation delivered at the right time, e.g. the path shown when it is needed?
    • Obtrusiveness – is the user's task interrupted? Users may find the characters presented, the language, etc. obtrusive
    • Acceptance
    • Predictability
    • Breadth of experience
  • Methods
    • Heuristic evaluation, cognitive walkthrough – can be conducted by experts or developers putting themselves in the users' position, going through the app to see what it presents
      • Good for timeliness and obtrusiveness
      • Acceptance is hard to evaluate this way
    • User tests – usability and acceptance
    • Focus groups – can evaluate obtrusiveness and breadth of experience
    • User as wizard – ask users to act as the adaptation: propose the path, the best choice and its justification

6. Evaluating the system as a whole

  • Goal – to evaluate the utility of the system (benefits to the user and to businesses) AND its potential drawbacks
  • Typical benefits:
  • User perspective – through user-driven evaluation, e.g. A/B testing with and without the adaptive features (a minimal analysis sketch closes this section)
    • *Efficiency (performing tasks quicker) – did users find a restaurant quickly?
    • *Effectiveness (performing tasks better) – did they find the best choice?
    • User satisfaction (users may be happier with personalised systems)
    • Trust (users may feel more valued) – do they trust what the system offers?
    • Accessibility (all categories of users catered for) – e.g. do suggested paths take movement impairments into account?
  • Business perspective
    • Improved revenue (e.g. in e-commerce) – will the added personalisation improve outcomes, e.g. sales (or learning, in educational systems)?
    • Customer loyalty (e.g. in e-commerce) – will users come back and use the system again? Will they recommend it?
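
A minimal sketch of analysing such an A/B test with SciPy, on made-up task timings: compare how long users took to find a restaurant with and without personalisation.

```python
# Compare task-completion times between the control and personalised variants.

from scipy import stats

# seconds to complete "find a restaurant you like", per participant (invented)
control      = [95, 110, 87, 102, 120, 98, 105, 115]   # no personalisation
personalised = [70, 85, 92, 66, 78, 88, 74, 81]        # adaptive variant

t, p = stats.ttest_ind(personalised, control)
print(f"t={t:.2f}, p={p:.4f}")
# A small p-value supports the efficiency claim; effectiveness, satisfaction
# and trust still need their own measures (quality of choice, questionnaires).
```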

Typical usability threats

(figure: typical usability threats)
