
rri-204-2 [project transparency] #37

Open
4 of 6 tasks
chrisdburr opened this issue Feb 13, 2023 · 6 comments

chrisdburr commented Feb 13, 2023

The following root file has been created:

  • Title: Project Transparency
  • Module: Explainability (SAFE-D Module)
  • Skills Track: RRI
  • Section: 2

Tasks

  • Review root file (Clau)
  • Answer comments (Chris)
  • Draft web version (Clau)
  • Review web version (Chris)
  • Create slides (Chris)
  • Review slides and draft script (Clau)

Link to file: https://github.com/alan-turing-institute/turing-commons/blob/drafts/drafts/rri-skillstrack/rri-modules/root-files/rri-204-2.md

@chrisdburr added the `new root file` label Feb 13, 2023

ClauFischer commented Feb 15, 2023

  • @chrisdburr I think this sentence may be a bit confusing. It suggests that what has changed is people's purchasing behaviour (which may have happened), but what we have been exploring is the change in recommendations by the site's recommendation algorithm. We haven't yet established that these recommendations have then changed people's actual purchasing patterns. Unless I am missing something, I would suggest changing to something like "... that explains the change in the distribution of the system's recommendations".

Here, it would be easy enough to identify that the variable `season` is an important feature used by the model that explains the change in people's purchasing behaviour.

  • @chrisdburr Again this seems to be another case where we are not keeping a clear enough distinction between the model's behaviour and the customers' final purchasing behaviours (which are of course correlated). I would suggest changing the sentence below to something like "Here, it is the customers' actual behaviour which has changed drastically (they are now purchasing fewer holiday packs). However, it is sensible for the data analysts to assume that there has been another underlying shift in the data distribution (similar to the seasonal shift above), which has changed the model's behaviour in a way that affects the customers' final behaviour."

Again, this may seem like another case where there is a need to explain the model's behaviour in terms of an underlying shift in the data distribution—as with the seasonal shift above.

@chrisdburr (Collaborator, Author)

@ClauFischer, please take a look at this revised section when you have a chance:

Consider the following scenario.[^ambiata]
A team of data analysts who work for a travel booking website are asked to explain why a model has drastically changed its predictions about customer purchasing behaviour.
Perhaps the model is recommending significantly more trips to beach resorts now instead of ski trips.
Here, if the features used by the model were investigated, it would be easy enough to identify that `season` is a feature with high importance.
It is well known that customers alter their purchasing behaviour between seasons (e.g. Winter, Summer).[^example]
From this we could explain the change in the model's predictions as a result of a significant change in the data distribution, which itself is a representation of a change in the underlying phenomena (i.e. changing seasons).
Simple enough.
But now let's assume that there is another change in customer behaviour: this time, the conversion rate (i.e. the ratio of the number of people who view, say, a holiday deal, to the number who actually purchase it) suddenly drops.
That is, customers are not just booking different holidays; they are booking fewer holidays overall.
Again, this may seem like another case where there is a need to explain the model's behaviour in terms of an underlying shift in the data distribution, which is in turn representative of some change in the underlying phenomena.
However, this time, let's pretend that the problem turns out to be a fault with a third-party piece of software, used as a dependency in the team's data pipeline, which is now causing the data about a user's `location` to be incorrectly recorded.
As it turns out, the company's model has learned that those who live in affluent neighbourhoods are more likely to purchase more expensive packages, and the company's recommendation system uses this to show customers holidays that are in their predicted price range, or dynamically alter the price of holiday packages based on their estimated "willingness-to-pay"—two ethically dubious practices known as personalised and dynamic pricing[^pricing].
However, due to the aforementioned fault in the data pipeline, all customers are now being shown the same, more expensive, holiday deals because their `postcodes` are all being recorded as located in affluent neighbourhoods.
As such, fewer customers are purchasing their packages, because they cannot afford them, and the conversion rate has dropped.
Again, there is no fault with the model (or its parameters).
Rather, the target of any explanation lies in the data and the generative mechanisms responsible for producing the data.
The model is still making the same predictions, but the predictions are now incorrect.
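
As an illustrative aside (not part of the root file), the `season` explanation above could be sanity-checked with a crude permutation-style importance test: perturb one feature and count how many predictions change. The toy model, data, and `importance` helper below are all invented for this sketch:

```python
# Illustrative only: a toy, permutation-style check that `season` (and not,
# say, `age`) drives the model's predictions. Model and data are invented.

def model(features):
    # Toy "trained" model: recommends beach trips in summer, ski trips otherwise.
    return "beach" if features["season"] == "summer" else "ski"

data = [{"season": s, "age": a} for s in ("summer", "winter") for a in (25, 40, 60)]
baseline = [model(x) for x in data]

def importance(feature):
    # Permute one feature's values across the dataset (here: a fixed reversal,
    # to keep the sketch deterministic) and count how many predictions change
    # relative to the baseline.
    permuted_values = [x[feature] for x in data][::-1]
    permuted = [dict(x, **{feature: v}) for x, v in zip(data, permuted_values)]
    return sum(model(p) != b for p, b in zip(permuted, baseline)) / len(data)

print(importance("season"))  # 1.0: every prediction flips with the season
print(importance("age"))     # 0.0: the model ignores age entirely
```

A high score for `season` and a zero score for `age` would support the analysts' explanation that the seasonal shift in the data distribution, not the model itself, accounts for the changed predictions.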

ClauFischer commented Feb 16, 2023

@chrisdburr:

  • Here it says recommending when it should only be predicting at this stage.

    Perhaps the model is recommending significantly more trips to beach resorts now instead of ski trips.

  • I think that if we are changing to a recommender system now we should say something about it being a different model (that it's now a recommender system and not only a predictive system).

    Again, this may seem like another case where there is a need to explain the model's behaviour in terms of an underlying shift in the data distribution, which is in turn representative of some change in the underlying phenomena.

  • Perhaps a footnote should be added to say that there may be a fault in the fact that the model is discriminating based on location, even though there is no error in the way the model is operating (the fault is in the data pipeline). Although, on the other hand, you have already mentioned that the practice is ethically dubious.

    Again, there is no fault with the model (or its parameters).

Let me know what you think about these suggestions and I can draft some amendments and send them for your revision 😃

@chrisdburr (Collaborator, Author)

I think I've answered all your comments. Main changes are as follows:

- Determining the Problem the System is Designed to Address: this task includes information about why the problem is important and why the technical description (e.g. the translation of the set of input variables into target variables) is adequate for the problem at hand. For instance, why a set of features about a candidate are adequate measures for assessing their `suitability for a job role`. Aside from the technical "solution" to the problem, there is also a social dimension that needs to be justified, such as why an automated system is appropriate for use in hiring decisions (e.g. the system is not biased against protected groups).

Consider the following scenario.[^ambiata]
A team of data analysts who work for a travel booking website are asked to explain why a model has altered its predictions about customer purchasing behaviour.
This time, the model is used to drive a recommender system, which shows holiday packages to customers based on its predictions about which are most likely to be purchased.
Perhaps the system is recommending significantly more trips to beach resorts, whereas previously it was recommending ski trips.
Here, if the features used by the model were investigated, it would be easy enough to identify that `season` is a feature with high importance for the model.
It is well known that customers alter their purchasing behaviour between seasons (e.g. Winter, Summer).[^example]
From this we could explain the change in the model's predictions as a result of a significant change in the data distribution, which itself is a representation of a change in the underlying phenomena (i.e. changing seasons).
Simple enough.
But now let's assume that there is another change: the conversion rate (i.e. the ratio of the number of people who view, say, a holiday deal, to the number who actually purchase it) suddenly drops.
That is, customers are not just booking different holidays; they are booking fewer holidays overall.
Again, this may seem like another case where there is a need to explain the model's behaviour in terms of an underlying shift in the data distribution, which is in turn representative of some change in the underlying phenomena.
However, this time, let's pretend that the problem turns out to be a fault with a third-party piece of software, used as a dependency in the team's data pipeline, which is now causing the data about a user's `location` to be incorrectly recorded.
As it turns out, the company's model has learned that those who live in affluent neighbourhoods are more likely to purchase more expensive packages, and the company's recommendation system uses this to show customers holidays that are in their predicted price range, or dynamically alter the price of holiday packages based on their estimated "willingness-to-pay"—two ethically dubious practices known as personalised and dynamic pricing[^pricing].
However, due to the aforementioned fault in the data pipeline, all customers are now being shown the same, more expensive, holiday deals because their `postcodes` are all being recorded as located in affluent neighbourhoods.
As such, fewer customers are purchasing their packages, because they cannot afford them, and the conversion rate has dropped.
Again, there is no fault with the model (or its parameters).
Rather, the target of any explanation lies in the data and the generative mechanisms responsible for producing the data.
The model is still making the same predictions, but the predictions are now incorrect and the recommender system is now unable to recommend the correct holiday packages to customers.
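
The pipeline-fault scenario can be sketched in a few lines. Everything here is invented for illustration (the postcodes, prices, and the `record_postcode`, `recommend_package`, and `conversion_rate` helpers); it only shows how a corrupted `location` field can depress the conversion rate while the model itself is unchanged:

```python
# Hypothetical sketch of the faulty-pipeline scenario above.
# All names, postcodes, and prices are invented for illustration.

AFFLUENT = {"AF1 1AA"}  # postcodes the model associates with affluence

def record_postcode(true_postcode, pipeline_faulty):
    # The faulty third-party dependency overwrites every customer's
    # location with the same (affluent) postcode.
    return "AF1 1AA" if pipeline_faulty else true_postcode

def recommend_package(postcode):
    # The model has learned to show pricier deals to affluent postcodes.
    return 3000 if postcode in AFFLUENT else 800

def conversion_rate(customers, pipeline_faulty):
    # customers: list of (budget, postcode) pairs; a customer only
    # purchases a recommended package they can afford.
    purchases = sum(
        1
        for budget, postcode in customers
        if recommend_package(record_postcode(postcode, pipeline_faulty)) <= budget
    )
    return purchases / len(customers)

customers = [(1000, "XY2 3BB"), (900, "XY2 3BB"), (4000, "AF1 1AA"), (850, "XY9 9ZZ")]

print(conversion_rate(customers, pipeline_faulty=False))  # 1.0: everyone sees an affordable deal
print(conversion_rate(customers, pipeline_faulty=True))   # 0.25: only the wealthiest customer buys
```

Note that `recommend_package` behaves identically in both runs; only its input data has been corrupted, which is exactly why the target of the explanation is the data pipeline rather than the model or its parameters.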

@ClauFischer (Contributor)

@chrisdburr Is this paper (https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=36708ab24406aa3ec931fb9ba3f4a6cd7c3bd4b6) the one you want to refer to in this line

[^pricing]: This example refers to a practice known as 'personalised pricing', or sometimes 'price discrimination'. Neither are new practices (see [here](https://www.washingtonpost.com/archive/politics/2000/09/27/on-the-web-price-tags-blur/14daea51-3a64-488f-8e6b-c1a3654773da/)), but the widespread use of algorithmic techniques is enabling more dynamic and hyper-personalised forms of both personalised pricing and price discrimination (see [this article](https://www.washingtonpost.com/archive/politics/2000/09/27/on-the-web-price-tags-blur/14daea51-3a64-488f-8e6b-c1a3654773da/)).

?

Right now, both links take you to the Washington Post article. I searched through our conversations and the paper I just posted was the one I sent as more academic. If it is not, let me know please so I can link to the correct paper.

@chrisdburr (Collaborator, Author)

I think I was intending for the second link to be the Guardian article you shared: https://www.theguardian.com/global/2017/nov/20/dynamic-personalised-pricing

I guess I didn't copy/paste the link properly. Sorry.
