Compounding Factor (KL divergence) Computation in results.ipynb #1

Open
Bernhard-Steindl opened this issue Mar 5, 2024 · 0 comments

Bernhard-Steindl commented Mar 5, 2024

Hello! 😊

Thank you very much for sharing the code you used for your paper on gender fairness in music recommendations. 🙏🏻
I have one question regarding the computation of the Compounding Factor metric, that is, the Kullback-Leibler divergence between the population distribution and the metric score distribution.

In the notebooks/results.ipynb file you define two functions in one cell:

def kl_event_diff(p, q):
    return - (p.loc['m'] * np.log2(p.loc['m']/q.loc['m'])) + (p.loc['f'] * np.log2(p.loc['f']/q.loc['f']))
def kl_divergence(p, q):
    return (p.loc['m'] * np.log2(p.loc['m']/q.loc['m'])) + (p.loc['f'] * np.log2(p.loc['f']/q.loc['f']))

The only difference between the two functions is that the first summand is preceded by a minus sign in the kl_event_diff function.
Later you just use the kl_event_diff function for computing the KL Divergence, and the kl_divergence function is not used at all in your notebook.

kl_diffs = groups_percentages_tmp.combine(compounding_factors_tmp,overwrite=False,func=kl_event_diff)
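
Written out, the difference between the two quoted functions is as follows. This is only a sketch: the shorthand $a$ and $b$ is introduced here for illustration and does not appear in the notebook, and $p$, $q$ are assumed to be the two-entry distributions over m and f that both functions receive.

${\displaystyle a = p_m \log_2 {p_m \over q_m}, \qquad b = p_f \log_2 {p_f \over q_f}}$
${\displaystyle a + b \quad \text{(returned by kl\_divergence)}}$
${\displaystyle -a + b = (a + b) - 2a \quad \text{(returned by kl\_event\_diff)}}$

The two results therefore differ by $2a$ and agree only when the m term is zero, for example when $p_m = q_m$.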

I could find the “Score Dist. (M/F)” values from the tables in your paper in the notebook output as well.
However, I could not find the “CompFct” values from your tables in the Jupyter notebook output.

The output for the Compounding Factor in the Jupyter notebook indicates that you indeed used the kl_event_diff function instead of the kl_divergence function.
However, I get the same "CompFct" values as in your paper only if I use the formula of the kl_divergence function.

I wonder if the kl_event_diff function was inadvertently used in results.ipynb, and if I am correct that the kl_divergence function was actually used for data analysis and reporting in the paper.
Is there a reason why the kl_event_diff function is used in the file instead of the function kl_divergence?

I guess the kl_divergence should actually be used for computing the Compounding Factor metric.

${\displaystyle CompFct^\mu = KL\Bigl(B||C^\mu\Bigr)}$
${\displaystyle KL\Bigl(P||Q\Bigr)=\sum _{x\in X}p(x)\log \left({p(x) \over q(x)}\right)}$
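
The sum in the KL formula above can be sketched in a few lines of NumPy. This is an illustration only, not code from the repository: the name `kl_divergence_bits` is hypothetical, and base 2 is assumed because the notebook functions use `np.log2`.

```python
import numpy as np


def kl_divergence_bits(p, q):
    """KL(P || Q) in bits for two discrete distributions of equal length.

    Hypothetical helper for illustration; assumes q(x) > 0 wherever p(x) > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with p(x) == 0 contribute nothing to the sum, by convention.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


# Identical distributions have zero divergence.
same = kl_divergence_bits([0.5, 0.5], [0.5, 0.5])
# A mismatched pair gives a strictly positive value.
different = kl_divergence_bits([0.5, 0.5], [0.25, 0.75])
```

Because every summand enters with a positive sign, the result is never negative, which is one way to tell this formula apart from a variant that flips the sign of one term.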

I appreciate your response. Thank you! ☺️
Best regards,
Bernhard


Example for computing CompFct for NDCG@10 for model ALS and STANDARD scenario (Paper Table 4).

Population distribution (B) = group percentages:
P = [m= 0.778929, f= 0.221071] (from the notebook output)
P = [m= 0.779, f= 0.221] (from the paper, rounded to 3 decimal digits)

ALS Score Distribution (Standard scenario):
Q = [m= 0.811982, f= 0.188018] (from the notebook output)
Q = [m= 0.812, f= 0.188] (from the paper, rounded to 3 decimal digits)

CompFct = ( P[m] * LOG2(P[m] / Q[m]) ) + ( P[f] * LOG2(P[f] / Q[f]) ) =

( 0.778929 * log2(0.778929 / 0.811982) ) + ( 0.221071 * log2(0.221071 / 0.188018) ) = 
= (-0.0467014006470215) + (0.0516508073193989) =
= 0.0049494066723774 ~= 0.005

The CompFct value 0.005 is written in the paper and the formula corresponds to kl_divergence.
But if I were to use the kl_event_diff formula, the result would be 0.09835220797.
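
The comparison can be reproduced with a short script. This is a sketch that assumes the two functions are exactly as quoted from notebooks/results.ipynb and that `p` and `q` are pandas Series indexed by 'm' and 'f'; the variable names `comp_fct` and `event_diff` are chosen here for illustration.

```python
import numpy as np
import pandas as pd


# Both functions as quoted from the notebook cell.
def kl_event_diff(p, q):
    return - (p.loc['m'] * np.log2(p.loc['m']/q.loc['m'])) + (p.loc['f'] * np.log2(p.loc['f']/q.loc['f']))


def kl_divergence(p, q):
    return (p.loc['m'] * np.log2(p.loc['m']/q.loc['m'])) + (p.loc['f'] * np.log2(p.loc['f']/q.loc['f']))


# Population distribution (B) and ALS score distribution (Standard scenario),
# taken from the notebook output listed in this example.
p = pd.Series({'m': 0.778929, 'f': 0.221071})
q = pd.Series({'m': 0.811982, 'f': 0.188018})

comp_fct = kl_divergence(p, q)    # matches the paper's CompFct after rounding
event_diff = kl_event_diff(p, q)  # what the notebook call computes

print(round(comp_fct, 3))    # 0.005
print(round(event_diff, 3))  # 0.098
```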

Table 4, NDCG@10 CompFct results. Image source:

Melchiorre, A.B., Rekabsaz, N., Parada-Cabaleiro, E., Brandl, S., Lesota, O., Schedl, M., 2021. Investigating gender fairness of recommendation algorithms in the music domain. Information Processing & Management 58, 102666. https://doi.org/10.1016/j.ipm.2021.102666
