Welcome To Our Project
Applications of Fairness:

Label Bias and Recovery of Ground Truth

By: Lina Battikha and Sai Poornasree Balamurugan


Mentor: Babak Salimi

Introduction

Bias in artificial intelligence is a serious issue, particularly for minority groups, as models can unintentionally learn unfair patterns from sensitive features like gender or race. This project focuses on latent variable discovery to mitigate such biases by identifying hidden, unbiased outcome labels that are not influenced by sensitive attributes. The importance of latent variables extends to clustering, distribution shift, and fairness, with our work specifically targeting fairness in tabular datasets. By analyzing patterns among observed features, we aim to promote fair decision-making by ensuring that sensitive attributes do not lead to biased outcomes.

Latent variable modeling involves the identification and characterization of hidden variables that influence the distribution and outcomes of observed data. These variables are not explicitly provided in the dataset but can be inferred by analyzing patterns and relationships among the observed features. By uncovering latent variables, researchers gain insights into underlying structures and dependencies within the data, enabling more accurate modeling, improved predictions, and a deeper understanding of complex systems.

Methods

Our approach uses an encoder-decoder methodology similar to the one in Scalable Out-of-Distribution Robustness in the Presence of Unobserved Confounders [3], while establishing independence between the sensitive feature and the fair label, as described in Group Fairness by Probabilistic Modeling with Latent Fair Decisions [1]. Given a dataset with an identifiable sensitive feature S, an observed biased label Y, and non-confounding features X, we treat Y as a proxy variable that the model uses to recover the true fair label Y_f.


The model consists of three main components (a rough code sketch follows this list):
  1. Encoder: A forward pass that produces pseudo_Y from X. pseudo_Y is the intermediate label used to establish independence from S; it avoids any direct dependence on S.
  2. Adversarial Branch: This branch tests whether pseudo_Y retains information about S by attempting to predict S from pseudo_Y. If it succeeds, pseudo_Y still depends on S. To counteract this, the encoder is trained through a Gradient Reversal Layer (GRL), which flips the adversary's gradients so the encoder learns to strip S-related information out of pseudo_Y. While the adversary minimizes the loss between predicted and true S, the GRL forces the encoder to maximize that same loss, reducing the dependence. This adversarial game plays out during backpropagation, with the encoder and adversary working against each other; once pseudo_Y becomes independent of S, the adversary can no longer predict S, which is exactly the desired outcome.
  3. Decoder: The decoder predicts Y from S and pseudo_Y. Although we remove unfair dependencies of S on Y, some fair dependencies may remain; for this reason, both pseudo_Y and S are used to reconstruct Y, keeping the output accurate as well as fair.
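To make the three components concrete, here is a minimal PyTorch sketch of the encoder, gradient-reversal adversary, and decoder. The layer sizes, the `FairLabelModel` name, and the `lambda_` reversal strength are illustrative assumptions rather than the exact architecture we trained.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda_ on the way back."""
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)

class FairLabelModel(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        # Encoder: X -> pseudo_Y (the intermediate fair label)
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Adversarial branch: tries to predict S from pseudo_Y (through the GRL)
        self.adversary = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Decoder: reconstructs the observed (possibly biased) Y from pseudo_Y and S
        self.decoder = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, s, lambda_=1.0):
        pseudo_y = torch.sigmoid(self.encoder(x))                # intermediate fair label
        s_hat = self.adversary(grad_reverse(pseudo_y, lambda_))  # adversary sees reversed gradients
        y_hat = self.decoder(torch.cat([pseudo_y, s.float().unsqueeze(1)], dim=1))
        return pseudo_y, s_hat, y_hat
```

During training, both the adversary loss (predicted S vs. true S) and the reconstruction loss (predicted Y vs. observed Y) would be minimized with a criterion such as `BCEWithLogitsLoss`; because the adversary's gradient is flipped by the GRL before reaching the encoder, the encoder is pushed to make pseudo_Y uninformative about S while the decoder keeps Y recoverable from (pseudo_Y, S).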

We applied this model to two scenarios. In the first, both the sensitive feature and the observed label are binary, meaning the values in each column are either 1 or 0. In the second, the sensitive feature is binary (1 or 0), while the observed label is multi-class (not limited to 1 or 0).


The graph below is a representation of the underlying causal graph of our model. We know that Y_biased follows the same distribution as Y_fair, but with its probabilities shifted by S (the sensitive feature); we recover Y_fair by removing the dependence of Y_biased on S.
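In symbols, the structure described above can be sketched as follows; this is our own shorthand for the graph, not notation taken verbatim from [1] or [3]:

```latex
% Assumed structure: the latent fair label and the sensitive feature
% jointly generate the observed biased label.
S \;\rightarrow\; Y_{\text{biased}} \;\leftarrow\; Y_{\text{fair}},
\qquad Y_{\text{biased}} \sim P(\,\cdot \mid Y_{\text{fair}},\, S\,),
\qquad \text{goal: } \widehat{Y}_{\text{fair}} \perp S .
```

Intuitively, the decoder can absorb the S-dependent part of Y_biased, so the recovered label (pseudo_Y) can match the fair distribution while remaining independent of S.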




Datasets Implemented

We evaluate our model on both synthetic and real-world datasets.

  1. Binary Synthetic Dataset: A controlled dataset with 5,000 rows and 30 features, used to assess model performance under ideal conditions. It includes a sensitive binary feature S, a feature matrix X influenced by S, and a target variable Y derived from X with noise. Fair labels are unbiased, while unfair labels introduce bias, making Y = 1 more likely when S = 1 (see the generation sketch after this list).

  2. UCI Adult dataset [5]: A census-based dataset with 32,561 rows and 5 non-sensitive features, used for income classification (>50K or ≤50K). It includes demographic and employment variables such as age, education, occupation, work class, marital status, and weekly hours worked. The sensitive feature is gender (1 = male, 0 = female). It reflects real-world income disparities and is widely used in fairness research.

  3. COMPAS dataset: Criminal justice data gathered by ProPublica [4], used to estimate the probability of recidivism (reoffending) from demographic and criminal-history information. The dataset has 7,214 rows and 5 non-confounding features, predicting recidivism (1 = reoffend, 0 = not). The sensitive feature is race (1 = African-American, 0 = otherwise). It highlights racial biases in predictive policing and algorithmic fairness debates.

  4. UCI Drug (Cannabis) Consumption Dataset: Originally from the UCI Machine Learning Repository [2], with 1,884 rows and 10 non-confounding features, used to predict cannabis use across four levels. It includes information about how frequently an individual uses cannabis, along with personal information such as gender, ethnicity, and country. The sensitive feature is education level (1 = higher education, 0 = no higher education).
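As referenced in item 1 above, here is a minimal sketch of how a synthetic dataset of this shape could be generated. The coefficients, noise scale, and the 0.3 bias rate are illustrative assumptions, not the exact values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 30

# Base features Z are drawn independently of the sensitive attribute
Z = rng.normal(size=(n, d))
S = rng.integers(0, 2, size=n)

# Fair label: derived from Z plus noise, so it carries no bias with respect to S
w = rng.normal(size=d)
scores = Z @ w + rng.normal(scale=1.0, size=n)
Y_fair = (scores > np.median(scores)).astype(int)

# Observed feature matrix X is influenced by S (an assumed mean shift)
X = Z + 0.5 * S[:, None]

# Biased (observed) label: flip some negatives to positives when S = 1,
# making Y = 1 more likely for the S = 1 group
Y_biased = Y_fair.copy()
flip = (S == 1) & (Y_fair == 0) & (rng.random(n) < 0.3)   # assumed bias rate
Y_biased[flip] = 1
```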

Evaluation Metrics

Once we obtained the fair latent variable from the encoder-decoder model, we evaluated its performance using logistic regression. A baseline model was trained on X and the observed Y, while a second model was trained on the preprocessed X and the fair labels (pseudo_Y). Both were evaluated with the following metrics:

  1. Demographic Parity Difference: Measures the absolute difference in class label proportions across sensitive groups, ideally close to 0 to ensure fairness.
  2. Area Under the Curve (AUC): Evaluates the model's ability to distinguish between classes, using a one-vs-rest approach for multi-class cases.
  3. Accuracy: The proportion of correctly predicted labels, providing an overall performance measure.

These metrics help quantify fairness and performance in the learned fair representations.
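Below is a minimal sketch of how this evaluation could be run for a binary label. The helper names and the use of scikit-learn are our own illustration, not the exact evaluation code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score

def demographic_parity_difference(y_pred, s):
    """Absolute difference in positive-prediction rates between the two sensitive groups."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

def evaluate(X_train, y_train, X_test, y_test, s_test):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return {
        "demographic_parity_diff": demographic_parity_difference(y_pred, s_test),
        "auc": roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
        "accuracy": accuracy_score(y_test, y_pred),
    }

# Baseline model: train on X with the observed Y.
# Fair model: train on the preprocessed X with the recovered pseudo_Y labels.
```

For the multi-class case, `roc_auc_score(y_test, clf.predict_proba(X_test), multi_class="ovr")` gives the one-vs-rest AUC mentioned above.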

Results


Synthetic Dataset (Binary)


The fair model achieved high accuracy while significantly reducing the demographic parity difference compared to the injected-bias scenario. The baseline scenarios serve as a reference, showing that the model effectively removes label bias and recovers fair labels. The observed label is binary.


UCI Adult Dataset (Binary)


On real-world data, the fair latent variable model reduces the demographic parity difference while maintaining nearly the same accuracy. This demonstrates the model's ability to mitigate bias while preserving performance. The observed label is binary.


COMPAS Dataset (Binary)


The fair latent variable model effectively removes bias, reducing the demographic parity difference while maintaining the same AUC score. This indicates that the model is able to balance fairness with predictive capability. The observed label is binary.


Cannabis Consumption Dataset (Multiclass)


In a setting where the observed label is multi-class, the model reduces demographic parity difference, indicating the mitigation of label bias. However, there is a drop in the area under the curve (AUC) score, highlighting a common fairness-performance trade-off.


Discussion

We tested our model on both synthetic and real-world datasets with binary and multi-class labels to assess its efficacy and robustness. We evaluated fairness using the demographic parity difference and performance using AUC or accuracy.

Binary Data Results
Our model effectively reduced the demographic parity difference while maintaining high performance. On synthetic data, it successfully removed the injected bias, aligning accuracy and demographic parity difference with the original fair data. For the real-world datasets:
UCI Adult: Maintained accuracy while reducing the demographic parity difference by 50%.
COMPAS: Maintained AUC while lowering the demographic parity difference by 30%.

Multi-Class Data Results
The fairness-performance trade-off was more pronounced. For the Cannabis Consumption dataset, the demographic parity difference decreased by 33%, but AUC dropped by 13%, indicating the need for improvements in the multi-class setting.

Our model enables recovery of the fair label's distribution by enforcing independence between the sensitive feature and the recovered label. However, this often comes at the cost of reduced performance, especially in multi-class cases.

EXPLORE ALL THE RESULTS YOURSELF! Find all our code HERE

Limitations & Future Work

Limitations: A key limitation was the limited availability of diverse, up-to-date datasets for evaluating fairness algorithms. While commonly used datasets serve as benchmarks, they may not fully capture real-world complexities. Additionally, identifying sensitive features is often challenging, and handling multiple sensitive attributes within a dataset adds further complexity.

Future Work: While our work this quarter demonstrated the model's efficacy on both synthetic and real-world datasets, future work needs to focus on improving its robustness. One key area is enhancing performance when the sensitive feature is binary, but the observed label is multi-class. More testing is needed on datasets with varying class counts, as our current multi-class evaluations were limited to four classes. Additionally, the model should be extended to handle cases where the sensitive attribute is not binary, for both binary and multi-class observed labels.
Another important step is evaluating Y_pred. While we focused on pseudo_Y to assess independence, real-world cases may involve fair dependencies between the sensitive feature and Y. Once independence is well established, evaluating Y_pred can help balance fairness and accuracy. Scalability should also be explored by testing on larger datasets beyond our current limit of ~32,000 rows to ensure reliability in real-world applications. Finally, it is crucial to develop guidelines on when to prioritize independence over accuracy and how to adjust the lambda value accordingly. These improvements will help refine the model for broader and more practical applications.

Conclusion

Our work focuses on developing a fairness-aware algorithm that discovers latent variables to mitigate biases in predictive models. By leveraging an encoder-decoder framework with adversarial debiasing, we ensure that sensitive attributes do not unfairly influence predictions while preserving meaningful patterns in the data. Our approach is evaluated on well-established fairness datasets, demonstrating its ability to produce fair and unbiased labels. As data-driven decision-making continues to shape critical domains like healthcare and criminal justice, our method provides a robust solution for addressing structural biases and enhancing the trustworthiness of AI systems.

References

[1] Choi, Y., M. Dang, and G. Van den Broeck. 2021. "Group Fairness by Probabilistic Modeling with Latent Fair Decisions." Proceedings of the AAAI Conference on Artificial Intelligence 35 (13): 12051-12059. Paper Link
[2] Khadija, Obey. 2022. “Drug Consumptions UCI Dataset.” Dataset Link
[3] Prashant, Parjanya Prajakta, Seyedeh Baharan Khatami, Bruno Ribeiro, and Babak Salimi. 2025. “Scalable Out-of-Distribution Robustness in the Presence of Unobserved Confounders.” Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS). Paper Link
[4] ProPublica. 2016. “COMPAS Recidivism Analysis.” Data Link
[5] UCI Machine Learning Repository. 1996. “Adult Data Set.” Data Link