Datasets for Trustworthy ML
A curated collection of datasets commonly used for research and education in trustworthy machine learning.
Fairness Benchmarks
Traditional ML Datasets
Adult Income Dataset
UCI ML Repository | 48K samples | Tabular
COMPAS Recidivism
ProPublica | 7K samples | Tabular
Modern Fairness Benchmarks
Folktables
UC Berkeley | Millions of samples | Census data
```python
from folktables import ACSDataSource, ACSIncome

data_source = ACSDataSource(survey_year='2018', horizon='1-Year', survey='person')
acs_data = data_source.get_data(states=['CA'], download=True)
```
- Best for: Large-scale fairness evaluation, intersectionality
CelebA
CUHK | 200K images | Face attributes
Robustness Benchmarks
Adversarial Robustness
CIFAR-10/100
University of Toronto | 60K images | Object recognition
ImageNet
Stanford | 14M images | Object recognition
Distribution Shift
WILDS
Stanford | Multiple domains | Distribution shift
ImageNet-C
UC Berkeley | 15 corruption types | Corrupted images
```python
# Download from the authors' website; images are distributed as class
# folders, one directory per corruption type and severity level (1-5)
from torchvision import datasets

corrupt_data = datasets.ImageFolder('imagenet_c/gaussian_noise/5/')
```
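Robustness on ImageNet-C is conventionally summarized as mean Corruption Error (mCE): each corruption's error rates, summed over the five severity levels, are normalized by a baseline model's errors (AlexNet in the original benchmark) and then averaged across corruptions. A minimal sketch with hypothetical error rates:

```python
import numpy as np

def mean_corruption_error(model_err, baseline_err):
    """mCE: per-corruption errors summed over severities, normalized by a
    baseline model's errors for the same corruption, then averaged.

    model_err / baseline_err: dicts mapping corruption name to a list of
    error rates, one per severity level (1-5)."""
    ces = [sum(model_err[c]) / sum(baseline_err[c]) for c in model_err]
    return float(np.mean(ces))

# Hypothetical error rates for two corruptions at five severities
model = {'gaussian_noise': [0.2, 0.3, 0.4, 0.5, 0.6],
         'fog':            [0.1, 0.2, 0.3, 0.4, 0.5]}
baseline = {'gaussian_noise': [0.4, 0.5, 0.6, 0.7, 0.8],
            'fog':            [0.2, 0.3, 0.4, 0.5, 0.6]}
print(mean_corruption_error(model, baseline))  # ~0.708: better than baseline
```

Values below 1.0 mean the model is more corruption-robust than the baseline; the real benchmark averages over all 15 corruption types.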
Privacy Benchmarks
Federated Learning
LEAF
CMU | Multiple tasks | Federated setting
```python
# Use LEAF's reference data loader (models/utils/model_utils.py in the repo);
# read_data takes the train and test directories produced by LEAF's preprocessing
from utils.model_utils import read_data

clients, groups, train_data, test_data = read_data(
    'data/femnist/data/train', 'data/femnist/data/test')
```
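The cross-device setting that LEAF targets is usually trained with federated averaging (FedAvg): each client computes a local update, and the server takes a sample-size-weighted average of client parameters. A minimal numpy sketch of just the aggregation step (the client vectors and sizes here are hypothetical stand-ins for real model updates):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Two hypothetical clients holding 100 and 300 local samples
w_global = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [100, 300])
print(w_global)  # pulled toward the larger client: [0.25 0.75]
```

In a real federated run this aggregation is applied every round to the parameters returned by each sampled client.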
FLamby
Inria | Medical data | Cross-silo FL
```python
from flamby.datasets.fed_heart_disease import FedHeartDisease

dataset = FedHeartDisease(center=0, train=True)
```
Differential Privacy
Adult + DP Mechanisms
Google | Various | DP training examples
```python
from tensorflow_privacy.privacy.optimizers import dp_optimizer

optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0, noise_multiplier=1.1, learning_rate=0.01)
```
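Underneath DP-SGD sits the Gaussian mechanism: clip each contribution to bound sensitivity, then add Gaussian noise calibrated to (ε, δ). As a standalone illustration, here is the classical calibration σ = sensitivity · √(2 ln(1.25/δ)) / ε (valid for ε < 1) applied to a private mean query; the function name and parameters are illustrative, not part of any library:

```python
import math
import numpy as np

def gaussian_mechanism_mean(values, clip, epsilon, delta, rng):
    """(epsilon, delta)-DP mean via the classical Gaussian mechanism.

    Each value is clipped to [-clip, clip], so one record changes the sum
    by at most clip and the mean by at most clip / n (the sensitivity)."""
    n = len(values)
    clipped = np.clip(values, -clip, clip)
    sensitivity = clip / n
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return clipped.mean() + rng.normal(0, sigma)

rng = np.random.default_rng(0)
noisy_mean = gaussian_mechanism_mean(np.array([0.2, 0.9, -0.4, 0.5]),
                                     clip=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Note that DP-SGD composes many such noisy steps, so libraries like TensorFlow Privacy track the cumulative ε with a dedicated accountant rather than this single-query bound.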
Interpretability Datasets
Feature Attribution
Boston Housing
UCI | 506 samples | Regression
```python
# Deprecated: load_boston was removed in scikit-learn 1.2 due to ethical
# concerns; on newer versions use fetch_california_housing or fetch_openml.
from sklearn.datasets import load_boston  # requires scikit-learn < 1.2

X, y = load_boston(return_X_y=True)
```
Wine Quality
UCI | 6K samples | Regression/Classification
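For tabular regression data like Wine Quality, permutation importance is a simple model-agnostic attribution baseline: shuffle one feature at a time and measure the drop in score. A sketch using scikit-learn's `permutation_importance` on synthetic data (so it runs without downloading anything):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Target depends strongly on feature 0, weakly on 1, not at all on 2
y = 3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # feature 0 should dominate
```

The same call works unchanged on a real dataset; for correlated features, consider conditional or grouped permutation to avoid misleading attributions.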
Computer Vision
ImageNet + Attribution
Google | 14M images | Saliency evaluation
Specialized Domains
Natural Language Processing
BOLD
Amazon | Text generation | Bias evaluation
Winogender
Johns Hopkins | Coreference | Gender bias
Healthcare
MIMIC-III
MIT | 40K patients | Clinical records
Finance
German Credit
UCI | 1K samples | Credit risk
Dataset Usage Guidelines
Fairness Analysis
- Identify protected attributes in the dataset
- Define fairness metrics appropriate for the task
- Evaluate intersectional effects across multiple attributes
- Consider historical bias in data collection
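As a concrete starting point for the steps above, demographic parity difference and the equalized-odds gap can be computed directly from predictions and a binary protected attribute. A hedged numpy sketch (function names are illustrative; libraries like Fairlearn or AIF360 provide production versions):

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def equalized_odds_gap(y_true, y_pred, group):
    """Max of the TPR gap and FPR gap between two groups."""
    gaps = []
    for label in (0, 1):  # label 1 -> TPR gap, label 0 -> FPR gap
        mask = y_true == label
        r0 = y_pred[mask & (group == 0)].mean()
        r1 = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(r0 - r1))
    return max(gaps)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_diff(y_pred, group))        # 0.25
print(equalized_odds_gap(y_true, y_pred, group))     # 0.5 (driven by TPR gap)
```

For intersectional analysis, replace the binary `group` with a categorical encoding of attribute combinations and compare rates across all subgroups.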
Robustness Testing
- Start with clean accuracy as baseline
- Apply systematic attacks with increasing strength
- Test multiple threat models (white-box, black-box)
- Evaluate on distribution shifts relevant to deployment
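The "increasing strength" step is typically reported as an accuracy-vs-ε curve under an attack such as FGSM, x' = x + ε·sign(∇ₓL). For binary logistic regression the input gradient has a closed form, which allows a fully self-contained sketch (the linear model here is a toy stand-in for a real classifier):

```python
import numpy as np

def fgsm_logistic(x, y, w, b, eps):
    """FGSM for binary logistic regression with labels y in {-1, +1}.

    For L = -log sigmoid(y * (w.x + b)), the input gradient is
    -y * sigmoid(-y * (w.x + b)) * w, so sign(dL/dx) = -y * sign(w)."""
    grad_sign = -y[:, None] * np.sign(w)[None, :]
    return x + eps * grad_sign

rng = np.random.default_rng(0)
w, b = np.array([2.0, -1.0]), 0.0
x = rng.normal(size=(200, 2))
y = np.sign(x @ w + b)  # perfectly separable labels for this toy model

for eps in (0.0, 0.1, 0.5):
    x_adv = fgsm_logistic(x, y, w, b, eps)
    acc = (np.sign(x_adv @ w + b) == y).mean()
    print(eps, acc)  # accuracy falls monotonically as eps grows
```

For neural networks the gradient sign comes from autodiff instead, and stronger iterated attacks (PGD, AutoAttack) should follow FGSM in the evaluation.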
Privacy Evaluation
- Implement membership inference attacks as baseline
- Measure privacy-utility trade-offs across ε values
- Test reconstruction attacks where applicable
- Validate privacy guarantees with formal analysis
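The standard membership-inference baseline mentioned above is a loss-threshold attack: predict "member" when a sample's loss falls below a threshold, exploiting the fact that training points tend to have lower loss than held-out points. A sketch on synthetic loss distributions (the exponential parameters are hypothetical):

```python
import numpy as np

def loss_threshold_attack(member_losses, nonmember_losses, threshold):
    """Balanced accuracy of the rule: predict member iff loss < threshold."""
    tp = (member_losses < threshold).mean()      # members correctly flagged
    tn = (nonmember_losses >= threshold).mean()  # non-members correctly rejected
    return (tp + tn) / 2

rng = np.random.default_rng(0)
member = rng.exponential(scale=0.1, size=1000)     # low train losses (hypothetical)
nonmember = rng.exponential(scale=0.5, size=1000)  # higher held-out losses
acc = loss_threshold_attack(member, nonmember, threshold=0.2)
print(acc)  # well above the 0.5 chance level -> memorization signal
```

Attack accuracy near 0.5 suggests little leakage; the gap above 0.5 is a useful empirical complement to the formal ε guarantee when sweeping privacy-utility trade-offs.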
Ethical Considerations
Data Usage
- Consent: Ensure appropriate consent for research use
- Bias: Acknowledge limitations and potential biases
- Privacy: Follow data protection regulations (GDPR, etc.)
- Attribution: Cite original data sources appropriately
Sensitive Attributes
- Protected characteristics: Handle race, gender, etc. with care
- Intersectionality: Consider multiple overlapping identities
- Historical context: Understand societal biases in data
- Representation: Ensure diverse and inclusive datasets
Dataset Deprecations
Some datasets (e.g., Boston Housing) have been deprecated due to ethical concerns. Always check for recommended alternatives and consider the ethical implications of your dataset choices.
Last updated: December 2024