Federated Learning with Unannotated Data

Undergraduate Thesis

Abstract

Federated Learning (FL) enables machine learning on edge devices without sharing private user data with untrusted third parties. Whereas traditional FL relies on supervised methods and annotated data, our approach learns representations from unlabelled data for use in downstream tasks.

We introduce IS-FedVAE, a novel FL framework using importance sampling to federate a global Variational AutoEncoder (VAE). This framework learns a global latent space distribution from local edge distributions. The representations are evaluated using a linear probe, showing superior performance compared to state-of-the-art unsupervised FL baselines. IS-FedVAE is scalable, robust to heterogeneity, and efficient across varying numbers of clients and local epochs.

Key Highlights:

  • Innovative Algorithm Design: Utilized VAEs to model clients' unlabelled, non-IID data as Gaussian distributions.
  • Advanced Aggregation Technique: Employed mean-field formulation and importance sampling for precise loss computation.
  • Performance Excellence: Achieved faster convergence and higher accuracy than state-of-the-art FL baselines.
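To make the aggregation idea above concrete, here is a minimal, hypothetical sketch of self-normalized importance sampling over client latent Gaussians: each client's latent posterior is a Gaussian, the global latent distribution is taken (for illustration only) to be their equal-weight mixture, and its moments are estimated by reweighting samples from a broad proposal. The client parameters, the mixture target, and the proposal are all illustrative assumptions, not the actual IS-FedVAE objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-client latent posteriors q_k = N(mu_k, sigma_k^2)
# (illustrative values, not learned VAE parameters).
mus = np.array([0.5, -0.3, 1.2])
sigmas = np.array([1.0, 0.8, 1.5])

def log_mixture(z):
    """Log-density of the equal-weight mixture of client Gaussians
    (stand-in for a global latent target)."""
    comp = (-0.5 * ((z[:, None] - mus) / sigmas) ** 2
            - np.log(sigmas * np.sqrt(2.0 * np.pi)))
    return np.logaddexp.reduce(comp, axis=1) - np.log(len(mus))

# Broad Gaussian proposal N(0, 2^2) that covers all client posteriors.
n = 200_000
z = rng.normal(0.0, 2.0, size=n)
log_q = -0.5 * (z / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

# Self-normalized importance weights (log-space for numerical stability).
log_w = log_mixture(z) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Importance-sampling estimates of the global latent mean and variance.
global_mean = np.sum(w * z)
global_var = np.sum(w * (z - global_mean) ** 2)
print(global_mean, global_var)
```

Because the weights are self-normalized, only density ratios are needed, so the same pattern applies when the target is known only up to a constant; the trade-off is a small bias that vanishes as the sample count grows.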

This work, published at ICASSP 2024, grounds federated representation learning in variational inference and importance sampling, advancing FL for personalized and unsupervised tasks.

Paper
Github
Poster
Thesis