Quick Start#

This page introduces the provided quick-start demos and the start scripts for the FL simulation system with FedLab in different scenarios. We implement the FedAvg algorithm with an MLP network and the MNIST dataset partitioned across clients.

Source code can be seen in fedlab/examples/.

Download datasets#

FedLab provides partition methods for commonly used datasets. In addition, LEAF [2], the baseline dataset suite for federated partitioning, is also implemented in the FedLab benchmarks repository and has been migrated from TensorFlow to PyTorch.

Code related to the dataset download process is available at fedlab_benchamrks/datasets/{dataset name}.

  1. Download MNIST/CIFAR10

$ cd fedlab_benchamrks/datasets/{mnist or cifar10}/
$ python download_{dataset}.py
  2. Federated partition

Run the following Python script to generate the partition files.

$ python {dataset}_partition.py

Source code of the data partition script:

import torchvision
from fedlab.utils.functional import save_dict
from fedlab.utils.dataset.slicing import noniid_slicing, random_slicing

root = "./data"  # illustrative path for storing the raw dataset
trainset = torchvision.datasets.CIFAR10(root=root, train=True, download=True)
# trainset = torchvision.datasets.MNIST(root=root, train=True, download=True)

data_indices = noniid_slicing(trainset, num_clients=100, num_shards=200)
save_dict(data_indices, "cifar10_noniid.pkl")

data_indices = random_slicing(trainset, num_clients=100)
save_dict(data_indices, "cifar10_iid.pkl")

data_indices is a dict mapping each client id to a list of data indices in the raw dataset. FedLab provides both random (I.I.D.) and non-I.I.D. partition methods; the non-I.I.D. method is a re-implementation of the shard-based partition described in the FedAvg paper.
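The shard-based non-I.I.D. scheme can be sketched in plain Python (a simplified re-implementation of the FedAvg partition idea for illustration, not FedLab's noniid_slicing itself; the label list and sizes below are toy values):

```python
import random

def noniid_shard_partition(labels, num_clients, num_shards, seed=0):
    """Sort sample indices by label, cut them into equal-sized shards,
    and assign num_shards // num_clients random shards to each client."""
    idx_sorted = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_size = len(labels) // num_shards
    shards = [idx_sorted[s * shard_size:(s + 1) * shard_size]
              for s in range(num_shards)]
    random.Random(seed).shuffle(shards)
    per_client = num_shards // num_clients
    return {cid: sum(shards[cid * per_client:(cid + 1) * per_client], [])
            for cid in range(num_clients)}

# toy example: 40 samples with 4 labels, partitioned over 4 clients via 8 shards
labels = [i % 4 for i in range(40)]
data_indices = noniid_shard_partition(labels, num_clients=4, num_shards=8)
```

Each client ends up with only a few shards, so its local label distribution is skewed, which is exactly the non-I.I.D. effect the FedAvg paper studies.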

  3. LEAF dataset processing

Please follow the FedLab benchmark to learn how to generate the LEAF-related data partitions.

Run FedLab demos#

FedLab provides implementations of standard asynchronous and synchronous federated learning for users to use or learn from. This page only introduces the FedAvg implementation in different scenarios.

We strongly recommend that users read the source code to understand FedLab's architecture and working mechanism (we believe our code is very reader-friendly).

1. Standalone (single machine, single process)#

Source code is under fedlab/examples/standalone-mnist. This is a standard usage of SerialTrainer, which allows users to simulate a group of clients within a single process.

$ python standalone.py --total_client 100 --com_round 3 --sample_ratio 0.1 --batch_size 100 --epochs 5 --lr 0.02

or

$ bash launch_eg.sh

Run the command above to start a single process simulating the FedAvg algorithm with 100 clients, 3 communication rounds in total, and 10 clients sampled randomly at each round.
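The server-side step that this demo repeats each round is FedAvg aggregation, which can be sketched in plain Python (parameters are flat lists here for illustration; FedLab operates on serialized model tensors):

```python
def fedavg_aggregate(client_params, client_weights):
    """Weighted average of client parameter vectors; the weights are
    typically the clients' local sample counts."""
    total = sum(client_weights)
    dim = len(client_params[0])
    return [sum(p[j] * w for p, w in zip(client_params, client_weights)) / total
            for j in range(dim)]

# two sampled clients with equal data sizes -> plain average
global_params = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [50, 50])
```

With `--sample_ratio 0.1` and 100 clients, 10 such parameter vectors are averaged per round to produce the next global model.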

2. Cross-process#

Source code is under fedlab/examples/cross-process-mnist

Launch a federated simulation system with 1 server and 2 clients.

$ bash launch_eg.sh

The content of launch_eg.sh is:

python server.py --ip 127.0.0.1 --port 3001 --world_size 3 --round 3 &

python client.py --ip 127.0.0.1 --port 3001 --world_size 3 --rank 1 &

python client.py --ip 127.0.0.1 --port 3001 --world_size 3 --rank 2  &

wait

The cross-process scenario allows users to deploy their FL system on a computer cluster, although in this case we set the server address to localhost. The three processes then communicate with each other following the standard FL procedure.

Note

Since the rank assigned by torch.distributed is unique for every process, we use the rank to represent the client id in this scenario.
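A client process can use its rank to select its share of the partition produced earlier, e.g. (a hedged sketch; the rank-to-key offset is an assumption for illustration, not FedLab's exact convention):

```python
def local_indices(data_indices, rank):
    """Pick this process's entry from the partition dict produced by the
    partition script. Rank 0 is the server, so client ranks start at 1;
    the rank - 1 offset is an illustrative assumption."""
    assert rank >= 1, "rank 0 is reserved for the server"
    return data_indices[rank - 1]

partition = {0: [0, 2], 1: [1, 3]}  # toy partition for 2 clients
my_indices = local_indices(partition, rank=1)
```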

3. Cross-process with SerialTrainer#

SerialTrainer uses fewer computing resources (a single process) to simulate multiple clients, while cross-process is suited for computer-cluster deployment, simulating a data-center FL system. In our experiments, the world size of torch.distributed cannot exceed 50 (depending on the cluster); otherwise the socket crashes, which limits the number of clients in the FL simulation.

To improve scalability, FedLab implements the Scale scenario, which combines the SerialTrainer module with the ClientManager module, so that a single process can simulate multiple clients while supporting multi-process, multi-machine communication.

Source codes are available in fedlab_benchamrks/algorithm/fedavg/scale/{experiment setting name}.

Here we take mnist-cnn as an example to introduce this scenario. In this demo, we set world_size=11 (1 server process and 10 client processes), and each ClientManager hosts one SerialTrainer that performs the local training of 10 clients. The data partition method is the same as in the FedAvg paper. In this scenario, we simulate a federated learning system of 100 clients with only 11 processes.
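The mapping from a process rank to the group of client ids its SerialTrainer simulates can be sketched as (a minimal sketch assuming contiguous blocks of 10 clients per rank; the exact assignment in the demo may differ):

```python
def clients_of_rank(rank, clients_per_trainer=10):
    """Client ids simulated by the SerialTrainer behind a given rank.
    Rank 0 is the server; client processes use ranks 1..10, each covering
    a contiguous block of client ids (an illustrative assumption)."""
    assert rank >= 1, "rank 0 is reserved for the server"
    start = (rank - 1) * clients_per_trainer
    return list(range(start, start + clients_per_trainer))

ids = clients_of_rank(3)  # clients handled by the third ClientManager
```

Ranks 1 through 10 together cover client ids 0 through 99, which is how 11 processes stand in for a 100-client federation.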

You can start this system from two terminals (we still use localhost in this example, but in our test experiments the multi-machine scenario also starts normally under a regular network configuration).

  1. server (terminal 1)

$ python server.py --ip 127.0.0.1 --port 3002 --world_size 11
  2. clients (terminal 2)

$ bash start_clt.sh 11 1 10 # launch clients from rank 1 to rank 10 with world_size 11

The content of the script start_clt.sh is:

for ((i=$2; i<=$3; i++))
do
{
    echo "client ${i} started"
    python client.py --world_size $1 --rank ${i} &
    sleep 2s # wait for gpu resources allocation
}
done
wait

4. Hierarchical#

The hierarchical mode of FedLab is designed for tasks running on multiple computer clusters (in different LANs) or in real-world scenarios. To enable inter-connection between different computer clusters, FedLab introduces the Scheduler as a middle-server process that connects client groups. Each Scheduler manages the communication between the global server and the clients in one client group, and the server communicates with clients in different LANs via the corresponding Scheduler. The computation mode of the client group behind each Scheduler can be either standalone or cross-process.
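The role of a Scheduler can be sketched as a message relay between the global server and its client group (a toy model using in-memory queues; FedLab's real implementation communicates via torch.distributed):

```python
from queue import Queue

class Scheduler:
    """Toy middle-server: forwards downlink messages from the global
    server to every client in its group, and uplink replies back."""
    def __init__(self, client_queues, server_queue):
        self.client_queues = client_queues  # one inbox per client in the group
        self.server_queue = server_queue    # the global server's inbox

    def broadcast(self, msg):
        for q in self.client_queues:
            q.put(msg)

    def collect(self, replies):
        for r in replies:
            self.server_queue.put(r)

server_inbox = Queue()
group = [Queue() for _ in range(2)]       # a client group with 2 clients
sched = Scheduler(group, server_inbox)
sched.broadcast("global_model_round_0")   # downlink: server -> clients
sched.collect(["update_from_c0", "update_from_c1"])  # uplink: clients -> server
```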

The demo of Hierarchical with hybrid clients (standalone and serial trainer) is given in fedlab/examples/hierarchical-hybrid-mnist.

Run all scripts together:

$ bash launch_eg.sh

Run scripts separately:

# Top server in terminal 1
$ bash launch_topserver_eg.sh

# Scheduler1 + Ordinary trainer with 1 client + Serial trainer with 10 clients in terminal 2:
$ bash launch_cgroup1_eg.sh

# Scheduler2 + Ordinary trainer with 1 client + Serial trainer with 10 clients in terminal 3:
$ bash launch_cgroup2_eg.sh