Overview of FedLab

Introduction

Federated learning (FL), first proposed by Google, is an emerging machine learning research area of recent years. It aims to protect personal data privacy during distributed machine learning, especially in finance, smart healthcare, and edge computing. Unlike traditional distributed machine learning with a central data center, participants in federated learning train local models on their own data and then collaborate with other participants through a specific strategy to obtain the final model, thereby avoiding direct data sharing.

To relieve the burden of researchers in implementing FL algorithms and free FL scientists from the repetitive implementation of basic FL settings, we introduce FedLab, a highly customizable framework. FedLab is built on top of the torch.distributed modules and provides the modules necessary for FL simulation, including communication, compression, model optimization, data partition, and other functional modules. FedLab users can build an FL simulation environment from custom modules, much like playing with LEGO bricks. For better understanding and ease of use, FL algorithm benchmarks implemented in FedLab are also provided.

For more details, please refer to our full paper.
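Since FedLab builds on torch.distributed, the minimal sketch below (not FedLab code; the rendezvous address and tensor shape are arbitrary placeholders) shows the kind of point-to-point primitive that its communication modules wrap: a rank-0 "server" process sends a flattened parameter tensor to a rank-1 "client" process.

```python
# Minimal torch.distributed sketch (illustrative, not FedLab's API):
# two local processes exchange a flattened parameter tensor.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank, world_size):
    # Hypothetical local rendezvous; FedLab wraps this kind of setup
    # inside its network modules.
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    params = torch.zeros(10)
    if rank == 0:                 # "server": send global parameters
        params += 1.0
        dist.send(params, dst=1)
    else:                         # "client": receive global parameters
        dist.recv(params, src=0)
        print(f"client received: {params}")
    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```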

Overview

[Figure: fedlab-overview.svg]

FedLab provides the two basic roles in federated learning: Server and Client. Each Server/Client consists of two components, a NetworkManager and a ParameterHandler/Trainer.

  • The NetworkManager module manages message processing tasks and provides interfaces for customizing communication protocols and compression.

  • The ParameterHandler is responsible for backend computation on the Server, while the Trainer is in charge of backend computation on the Client (a conceptual sketch of this split follows below).
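The split between communication and computation can be pictured with the toy classes below; the class names and the on_receive signature are illustrative assumptions, not FedLab's actual API.

```python
# Illustrative sketch of the NetworkManager / backend split (assumed names,
# not FedLab classes): the manager handles messages, the backend computes.
class ToyNetworkManager:
    """Receives packages and delegates computation to a backend object."""

    def __init__(self, backend):
        self.backend = backend

    def run(self, incoming_packages):
        for sender, payload in incoming_packages:
            # Communication concerns (protocol, compression) live here;
            # model computation is delegated to the backend.
            self.backend.on_receive(sender, payload)


class ToyServerBackend:
    """Server-side computation: collects client payloads for aggregation."""

    def __init__(self):
        self.buffer = []

    def on_receive(self, sender, payload):
        self.buffer.append((sender, payload))
```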

Server

The relationship between the NetworkManager and the ParameterServerHandler in the Server is shown in the figure below. The NetworkManager processes incoming messages and calls ParameterServerHandler.on_receive(), while the ParameterServerHandler performs the concrete server-side computation, such as the model training process (e.g., model aggregation) and the update of the global model.

[Figure: fedlab-server.svg]
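As a concrete (though simplified) example of the aggregation step mentioned above, the sketch below averages flattened client parameter vectors FedAvg-style; the function name and weighting scheme are illustrative, not FedLab's implementation.

```python
# Hedged sketch of server-side aggregation (FedAvg-style weighted average);
# not FedLab's ParameterServerHandler code.
import torch


def fedavg_aggregate(client_params, client_weights):
    """Weighted average of flattened client parameter tensors."""
    weights = torch.tensor(client_weights, dtype=torch.float32)
    weights = weights / weights.sum()                   # e.g. weight by local data size
    stacked = torch.stack(client_params)                # (num_clients, num_params)
    return (weights.unsqueeze(1) * stacked).sum(dim=0)  # new global parameters


# Example: three clients with equal weights.
clients = [torch.full((4,), float(i)) for i in range(3)]
global_params = fedavg_aggregate(clients, [1, 1, 1])
print(global_params)  # tensor([1., 1., 1., 1.])
```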

Client

The design and structure of the Client are very similar to those of the Server: the NetworkManager handles message processing and network communication with the server, while the Trainer is responsible for the client's local training process.

[Figure: fedlab-client.svg]
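Conceptually, the Trainer's local computation boils down to ordinary local SGD on the client's private data, as in the plain-PyTorch sketch below (not FedLab's Trainer API).

```python
# Plain-PyTorch sketch of a client's local update; illustrative only.
import torch
from torch import nn


def local_train(model, dataloader, epochs=1, lr=0.01):
    """Run local SGD on the client's private data and return the model."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```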

Communication

FedLab has built-in support for both synchronous and asynchronous federated communication patterns. The corresponding communication logic of the NetworkManager is illustrated below.

  1. Synchronous FL: each training round is initiated by the server. The server first randomly samples clients and broadcasts the global model to the selected clients; after receiving it, each client performs local training and then synchronizes its local information with the server (a minimal round loop is sketched after this list).

    [Figure: fedlab-synchronous.svg]
  2. Asynchronous FL [1]: each round is initiated by a client while the server waits. A client sends a request to the server; the server receives the request and sends back the model information; after receiving it, the client performs local training and then synchronizes with the server.

    [Figure: fedlab-asynchronous.svg]
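For item 1, a minimal synchronous round can be written as follows; sample_ratio, local_train, and aggregate are placeholder callables for illustration rather than FedLab functions.

```python
# Sketch of one synchronous round: sample -> broadcast -> local train -> aggregate.
import random


def synchronous_round(global_params, clients, sample_ratio, local_train, aggregate):
    # Server samples a subset of clients and broadcasts the global model.
    selected = random.sample(clients, max(1, int(sample_ratio * len(clients))))
    # Each selected client trains locally and returns its update.
    local_results = [local_train(client, global_params) for client in selected]
    # Server aggregates the returned local information into a new global model.
    return aggregate(local_results)
```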

Experimental Scenarios

FedLab supports simulating large-scale federated systems on a single machine or across multiple machines. The standalone mode is used for single-machine simulation, while the cross-machine and hierarchical modes support multi-machine and multi-cluster communication for simulating federated systems.

Standalone (single process)

FedLab implements SerialTrainer, a serial trainer that supports simulating multiple federated nodes when computational resources are limited. Using only a single process, it simulates the computation of multiple federated nodes; logically, the computation tasks are executed serially, one after another.

[Figure: fedlab-SerialTrainer.svg]
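The idea behind serial simulation can be summarized by the loop below (illustrative only, not the SerialTrainer API): all simulated clients share one process and are trained one after another.

```python
# Sketch of serial simulation: many clients, one process, sequential training.
def serial_local_training(global_params, client_datasets, local_train):
    """Simulate many federated clients inside a single process."""
    results = []
    for client_id, dataset in enumerate(client_datasets):
        # Each client starts from the same global parameters and trains in
        # turn, so only one local model occupies compute/memory at a time.
        updated = local_train(global_params, dataset)
        results.append((client_id, updated))
    return results
```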

Cross-process

FedLab enables FL simulation tasks to be deployed across multiple processes with the correct network configuration (these processes can run on a single machine or on multiple machines). For more flexible parallelism, SerialTrainer can directly replace the regular Trainer. Users can balance the computation burden among processes by choosing different Trainers; in practice, machines with more computation resources can be assigned a larger share of the workload.

Note

All machines must be in the same network (LAN or WAN) for cross-process deployment.

[Figure: fedlab-multi_process.svg]
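A cross-process deployment then amounts to pointing every process at the same rendezvous endpoint. The address, ranks, and client count below are placeholders for illustration, not FedLab configuration keys; the underlying call is the standard torch.distributed setup.

```python
# Hedged sketch of a cross-process network configuration (placeholder values).
import torch.distributed as dist

RANK = 1                    # unique per process: 0 = server, 1..N = workers
WORLD_SIZE = 3              # total processes across all machines
NUM_SIMULATED_CLIENTS = 8   # larger on machines with more compute (serial simulation)

dist.init_process_group(
    backend="gloo",
    init_method="tcp://192.168.1.10:3002",  # reachable address of the server machine
    rank=RANK,
    world_size=WORLD_SIZE,
)
```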

Hierarchical Architecture

The hierarchical mode of FedLab is designed for tasks spanning multiple computer clusters (in different LANs) and for real-world scenarios. To enable inter-connection between different computer clusters, FedLab introduces the Scheduler, a middle-server process that connects client groups. Each Scheduler manages the communication between the global server and the clients in one client group, so the server can communicate with clients in different LANs via the corresponding Scheduler. The computation mode of the client group behind each Scheduler can be either standalone or cross-process.

A hierarchical federated system with K client groups is shown in the figure below.

[Figure: fedlab-hierarchical.svg]
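Conceptually, a Scheduler relays packages between the global server and the clients of its group, as in the toy sketch below (pure illustration; not FedLab's Scheduler implementation).

```python
# Toy model of a middle-server Scheduler relaying packages between a global
# server and the clients of one group; all names are illustrative assumptions.
class ToyScheduler:
    def __init__(self, group_clients, uplink_send, uplink_recv):
        self.group_clients = group_clients   # clients inside this LAN
        self.uplink_send = uplink_send       # callable: send a package to the global server
        self.uplink_recv = uplink_recv       # callable: receive a package from the global server

    def downlink_round(self):
        """Forward the latest global model from the server to the local group."""
        global_package = self.uplink_recv()
        for client in self.group_clients:
            client.receive(global_package)

    def uplink_round(self):
        """Collect local updates from the group and relay them upstream."""
        for client in self.group_clients:
            self.uplink_send(client.local_update())
```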

Benchmarks

FedLab also contains data partition settings [2] and implementations of FL algorithms [3]. For more information, please see our FedLab-benchmarks repo. More benchmarks and FL algorithm demos are coming.
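As an example of one common data partition setting used in FL benchmarks, the sketch below performs a Dirichlet label-skew split; it is illustrative only and not the FedLab-benchmarks implementation.

```python
# Hedged example of a Dirichlet label-skew partition (illustrative only).
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with label-skewed proportions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        # Per-client proportions for this class; smaller alpha -> more skew.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, part in enumerate(np.split(cls_idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices


# Example: 1000 samples, 10 classes, partitioned across 5 clients.
labels = np.random.randint(0, 10, size=1000)
parts = dirichlet_partition(labels, num_clients=5)
print([len(p) for p in parts])
```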