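FedLab provides the modules needed for federated learning (FL) simulation, including communication, compression, model optimization, data partition, and other functional modules. Users can build an FL simulation environment from custom modules, like playing with LEGO bricks.
FedLab Overview#
Introduction#
Federated learning (FL), first proposed by Google, is an emerging area of machine learning research. FL aims to protect individual data privacy in distributed machine learning, especially in finance, smart healthcare, and edge computing. Unlike traditional distributed machine learning built around a data center, participants in FL train local models on their local data and then collaborate with other participants under a specific strategy to obtain the final model, which avoids sharing data directly.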
To relieve researchers of the burden of implementing FL algorithms and free them from repeatedly re-implementing basic FL settings, we introduce the highly customizable framework FedLab in this work. FedLab is built on top of torch.distributed and provides the necessary modules for FL simulation, including communication, compression, model optimization, data partition, and other functional modules. FedLab users can build an FL simulation environment with custom modules, like playing with LEGO bricks. For better understanding and ease of use, benchmark FL algorithms implemented in FedLab are also presented.
For more details, please read our full paper.
Overview#
FedLab provides two basic roles in FL: Server and Client. Each Server/Client consists of two components: NetworkManager and ParameterHandler/Trainer.
The NetworkManager module manages message processing and provides interfaces to customize communication agreements and compression. ParameterHandler is responsible for backend computation in Server, and Trainer is in charge of backend computation in Client.
Server#
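The relationship between NetworkManager and ParameterServerHandler in Server is shown in the figure below. NetworkManager processes messages and calls ParameterServerHandler.on_receive(), while ParameterServerHandler performs the actual computation and model training on the server (for example, model aggregation) and updates the global model.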
Client#
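The design and structure of Client are very similar to those of Server: NetworkManager handles message processing and network communication with the server, and Trainer handles the client's local training.
Communication#
FedLab has two built-in communication patterns, synchronous and asynchronous. The communication logic of the corresponding NetworkManager is shown in the figure below.
Synchronous FL: each round is started by the server. The server first samples clients at random and broadcasts the global model to the selected clients; after receiving it, each client performs local training and then synchronizes its local information with the server.
Asynchronous FL [1]: each round is started by a client while the server waits. The client sends a request to the server; the server receives the request and sends down the model information; the client then performs local training and synchronizes with the server.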
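The synchronous round described above can be sketched in plain Python. This is a toy, single-process illustration and not FedLab's API: `local_train`, `run_sync_round`, and the scalar "model" are hypothetical stand-ins chosen only to show the order of sampling, broadcasting, local training, and aggregation.

```python
import random

def local_train(global_model, local_data, lr=0.1):
    """Toy local update: move the scalar model toward the mean of the local data."""
    target = sum(local_data) / len(local_data)
    return global_model + lr * (target - global_model)

def run_sync_round(global_model, client_data, sample_ratio, rng):
    """One synchronous round: sample clients, broadcast, train locally, aggregate."""
    num_selected = max(1, int(len(client_data) * sample_ratio))
    selected = rng.sample(sorted(client_data), num_selected)    # server samples clients
    uploads = [local_train(global_model, client_data[cid]) for cid in selected]
    return sum(uploads) / len(uploads), selected                # server aggregates

rng = random.Random(0)
client_data = {cid: [float(cid)] * 4 for cid in range(10)}      # hypothetical local datasets
model = 0.0
for _ in range(3):
    model, selected = run_sync_round(model, client_data, sample_ratio=0.5, rng=rng)
```

In the asynchronous pattern the loop would instead be driven by client requests, with the server answering each request as it arrives.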
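Experimental Scenes#
FedLab supports simulating large-scale FL systems on a single machine or on multiple machines. The standalone mode is for single-machine simulation, while the cross-machine and hierarchical modes support FL simulation with communication across multiple machines and multiple clusters.
Standalone (single process)#
FedLab implements SerialTrainer, a serial trainer that simulates multiple FL nodes when computation resources are limited. It occupies only one process and, logically, executes the clients' computation tasks one after another.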
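A minimal sketch of the serial idea, assuming a toy scalar model: the selected clients are trained one after another inside the current process, and every client starts from the same global model. The names `serial_local_process` and `toy_train` are hypothetical and do not reflect the actual SerialTrainer interface.

```python
def serial_local_process(global_model, client_ids, datasets, train_fn):
    """Train each selected client one after another in the current process."""
    uploads = []
    for cid in client_ids:
        # Every simulated client starts from the same global model.
        uploads.append(train_fn(global_model, datasets[cid]))
    return uploads

def toy_train(model, data):
    """Toy update: move halfway toward the mean of the client's data."""
    return model + 0.5 * (sum(data) / len(data) - model)

datasets = {0: [1.0, 1.0], 1: [3.0, 3.0], 2: [5.0, 5.0]}
uploads = serial_local_process(0.0, [0, 2], datasets, toy_train)
```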
Cross-process#
FedLab enables FL simulation tasks to be deployed on multiple processes with the correct network configuration (these processes can run on a single machine or on multiple machines). For more flexible parallelism, SerialTrainer can directly replace the regular Trainer. Users can balance the computation load among processes by choosing different Trainer types. In practice, machines with more computation resources can be assigned a larger share of the workload.
Note
All machines must be in the same network (LAN or WAN) for cross-process deployment.
Hierarchical#
Hierarchical mode in FedLab is designed for tasks that span multiple computer clusters (in different LANs) or real-world scenes. To connect different computer clusters, FedLab provides Scheduler, a middle-server process that connects client groups. Each Scheduler manages the communication between the global server and the clients in one client group, and the server can communicate with clients in different LANs via the corresponding Scheduler. The computation mode of the client group behind each Scheduler can be either standalone or cross-process.
A hierarchical FL system with K client groups is shown in the figure below.
Benchmarks#
FedLab also contains data partition settings [2] and implementations of FL algorithms [3]. For more information, please see our FedLab-benchmarks repo. More benchmarks and FL algorithm demos are coming.
Installation & Setup#
FedLab can be installed from source code or with pip.
Source Code#
Download the latest source code of FedLab from GitHub:
$ git clone git@github.com:SMILELab-FL/FedLab.git
$ cd FedLab
Install dependencies:
$ pip install -r requirements.txt
Pip#
Install the stable version of FedLab with pip:
$ pip install fedlab==$version$
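Dataset Download#
FedLab provides datasets commonly used in FL research.
The download scripts for the datasets can be found in fedlab_benchmarks/datasets. For details of each dataset, please refer to README.md.
Tutorials#
FedLab standardizes FL simulation procedures, including synchronous algorithms, asynchronous algorithms [1], and communication compression [4]. FedLab provides modular tools and standard implementations to simplify FL research.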
Distributed Communication#
Initialize distributed network#
FedLab uses torch.distributed as its point-to-point communication tool. The default communication backend is Gloo. FedLab processes send and receive data over TCP connections. The details of initializing the distributed network are as follows.
You need to assign the right ethernet to DistNetwork so that the torch.distributed network initialization works. DistNetwork provides quick network configuration, and you can create one as follows:
from fedlab.core.network import DistNetwork
world_size = 10
rank = 0  # 0 for server, other ranks for clients
ethernet = None
server_ip = '127.0.0.1'
server_port = 1234
# Pass the remaining arguments by keyword: a positional argument may not follow a keyword argument.
network = DistNetwork(address=(server_ip, server_port), world_size=world_size, rank=rank, ethernet=ethernet)
network.init_network_connection()  # call this method to start the connection.
network.close_network_connection()  # call this method to shut down the connection.
The (server_ip, server_port) is the address of the server. Please be aware that the rank of the server is 0 by default.
Make sure world_size is the same across processes.
Rank should be different for each process (from 0 to world_size-1).
world_size = 1 (server) + client number.
The ethernet is None by default. torch.distributed will try to find the right ethernet automatically.
The ethernet_name must be checked (using ifconfig). Otherwise, network initialization would fail.
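The rank and world_size rules above can be captured in a small helper. This is an illustrative sketch only; `plan_ranks` is a hypothetical function and is not part of FedLab.

```python
def plan_ranks(num_clients):
    """Return (world_size, server_rank, client_ranks) for one server plus clients."""
    if num_clients < 1:
        raise ValueError("at least one client is required")
    world_size = 1 + num_clients               # world_size = 1 (server) + client number
    server_rank = 0                            # the server is rank 0 by default
    client_ranks = list(range(1, world_size))  # unique ranks 1 .. world_size-1
    return world_size, server_rank, client_ranks

world_size, server_rank, client_ranks = plan_ranks(9)
```

Every process would then be started with the same world_size and its own rank from this plan.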
If the automatically detected interface does not work, users need to assign the right network interface for Gloo, either in code or by setting the environment variable GLOO_SOCKET_IFNAME, for example export GLOO_SOCKET_IFNAME=eth0 or os.environ['GLOO_SOCKET_IFNAME'] = "eth0".
Note
Check the available ethernet:
$ ifconfig
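As an alternative to reading the ifconfig output by hand, the interface names can also be listed from Python before setting GLOO_SOCKET_IFNAME. The sketch below uses only the standard library; `select_gloo_interface` is a hypothetical helper, not a FedLab function, and `socket.if_nameindex()` may not be available on every platform.

```python
import os
import socket

def select_gloo_interface(name=None):
    """Set GLOO_SOCKET_IFNAME when an interface name is given.

    Returns the interface names reported by the OS so the caller can check the choice.
    """
    available = [if_name for _, if_name in socket.if_nameindex()]
    if name is not None:
        if name not in available:
            raise ValueError("unknown interface {!r}; available: {}".format(name, available))
        os.environ["GLOO_SOCKET_IFNAME"] = name
    return available

available = select_gloo_interface()   # only list the interfaces, change nothing
```

Calling `select_gloo_interface("eth0")` would set the variable only if an interface with that name exists on the machine.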
Point-to-point communication#
In a recent update, we hid the communication details from users and now provide simple APIs. DistNetwork provides two basic communication APIs: send() and recv(). These APIs support flexible PyTorch tensor communication.
Sender process:
network = DistNetwork(address=(server_ip, server_port), world_size=world_size, rank=rank, ethernet=ethernet)
network.init_network_connection()
network.send(content, message_code, dst)
network.close_network_connection()
Receiver process:
network = DistNetwork(address=(server_ip, server_port), world_size=world_size, rank=rank, ethernet=ethernet)
network.init_network_connection()
sender_rank, message_code, content = network.recv(src)
#################################
# #
# local process with content. #
# #
#################################
network.close_network_connection()
Note
Currently, the following restrictions apply:
Tensor list: send() accepts a Python list of tensors.
Data type: send() doesn't accept tensors of different data types. In other words, FedLab forces all appended tensors to have the same data type as the first appended tensor. Torch data types [torch.int8, torch.int16, torch.int32, torch.int64, torch.float16, torch.float32, torch.float64] are supported.
Further understanding of FedLab communication#
FedLab packs content into a pre-defined package data structure. send() and recv() are implemented as follows:
def send(self, content=None, message_code=None, dst=0):
    """Send tensor to process rank=dst"""
    pack = Package(message_code=message_code, content=content)
    PackageProcessor.send_package(pack, dst=dst)

def recv(self, src=None):
    """Receive tensor from process rank=src"""
    sender_rank, message_code, content = PackageProcessor.recv_package(
        src=src)
    return sender_rank, message_code, content
Create package#
The basic communication unit in FedLab is called a package. The communication module of FedLab is in fedlab/core/communicator. Package defines the basic data structure of a network package. It contains a header and content.
p = Package()
p.header # A tensor with size = (5,).
p.content # A tensor with size = (x,).
Currently, you can create a network package in the following ways:
Initialize with a tensor:
tensor = torch.Tensor(size=(10,))
package = Package(content=tensor)
Initialize with a tensor list:
tensor_sizes = [10, 5, 8]
tensor_list = [torch.rand(size) for size in tensor_sizes]
package = Package(content=tensor_list)
Append a tensor to an existing package:
tensor = torch.Tensor(size=(10,))
package = Package(content=tensor)
new_tensor = torch.Tensor(size=(8,))
package.append_tensor(new_tensor)
Append a tensor list to an existing package:
tensor_sizes = [10, 5, 8]
tensor_list = [torch.rand(size) for size in tensor_sizes]
package = Package()
package.append_tensor_list(tensor_list)
Two static methods are provided by Package to parse header and content:
p = Package()
Package.parse_header(p.header) # necessary information to describe the package
Package.parse_content(p.slices, p.content) # tensor list associated with the tensor sequence appended into.
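To build intuition for why a package keeps slice information next to a flat content buffer, here is a simplified pure-Python sketch. It is not FedLab's implementation: `pack` and `unpack` are hypothetical helpers, plain lists stand in for tensors, and the real header and slice layout are not reproduced here.

```python
def pack(chunks):
    """Flatten a list of 1-D float lists into (slices, content)."""
    slices, content = [], []
    for chunk in chunks:
        slices.append(len(chunk))   # remember each chunk's length
        content.extend(chunk)       # append its values to the flat buffer
    return slices, content

def unpack(slices, content):
    """Recover the original chunks from the flat content buffer."""
    chunks, start = [], 0
    for length in slices:
        chunks.append(content[start:start + length])
        start += length
    return chunks

original = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
slices, content = pack(original)
restored = unpack(slices, content)
```

Sending one flat buffer plus a small description of its pieces lets the receiver rebuild the sequence of appended items in order.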
Send package#
The point-to-point communication agreement is implemented in the PackageProcessor module. PackageProcessor is a static class that manages the package sending/receiving procedure.
Users can send a package to the process with rank=0 (the parameter dst must be assigned):
p = Package()
PackageProcessor.send_package(package=p, dst=0)
or, receive a package from rank=0 (set the parameter src=None to receive package from any other process):
sender_rank, message_code, content = PackageProcessor.recv_package(src=0)
Communication Strategy#
Communication strategy is implemented jointly by a (ClientManager, ServerManager) pair.
The prototype of NetworkManager is defined in fedlab.core.network_manager and is a subclass of torch.multiprocessing.Process.
The standard implementations are in fedlab.core.client.manager and fedlab.core.server.manager. NetworkManager manages network operations and the control flow.
The base class is defined as follows:
class NetworkManager(Process):
    """Abstract class

    Args:
        network (DistNetwork): object to manage torch.distributed network communication.
    """
    def __init__(self, network):
        super(NetworkManager, self).__init__()
        self._network = network

    def run(self):
        """
        Main Process:
            1. Initialization stage.
            2. FL communication stage.
            3. Shutdown stage, then close network connection.
        """
        self.setup()
        self.main_loop()
        self.shutdown()

    def setup(self, *args, **kwargs):
        """Initialize network connection and necessary setups.

        Note:
            At first, ``self._network.init_network_connection()`` is required to be called.
            Overwrite this method to implement system setup message communication procedure.
        """
        self._network.init_network_connection()

    def main_loop(self, *args, **kwargs):
        """Define the actions of communication stage."""
        raise NotImplementedError()

    def shutdown(self, *args, **kwargs):
        """Shut down stage"""
        self._network.close_network_connection()
FedLab provides two standard communication pattern implementations: synchronous and asynchronous. We encourage users to create new FL communication patterns for their own algorithms.
You can customize the process flow in two steps: 1. create a new class that inherits from the corresponding class in our standard implementations; 2. override the functions of the target stage. In short, the communication strategy is customized by overriding the methods listed in the note below.
Note
setup() defines the network initialization stage. It can be used for FL algorithm initialization.
main_loop() is the main process of client and server. Users need to define the communication strategy for both the client and server managers.
shutdown() defines the shutdown stage.
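The three stages can be tried out without any network by swapping in a stand-in object. The sketch below mirrors the setup, main_loop, shutdown order described above; `FakeNetwork`, `ToyManager`, and `EchoManager` are hypothetical classes written for illustration and do not inherit from FedLab's NetworkManager.

```python
class FakeNetwork:
    """Stand-in for DistNetwork that only records which calls were made."""
    def __init__(self):
        self.log = []

    def init_network_connection(self):
        self.log.append("init")

    def close_network_connection(self):
        self.log.append("close")

class ToyManager:
    """Runs the three stages in the same order as the base class shown earlier."""
    def __init__(self, network):
        self._network = network

    def run(self):
        self.setup()
        self.main_loop()
        self.shutdown()

    def setup(self):
        self._network.init_network_connection()

    def main_loop(self):
        raise NotImplementedError()

    def shutdown(self):
        self._network.close_network_connection()

class EchoManager(ToyManager):
    """Customization: only the communication stage is overridden."""
    def main_loop(self):
        self._network.log.append("main_loop")

network = FakeNetwork()
EchoManager(network).run()
```

Because only `main_loop()` is overridden, the connection is still opened before and closed after the custom communication stage.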
Importantly, ServerManager and ClientManager should be defined and used as a pair. Their control flow and information agreements must be compatible. FedLab provides standard implementations for the typical synchronous and asynchronous modes, as depicted below.
Synchronous mode#
Synchronous communication involves SynchronousServerManager and PassiveClientManager. The communication procedure is shown as follows.
Asynchronous mode#
Asynchronous communication involves AsynchronousServerManager and ActiveClientManager. The communication procedure is shown as follows.
Customization#
Initialization stage#
The initialization stage is represented by the manager.setup() function.
Users can customize the initialization procedure as follows (using ClientManager as an example):
from fedlab.core.client.manager import PassiveClientManager

class CustomizeClientManager(PassiveClientManager):
    def __init__(self, trainer, network):
        super().__init__(trainer, network)

    def setup(self):
        super().setup()
        # *****************************
        # *                           *
        # *      Write Code Here      *
        # *                           *
        # *****************************
Communication stage#
After the initialization stage, users can define main_loop() to specify the main process of server and client. As a reference for FedLab's standard implementation, here we give the main_loop() of PassiveClientManager and SynchronousServerManager as examples.
Client part:
def main_loop(self):
    """Actions to perform when receiving new message, including local training

    Main procedure of each client:
        1. client waits for data from server (PASSIVELY)
        2. after receiving data, client trains local model.
        3. client synchronizes with server actively.
    """
    while True:
        sender_rank, message_code, payload = self._network.recv(src=0)
        if message_code == MessageCode.Exit:
            break
        elif message_code == MessageCode.ParameterUpdate:
            self._trainer.local_process(payload=payload)
            self.synchronize()
        else:
            raise ValueError("Invalid MessageCode {}.".format(message_code))
Server Part:
def main_loop(self):
    """Actions to perform in server when receiving a package from one client.

    Server transmits received package to backend computation handler for aggregation or other
    manipulations.

    Loop:
        1. activate clients.
        2. listen for messages from clients -> transmit received parameters to server backend.

    Note:
        Communication agreements related: user can overwrite this function to customize
        communication agreements. This method is the key component connecting behaviors of
        :class:`ParameterServerBackendHandler` and :class:`NetworkManager`.

    Raises:
        Exception: Unexpected :class:`MessageCode`.
    """
    while self._handler.stop_condition() is not True:
        activate = threading.Thread(target=self.activate_clients)
        activate.start()
        while True:
            sender_rank, message_code, payload = self._network.recv()
            if message_code == MessageCode.ParameterUpdate:
                # Stop listening once the handler reports that the round is complete.
                if self._handler.iterate_global_model(sender_rank, payload=payload):
                    break
            else:
                raise ValueError("Invalid MessageCode {}.".format(message_code))
Shutdown stage#
shutdown() is called when main_loop() finishes. You can define the actions for client and server separately.
In our implementation, the shutdown stage is typically started by the server, which sends a message with MessageCode.Exit to inform each client to stop its main loop.
The code below shows the actions of SynchronousServerManager in the shutdown stage.
def shutdown(self):
    self.shutdown_clients()
    super().shutdown()

def shutdown_clients(self):
    """Shut down all clients.

    Send package to every client with :attr:`MessageCode.Exit`.
    """
    for rank in range(1, self._network.world_size):
        print("stopping clients rank:", rank)
        self._network.send(message_code=MessageCode.Exit, dst=rank)
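The interplay between the server's shutdown stage and the client's main loop can be simulated with in-memory queues. This is a toy sketch under stated assumptions: `Code` is a hypothetical stand-in for FedLab's MessageCode, and the deques replace real network communication.

```python
from collections import deque
from enum import Enum

class Code(Enum):
    """Hypothetical stand-in for FedLab's MessageCode."""
    ParameterUpdate = 0
    Exit = 1

def client_loop(inbox):
    """Process messages until Exit arrives; return the number of training steps."""
    trained = 0
    while True:
        code, payload = inbox.popleft()
        if code is Code.Exit:
            break
        elif code is Code.ParameterUpdate:
            trained += 1   # placeholder for local training and synchronization
        else:
            raise ValueError("Invalid message code {}.".format(code))
    return trained

world_size = 4
inboxes = {rank: deque([(Code.ParameterUpdate, [0.0])]) for rank in range(1, world_size)}
for rank in range(1, world_size):   # server shutdown: notify every client rank
    inboxes[rank].append((Code.Exit, None))
steps = {rank: client_loop(inboxes[rank]) for rank in inboxes}
```

Each simulated client finishes its pending update and then leaves its loop as soon as it reads the Exit code.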
Federated Optimization#
Standard FL optimization contains two parts: 1. local training in the client; 2. global aggregation in the server. Both the local training and the aggregation procedures are customizable in FedLab. You need to define ClientTrainer and ServerHandler.
Since ClientTrainer and ServerHandler both need to manipulate a PyTorch model, they both inherit from ModelMaintainer.
class ModelMaintainer(object):
    """Maintain PyTorch model.

    Provide necessary attributes and operation methods. More features with local or global model
    will be implemented here.

    Args:
        model (torch.nn.Module): PyTorch model.
        cuda (bool): Use GPUs or not.
        device (str, optional): Assign model/data to the given GPUs. E.g., 'device:0' or 'device:0,1'. Defaults to None. If device is None and cuda is True, FedLab will set the gpu with the largest memory as default.
    """
    def __init__(self,
                 model: torch.nn.Module,
                 cuda: bool,
                 device: str = None) -> None:
        self.cuda = cuda
        if cuda:
            # dynamic gpu acquire.
            if device is None:
                self.device = get_best_gpu()
            else:
                self.device = device
            self._model = deepcopy(model).cuda(self.device)
        else:
            self._model = deepcopy(model).cpu()

    def set_model(self, parameters: torch.Tensor):
        """Assign parameters to self._model."""
        SerializationTool.deserialize_model(self._model, parameters)

    @property
    def model(self) -> torch.nn.Module:
        """Return :class:`torch.nn.module`."""
        return self._model

    @property
    def model_parameters(self) -> torch.Tensor:
        """Return serialized model parameters."""
        return SerializationTool.serialize_model(self._model)

    @property
    def model_gradients(self) -> torch.Tensor:
        """Return serialized model gradients."""
        return SerializationTool.serialize_model_gradients(self._model)

    @property
    def shape_list(self) -> List[torch.Size]:
        """Return shape of model parameters.

        Currently, this attribute is used in tensor compression.
        """
        shape_list = [param.shape for param in self._model.parameters()]
        return shape_list
Client local training#
The base class ClientTrainer is shown below. We encourage users to define their local training process following our code pattern:
class ClientTrainer(ModelMaintainer):
    """An abstract class representing a client trainer.

    In FedLab, we define the backend of client trainer, which should manage its local model.
    It should have a function to update its model called :meth:`local_process`.

    If you use our framework to define the activities of client, please make sure that your self-defined class
    should subclass it. All subclasses should overwrite :meth:`local_process` and property ``uplink_package``.

    Args:
        model (torch.nn.Module): PyTorch model.
        cuda (bool): Use GPUs or not.
        device (str, optional): Assign model/data to the given GPUs. E.g., 'device:0' or 'device:0,1'. Defaults to ``None``.
    """
    def __init__(self,
                 model: torch.nn.Module,
                 cuda: bool,
                 device: str = None) -> None:
        super().__init__(model, cuda, device)
        self.client_num = 1  # default is 1.
        self.dataset = FedDataset()  # or Dataset
        self.type = ORDINARY_TRAINER

    def setup_dataset(self):
        """Set up local dataset ``self.dataset`` for clients."""
        raise NotImplementedError()

    def setup_optim(self):
        """Set up variables for optimization algorithms."""
        raise NotImplementedError()

    @property
    @abstractmethod
    def uplink_package(self) -> List[torch.Tensor]:
        """Return a tensor list for uploading to server.

        This attribute will be called by client manager.
        Customize it for new algorithms.
        """
        raise NotImplementedError()

    # local_process is an instance method, so it is marked with abstractmethod.
    @abstractmethod
    def local_process(self, payload: List[torch.Tensor]):
        """Manager of the upper layer will call this function with accepted payload

        In synchronous mode, return True to end current FL round.
        """
        raise NotImplementedError()

    def train(self):
        """Override this method to define the training procedure. This function should manipulate :attr:`self._model`."""
        raise NotImplementedError()

    def validate(self):
        """Validate quality of local model."""
        raise NotImplementedError()

    def evaluate(self):
        """Evaluate quality of local model."""
        raise NotImplementedError()
Override ClientTrainer.local_process() to define the local procedure. Typically, you need to implement the standard PyTorch training pipeline.
The attributes model and model_parameters are associated with self._model. Please make sure that local_process() manipulates self._model.
A standard implementation of this part is in SGDClientTrainer.
Server global aggregation#
Computation tasks related to PyTorch should be defined in the ServerHandler part. In FedLab, the base class of handlers is ServerHandler.
class ServerHandler(ModelMaintainer):
    """An abstract class representing handler of parameter server.

    Please make sure that your self-defined server handler class subclasses this class

    Example:
        Read source code of :class:`SyncServerHandler` and :class:`AsyncServerHandler`.

    Args:
        model (torch.nn.Module): PyTorch model.
        cuda (bool): Use GPUs or not.
        device (str, optional): Assign model/data to the given GPUs. E.g., 'device:0' or 'device:0,1'. Defaults to None. If device is None and cuda is True, FedLab will set the gpu with the largest memory as default.
    """
    def __init__(self,
                 model: torch.nn.Module,
                 cuda: bool,
                 device: str = None) -> None:
        super().__init__(model, cuda, device)

    @property
    @abstractmethod
    def downlink_package(self) -> List[torch.Tensor]:
        """Property for manager layer. Server manager will call this property when activates clients."""
        raise NotImplementedError()

    @property
    @abstractmethod
    def if_stop(self) -> bool:
        """:class:`NetworkManager` keeps monitoring this attribute, and it will stop all related processes and threads when ``True`` returned."""
        return False

    @abstractmethod
    def setup_optim(self):
        """Override this function to load your optimization hyperparameters."""
        raise NotImplementedError()

    @abstractmethod
    def global_update(self, buffer):
        raise NotImplementedError()

    @abstractmethod
    def load(self, payload):
        """Override this function to define how to update global model (aggregation or optimization)."""
        raise NotImplementedError()

    @abstractmethod
    def evaluate(self):
        """Override this function to define the evaluation of global model."""
        raise NotImplementedError()
Users can define the server aggregation strategy by completing the following functions:
You can override _update_global_model() to customize the global procedure.
_update_global_model() is required to manipulate the global model parameters (self._model).
Common FL aggregation strategies are implemented in fedlab.utils.aggregator.
A standard implementation of this part is in SyncParameterServerHandler.
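As a concrete instance of an aggregation strategy, the sketch below computes a FedAvg-style weighted average of flat parameter vectors. It is written in plain Python for illustration; `weighted_average` is a hypothetical helper and is not the function provided by fedlab.utils.aggregator.

```python
def weighted_average(parameters, weights):
    """Average flat parameter vectors, weighting each by its client's sample count."""
    if not parameters or len(parameters) != len(weights):
        raise ValueError("need one weight per parameter vector")
    total = float(sum(weights))
    dim = len(parameters[0])
    return [
        sum(w * p[i] for p, w in zip(parameters, weights)) / total
        for i in range(dim)
    ]

# Two clients: the second one holds three times as many samples as the first.
global_params = weighted_average([[1.0, 2.0], [3.0, 6.0]], weights=[1, 3])
```

A server handler would apply such a rule to the serialized parameters uploaded by the selected clients and then write the result back into the global model.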
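Federated Dataset and DataPartitioner#
Given the complexity of real-world situations, FL has to handle various data distribution scenarios, including IID and non-IID ones. In experiments, the full training and test datasets usually need to be partitioned according to the number of clients, the target distribution, and the partition strategy. Although some public datasets already come with partition strategies, maintaining the data structures related to data partition is still cumbersome, and even for the same distribution the partition code differs between reproductions. FedLab provides fedlab.utils.dataset.partition.DataPartitioner so that you can use pre-partitioned datasets or your own data. Given a partition strategy, DataPartitioner maintains the sample indices that each client owns after partitioning. FedLab also provides some datasets that have been used in FL but are not yet available in PyTorch's torchvision.datasets.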
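The core bookkeeping of a partitioner, a mapping from client id to sample indices, can be sketched in a few lines. The example below shows a simple IID split in plain Python; `iid_partition` is a hypothetical function for illustration and not the DataPartitioner API.

```python
import random

def iid_partition(num_samples, num_clients, seed=0):
    """Shuffle the sample indices and deal them out evenly to the clients."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Client cid takes every num_clients-th index, starting at position cid.
    return {cid: sorted(indices[cid::num_clients]) for cid in range(num_clients)}

client_dict = iid_partition(num_samples=100, num_clients=10)
```

Each client's list can then be used to select its local subset of the full dataset.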
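Vision Data#
CIFAR10#
FedLab provides some predefined partition schemes for certain datasets (such as CIFAR10), as well as function implementations for specific partition schemes. They can be used as benchmark settings for testing your FL algorithms.
Tutorial for CIFAR10Partitioner: CIFAR10 tutorial.
CIFAR100#
Notebook tutorial for CIFAR100Partitioner: CIFAR100 tutorial.
FMNIST#
Notebook tutorial for FMNIST (FashionMNIST) partition: FMNIST tutorial.
MNIST#
MNIST is very similar to FMNIST; please refer to the FMNIST tutorial.
SVHN#
Data partition tutorial for SVHN: SVHN tutorial.
CelebA#
Data partition for CelebA: CelebA tutorial.
FEMNIST#
Data partition for FEMNIST: FEMNIST tutorial.
Text Data#
Shakespeare#
Data partition for the Shakespeare dataset: Shakespeare tutorial.
Sent140#
Data partition for Sent140: Sent140 tutorial.
Reddit#
Data partition for Reddit: Reddit tutorial.
Tabular Data#
Adult#
The Adult dataset is taken from LIBSVM Data; its original source is UCI/Adult. FedLab provides both a Dataset and a DataPartitioner for Adult. Notebook tutorial for Adult: Adult tutorial.
Covtype#
The Covtype dataset is taken from LIBSVM Data; its original source is UCI/Covtype. FedLab provides both a Dataset and a DataPartitioner for Covtype. Notebook tutorial for Covtype: Covtype tutorial.
RCV1#
The RCV1 dataset is taken from LIBSVM Data; its original source is UCI/RCV1. FedLab provides both a Dataset and a DataPartitioner for RCV1. Notebook tutorial for RCV1: RCV1 tutorial.
Synthetic Data#
FCUBE#
FCUBE is a synthetic dataset designed for FL. FedLab provides the corresponding Dataset and DataPartitioner. Tutorial for FCUBE: FCUBE tutorial.
LEAF-Synthetic#
LEAF-Synthetic is a federated dataset proposed by LEAF. The number of clients, the number of classes, and the feature dimension can all be customized by the user.
For more details, please refer to LEAF-Synthetic.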
Deploy FedLab Process in a Docker Container#
Why docker?#
The communication APIs of FedLab are built on torch.distributed. In the cross-process scene, when multiple FedLab processes are deployed on the same machine, GPU memory buckets are created automatically even though our framework does not use them. We can start the FedLab processes in different Docker containers to avoid triggering these GPU memory buckets (and save GPU memory).
Setup docker environment#
In this section, we introduce how to set up a Docker image for a FedLab program. Here we provide the Dockerfile for building a FedLab image. Our FedLab environment is based on PyTorch, so we just need to install FedLab on the provided PyTorch image.
Dockerfile:
# This is an example of fedlab installation via Dockerfile
# replace the value of TORCH_CONTAINER with pytorch image that satisfies your cuda version
# you can find it in https://hub.docker.com/r/pytorch/pytorch/tags
ARG TORCH_CONTAINER=1.5-cuda10.1-cudnn7-runtime
FROM pytorch/pytorch:${TORCH_CONTAINER}
RUN pip install --upgrade pip \
&& pip uninstall -y torch torchvision \
&& conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ \
&& conda config --set show_channel_urls yes \
&& mkdir /root/tmp/
# replace with the correct install command, which you can find in https://pytorch.org/get-started/previous-versions/
RUN conda install -y pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
# pip install fedlab
RUN TMPDIR=/root/tmp/ pip install -i https://pypi.mirrors.ustc.edu.cn/simple/ fedlab
Dockerfile for different platforms#
The steps for modifying the Dockerfile for different platforms:
Step 1: Find an appropriate base PyTorch image for your platform on Docker Hub: https://hub.docker.com/r/pytorch/pytorch/tags. Then, replace the value of TORCH_CONTAINER in the demo Dockerfile.
Step 2: To install a specific PyTorch version, choose the correct install command, which can be found at https://pytorch.org/get-started/previous-versions/. Then, modify the conda install command in the demo Dockerfile.
Step 3: Build the image for your own platform by running the command below in the directory of the Dockerfile.
$ docker build -t image_name .
Warning
Use "--gpus all" and "--network=host" when starting a Docker container:
$ docker run -itd --gpus all --network=host b23a9c46cd04 /bin/bash  # b23a9c46cd04 is the image name
If you are not in China, you can remove the two conda config commands and the "-i https://pypi.mirrors.ustc.edu.cn/simple/" option of the pip install command.
Finally: Run your FedLab processes in the different started containers.
Examples#
Quick Start#
On this page, we introduce the provided quick start demos and the start scripts for FL simulation systems with FedLab in different scenarios. We implement the FedAvg algorithm with an MLP network and the MNIST dataset partitioned across clients.
The source code can be found in fedlab/examples/.
Download dataset#
FedLab provides partition methods for commonly used datasets. In addition, LEAF [2], a baseline of federated datasets, is implemented in the FedLab benchmarks repo and has been migrated from TensorFlow to PyTorch.
Code related to the dataset download process is available at fedlab_benchmarks/datasets/{dataset name}.
Download MNIST/CIFAR10
$ cd fedlab_benchmarks/datasets/{mnist or cifar10}/
$ python download_{dataset}.py
Data partition
Run the following Python script to generate the partition files.
$ python {dataset}_partition.py
Source code of the partition script:
import torchvision
from fedlab.utils.functional import save_dict
from fedlab.utils.dataset.slicing import noniid_slicing, random_slicing
root = "./data"  # placeholder path (an assumption): directory where the raw dataset is stored
trainset = torchvision.datasets.CIFAR10(root=root, train=True, download=True)
# trainset = torchvision.datasets.MNIST(root=root, train=True, download=True)
# non-IID partition: 200 shards distributed over 100 clients
data_indices = noniid_slicing(trainset, num_clients=100, num_shards=200)
save_dict(data_indices, "cifar10_noniid.pkl")
# IID partition: random split over 100 clients
data_indices = random_slicing(trainset, num_clients=100)
save_dict(data_indices, "cifar10_iid.pkl")
data_indices is a dict mapping each client id to the data indices (list) of the raw dataset.
FedLab provides random partition and non-IID partition methods; the non-IID partition method is a re-implementation of the one in the FedAvg paper.
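The shard-based non-IID idea from the FedAvg paper can be sketched as follows: sort the sample indices by label, cut them into equally sized shards, and hand each client the same number of randomly chosen shards. This is a simplified illustration in plain Python; `shard_partition` is a hypothetical function, and FedLab's noniid_slicing may differ in its details.

```python
import random

def shard_partition(labels, num_clients, num_shards, seed=0):
    """Sort indices by label, cut them into shards, give each client equal shards."""
    if num_shards % num_clients != 0:
        raise ValueError("num_shards must be a multiple of num_clients")
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    shard_ids = list(range(num_shards))
    random.Random(seed).shuffle(shard_ids)       # random shard assignment
    per_client = num_shards // num_clients
    return {
        cid: [i for s in shard_ids[cid * per_client:(cid + 1) * per_client] for i in shards[s]]
        for cid in range(num_clients)
    }

labels = [i % 10 for i in range(200)]            # toy labels: 10 classes, 20 samples each
data_indices = shard_partition(labels, num_clients=10, num_shards=20)
```

With two shards per client, each client sees at most two classes in this toy setting, which is what makes the split non-IID.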
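LEAF dataset processing
Please follow the FedLab benchmark to learn how to generate LEAF-related data partitions.
Run FedLab demos#
FedLab provides standard implementations of asynchronous and synchronous FL for users to use or learn from. This page only introduces FedAvg implementations in different scenarios.
We strongly recommend reading the source code to understand the architecture and working mechanism of FedLab (we believe our code is very readable).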
1. Standalone#
The source code is under fedlab/examples/standalone-mnist. This is a standard usage of SerialTrainer, which allows users to simulate a group of clients with a single process.
$ python standalone.py --total_client 100 --com_round 3 --sample_ratio 0.1 --batch_size 100 --epochs 5 --lr 0.02
or
$ bash launch_eg.sh
Run the command above to start a single process that simulates the FedAvg algorithm with 100 clients for 3 communication rounds in total, with 10 clients sampled randomly in each round.
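To see how the flags of the command above relate to the number of clients sampled per round, here is a small argparse sketch. The flag names are taken from the command line above, but their types and the rounding rule are assumptions, and this is not the actual standalone.py.

```python
import argparse

def build_parser():
    """Toy parser mirroring the flags of the standalone demo command."""
    parser = argparse.ArgumentParser(description="Toy parser for the standalone demo flags")
    parser.add_argument("--total_client", type=int, required=True)
    parser.add_argument("--com_round", type=int, required=True)
    parser.add_argument("--sample_ratio", type=float, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--lr", type=float, required=True)
    return parser

command = "--total_client 100 --com_round 3 --sample_ratio 0.1 --batch_size 100 --epochs 5 --lr 0.02"
args = build_parser().parse_args(command.split())
# Assumed rule: number of sampled clients = total clients * sample ratio.
clients_per_round = max(1, round(args.total_client * args.sample_ratio))
```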
2. Cross-process#
The source code is under fedlab/examples/cross-process-mnist.
Start an FL simulation system with 1 server and 2 clients:
$ bash launch_eg.sh
The content of launch_eg.sh
is:
python server.py --ip 127.0.0.1 --port 3001 --world_size 3 --round 3 &
python client.py --ip 127.0.0.1 --port 3001 --world_size 3 --rank 1 &
python client.py --ip 127.0.0.1 --port 3001 --world_size 3 --rank 2 &
wait
The cross-process scenario allows users to deploy their FL system on a computer cluster, although in this example we set the server address to localhost. The three processes then communicate with each other following the standard FL procedure.
Note
Because the rank in torch.distributed is unique for every process, we use the rank to represent the client id in this scenario.
3. Cross-process with SerialTrainer#
SerialTrainer uses fewer computing resources (a single process) to simulate multiple clients. Cross-process mode is suited to computer cluster deployment, simulating a data-center FL system. In our experiments, the world size of torch.distributed cannot exceed 50 (depending on the cluster); otherwise, the socket crashes, which limits the number of clients in an FL simulation.
To improve scalability, FedLab implements the Scale scenario, which combines the SerialTrainer module with the ClientManager module so that a single process can simulate multiple clients while multi-process, multi-machine communication is still supported.
The source code is available in fedlab_benchmarks/algorithm/fedavg/scale/{experiment setting name}.
We take mnist-cnn as an example. In this demo, we set world_size=11 (1 server, 10 clients), and each ClientManager drives one SerialTrainer that handles the local training of 10 clients. The data partition method is the same as in the FedAvg paper. In this scenario, we use only 11 processes to simulate an FL system with 100 clients.
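The arithmetic behind "11 processes, 100 clients" can be made explicit with a small mapping from process rank to simulated client ids. The contiguous assignment below is an assumption made for illustration; `clients_for_rank` is a hypothetical helper, not FedLab code.

```python
def clients_for_rank(rank, clients_per_process):
    """Return the simulated client ids handled by a client process (rank >= 1)."""
    if rank < 1:
        raise ValueError("rank 0 is the server")
    start = (rank - 1) * clients_per_process
    return list(range(start, start + clients_per_process))

world_size = 11   # 1 server + 10 client processes
assignment = {rank: clients_for_rank(rank, 10) for rank in range(1, world_size)}
total_clients = sum(len(ids) for ids in assignment.values())
```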
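The system can be started from two terminals (we still use localhost in this example, but in our tests the multi-machine scenario also starts normally under a proper network configuration).
server (terminal 1)
$ python server.py --ip 127.0.0.1 --port 3002 --world_size 11
clients (terminal 2)
$ bash start_clt.sh 11 1 10 # launch clients from rank 1 to rank 10 with world_size 11
The content of the script start_clt.sh is as follows: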
for ((i=$2; i<=$3; i++))
do
{
echo "client ${i} started"
python client.py --world_size $1 --rank ${i} &
sleep 2s # wait for gpu resources allocation
}
done
wait
4. Hierarchical#
Hierarchical mode in FedLab is designed for tasks that span multiple computer clusters (in different LANs) or real-world scenes. To connect different computer clusters, FedLab provides Scheduler, a middle-server process that connects client groups. Each Scheduler manages the communication between the global server and the clients in one client group, and the server can communicate with clients in different LANs via the corresponding Scheduler. The computation mode of the client group behind each Scheduler can be either standalone or cross-process.
The demo of the hierarchical mode with hybrid clients (standalone and serial trainer) is given in fedlab/examples/hierarchical-hybrid-mnist.
Run all scripts together:
$ bash launch_eg.sh
Run scripts separately:
# Top server in terminal 1
$ bash launch_topserver_eg.sh
# Scheduler1 + Ordinary trainer with 1 client + Serial trainer with 10 clients in terminal 2:
$ bash launch_cgroup1_eg.sh
# Scheduler2 + Ordinary trainer with 1 client + Serial trainer with 10 clients in terminal 3:
$ bash launch_cgroup2_eg.sh
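PyTorch version of LEAF#
FedLab migrates the TensorFlow version of the LEAF datasets to the PyTorch framework and provides dataloader implementations for the corresponding datasets. The unified interface is in fedlab_benchmarks/leaf/dataloader.py.
This tutorial introduces how to use the LEAF datasets in FedLab.
Description of LEAF datasets#
The LEAF benchmark contains federated settings for six datasets: celeba, femnist, reddit, sent140, shakespeare, and synthetic. With reference to leaf-readme.md, a brief introduction, the total number of users, and the task category of each dataset are given below.
FEMNIST
Overview: Image dataset.
Details: 62 different classes (10 digits, 26 lowercase letters, 26 uppercase letters); each image is \(28 \times 28\) pixels (with an option to make them all \(128 \times 128\) pixels); 3500 users.
Task: Image classification
Sentiment140
Overview: Text dataset of tweets.
Details: 660120 users
Task: Sentiment analysis
Shakespeare
Overview: Text dataset of dialogues from the works of Shakespeare.
Details: 1129 users (later reduced to 660 based on sequence length; see the bug for details).
Task: Next-character prediction
Celeba
Overview: Image dataset based on the Large-scale CelebFaces Attributes Dataset.
Details: 9343 users (celebrities with 5 or fewer samples are excluded)
Task: Image classification (smiling detection)
Synthetic Dataset
Overview: A process for generating challenging synthetic federated datasets is proposed. The high-level goal is to create devices whose true models are device-dependent. See the paper LEAF: A Benchmark for Federated Settings for a description of the whole generative process.
Details: Users can customize the number of devices, the number of classes, the number of dimensions, and so on.
Task: Classification
Reddit
Overview: Reddit data released by pushshift.io for December 2017, preprocessed.
Details: 1,660,820 users with 56,587,343 comments in total.
Task: Next-word prediction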
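Download and preprocess data with LEAF#
To make LEAF easier to use, FedLab integrates the download and processing scripts of the six LEAF datasets into fedlab_benchmarks/datasets/data, which stores the download scripts of each dataset.
File structure of a LEAF dataset folder:
/FedLab/fedlab_benchmarks/datasets/{leaf_dataset_name}
├── {other_useful_preprocess_util}
├── preprocess.sh
├── stats.sh
└── README.md
preprocess.sh: downloads and preprocesses the dataset.
stats.sh: computes statistics of all the data processed by preprocess.sh (stored in ./data/all_data/all_data.json).
README.md: describes the download and processing procedure of the dataset in detail, including parameter descriptions and notes.
Users can directly run the script create_datasets_and_save.sh to download, process, and store the dataset. The partitioned dataset is serialized with the pickle module and stored in files. This .sh script is an example usage of preprocess.sh. Users can modify the source of create_datasets_and_save.sh to customize the partition strategy.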
Example usage of preprocess.sh:
cd fedlab_benchmarks/datasets/data/femnist
bash preprocess.sh -s niid --sf 0.05 -k 0 -t sample
cd fedlab_benchmarks/datasets/data/shakespeare
bash preprocess.sh -s niid --sf 0.2 -k 0 -t sample --tf 0.8
cd fedlab_benchmarks/datasets/data/sent140
bash ./preprocess.sh -s niid --sf 0.05 -k 3 -t sample
cd fedlab_benchmarks/datasets/data/celeba
bash ./preprocess.sh -s niid --sf 0.05 -k 5 -t sample
cd fedlab_benchmarks/datasets/data/synthetic
bash ./preprocess.sh -s niid --sf 1.0 -k 5 -t sample --tf 0.6
# for reddit, see its README.md to download preprocessed dataset manually
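By passing parameters to preprocess.sh, the raw data can be sampled and split. The README.md in each dataset folder gives example parameters and explanations. Common parameters are:
-s: sampling method, either iid or niid, indicating whether to sample in an i.i.d. manner.
--sf: fraction of data to sample, given as a decimal; default 0.1.
-k: minimum number of samples per user; users with fewer samples are filtered out. A value of 0 disables this filtering.
-t: how to split into training and test sets. With 'user', users are split between the training and test sets; with 'sample', each user's data is split between the training and test sets.
--tf: fraction of data in the training set, given as a decimal; default 0.9, i.e. training set : test set = 9:1.
Currently, LEAF experiments in FedLab require both training and test data, so the train-test split parameters must be passed to preprocess.sh; the default split fraction is 0.9.
To re-download or re-partition the data, first delete the data folder under the dataset, then run the scripts again.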
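The difference between the two -t modes can be sketched in plain Python. This is a minimal illustration, not LEAF's actual preprocessing code: the function names `split_by_user` and `split_by_sample` and the toy data layout are assumptions made for this example.

```python
import random


def split_by_user(user_data, train_frac, seed=0):
    """Assign whole users to either the training set or the test set."""
    users = sorted(user_data)
    random.Random(seed).shuffle(users)
    n_train = int(len(users) * train_frac)
    train = {u: user_data[u] for u in users[:n_train]}
    test = {u: user_data[u] for u in users[n_train:]}
    return train, test


def split_by_sample(user_data, train_frac):
    """Split every user's own samples into a training part and a test part."""
    train, test = {}, {}
    for user, samples in user_data.items():
        n_train = int(len(samples) * train_frac)
        train[user] = samples[:n_train]
        test[user] = samples[n_train:]
    return train, test


# Toy data: 10 hypothetical users with 10 samples each.
toy = {f"user{i}": list(range(10)) for i in range(10)}

user_train, user_test = split_by_user(toy, train_frac=0.9)
sample_train, sample_test = split_by_sample(toy, train_frac=0.9)

print(len(user_train), len(user_test))  # users per split
print(len(sample_train["user0"]), len(sample_test["user0"]))  # samples of one user
```

With 'user', a test user never appears in training; with 'sample', every user appears in both splits, which is what a per-client train and test loader needs.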
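Storing Dataset objects with pickle#
To speed up data loading, FedLab provides a way to process the raw data into Dataset objects and store them as pickle files. Reading the processed pickle files yields the Dataset of each client.
Set the parameters and run create_pickle_dataset.py, for example: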
cd fedlab_benchmarks/leaf/process_data
python create_pickle_dataset.py --data_root "../../datasets" --save_root "./pickle_dataset" --dataset_name "shakespeare"
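Parameters:
data_root: root path where the LEAF datasets are stored; it contains one folder per LEAF dataset. If the LEAF data was downloaded with the scripts FedLab provides under fedlab_benchmarks/datasets/, data_root can be set to that path; the example gives it as a relative path.
save_root: path where the pickle files of the processed Dataset objects are stored. The Dataset objects of each dataset are saved under {save_root}/{dataset_name}/{train,test}; the example creates a pickle_dataset folder in the current directory to hold all pickle dataset files.
dataset_name: name of the LEAF dataset to process; one of {femnist, shakespeare, celeba, sent140, synthetic, reddit}.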
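The storage layout described above can be sketched with the standard pickle module. This is an illustrative sketch only, not the code of create_pickle_dataset.py: the helper names and the per-client file name pattern are hypothetical, and plain lists stand in for Dataset objects.

```python
import pickle
import tempfile
from pathlib import Path


def save_client_dataset(save_root, dataset_name, split, client_id, samples):
    """Serialize one client's samples under {save_root}/{dataset_name}/{split}/."""
    out_dir = Path(save_root) / dataset_name / split
    out_dir.mkdir(parents=True, exist_ok=True)
    # The file name pattern below is hypothetical, chosen for this example.
    path = out_dir / f"{split}_{client_id}.pkl"
    with open(path, "wb") as f:
        pickle.dump(samples, f)
    return path


def load_client_dataset(path):
    """Read one client's samples back from its pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as root:
    samples = [("to be or not to b", "e"), ("o be or not to be", ",")]
    path = save_client_dataset(root, "shakespeare", "train", 0, samples)
    restored = load_client_dataset(path)
    relative_path = path.relative_to(root).as_posix()

print(relative_path)
print(restored == samples)
```

Loading a pickle skips the JSON parsing and preprocessing of the raw data, which is where the speed-up comes from. Only unpickle files you created yourself, since pickle can execute arbitrary code on load.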
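Loading datasets with Dataloader#
LEAF datasets are loaded by dataloader.py (located at fedlab_benchmarks/leaf/dataloader.py); all returned objects are PyTorch DataLoader instances.
Call this interface with the dataset name to obtain the corresponding DataLoader.
Usage example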
from leaf.dataloader import get_LEAF_dataloader


def get_femnist_shakespeare_dataset(args):
    # Only the femnist and shakespeare datasets are handled in this example.
    if args.dataset == 'femnist' or args.dataset == 'shakespeare':
        # args.rank is used as the client id whose data is loaded.
        trainloader, testloader = get_LEAF_dataloader(dataset=args.dataset,
                                                      client_id=args.rank)
    else:
        raise ValueError(f"Invalid dataset: {args.dataset}")
    return trainloader, testloader
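Running experiments#
The current LEAF experiment is FedAvg in the cross-machine scenario, set up as multiple processes on a single machine.
Run fedlab_benchmarks/fedavg/cross_machine/LEAF_test.sh to quickly launch a FedAvg simulation on the LEAF datasets.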
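Contributing to FedLab#
Reporting bugs#
We use GitHub issues to track all bugs and feature requests. If you find a bug or would like FedLab to provide a new feature, please open a new issue on GitHub so that we can follow up.
If you run into any problem while using the FedLab framework, please submit a report to the Bug Tracker. Feature requests and pull requests are also welcome.
Contributing code#
We welcome code contributions to FedLab through Pull Requests. By contributing, you agree that your code will be licensed under the Apache License, Version 2.0.
We encourage contributions to FedLab, whether feature improvements or implementations of federated learning algorithms. The recommended workflow is to fork the main FedLab repository, clone it locally, and develop on a new branch. The steps are as follows:
Click the 'Fork' button to fork the project repository. To contribute new features, fork the FedLab framework repository; to contribute implementations of new federated learning algorithms, fork the FedLab-benchmarks repository.
Clone your fork to your local machine:
$ git clone git@github.com:YourLogin/FedLab.git
$ cd FedLab
Create a new branch to hold your changes:
$ git checkout -b my-feature
Develop the new feature on your own branch.
$ git add modified_files
$ git commit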
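Pull Request checklist#
Please follow the file structure below in your feature branch, or create new files as new requirements arise.
fedlab
├── core
│   ├── communicator    # communication module
│   ├── client          # client related implementations
│   └── server          # server related implementations
│       └── hierarchical    # hierarchical communication pattern modules
└── utils               # functional modules
    └── dataset         # functional modules about dataset
Code should come with test cases written with unittest.TestCase, and all local tests must pass:
$ python test_bench.py
All publicly submitted code should include valid documentation and usage examples. Documentation and code should follow the Google Python Style Guide: Chinese version | English.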
References#
- 1
Cong Xie, Sanmi Koyejo, and Indranil Gupta. Asynchronous federated optimization. arXiv preprint arXiv:1903.03934, 2019.
- 2
Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H Brendan McMahan, Virginia Smith, and Ameet Talwalkar. Leaf: a benchmark for federated settings. arXiv preprint arXiv:1812.01097, 2018.
- 3
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, 1273–1282. PMLR, 2017.
- 4
Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J Dally. Deep gradient compression: reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.
- 5
Durmus Alp Emre Acar, Yue Zhao, Ramon Matas, Matthew Mattina, Paul Whatmough, and Venkatesh Saligrama. Federated learning based on dynamic regularization. In International Conference on Learning Representations. 2020.
- 6
Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang, and Yasaman Khazaeni. Bayesian nonparametric federated learning of neural networks. In International Conference on Machine Learning, 7252–7261. PMLR, 2019.
- 7
Qinbin Li, Yiqun Diao, Quan Chen, and Bingsheng He. Federated learning on non-iid data silos: an experimental study. arXiv preprint arXiv:2102.02079, 2021.
- 8
Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, and Yasaman Khazaeni. Federated learning with matched averaging. arXiv preprint arXiv:2002.06440, 2020.
API Reference#
This page contains auto-generated API reference documentation.
fedlab#
contrib#
algorithm#
basic_client#
Module Contents#
SGDClientTrainer | Client backend handler, this class provides data process method to upper layer.
SGDSerialClientTrainer | Deprecated
- class SGDClientTrainer(model: torch.nn.Module, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.core.client.trainer.ClientTrainer
Client backend handler; this class provides data processing methods to the upper layer.
- Parameters
model (torch.nn.Module) – PyTorch model.
cuda (bool, optional) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
- property uplink_package#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- setup_dataset(dataset)#
Set up local dataset self.dataset for clients.
- setup_optim(epochs, batch_size, lr)#
Set up local optimization configuration.
- local_process(payload, id)#
The manager of the upper layer calls this function with the accepted payload.
In synchronous mode, return True to end the current FL round.
- train(model_parameters, train_loader) → None#
Client trains its local model on local dataset.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
- class SGDSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.core.client.trainer.SerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- property uplink_package#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- setup_dataset(dataset)#
Override this function to set up local dataset for clients
- setup_optim(epochs, batch_size, lr)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
- train(model_parameters, train_loader)#
Single round of local training for one client.
Note
Overwrite this method to customize the PyTorch training pipeline.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
train_loader (torch.utils.data.DataLoader) – torch.utils.data.DataLoader for this client.
basic_server#
Module Contents#
SyncServerHandler | Synchronous Parameter Server Handler.
AsyncServerHandler | Asynchronous Parameter Server Handler
- class SyncServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.core.server.handler.ServerHandler
Synchronous Parameter Server Handler.
Backend of the synchronous parameter server: this class is responsible for backend computing in the synchronous server.
The synchronous parameter server waits for every client to finish its local training process before the next FL round.
Details in paper: http://proceedings.mlr.press/v54/mcmahan17a.html
- Parameters
model (torch.nn.Module) – Model used in this federation.
global_round (int) – Stop condition. Shut down FL system when global round is reached.
sample_ratio (float) – The result of sample_ratio * num_clients is the number of clients for every FL round.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
logger (Logger, optional) – Object of Logger.
- property downlink_package: List[torch.Tensor]#
Property for manager layer. Server manager will call this property when activating clients.
- property if_stop#
NetworkManager keeps monitoring this attribute, and it will stop all related processes and threads when True is returned.
- property num_clients_per_round#
- sample_clients()#
Return a list of client rank indices selected randomly. The client ID is from 0 to self.num_clients - 1.
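The sampling rule described by sample_ratio and sample_clients() can be sketched as follows. This is a minimal standalone illustration rather than FedLab's implementation: the free function, the seed argument, and the floor of one client per round are assumptions made for this example.

```python
import random


def sample_clients(num_clients, sample_ratio, seed=None):
    """Pick clients uniformly at random without replacement."""
    # Assumption: at least one client is selected, even for very small ratios.
    num_per_round = max(1, int(sample_ratio * num_clients))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_clients), num_per_round))


selected = sample_clients(num_clients=100, sample_ratio=0.1, seed=0)
print(selected)
```

Each returned ID lies in the range 0 to num_clients - 1, and no client is selected twice in the same round.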
- global_update(buffer)#
- load(payload: List[torch.Tensor]) → bool#
Update global model with collected parameters from clients.
Note
Server handler will call this method when its client_buffer_cache is full. Users can override the aggregation strategy applied to model_parameters_list, and use SerializationTool.deserialize_model() to load the serialized parameters into self._model after aggregation.
- Parameters
payload (list[torch.Tensor]) – A list of tensors passed by manager layer.
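A common aggregation strategy for a synchronous server is the weighted average used by FedAvg. The sketch below shows that rule on plain Python lists standing in for serialized parameter tensors; the function name `weighted_average` and the use of local sample counts as weights are assumptions of this example, not FedLab's API.

```python
def weighted_average(parameters_list, weights):
    """Average flattened parameter vectors, weighted e.g. by local sample counts."""
    total = sum(weights)
    aggregated = [0.0] * len(parameters_list[0])
    for params, weight in zip(parameters_list, weights):
        for i, value in enumerate(params):
            aggregated[i] += value * weight / total
    return aggregated


client_params = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]  # the second client holds three times as much data
global_params = weighted_average(client_params, client_sizes)
print(global_params)
```

A custom handler would compute such an aggregate from the buffered client uploads and then load the result into the global model.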
- class AsyncServerHandler(model: torch.nn.Module, global_round: int, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.core.server.handler.ServerHandler
Asynchronous Parameter Server Handler
Update the global model immediately after receiving a ParameterUpdate message. Paper: https://arxiv.org/abs/1903.03934
- Parameters
model (torch.nn.Module) – Global model in server.
global_round (int) – Stop condition. Shut down FL system when global round is reached.
cuda (bool) – Use GPUs or not.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
logger (Logger, optional) – Object of Logger.
- property if_stop#
NetworkManager keeps monitoring this attribute, and it will stop all related processes and threads when True is returned.
- property downlink_package#
Property for manager layer. Server manager will call this property when activating clients.
- setup_optim(alpha, strategy='constant', a=10, b=4)#
Set up optimization configuration.
- Parameters
alpha (float) – Weight used in async aggregation.
strategy (str, optional) – Adaptive strategy. Options are constant, hinge and polynomial. Defaults to ‘constant’.
a (int, optional) – Parameter used in async aggregation. Defaults to 10.
b (int, optional) – Parameter used in async aggregation. Defaults to 4.
- global_update(buffer)#
- load(payload: List[torch.Tensor]) → bool#
Override this function to define how to update global model (aggregation or optimization).
- adapt_alpha(receive_model_time)#
Update alpha according to staleness.
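The three strategies can be sketched from the staleness functions in the asynchronous federated optimization paper linked above. This is a hedged illustration on plain floats and lists, not a copy of FedLab's code. Here staleness is assumed to be the number of global rounds elapsed since the client received its model, and `mix` is a hypothetical helper name.

```python
def adapt_alpha(alpha, staleness, strategy="constant", a=10, b=4):
    """Scale the mixing weight alpha by a staleness function."""
    if strategy == "constant":
        return alpha
    if strategy == "hinge":
        # No penalty up to b rounds of staleness, then a reciprocal decay.
        if staleness <= b:
            return alpha
        return alpha / (a * (staleness - b) + 1)
    if strategy == "polynomial":
        return alpha * (staleness + 1) ** (-a)
    raise ValueError(f"Unknown strategy: {strategy}")


def mix(global_params, client_params, alpha_t):
    """Element-wise update: (1 - alpha_t) * global + alpha_t * client."""
    return [(1 - alpha_t) * g + alpha_t * c
            for g, c in zip(global_params, client_params)]


fresh = adapt_alpha(0.5, staleness=3, strategy="hinge")
stale = adapt_alpha(0.5, staleness=6, strategy="hinge")
updated = mix([0.0, 2.0], [1.0, 4.0], fresh)
print(fresh, stale, updated)
```

The staler an update is, the smaller its weight, so a slow client cannot pull the global model far back toward an outdated state.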
ditto#
Module Contents#
DittoServerHandler | Ditto server acts the same as fedavg server.
DittoSerialClientTrainer | Deprecated
- class DittoServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
Ditto server acts the same as fedavg server.
- class DittoSerialClientTrainer(model, num, cuda=False, device=None, logger=None, personal=True)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- property uplink_package#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- setup_dataset(dataset)#
Override this function to set up local dataset for clients
- setup_optim(epochs, batch_size, lr)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
- train(global_model_parameters, local_model_parameters, train_loader)#
Single round of local training for one client.
Note
Overwrite this method to customize the PyTorch training pipeline.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
train_loader (torch.utils.data.DataLoader) – torch.utils.data.DataLoader for this client.
fedavg#
Module Contents#
FedAvgServerHandler | FedAvg server handler.
FedAvgClientTrainer | Federated client with local SGD solver.
FedAvgSerialClientTrainer | Federated client with local SGD solver.
- class FedAvgServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
FedAvg server handler.
- class FedAvgClientTrainer(model: torch.nn.Module, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_client.SGDClientTrainer
Federated client with local SGD solver.
- class FedAvgSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Federated client with local SGD solver.
feddyn#
Module Contents#
FedDynServerHandler | FedAvg server handler.
FedDynSerialClientTrainer | Deprecated
- class FedDynServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
FedAvg server handler.
- setup_optim(alpha)#
Override this function to load your optimization hyperparameters.
- global_update(buffer)#
- class FedDynSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- setup_dataset(dataset)#
Override this function to set up local dataset for clients
- setup_optim(epochs, batch_size, lr, alpha)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
- train(id, model_parameters, train_loader)#
Single round of local training for one client.
Note
Overwrite this method to customize the PyTorch training pipeline.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
train_loader (torch.utils.data.DataLoader) – torch.utils.data.DataLoader for this client.
fednova#
Module Contents#
FedNovaServerHandler | FedAvg server handler.
FedNovaSerialClientTrainer | Federated client with local SGD solver.
- class FedNovaServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
FedAvg server handler.
- setup_optim(option='weighted_scale')#
Override this function to load your optimization hyperparameters.
- global_update(buffer)#
- class FedNovaSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Federated client with local SGD solver.
- local_process(payload, id_list)#
Define the local main process.
fedprox#
Module Contents#
FedProxServerHandler | FedProx server handler.
FedProxClientTrainer | Federated client with local SGD with proximal term solver.
FedProxSerialClientTrainer | Deprecated
- class FedProxServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
FedProx server handler.
- class FedProxClientTrainer(model: torch.nn.Module, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_client.SGDClientTrainer
Federated client with local SGD with proximal term solver.
- setup_optim(epochs, batch_size, lr, mu)#
Set up local optimization configuration.
- local_process(payload, id)#
The manager of the upper layer calls this function with the accepted payload.
In synchronous mode, return True to end the current FL round.
- train(model_parameters, train_loader, mu) → None#
Client trains its local model on local dataset.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
- class FedProxSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- setup_optim(epochs, batch_size, lr, mu)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
- train(model_parameters, train_loader, mu) → None#
Client trains its local model on local dataset.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
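The role of mu can be sketched as follows. FedProx adds a proximal term (mu / 2) * ||w - w_global||^2 to the local loss, so each gradient step gains an extra pull toward the global model. The code below illustrates that update on plain lists; it is not FedLab's trainer, and `proximal_sgd_step` is a hypothetical name.

```python
def proximal_sgd_step(weights, grads, global_weights, lr, mu):
    """One SGD step on loss(w) + (mu / 2) * ||w - w_global||^2."""
    return [
        # The gradient of the proximal term is mu * (w - w_global).
        w - lr * (g + mu * (w - w_glob))
        for w, g, w_glob in zip(weights, grads, global_weights)
    ]


local = [1.0, -1.0]
global_model = [0.0, 0.0]
zero_grads = [0.0, 0.0]

# With a zero task gradient, the proximal term alone pulls w toward w_global.
stepped = proximal_sgd_step(local, zero_grads, global_model, lr=0.5, mu=1.0)
print(stepped)
```

Setting mu to 0 recovers plain local SGD, which matches the usual reading of FedAvg as a special case of FedProx.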
ifca#
Module Contents#
IFCAServerHander | Synchronous Parameter Server Handler.
IFCASerialClientTrainer | Deprecated
- class IFCAServerHander(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger=None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
Synchronous Parameter Server Handler.
Backend of the synchronous parameter server: this class is responsible for backend computing in the synchronous server.
The synchronous parameter server waits for every client to finish its local training process before the next FL round.
Details in paper: http://proceedings.mlr.press/v54/mcmahan17a.html
- Parameters
model (torch.nn.Module) – Model used in this federation.
global_round (int) – Stop condition. Shut down FL system when global round is reached.
sample_ratio (float) – The result of sample_ratio * num_clients is the number of clients for every FL round.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
logger (Logger, optional) – Object of Logger.
- property downlink_package#
Property for manager layer. Server manager will call this property when activates clients.
- setup_optim(share_size, k, init_parameters)#
_summary_
- Parameters
share_size (_type_) – _description_
k (_type_) – _description_
init_parameters (_type_) – _description_
- global_update(buffer)#
- class IFCASerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- setup_dataset(dataset)#
Override this function to set up local dataset for clients
- setup_optim(epochs, batch_size, lr)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
powerofchoice#
Module Contents#
Powerofchoice | Synchronous Parameter Server Handler.
PowerofchoiceSerialClientTrainer | Deprecated
- class PowerofchoicePipeline(handler: fedlab.core.server.handler.ServerHandler, trainer: fedlab.core.client.trainer.SerialClientTrainer)#
Bases: fedlab.core.standalone.StandalonePipeline
- main()#
- class Powerofchoice(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
Synchronous Parameter Server Handler.
Backend of the synchronous parameter server: this class is responsible for backend computing in the synchronous server.
The synchronous parameter server waits for every client to finish its local training process before the next FL round.
Details in paper: http://proceedings.mlr.press/v54/mcmahan17a.html
- Parameters
model (torch.nn.Module) – Model used in this federation.
global_round (int) – Stop condition. Shut down FL system when global round is reached.
sample_ratio (float) – The result of sample_ratio * num_clients is the number of clients for every FL round.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
logger (Logger, optional) – Object of Logger.
- setup_optim(d)#
Override this function to load your optimization hyperparameters.
- sample_candidates()#
- sample_clients(candidates, losses)#
Return a list of client rank indices selected randomly. The client ID is from 0 to self.num_clients - 1.
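The signatures sample_candidates() and sample_clients(candidates, losses) suggest a two-stage selection in the spirit of power-of-choice: draw d candidates first, then keep those with the largest local loss. The sketch below illustrates that idea under stated assumptions. Candidates are drawn uniformly here, which may differ from the weighting used in the original algorithm or in FedLab, and both function signatures are hypothetical.

```python
import random


def sample_candidates(num_clients, d, seed=None):
    """Draw d candidate clients uniformly at random without replacement."""
    return random.Random(seed).sample(range(num_clients), d)


def select_highest_loss(candidates, losses, k):
    """Keep the k candidates with the largest reported local loss."""
    ranked = sorted(zip(candidates, losses), key=lambda pair: pair[1], reverse=True)
    return [client for client, _ in ranked[:k]]


candidates = [7, 2, 9, 4]
losses = [0.3, 1.2, 0.8, 0.1]  # one evaluated loss per candidate
selected = select_highest_loss(candidates, losses, k=2)
print(selected)
```

A larger d biases the selection more strongly toward clients with high loss, while d equal to the number of selected clients reduces to plain random sampling.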
- class PowerofchoiceSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- evaluate(id_list, model_parameters)#
Evaluate quality of local model.
qfedavg#
Module Contents#
qFedAvgServerHandler | qFedAvg server handler.
qFedAvgClientTrainer | Federated client with modified upload package and local SGD solver.
- class qFedAvgServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
qFedAvg server handler.
- global_update(buffer)#
- class qFedAvgClientTrainer(model: torch.nn.Module, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_client.SGDClientTrainer
Federated client with modified upload package and local SGD solver.
- property uplink_package#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- setup_optim(epochs, batch_size, lr, q)#
Set up local optimization configuration.
scaffold#
Module Contents#
ScaffoldServerHandler | FedAvg server handler.
ScaffoldSerialClientTrainer | Deprecated
- class ScaffoldServerHandler(model: torch.nn.Module, global_round: int, sample_ratio: float, cuda: bool = False, device: str = None, logger: fedlab.utils.Logger = None)#
Bases: fedlab.contrib.algorithm.basic_server.SyncServerHandler
FedAvg server handler.
- property downlink_package#
Property for manager layer. Server manager will call this property when activates clients.
- setup_optim(lr)#
Override this function to load your optimization hyperparameters.
- global_update(buffer)#
- class ScaffoldSerialClientTrainer(model, num_clients, cuda=False, device=None, logger=None, personal=False)#
Bases: fedlab.contrib.algorithm.basic_client.SGDSerialClientTrainer
Deprecated. Train multiple clients in a single process.
Customize _get_dataloader() or _train_alone() for specific algorithm design in clients.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
logger (Logger, optional) – Object of Logger.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- setup_optim(epochs, batch_size, lr)#
Set up local optimization configuration.
- local_process(payload, id_list)#
Define the local main process.
- train(id, model_parameters, global_c, train_loader)#
Single round of local training for one client.
Note
Overwrite this method to customize the PyTorch training pipeline.
- Parameters
model_parameters (torch.Tensor) – Serialized model parameters.
train_loader (torch.utils.data.DataLoader) – torch.utils.data.DataLoader for this client.
compressor#
compressor#
Module Contents#
Helper class that provides a standard way to create an ABC using |
quantization#
Module Contents#
QSGDCompressor | Quantization compressor.
- class QSGDCompressor(n_bit, random=True, cuda=False)#
Bases: fedlab.contrib.compressor.compressor.Compressor
Quantization compressor.
An implementation of the paper https://proceedings.neurips.cc/paper/2017/file/6c340f25839e6acdc73414517203f5f0-Paper.pdf.
Alistarh, Dan, et al. “QSGD: Communication-efficient SGD via gradient quantization and encoding.” Advances in Neural Information Processing Systems 30 (2017): 1709-1720. Thanks to git repo: https://github.com/xinyandai/gradient-quantization
- compress(tensor)#
Compress a tensor with quantization.
- Parameters
tensor ([type]) – [description]
- Returns
norm (torch.Tensor): The normalization number.
signs (torch.Tensor): Tensor that indicates the sign of the corresponding number.
quantized_intervals (torch.Tensor): Quantized tensor in which each item is in [0, 2**n_bit - 1].
- decompress(signature)#
Decompress tensor.
- Parameters
signature (list) – [norm, signs, quantized_intervals], returned by compress().
- Returns
Raw tensor represented by signature.
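The (norm, signs, quantized_intervals) signature can be sketched on a plain Python list. This is an illustrative sketch of QSGD-style quantization under stated assumptions: it uses the L2 norm and 2**n_bit - 1 levels, and none of the names below belong to FedLab's API.

```python
import math
import random


def qsgd_compress(values, n_bit, stochastic=True, rng=None):
    """Quantize a vector to (norm, signs, integer levels in [0, 2**n_bit - 1])."""
    rng = rng or random.Random()
    levels = 2 ** n_bit - 1
    norm = math.sqrt(sum(v * v for v in values))
    signs = [1 if v >= 0 else -1 for v in values]
    quantized = []
    for v in values:
        scaled = abs(v) / norm * levels if norm > 0 else 0.0
        level = math.floor(scaled)
        if stochastic:
            # Round up with probability equal to the fractional part,
            # which keeps the quantization unbiased in expectation.
            level += 1 if rng.random() < scaled - level else 0
        quantized.append(int(level))
    return norm, signs, quantized


def qsgd_decompress(signature, n_bit):
    """Rebuild an approximate vector from (norm, signs, levels)."""
    norm, signs, quantized = signature
    levels = 2 ** n_bit - 1
    return [norm * s * q / levels for s, q in zip(signs, quantized)]


vector = [3.0, -4.0]
signature = qsgd_compress(vector, n_bit=8, stochastic=False)
restored = qsgd_decompress(signature, n_bit=8)
print(signature)
print(restored)
```

Each entry then needs only a sign and n_bit bits, plus one shared norm per tensor. With deterministic rounding, the reconstruction error per entry is bounded by norm / (2**n_bit - 1).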
topk#
Module Contents#
TopkCompressor | Compressor for federated communication
- class TopkCompressor(compress_ratio)#
Bases: fedlab.contrib.compressor.compressor.Compressor
Compressor for federated communication. Top-k gradient or weights selection.
- Parameters
compress_ratio (float) – Compress ratio.
- compress(tensor)#
Compress tensor into (values, indices).
- Parameters
tensor (torch.Tensor) – Tensor.
- Returns
(values, indices)
- decompress(values, indices, shape)#
Decompress tensor.
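Top-k selection keeps only the entries with the largest magnitude and sends them together with their positions. The sketch below shows the idea on a flat Python list; it is not FedLab's implementation. Rounding k down with a minimum of one and using a flat size in place of shape are assumptions of this example.

```python
def topk_compress(values, compress_ratio):
    """Keep the k = max(1, int(n * ratio)) entries with the largest magnitude."""
    k = max(1, int(len(values) * compress_ratio))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    indices = sorted(order[:k])
    return [values[i] for i in indices], indices


def topk_decompress(kept_values, indices, size):
    """Scatter the kept values back into a zero vector of the given size."""
    dense = [0.0] * size
    for value, index in zip(kept_values, indices):
        dense[index] = value
    return dense


gradient = [0.1, -2.0, 0.05, 1.5]
kept, idx = topk_compress(gradient, compress_ratio=0.5)
dense = topk_decompress(kept, idx, size=len(gradient))
print(kept, idx)
print(dense)
```

The entries that are dropped become zeros on the receiving side, so the original size (or shape, for tensors) has to travel with the values and indices.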
dataset#
adult#
Module Contents#
Adult | Adult dataset from LIBSVM Data.
- class Adult(root, train=True, transform=None, target_transform=None, download=False)#
Bases: torch.utils.data.Dataset
Adult dataset from LIBSVM Data.
- Parameters
root (str) – Root directory of raw dataset to download if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. Default as None.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it. Default as None.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
- url = https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/#
- train_file_name = a9a#
- test_file_name = a9a.t#
- num_classes = 2#
- num_features = 123#
- download()#
- _local_file_existence()#
- __getitem__(index)#
- __len__()#
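Files distributed as LIBSVM data, such as a9a, use a sparse text format with one sample per line: a label followed by index:value pairs. The sketch below parses one such line into a dense feature list. It is not FedLab's loader, `parse_libsvm_line` is a hypothetical helper, and the sample line is made up in the LIBSVM format rather than copied from the dataset.

```python
def parse_libsvm_line(line, num_features):
    """Parse '<label> <index>:<value> ...' into (label, dense feature list)."""
    parts = line.split()
    label = int(float(parts[0]))
    features = [0.0] * num_features
    for item in parts[1:]:
        index, value = item.split(":")
        # LIBSVM feature indices start at 1.
        features[int(index) - 1] = float(value)
    return label, features


label, features = parse_libsvm_line("-1 3:1 11:1 14:1", num_features=123)
print(label, len(features), sum(features))
```

Passing num_features explicitly matters because a sparse line only lists its non-zero entries, so the width cannot be inferred from a single sample.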
basic_dataset#
Module Contents#
BaseDataset | Base dataset iterator
Subset | For data subset with different augmentation for different client.
CIFARSubset | For data subset with different augmentation for different client.
- class BaseDataset(x, y)#
Bases: torch.utils.data.Dataset
Base dataset iterator
- __len__()#
- __getitem__(index)#
- class Subset(dataset, indices, transform=None, target_transform=None)#
Bases: torch.utils.data.Dataset
For data subset with different augmentation for different client.
- Parameters
dataset (Dataset) – The whole Dataset.
indices (List[int]) – Indices of the sub-dataset to take from dataset.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- __getitem__(index)#
Get item
- Parameters
index (int) – index
- Returns
(image, target) where target is index of the target class.
- __len__()#
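The idea of a per-client subset with its own transforms can be sketched without PyTorch. The class below is a simplified stand-in, not FedLab's Subset: it takes separate data and target lists rather than a Dataset object, and the name `IndexedSubset` is hypothetical.

```python
class IndexedSubset:
    """A view over selected indices of a dataset, with per-client transforms."""

    def __init__(self, data, targets, indices, transform=None, target_transform=None):
        self.data = [data[i] for i in indices]
        self.targets = [targets[i] for i in indices]
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        sample, target = self.data[index], self.targets[index]
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def __len__(self):
        return len(self.targets)


data = [10, 11, 12, 13, 14]
targets = [0, 1, 0, 1, 0]
client_a = IndexedSubset(data, targets, indices=[0, 2, 4])
client_b = IndexedSubset(data, targets, indices=[1, 3], transform=lambda x: x * 2)
print(len(client_a), client_a[1])
print(len(client_b), client_b[0])
```

Because the transform is stored per subset, two clients can share one underlying dataset and still apply different augmentation.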
- class CIFARSubset(dataset, indices, transform=None, target_transform=None, to_image=True)#
Bases: Subset
For data subset with different augmentation for different client.
- Parameters
dataset (Dataset) – The whole Dataset.
indices (List[int]) – Indices of the sub-dataset to take from dataset.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
celeba#
Module Contents#
covtype#
Module Contents#
Covtype | Covtype binary dataset from LIBSVM Data.
- class Covtype(root, train=True, train_ratio=0.75, transform=None, target_transform=None, download=False, generate=False, seed=None)#
Bases: torch.utils.data.Dataset
Covtype binary dataset from LIBSVM Data.
- Parameters
root (str) – Root directory of raw dataset to download if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. Default as None.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it. Default as None.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
- num_classes = 2#
- num_features = 54#
- url = https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/covtype.libsvm.binary.bz2#
- source_file_name = covtype.libsvm.binary.bz2#
- download()#
- generate()#
- _local_npy_existence()#
- _local_source_file_existence()#
- __getitem__(index)#
- __len__()#
fcube#
Module Contents#
FCUBE | FCUBE data set.
- class FCUBE(root, train=True, generate=True, transform=None, target_transform=None, num_samples=4000)#
Bases:
torch.utils.data.Dataset
FCUBE data set.
From paper Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters
root (str) – Root for data file.
train (bool, optional) – Training set or test set. Defaults to True.
generate (bool, optional) – Whether to generate synthetic dataset. If True, new synthetic FCUBE data is generated even if data already exists. Defaults to True.
transform (callable, optional) – A function/transform that takes in a numpy.ndarray and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
num_samples (int, optional) – Total number of samples to generate. We suggest using 4000 for the training set and 1000 for the test set. Defaults to 4000, the suggested training set size.
- train_files#
- test_files#
- num_clients = 4#
- _generate_train()#
- _generate_test()#
- _save_data()#
- __len__()#
femnist#
Module Contents#
partitioned_cifar#
Module Contents#
- class PartitionCIFAR(root, path, dataname, num_clients, download=True, preprocess=False, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None, transform=None, target_transform=None)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
FedDataset with partitioning preprocess. For detailed partitioning, please check Federated Dataset and DataPartitioner.
- Parameters
root (str) – Path to download raw dataset.
path (str) – Path to save partitioned subdataset.
dataname (str) – "cifar10" or "cifar100".
num_clients (int) – Number of clients.
download (bool) – Whether to download the raw dataset.
preprocess (bool) – Whether to preprocess the dataset.
balance (bool, optional) – Balanced partition over all clients or not. Defaults to True.
partition (str, optional) – Partition type; only "iid", "shards" and "dirichlet" are supported. Defaults to "iid".
unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Defaults to 0 for balanced partition.
num_shards (int, optional) – Number of shards in the non-iid "shards" partition. Only works if partition="shards". Defaults to None.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-iid partition. Only works if partition="dirichlet". Defaults to None.
verbose (bool, optional) – Whether to print partition process. Defaults to True.
seed (int, optional) – Random seed. Defaults to None.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- preprocess(balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None, download=True)#
Perform FL partition on the dataset, and save each subset for each client into a data{cid}.pkl file.
For details of partition schemes, please check Federated Dataset and DataPartitioner.
- get_dataset(cid, type='train')#
Load subdataset for client with client ID cid from local file.
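The preprocess/get_dataset pair above follows a write-once, read-per-client pattern. The sketch below illustrates that pattern with the standard pickle module; the helper names, the flat directory layout, and the round-robin split are assumptions made for the example and are not taken from FedLab's source.

```python
import os
import pickle
import tempfile
from typing import Dict, List


def save_client_subsets(path: str, client_indices: Dict[int, List[int]]) -> None:
    """Write one data{cid}.pkl file per client (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)
    for cid, indices in client_indices.items():
        with open(os.path.join(path, f"data{cid}.pkl"), "wb") as f:
            pickle.dump(indices, f)


def load_client_subset(path: str, cid: int) -> List[int]:
    """Read back the subset saved for client cid (hypothetical helper)."""
    with open(os.path.join(path, f"data{cid}.pkl"), "rb") as f:
        return pickle.load(f)


# Placeholder partition: deal sample indices to clients round-robin.
num_clients, num_samples = 4, 10
partition = {cid: list(range(cid, num_samples, num_clients)) for cid in range(num_clients)}

with tempfile.TemporaryDirectory() as tmp:
    save_client_subsets(tmp, partition)
    saved_files = sorted(os.listdir(tmp))
    client_1 = load_client_subset(tmp, 1)
```

Saving the partition once keeps repeated experiments on the same split cheap, since each client process only needs to read its own file.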
partitioned_mnist#
Module Contents#
- class PartitionedMNIST(root, path, num_clients, download=True, preprocess=False, partition='iid', dir_alpha=None, verbose=True, seed=None, transform=None, target_transform=None)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
FedDataset with partitioning preprocess. For detailed partitioning, please check Federated Dataset and DataPartitioner.
- Parameters
root (str) – Path to download raw dataset.
path (str) – Path to save partitioned subdataset.
num_clients (int) – Number of clients.
download (bool) – Whether to download the raw dataset.
preprocess (bool) – Whether to preprocess the dataset.
partition (str, optional) – Partition name. Only supports "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-iid partition. Only works if partition="dirichlet". Defaults to None.
verbose (bool, optional) – Whether to print partition process. Defaults to True.
seed (int, optional) – Random seed. Defaults to None.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- preprocess(partition='iid', dir_alpha=None, verbose=True, seed=None, download=True, transform=None, target_transform=None)#
Perform FL partition on the dataset, and save each subset for each client into a data{cid}.pkl file.
For details of partition schemes, please check Federated Dataset and DataPartitioner.
- get_dataset(cid, type='train')#
Load subdataset for client with client ID cid from local file.
pathological_mnist#
Module Contents#
PathologicalMNIST | The partition strategy in FedAvg. See http://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com
- class PathologicalMNIST(root, path, num_clients=100, shards=200, download=True, preprocess=False)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
The partition strategy in FedAvg. See http://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com
- Parameters
- preprocess(download=True)#
Define the dataset partition process
- get_dataset(id, type='train')#
Load the subdataset for the client with the given client ID from local file.
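The shard-based split used by the FedAvg paper can be sketched as follows: sort samples by label, cut the sorted list into equally sized shards, and hand each client the same number of shards. The function name and its seed argument are hypothetical, and the code is an illustration under these assumptions rather than the PathologicalMNIST implementation.

```python
import random
from typing import Dict, List


def shards_partition(
    labels: List[int], num_clients: int, num_shards: int, seed: int = 0
) -> Dict[int, List[int]]:
    """Assign label-sorted shards to clients (hypothetical helper)."""
    if num_shards % num_clients != 0:
        raise ValueError("num_shards must be a multiple of num_clients")
    if len(labels) % num_shards != 0:
        raise ValueError("len(labels) must be a multiple of num_shards")
    shard_size = len(labels) // num_shards
    shards_per_client = num_shards // num_clients
    # Sort sample indices by label so each shard covers very few classes.
    sorted_idx = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_ids = list(range(num_shards))
    random.Random(seed).shuffle(shard_ids)
    client_dict: Dict[int, List[int]] = {}
    for cid in range(num_clients):
        own = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [
            i for s in own for i in sorted_idx[s * shard_size:(s + 1) * shard_size]
        ]
    return client_dict


labels = [i % 10 for i in range(600)]  # toy labels: 60 samples per class
parts = shards_partition(labels, num_clients=10, num_shards=20)
```

With the defaults in the signature above (num_clients=100, shards=200), the same scheme would give every client two shards, so each client sees only a small number of classes.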
rcv1#
Module Contents#
RCV1 | RCV1 binary dataset from LIBSVM Data.
- class RCV1(root, train=True, train_ratio=0.75, transform=None, target_transform=None, download=False, generate=False, seed=None)#
Bases:
torch.utils.data.Dataset
RCV1 binary dataset from LIBSVM Data.
- Parameters
root (str) – Root directory of the raw dataset; the dataset is downloaded here if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. Defaults to None.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it. Defaults to None.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
- num_classes = 2#
- num_features = 47236#
- url = https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/rcv1_train.binary.bz2#
- source_file_name = rcv1_train.binary.bz2#
- download()#
- generate()#
- _local_npy_existence()#
- _local_source_file_existence()#
- __getitem__(index)#
- __len__()#
rotated_cifar10#
Module Contents#
RotatedCIFAR10 | Rotate CIFAR10 and partition them.
- class RotatedCIFAR10(root, save_dir, num_clients)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
Rotate CIFAR10 and partition them.
- Parameters
- preprocess(shards, thetas=[0, 180])#
_summary_
- Parameters
shards (_type_) – _description_
thetas (list, optional) – _description_. Defaults to [0, 180].
- get_dataset(id, type='train')#
Get dataset class
- Parameters
- Raises
- get_data_loader(id, batch_size=None, type='train')#
rotated_mnist#
Module Contents#
RotatedMNIST | Rotate MNIST and partition them.
- class RotatedMNIST(root, path, num)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
Rotate MNIST and partition them.
- Parameters
- preprocess(thetas=[0, 90, 180, 270], download=True)#
Define the dataset partition process
- get_dataset(id, type='train')#
Get dataset class
- Parameters
- Raises
- get_data_loader(id, batch_size=None, type='train')#
sent140#
Module Contents#
- BASE_DIR#
- class Sent140Dataset(client_id: int, client_str: str, data: list, targets: list, is_to_tokens: bool = True, tokenizer: fedlab.contrib.dataset.utils.Tokenizer = None)#
Bases:
torch.utils.data.Dataset
- _process_data_target()#
process client’s data and target
- _data2token()#
- encode(vocab: fedlab.contrib.dataset.utils.Vocab, fix_len: int)#
Transform token data into index sequences using vocab.
- Parameters
vocab (fedlab_benchmark.leaf.nlp_utils.util.vocab) – Vocab for data_token.
fix_len (int) – Max length of a sentence.
- Returns
A list of integer lists for data_token, and a list of tensor targets.
- __encode_tokens(tokens, pad_idx) → torch.Tensor#
Encode token_data to length fix_len to get the list of indices in self.vocab. If a sentence is shorter than fix_len, it is padded with the pad word up to fix_len; if it is longer than fix_len, it is cut to the first max_words words.
- Parameters
tokens (list[str]) – Data after tokenizer.
- Returns
Integer list of indices with fix_len length for the input tokens.
- __len__()#
- __getitem__(item)#
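The fixed-length encoding described for encode and __encode_tokens can be illustrated with plain lists. The special tokens, the dictionary-based vocabulary, and right-padding are assumptions made for this sketch; the actual Vocab class and padding side in FedLab may differ.

```python
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def encode_tokens(tokens: List[str], word2idx: Dict[str, int], fix_len: int) -> List[int]:
    """Return exactly fix_len indices for one tokenized sentence."""
    # Truncate to the first fix_len tokens, then right-pad up to fix_len.
    clipped = tokens[:fix_len]
    padded = clipped + [PAD] * (fix_len - len(clipped))
    # Words missing from the vocabulary fall back to the UNK index.
    return [word2idx.get(tok, word2idx[UNK]) for tok in padded]


word2idx = {PAD: 0, UNK: 1, "good": 2, "day": 3, "bad": 4}
short = encode_tokens(["good", "day"], word2idx, fix_len=4)
long = encode_tokens(["bad", "bad", "day", "good", "day"], word2idx, fix_len=4)
unknown = encode_tokens(["sunny"], word2idx, fix_len=2)
```

Fixing the length lets every encoded sentence be stacked into one batch tensor of shape (batch, fix_len).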
shakespeare#
Module Contents#
- class ShakespeareDataset(client_id: int, client_str: str, data: list, targets: list)#
Bases:
torch.utils.data.Dataset
- _build_vocab()#
Build the vocabulary from all letters. The vocabulary is re-used from the Federated Learning for Text Generation tutorial: https://www.tensorflow.org/federated/tutorials/federated_learning_for_text_generation
- Returns
All-letters vocabulary list and the length of the vocab list.
- _process_data_target()#
process client’s data and target
- __sentence_to_indices(sentence: str)#
Return a list of integers giving the character indices in ALL_LETTERS.
- Parameters
sentence (str) – Input sentence.
- Returns
An integer list of character indices.
- __letter_to_index(letter: str)#
Return the index in ALL_LETTERS of the given letter.
- Parameters
letter (char/str[0]) – Input letter.
- Returns
Integer index of the input letter.
- __len__()#
- __getitem__(index)#
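Character-level encoding of the kind described by __sentence_to_indices and __letter_to_index can be sketched with a plain string lookup. The ALL_LETTERS value below is an assumed toy alphabet, not the vocabulary FedLab builds, and returning -1 for unknown characters is likewise a choice made for the example.

```python
import string
from typing import List

# Assumed toy alphabet; the real ALL_LETTERS vocabulary may differ.
ALL_LETTERS = string.ascii_letters + " .,;'"


def letter_to_index(letter: str) -> int:
    """Index of a single character in ALL_LETTERS, or -1 if it is absent."""
    return ALL_LETTERS.find(letter)


def sentence_to_indices(sentence: str) -> List[int]:
    """Encode a sentence character by character."""
    return [letter_to_index(ch) for ch in sentence]


indices = sentence_to_indices("To be")
```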
Package Contents#
BaseDataset | Base dataset iterator.
Subset | Data subset with different augmentation for different clients.
PathologicalMNIST | The partition strategy in FedAvg. See http://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com
RotatedMNIST | Rotate MNIST and partition them.
RotatedCIFAR10 | Rotate CIFAR10 and partition them.
PartitionCIFAR | FedDataset with partitioning preprocess.
PartitionedMNIST | FedDataset with partitioning preprocess.
FCUBE | FCUBE data set.
Covtype | Covtype binary dataset from LIBSVM Data.
RCV1 | RCV1 binary dataset from LIBSVM Data.
- class FedDataset#
Bases:
object
- preprocess()#
Define the dataset partition process
- abstract get_dataset(id, type='train')#
Get dataset class
- Parameters
- Raises
- abstract get_dataloader(id, batch_size, type='train')#
Get data loader
- __len__()#
core#
client#
manager#
Module Contents#
ClientManager | Base class for ClientManager.
PassiveClientManager | Passive communication
ActiveClientManager | Active communication
- class ClientManager(network: fedlab.core.network.DistNetwork, trainer: fedlab.core.model_maintainer.ModelMaintainer)#
Bases:
fedlab.core.network_manager.NetworkManager
Base class for ClientManager.
ClientManager defines client activation in different communication stages.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
trainer (ModelMaintainer) – Subclass of ClientTrainer or SerialClientTrainer. Provides local_process() and uplink_package. Defines the local client training procedure.
- setup()#
Initialization stage.
ClientManager reports the number of clients simulated by the current client process.
- class PassiveClientManager(network: fedlab.core.network.DistNetwork, trainer: fedlab.core.model_maintainer.ModelMaintainer, logger: fedlab.utils.Logger = None)#
Bases:
ClientManager
Passive communication NetworkManager for client in synchronous FL pattern.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
trainer (ModelMaintainer) – Subclass of ClientTrainer or SerialClientTrainer. Provides local_process() and uplink_package. Defines the local client training procedure.
logger (Logger, optional) – Object of Logger.
- main_loop()#
Actions to perform when receiving a new message, including local training.
Main procedure of each client:
1. client waits for data from server (PASSIVELY).
2. after receiving data, client starts the local model training procedure.
3. client synchronizes with server actively.
- synchronize()#
Synchronize with server.
- class ActiveClientManager(network: fedlab.core.network.DistNetwork, trainer: fedlab.core.client.trainer.ClientTrainer, logger: fedlab.utils.Logger = None)#
Bases:
ClientManager
Active communication NetworkManager for client in asynchronous FL pattern.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
trainer (ClientTrainer) – Subclass of ClientTrainer. Provides local_process() and uplink_package. Defines the local client training procedure.
logger (Logger, optional) – Object of Logger.
- main_loop()#
Actions to perform on receiving a new message, including local training.
1. client requests data from server (ACTIVELY).
2. after receiving data, client trains the local model.
3. client synchronizes with server actively.
- request()#
Client request.
- synchronize()#
Synchronize with server.
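The difference between the passive and active main loops is only who speaks first. The sketch below replays one round of each pattern with two in-process queues standing in for the network; the message-code strings, the local_train placeholder, and the serve callback are all invented for illustration and do not correspond to FedLab's MessageCode values or manager API.

```python
from queue import Queue
from typing import Callable, List, Tuple

Message = Tuple[str, List[float]]  # (message code, payload); codes are made up


def local_train(params: List[float]) -> List[float]:
    # Placeholder for the trainer's local_process(): pretend one training step.
    return [p + 1.0 for p in params]


def passive_round(downlink: Queue, uplink: Queue) -> None:
    # Synchronous pattern: wait until the server pushes the model, then reply.
    _code, params = downlink.get()
    uplink.put(("PARAMETER_UPDATE", local_train(params)))


def active_round(downlink: Queue, uplink: Queue, serve: Callable[[], None]) -> None:
    # Asynchronous pattern: the client asks first, then trains and replies.
    uplink.put(("PARAMETER_REQUEST", []))
    serve()  # stand-in for the server answering the request
    _code, params = downlink.get()
    uplink.put(("PARAMETER_UPDATE", local_train(params)))


down, up = Queue(), Queue()

down.put(("PARAMETER_UPDATE", [0.0, 1.0]))  # the server starts the round
passive_round(down, up)
passive_reply = up.get()


def serve() -> None:
    code, _payload = up.get()
    if code == "PARAMETER_REQUEST":
        down.put(("PARAMETER_UPDATE", [10.0]))


active_round(down, up, serve)
active_reply = up.get()
```

In both cases the last step is the same upload; only the first message changes direction.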
trainer#
Module Contents#
ClientTrainer | An abstract class representing a client trainer.
SerialClientTrainer | Base class. Simulate multiple clients in sequence in a single process.
- class ClientTrainer(model: torch.nn.Module, cuda: bool, device: str = None)#
Bases:
fedlab.core.model_maintainer.ModelMaintainer
An abstract class representing a client trainer.
In FedLab, the backend of a client trainer should manage its local model. It should have a function to update its model called local_process().
If you use our framework to define the activities of a client, make sure that your self-defined class subclasses it. All subclasses should overwrite local_process() and the property uplink_package.
- Parameters
model (torch.nn.Module) – PyTorch model.
cuda (bool) – Use GPUs or not.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
- abstract property uplink_package: List[torch.Tensor]#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- abstract setup_dataset()#
Set up local dataset self.dataset for clients.
- abstract setup_optim()#
Set up variables for optimization algorithms.
- abstract classmethod local_process(payload: List[torch.Tensor])#
The manager in the upper layer calls this function with the accepted payload.
In synchronous mode, return True to end the current FL round.
- abstract train()#
Override this method to define the training procedure. This function should manipulate self._model.
- abstract validate()#
Validate quality of local model.
- abstract evaluate()#
Evaluate quality of local model.
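The contract described above (override local_process() and expose uplink_package) can be sketched without PyTorch. TrainerSketch and SGDStepTrainer are hypothetical names, parameters are plain lists of floats rather than tensors, and the toy objective is chosen only so the arithmetic is easy to follow.

```python
from abc import ABC, abstractmethod
from typing import List


class TrainerSketch(ABC):
    """Hypothetical, torch-free analogue of the ClientTrainer contract."""

    def __init__(self, params: List[float]) -> None:
        self._params = list(params)

    @property
    @abstractmethod
    def uplink_package(self) -> List[List[float]]:
        """Content the client manager uploads to the server."""

    @abstractmethod
    def local_process(self, payload: List[List[float]]) -> None:
        """Called by the manager with the payload received from the server."""


class SGDStepTrainer(TrainerSketch):
    def __init__(self, params: List[float], lr: float = 0.1) -> None:
        super().__init__(params)
        self.lr = lr

    @property
    def uplink_package(self) -> List[List[float]]:
        return [self._params]

    def local_process(self, payload: List[List[float]]) -> None:
        # payload[0] is assumed to hold the global parameters.
        self._params = list(payload[0])
        self.train()

    def train(self) -> None:
        # Toy objective f(w) = sum(w_i ** 2) / 2, whose gradient is w itself.
        self._params = [w - self.lr * w for w in self._params]


trainer = SGDStepTrainer([0.0, 0.0], lr=0.5)
trainer.local_process([[2.0, -4.0]])
```

Keeping the upload behind a property lets a manager treat every trainer the same way, whatever algorithm the subclass implements.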
- class SerialClientTrainer(model: torch.nn.Module, num_clients: int, cuda: bool, device: str = None, personal: bool = False)#
Bases:
fedlab.core.model_maintainer.SerialModelMaintainer
Base class. Simulate multiple clients in sequence in a single process.
- Parameters
model (torch.nn.Module) – Model used in this federation.
num_clients (int) – Number of clients in current trainer.
cuda (bool) – Use GPUs or not. Default: False.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate copies of the local parameters list and maintain them respectively. These parameters are indexed by [0, num-1]. Defaults to False.
- abstract property uplink_package: List[List[torch.Tensor]]#
Return a tensor list for uploading to server.
This attribute will be called by client manager. Customize it for new algorithms.
- abstract setup_dataset()#
Override this function to set up local dataset for clients
- abstract setup_optim()#
- abstract classmethod local_process(id_list: list, payload: List[torch.Tensor])#
Define the local main process.
- abstract train()#
Override this method to define the algorithm of training your model. This function should manipulate self._model.
- abstract evaluate()#
Evaluate quality of local model.
- abstract validate()#
Validate quality of local model.
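Serial simulation means one process loops over the selected client IDs and collects one upload per client. The function below is a minimal sketch of that idea; its name, the extra client_data argument, and the toy update rule are assumptions for the example and not part of SerialClientTrainer.

```python
from typing import Dict, List


def serial_local_process(
    id_list: List[int], payload: List[float], client_data: Dict[int, List[float]]
) -> List[List[float]]:
    """Train the selected clients one after another and collect their uploads."""
    packages: List[List[float]] = []
    for cid in id_list:
        # Every simulated client starts from the same global parameters.
        local = list(payload)
        # Toy update: move halfway towards the mean of the client's data.
        target = sum(client_data[cid]) / len(client_data[cid])
        local = [w + 0.5 * (target - w) for w in local]
        packages.append(local)
    return packages


client_data = {0: [1.0, 3.0], 1: [10.0], 2: [-2.0, -2.0]}
packages = serial_local_process([0, 2], [0.0], client_data)
```

The result holds one entry per selected client, which matches the List[List[torch.Tensor]] shape of uplink_package in the signature above.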
Package Contents#
ClientManager | Base class for ClientManager.
ActiveClientManager | Active communication
PassiveClientManager | Passive communication
- ORDINARY_TRAINER = 0#
- SERIAL_TRAINER = 1#
communicator#
FedLab communication API
package#
Module Contents#
Package | A basic network package data structure used in FedLab. Everything is Tensor in FedLab.
- supported_torch_dtypes#
- class Package(message_code: fedlab.utils.message_code.MessageCode = None, content: List[torch.Tensor] = None)#
Bases:
object
A basic network package data structure used in FedLab. Everything is Tensor in FedLab.
Note
slice_size_i = tensor_i.shape[0], that is, every element in slices indicates the size of a sub-Tensor in content.
Package maintains 3 variables:
header: torch.Tensor([sender_rank, recv_rank, content_size, message_code, data_type])
slices: list[slice_size_1, slice_size_2]
content: torch.Tensor([tensor_1, tensor_2, ...])
- Parameters
message_code (MessageCode) – Message code.
content (torch.Tensor, optional) – Tensors contained in this package.
- append_tensor(tensor: torch.Tensor)#
Append new tensor to Package.content.
- Parameters
tensor (torch.Tensor) – Tensor to append in content.
- append_tensor_list(tensor_list: List[torch.Tensor])#
Append a list of tensors to Package.content.
- Parameters
tensor_list (list[torch.Tensor]) – A list of tensors to append to Package.content.
- to(dtype)#
- static parse_content(slices, content)#
Parse package content into a list of tensors.
- Parameters
slices (list[int]) – A list containing the number of elements of each tensor. Each number is used as an offset in the parsing process.
content (torch.Tensor) – Package.content, a 1-D tensor composed of several 1-D tensors and their corresponding offsets. For more details, see Package.
- Returns
A list of 1-D tensors parsed from content.
- Return type
- static parse_header(header)#
Parse header to get information of current package.
- Parameters
header (torch.Tensor) – Package.header, a 1-D tensor composed of 5 elements: torch.Tensor([sender_rank, recv_rank, slice_size, message_code, data_type]). For more details, see Package.
- Returns
A tuple containing 5 elements: (sender_rank, recv_rank, slice_size, message_code, data_type).
- Return type
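The header/slices/content layout can be illustrated with plain Python lists in place of tensors. The pack helper is hypothetical, the third header field is assumed here to be the number of slices, and FLOAT32 = 5 follows the data-type codes listed in this package; the sketch shows the round trip only, not FedLab's implementation.

```python
from typing import List, Tuple

FLOAT32 = 5  # data-type code as listed in this package's constants


def pack(tensors: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Flatten several 1-D 'tensors' into (slices, content)."""
    slices = [len(t) for t in tensors]
    content = [v for t in tensors for v in t]
    return slices, content


def parse_content(slices: List[int], content: List[float]) -> List[List[float]]:
    """Cut the flat content back into one list per slice size."""
    parsed, offset = [], 0
    for size in slices:
        parsed.append(content[offset:offset + size])
        offset += size
    return parsed


def parse_header(header: List[int]) -> Tuple[int, int, int, int, int]:
    """Unpack the five header fields in their documented order."""
    sender_rank, recv_rank, slice_size, message_code, data_type = header
    return sender_rank, recv_rank, slice_size, message_code, data_type


original = [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]]
slices, content = pack(original)
# Assumption: the third header field stores the number of slices.
header = [0, 1, len(slices), 0, FLOAT32]
restored = parse_content(slices, content)
```

Storing only sizes alongside one flat buffer means a package of any number of tensors can be sent as a fixed sequence of messages.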
processor#
Module Contents#
PackageProcessor | Provide more flexible distributed tensor communication functions based on torch.distributed.send() and torch.distributed.recv().
- class PackageProcessor#
Bases:
object
Provide more flexible distributed tensor communication functions based on torch.distributed.send() and torch.distributed.recv().
PackageProcessor defines the details of point-to-point package communication.
EVERYTHING is torch.Tensor in FedLab.
- static send_package(package, dst)#
Three-segment tensor communication pattern based on torch.distributed.
Pattern is shown as follows:
1.1 sender: send a header tensor containing slice_size to receiver.
1.2 receiver: receive the header, get the value of slice_size, and create a buffer for incoming slices of content.
2.1 sender: send a list of slices indicating the size of every tensor in content.
2.2 receiver: receive the slices list.
3.1 sender: send a content tensor composed of a list of tensors.
3.2 receiver: receive the content tensor, and parse it with the parser function to obtain the list of tensors.
- static recv_package(src=None)#
Three-segment tensor communication pattern based on torch.distributed.
Pattern is shown as follows:
1.1 sender: send a header tensor containing slice_size to receiver.
1.2 receiver: receive the header, get the value of slice_size, and create a buffer for incoming slices of content.
2.1 sender: send a list of slices indicating the size of every tensor in content.
2.2 receiver: receive the slices list.
3.1 sender: send a content tensor composed of a list of tensors.
3.2 receiver: receive the content tensor, and parse it with the parser function to obtain the list of tensors.
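The three-segment pattern can be replayed in a single process with a queue standing in for the point-to-point channel. This is only a sketch of the message order: it uses Python lists instead of tensors and queue.Queue instead of torch.distributed, and the two function bodies are illustrative rather than FedLab's code.

```python
from queue import Queue
from typing import List, Tuple

HEADER_SLICE_SIZE_IDX = 2  # matches the constant of the same name in this package


def send_package(
    channel: Queue, header: List[int], slices: List[int], content: List[float]
) -> None:
    channel.put(list(header))   # 1.1 header, which carries slice_size
    channel.put(list(slices))   # 2.1 size of every tensor in content
    channel.put(list(content))  # 3.1 flattened content


def recv_package(channel: Queue) -> Tuple[List[int], List[List[float]]]:
    header = channel.get()                      # 1.2 read slice_size first
    slice_size = header[HEADER_SLICE_SIZE_IDX]
    slices = channel.get()                      # 2.2 receive the slices list
    if len(slices) != slice_size:
        raise ValueError("slices length does not match header")
    content = channel.get()                     # 3.2 receive and parse content
    tensors, offset = [], 0
    for size in slices:
        tensors.append(content[offset:offset + size])
        offset += size
    return header, tensors


channel: Queue = Queue()
send_package(channel, header=[0, 1, 2, 0, 5], slices=[2, 3], content=[1.0, 2.0, 3.0, 4.0, 5.0])
header, tensors = recv_package(channel)
```

Sending the sizes before the data is what lets the receiver allocate buffers of the right length before each receive call.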
Package Contents#
- HEADER_SENDER_RANK_IDX = 0#
- HEADER_RECEIVER_RANK_IDX = 1#
- HEADER_SLICE_SIZE_IDX = 2#
- HEADER_MESSAGE_CODE_IDX = 3#
- HEADER_DATA_TYPE_IDX = 4#
- DEFAULT_RECEIVER_RANK#
- DEFAULT_SLICE_SIZE = 0#
- DEFAULT_MESSAGE_CODE_VALUE = 0#
- HEADER_SIZE = 5#
- INT8 = 0#
- INT16 = 1#
- INT32 = 2#
- INT64 = 3#
- FLOAT16 = 4#
- FLOAT32 = 5#
- FLOAT64 = 6#
- dtype_torch2flab(torch_type)#
- dtype_flab2torch(fedlab_type)#
server#
hierarchical#
connector#
Connector | Abstract class for basic Connector, which is a sub-module of Scheduler.
ServerConnector | Connect with server.
ClientConnector | Connect with clients.
- class Connector(network: fedlab.core.network.DistNetwork, write_queue: torch.multiprocessing.Queue, read_queue: torch.multiprocessing.Queue)#
Bases:
fedlab.core.network_manager.NetworkManager
Abstract class for basic Connector, which is a sub-module of Scheduler.
Connector inherits NetworkManager and maintains two message queues: one for sending messages to the collaborator, the other for reading messages from others.
Note
Connector is a basic component of the scheduler. Example code can be seen in scheduler.py.
- Parameters
network (DistNetwork) – Manage torch.distributed network communication.
write_queue (torch.multiprocessing.Queue) – Message queue to write.
read_queue (torch.multiprocessing.Queue) – Message queue to read.
- abstract process_meessage_queue()#
Define the procedure of dealing with message queue.
- class ServerConnector(network: fedlab.core.network.DistNetwork, write_queue: torch.multiprocessing.Queue, read_queue: torch.multiprocessing.Queue, logger: fedlab.utils.Logger = None)#
Bases: Connector
Connect with server.
This process acts like a client.
This class is part of the middle server used in the hierarchical structure.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
write_queue (torch.multiprocessing.Queue) – Message queue to write.
read_queue (torch.multiprocessing.Queue) – Message queue to read.
logger (Logger, optional) – Object of Logger. Defaults to None.
- run()#
Main process:
Initialization stage.
FL communication stage.
Shutdown stage. Close network connection.
- setup()#
Initialize network connection and necessary setups.
self._network.init_network_connection() must be called first. Override this method to implement the system setup message communication procedure.
- main_loop()#
Define the actions of communication stage.
- process_meessage_queue()#
Transport messages from client to server directly.
- class ClientConnector(network: fedlab.core.network.DistNetwork, write_queue: torch.multiprocessing.Queue, read_queue: torch.multiprocessing.Queue, logger: fedlab.utils.Logger = None)#
Bases: Connector
Connect with clients.
This class is part of the middle server used in the hierarchical structure.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
write_queue (torch.multiprocessing.Queue) – Message queue to write.
read_queue (torch.multiprocessing.Queue) – Message queue to read.
logger (Logger, optional) – Object of Logger. Defaults to None.
- run()#
Main process:
Initialization stage.
FL communication stage.
Shutdown stage. Close network connection.
- setup()#
Initialize network connection and necessary setups.
self._network.init_network_connection() must be called first. Override this method to implement the system setup message communication procedure.
- main_loop()#
Define the actions of communication stage.
- process_meessage_queue()#
Process the message queue.
Strategy for processing messages from the server.
scheduler#
Scheduler | Middle topology for hierarchical communication pattern.
- class Scheduler(net_upper: fedlab.core.network.DistNetwork, net_lower: fedlab.core.network.DistNetwork)#
Middle topology for hierarchical communication pattern.
Scheduler uses message queues to decouple connector modules.
- Parameters
net_upper (DistNetwork) – Distributed network manager of server from upper level.
net_lower (DistNetwork) – Distributed network manager of clients from lower level.
- run()#
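The idea of decoupling two connectors through a pair of message queues can be sketched with the standard library alone. This is an illustration under stated assumptions, not FedLab's implementation: threads and `queue.Queue` stand in for processes and `torch.multiprocessing.Queue`, and the function names, the `None` shutdown sentinel, and the `ack:` reply format are all hypothetical.

```python
import queue
import threading


def upper_connector(read_q, write_q, log):
    """Stands in for the connector that faces the upper-level server."""
    while True:
        msg = read_q.get()             # message forwarded from the lower level
        if msg is None:                # sentinel: propagate shutdown and stop
            write_q.put(None)
            break
        log.append(("to_server", msg))
        write_q.put(f"ack:{msg}")      # pretend the server replied


def lower_connector(read_q, write_q, log, uploads):
    """Stands in for the connector that faces the lower-level clients."""
    for msg in uploads:                # client uploads go into the shared queue
        write_q.put(msg)
    write_q.put(None)
    while True:
        reply = read_q.get()           # replies coming back from the upper side
        if reply is None:
            break
        log.append(("to_clients", reply))


def run_scheduler(uploads):
    """Wire both connectors together with one queue per direction."""
    up_to_low, low_to_up = queue.Queue(), queue.Queue()
    log = []
    upper = threading.Thread(target=upper_connector, args=(low_to_up, up_to_low, log))
    lower = threading.Thread(target=lower_connector, args=(up_to_low, low_to_up, log, uploads))
    upper.start()
    lower.start()
    upper.join()
    lower.join()
    return log
```

Neither connector calls the other directly; each only reads one queue and writes the other, which is the decoupling the Scheduler description refers to.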
handler#
Module Contents#
ServerHandler | An abstract class representing handler of parameter server.
- class ServerHandler(model: torch.nn.Module, cuda: bool, device: str = None)#
Bases: fedlab.core.model_maintainer.ModelMaintainer
An abstract class representing the handler of a parameter server.
Make sure that your self-defined server handler class subclasses this class.
Example
Read the source code of SyncServerHandler and AsyncServerHandler.
- Parameters
model (torch.nn.Module) – PyTorch model.
cuda (bool) – Use GPUs or not.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
- abstract property downlink_package: List[torch.Tensor]#
Property for the manager layer. The server manager reads this property when it activates clients.
- abstract property if_stop: bool#
NetworkManager keeps monitoring this attribute, and stops all related processes and threads when it returns True.
- abstract setup_optim()#
Override this function to load your optimization hyperparameters.
- abstract global_update(buffer)#
- abstract load(payload)#
Override this function to define how to update global model (aggregation or optimization).
- abstract evaluate()#
Override this function to define the evaluation of global model.
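To make the division of labour between these abstract members concrete, here is a minimal sketch of a handler that averages client uploads. It is a stand-in written with plain Python lists, not a subclass of the real `ServerHandler`: the class names, the `clients_per_round` and `max_rounds` arguments, and the convention that `load` returns True once a round is aggregated are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyServerHandler(ABC):
    """Hypothetical stand-in that mirrors the abstract members listed above."""

    @property
    @abstractmethod
    def downlink_package(self) -> List[List[float]]: ...

    @property
    @abstractmethod
    def if_stop(self) -> bool: ...

    @abstractmethod
    def global_update(self, buffer): ...

    @abstractmethod
    def load(self, payload) -> bool: ...


class AveragingHandler(ToyServerHandler):
    def __init__(self, init_params, clients_per_round, max_rounds):
        self.params = list(init_params)
        self.clients_per_round = clients_per_round
        self.max_rounds = max_rounds
        self.round = 0
        self._buffer = []

    @property
    def downlink_package(self):
        # What the manager would send to clients when activating them.
        return [self.params]

    @property
    def if_stop(self):
        # The manager polls this flag to decide when to shut down.
        return self.round >= self.max_rounds

    def global_update(self, buffer):
        # Plain element-wise average of the uploaded parameter vectors.
        self.params = [sum(col) / len(buffer) for col in zip(*buffer)]

    def load(self, payload):
        # Cache one upload; aggregate once enough clients have reported.
        self._buffer.append(payload[0])
        if len(self._buffer) < self.clients_per_round:
            return False
        self.global_update(self._buffer)
        self._buffer = []
        self.round += 1
        return True
```

The manager layer only needs `downlink_package`, `load`, and `if_stop`; everything about how the model changes stays inside the handler.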
manager#
Module Contents#
ServerManager | Base class of ServerManager.
SynchronousServerManager | Synchronous communication
AsynchronousServerManager | Asynchronous communication network manager for server
- DEFAULT_SERVER_RANK = 0#
- class ServerManager(network: fedlab.core.network.DistNetwork, handler: fedlab.core.server.handler.ServerHandler, mode: str = 'LOCAL')#
Bases: fedlab.core.network_manager.NetworkManager
Base class of ServerManager.
- Parameters
network (DistNetwork) – Network configuration and interfaces.
handler (ServerHandler) – Performs the global model update procedure.
- setup()#
Initialization stage.
The server accepts the local client number report from each client manager.
It then initializes a coordinator for the client_id -> rank mapping.
- class SynchronousServerManager(network: fedlab.core.network.DistNetwork, handler: fedlab.core.server.handler.ServerHandler, mode: str = 'LOCAL', logger: fedlab.utils.Logger = None)#
Bases: ServerManager
Synchronous communication
This is the top class in our framework, mainly responsible for the network communication of the server. It communicates synchronously with clients following the agreements defined in main_loop().
- Parameters
network (DistNetwork) – Network configuration and interfaces.
handler (ServerHandler) – Backend calculation handler for parameter server.
logger (Logger, optional) – Object of Logger.
- main_loop()#
Actions to perform in server when receiving a package from one client.
The server transmits the received package to the backend computation handler for aggregation or other manipulations.
- Loop:
Activate clients for the current training round.
Listen for messages from clients -> transmit received parameters to the server handler.
Note
Communication agreements related: users can override this function to customize communication agreements. This method is the key component connecting the behaviors of ServerHandler and NetworkManager.
- Raises
Exception – Unexpected MessageCode.
- shutdown()#
Shutdown stage.
- activate_clients()#
Activate a subset of clients to join one FL round.
The manager starts a new thread to send the activation package to the process ranks of the chosen clients. The client ids are obtained from handler.sample_clients(), and their communication ranks are obtained via the coordinator.
- shutdown_clients()#
Shut down all clients.
Send a package with MessageCode.Exit to each client.
Note
Communication agreements related: users can override this function to define the package carrying exit information.
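The activate-then-listen loop described above can be simulated in a single process. The sketch below is only an illustration of the synchronous pattern: `local_train` is a made-up update rule, sampling uses Python's `random` module, aggregation is a plain average, and none of the names correspond to FedLab's API.

```python
import random


def local_train(global_params, client_id):
    # Hypothetical local update: shift every parameter by the client id.
    return [p + client_id for p in global_params]


def synchronous_rounds(num_clients, clients_per_round, rounds, seed=0):
    rng = random.Random(seed)
    global_params = [0.0, 0.0]
    history = []
    for _ in range(rounds):
        # 1. The server samples clients and "broadcasts" the global model.
        selected = sorted(rng.sample(range(num_clients), clients_per_round))
        # 2. Each selected client trains locally and uploads its result.
        uploads = [local_train(global_params, cid) for cid in selected]
        # 3. The server waits for all uploads, then aggregates (plain average).
        global_params = [sum(col) / len(uploads) for col in zip(*uploads)]
        history.append(selected)
    return global_params, history
```

The defining property is step 3: a round cannot finish until every selected client has reported, which is what distinguishes this mode from the asynchronous manager.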
- class AsynchronousServerManager(network: fedlab.core.network.DistNetwork, handler: fedlab.core.server.handler.ServerHandler, logger: fedlab.utils.Logger = None)#
Bases: ServerManager
Asynchronous communication network manager for server
This is the top class in our framework, mainly responsible for the network communication of the server. It communicates asynchronously with clients following the agreements defined in main_loop().
- Parameters
network (DistNetwork) – Network configuration and interfaces.
handler (ServerHandler) – Backend computation handler for parameter server.
logger (Logger, optional) – Object of Logger.
- main_loop()#
Communication agreements of asynchronous FL.
The server receives ParameterRequest from a client and sends the model parameters to that client.
The server receives ParameterUpdate from a client and puts the parameters into the queue waiting for aggregation.
- Raises
ValueError – invalid message code.
- shutdown()#
Shutdown stage.
Close the network connection in the end.
- updater_thread()#
Asynchronous communication maintains a message queue. A new thread is started to keep monitoring the message queue.
- shutdown_clients()#
Shut down all clients.
Send a package with MessageCode.Exit to clients.
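The request/update protocol with a background updater thread can be sketched as follows. This is an illustration under assumptions, not FedLab's code: messages are `(code, payload)` tuples with the hypothetical codes `"request"` and `"update"`, the model is a single float, and the mixing rule with `alpha` is an invented stand-in for whatever aggregation the handler performs.

```python
import queue
import threading


def run_async_server(messages, alpha=0.5):
    """messages: list of ("request", None) or ("update", value) tuples."""
    model = [0.0]
    lock = threading.Lock()
    updates = queue.Queue()
    sent = []

    def updater():
        # Background thread: keep draining the queue and folding updates in.
        while True:
            item = updates.get()
            if item is None:           # sentinel: stop the thread
                break
            with lock:
                model[0] = (1 - alpha) * model[0] + alpha * item

    worker = threading.Thread(target=updater)
    worker.start()
    try:
        for code, payload in messages:
            if code == "request":
                with lock:             # reply with the current global model
                    sent.append(model[0])
            elif code == "update":
                updates.put(payload)   # enqueue; do not block the main loop
            else:
                raise ValueError("invalid message code")
    finally:
        updates.put(None)
        worker.join()
    return model[0], sent
```

Because updates are applied by the background thread, a client that sends a request right after another client's update may still receive the older model; only the final value after the queue has drained is deterministic.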
coordinator#
Module Contents#
Coordinator | Deal with the mapping relation between client id and process rank in FL system.
- class Coordinator(setup_dict: dict, mode: str = 'LOCAL')#
Bases: object
Deal with the mapping relation between client id and process rank in the FL system.
- Note
The server manager creates a Coordinator as follows: 1. Initialize the network connection. 2. Each client sends its local group info (the number of clients simulated locally) to the server. 3. The server receives all info and initializes a server Coordinator.
- Parameters
setup_dict (dict) – A dict like {rank: client_num, …}, representing the mapping between process rank and client id.
mode (str, optional) – “GLOBAL” or “LOCAL”. Coordinator maps a client id to (rank, global id) or (rank, local id) according to mode. For example, suppose client id 51 is on a machine that has 1 manager and a serial trainer simulating 10 clients. The local id is the index among those 10 clients, so global id 51 is mapped to local id 1 (depending on the setting).
- property total#
- map_id(id)#
A map function from client id to (rank, local id).
- Parameters
id (int) – Client id.
- Returns
Rank in the distributed group and local id.
- Return type
rank, id
- map_id_list(id_list: list)#
A map function from id_list to dict {rank: local id}.
This can be very useful in Scale modules.
- switch()#
- __call__(info)#
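The mapping can be made concrete with a short sketch. It assumes that global client ids are assigned contiguously in increasing rank order, starting from 0; that assumption, and the free functions `build_ranges`, `map_id`, and `map_id_list` below, are illustrative and not taken from FedLab's source.

```python
def build_ranges(setup_dict):
    """Turn {rank: client_num} into [(rank, first_id, last_id_exclusive)]."""
    ranges, start = [], 0
    for rank in sorted(setup_dict):
        count = setup_dict[rank]
        ranges.append((rank, start, start + count))
        start += count
    return ranges


def map_id(setup_dict, client_id, mode="LOCAL"):
    """Map a global client id to (rank, local id) or (rank, global id)."""
    for rank, lo, hi in build_ranges(setup_dict):
        if lo <= client_id < hi:
            mapped = client_id - lo if mode == "LOCAL" else client_id
            return rank, mapped
    raise ValueError(f"client id {client_id} is out of range")


def map_id_list(setup_dict, id_list, mode="LOCAL"):
    """Group a list of client ids by the rank that simulates them."""
    grouped = {}
    for cid in id_list:
        rank, mapped = map_id(setup_dict, cid, mode)
        grouped.setdefault(rank, []).append(mapped)
    return grouped
```

With seven client processes (ranks 1 to 7) simulating 10 clients each, global id 51 falls in the sixth block and becomes local id 1, matching the example in the parameter description.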
model_maintainer#
Module Contents#
ModelMaintainer | Maintain PyTorch model.
SerialModelMaintainer | Maintain PyTorch model.
- class ModelMaintainer(model: torch.nn.Module, cuda: bool, device: str = None)#
Bases: object
Maintain PyTorch model.
Provide necessary attributes and operation methods. More features with local or global model will be implemented here.
- Parameters
model (torch.nn.Module) – PyTorch model.
cuda (bool) – Use GPUs or not.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest memory as default.
- property model: torch.nn.Module#
Return torch.nn.Module.
- property model_parameters: torch.Tensor#
Return serialized model parameters.
- property model_gradients: torch.Tensor#
Return serialized model gradients.
- property shape_list: List[torch.Tensor]#
Return shape of model parameters.
Currently, this attribute is used in tensor compression.
- set_model(parameters: torch.Tensor)#
Assign parameters to self._model.
- class SerialModelMaintainer(model: torch.nn.Module, num_clients: int, cuda: bool, device: str = None, personal: bool = False)#
Bases: ModelMaintainer
Maintain PyTorch model.
Provide necessary attributes and operation methods. More features with local or global model will be implemented here.
- Parameters
model (torch.nn.Module) – PyTorch model.
num_clients (int) – The number of independent models.
cuda (bool) – Use GPUs or not.
device (str, optional) – Assign model/data to the given GPUs. E.g., ‘device:0’ or ‘device:0,1’. Defaults to None. If device is None and cuda is True, FedLab will set the GPU with the largest idle memory as default.
personal (bool, optional) – If True is passed, SerialModelMaintainer will generate a copy of the local parameters list and maintain each copy separately. These parameters are indexed by [0, num-1]. Defaults to False.
- set_model(parameters: torch.Tensor = None, id: int = None)#
Assign parameters to self._model.
Note
parameters and id cannot both be None. If id is None, this function loads the given parameters. If id is not None, this function loads the parameters of the given id, and the parameters argument is ignored.
- Parameters
parameters (torch.Tensor, optional) – Model parameters. Defaults to None.
id (int, optional) – Load the model parameters of client id. Defaults to None.
network#
Module Contents#
DistNetwork | Manage torch.distributed network.
- type2byte#
- class DistNetwork(address: tuple, world_size: int, rank: int, ethernet: str = None, dist_backend: str = 'gloo')#
Bases: object
Manage torch.distributed network.
- Parameters
address (tuple) – Address of this server in form of (SERVER_ADDR, SERVER_IP)
world_size (int) – The size of this distributed group (including server).
rank (int) – The rank of process in distributed group.
ethernet (str) – The name of local ethernet. Users can check it using the command ifconfig.
dist_backend (str or torch.distributed.Backend) – backend of torch.distributed. Valid values include mpi, gloo, and nccl. Default: gloo.
- init_network_connection()#
Initialize torch.distributed communication group.
- close_network_connection()#
Destroy current torch.distributed process group.
- send(content=None, message_code=None, dst=0, count=True)#
Send tensor to process rank=dst
- recv(src=None, count=True)#
Receive tensor from process rank=src
- broadcast_send(content=None, message_code=None, dst=None, count=True)#
- broadcast_recv(src=None, count=True)#
- __str__()#
Return str(self).
network_manager#
Module Contents#
NetworkManager | Abstract class.
- class NetworkManager(network: fedlab.core.network.DistNetwork)#
Bases: torch.multiprocessing.Process
Abstract class.
- Parameters
network (DistNetwork) – Object to manage torch.distributed network communication.
- run()#
Main process:
Initialization stage.
FL communication stage.
Shutdown stage. Close network connection.
- setup()#
Initialize network connection and necessary setups.
self._network.init_network_connection() must be called first. Override this method to implement the system setup message communication procedure.
- abstract main_loop()#
Define the actions of communication stage.
- shutdown()#
Shutdown stage.
Close the network connection in the end.
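The three-stage lifecycle is a template-method pattern: `run()` fixes the order, and subclasses fill in the stages. The sketch below shows only that control flow. `ToyManager` and `EchoManager` are hypothetical classes that record stage names in a list instead of touching a network or spawning a process.

```python
from abc import ABC, abstractmethod


class ToyManager(ABC):
    """Hypothetical stand-in showing the setup -> main_loop -> shutdown order."""

    def __init__(self):
        self.events = []

    def run(self):
        self.setup()
        self.main_loop()
        self.shutdown()

    def setup(self):
        # Base behaviour: open the connection first.
        self.events.append("connect")

    @abstractmethod
    def main_loop(self):
        """Subclasses define the communication stage."""

    def shutdown(self):
        self.events.append("disconnect")


class EchoManager(ToyManager):
    def setup(self):
        super().setup()                    # keep the connection step first
        self.events.append("handshake")    # then add custom setup messages

    def main_loop(self):
        self.events.append("communicate")
```

Calling `super().setup()` before adding custom steps mirrors the requirement that the network connection is initialized before any setup messages are exchanged.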
standalone#
Module Contents#
- class StandalonePipeline(handler: fedlab.core.server.handler.ServerHandler, trainer: fedlab.core.client.trainer.SerialClientTrainer)#
Bases:
object
- main()#
- evaluate()#
models#
cnn#
CNN model in PyTorch.
References
[1] Reddi S, Charles Z, Zaheer M, et al. Adaptive Federated Optimization. ICML 2020. https://arxiv.org/pdf/2003.00295.pdf
Module Contents#
CNN_FEMNIST | Used for EMNIST experiments in references[1]
CNN_MNIST |
CNN_CIFAR10 | from torch tutorial
AlexNet_CIFAR10 |
- class CNN_FEMNIST(only_digits=False)#
Bases: torch.nn.Module
Used for EMNIST experiments in references[1].
- Parameters
only_digits (bool, optional) – If True, uses a final layer with 10 outputs, for use with the digits-only MNIST dataset (http://yann.lecun.com/exdb/mnist/). If False, uses 62 outputs for Federated Extended MNIST (FEMNIST). EMNIST: Extending MNIST to handwritten letters: https://arxiv.org/abs/1702.05373. Defaults to False.
- Returns
A torch.nn.Module.
- forward(x)#
- class CNN_MNIST#
Bases:
torch.nn.Module
- forward(x)#
- class CNN_CIFAR10#
Bases:
torch.nn.Module
from torch tutorial https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
- forward(x)#
- class AlexNet_CIFAR10(num_classes=10)#
Bases:
torch.nn.Module
- forward(x)#
mlp#
Module Contents#
MLP_CelebA | Used for celeba experiment
MLP |
- class MLP_CelebA#
Bases:
torch.nn.Module
Used for celeba experiment
- forward(x)#
- class MLP(input_size, output_size)#
Bases:
torch.nn.Module
- forward(x)#
rnn#
RNN model in PyTorch.
References
[1] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017. https://arxiv.org/abs/1602.05629
[2] Reddi S, Charles Z, Zaheer M, et al. Adaptive Federated Optimization. ICML 2020. https://arxiv.org/pdf/2003.00295.pdf
Module Contents#
- class RNN_Shakespeare(vocab_size=80, embedding_dim=8, hidden_size=256)#
Bases:
torch.nn.Module
- forward(input_seq)#
- class LSTMModel(vocab_size, embedding_dim, hidden_size, num_layers, output_dim, pad_idx=0, using_pretrained=False, embedding_weights=None, bid=False)#
Bases:
torch.nn.Module
- forward(input_seq: torch.Tensor)#
utils#
dataset#
functional#
Module Contents#
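split_indices | Split the sample index list given the sample number of each client.
balance_split | Assign the same sample number to each client.
lognormal_unbalance_split | Assign different sample number for each client using Log-Normal distribution.
dirichlet_unbalance_split | Assign different sample number for each client using Dirichlet distribution.
homo_partition | Partition data indices in an IID way given the sample numbers for each client.
hetero_dir_partition | Non-iid partition based on Dirichlet distribution. The method is from the "hetero-dir" partition of Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging.
shards_partition | Non-iid partition used in FedAvg paper.
client_inner_dirichlet_partition | Non-iid Dirichlet partition.
label_skew_quantity_based_partition | Label-skew: quantity-based partition.
fcube_synthetic_partition | Feature-distribution-skew: synthetic partition.
samples_num_count | Return sample count for all clients in client_dict.
noniid_slicing | Slice a dataset for non-IID.
random_slicing | Slice a dataset randomly and equally for IID.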
- split_indices(num_cumsum, rand_perm)#
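Split the sample index list given the sample number of each client.
- balance_split(num_clients, num_samples)#
Assign the same sample number to each client.
- Parameters
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
- Returns
A numpy array consisting of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type
numpy.ndarray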
- lognormal_unbalance_split(num_clients, num_samples, unbalance_sgm)#
Assign different sample number for each client using Log-Normal distribution.
Sample numbers for clients are drawn from Log-Normal distribution.
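- Parameters
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
unbalance_sgm (float) – Log-normal distribution variance.
- Returns
A numpy array consisting of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type
numpy.ndarray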
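A log-normal split can be sketched with the standard library. This is a minimal illustration, not FedLab's implementation: it uses Python's `random.lognormvariate` instead of NumPy, centres the distribution on the per-client average (an assumption), rescales the draws so they sum to `num_samples`, and hands out the rounding remainder one sample at a time.

```python
import math
import random


def lognormal_split_sketch(num_clients, num_samples, sigma, seed=0):
    """Draw per-client sample counts from a log-normal and normalise them."""
    rng = random.Random(seed)
    mean = num_samples / num_clients
    # One positive draw per client, centred on the per-client average.
    raw = [rng.lognormvariate(math.log(mean), sigma) for _ in range(num_clients)]
    scale = num_samples / sum(raw)
    counts = [int(r * scale) for r in raw]          # floor each share
    # Flooring loses a few samples; give them back round-robin.
    for i in range(num_samples - sum(counts)):
        counts[i % num_clients] += 1
    return counts
```

A larger `sigma` spreads the counts further apart, while `sigma = 0` degenerates to the balanced split.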
- dirichlet_unbalance_split(num_clients, num_samples, alpha)#
Assign different sample number for each client using Dirichlet distribution.
Sample numbers for clients are drawn from Dirichlet distribution.
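- Parameters
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
alpha (float) – Parameter alpha for Dirichlet distribution.
- Returns
A numpy array consisting of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type
numpy.ndarray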
- homo_partition(client_sample_nums, num_samples)#
Partition data indices in an IID way given the sample numbers for each client.
- Parameters
client_sample_nums (numpy.ndarray) – Sample numbers for each client.
num_samples (int) – Number of samples.
- Returns
{ client_id: indices}.
- Return type
dict
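An IID partition of this kind amounts to shuffling all indices once and cutting the shuffled list at the cumulative sample counts. The sketch below uses only the standard library; the function name and the `seed` argument are assumptions, and plain lists replace NumPy arrays.

```python
import random
from itertools import accumulate


def homo_partition_sketch(client_sample_nums, num_samples, seed=0):
    """Shuffle indices once, then slice at the cumulative sample counts."""
    rng = random.Random(seed)
    perm = list(range(num_samples))
    rng.shuffle(perm)                                # random permutation of indices
    bounds = [0] + list(accumulate(client_sample_nums))
    return {
        cid: perm[bounds[cid]:bounds[cid + 1]]       # contiguous slice per client
        for cid in range(len(client_sample_nums))
    }
```

Because every client receives a contiguous slice of one random permutation, no index is assigned twice and each client's label distribution matches the global one in expectation.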
- hetero_dir_partition(targets, num_clients, num_classes, dir_alpha, min_require_size=None)#
Non-iid partition based on Dirichlet distribution. The method is from the “hetero-dir” partition of Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging.
This method simulates a heterogeneous partition in which the number of data points and the class proportions are unbalanced. Samples are partitioned into \(J\) clients by sampling \(p_k \sim \text{Dir}_{J}({\alpha})\) and allocating a \(p_{k,j}\) proportion of the samples of class \(k\) to local client \(j\).
Sample number for each client is decided in this function.
- Parameters
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
num_classes (int) – Number of classes in samples.
dir_alpha (float) – Parameter alpha for Dirichlet distribution.
min_require_size (int, optional) – Minimum required sample number for each client. If set to None, then equals to num_classes.
- Returns
{ client_id: indices}.
- Return type
dict
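The core of the allocation rule, sampling \(p_k\) for each class and giving client \(j\) a \(p_{k,j}\) share of that class, can be sketched as below. This is a simplified illustration only: it draws the Dirichlet vector by normalising Gamma samples from the standard library, and it omits the `min_require_size` check and any rebalancing that the real function may apply. All names are hypothetical.

```python
import random


def dirichlet_sample(rng, alpha, k):
    """Symmetric Dirichlet draw via normalised Gamma(alpha, 1) samples."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def hetero_dir_sketch(targets, num_clients, num_classes, dir_alpha, seed=0):
    rng = random.Random(seed)
    client_indices = {j: [] for j in range(num_clients)}
    for k in range(num_classes):
        idx_k = [i for i, t in enumerate(targets) if t == k]
        rng.shuffle(idx_k)
        p_k = dirichlet_sample(rng, dir_alpha, num_clients)   # p_k ~ Dir_J(alpha)
        start, cum = 0, 0.0
        for j in range(num_clients):
            cum += p_k[j]
            # Cut the class-k indices at the cumulative proportions; the last
            # client takes whatever remains so nothing is dropped.
            end = len(idx_k) if j == num_clients - 1 else int(round(cum * len(idx_k)))
            client_indices[j].extend(idx_k[start:end])
            start = end
    return client_indices
```

A small `dir_alpha` concentrates each class on a few clients, while a large value approaches an even split.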
- shards_partition(targets, num_clients, num_shards)#
Non-iid partition used in FedAvg paper.
- Parameters
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
num_shards (int) – Number of shards in partition.
- Returns
{ client_id: indices}.
- Return type
dict
- client_inner_dirichlet_partition(targets, num_clients, num_classes, dir_alpha, client_sample_nums, verbose=True)#
Non-iid Dirichlet partition.
The method is from the paper Federated Learning Based on Dynamic Regularization. This function can be used when a specific sample number is given for every client via client_sample_nums. It is different from hetero_dir_partition().
- Parameters
targets (list or numpy.ndarray) – Sample targets.
num_clients (int) – Number of clients for partition.
num_classes (int) – Number of classes in samples.
dir_alpha (float) – Parameter alpha for Dirichlet distribution.
client_sample_nums (numpy.ndarray) – A numpy array consisting of num_clients integer elements, each representing the sample number of the corresponding client.
verbose (bool, optional) – Whether to print partition process. Default as True.
- Returns
{ client_id: indices}.
- Return type
dict
- label_skew_quantity_based_partition(targets, num_clients, num_classes, major_classes_num)#
Label-skew:quantity-based partition.
For details, please check Federated Learning on Non-IID Data Silos: An Experimental Study.
- fcube_synthetic_partition(data)#
Feature-distribution-skew:synthetic partition.
Synthetic partition for FCUBE dataset. This partition is from Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters
data (np.ndarray) – Data of dataset FCUBE.
- Returns
{ client_id: indices}.
- Return type
dict
- samples_num_count(client_dict, num_clients)#
Return sample count for all clients in client_dict.
- noniid_slicing(dataset, num_clients, num_shards)#
Slice a dataset for non-IID.
- Parameters
dataset (torch.utils.data.Dataset) – Dataset to slice.
num_clients (int) – Number of clients.
num_shards (int) – Number of shards.
Notes
The size of a shard equals int(len(dataset)/num_shards). Each client will get int(num_shards/num_clients) shards.
- Returns
dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
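The shard arithmetic in the notes can be checked with a small sketch. It takes a plain list of labels instead of a `torch.utils.data.Dataset` and assumes that indices are sorted by label before being cut into shards, as in the FedAvg-style shard partition; the function name and the `seed` argument are hypothetical.

```python
import random


def noniid_slicing_sketch(labels, num_clients, num_shards, seed=0):
    rng = random.Random(seed)
    shard_size = int(len(labels) / num_shards)          # size of one shard
    shards_per_client = int(num_shards / num_clients)   # shards given to each client
    # Sort indices by label so that each shard covers very few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)                              # deal shards out at random
    client_dict = {}
    for cid in range(num_clients):
        mine = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [
            i for s in mine for i in order[s * shard_size:(s + 1) * shard_size]
        ]
    return client_dict
```

With 100 samples in 10 equally sized classes, 10 shards, and 5 clients, every client ends up with 20 samples drawn from exactly two classes, which is the label skew this scheme is meant to produce.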
- random_slicing(dataset, num_clients)#
Slice a dataset randomly and equally for IID.
- Parameters
dataset (torch.utils.data.Dataset) – A dataset for slicing.
num_clients (int) – The number of clients.
- Returns
dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
partition#
Module Contents#
Base class for data partition in federated learning. |
|
CIFAR10 data partitioner. |
|
CIFAR100 data partitioner. |
|
Basic data partitioner. |
|
Data partitioner for vision data. |
|
Data partitioner for MNIST. |
|
Data partitioner for FashionMNIST. |
|
Data partitioner for SVHN. |
|
FCUBE data partitioner. |
|
Data partitioner for Adult. |
|
Data partitioner for RCV1. |
|
Data partitioner for Covtype. |
- class DataPartitioner#
Bases:
abc.ABC
Base class for data partition in federated learning.
Examples of
DataPartitioner
:BasicPartitioner
,CIFAR10Partitioner
.Details and tutorials of different data partition and datasets, please check Federated Dataset and DataPartitioner.
- abstract _perform_partition()#
- abstract __getitem__(index)#
- abstract __len__()#
- class CIFAR10Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, min_require_size=None, seed=None)#
Bases: DataPartitioner
CIFAR10 data partitioner.
Partitions CIFAR10 for a given number of clients. Six partition schemes are currently supported, selected by passing different combinations of parameters at initialization:
- balance=None
  - partition="dirichlet": non-IID partition used in Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging. Refer to fedlab.utils.dataset.functional.hetero_dir_partition() for more information.
  - partition="shards": non-IID method used in the FedAvg paper. Refer to fedlab.utils.dataset.functional.shards_partition() for more information.
- balance=True: "Balance" refers to the FL scenario in which all clients hold the same number of samples. Refer to fedlab.utils.dataset.functional.balance_partition() for more information.
  - partition="iid": Randomly select samples from the complete dataset, given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
- balance=False: "Unbalance" refers to the FL scenario in which clients hold different numbers of samples. The sample number for each client is drawn from a log-normal distribution with variance unbalance_sgm. When unbalance_sgm=0, the partition is balanced. Refer to fedlab.utils.dataset.functional.lognormal_unbalance_partition() for more information. The method is from the paper Federated Learning Based on Dynamic Regularization.
  - partition="iid": Randomly select samples from the complete dataset, given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
For detailed usage, please check Federated Dataset and DataPartitioner.
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition. Each element is in range of [0, 1, …, 9].
num_clients (int) – Number of clients for data partition.
balance (bool, optional) – Balanced partition over all clients or not. Default as True.
partition (str, optional) – Partition type. Only "iid", "shards" and "dirichlet" are supported. Default as "iid".
unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Default as 0 for balanced partition.
num_shards (int, optional) – Number of shards in non-IID "shards" partition. Only works if partition="shards". Default as None.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-IID partition. Only works if partition="dirichlet". Default as None.
verbose (bool, optional) – Whether to print the partition process. Default as True.
min_require_size (int, optional) – Minimum required sample number for each client. If set to None, it equals num_classes. Only works if partition="dirichlet".
seed (int, optional) – Random seed. Default as None.
- num_classes = 10#
- _perform_partition()#
- __getitem__(index)#
Obtain sample indices for client index.
- __len__()#
Usually equal to the number of clients.
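To make the partition="shards" scheme more concrete, here is a small standard-library sketch of the idea as described in the FedAvg paper: sort samples by label, cut the sorted list into equal shards, and hand each client the same number of shards. The function name `shards_partition_sketch` and its exact behaviour (for example, dropping a trailing remainder) are assumptions made for illustration, not the implementation behind fedlab.utils.dataset.functional.shards_partition().

```python
import random


def shards_partition_sketch(targets, num_clients, num_shards, seed=None):
    """Sort indices by label, cut into equal shards, give each client the same number of shards."""
    if num_shards % num_clients != 0:
        raise ValueError("num_shards must be divisible by num_clients")
    rng = random.Random(seed)
    # Sorting by label means each shard covers very few classes.
    sorted_idx = sorted(range(len(targets)), key=lambda i: targets[i])
    # Any remainder that does not fill a whole shard is dropped in this sketch.
    shard_size = len(targets) // num_shards
    shards = [sorted_idx[s * shard_size:(s + 1) * shard_size]
              for s in range(num_shards)]
    # Assign shards to clients in random order.
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)
    per_client = num_shards // num_clients
    client_dict = {}
    for c in range(num_clients):
        own = shard_ids[c * per_client:(c + 1) * per_client]
        client_dict[c] = [i for s in own for i in shards[s]]
    return client_dict


# Toy labels: 10 classes with 20 samples each.
targets = [i % 10 for i in range(200)]
parts = shards_partition_sketch(targets, num_clients=10, num_shards=20, seed=1)
```

With 20 shards over 10 clients, each client receives two shards and therefore sees at most two classes in this toy data, which is the label skew the scheme is meant to produce.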
- class CIFAR100Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, min_require_size=None, seed=None)#
Bases: CIFAR10Partitioner
CIFAR100 data partitioner.
This is a subclass of CIFAR10Partitioner. For details, please check Federated Dataset and DataPartitioner.
- num_classes = 100#
- class BasicPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)#
Bases: DataPartitioner
Basic data partitioner.
Supported partition schemes:
- label-distribution-skew: quantity-based
- label-distribution-skew: distribution-based (Dirichlet)
- quantity-skew (Dirichlet)
- IID
For more details, please check Federated Learning on Non-IID Data Silos: An Experimental Study and Federated Dataset and DataPartitioner.
- Parameters
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
partition (str) – Partition name. Only supports the "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes.
dir_alpha (float) – Parameter alpha for Dirichlet distribution. Only works if partition="noniid-labeldir".
major_classes_num (int) – Number of major classes for each client. Only works if partition="noniid-#label".
verbose (bool) – Whether to output intermediate information. Default as True.
min_require_size (int, optional) – Minimum required sample number for each client. If set to None, it equals num_classes. Only works if partition="noniid-labeldir".
seed (int) – Random seed. Default as None.
- Returns
{ client_id: indices }.
- Return type
dict
- num_classes = 2#
- _perform_partition()#
- __getitem__(index)#
- __len__()#
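The distribution-based label skew ("noniid-labeldir") can be pictured as follows: for every class, draw a proportion vector over clients from a Dirichlet distribution with parameter dir_alpha, then split that class's samples according to those proportions. The sketch below is a simplified, standard-library illustration under that assumption. The function name, the `max_tries` argument, and the retry loop used to honour min_require_size are hypothetical and may not match FedLab's own implementation.

```python
import random


def dirichlet_label_partition_sketch(targets, num_clients, num_classes, alpha,
                                     min_require_size=None, seed=None,
                                     max_tries=1000):
    """Split each class across clients with Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    if min_require_size is None:
        min_require_size = num_classes
    for _ in range(max_tries):
        client_dict = {c: [] for c in range(num_clients)}
        for k in range(num_classes):
            idx_k = [i for i, t in enumerate(targets) if t == k]
            rng.shuffle(idx_k)
            # A Dirichlet sample is a vector of Gamma(alpha, 1) draws
            # normalised so that the entries sum to 1.
            gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
            total = sum(gammas)
            start, cum = 0, 0.0
            for c in range(num_clients):
                cum += gammas[c] / total
                # The last client takes whatever is left to avoid rounding gaps.
                end = len(idx_k) if c == num_clients - 1 else int(round(cum * len(idx_k)))
                client_dict[c].extend(idx_k[start:end])
                start = end
        # Resample the whole partition until every client is large enough.
        if min(len(v) for v in client_dict.values()) >= min_require_size:
            return client_dict
    raise RuntimeError("could not satisfy min_require_size; try a larger alpha")


toy_targets = [i % 4 for i in range(400)]
parts_dir = dirichlet_label_partition_sketch(
    toy_targets, num_clients=5, num_classes=4, alpha=1.0,
    min_require_size=4, seed=0)
```

Smaller values of alpha concentrate each class on fewer clients, while larger values approach an even split.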
- class VisionPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#
Bases: BasicPartitioner
Data partitioner for vision data.
Supported partition schemes for vision data:
- label-distribution-skew: quantity-based
- label-distribution-skew: distribution-based (Dirichlet)
- quantity-skew (Dirichlet)
- IID
For more details, please check Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
partition (str) – Partition name. Only supports the "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes.
dir_alpha (float) – Parameter alpha for Dirichlet distribution. Only works if partition="noniid-labeldir".
major_classes_num (int) – Number of major classes for each client. Only works if partition="noniid-#label".
verbose (bool) – Whether to output intermediate information. Default as True.
seed (int) – Random seed. Default as None.
- Returns
{ client_id: indices }.
- Return type
dict
- num_classes = 10#
- class MNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#
Bases: VisionPartitioner
Data partitioner for MNIST.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 784#
- class FMNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#
Bases: VisionPartitioner
Data partitioner for FashionMNIST.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 784#
- class SVHNPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#
Bases: VisionPartitioner
Data partitioner for SVHN.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 1024#
- class FCUBEPartitioner(data, partition)#
Bases: DataPartitioner
FCUBE data partitioner.
FCUBE is a synthetic dataset for research on non-IID scenarios with feature imbalance. The dataset and its partition methods are proposed in Federated Learning on Non-IID Data Silos: An Experimental Study.
Supported partition methods for FCUBE:
- feature-distribution-skew: synthetic
- IID
For more details, please refer to Section (IV-B-b) of the original paper. For detailed usage, please check Federated Dataset and DataPartitioner.
- Parameters
data (numpy.ndarray) – Data of dataset FCUBE.
partition (str) – Partition type. Only supports "synthetic" and "iid".
- num_classes = 2#
- num_clients = 4#
- _perform_partition()#
- __getitem__(index)#
- __len__()#
- class AdultPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)#
Bases: BasicPartitioner
Data partitioner for Adult.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 123#
- num_classes = 2#
- class RCV1Partitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)#
Bases: BasicPartitioner
Data partitioner for RCV1.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 47236#
- num_classes = 2#
- class CovtypePartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)#
Bases: BasicPartitioner
Data partitioner for Covtype.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 54#
- num_classes = 2#
Package Contents#
DataPartitioner | Base class for data partition in federated learning.
BasicPartitioner | Basic data partitioner.
VisionPartitioner | Data partitioner for vision data.
CIFAR10Partitioner | CIFAR10 data partitioner.
CIFAR100Partitioner | CIFAR100 data partitioner.
FMNISTPartitioner | Data partitioner for FashionMNIST.
MNISTPartitioner | Data partitioner for MNIST.
SVHNPartitioner | Data partitioner for SVHN.
FCUBEPartitioner | FCUBE data partitioner.
AdultPartitioner | Data partitioner for Adult.
RCV1Partitioner | Data partitioner for RCV1.
CovtypePartitioner | Data partitioner for Covtype.
aggregator#
Module Contents#
Aggregators | Define the algorithm of parameters aggregation
- class Aggregators#
Bases: object
Define the algorithm of parameters aggregation.
- static fedavg_aggregate(serialized_params_list, weights=None)#
FedAvg aggregator
Paper: http://proceedings.mlr.press/v54/mcmahan17a.html
- Parameters
serialized_params_list (list[torch.Tensor]) – Serialized parameter tensors to merge following FedAvg.
weights (list, numpy.array or torch.Tensor, optional) – Weight for each set of parameters. The length of weights must be the same as the length of serialized_params_list.
- Returns
torch.Tensor
- static fedasync_aggregate(server_param, new_param, alpha)#
FedAsync aggregator
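As a rough illustration of what the two aggregators compute, the sketch below uses plain Python lists in place of serialized torch.Tensor objects. It assumes FedAvg is a normalised weighted average of the client vectors, and that FedAsync mixes the server vector with the new client vector as (1 - alpha) * server + alpha * new, as in the FedAsync paper. The function names are hypothetical and the library's own code may differ.

```python
def fedavg_aggregate_sketch(params_list, weights=None):
    """Weighted average of equal-length parameter vectors (lists of floats)."""
    if weights is None:
        # Uniform weights give a plain average.
        weights = [1.0] * len(params_list)
    if len(weights) != len(params_list):
        raise ValueError("weights and params_list must have the same length")
    total = float(sum(weights))
    dim = len(params_list[0])
    return [sum(w * p[j] for w, p in zip(weights, params_list)) / total
            for j in range(dim)]


def fedasync_aggregate_sketch(server_param, new_param, alpha):
    """Move the server vector towards the new client vector by a factor alpha."""
    return [(1 - alpha) * s + alpha * n for s, n in zip(server_param, new_param)]


clients = [[1.0, 2.0], [3.0, 4.0]]
uniform_avg = fedavg_aggregate_sketch(clients)
weighted_avg = fedavg_aggregate_sketch(clients, weights=[1, 3])
mixed = fedasync_aggregate_sketch([0.0, 0.0], [1.0, 2.0], alpha=0.5)
```

In practice the weights passed to a FedAvg-style aggregator are often the clients' local sample counts, so that clients with more data contribute more to the global model.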
functional#
Module Contents#
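AverageMeter | Record metrics information
setup_seed |
evaluate | Evaluate classify task model accuracy.
read_config_from_json | Read config from json_file to get config for user_name
get_best_gpu | Return gpu (torch.device) with largest free memory.
partition_report | Generate data partition report for clients in data_indices.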
- setup_seed(seed)#
- evaluate(model, criterion, test_loader)#
Evaluate the accuracy of a classification model.
- Returns
(loss.sum, acc.avg)
- read_config_from_json(json_file: str, user_name: str)#
Read config from json_file to get config for user_name
- Parameters
json_file (str) – Path to the JSON config file.
user_name (str) – Name of the entry to read, for example "server".
- Returns
A tuple with ip, port, world_size and rank for the user named user_name.
Examples
read_config_from_json('../../../tests/data/config.json', 'server')
Notes
An example config.json is as follows:
{
  "server": {
    "ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 0
  },
  "client_0": {
    "ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 1
  },
  "client_1": {
    "ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 2
  }
}
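A reader of this config format could be sketched with the standard json module as below. `read_config_sketch` is a hypothetical helper written for illustration; it assumes the layout of the example config.json above and returns the same four fields in the order described.

```python
import json
import os
import tempfile


def read_config_sketch(json_file, user_name):
    """Return (ip, port, world_size, rank) for the entry named user_name."""
    with open(json_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    entry = config[user_name]  # raises KeyError for an unknown user_name
    return entry["ip"], entry["port"], entry["world_size"], entry["rank"]


# Same structure as the example config.json in the notes above.
example = {
    "server": {"ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 0},
    "client_0": {"ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 1},
    "client_1": {"ip": "127.0.0.1", "port": "3002", "world_size": 3, "rank": 2},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    server_cfg = read_config_sketch(path, "server")
    client_cfg = read_config_sketch(path, "client_1")
```

Note that the port is stored as a string in the example file, so it comes back as "3002" rather than an integer.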
- get_best_gpu()#
Return gpu (torch.device) with largest free memory.
- partition_report(targets, data_indices, class_num=None, verbose=True, file=None)#
Generate data partition report for clients in data_indices.
Generate a data partition report for each client according to data_indices, including the ratio of each class and the dataset size in the current client. The report can be printed on screen or written to a file. The output format is comma-separated values, which can be read by pandas.read_csv() or csv.reader().
- Parameters
targets (list or numpy.ndarray) – Targets for all data samples, with each element in the range 0 to class_num-1.
data_indices (dict) – Dict of client_id: [data indices].
class_num (int, optional) – Total number of classes. If set to None, then class_num = max(targets) + 1.
verbose (bool, optional) – Whether to print the data partition report on screen. Default as True.
file (str, optional) – Output file name of the data partition report. If None, no file is written. Default as None.
Examples
First generate synthetic data labels and a data partition to obtain data_indices ({ client_id: sample indices }):
>>> import numpy as np
>>> sample_num = 15
>>> class_num = 4
>>> clients_num = 3
>>> num_per_client = int(sample_num/clients_num)
>>> labels = np.random.randint(class_num, size=sample_num)  # generate 15 labels, each label is 0 to 3
>>> rand_per = np.random.permutation(sample_num)
>>> # partition synthetic data into 3 clients
>>> data_indices = {0: rand_per[0:num_per_client],
...                 1: rand_per[num_per_client:num_per_client*2],
...                 2: rand_per[num_per_client*2:num_per_client*3]}
data_indices may then look like:
>>> data_indices
{0: array([8, 6, 5, 7, 2]), 1: array([ 3, 10, 14, 4, 1]), 2: array([13, 9, 12, 11, 0])}
Now generate the partition report for each client and each class:
>>> partition_report(labels, data_indices, class_num=class_num, verbose=True, file=None)
Class frequencies:
client,class0,class1,class2,class3,Amount
Client 0,0.200,0.00,0.200,0.600,5
Client 1,0.400,0.200,0.200,0.200,5
Client 2,0.00,0.400,0.400,0.200,5
logger#
Module Contents#
Logger | Record cmd info to file and print it to cmd at the same time
message_code#
Module Contents#
MessageCode | Different types of messages between client and server that we support go here.
serialization#
Module Contents#
- class SerializationTool#
Bases: object
- static serialize_model_gradients(model: torch.nn.Module) → torch.Tensor#
Serialize the gradients of model into a single torch.Tensor.
- Parameters
model (torch.nn.Module) – Model whose gradients are serialized.
- Returns
Serialized gradients.
- Return type
torch.Tensor
- static deserialize_model_gradients(model: torch.nn.Module, gradients: torch.Tensor)#
- static serialize_model(model: torch.nn.Module) → torch.Tensor#
Unfold model parameters.
Unfold every layer of the model and concatenate all tensors into one. Returns a torch.Tensor with shape (size, ).
- Parameters
model (torch.nn.Module) – Model to serialize.
- static deserialize_model(model: torch.nn.Module, serialized_parameters: torch.Tensor, mode='copy')#
Assigns serialized parameters to model.parameters. This is done by iterating through model.parameters() and assigning the relevant params in serialized_parameters. NOTE: this function manipulates model.parameters.
- Parameters
model (torch.nn.Module) – Model to deserialize.
serialized_parameters (torch.Tensor) – Serialized model parameters.
mode (str) – Deserialize mode: "copy" or "add".
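The round trip between a layered model and one flat vector can be sketched without torch by treating each layer as a list of floats. The helpers below are hypothetical illustrations of the idea only. They assume that "copy" overwrites each layer with its slice of the flat vector and that "add" adds the slice to the existing values; they are not the library's implementation.

```python
def serialize_sketch(layers):
    """Concatenate every layer (a list of floats) into one flat list."""
    flat = []
    for layer in layers:
        flat.extend(layer)
    return flat


def deserialize_sketch(layers, serialized, mode="copy"):
    """Write a flat vector back into the layers, in place."""
    if len(serialized) != sum(len(layer) for layer in layers):
        raise ValueError("serialized vector does not match the model size")
    offset = 0
    for layer in layers:
        chunk = serialized[offset:offset + len(layer)]
        if mode == "copy":
            # Overwrite the layer with its slice of the flat vector.
            layer[:] = chunk
        elif mode == "add":
            # Accumulate the slice onto the current values.
            layer[:] = [old + new for old, new in zip(layer, chunk)]
        else:
            raise ValueError("mode must be 'copy' or 'add'")
        offset += len(chunk)


toy_model = [[1.0, 2.0], [3.0]]
flat = serialize_sketch(toy_model)
deserialize_sketch(toy_model, [10.0, 20.0, 30.0], mode="copy")
after_copy = [list(layer) for layer in toy_model]
deserialize_sketch(toy_model, [1.0, 1.0, 1.0], mode="add")
after_add = [list(layer) for layer in toy_model]
```

The "add" mode is convenient when the flat vector carries an update (for example a parameter delta) rather than a full set of parameters.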
Package Contents#
Aggregators | Define the algorithm of parameters aggregation
Logger | Record cmd info to file and print it to cmd at the same time
MessageCode | Different types of messages between client and server that we support go here.
SerializationTool |
- class Logger(log_name=None, log_file=None)#
Bases: object
Record cmd info to file and print it to cmd at the same time.
- info(log_str)#
Print information to logger
- warning(warning_str)#
Print warning to logger
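Logging to a file and to the console at the same time can be sketched with Python's standard logging module by attaching two handlers to one logger. `make_logger_sketch` is a hypothetical helper that only illustrates the behaviour described above; the Logger class itself may be implemented differently.

```python
import logging
import os
import tempfile


def make_logger_sketch(log_name, log_file):
    """Return a logger that writes each record to the console and to log_file."""
    logger = logging.getLogger(log_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    handlers = (logging.StreamHandler(),
                logging.FileHandler(log_file, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "run.log")
demo_logger = make_logger_sketch("fl_demo", log_path)
demo_logger.info("round 1 finished")
demo_logger.warning("client 3 timed out")

# Read the file back to confirm that both records were written.
with open(log_path, encoding="utf-8") as f:
    logged_lines = f.read().splitlines()
```

The two calls correspond to the info() and warning() methods listed above: each message appears on the console and as one line in the log file.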
- class MessageCode#
Bases: enum.Enum
Different types of messages between client and server that we support go here.
- ParameterRequest = 0#
- GradientUpdate = 1#
- ParameterUpdate = 2#
- EvaluateParams = 3#
- Exit = 4#
- SetUp = 5#
- Activation = 6#
Package Contents#
- __version__ = 1.3.0_alpha#
Citation#
If FedLab is helpful to your research, please cite it in your publications:
@article{smile2021fedlab,
title={FedLab: A Flexible Federated Learning Framework},
author={Dun Zeng and Siqi Liang and Xiangjing Hu and Zenglin Xu},
journal={arXiv preprint arXiv:2107.11621},
year={2021}
}
Contact#
Please contact the FedLab development team via email or GitHub issues:
Dun Zeng: zengdun@foxmail.com
Siqi Liang: zszxlsq@gmail.com