partition
Module Contents
- DataPartitioner: Base class for data partition in federated learning.
- CIFAR10Partitioner: CIFAR10 data partitioner.
- CIFAR100Partitioner: CIFAR100 data partitioner.
- BasicPartitioner: Basic data partitioner.
- VisionPartitioner: Data partitioner for vision data.
- MNISTPartitioner: Data partitioner for MNIST.
- FMNISTPartitioner: Data partitioner for FashionMNIST.
- SVHNPartitioner: Data partitioner for SVHN.
- FCUBEPartitioner: FCUBE data partitioner.
- AdultPartitioner: Data partitioner for Adult.
- RCV1Partitioner: Data partitioner for RCV1.
- CovtypePartitioner: Data partitioner for Covtype.
- class DataPartitioner
Bases: abc.ABC
Base class for data partition in federated learning.
Examples of DataPartitioner: BasicPartitioner, CIFAR10Partitioner. For details and tutorials on the different data partition schemes and datasets, please check Federated Dataset and DataPartitioner.
- abstract _perform_partition()
- abstract __getitem__(index)
- abstract __len__()
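The abstract interface above can be illustrated with a minimal subclass. This is a sketch under stated assumptions: to stay runnable without FedLab installed, it defines a local stand-in for DataPartitioner that declares only the three documented abstract methods, and the RoundRobinPartitioner class and its client_dict attribute are hypothetical names used for illustration only.

```python
from abc import ABC, abstractmethod

import numpy as np


class DataPartitioner(ABC):
    """Local stand-in mirroring the documented abstract interface (assumption)."""

    @abstractmethod
    def _perform_partition(self):
        ...

    @abstractmethod
    def __getitem__(self, index):
        ...

    @abstractmethod
    def __len__(self):
        ...


class RoundRobinPartitioner(DataPartitioner):
    """Hypothetical partitioner that deals sample indices out to clients in turn."""

    def __init__(self, num_samples, num_clients):
        self.num_samples = num_samples
        self.num_clients = num_clients
        # Compute the partition once and cache it as {client_id: indices}.
        self.client_dict = self._perform_partition()

    def _perform_partition(self):
        indices = np.arange(self.num_samples)
        # Client c receives indices c, c + num_clients, c + 2 * num_clients, ...
        return {cid: indices[cid::self.num_clients] for cid in range(self.num_clients)}

    def __getitem__(self, index):
        # Sample indices held by client `index`.
        return self.client_dict[index]

    def __len__(self):
        # Number of clients, the usual convention for partitioners in this module.
        return len(self.client_dict)


partitioner = RoundRobinPartitioner(num_samples=10, num_clients=3)
print(len(partitioner), partitioner[0].tolist())  # 3 [0, 3, 6, 9]
```

Because DataPartitioner derives from abc.ABC, instantiating it directly (or a subclass that forgets one of the three methods) raises TypeError.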
- class CIFAR10Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, min_require_size=None, seed=None)
Bases: DataPartitioner
CIFAR10 data partitioner.
Partition CIFAR10 for a given number of clients. Six partition schemes are currently supported, selected by passing different combinations of parameters at initialization:
- balance=None
  - partition="dirichlet": non-IID partition used in Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging. Refer to fedlab.utils.dataset.functional.hetero_dir_partition() for more information.
  - partition="shards": non-IID method used in the FedAvg paper. Refer to fedlab.utils.dataset.functional.shards_partition() for more information.
- balance=True: "Balance" refers to the FL scenario in which all clients hold the same number of samples. Refer to fedlab.utils.dataset.functional.balance_partition() for more information.
  - partition="iid": Randomly select samples from the complete dataset, given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
- balance=False: "Unbalance" refers to the FL scenario in which clients hold different numbers of samples. In the unbalanced method, the sample number for each client is drawn from a log-normal distribution with variance unbalance_sgm. When unbalance_sgm=0, the partition is balanced. Refer to fedlab.utils.dataset.functional.lognormal_unbalance_partition() for more information. The method is from the paper Federated Learning Based on Dynamic Regularization.
  - partition="iid": Randomly select samples from the complete dataset, given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
For detailed usage, please check Federated Dataset and DataPartitioner.
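The FedAvg-style partition="shards" scheme can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the general idea (sort samples by label, cut them into equal-size shards, give each client an equal number of randomly chosen shards), not FedLab's shards_partition(); the name shards_partition_sketch is hypothetical, and the sketch assumes num_shards is divisible by num_clients.

```python
import numpy as np


def shards_partition_sketch(targets, num_clients, num_shards, seed=None):
    """FedAvg-style shard partition: sort by label, cut into shards, deal shards out."""
    targets = np.asarray(targets)
    if num_shards % num_clients != 0:
        raise ValueError("num_shards must be divisible by num_clients in this sketch")
    rng = np.random.default_rng(seed)
    shard_size = len(targets) // num_shards
    # Sort indices by label (stable sort keeps the original order within a class)
    # and drop the remainder that does not fill a whole shard.
    sorted_idx = np.argsort(targets, kind="stable")[: num_shards * shard_size]
    shards = sorted_idx.reshape(num_shards, shard_size)
    # Shuffle shard ids and give num_shards // num_clients shards to each client.
    shard_ids = rng.permutation(num_shards).reshape(num_clients, -1)
    return {cid: np.concatenate(shards[ids]) for cid, ids in enumerate(shard_ids)}


targets = np.repeat(np.arange(10), 100)  # toy labels: 10 classes x 100 samples
client_dict = shards_partition_sketch(targets, num_clients=10, num_shards=20, seed=0)
print({cid: np.unique(targets[idx]).tolist() for cid, idx in client_dict.items()})
```

On this toy dataset every shard holds a single class, so each client ends up with at most two classes, which is the pathological non-IID setting used in the FedAvg paper.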
- Parameters:
targets (list or numpy.ndarray) – Targets of the dataset for partition. Each element is in the range [0, 1, …, 9].
num_clients (int) – Number of clients for data partition.
balance (bool, optional) – Whether to balance the partition over all clients. Default: True.
partition (str, optional) – Partition type. Only "iid", "shards" and "dirichlet" are supported. Default: "iid".
unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Default: 0, i.e. balanced partition.
num_shards (int, optional) – Number of shards in the non-IID "shards" partition. Only works if partition="shards". Default: None.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-IID partition. Only works if partition="dirichlet". Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
min_require_size (int, optional) – Minimum required sample number for each client. If set to None, it equals num_classes. Only works if partition="dirichlet".
seed (int, optional) – Random seed. Default: None.
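For balance=False, per-client sample numbers come from a log-normal distribution controlled by unbalance_sgm. A minimal sketch of that idea follows, assuming the distribution is centered so its median equals num_samples / num_clients, that unbalance_sgm is passed as NumPy's sigma (the standard deviation of the underlying normal), and that draws are rescaled to sum to the dataset size. These choices and the helper names lognormal_client_sizes and iid_with_sizes are assumptions, not FedLab's exact lognormal_unbalance_partition(). The second helper shows partition="iid" on top of those sizes.

```python
import numpy as np


def lognormal_client_sizes(num_samples, num_clients, unbalance_sgm, seed=None):
    """Per-client sample counts drawn from a log-normal distribution (sketch)."""
    rng = np.random.default_rng(seed)
    mean_size = num_samples / num_clients
    # Assumption: unbalance_sgm is used as sigma; the median of the draws is mean_size,
    # and with unbalance_sgm=0 every draw equals mean_size (balanced case).
    raw = rng.lognormal(mean=np.log(mean_size), sigma=unbalance_sgm, size=num_clients)
    # Rescale so the counts sum to num_samples, then round down to integers.
    sizes = np.floor(raw / raw.sum() * num_samples).astype(int)
    # Hand the samples lost to rounding to the first few clients.
    sizes[: num_samples - sizes.sum()] += 1
    return sizes


def iid_with_sizes(num_samples, sizes, seed=None):
    """partition="iid": randomly select each client's samples given its size."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    return dict(enumerate(np.split(perm, np.cumsum(sizes)[:-1])))


sizes = lognormal_client_sizes(50_000, num_clients=10, unbalance_sgm=0.3, seed=0)
client_dict = iid_with_sizes(50_000, sizes, seed=0)
print(sizes.tolist(), sum(len(idx) for idx in client_dict.values()))
```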
- num_classes = 10
- _perform_partition()
- __getitem__(index)
Obtain the sample indices for client index.
- __len__()
Usually equals the number of clients.
- class CIFAR100Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, min_require_size=None, seed=None)
Bases: CIFAR10Partitioner
CIFAR100 data partitioner.
This is a subclass of CIFAR10Partitioner. For details, please check Federated Dataset and DataPartitioner.
- num_classes = 100
- class BasicPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)
Bases: DataPartitioner
Basic data partitioner. Supported partition schemes:
- label-distribution-skew: quantity-based
- label-distribution-skew: distribution-based (Dirichlet)
- quantity-skew (Dirichlet)
- IID
For more details, please check Federated Learning on Non-IID Data Silos: An Experimental Study and Federated Dataset and DataPartitioner.
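The distribution-based label skew (partition="noniid-labeldir") draws, for each class, a Dirichlet(dir_alpha) vector of per-client proportions and splits that class's samples accordingly, resampling until every client holds at least min_require_size samples. The sketch below is a simplified version written under that assumption; it is not FedLab's implementation, and label_dirichlet_partition is a hypothetical name. Smaller dir_alpha values concentrate each class on fewer clients, i.e. stronger label skew.

```python
import numpy as np


def label_dirichlet_partition(targets, num_clients, dir_alpha,
                              min_require_size=None, seed=None):
    """Distribution-based label skew via per-class Dirichlet proportions (sketch)."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    classes = np.unique(targets)
    if min_require_size is None:
        min_require_size = len(classes)  # mirrors the documented default (num_classes)
    while True:
        client_idcs = [[] for _ in range(num_clients)]
        for k in classes:
            idx_k = rng.permutation(np.flatnonzero(targets == k))
            # Fraction of class k that each client receives.
            props = rng.dirichlet(np.full(num_clients, dir_alpha))
            cuts = (np.cumsum(props) * len(idx_k)).astype(int)[:-1]
            for cid, part in enumerate(np.split(idx_k, cuts)):
                client_idcs[cid].extend(part.tolist())
        # Resample everything if any client ended up with too few samples.
        if min(len(idcs) for idcs in client_idcs) >= min_require_size:
            return {cid: np.array(idcs, dtype=int) for cid, idcs in enumerate(client_idcs)}


targets = np.repeat(np.arange(10), 200)  # toy labels: 10 classes x 200 samples
client_dict = label_dirichlet_partition(targets, num_clients=5, dir_alpha=0.5, seed=0)
print({cid: len(idx) for cid, idx in client_dict.items()})
```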
- Parameters:
targets (list or numpy.ndarray) – Sample targets. Unshuffled targets are preferred.
num_clients (int) – Number of clients for partition.
partition (str) – Partition name. Only "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes are supported.
dir_alpha (float) – Parameter alpha of the Dirichlet distribution. Only works if partition="noniid-labeldir".
major_classes_num (int) – Number of major classes for each client. Only works if partition="noniid-#label".
verbose (bool) – Whether to output intermediate information. Default: True.
min_require_size (int, optional) – Minimum required sample number for each client. If set to None, it equals num_classes. Only works if partition="noniid-labeldir".
seed (int) – Random seed. Default: None.
- Returns:
{client_id: indices}.
- Return type:
dict
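For partition="noniid-#label", each client holds samples from only major_classes_num classes. The sketch below assumes a simple variant: each client picks its classes uniformly at random, and each class's samples are split evenly among the clients that picked it, while classes no client picks are dropped. The class-assignment rule actually used by FedLab may differ, and label_quantity_partition is a hypothetical name.

```python
import numpy as np


def label_quantity_partition(targets, num_clients, major_classes_num, seed=None):
    """Quantity-based label skew: each client holds a fixed number of classes (sketch)."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    classes = np.unique(targets)
    # Each client picks `major_classes_num` distinct classes at random (assumption).
    chosen = [rng.choice(classes, size=major_classes_num, replace=False)
              for _ in range(num_clients)]
    client_idcs = {cid: [] for cid in range(num_clients)}
    for k in classes:
        owners = [cid for cid in range(num_clients) if k in chosen[cid]]
        if not owners:
            continue  # in this sketch, classes that no client picked are dropped
        idx_k = rng.permutation(np.flatnonzero(targets == k))
        # Split the samples of class k evenly among the clients that own class k.
        for cid, part in zip(owners, np.array_split(idx_k, len(owners))):
            client_idcs[cid].extend(part.tolist())
    return {cid: np.array(idcs, dtype=int) for cid, idcs in client_idcs.items()}


targets = np.repeat(np.arange(10), 50)  # toy labels: 10 classes x 50 samples
client_dict = label_quantity_partition(targets, num_clients=8, major_classes_num=2, seed=0)
print({cid: np.unique(targets[idx]).tolist() for cid, idx in client_dict.items()})
```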
- num_classes = 2
- _perform_partition()
- __getitem__(index)
- __len__()
- class VisionPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: BasicPartitioner
Data partitioner for vision data.
Supported partition schemes for vision data:
- label-distribution-skew: quantity-based
- label-distribution-skew: distribution-based (Dirichlet)
- quantity-skew (Dirichlet)
- IID
For more details, please check Federated Learning on Non-IID Data Silos: An Experimental Study.
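Quantity skew (partition="unbalance") keeps labels IID but varies how many samples each client receives. A minimal sketch, assuming client sizes follow Dirichlet(dir_alpha) proportions of a shuffled index list and that the draw is repeated until every client has at least min_size samples; min_size and the function name quantity_dirichlet_partition are hypothetical, not part of the documented API.

```python
import numpy as np


def quantity_dirichlet_partition(num_samples, num_clients, dir_alpha, min_size=1, seed=None):
    """Quantity skew: Dirichlet-distributed client sizes with IID labels (sketch)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)  # random order keeps labels IID per client
    while True:
        props = rng.dirichlet(np.full(num_clients, dir_alpha))
        cuts = (np.cumsum(props) * num_samples).astype(int)[:-1]
        parts = np.split(perm, cuts)
        # Redraw the proportions if any client is below the minimum size.
        if min(len(p) for p in parts) >= min_size:
            return dict(enumerate(parts))


client_dict = quantity_dirichlet_partition(1000, num_clients=5, dir_alpha=0.5, min_size=10, seed=0)
print([len(idx) for idx in client_dict.values()])
```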
- Parameters:
targets (list or numpy.ndarray) – Sample targets. Unshuffled targets are preferred.
num_clients (int) – Number of clients for partition.
partition (str) – Partition name. Only "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes are supported.
dir_alpha (float) – Parameter alpha of the Dirichlet distribution. Only works if partition="noniid-labeldir".
major_classes_num (int) – Number of major classes for each client. Only works if partition="noniid-#label".
verbose (bool) – Whether to output intermediate information. Default: True.
seed (int) – Random seed. Default: None.
- Returns:
{client_id: indices}.
- Return type:
dict
- num_classes = 10
- class MNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Data partitioner for MNIST.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 784
- class FMNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Data partitioner for FashionMNIST.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 784
- class SVHNPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Data partitioner for SVHN.
For details, please check VisionPartitioner and Federated Dataset and DataPartitioner.
- num_features = 1024
- class FCUBEPartitioner(data, partition)
Bases: DataPartitioner
FCUBE data partitioner.
FCUBE is a synthetic dataset for research on non-IID scenarios with feature imbalance. This dataset and its partition methods are proposed in Federated Learning on Non-IID Data Silos: An Experimental Study.
Supported partition methods for FCUBE:
- feature-distribution-skew: synthetic
- IID
For more details, please refer to Section IV-B-b of the original paper. For detailed usage, please check Federated Dataset and DataPartitioner.
- Parameters:
data (numpy.ndarray) – Data of the FCUBE dataset.
partition (str) – Partition type. Only 'synthetic' and 'iid' are supported.
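Our reading of the 'synthetic' scheme in the NIID-Bench paper is that the FCUBE cube is cut into eight octants by the planes x1=0, x2=0 and x3=0, and each of the four clients receives a pair of octants symmetric about the origin; since labels depend on the sign of x1, every client sees both classes but a different region of feature space. The sketch below implements that reading; treat the octant-pairing rule, the labelling rule, and the name fcube_synthetic_partition as assumptions, not FedLab's code.

```python
import numpy as np


def fcube_synthetic_partition(data):
    """Split 3-D FCUBE-style points among 4 clients by antipodal octant pairs (sketch)."""
    data = np.asarray(data)
    # Encode the sign pattern of (x1, x2, x3) as an octant id in 0..7.
    bits = (data > 0).astype(int)
    octant = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
    # An octant and its mirror image through the origin have complementary
    # sign bits, i.e. ids c and 7 - c; min(c, 7 - c) maps each pair to 0..3.
    client_of_point = np.minimum(octant, 7 - octant)
    return {cid: np.flatnonzero(client_of_point == cid) for cid in range(4)}


rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(1000, 3))
labels = (points[:, 0] > 0).astype(int)  # assumed labelling rule: sign of x1
client_dict = fcube_synthetic_partition(points)
print({cid: np.bincount(labels[idx], minlength=2).tolist() for cid, idx in client_dict.items()})
```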
- num_classes = 2
- num_clients = 4
- _perform_partition()
- __getitem__(index)
- __len__()
- class AdultPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)
Bases: BasicPartitioner
Data partitioner for Adult.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 123
- num_classes = 2
- class RCV1Partitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)
Bases: BasicPartitioner
Data partitioner for RCV1.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 47236
- num_classes = 2
- class CovtypePartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, min_require_size=None, seed=None)
Bases: BasicPartitioner
Data partitioner for Covtype.
For details, please check BasicPartitioner and Federated Dataset and DataPartitioner.
- num_features = 54
- num_classes = 2