dataset
Package Contents
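| SubsetSampler | Samples elements from a given list of indices, always in the same order once initialized. |
| RawPartitionSampler | Partition dataset according to num_replicas. |
| DictFileSampler | Get data sample indices given client id from data file with dict. |
| DataPartitioner | Base class for data partition in federated learning. |
| BasicPartitioner | Basic partitioner supporting label-distribution skew, quantity skew and IID partitions. |
| VisionPartitioner | Base partitioner for 10-class vision datasets. |
| CIFAR10Partitioner | CIFAR10 data partitioner. |
| CIFAR100Partitioner | CIFAR100 data partitioner. |
| FMNISTPartitioner | FMNIST data partitioner. |
| MNISTPartitioner | MNIST data partitioner. |
| SVHNPartitioner | SVHN data partitioner. |
| FCUBEPartitioner | FCUBE data partitioner. |
| AdultPartitioner | Adult data partitioner. |
| RCV1Partitioner | RCV1 data partitioner. |
| CovtypePartitioner | Covtype data partitioner. |
| noniid_slicing | Slice a dataset for non-IID. |
| random_slicing | Slice a dataset randomly and equally for IID. |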
- class SubsetSampler(indices, shuffle=False)
Bases: torch.utils.data.Sampler
Samples elements from a given list of indices, always in the same order once initialized.
It is a Sampler used in DataLoader, so that each partition stays fixed once initialized.
- Parameters
indices (list) – Indices of the elements to sample.
shuffle (bool, optional) – Whether to shuffle the indices once at initialization. Default: False.
- __iter__(self)
- __len__(self)
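The following is a minimal sketch of the behavior described above. It is written without the torch dependency so it runs on its own. The index list is fixed at construction, and optionally shuffled once; every later iteration then yields the same order. The class name FixedIndexSampler and its seed argument are hypothetical. The real SubsetSampler subclasses torch.utils.data.Sampler and would be passed to a DataLoader through its sampler argument.

```python
import random


class FixedIndexSampler:
    """Hypothetical stand-in for SubsetSampler: iterates a fixed list of indices."""

    def __init__(self, indices, shuffle=False, seed=None):
        # Copy so the caller's list is never mutated.
        self.indices = list(indices)
        if shuffle:
            # Shuffle exactly once; the order is then frozen for every epoch.
            random.Random(seed).shuffle(self.indices)

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


sampler = FixedIndexSampler([4, 8, 15, 16, 23, 42], shuffle=True, seed=0)
first_epoch = list(sampler)
second_epoch = list(sampler)
print(first_epoch == second_epoch, len(sampler))  # True 6
```

Because the shuffle happens in the constructor rather than in __iter__, a client's partition order cannot drift between epochs.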
- class RawPartitionSampler(dataset, client_id, num_replicas=None)
Bases: torch.utils.data.Sampler
Partition dataset according to num_replicas. Every client gets an equal shard of the dataset.
- Parameters
dataset (torch.utils.data.Dataset) – Dataset to partition.
client_id (int) – ID of the client whose shard is sampled.
num_replicas (int, optional) – Number of data replications. Default: None, which means the total number of client processes.
- __iter__(self)
- __len__(self)
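The equal-shard rule can be sketched as follows. This sketch assumes contiguous shards and 0-based client IDs. It also assumes that leftover samples are dropped when the dataset length is not divisible by num_replicas. The function name equal_shard_indices is hypothetical, and RawPartitionSampler may number clients or handle the remainder differently.

```python
def equal_shard_indices(dataset_len, client_id, num_replicas):
    """Hypothetical helper: indices of the contiguous shard owned by `client_id`."""
    if not 0 <= client_id < num_replicas:
        raise ValueError("client_id must be in [0, num_replicas)")
    shard_size = dataset_len // num_replicas  # every client gets the same size
    start = client_id * shard_size
    return list(range(start, start + shard_size))


# 10 samples over 3 clients: shard size 3, sample 9 is left over.
for cid in range(3):
    print(cid, equal_shard_indices(10, cid, 3))
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7, 8]
```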
- class DictFileSampler(dict_file, client_id)
Bases: torch.utils.data.Sampler
Get the data sample indices of a given client ID from a data file that stores a dict.
- __iter__(self)
- __len__(self)
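Here is a sketch of the lookup. It assumes the data file is a pickled dict that maps each client ID to a list of sample indices. The pickle format and the helper name load_client_indices are assumptions for illustration; DictFileSampler may serialize the dict differently.

```python
import os
import pickle
import tempfile


def load_client_indices(dict_file, client_id):
    """Hypothetical helper: read {client_id: [indices]} and return one client's list."""
    with open(dict_file, "rb") as f:
        client_dict = pickle.load(f)
    return list(client_dict[client_id])


# Write a small partition file, then read back one client's indices.
partition = {0: [0, 3, 5], 1: [1, 2, 4]}
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(partition, tmp)
    path = tmp.name

indices = load_client_indices(path, 1)
os.remove(path)
print(indices)  # [1, 2, 4]
```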
- noniid_slicing(dataset, num_clients, num_shards)
Slice a dataset for non-IID.
- Parameters
dataset (torch.utils.data.Dataset) – Dataset to slice.
num_clients (int) – Number of clients.
num_shards (int) – Number of shards.
Notes
The size of a shard equals int(len(dataset)/num_shards). Each client will get int(num_shards/num_clients) shards.
- Returns:
dict mapping each client ID to its sample indices:
{ 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
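The shard arithmetic in the Notes can be sketched as below. This version takes a plain list of labels instead of a Dataset. It also sorts indices by label before cutting shards, so each shard covers few classes; this follows the shard scheme described in the FedAvg paper. The label sort, the seed argument, and the function name shard_slicing are assumptions, not a copy of noniid_slicing.

```python
import random


def shard_slicing(labels, num_clients, num_shards, seed=None):
    """Hypothetical shard-based non-IID split returning {client_id: [indices]}."""
    rng = random.Random(seed)
    shard_size = len(labels) // num_shards          # int(len(dataset) / num_shards)
    shards_per_client = num_shards // num_clients   # int(num_shards / num_clients)

    # Sort sample indices by label so that each shard holds few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)

    client_dict = {}
    for cid in range(num_clients):
        mine = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [idx for s in mine
                            for idx in order[s * shard_size:(s + 1) * shard_size]]
    return client_dict


labels = [i % 10 for i in range(1000)]  # 1000 samples, 10 classes
parts = shard_slicing(labels, num_clients=10, num_shards=20, seed=0)
# Each client: 2 shards of 50 samples = 100 indices, covering at most 2 classes.
print({cid: len(idx) for cid, idx in parts.items()})
```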
- random_slicing(dataset, num_clients)
Slice a dataset randomly and equally for IID.
- Parameters
dataset (torch.utils.data.Dataset) – Dataset to slice.
num_clients (int) – Number of clients.
- Returns:
dict mapping each client ID to its sample indices:
{ 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
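An IID split of this kind amounts to shuffling all indices and cutting equal-size chunks. The sketch below takes a sample count instead of a Dataset and drops any remainder. Both choices, the seed argument, and the name iid_slicing are assumptions for illustration.

```python
import random


def iid_slicing(num_samples, num_clients, seed=None):
    """Hypothetical IID split: shuffle all indices, then cut equal-size chunks."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    per_client = num_samples // num_clients  # remainder samples are left out
    return {cid: indices[cid * per_client:(cid + 1) * per_client]
            for cid in range(num_clients)}


parts = iid_slicing(num_samples=103, num_clients=10, seed=1)
print([len(v) for v in parts.values()])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```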
- class DataPartitioner
Bases: abc.ABC
Base class for data partition in federated learning.
- abstract _perform_partition(self)
- abstract __getitem__(self, index)
- abstract __len__(self)
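The abstract interface implies a simple contract. _perform_partition builds a mapping from client ID to sample indices, __getitem__ returns one client's indices, and __len__ returns the number of clients. This sketch mirrors the three abstract methods with Python's abc module. RoundRobinPartitioner is a hypothetical subclass used only to show the contract, not a FedLab class.

```python
from abc import ABC, abstractmethod


class DataPartitionerSketch(ABC):
    """Mirrors the abstract methods listed for DataPartitioner."""

    @abstractmethod
    def _perform_partition(self):
        ...

    @abstractmethod
    def __getitem__(self, index):
        ...

    @abstractmethod
    def __len__(self):
        ...


class RoundRobinPartitioner(DataPartitionerSketch):
    """Hypothetical subclass: sample i goes to client i % num_clients."""

    def __init__(self, num_samples, num_clients):
        self.num_samples = num_samples
        self.num_clients = num_clients
        self.client_dict = self._perform_partition()

    def _perform_partition(self):
        return {cid: list(range(cid, self.num_samples, self.num_clients))
                for cid in range(self.num_clients)}

    def __getitem__(self, index):
        return self.client_dict[index]  # sample indices for client `index`

    def __len__(self):
        return len(self.client_dict)  # number of clients


p = RoundRobinPartitioner(num_samples=7, num_clients=3)
print(len(p), p[0], p[2])  # 3 [0, 3, 6] [2, 5]
```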
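- class BasicPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)
Bases: DataPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: 1.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_classes = 2
- _perform_partition(self)
- __getitem__(self, index)
- __len__(self)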
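The quantity-based label-distribution skew is commonly implemented as follows. Each client holds a fixed number of classes, and each class's samples are split evenly among the clients that hold it. The sketch below reads major_classes_num as that number of classes per client. The random class assignment, the seed argument, and the function name label_quantity_partition are assumptions; BasicPartitioner may choose classes differently.

```python
import random
from collections import defaultdict


def label_quantity_partition(labels, num_clients, major_classes_num, seed=None):
    """Hypothetical sketch: every client receives samples from `major_classes_num` classes."""
    rng = random.Random(seed)
    classes = sorted(set(labels))

    # Randomly choose which classes each client holds.
    holders = defaultdict(list)  # class label -> clients holding that class
    for cid in range(num_clients):
        for c in rng.sample(classes, major_classes_num):
            holders[c].append(cid)

    client_dict = {cid: [] for cid in range(num_clients)}
    for c in classes:
        if not holders[c]:
            continue  # no client picked this class, so its samples stay unassigned
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        # Deal the class's samples round-robin among the clients holding it.
        for k, i in enumerate(idx):
            client_dict[holders[c][k % len(holders[c])]].append(i)
    return client_dict


labels = [i % 10 for i in range(500)]  # 500 samples, 10 classes
parts = label_quantity_partition(labels, num_clients=5, major_classes_num=2, seed=0)
print({cid: sorted({labels[i] for i in idx}) for cid, idx in parts.items()})
```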
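- class VisionPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: BasicPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_classes = 10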
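The quantity-skew (Dirichlet) method listed above varies how many samples each client gets rather than which labels it sees. A sketch of that idea follows. Client proportions are drawn from a symmetric Dirichlet(dir_alpha), sampled with the standard library by normalizing Gamma draws. The shuffled indices are then cut at the cumulative proportions. The function names, the seed argument, and the rounding rule (the last client absorbs leftovers) are assumptions of this sketch.

```python
import random


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def quantity_skew_partition(num_samples, num_clients, dir_alpha, seed=None):
    """Hypothetical sketch: client sample counts follow Dirichlet(dir_alpha) proportions."""
    rng = random.Random(seed)
    props = dirichlet(dir_alpha, num_clients, rng)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    # Turn proportions into cut points; the last client absorbs rounding leftovers.
    cuts, acc = [], 0.0
    for p in props[:-1]:
        acc += p
        cuts.append(int(acc * num_samples))
    bounds = [0] + cuts + [num_samples]
    return {cid: indices[bounds[cid]:bounds[cid + 1]] for cid in range(num_clients)}


parts = quantity_skew_partition(1000, num_clients=5, dir_alpha=0.5, seed=0)
print([len(v) for v in parts.values()])  # uneven sizes that sum to 1000
```

A smaller dir_alpha makes the sizes more uneven, while a large one approaches an equal split.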
- class CIFAR10Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None)
Bases: DataPartitioner
CIFAR10 data partitioner.
Partition CIFAR10 given a specific number of clients. Currently, 6 partition schemes are supported, selected by passing different combinations of parameters at initialization:
- balance=None
  - partition="dirichlet": non-IID partition used in Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging. Refer to fedlab.utils.dataset.functional.hetero_dir_partition() for more information.
  - partition="shards": non-IID method used in the FedAvg paper. Refer to fedlab.utils.dataset.functional.shards_partition() for more information.
- balance=True: “Balance” refers to the FL scenario where sample numbers for different clients are the same. Refer to fedlab.utils.dataset.functional.balance_partition() for more information.
  - partition="iid": Randomly select samples from the complete dataset given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
- balance=False: “Unbalance” refers to the FL scenario where sample numbers for different clients are different. For the unbalance method, the sample number for each client is drawn from a Log-Normal distribution with variance unbalance_sgm. When unbalance_sgm=0, the partition is balanced. Refer to fedlab.utils.dataset.functional.lognormal_unbalance_partition() for more information. The method is from the paper Federated Learning Based on Dynamic Regularization.
  - partition="iid": Randomly select samples from the complete dataset given the sample number for each client.
  - partition="dirichlet": Refer to fedlab.utils.dataset.functional.client_inner_dirichlet_partition() for more information.
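The per-class Dirichlet idea behind the balance=None, partition="dirichlet" scheme can be sketched as follows. For every class, client proportions are drawn from a symmetric Dirichlet(dir_alpha), and that class's samples are split accordingly. A smaller dir_alpha therefore concentrates each class on fewer clients. This is a simplified stand-in written for illustration, not the code of hetero_dir_partition(). For example, this sketch does not enforce any minimum number of samples per client, and its function names are hypothetical.

```python
import random


def dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def label_dirichlet_partition(labels, num_clients, dir_alpha, seed=None):
    """Hypothetical sketch: split every class across clients with Dirichlet proportions."""
    rng = random.Random(seed)
    client_dict = {cid: [] for cid in range(num_clients)}
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        props = dirichlet(dir_alpha, num_clients, rng)  # fresh proportions per class
        start, acc = 0, 0.0
        for cid, p in enumerate(props):
            acc += p
            # The last client takes whatever remains, absorbing rounding error.
            end = len(idx) if cid == num_clients - 1 else int(acc * len(idx))
            client_dict[cid].extend(idx[start:end])
            start = end
    return client_dict


labels = [i % 10 for i in range(5000)]  # CIFAR10-like: 10 classes, 500 samples each
parts = label_dirichlet_partition(labels, num_clients=10, dir_alpha=0.3, seed=0)
print([len(v) for v in parts.values()])
```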
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition. Each element is in the range [0, 1, …, 9].
num_clients (int) – Number of clients for data partition.
balance (bool, optional) – Balanced partition over all clients or not. Default: True.
partition (str, optional) – Partition type; only "iid", "shards" and "dirichlet" are supported. Default: "iid".
unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Default: 0, for balanced partition.
num_shards (int, optional) – Number of shards in the non-IID "shards" partition. Only works if partition="shards". Default: None.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-IID partition. Only works if partition="dirichlet". Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_classes = 10
- _perform_partition(self)
- __getitem__(self, index)
Obtain sample indices for client index.
- __len__(self)
Usually equals the number of clients.
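The balance=False scheme draws each client's sample number from a log-normal distribution controlled by unbalance_sgm. The sketch below uses the standard library's random.lognormvariate. Several details are assumptions of this sketch rather than details taken from lognormal_unbalance_partition(): centring the distribution on the mean per-client size, rescaling the draws to the dataset size, handing out truncation leftovers round-robin, and the function name. Only the sample numbers are computed here. Choosing which samples each client gets ("iid" or "dirichlet") is a separate step.

```python
import math
import random


def lognormal_client_sample_nums(num_samples, num_clients, unbalance_sgm, seed=None):
    """Hypothetical sketch: per-client sample counts drawn from a log-normal distribution."""
    rng = random.Random(seed)
    mean = num_samples / num_clients
    if unbalance_sgm == 0:
        draws = [1.0] * num_clients  # balanced: equal weight for every client
    else:
        draws = [rng.lognormvariate(math.log(mean), unbalance_sgm)
                 for _ in range(num_clients)]
    total = sum(draws)
    counts = [int(d / total * num_samples) for d in draws]
    # Hand out samples lost to int() truncation so the counts sum to num_samples.
    for k in range(num_samples - sum(counts)):
        counts[k % num_clients] += 1
    return counts


print(lognormal_client_sample_nums(5000, 10, 0))  # [500, 500, 500, 500, 500, 500, 500, 500, 500, 500]
print(lognormal_client_sample_nums(5000, 10, 0.3, seed=0))  # unequal counts summing to 5000
```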
- class CIFAR100Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None)
Bases: CIFAR10Partitioner
CIFAR100 data partitioner.
This is a subclass of CIFAR10Partitioner.
- num_classes = 100
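- class FMNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 784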
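- class MNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 784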
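- class SVHNPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)
Bases: VisionPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: None.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 1024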
- class FCUBEPartitioner(data, partition)
Bases: DataPartitioner
FCUBE data partitioner.
FCUBE is a synthetic dataset for research on non-IID scenarios with feature imbalance. This dataset and its partition methods are proposed in Federated Learning on Non-IID Data Silos: An Experimental Study.
Supported partition methods for FCUBE:
feature-distribution-skew:synthetic
IID
For more details, please refer to Section IV-B-b of the original paper.
- Parameters
data (numpy.ndarray) – Data of the FCUBE dataset.
partition (str) – Partition method, chosen from the methods listed above.
- num_classes = 2
- num_clients = 4
- _perform_partition(self)
- __getitem__(self, index)
- __len__(self)
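The "feature-distribution-skew:synthetic" method, together with num_clients = 4 and num_classes = 2, matches a construction from the cited paper. Points in the cube [-1, 1]^3 are split by the planes x1 = 0, x2 = 0 and x3 = 0 into 8 octants. Each of the 4 clients then receives a pair of octants symmetric about the origin, so feature distributions differ across clients while labels stay balanced. The sketch below implements that octant-pair assignment under this reading of the paper. The label rule (sign of x1) and the function name fcube_octant_partition are assumptions, not FCUBEPartitioner's code.

```python
import random


def fcube_octant_partition(points):
    """Hypothetical FCUBE-style split: 4 clients, each owning 2 octants symmetric about the origin."""
    client_dict = {cid: [] for cid in range(4)}
    for i, (x1, x2, x3) in enumerate(points):
        signs = (x1 > 0, x2 > 0, x3 > 0)
        # Map an octant and its mirror image (all signs flipped) to the same key.
        key = signs if signs[0] else tuple(not s for s in signs)
        # key[0] is now always True, so (key[1], key[2]) picks one of 4 clients.
        cid = 2 * key[1] + key[2]
        client_dict[cid].append(i)
    return client_dict


rng = random.Random(0)
points = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(1000)]
labels = [0 if p[0] > 0 else 1 for p in points]  # assumed label rule: sign of x1
parts = fcube_octant_partition(points)
print({cid: len(idx) for cid, idx in parts.items()})
```

Because each mirrored pair contains one octant with x1 > 0 and one with x1 < 0, every client sees both labels even though its feature region differs from the others.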
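- class AdultPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)
Bases: BasicPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: 1.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 123
- num_classes = 2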
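- class RCV1Partitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)
Bases: BasicPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: 1.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 47236
- num_classes = 2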
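- class CovtypePartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)
Bases: BasicPartitioner
Supported partition methods:
label-distribution-skew:quantity-based
label-distribution-skew:distributed-based (Dirichlet)
quantity-skew (Dirichlet)
IID
- Parameters
targets (list or numpy.ndarray) – Targets of dataset for partition.
num_clients (int) – Number of clients for data partition.
partition (str, optional) – Partition method, chosen from the methods listed above. Default: 'iid'.
dir_alpha (float, optional) – Dirichlet distribution parameter for the Dirichlet-based methods. Default: None.
major_classes_num (int, optional) – Number of classes held by each client in the quantity-based label-distribution-skew method. Default: 1.
verbose (bool, optional) – Whether to print the partition process. Default: True.
seed (int, optional) – Random seed. Default: None.
- num_features = 54
- num_classes = 2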