dataset#

Package Contents#

SubsetSampler

Samples elements from a given list of indices, always in the same order once initialized.

RawPartitionSampler

Partition dataset according to num_replicas.

DictFileSampler

Get the sample indices for a given client ID from a data file that stores a dict.

DataPartitioner

Base class for data partition in federated learning.

BasicPartitioner

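Basic data partitioner supporting label-distribution-skew, quantity-skew, and IID schemes.

VisionPartitioner

Vision data partitioner supporting label-distribution-skew, quantity-skew, and IID schemes.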

CIFAR10Partitioner

CIFAR10 data partitioner.

CIFAR100Partitioner

CIFAR100 data partitioner.

FMNISTPartitioner

FMNIST data partitioner.

MNISTPartitioner

MNIST data partitioner.

SVHNPartitioner

SVHN data partitioner.

FCUBEPartitioner

FCUBE data partitioner.

AdultPartitioner

Adult data partitioner.

RCV1Partitioner

RCV1 data partitioner.

CovtypePartitioner

Covtype data partitioner.

noniid_slicing(dataset, num_clients, num_shards)

Slice a dataset for non-IID.

random_slicing(dataset, num_clients)

Slice a dataset randomly and equally for IID.

class SubsetSampler(indices, shuffle=False)#

Bases: torch.utils.data.Sampler

Samples elements from a given list of indices, always in the same order once initialized.

It is a Sampler intended for use with a DataLoader; each partition is fixed once the sampler is initialized.

Parameters
  • indices (list[int]) – Indices in the whole set selected for subset

  • shuffle (bool) – shuffle the indices or not.

__iter__(self)#
__len__(self)#
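
The fixed-order behaviour described for SubsetSampler can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: `FixedOrderSampler` and its `seed` argument are hypothetical names, and the class deliberately does not subclass torch.utils.data.Sampler so that it runs without torch. It assumes that shuffling, if requested, happens once at construction.

```python
import random


class FixedOrderSampler:
    """Hypothetical stand-in for SubsetSampler (no torch dependency)."""

    def __init__(self, indices, shuffle=False, seed=None):
        # Copy so the caller's list is not mutated.
        self.indices = list(indices)
        if shuffle:
            # Assumption: shuffle once at construction; the order is frozen afterwards.
            random.Random(seed).shuffle(self.indices)

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


sampler = FixedOrderSampler([4, 8, 15, 16, 23, 42], shuffle=True, seed=0)
first_pass = list(sampler)
second_pass = list(sampler)  # same order as first_pass
```

Iterating twice yields the same sequence, which is the property that keeps a client's partition stable across epochs.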
class RawPartitionSampler(dataset, client_id, num_replicas=None)#

Bases: torch.utils.data.Sampler

Partition dataset according to num_replicas.

Every client gets an equal shard of the dataset.

Parameters
  • dataset (torch.utils.data.Dataset) – Dataset to partition.

  • client_id (int) – ID of the client whose shard is sampled.

  • num_replicas (int, optional) – Number of data replicas. Defaults to None, which means the total number of client processes.

__iter__(self)#
__len__(self)#
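
The idea of giving every client an equal shard can be sketched as plain index arithmetic. The helper below is hypothetical (`equal_shard` is not part of the package) and makes two assumptions that the text above does not confirm: client IDs are 0-based, and samples that do not divide evenly are dropped.

```python
def equal_shard(num_samples, client_id, num_replicas):
    """Return the contiguous index range owned by one client.

    Assumptions: 0-based client ids; the remainder of an uneven split is dropped.
    """
    if not 0 <= client_id < num_replicas:
        raise ValueError("client_id must be in [0, num_replicas)")
    shard_size = num_samples // num_replicas
    start = client_id * shard_size
    return list(range(start, start + shard_size))


# Ten samples split over three clients: three samples each, one left over.
shards = [equal_shard(10, cid, 3) for cid in range(3)]
```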
class DictFileSampler(dict_file, client_id)#

Bases: torch.utils.data.Sampler

Get the sample indices for a given client ID from a data file that stores a dict.

__iter__(self)#
__len__(self)#
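
As a rough illustration of reading per-client indices from a dict file, the sketch below stores a `{client_id: indices}` mapping with pickle and loads one client's entry. The on-disk format is an assumption (the text above does not state it), and `DictFileIndexSampler` is a hypothetical stand-in that does not depend on torch.

```python
import os
import pickle
import tempfile


class DictFileIndexSampler:
    """Hypothetical stand-in: yields the indices stored for one client."""

    def __init__(self, dict_file, client_id):
        # Assumption: the file is a pickled dict mapping client id -> list of indices.
        with open(dict_file, "rb") as f:
            mapping = pickle.load(f)
        self.indices = list(mapping[client_id])

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partition.pkl")
    with open(path, "wb") as f:
        pickle.dump({0: [0, 2, 4], 1: [1, 3]}, f)
    sampler = DictFileIndexSampler(path, client_id=1)
    client_indices = list(sampler)
```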
noniid_slicing(dataset, num_clients, num_shards)#

Slice a dataset for non-IID.

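Parameters
  • dataset (torch.utils.data.Dataset) – Dataset to slice.

  • num_clients (int) – Number of clients.

  • num_shards (int) – Number of shards.

Notes

The size of a shard equals int(len(dataset)/num_shards). Each client gets int(num_shards/num_clients) shards.

Returns

dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }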
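
The shard arithmetic in the notes can be made concrete with a short sketch. It is not the library's code: `shard_slicing` is a hypothetical helper that works on a plain list of labels, and sorting by label before cutting shards is an assumption (a common way to obtain non-IID shards) rather than something the text above states.

```python
import random


def shard_slicing(labels, num_clients, num_shards, seed=None):
    """Assign label-sorted shards to clients; returns {client_id: indices}."""
    rng = random.Random(seed)
    shard_size = int(len(labels) / num_shards)
    shards_per_client = int(num_shards / num_clients)
    # Assumption: sort indices by label so each shard covers few classes.
    sorted_idx = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)
    client_dict = {}
    for cid in range(num_clients):
        own = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [
            idx
            for s in own
            for idx in sorted_idx[s * shard_size:(s + 1) * shard_size]
        ]
    return client_dict


labels = [i % 10 for i in range(100)]  # 10 classes, 10 samples each
parts = shard_slicing(labels, num_clients=5, num_shards=10, seed=0)
```

With 100 samples, 10 shards, and 5 clients, each shard holds 10 samples and each client receives 2 shards, so every client ends up with 20 samples drawn from only two classes.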

random_slicing(dataset, num_clients)#

Slice a dataset randomly and equally for IID.

Parameters
  • dataset (torch.utils.data.Dataset) – Dataset to slice.

  • num_clients (int) – Number of clients.

Returns

dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
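
A random, equal split can be sketched in a few lines. The helper `iid_slicing` is hypothetical and takes a sample count instead of a dataset object; dropping the remainder of an uneven split is an assumption.

```python
import random


def iid_slicing(num_samples, num_clients, seed=None):
    """Shuffle all indices and hand each client an equally sized slice."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    per_client = num_samples // num_clients  # assumption: remainder is dropped
    return {
        cid: indices[cid * per_client:(cid + 1) * per_client]
        for cid in range(num_clients)
    }


parts = iid_slicing(10, 3, seed=0)
```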

class DataPartitioner#

Bases: abc.ABC

Base class for data partition in federated learning.

abstract _perform_partition(self)#
abstract __getitem__(self, index)#
abstract __len__(self)#
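
To show how the three abstract methods fit together, here is a minimal sketch of a concrete partitioner. `PartitionerBase` is a local stand-in that mirrors the abstract interface listed above, and `RoundRobinPartitioner` is a hypothetical subclass; neither is part of the package.

```python
from abc import ABC, abstractmethod


class PartitionerBase(ABC):
    """Local stand-in mirroring the abstract methods of DataPartitioner."""

    @abstractmethod
    def _perform_partition(self):
        ...

    @abstractmethod
    def __getitem__(self, index):
        ...

    @abstractmethod
    def __len__(self):
        ...


class RoundRobinPartitioner(PartitionerBase):
    """Hypothetical partitioner: sample i goes to client i % num_clients."""

    def __init__(self, num_samples, num_clients):
        self.num_samples = num_samples
        self.num_clients = num_clients
        self.client_dict = self._perform_partition()

    def _perform_partition(self):
        return {
            cid: list(range(cid, self.num_samples, self.num_clients))
            for cid in range(self.num_clients)
        }

    def __getitem__(self, index):
        # Sample indices owned by client `index`.
        return self.client_dict[index]

    def __len__(self):
        # Number of clients.
        return len(self.client_dict)


partitioner = RoundRobinPartitioner(num_samples=7, num_clients=3)
```

Indexing the partitioner with a client ID returns that client's sample indices, and `len()` reports the number of clients.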
class BasicPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)#

Bases: DataPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_classes = 2#
_perform_partition(self)#
__getitem__(self, index)#
__len__(self)#
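
The Dirichlet-based label skew can be sketched as follows: for each class, draw client proportions from a Dirichlet distribution with concentration `dir_alpha` and split that class's samples accordingly. This is an illustration under assumptions, not the library's algorithm; `dirichlet` and `label_skew_dirichlet` are hypothetical helpers, and the Dirichlet draw is built from normalized Gamma variates so that only the standard library is needed.

```python
import random


def dirichlet(alpha, size, rng):
    """One draw from a symmetric Dirichlet via normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def label_skew_dirichlet(targets, num_clients, dir_alpha, seed=None):
    """Split each class across clients with Dirichlet-distributed proportions."""
    rng = random.Random(seed)
    client_dict = {cid: [] for cid in range(num_clients)}
    for cls in sorted(set(targets)):
        idx = [i for i, t in enumerate(targets) if t == cls]
        rng.shuffle(idx)
        props = dirichlet(dir_alpha, num_clients, rng)
        # Turn cumulative proportions into cut points over this class's indices.
        start, cum = 0, 0.0
        for cid, p in enumerate(props):
            cum += p
            last = cid == num_clients - 1
            end = len(idx) if last else int(round(cum * len(idx)))
            client_dict[cid].extend(idx[start:end])
            start = end
    return client_dict


targets = [i % 2 for i in range(200)]  # two classes, as in num_classes = 2
parts = label_skew_dirichlet(targets, num_clients=4, dir_alpha=0.5, seed=0)
```

Smaller values of `dir_alpha` concentrate each class on fewer clients, while larger values approach an even split.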
class VisionPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#

Bases: BasicPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_classes = 10#
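
For the quantity-based label skew, one plausible reading is that each client holds samples from only `major_classes_num` classes. The sketch below follows that reading; it is an assumption, not the library's algorithm. `quantity_based_label_skew` is a hypothetical helper, and both the random class assignment and the even split among a class's owners are illustrative choices.

```python
import random


def quantity_based_label_skew(targets, num_clients, major_classes_num, seed=None):
    """Give each client samples from a fixed number of classes."""
    rng = random.Random(seed)
    classes = sorted(set(targets))
    # Assumption: each client's classes are chosen uniformly at random.
    owned = {
        cid: rng.sample(classes, major_classes_num) for cid in range(num_clients)
    }
    client_dict = {cid: [] for cid in range(num_clients)}
    for cls in classes:
        owners = [cid for cid in range(num_clients) if cls in owned[cid]]
        if not owners:
            continue  # samples of an unassigned class stay unused in this sketch
        idx = [i for i, t in enumerate(targets) if t == cls]
        rng.shuffle(idx)
        for j, cid in enumerate(owners):
            # Interleaved slices split the class evenly among its owners.
            client_dict[cid].extend(idx[j::len(owners)])
    return client_dict


targets = [i % 10 for i in range(1000)]  # ten classes, as in num_classes = 10
parts = quantity_based_label_skew(
    targets, num_clients=5, major_classes_num=2, seed=0
)
```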
class CIFAR10Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None)#

Bases: DataPartitioner

CIFAR10 data partitioner.

Partition CIFAR10 for a given number of clients. Six partition schemes are currently supported; each is selected by passing a different combination of parameters at initialization.

Parameters
  • targets (list or numpy.ndarray) – Targets of dataset for partition. Each element is in range of [0, 1, …, 9].

  • num_clients (int) – Number of clients for data partition.

  • balance (bool, optional) – Balanced partition over all clients or not. Default as True.

  • partition (str, optional) – Partition type, only "iid", "shards", "dirichlet" are supported. Default as "iid".

  • unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Default as 0 for balanced partition.

  • num_shards (int, optional) – Number of shards in non-iid "shards" partition. Only works if partition="shards". Default as None.

  • dir_alpha (float, optional) – Dirichlet distribution parameter for non-iid partition. Only works if partition="dirichlet". Default as None.

  • verbose (bool, optional) – Whether to print partition process. Default as True.

  • seed (int, optional) – Random seed. Default as None.
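
The role of `unbalance_sgm` can be illustrated by drawing per-client sample counts from a log-normal distribution. This is a sketch under assumptions: `lognormal_client_sizes` is a hypothetical helper, `unbalance_sgm` is treated as the sigma argument of the log-normal draw, and the rescaling and remainder handling are illustrative choices rather than the partitioner's actual procedure.

```python
import math
import random


def lognormal_client_sizes(num_samples, num_clients, unbalance_sgm, seed=None):
    """Draw per-client sample counts that sum exactly to num_samples."""
    if unbalance_sgm == 0:
        # Balanced case: equal counts, with any remainder spread over the first clients.
        base, rem = divmod(num_samples, num_clients)
        return [base + (1 if cid < rem else 0) for cid in range(num_clients)]
    rng = random.Random(seed)
    mean_size = num_samples / num_clients
    raw = [
        rng.lognormvariate(math.log(mean_size), unbalance_sgm)
        for _ in range(num_clients)
    ]
    # Rescale so the counts add up to the dataset size, then round down.
    scale = num_samples / sum(raw)
    sizes = [int(r * scale) for r in raw]
    # Hand the rounding remainder to the first client so the total is exact.
    sizes[0] += num_samples - sum(sizes)
    return sizes


balanced = lognormal_client_sizes(50000, 10, unbalance_sgm=0)
unbalanced = lognormal_client_sizes(50000, 10, unbalance_sgm=0.3, seed=0)
```

A larger `unbalance_sgm` spreads the counts further apart, while 0 reproduces the balanced split.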

num_classes = 10#
_perform_partition(self)#
__getitem__(self, index)#

Obtain sample indices for client index.

Parameters

index (int) – Client ID.

Returns

List of sample indices for client ID index.

Return type

list

__len__(self)#

Usually equals the number of clients.

class CIFAR100Partitioner(targets, num_clients, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None)#

Bases: CIFAR10Partitioner

CIFAR100 data partitioner.

This is a subclass of the CIFAR10Partitioner.

num_classes = 100#
class FMNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#

Bases: VisionPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 784#
class MNISTPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#

Bases: VisionPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 784#
class SVHNPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=None, verbose=True, seed=None)#

Bases: VisionPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 1024#
class FCUBEPartitioner(data, partition)#

Bases: DataPartitioner

FCUBE data partitioner.

FCUBE is a synthetic dataset for research on non-IID scenarios with feature imbalance. The dataset and its partition methods were proposed in Federated Learning on Non-IID Data Silos: An Experimental Study.

Supported partition methods for FCUBE:

  • feature-distribution-skew:synthetic

  • IID

For more details, please refer to Section (IV-B-b) of the original paper.

Parameters

data (numpy.ndarray) – Data of dataset FCUBE.

num_classes = 2#
num_clients = 4#
_perform_partition(self)#
__getitem__(self, index)#
__len__(self)#
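
The synthetic feature skew can be sketched as an octant-based assignment. The rule used here, where each of the 4 clients receives the two octants of the cube that are symmetric about the origin, reflects one reading of the cited paper and should be treated as an assumption; `fcube_client` and `fcube_synthetic_split` are hypothetical helpers, and the uniform sampling in [-1, 1] is only for demonstration.

```python
import random


def fcube_client(sample):
    """Map a 3-D point to one of 4 clients (pairs of origin-symmetric octants)."""
    x1, x2, x3 = sample
    if x1 < 0:
        # Reflect through the origin so symmetric octants share a client.
        x2, x3 = -x2, -x3
    return 2 * int(x2 > 0) + int(x3 > 0)


def fcube_synthetic_split(data):
    """Return {client_id: indices} for a list of 3-D points."""
    client_dict = {cid: [] for cid in range(4)}
    for idx, sample in enumerate(data):
        client_dict[fcube_client(sample)].append(idx)
    return client_dict


rng = random.Random(0)
data = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(400)]
parts = fcube_synthetic_split(data)
```

Because every client sees a different region of the feature space, the feature distributions differ across clients even when the labels are balanced.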
class AdultPartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)#

Bases: BasicPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 123#
num_classes = 2#
class RCV1Partitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)#

Bases: BasicPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 47236#
num_classes = 2#
class CovtypePartitioner(targets, num_clients, partition='iid', dir_alpha=None, major_classes_num=1, verbose=True, seed=None)#

Bases: BasicPartitioner

Supported partition methods:

  • label-distribution-skew:quantity-based

  • label-distribution-skew:distribution-based (Dirichlet)

  • quantity-skew (Dirichlet)

  • IID

Parameters
  • targets

  • num_clients

  • partition

  • dir_alpha

  • major_classes_num

  • verbose

  • seed

num_features = 54#
num_classes = 2#