functional#
Module Contents#
- split_indices(): Split the sample index list given the sample number of each client.
- balance_split(): Assign the same sample number to each client.
- lognormal_unbalance_split(): Assign a different sample number to each client using a Log-Normal distribution.
- dirichlet_unbalance_split(): Assign a different sample number to each client using a Dirichlet distribution.
- homo_partition(): Partition data indices in an IID way given sample numbers for each client.
- hetero_dir_partition(): Non-IID partition based on the Dirichlet distribution ("hetero-dir" partition).
- shards_partition(): Non-IID partition used in the FedAvg paper.
- client_inner_dirichlet_partition(): Non-IID Dirichlet partition with given client sample numbers.
- label_skew_quantity_based_partition(): Label-skew, quantity-based partition.
- fcube_synthetic_partition(): Feature-distribution-skew, synthetic partition.
- samples_num_count(): Return sample counts for all clients in a client dict.
- noniid_slicing(): Slice a dataset for non-IID.
- random_slicing(): Slice a dataset randomly and equally for IID.
- split_indices(num_cumsum, rand_perm)#
Split the permuted sample index list at the cumulative sample number of each client.
- Parameters:
num_cumsum (numpy.ndarray) – Cumulative sum of sample numbers for all clients.
rand_perm (list or numpy.ndarray) – Permuted sample indices.
- Returns:
{ client_id: indices }.
- Return type:
dict
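A minimal sketch of the splitting logic: cutting a permuted index array at the cumulative per-client sample counts with `numpy.split`. The concrete sample sizes below are hypothetical, chosen only for illustration.

```python
import numpy as np

def split_indices(num_cumsum, rand_perm):
    # Client i receives rand_perm[num_cumsum[i-1]:num_cumsum[i]];
    # the trailing empty piece from np.split is dropped.
    pieces = np.split(rand_perm, num_cumsum)[:-1]
    return {cid: idxs for cid, idxs in enumerate(pieces)}

client_sample_nums = np.array([3, 3, 4])    # hypothetical per-client sizes
num_cumsum = np.cumsum(client_sample_nums)  # [3, 6, 10]
rand_perm = np.random.permutation(10)
client_dict = split_indices(num_cumsum, rand_perm)
```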
- balance_split(num_clients, num_samples)#
Assign the same sample number to each client.
- Parameters:
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
- Returns:
A numpy array of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type:
numpy.ndarray
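A one-line sketch of the balanced split, assuming each client simply gets the floor of the per-client share (any remainder left unassigned, which matches a balanced allocation):

```python
import numpy as np

def balance_split(num_clients, num_samples):
    # Every client gets floor(num_samples / num_clients) samples.
    return np.full(num_clients, num_samples // num_clients, dtype=int)

counts = balance_split(num_clients=10, num_samples=1000)
# counts is an array of ten 100s
```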
- lognormal_unbalance_split(num_clients, num_samples, unbalance_sgm)#
Assign a different sample number to each client using a Log-Normal distribution.
Sample numbers for clients are drawn from a Log-Normal distribution.
- Parameters:
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
unbalance_sgm (float) – Sigma of the Log-Normal distribution; 0 yields a balanced split.
- Returns:
A numpy array of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type:
numpy.ndarray
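A sketch of one plausible implementation: draw client sizes from a Log-Normal centered on the balanced share, then rescale so the sizes sum to `num_samples`. The `rng` seed parameter is added here only for reproducibility; it is not part of the documented signature.

```python
import numpy as np

def lognormal_unbalance_split(num_clients, num_samples, unbalance_sgm, rng=None):
    rng = np.random.default_rng(rng)
    share = num_samples / num_clients
    if unbalance_sgm == 0:
        return np.full(num_clients, int(share))
    # Draws centered (in log space) on the balanced per-client share.
    draws = rng.lognormal(mean=np.log(share), sigma=unbalance_sgm,
                          size=num_clients)
    sizes = (draws / draws.sum() * num_samples).astype(int)
    sizes[-1] += num_samples - sizes.sum()  # give rounding remainder to last client
    return sizes

sizes = lognormal_unbalance_split(num_clients=10, num_samples=1000,
                                  unbalance_sgm=0.5, rng=0)
```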
- dirichlet_unbalance_split(num_clients, num_samples, alpha)#
Assign a different sample number to each client using a Dirichlet distribution.
Sample numbers for clients are drawn from a Dirichlet distribution.
- Parameters:
num_clients (int) – Number of clients for partition.
num_samples (int) – Number of samples.
alpha (float) – Concentration parameter of the Dirichlet distribution.
- Returns:
A numpy array of num_clients integer elements, each representing the sample number of the corresponding client.
- Return type:
numpy.ndarray
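A sketch of the Dirichlet-based size split: client proportions are drawn from Dir(alpha), so smaller alpha produces more skewed sizes. As above, the `rng` parameter is an illustrative addition, not part of the documented signature.

```python
import numpy as np

def dirichlet_unbalance_split(num_clients, num_samples, alpha, rng=None):
    rng = np.random.default_rng(rng)
    # Per-client proportions over the whole dataset.
    proportions = rng.dirichlet(alpha * np.ones(num_clients))
    sizes = (proportions * num_samples).astype(int)
    sizes[-1] += num_samples - sizes.sum()  # fix rounding remainder
    return sizes

sizes = dirichlet_unbalance_split(num_clients=10, num_samples=1000,
                                  alpha=1.0, rng=0)
```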
- homo_partition(client_sample_nums, num_samples)#
Partition data indices in an IID way given sample numbers for each client.
- Parameters:
client_sample_nums (numpy.ndarray) – Sample numbers for each client.
num_samples (int) – Number of samples.
- Returns:
{ client_id: indices }.
- Return type:
dict
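The IID partition can be sketched as a random permutation of all indices cut at the cumulative client sizes (the `rng` seed parameter is illustrative):

```python
import numpy as np

def homo_partition(client_sample_nums, num_samples, rng=None):
    rng = np.random.default_rng(rng)
    rand_perm = rng.permutation(num_samples)          # shuffle all indices
    num_cumsum = np.cumsum(client_sample_nums).astype(int)
    pieces = np.split(rand_perm, num_cumsum)[:-1]     # cut at cumulative sizes
    return {cid: idxs for cid, idxs in enumerate(pieces)}

client_dict = homo_partition(np.array([30, 30, 40]), num_samples=100, rng=0)
```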
- hetero_dir_partition(targets, num_clients, num_classes, dir_alpha, min_require_size=None)#
Non-IID partition based on the Dirichlet distribution. The method is the “hetero-dir” partition of Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging.
This method simulates a heterogeneous partition in which both the number of data points and the class proportions are unbalanced. Samples are partitioned over \(J\) clients by sampling \(p_k \sim \text{Dir}_{J}(\alpha)\) and allocating a \(p_{k,j}\) proportion of the samples of class \(k\) to client \(j\).
The sample number for each client is decided inside this function.
- Parameters:
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
num_classes (int) – Number of classes in samples.
dir_alpha (float) – Parameter alpha for Dirichlet distribution.
min_require_size (int, optional) – Minimum required sample number for each client. If None, defaults to num_classes.
- Returns:
{ client_id: indices }.
- Return type:
dict
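A sketch of the hetero-dir procedure described above: for each class \(k\), draw \(p_k \sim \text{Dir}_J(\alpha)\) and give client \(j\) a \(p_{k,j}\) fraction of that class's indices, redrawing until every client holds at least `min_require_size` samples. The `rng` parameter and the exact redraw loop are illustrative assumptions.

```python
import numpy as np

def hetero_dir_partition(targets, num_clients, num_classes, dir_alpha,
                         min_require_size=None, rng=None):
    rng = np.random.default_rng(rng)
    targets = np.asarray(targets)
    if min_require_size is None:
        min_require_size = num_classes
    min_size = 0
    while min_size < min_require_size:        # redraw until every client is big enough
        idx_batch = [[] for _ in range(num_clients)]
        for k in range(num_classes):
            idx_k = np.where(targets == k)[0]
            rng.shuffle(idx_k)
            proportions = rng.dirichlet(dir_alpha * np.ones(num_clients))
            cuts = (np.cumsum(proportions) * len(idx_k)).astype(int)[:-1]
            for j, part in enumerate(np.split(idx_k, cuts)):
                idx_batch[j].extend(part.tolist())
        min_size = min(len(b) for b in idx_batch)
    return {j: np.array(idx_batch[j]) for j in range(num_clients)}

targets = np.repeat(np.arange(5), 40)   # 200 samples, 5 classes (toy data)
client_dict = hetero_dir_partition(targets, num_clients=4, num_classes=5,
                                   dir_alpha=0.5, rng=0)
```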
- shards_partition(targets, num_clients, num_shards)#
Non-IID partition used in the FedAvg paper.
- Parameters:
targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.
num_clients (int) – Number of clients for partition.
num_shards (int) – Number of shards in partition.
- Returns:
{ client_id: indices }.
- Return type:
dict
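A sketch of the FedAvg-style shard partition: sort indices by label, cut them into `num_shards` equal shards, and deal each client `num_shards // num_clients` randomly chosen shards. The `rng` parameter is an illustrative addition.

```python
import numpy as np

def shards_partition(targets, num_clients, num_shards, rng=None):
    rng = np.random.default_rng(rng)
    targets = np.asarray(targets)
    shard_size = len(targets) // num_shards
    shards_per_client = num_shards // num_clients
    sorted_idx = np.argsort(targets)         # group same-label samples together
    shard_ids = rng.permutation(num_shards)  # deal shards out at random
    client_dict = {}
    for cid in range(num_clients):
        own = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = np.concatenate(
            [sorted_idx[s * shard_size:(s + 1) * shard_size] for s in own])
    return client_dict

targets = np.repeat(np.arange(10), 20)   # 200 samples, 10 classes (toy data)
client_dict = shards_partition(targets, num_clients=10, num_shards=20, rng=0)
```

With 20 shards over 10 clients, each client ends up with 2 shards of 10 samples, concentrated on at most 2 labels.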
- client_inner_dirichlet_partition(targets, num_clients, num_classes, dir_alpha, client_sample_nums, verbose=True)#
Non-IID Dirichlet partition.
The method is from the paper Federated Learning Based on Dynamic Regularization. This function requires the sample number of every client to be given via client_sample_nums, which distinguishes it from hetero_dir_partition().
- Parameters:
targets (list or numpy.ndarray) – Sample targets.
num_clients (int) – Number of clients for partition.
num_classes (int) – Number of classes in samples.
dir_alpha (float) – Parameter alpha for Dirichlet distribution.
client_sample_nums (numpy.ndarray) – A numpy array of num_clients integer elements, each representing the sample number of the corresponding client.
verbose (bool, optional) – Whether to print the partition process. Default is True.
- Returns:
{ client_id: indices }.
- Return type:
dict
- label_skew_quantity_based_partition(targets, num_clients, num_classes, major_classes_num)#
Label-skew, quantity-based partition.
For details, please check Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters:
targets (list or numpy.ndarray) – Sample targets.
num_clients (int) – Number of clients for partition.
num_classes (int) – Number of classes in samples.
major_classes_num (int) – Number of major classes for each client.
- Returns:
{ client_id: indices }.
- Return type:
dict
- fcube_synthetic_partition(data)#
Feature-distribution-skew, synthetic partition.
Synthetic partition for the FCUBE dataset. This partition is from Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters:
data (numpy.ndarray) – Data of the FCUBE dataset.
- Returns:
{ client_id: indices }.
- Return type:
dict
- samples_num_count(client_dict, num_clients)#
Return sample counts for all clients in client_dict.
- noniid_slicing(dataset, num_clients, num_shards)#
Slice a dataset for non-IID.
- Parameters:
dataset (torch.utils.data.Dataset) – Dataset to slice.
num_clients (int) – Number of clients.
num_shards (int) – Number of shards.
Notes
The size of a shard equals int(len(dataset)/num_shards). Each client gets int(num_shards/num_clients) shards.
- Returns:
dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
- random_slicing(dataset, num_clients)#
Slice a dataset randomly and equally for IID.
- Parameters:
dataset (torch.utils.data.Dataset) – Dataset to slice.
num_clients (int) – Number of clients.
- Returns:
dict: { 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }
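The IID slicing can be sketched as a shuffled, (near-)equal deal of all dataset indices. Any object with `__len__` works in this sketch, so a plain list stands in for a `torch.utils.data.Dataset`; the `rng` seed parameter is an illustrative addition.

```python
import numpy as np

def random_slicing(dataset, num_clients, rng=None):
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(dataset))          # shuffle all indices
    pieces = np.array_split(perm, num_clients)    # near-equal pieces
    return {cid: piece.tolist() for cid, piece in enumerate(pieces)}

# A list stands in for a Dataset here; only len() is needed.
client_dict = random_slicing(list(range(100)), num_clients=4, rng=0)
```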