functional#

Module Contents#

split_indices(num_cumsum, rand_perm)

Split the sample index list according to the number of samples for each client.

balance_split(num_clients, num_samples)

Assign the same number of samples to each client.

lognormal_unbalance_split(num_clients, num_samples, ...)

Assign a different number of samples to each client using a Log-Normal distribution.

dirichlet_unbalance_split(num_clients, num_samples, alpha)

Assign a different number of samples to each client using a Dirichlet distribution.

homo_partition(client_sample_nums, num_samples)

Partition data indices in an IID way given the sample numbers for each client.

hetero_dir_partition(targets, num_clients, ...[, ...])

Non-iid partition based on Dirichlet distribution.

shards_partition(targets, num_clients, num_shards)

Non-iid partition used in the FedAvg paper.

client_inner_dirichlet_partition(targets, num_clients, ...)

Non-iid Dirichlet partition.

label_skew_quantity_based_partition(targets, ...)

Label-skew: quantity-based partition.

fcube_synthetic_partition(data)

Feature-distribution-skew: synthetic partition.

samples_num_count(client_dict, num_clients)

Return sample count for all clients in client_dict.

noniid_slicing(dataset, num_clients, num_shards)

Slice a dataset for non-IID.

random_slicing(dataset, num_clients)

Slice a dataset randomly and equally for IID.

split_indices(num_cumsum, rand_perm)#

Split the sample index list according to the number of samples for each client.

Parameters:
  • num_cumsum (np.ndarray) – Cumulative sum of the sample numbers for each client.

  • rand_perm (list) – Randomly permuted list of sample indices.

Returns:

{ client_id: indices}.

Return type:

dict
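The splitting step can be pictured as cutting one shuffled index list at cumulative boundaries. The following is a minimal sketch using only the standard library, not the library's own implementation; the name split_indices_sketch and the sample values are hypothetical, and plain lists stand in for numpy arrays.

```python
from itertools import accumulate


def split_indices_sketch(num_cumsum, rand_perm):
    """Cut rand_perm at each cumulative boundary; client i gets the i-th slice."""
    client_dict = {}
    start = 0
    for cid, end in enumerate(num_cumsum):
        client_dict[cid] = list(rand_perm[start:end])
        start = end
    return client_dict


# Hypothetical input: three clients holding 3, 2 and 5 samples.
client_sample_nums = [3, 2, 5]
num_cumsum = list(accumulate(client_sample_nums))  # cumulative sums: 3, 5, 10
rand_perm = [7, 2, 9, 0, 4, 1, 8, 3, 6, 5]        # an already shuffled index list
parts = split_indices_sketch(num_cumsum, rand_perm)
```

Each client receives a contiguous slice of the permutation, so the clients' index sets are disjoint by construction.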

balance_split(num_clients, num_samples)#

Assign the same number of samples to each client.

Parameters:
  • num_clients (int) – Number of clients for partition.

  • num_samples (int) – Total number of samples.

Returns:

A numpy array of num_clients integers, each giving the number of samples for the corresponding client.

Return type:

numpy.ndarray
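As a rough illustration of a balanced split, the sketch below gives every client the integer quotient of num_samples by num_clients. It is an assumption of this sketch that any remainder is simply left unassigned; the name balance_split_sketch is hypothetical and a plain list stands in for the numpy array.

```python
def balance_split_sketch(num_clients, num_samples):
    """Give every client the same integer share of the samples."""
    # Assumption: the remainder of the division is not distributed.
    per_client = num_samples // num_clients
    return [per_client] * num_clients


counts = balance_split_sketch(num_clients=4, num_samples=10)
```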

lognormal_unbalance_split(num_clients, num_samples, unbalance_sgm)#

Assign a different number of samples to each client using a Log-Normal distribution.

Sample numbers for clients are drawn from a Log-Normal distribution.

Parameters:
  • num_clients (int) – Number of clients for partition.

  • num_samples (int) – Total number of samples.

  • unbalance_sgm (float) – Log-normal variance. When it equals 0, the partition is the same as that of balance_split().

Returns:

A numpy array of num_clients integers, each giving the number of samples for the corresponding client.

Return type:

numpy.ndarray
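One way to picture the Log-Normal split is to draw one positive value per client, normalize the draws, and scale them to the total sample count. This is a hedged sketch, not the library's code: the function name, the seed argument, the use of unbalance_sgm as the sigma argument of random.lognormvariate, and the rule for handing out the rounding remainder are all assumptions.

```python
import math
import random


def lognormal_unbalance_split_sketch(num_clients, num_samples, unbalance_sgm, seed=0):
    """Draw per-client sample counts from a log-normal distribution."""
    if unbalance_sgm == 0:
        # Degenerate case: every client gets the same share.
        return [num_samples // num_clients] * num_clients
    rng = random.Random(seed)
    mu = math.log(num_samples / num_clients)
    draws = [rng.lognormvariate(mu, unbalance_sgm) for _ in range(num_clients)]
    total = sum(draws)
    counts = [int(d / total * num_samples) for d in draws]
    # Assumption: samples lost to flooring go to the largest client.
    counts[counts.index(max(counts))] += num_samples - sum(counts)
    return counts


counts = lognormal_unbalance_split_sketch(num_clients=10, num_samples=1000, unbalance_sgm=0.3)
equal_counts = lognormal_unbalance_split_sketch(num_clients=10, num_samples=1000, unbalance_sgm=0)
```

Larger values of unbalance_sgm spread the counts further apart, while 0 reduces to the balanced case.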

dirichlet_unbalance_split(num_clients, num_samples, alpha)#

Assign a different number of samples to each client using a Dirichlet distribution.

Sample numbers for clients are drawn from a Dirichlet distribution.

Parameters:
  • num_clients (int) – Number of clients for partition.

  • num_samples (int) – Total number of samples.

  • alpha (float) – Dirichlet concentration parameter.

Returns:

A numpy array of num_clients integers, each giving the number of samples for the corresponding client.

Return type:

numpy.ndarray
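A symmetric Dirichlet draw can be obtained by normalizing independent Gamma(alpha, 1) variates, which is enough to sketch this split with the standard library. The function name, the seed argument, and the remainder rule below are assumptions made for illustration; the library itself may add further checks.

```python
import random


def dirichlet_unbalance_split_sketch(num_clients, num_samples, alpha, seed=0):
    """Draw client proportions from a symmetric Dirichlet and scale to counts."""
    rng = random.Random(seed)
    # Normalized Gamma(alpha, 1) draws follow a Dirichlet(alpha, ..., alpha).
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(gammas)
    counts = [int(g / total * num_samples) for g in gammas]
    # Assumption: samples lost to flooring go to the largest client.
    counts[counts.index(max(counts))] += num_samples - sum(counts)
    return counts


counts = dirichlet_unbalance_split_sketch(num_clients=5, num_samples=500, alpha=0.5)
```

Small alpha values concentrate most samples on a few clients; large values approach an even split.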

homo_partition(client_sample_nums, num_samples)#

Partition data indices in an IID way given the sample numbers for each client.

Parameters:
  • client_sample_nums (numpy.ndarray) – Sample numbers for each client.

  • num_samples (int) – Number of samples.

Returns:

{ client_id: indices}.

Return type:

dict
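An IID partition of this kind amounts to shuffling all indices once and handing each client a slice of the requested length. The sketch below assumes exactly that; homo_partition_sketch and its seed argument are hypothetical names, and lists replace numpy arrays.

```python
import random
from itertools import accumulate


def homo_partition_sketch(client_sample_nums, num_samples, seed=0):
    """Shuffle all indices, then slice them by the requested per-client counts."""
    rng = random.Random(seed)
    rand_perm = list(range(num_samples))
    rng.shuffle(rand_perm)
    bounds = [0] + list(accumulate(client_sample_nums))
    return {
        cid: rand_perm[bounds[cid]:bounds[cid + 1]]
        for cid in range(len(client_sample_nums))
    }


parts = homo_partition_sketch(client_sample_nums=[3, 2, 5], num_samples=10)
```

Because labels play no role in the shuffle, each client's label distribution matches the global one in expectation.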

hetero_dir_partition(targets, num_clients, num_classes, dir_alpha, min_require_size=None)#

Non-iid partition based on Dirichlet distribution. The method is the “hetero-dir” partition from Bayesian Nonparametric Federated Learning of Neural Networks and Federated Learning with Matched Averaging.

This method simulates a heterogeneous partition in which both the number of data points and the class proportions are unbalanced. Samples are partitioned into \(J\) clients by sampling \(p_k \sim \text{Dir}_{J}({\alpha})\) and allocating a \(p_{k,j}\) proportion of the samples of class \(k\) to local client \(j\).

The number of samples for each client is determined within this function.

Parameters:
  • targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.

  • num_clients (int) – Number of clients for partition.

  • num_classes (int) – Number of classes in samples.

  • dir_alpha (float) – Parameter alpha for Dirichlet distribution.

  • min_require_size (int, optional) – Minimum required number of samples for each client. If None, it defaults to num_classes.

Returns:

{ client_id: indices}.

Return type:

dict
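The core allocation step described above (draw \(p_k\) per class, then give client \(j\) a \(p_{k,j}\) share of that class) can be sketched as follows. This is a simplified illustration under stated assumptions: it omits the min_require_size retry and any size balancing the library may apply, and hetero_dir_partition_sketch and its seed argument are hypothetical names.

```python
import random
from itertools import accumulate


def hetero_dir_partition_sketch(targets, num_clients, num_classes, dir_alpha, seed=0):
    """For each class k, split its samples across clients by p_k ~ Dir_J(alpha)."""
    rng = random.Random(seed)
    client_dict = {cid: [] for cid in range(num_clients)}
    for k in range(num_classes):
        idx_k = [i for i, t in enumerate(targets) if t == k]
        rng.shuffle(idx_k)
        # Normalized Gamma(alpha, 1) draws give a Dirichlet sample p_k.
        gammas = [rng.gammavariate(dir_alpha, 1.0) for _ in range(num_clients)]
        cum = list(accumulate(gammas))
        # Cumulative proportions become cut points into the shuffled class indices.
        cuts = [int(c / cum[-1] * len(idx_k)) for c in cum]
        start = 0
        for cid, end in enumerate(cuts):
            client_dict[cid].extend(idx_k[start:end])
            start = end
    return client_dict


targets = [i % 3 for i in range(90)]  # hypothetical labels: 3 classes, 30 samples each
parts = hetero_dir_partition_sketch(targets, num_clients=4, num_classes=3, dir_alpha=0.5)
```

Since each class is split independently, both the per-client sample count and the per-client class mix vary, which is the unbalance the description refers to.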

shards_partition(targets, num_clients, num_shards)#

Non-iid partition used in the FedAvg paper.

Parameters:
  • targets (list or numpy.ndarray) – Sample targets. Unshuffled preferred.

  • num_clients (int) – Number of clients for partition.

  • num_shards (int) – Number of shards in partition.

Returns:

{ client_id: indices}.

Return type:

dict
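A shard-based partition is commonly built by sorting indices by label, cutting the sorted list into equal shards, and dealing a fixed number of random shards to each client. The sketch below follows that recipe as an assumption about the general technique rather than a copy of the library code; the function name and seed argument are hypothetical.

```python
import random


def shards_partition_sketch(targets, num_clients, num_shards, seed=0):
    """Sort indices by label, cut into shards, deal random shards to clients."""
    rng = random.Random(seed)
    num_samples = len(targets)
    shard_size = num_samples // num_shards
    shards_per_client = num_shards // num_clients
    sorted_indices = sorted(range(num_samples), key=lambda i: targets[i])
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)
    client_dict = {}
    for cid in range(num_clients):
        own = shard_ids[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [
            i
            for s in own
            for i in sorted_indices[s * shard_size:(s + 1) * shard_size]
        ]
    return client_dict


targets = [i % 10 for i in range(100)]  # hypothetical labels: 10 classes, 10 samples each
parts = shards_partition_sketch(targets, num_clients=5, num_shards=10)
labels_per_client = {cid: {targets[i] for i in idx} for cid, idx in parts.items()}
```

With ten single-class shards and two shards per client, every client in this example ends up with only two classes.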

client_inner_dirichlet_partition(targets, num_clients, num_classes, dir_alpha, client_sample_nums, verbose=True)#

Non-iid Dirichlet partition.

The method is from the paper Federated Learning Based on Dynamic Regularization. Unlike hetero_dir_partition(), this function takes the number of samples for every client as an input, client_sample_nums.

Parameters:
  • targets (list or numpy.ndarray) – Sample targets.

  • num_clients (int) – Number of clients for partition.

  • num_classes (int) – Number of classes in samples.

  • dir_alpha (float) – Parameter alpha for Dirichlet distribution.

  • client_sample_nums (numpy.ndarray) – A numpy array of num_clients integers, each giving the number of samples for the corresponding client.

  • verbose (bool, optional) – Whether to print the partition process. Defaults to True.

Returns:

{ client_id: indices}.

Return type:

dict
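To illustrate how fixed per-client sample counts combine with a per-client Dirichlet class prior, the sketch below fills clients one sample at a time: pick a client with remaining quota, pick a class according to that client's prior, and take one unused sample of that class. This is an assumed reading of the technique, not the library's implementation; the function name, the seed argument, and the rule of skipping exhausted classes are hypothetical.

```python
import random


def client_inner_dirichlet_sketch(targets, num_clients, num_classes, dir_alpha,
                                  client_sample_nums, seed=0):
    """Fill each client's fixed quota using a per-client Dirichlet class prior."""
    if sum(client_sample_nums) > len(targets):
        raise ValueError("requested more samples than are available")
    rng = random.Random(seed)
    # Unnormalized Gamma draws act as Dirichlet weights for random.choices.
    priors = [
        [rng.gammavariate(dir_alpha, 1.0) for _ in range(num_classes)]
        for _ in range(num_clients)
    ]
    pools = {k: [i for i, t in enumerate(targets) if t == k] for k in range(num_classes)}
    for pool in pools.values():
        rng.shuffle(pool)
    remaining = list(client_sample_nums)
    client_dict = {cid: [] for cid in range(num_clients)}
    while sum(remaining) > 0:
        cid = rng.choice([c for c in range(num_clients) if remaining[c] > 0])
        # Assumption: classes with no samples left are excluded from the draw.
        available = [k for k in range(num_classes) if pools[k]]
        k = rng.choices(available, weights=[priors[cid][c] for c in available])[0]
        client_dict[cid].append(pools[k].pop())
        remaining[cid] -= 1
    return client_dict


targets = [i % 4 for i in range(200)]  # hypothetical labels: 4 classes
parts = client_inner_dirichlet_sketch(
    targets, num_clients=3, num_classes=4, dir_alpha=0.5, client_sample_nums=[30, 50, 20]
)
```

The per-client sizes are exactly those requested, while the class mix inside each client is skewed by its own prior.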

label_skew_quantity_based_partition(targets, num_clients, num_classes, major_classes_num)#

Label-skew: quantity-based partition.

For details, please check Federated Learning on Non-IID Data Silos: An Experimental Study.

Parameters:
  • targets (List or np.ndarray) – Labels of the dataset.

  • num_clients (int) – Number of clients.

  • num_classes (int) – Number of unique classes.

  • major_classes_num (int) – Number of classes for each client; should be less than num_classes.

Returns:

{ client_id: indices}.

Return type:

dict
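A quantity-based label skew can be sketched by first assigning each client a fixed number of distinct classes and then sharing each class's samples among the clients that hold it. The details below are assumptions for illustration (the first class of client cid is cid % num_classes, the rest are random, and shares are near-equal); the function name and seed argument are hypothetical.

```python
import random


def label_skew_quantity_sketch(targets, num_clients, num_classes, major_classes_num, seed=0):
    """Give each client major_classes_num classes, then share each class's samples."""
    rng = random.Random(seed)
    client_classes = []
    for cid in range(num_clients):
        first = cid % num_classes  # assumption: keeps classes covered across clients
        rest = rng.sample([c for c in range(num_classes) if c != first],
                          major_classes_num - 1)
        client_classes.append([first] + rest)
    client_dict = {cid: [] for cid in range(num_clients)}
    for k in range(num_classes):
        holders = [cid for cid in range(num_clients) if k in client_classes[cid]]
        if not holders:
            continue  # no client holds this class, so its samples stay unassigned
        idx_k = [i for i, t in enumerate(targets) if t == k]
        rng.shuffle(idx_k)
        for pos, cid in enumerate(holders):
            # Strided slicing yields near-equal, disjoint shares.
            client_dict[cid].extend(idx_k[pos::len(holders)])
    return client_dict


targets = [i % 5 for i in range(100)]  # hypothetical labels: 5 classes, 20 samples each
parts = label_skew_quantity_sketch(targets, num_clients=10, num_classes=5, major_classes_num=2)
labels_per_client = {cid: {targets[i] for i in idx} for cid, idx in parts.items()}
```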

fcube_synthetic_partition(data)#

Feature-distribution-skew: synthetic partition.

Synthetic partition for FCUBE dataset. This partition is from Federated Learning on Non-IID Data Silos: An Experimental Study.

Parameters:

data (np.ndarray) – Data of dataset FCUBE.

Returns:

{ client_id: indices}.

Return type:

dict

samples_num_count(client_dict, num_clients)#

Return sample count for all clients in client_dict.

Parameters:
  • client_dict (dict) – Data partition result for different clients.

  • num_clients (int) – Total number of clients.

Returns:

pandas.DataFrame

noniid_slicing(dataset, num_clients, num_shards)#

Slice a dataset for non-IID.

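Parameters:
  • dataset (torch.utils.data.Dataset) – Dataset to slice.

  • num_clients (int) – Number of clients.

  • num_shards (int) – Number of shards.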

Notes

The size of a shard equals int(len(dataset)/num_shards). Each client gets int(num_shards/num_clients) shards.

Returns:

{ 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }

Return type:

dict
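The shard arithmetic in the notes can be checked with a small helper. The sketch below only evaluates the two formulas and, as an assumption, reports how many samples would be left over if every client received exactly that many full shards; shard_layout is a hypothetical name.

```python
def shard_layout(num_items, num_clients, num_shards):
    """Evaluate the shard size and shards-per-client formulas from the notes."""
    shard_size = int(num_items / num_shards)
    shards_per_client = int(num_shards / num_clients)
    # Assumption: every client receives exactly shards_per_client full shards.
    leftover = num_items - shard_size * shards_per_client * num_clients
    return shard_size, shards_per_client, leftover


exact = shard_layout(num_items=60000, num_clients=100, num_shards=200)
uneven = shard_layout(num_items=1000, num_clients=3, num_shards=10)
```

In the first call the divisions are exact, so nothing is left over; in the second, one shard of 100 samples cannot be handed out evenly.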

random_slicing(dataset, num_clients)#

Slice a dataset randomly and equally for IID.

Parameters:
  • dataset (torch.utils.data.Dataset) – Dataset to slice.

  • num_clients (int) – Number of clients.

Returns:

{ 0: indices of dataset, 1: indices of dataset, ..., k: indices of dataset }

Return type:

dict
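Random, equal slicing can be sketched by shuffling the index range once and giving each client a slice of the same length. The sketch assumes only that the dataset supports len(); the function name, the seed argument, and the choice to leave any remainder unassigned are hypothetical.

```python
import random


def random_slicing_sketch(dataset, num_clients, seed=0):
    """Shuffle all indices and give each client an equally sized slice."""
    rng = random.Random(seed)
    per_client = len(dataset) // num_clients  # assumption: remainder is unassigned
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    return {
        cid: indices[cid * per_client:(cid + 1) * per_client]
        for cid in range(num_clients)
    }


dataset = list(range(103))  # stand-in for any object that supports len()
parts = random_slicing_sketch(dataset, num_clients=5)
```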