dataset#
Package Contents#
BaseDataset | Base dataset iterator.
Subset | Data subset that applies a different augmentation for each client.
FCUBE | FCUBE data set.
Covtype | Covtype binary dataset from LIBSVM Data.
RCV1 | RCV1 binary dataset from LIBSVM Data.
PathologicalMNIST | The partition strategy in FedAvg. See http://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com
RotatedMNIST | Rotate MNIST and partition them.
RotatedCIFAR10 | Rotate CIFAR10 and partition them.
- class FedDataset#
Bases:
object
- preprocess()#
Define the dataset partition process
- abstract get_dataset(id, type='train')#
Get dataset class
- Parameters:
- Raises:
- abstract get_dataloader(id, batch_size, type='train')#
Get data loader
- __len__()#
- class BaseDataset(x, y)#
Bases:
torch.utils.data.Dataset
Base dataset iterator
- __len__()#
- __getitem__(index)#
- class Subset(dataset, indices, transform=None, target_transform=None)#
Bases:
torch.utils.data.Dataset
Data subset that applies a different augmentation for each client.
- Parameters:
dataset (Dataset) – The whole dataset.
indices (List[int]) – Indices of the samples in dataset that make up the sub-dataset.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- __getitem__(index)#
Get item
- Parameters:
index (int) – index
- Returns:
(image, target) where target is index of the target class.
- __len__()#
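The per-client augmentation idea behind Subset can be sketched without torch. The class below is a hypothetical, duck-typed stand-in (the name `ClientSubset` and the toy data are assumptions for illustration); the documented class derives from torch.utils.data.Dataset and may differ in detail.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class ClientSubset:
    """Hypothetical stand-in for a per-client subset with its own transforms."""

    def __init__(
        self,
        dataset: Sequence[Tuple[Any, Any]],
        indices: List[int],
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.dataset = dataset
        self.indices = indices
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        # Map the subset-local index to an index into the whole dataset.
        sample, target = self.dataset[self.indices[index]]
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target

    def __len__(self) -> int:
        return len(self.indices)


# Toy "dataset": (value, label) pairs standing in for (image, target).
full = [(float(i), i % 2) for i in range(10)]

# Two clients view the same underlying data through different augmentations.
client_a = ClientSubset(full, [0, 2, 4], transform=lambda x: x + 100.0)
client_b = ClientSubset(full, [1, 3], transform=lambda x: -x)

print(client_a[1])  # (102.0, 0)
print(len(client_b))
```

Because the transform is stored on the subset rather than on the whole dataset, each client can augment the shared data differently without copying it.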
- class FCUBE(root, train=True, generate=True, transform=None, target_transform=None, num_samples=4000)#
Bases:
torch.utils.data.Dataset
FCUBE data set.
From paper Federated Learning on Non-IID Data Silos: An Experimental Study.
- Parameters:
root (str) – Root for data file.
train (bool, optional) – Training set or test set. Default as True.
generate (bool, optional) – Whether to generate synthetic dataset. If True, then generate new synthetic FCUBE data even if data already exists. Default as True.
transform (callable, optional) – A function/transform that takes in a numpy.ndarray and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
num_samples (int, optional) – Total number of samples to generate. We suggest using 4000 for the training set and 1000 for the test set. Default is 4000, the suggested training-set size.
- train_files#
- test_files#
- num_clients = 4#
- _generate_train()#
- _generate_test()#
- _save_data()#
- __len__()#
- class Covtype(root, train=True, train_ratio=0.75, transform=None, target_transform=None, download=False, generate=False, seed=None)#
Bases:
torch.utils.data.Dataset
Covtype binary dataset from LIBSVM Data.
- Parameters:
root (str) – Root directory of raw dataset to download if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. Default as None.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it. Default as None.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
- num_classes = 2#
- num_features = 54#
- url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/covtype.libsvm.binary.bz2'#
- source_file_name = 'covtype.libsvm.binary.bz2'#
- download()#
- generate()#
- _local_npy_existence()#
- _local_source_file_existence()#
- __getitem__(index)#
- __len__()#
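The source file listed above is in LIBSVM's sparse text format, where each line reads `label index:value index:value ...` with 1-based feature indices. The following is a minimal sketch of turning one such line into a dense vector of length num_features. The helper name `parse_libsvm_line` and the sample line are assumptions for illustration, and the sketch is not taken from the library's own generate() code.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str, num_features: int) -> Tuple[float, List[float]]:
    """Parse one 'label idx:val ...' line into (label, dense feature list)."""
    parts = line.split()
    label = float(parts[0])
    features = [0.0] * num_features  # features not listed on the line stay 0
    for item in parts[1:]:
        idx, value = item.split(":")
        features[int(idx) - 1] = float(value)  # LIBSVM indices start at 1
    return label, features


# Made-up example line; 54 matches the num_features attribute documented above.
label, features = parse_libsvm_line("1 1:0.5 3:2 54:1", num_features=54)
print(label, features[0], features[2], features[53])
```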
- class RCV1(root, train=True, train_ratio=0.75, transform=None, target_transform=None, download=False, generate=False, seed=None)#
Bases:
torch.utils.data.Dataset
RCV1 binary dataset from LIBSVM Data.
- Parameters:
root (str) – Root directory of raw dataset to download if download is set to True.
train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. Default as None.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it. Default as None.
download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
- num_classes = 2#
- num_features = 47236#
- url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/rcv1_train.binary.bz2'#
- source_file_name = 'rcv1_train.binary.bz2'#
- download()#
- generate()#
- _local_npy_existence()#
- _local_source_file_existence()#
- __getitem__(index)#
- __len__()#
- class PathologicalMNIST(root, path, num_clients=100, shards=200, download=True, preprocess=False)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
The partition strategy in FedAvg. See http://proceedings.mlr.press/v54/mcmahan17a?ref=https://githubhelp.com
- Parameters:
- preprocess(download=True)#
Define the dataset partition process
- get_dataset(id, type='train')#
Load subdataset for client with client ID cid from local file.
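The shard-based scheme from the FedAvg paper sorts samples by label, cuts the sorted order into equally sized shards, and gives each client the same number of shards, so each client sees only a few classes. A minimal sketch under stated assumptions: the function name `shard_partition`, the toy labels, and the choice to drop leftover samples are illustrative and not the library's implementation.

```python
import random
from typing import Dict, List


def shard_partition(
    labels: List[int], num_clients: int, num_shards: int, seed: int = 0
) -> Dict[int, List[int]]:
    """Assign label-sorted shards of sample indices to clients."""
    if num_shards % num_clients != 0:
        raise ValueError("num_shards must be a multiple of num_clients")
    rng = random.Random(seed)
    # Sort sample indices by label so each shard covers very few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_size = len(labels) // num_shards  # leftover samples are dropped
    shards = [
        order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)
    ]
    # Shuffle shard ids, then hand each client a contiguous run of them.
    shard_ids = list(range(num_shards))
    rng.shuffle(shard_ids)
    per_client = num_shards // num_clients
    return {
        cid: [
            i
            for s in shard_ids[cid * per_client:(cid + 1) * per_client]
            for i in shards[s]
        ]
        for cid in range(num_clients)
    }


# Toy setup: 200 samples, 10 classes, 20 shards, 10 clients (2 shards each).
toy_labels = [i % 10 for i in range(200)]
toy_parts = shard_partition(toy_labels, num_clients=10, num_shards=20)
print({cid: sorted({toy_labels[i] for i in idx}) for cid, idx in toy_parts.items()})
```

With the defaults in the signature above (num_clients=100, shards=200) each client would likewise receive two shards.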
- class RotatedMNIST(root, path, num)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
Rotate MNIST and partition them.
- Parameters:
- preprocess(thetas=[0, 90, 180, 270], download=True)#
Define the dataset partition process
- get_dataset(id, type='train')#
Get dataset class
- Parameters:
- Raises:
- get_data_loader(id, batch_size=None, type='train')#
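The default thetas are all multiples of 90 degrees, which can be applied to an image grid exactly, without interpolation. The sketch below builds one rotated copy of a toy image per angle. The helper names, the counter-clockwise direction, and the nested-list image representation are assumptions for illustration; this page does not state how the library rotates images or maps rotated copies to clients.

```python
from typing import Dict, List

Image = List[List[int]]


def rotate90_ccw(image: Image) -> Image:
    # Transpose, then reverse the row order: a 90-degree counter-clockwise turn.
    return [list(row) for row in zip(*image)][::-1]


def rotate(image: Image, theta: int) -> Image:
    """Rotate by a multiple of 90 degrees (counter-clockwise by assumption)."""
    if theta % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    result = [list(row) for row in image]
    for _ in range((theta // 90) % 4):
        result = rotate90_ccw(result)
    return result


toy_image = [[1, 2], [3, 4]]
thetas = [0, 90, 180, 270]
rotated: Dict[int, Image] = {theta: rotate(toy_image, theta) for theta in thetas}
for theta, img in rotated.items():
    print(theta, img)
```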
- class RotatedCIFAR10(root, save_dir, num_clients)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
Rotate CIFAR10 and partition them.
- Parameters:
- preprocess(shards, thetas=[0, 180])#
Define the dataset partition process
- Parameters:
shards (_type_) – _description_
thetas (list, optional) – _description_. Defaults to [0, 180].
- get_dataset(id, type='train')#
Get dataset class
- Parameters:
- Raises:
- get_data_loader(id, batch_size=None, type='train')#
- class PartitionedMNIST(root, path, num_clients, download=True, preprocess=False, partition='iid', dir_alpha=None, verbose=True, seed=None, transform=None, target_transform=None)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
FedDataset with partitioning preprocess. For detailed partitioning, please check Federated Dataset and DataPartitioner.
- Parameters:
root (str) – Path to download raw dataset.
path (str) – Path to save partitioned subdataset.
num_clients (int) – Number of clients.
download (bool) – Whether to download the raw dataset.
preprocess (bool) – Whether to preprocess the dataset.
partition (str, optional) – Partition name. Only supports "noniid-#label", "noniid-labeldir", "unbalance" and "iid" partition schemes.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-iid partition. Only works if partition="dirichlet". Default as None.
verbose (bool, optional) – Whether to print partition process. Default as True.
seed (int, optional) – Random seed. Default as None.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- preprocess(partition='iid', dir_alpha=None, verbose=True, seed=None, download=True, transform=None, target_transform=None)#
Perform FL partition on the dataset, and save each subset for each client into data{cid}.pkl file.
For details of partition schemes, please check Federated Dataset and DataPartitioner.
- get_dataset(cid, type='train')#
Load subdataset for client with client ID cid from local file.
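The preprocess/get_dataset pair amounts to a write-once, read-per-client pattern: partition the data, store one file per client, then load by client ID. Below is a rough sketch of that pattern for an IID split. It assumes plain pickle files holding index lists in a flat directory; the helper names are hypothetical, and the library may use a different serializer, directory layout, or stored object.

```python
import os
import pickle
import random
import tempfile
from typing import Dict, List


def iid_partition(num_samples: int, num_clients: int, seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle sample indices and deal them out evenly across clients."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    return {cid: order[cid::num_clients] for cid in range(num_clients)}


def save_partition(path: str, client_indices: Dict[int, List[int]]) -> None:
    """Write one data{cid}.pkl file per client under path."""
    os.makedirs(path, exist_ok=True)
    for cid, indices in client_indices.items():
        with open(os.path.join(path, f"data{cid}.pkl"), "wb") as f:
            pickle.dump(indices, f)


def load_partition(path: str, cid: int) -> List[int]:
    """Read back the subset stored for client cid."""
    with open(os.path.join(path, f"data{cid}.pkl"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    save_partition(tmp_dir, iid_partition(num_samples=10, num_clients=3))
    print(sorted(os.listdir(tmp_dir)))
    print(load_partition(tmp_dir, 0))
```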
- class PartitionedCIFAR10(root, path, dataname, num_clients, download=True, preprocess=False, balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None, transform=None, target_transform=None)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
FedDataset with partitioning preprocess. For detailed partitioning, please check Federated Dataset and DataPartitioner.
- Parameters:
root (str) – Path to download raw dataset.
path (str) – Path to save partitioned subdataset.
dataname (str) – "cifar10" or "cifar100".
num_clients (int) – Number of clients.
download (bool) – Whether to download the raw dataset.
preprocess (bool) – Whether to preprocess the dataset.
balance (bool, optional) – Balanced partition over all clients or not. Default as True.
partition (str, optional) – Partition type, only "iid", "shards", "dirichlet" are supported. Default as "iid".
unbalance_sgm (float, optional) – Log-normal distribution variance for unbalanced data partition over clients. Default as 0 for balanced partition.
num_shards (int, optional) – Number of shards in non-iid "shards" partition. Only works if partition="shards". Default as None.
dir_alpha (float, optional) – Dirichlet distribution parameter for non-iid partition. Only works if partition="dirichlet". Default as None.
verbose (bool, optional) – Whether to print partition process. Default as True.
seed (int, optional) – Random seed. Default as None.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- preprocess(balance=True, partition='iid', unbalance_sgm=0, num_shards=None, dir_alpha=None, verbose=True, seed=None, download=True)#
Perform FL partition on the dataset, and save each subset for each client into data{cid}.pkl file.
For details of partition schemes, please check Federated Dataset and DataPartitioner.
- get_dataset(cid, type='train')#
Load subdataset for client with client ID cid from local file.
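To give a feel for what a Dirichlet parameter such as dir_alpha controls, here is a sketch of one common label-skew scheme: for every class, client proportions are drawn from a symmetric Dirichlet distribution and that class's samples are split accordingly. Smaller values concentrate a class on fewer clients. The function name and toy labels are assumptions, and this is not necessarily the variant the library implements.

```python
import random
from typing import Dict, List


def dirichlet_label_partition(
    labels: List[int], num_clients: int, dir_alpha: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Split each class across clients with Dirichlet(dir_alpha) proportions."""
    rng = random.Random(seed)
    client_indices: Dict[int, List[int]] = {cid: [] for cid in range(num_clients)}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # A Dirichlet draw is a set of independent Gamma(alpha, 1) draws, normalised.
        gammas = [rng.gammavariate(dir_alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against underflow for tiny alpha
        # Turn cumulative proportions into cut points inside this class.
        bounds = [0]
        acc = 0.0
        for g in gammas[:-1]:
            acc += g / total
            bounds.append(min(len(idx), int(round(acc * len(idx)))))
        bounds.append(len(idx))
        for cid in range(num_clients):
            client_indices[cid].extend(idx[bounds[cid]:bounds[cid + 1]])
    return client_indices


# Toy labels: 3 classes with 30 samples each, split over 4 clients.
toy_labels = [i % 3 for i in range(90)]
toy_parts = dirichlet_label_partition(toy_labels, num_clients=4, dir_alpha=0.5)
for cid, idx in toy_parts.items():
    print(cid, [sum(1 for i in idx if toy_labels[i] == c) for c in range(3)])
```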
- class SyntheticDataset(root, path, preprocess=False)#
Bases:
fedlab.contrib.dataset.basic_dataset.FedDataset
- preprocess(root, path, partition=0.2)#
Preprocess the raw data to fedlab dataset format.
- get_dataset(id, type='train')#
Get dataset class
- Parameters:
- Raises:
- get_dataloader(id, batch_size, type='train')#
Get data loader