sent140#
Module Contents#
- BASE_DIR#
- class Sent140Dataset(client_id: int, client_str: str, data: list, targets: list, is_to_tokens: bool = True, tokenizer: fedlab.contrib.dataset.utils.Tokenizer = None)#
Bases: torch.utils.data.Dataset
- _process_data_target()#
Process the client's data and targets.
- _data2token()#
Tokenize each sentence in data using the tokenizer.
- encode(vocab: fedlab.contrib.dataset.utils.Vocab, fix_len: int)#
Transform token data into an indices sequence using Vocab.
- Parameters:
  - vocab (fedlab.contrib.dataset.utils.Vocab) – vocab for data_token
  - fix_len (int) – max length of a sentence
- Returns:
  a list of integer lists for data_token, and a list of tensor targets
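A minimal usage sketch: the sample sentences, labels, and the pre-built vocab below are illustrative placeholders, not FedLab's data pipeline; only the Sent140Dataset constructor and encode call come from this API.

```python
# `vocab` is assumed to be a fedlab.contrib.dataset.utils.Vocab built
# elsewhere from the training corpus; its construction is not shown here.
dataset = Sent140Dataset(
    client_id=0,
    client_str="user_0",
    data=["i love this movie", "worst film ever"],  # placeholder sentences
    targets=[1, 0],                                 # placeholder labels
)                                    # is_to_tokens=True tokenizes on construction
dataset.encode(vocab, fix_len=300)   # every sentence becomes 300 indices
```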
- __encode_tokens(tokens, pad_idx) → torch.Tensor#
Encode token data into a fix_len-length index list over self.vocab. If a sentence is shorter than fix_len, it is padded with the pad word up to fix_len; if it is longer, it is truncated to its first fix_len words.
- Parameters:
  - tokens (list[str]) – data after tokenization
  - pad_idx (int) – index of the padding word in self.vocab
- Returns:
  a tensor of vocabulary indices with length fix_len for the tokens input
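Since __encode_tokens is private, the pad/truncate rule it applies can be illustrated standalone. This is a sketch of the behavior described above, not FedLab's implementation; word2idx stands in for the vocab's token-to-index mapping.

```python
import torch

def encode_tokens_sketch(tokens, word2idx, pad_idx, fix_len):
    """Sketch of the fixed-length encoding rule described above.

    `word2idx` is a plain dict standing in for self.vocab; unknown
    tokens fall back to `pad_idx` here purely for brevity.
    """
    indices = [word2idx.get(tok, pad_idx) for tok in tokens[:fix_len]]  # truncate
    indices += [pad_idx] * (fix_len - len(indices))                     # pad
    return torch.tensor(indices)
```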
- __len__()#
- __getitem__(item)#
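Because Sent140Dataset inherits from torch.utils.data.Dataset and implements __len__ and __getitem__, it plugs directly into a DataLoader. A minimal sketch, assuming dataset was constructed and encoded as in the example above:

```python
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch_data, batch_targets in loader:
    # batch_data: (32, fix_len) tensor of vocab indices
    # batch_targets: (32,) tensor of sentiment labels
    ...  # train on one client's sent140 batch
```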