core#

Package Contents#

DistNetwork

Manage the torch.distributed network.

NetworkManager

Abstract base class for network managers.

class DistNetwork(address, world_size, rank, ethernet=None, dist_backend='gloo')#

Bases: object

Manage the torch.distributed network.

Parameters
  • address (tuple) – Address of the rank-0 server in the form (SERVER_ADDR, SERVER_PORT).

  • world_size (int) – The total size of the distributed group, including the server.

  • rank (int) – The rank of this process within the distributed group.

  • ethernet (str) – Name of the network interface to bind for communication (e.g. "eth0"). Default: None, which lets the backend choose the interface.

  • dist_backend (str or torch.distributed.Backend) – backend of torch.distributed. Valid values include mpi, gloo, and nccl. Default: "gloo".

init_network_connection(self)#

Initialize the torch.distributed communication group.

close_network_connection(self)#

Destroy the current torch.distributed process group.

send(self, content=None, message_code=None, dst=0)#

Send a tensor to the process with rank dst.

recv(self, src=None)#

Receive a tensor from the process with rank src.

__str__(self)#

Return str(self).
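For illustration, the sketch below shows how these constructor arguments typically map onto torch.distributed initialization: the address tuple becomes a tcp:// init method, and ethernet, when given, is exported as GLOO_SOCKET_IFNAME. TinyDistNetwork is an illustrative stand-in, not the real class, and the real init_network_connection() would call torch.distributed.init_process_group (shown here as a comment so the sketch runs without a process group):

```python
import os

class TinyDistNetwork:
    """Illustrative stand-in for DistNetwork (not the real class)."""

    def __init__(self, address, world_size, rank, ethernet=None, dist_backend="gloo"):
        self.address = address          # (SERVER_ADDR, SERVER_PORT) of the rank-0 server
        self.world_size = world_size
        self.rank = rank
        self.ethernet = ethernet
        self.dist_backend = dist_backend

    def init_method(self):
        # torch.distributed accepts a TCP URL built from (address, port)
        addr, port = self.address
        return "tcp://{}:{}".format(addr, port)

    def init_network_connection(self):
        if self.ethernet is not None:
            # the gloo backend binds to a specific interface via this variable
            os.environ["GLOO_SOCKET_IFNAME"] = self.ethernet
        # A real implementation would now call:
        # torch.distributed.init_process_group(
        #     backend=self.dist_backend,
        #     init_method=self.init_method(),
        #     rank=self.rank,
        #     world_size=self.world_size)
        return self.init_method()

server = TinyDistNetwork(("127.0.0.1", "3002"), world_size=3, rank=0)
url = server.init_network_connection()

client = TinyDistNetwork(("10.0.0.1", "29500"), world_size=2, rank=1, ethernet="eth0")
url2 = client.init_network_connection()
```

All ranks must pass the same address so every process rendezvouses at the rank-0 server.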

class NetworkManager(network)#

Bases: torch.multiprocessing.Process

Abstract base class for network managers.

Parameters

network (DistNetwork) – object to manage torch.distributed network communication.

run(self)#

Main Process:

  1. Initialization stage.

  2. FL communication stage.

  3. Shutdown stage. Close network connection.
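The three stages follow a template-method pattern: run() fixes the ordering, and subclasses fill in the stages. The names below are an illustrative sketch, not FedLab's code:

```python
class TinyManager:
    """Sketch of the run() template: setup -> main_loop -> shutdown."""

    def __init__(self):
        self.stages = []  # records the order in which the stages execute

    def run(self):
        self.setup()      # 1. initialization stage
        self.main_loop()  # 2. FL communication stage
        self.shutdown()   # 3. shutdown stage: close the network connection

    def setup(self):
        self.stages.append("setup")

    def main_loop(self):
        self.stages.append("main_loop")

    def shutdown(self):
        self.stages.append("shutdown")

manager = TinyManager()
manager.run()
```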

setup(self, *args, **kwargs)#

Initialize the network connection and perform any necessary setup.

self._network.init_network_connection() must be called first.

Override this method to implement the system setup message exchange.

abstract main_loop(self, *args, **kwargs)#

Define the actions of the communication stage.

shutdown(self, *args, **kwargs)#

Shutdown stage.

Close the network connection at the end.
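Putting the pieces together, a concrete subclass only needs to implement main_loop (and may extend setup and shutdown). The sketch below is hypothetical: it substitutes an in-memory FakeNetwork for DistNetwork so it runs without torch, and the "exit" message code used as a stop signal is an assumption for illustration:

```python
from queue import Queue

class FakeNetwork:
    """In-memory stand-in for DistNetwork, used only for illustration."""

    def __init__(self, inbox):
        self.inbox = inbox      # messages this process will "receive"
        self.sent = []          # messages this process has "sent"
        self.connected = False

    def init_network_connection(self):
        self.connected = True

    def close_network_connection(self):
        self.connected = False

    def send(self, content=None, message_code=None, dst=0):
        self.sent.append((content, message_code, dst))

    def recv(self, src=None):
        return self.inbox.get()

class EchoManager:
    """Minimal manager: echo each payload back until an 'exit' code arrives."""

    def __init__(self, network):
        self._network = network

    def run(self):
        self.setup()
        self.main_loop()
        self.shutdown()

    def setup(self):
        # the network connection must be initialized first
        self._network.init_network_connection()

    def main_loop(self):
        while True:
            sender, message_code, content = self._network.recv()
            if message_code == "exit":  # hypothetical stop signal
                break
            self._network.send(content=content, message_code=message_code, dst=sender)

    def shutdown(self):
        self._network.close_network_connection()

inbox = Queue()
inbox.put((1, "update", [0.5]))
inbox.put((1, "exit", None))
fake_net = FakeNetwork(inbox)
EchoManager(fake_net).run()
```

A real subclass would hold a DistNetwork instead of FakeNetwork and inherit from NetworkManager so that torch.multiprocessing.Process.start() invokes run() in a child process.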