core#

Package Contents#

DistNetwork

Manage torch.distributed network.

NetworkManager

Abstract class.

class DistNetwork(address: tuple, world_size: int, rank: int, ethernet: str = None, dist_backend: str = 'gloo')#

Bases: object

Manage torch.distributed network.

Parameters
  • address (tuple) – Address of the server in the form (SERVER_ADDR, SERVER_PORT).

  • world_size (int) – The size of the distributed group, including the server.

  • rank (int) – The rank of this process in the distributed group.

  • ethernet (str) – The name of the local Ethernet interface. Users can check it with the ifconfig command.

  • dist_backend (str or torch.distributed.Backend) – Backend of torch.distributed. Valid values include mpi, gloo, and nccl. Default: gloo.
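As an illustration of how these parameters fit together, the sketch below builds DistNetwork-style constructor arguments for a server plus N clients. The helper name network_args and the convention that the server takes rank 0 (clients take ranks 1..N) are assumptions for this example, not part of the FedLab API; it is stdlib-only Python.

```python
# Hypothetical helper (not FedLab code): assemble DistNetwork-style
# arguments for one process in a server + n_clients setup.
def network_args(server_ip, server_port, n_clients, client_id=None):
    """client_id=None means "this process is the server"; otherwise
    client_id is 0-based and maps to rank client_id + 1, assuming the
    server conventionally takes rank 0."""
    rank = 0 if client_id is None else client_id + 1
    return {
        "address": (server_ip, server_port),
        "world_size": n_clients + 1,  # all clients plus the server
        "rank": rank,
    }

# Server process in a 2-client federation:
print(network_args("127.0.0.1", 3002, n_clients=2))
# Second client (client_id=1) in the same federation:
print(network_args("127.0.0.1", 3002, n_clients=2, client_id=1))
```

Each process in the group would then pass its own dictionary to the DistNetwork constructor, with every process agreeing on address and world_size.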

init_network_connection()#

Initialize the torch.distributed communication group.

close_network_connection()#

Destroy the current torch.distributed process group.

send(content=None, message_code=None, dst=0, count=True)#

Send a tensor to the process with rank=dst.

recv(src=None, count=True)#

Receive a tensor from the process with rank=src.
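The send/recv pair exchanges a message code alongside the tensor content. The stdlib sketch below is a stand-in for that pattern, modeling a message as a (src_rank, message_code, content) triple passed through an in-process queue instead of torch.distributed; the send/recv functions and the integer message code are illustrative, not FedLab's implementation.

```python
import queue

# In-process stand-in for the send/recv message pattern: a message
# bundles the sender's rank, a message code, and the content payload.
channel = queue.Queue()

def send(content, message_code, src, dst_channel):
    # Package and deliver one message to the destination's channel.
    dst_channel.put((src, message_code, content))

def recv(src_channel):
    # Block until a message arrives, then unpack it.
    src, message_code, content = src_channel.get()
    return src, message_code, content

send(content=[1.0, 2.0], message_code=1, src=0, dst_channel=channel)
print(recv(channel))  # → (0, 1, [1.0, 2.0])
```

In a real deployment the channel is the torch.distributed process group, and dst/src are ranks within that group.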

broadcast_send(content=None, message_code=None, dst=None, count=True)#
broadcast_recv(src=None, count=True)#
__str__()#

Return str(self).

class NetworkManager(network: fedlab.core.network.DistNetwork)#

Bases: torch.multiprocessing.Process

Abstract class.

Parameters

network (DistNetwork) – Object that manages torch.distributed network communication.

run()#

Main Process:

  1. Initialization stage.

  2. FL communication stage.

  3. Shutdown stage. Close network connection.
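The three stages above form a fixed template: run() always calls setup, then the communication loop, then shutdown. A minimal stdlib sketch of that template (not the FedLab implementation, which subclasses torch.multiprocessing.Process) is:

```python
# Minimal sketch of the three-stage run() template; the class name and
# the trace attribute are illustrative, not part of FedLab.
class ManagerSketch:
    def __init__(self):
        self.trace = []  # records the order stages ran in

    def setup(self):
        # Stage 1: initialization (FedLab initializes the network
        # connection here via self._network.init_network_connection()).
        self.trace.append("setup")

    def main_loop(self):
        # Stage 2: FL communication; abstract in NetworkManager,
        # defined by subclasses.
        self.trace.append("main_loop")

    def shutdown(self):
        # Stage 3: close the network connection.
        self.trace.append("shutdown")

    def run(self):
        self.setup()
        self.main_loop()
        self.shutdown()

m = ManagerSketch()
m.run()
print(m.trace)  # → ['setup', 'main_loop', 'shutdown']
```

Subclasses customize behavior by overriding setup() and main_loop() while leaving the run() skeleton untouched, a standard template-method design.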

setup()#

Initialize network connection and necessary setups.

self._network.init_network_connection() must be called first.

Override this method to implement the system setup message communication procedure.

abstract main_loop()#

Define the actions of the communication stage.

shutdown()#

Shutdown stage.

Close the network connection at the end.