core
Package Contents
DistNetwork | Manage torch.distributed network.
NetworkManager | Abstract class.
- class DistNetwork(address, world_size, rank, ethernet=None, dist_backend='gloo')
Bases: object
Manage torch.distributed network.
- Parameters
address (tuple) – Address of this server in form of (SERVER_ADDR, SERVER_IP).
world_size (int) – the size of this distributed group (including server).
rank (int) – the rank of process in distributed group.
ethernet (str) –
dist_backend (str or torch.distributed.Backend) – backend of torch.distributed. Valid values include mpi, gloo, and nccl. Default: "gloo".
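As a rough illustration of how these parameters fit together, the sketch below maps them onto the keyword arguments accepted by torch.distributed.init_process_group (backend, init_method, world_size, rank). It assumes that address is a (host, port) pair and that it is turned into a tcp:// init method; the helper name build_init_kwargs is hypothetical, and the class may handle this differently internally.

```python
VALID_BACKENDS = {"mpi", "gloo", "nccl"}


def build_init_kwargs(address, world_size, rank, dist_backend="gloo"):
    """Hypothetical helper: translate DistNetwork-style arguments into
    keyword arguments for torch.distributed.init_process_group."""
    host, port = address  # assumption: address is a (host, port) pair
    backend = str(dist_backend).lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported backend: {dist_backend!r}")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return {
        "backend": backend,
        "init_method": f"tcp://{host}:{port}",
        "world_size": world_size,
        "rank": rank,
    }


# One server (rank 0) plus two clients gives world_size=3.
kwargs = build_init_kwargs(("127.0.0.1", 3002), world_size=3, rank=0)
print(kwargs["init_method"])
```

Every process in the group would use the same address and world_size and differ only in rank.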
- init_network_connection(self)
Initialize torch.distributed communication group.
- close_network_connection(self)
Destroy current torch.distributed process group.
- send(self, content=None, message_code=None, dst=0, count=True)
Send tensor to process rank=dst.
- recv(self, src=None, count=True)
Receive tensor from process rank=src.
- __str__(self)
Return str(self).
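The rank-addressed send and recv methods above can be pictured with a small in-memory stand-in. This is only a sketch of the addressing idea, not the class's implementation: the real methods exchange tensors over torch.distributed between separate processes, whereas InMemoryNetwork is a hypothetical single-process mailbox that takes the sender and receiver ranks as explicit arguments. The (sender rank, message code, content) tuple returned by recv and the treatment of src=None as "any sender" are assumptions made for illustration.

```python
from collections import deque


class InMemoryNetwork:
    """Hypothetical stand-in: rank-addressed mailboxes in a single process."""

    def __init__(self, world_size):
        self.mailboxes = {rank: deque() for rank in range(world_size)}

    def send(self, src, content=None, message_code=None, dst=0):
        # Deliver (sender rank, message code, payload) to the destination rank.
        self.mailboxes[dst].append((src, message_code, content))

    def recv(self, rank, src=None):
        # Return the oldest message for `rank`, optionally only from `src`.
        box = self.mailboxes[rank]
        for i, message in enumerate(box):
            if src is None or message[0] == src:
                del box[i]
                return message
        raise LookupError("no matching message")


net = InMemoryNetwork(world_size=3)
net.send(src=1, content=[0.1, 0.2], message_code="update", dst=0)
net.send(src=2, content=[0.3, 0.4], message_code="update", dst=0)

from_two = net.recv(rank=0, src=2)  # only accept a message from rank 2
from_any = net.recv(rank=0)         # src=None accepts any sender
```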
- class NetworkManager(network)
Bases: torch.multiprocessing.Process
Abstract class.
- Parameters
network (DistNetwork) – object to manage torch.distributed network communication.
- run(self)
Main Process:
1. Initialization stage.
2. FL communication stage.
3. Shutdown stage. Close network connection.
- setup(self, *args, **kwargs)
Initialize network connection and necessary setups.
self._network.init_network_connection() must be called first.
Override this method to implement the message exchange required for system setup.
- abstract main_loop(self, *args, **kwargs)
Define the actions of the communication stage.
- shutdown(self, *args, **kwargs)
Shutdown stage.
Close the network connection at the end.
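The three-stage lifecycle described for NetworkManager (setup, main_loop, shutdown, driven by run) can be sketched as a template-method pattern. This is a minimal plain-Python sketch under stated assumptions: SketchManager stands in for NetworkManager without inheriting from torch.multiprocessing.Process, RecordingNetwork is a hypothetical substitute for DistNetwork that only logs calls, and EchoManager is an invented subclass.

```python
from abc import ABC, abstractmethod


class RecordingNetwork:
    """Hypothetical stand-in for DistNetwork that only records calls."""

    def __init__(self):
        self.calls = []

    def init_network_connection(self):
        self.calls.append("init")

    def close_network_connection(self):
        self.calls.append("close")


class SketchManager(ABC):
    """Plain-Python stand-in for NetworkManager (no torch.multiprocessing)."""

    def __init__(self, network):
        self._network = network

    def run(self):
        self.setup()      # 1. initialization stage
        self.main_loop()  # 2. communication stage
        self.shutdown()   # 3. shutdown stage

    def setup(self):
        self._network.init_network_connection()

    @abstractmethod
    def main_loop(self):
        """Subclasses define the communication stage here."""

    def shutdown(self):
        self._network.close_network_connection()


class EchoManager(SketchManager):
    def setup(self):
        super().setup()  # open the connection first, then do custom setup
        self._network.calls.append("custom setup")

    def main_loop(self):
        self._network.calls.append("main_loop")


network = RecordingNetwork()
EchoManager(network).run()
print(network.calls)  # ['init', 'custom setup', 'main_loop', 'close']
```

Calling super().setup() before any custom setup work keeps the connection open by the time setup messages are exchanged, which matches the ordering requirement stated for setup.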