core#
Package Contents#
DistNetwork | Manage torch.distributed network.
NetworkManager | Abstract class for network managers.
- class DistNetwork(address, world_size, rank, ethernet=None, dist_backend='gloo')#
Bases: object
Manage torch.distributed network.
- Parameters
address (tuple) – Address of this server in the form (SERVER_ADDR, SERVER_IP).
world_size (int) – The size of this distributed group (including the server).
rank (int) – The rank of this process in the distributed group.
ethernet (str) – Name of the local network interface to use. Default: None.
dist_backend (str or torch.distributed.Backend) – Backend of torch.distributed. Valid values include mpi, gloo, and nccl. Default: "gloo".
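The constructor's parameter semantics can be illustrated with a hypothetical stand-in (FakeDistNetwork below is not part of the package, and no real torch.distributed group is created): every process in the group passes the same server address and world_size, but its own rank.

```python
# Hypothetical stand-in mirroring DistNetwork's constructor signature;
# it only records the arguments and checks their basic invariants.
class FakeDistNetwork:
    def __init__(self, address, world_size, rank, ethernet=None,
                 dist_backend="gloo"):
        assert dist_backend in ("mpi", "gloo", "nccl")
        assert 0 <= rank < world_size
        self.address = address        # (SERVER_ADDR, SERVER_IP) tuple
        self.world_size = world_size  # number of processes, server included
        self.rank = rank              # rank 0 is conventionally the server
        self.ethernet = ethernet
        self.dist_backend = dist_backend

# All processes share the same address and world_size; only rank differs.
server = FakeDistNetwork(("127.0.0.1", "3002"), world_size=3, rank=0)
client = FakeDistNetwork(("127.0.0.1", "3002"), world_size=3, rank=1)
print(server.rank, client.rank, client.dist_backend)  # 0 1 gloo
```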
- init_network_connection(self)#
Initialize torch.distributed communication group.
- close_network_connection(self)#
Destroy the current torch.distributed process group.
- send(self, content=None, message_code=None, dst=0, count=True)#
Send a tensor to the process with rank=dst.
- recv(self, src=None, count=True)#
Receive a tensor from the process with rank=src.
- __str__(self)#
Return str(self).
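The send()/recv() pair moves a payload together with a message code between ranks. A minimal in-process sketch of that pattern follows; FakeTransport and the "ParameterUpdate" code are illustrative stand-ins (a real DistNetwork moves tensors through torch.distributed, not an in-memory queue):

```python
import queue

# Each rank owns an inbox; send() addresses a (message_code, content)
# pair to dst's inbox, and recv() pops the caller's own inbox.
class FakeTransport:
    def __init__(self, world_size):
        self._inbox = [queue.Queue() for _ in range(world_size)]

    def send(self, content=None, message_code=None, dst=0):
        self._inbox[dst].put((message_code, content))

    def recv(self, rank):
        # A real recv() would block on a torch.distributed receive.
        return self._inbox[rank].get()

net = FakeTransport(world_size=2)
net.send(content=[1.0, 2.0], message_code="ParameterUpdate", dst=0)
code, payload = net.recv(rank=0)
print(code, payload)  # ParameterUpdate [1.0, 2.0]
```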
- class NetworkManager(network)#
Bases:
torch.multiprocessing.Process
Abstract class for network managers.
- Parameters
network (DistNetwork) – object to manage torch.distributed network communication.
- run(self)#
Main process flow:
1. Initialization stage.
2. FL communication stage.
3. Shutdown stage: close network connection.
- setup(self, *args, **kwargs)#
Initialize the network connection and perform any necessary setup.
self._network.init_network_connection() must be called first. Override this method to implement the system setup message communication procedure.
- abstract main_loop(self, *args, **kwargs)#
Define the actions of communication stage.
- shutdown(self, *args, **kwargs)#
Shutdown stage.
Close the network connection at the end.
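The setup → main_loop → shutdown lifecycle driven by run() can be sketched without torch.distributed at all; EchoManager and FakeNetwork below are illustrative stand-ins, not package classes:

```python
# Stand-in network that only logs the lifecycle calls it receives.
class FakeNetwork:
    def __init__(self):
        self.log = []

    def init_network_connection(self):
        self.log.append("init")

    def close_network_connection(self):
        self.log.append("close")

# Mirrors NetworkManager's template: run() drives the three stages.
class EchoManager:
    def __init__(self, network):
        self._network = network

    def run(self):
        self.setup()      # initialization stage
        self.main_loop()  # FL communication stage
        self.shutdown()   # shutdown stage

    def setup(self):
        # init_network_connection() is required to be called first.
        self._network.init_network_connection()

    def main_loop(self):
        self._network.log.append("communicate")

    def shutdown(self):
        self._network.close_network_connection()

net = FakeNetwork()
EchoManager(net).run()
print(net.log)  # ['init', 'communicate', 'close']
```

In the real class, run() is inherited from torch.multiprocessing.Process, so the same three stages execute when the manager process is started.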