Dev Docs¶
Design¶
Terminologies¶
slots are an abstraction for computational resources on each worker node.
job encapsulates the high-level goal to be accomplished. For example, a map-reduce operation on a cluster is a job.
task signifies a component that makes up a job. For example, map and reduce operations are tasks.
client is a user or a process that submits a job to the Master.
Assumptions¶
It is assumed that there can be only two types of tasks:
map
reduce
Faults are assumed to never occur.
All tasks utilize exactly one slot.
Clocks are synchronized across nodes in the cluster (this assumption can be dropped by using Lamport timestamps).
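The Lamport timestamps mentioned in the last assumption can be sketched as follows. This is a minimal illustration and not part of the project: the LamportClock class and its method names are hypothetical.

```python
import threading


class LamportClock:
    """Logical clock that orders events without synchronized wall clocks."""

    def __init__(self):
        self._time = 0
        self._lock = threading.Lock()

    def tick(self):
        """Advance the clock for a local event or just before sending a message."""
        with self._lock:
            self._time += 1
            return self._time

    def receive(self, remote_time):
        """Merge the timestamp carried by an incoming message."""
        with self._lock:
            self._time = max(self._time, remote_time) + 1
            return self._time


master_clock = LamportClock()
worker_clock = LamportClock()

sent_at = master_clock.tick()                # Master stamps an outgoing task
received_at = worker_clock.receive(sent_at)  # Worker merges the stamp on receipt
```

Because the receiver always moves past the sender's timestamp, a message's receipt is ordered after its send even when the two machines' wall clocks disagree.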
Master¶
The Master class implements the logic for the master node of the cluster.
At a high level, the Master class does the following:
Listens for job requests from clients on port 5000.
Keeps track of each worker node and listens for updates from them on port 5001.
Implements a Scheduler class that is responsible for scheduling tasks on workers according to the scheduling policy set in the Master.
A config file (json) is provided to the Master. It holds information about each worker: the number of slots, the worker_id, and the port on which the worker listens for task submissions from the Master.
An example config file is as follows:
{
"workers": [
{
"worker_id": "0",
"slots": 5,
"port": 4000
},
{
"worker_id": "1",
"slots": 7,
"port": 4001
},
{
"worker_id": "2",
"slots": 3,
"port": 4002
}
]
}
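A minimal sketch of how such a config could be read into a per-worker table is shown below. The load_workers function and the free_slots field are assumptions made for illustration; the actual Master may store this information differently.

```python
import json


def load_workers(config_text):
    """Parse the config and index workers by worker_id (hypothetical helper)."""
    config = json.loads(config_text)
    workers = {}
    for entry in config["workers"]:
        workers[entry["worker_id"]] = {
            "port": entry["port"],
            "slots": entry["slots"],
            # Assumed bookkeeping field: every slot starts out free.
            "free_slots": entry["slots"],
        }
    return workers


EXAMPLE_CONFIG = """
{
  "workers": [
    {"worker_id": "0", "slots": 5, "port": 4000},
    {"worker_id": "1", "slots": 7, "port": 4001},
    {"worker_id": "2", "slots": 3, "port": 4002}
  ]
}
"""

workers = load_workers(EXAMPLE_CONFIG)
total_slots = sum(worker["slots"] for worker in workers.values())
```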
Implementation¶
For all inter-thread communication, the Python Queue class is used. It was chosen because Queue is thread-safe by default.
For handling multiple concurrent connections:
Every time there is a new incoming connection on a socket, the connection is accepted and a Listener object maintains it in a separate thread.
In case a future implementation change requires the Master to send acknowledgements to the client, an Event is registered on the thread maintaining the connection.
A mapping from the client address to the corresponding Listener object is maintained to make this feature easier to implement.
To handle job requests coming in from clients and updates coming in from workers, the following is done:
Two separate queues are maintained (one for job requests and the other for updates).
The queue for job requests is passed to each job-type Listener, and the update queue is passed to each update-type Listener.
Doing this allows the queues to be shared among multiple threads (safely, since Queue is thread-safe).
Every time data is received in a Listener thread, the data, along with the address (ip and port) of the machine sending it, is put into the queue.
By default, the get() method of the Queue class blocks if the queue is empty.
A separate thread is spawned which polls each queue (blocking if it is empty) and extracts items from it.
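The listener and queue arrangement described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the Listener constructor signature, the drain helper, the addresses, and the message payloads are all hypothetical, socket.socketpair() stands in for accepted connections, and one consumer thread per queue is used, which is only one reading of the text.

```python
import queue
import socket
import threading


class Listener(threading.Thread):
    """Maintains one connection and forwards received data to a shared queue."""

    def __init__(self, conn, address, out_queue):
        super().__init__(daemon=True)
        self.conn = conn
        self.address = address
        self.out_queue = out_queue

    def run(self):
        with self.conn:
            while True:
                data = self.conn.recv(4096)
                if not data:  # peer closed the connection
                    break
                self.out_queue.put((self.address, data))


def drain(in_queue, handled, expected):
    """Consumer thread body: get() blocks while the queue is empty."""
    for _ in range(expected):
        handled.append(in_queue.get())


job_queue = queue.Queue()     # filled by job-type listeners
update_queue = queue.Queue()  # filled by update-type listeners

# socketpair() simulates one accepted client and one accepted worker connection.
client_end, master_job_end = socket.socketpair()
worker_end, master_update_end = socket.socketpair()

jobs, updates = [], []
threads = [
    Listener(master_job_end, ("127.0.0.1", 50001), job_queue),
    Listener(master_update_end, ("127.0.0.1", 4000), update_queue),
    threading.Thread(target=drain, args=(job_queue, jobs, 1)),
    threading.Thread(target=drain, args=(update_queue, updates, 1)),
]
for thread in threads:
    thread.start()

client_end.sendall(b'{"job_id": "0"}')      # hypothetical job request
worker_end.sendall(b'{"task_id": "0_M0"}')  # hypothetical task update
client_end.close()
worker_end.close()

for thread in threads:
    thread.join()
```

Keeping the sender's address next to the data lets the consumer tell which client or worker each item came from without touching the socket itself.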
Scheduling-related actions are broadly divided into two categories:
When a new job request arrives at the Master.
When a Worker sends an update on a task.
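The two categories can be pictured as two entry points into the scheduler, as in the sketch below. Everything in it is an assumption made for illustration: the SlotScheduler class, its method names, and the "most free slots first" policy are hypothetical, and dependencies between map and reduce tasks are ignored.

```python
from collections import deque


class SlotScheduler:
    """Hypothetical slot bookkeeping driven by the two kinds of events."""

    def __init__(self, slots_per_worker):
        self.free_slots = dict(slots_per_worker)
        self.pending = deque()
        self.running = {}  # task_id -> worker_id

    def _dispatch(self):
        """Place pending tasks while some worker still has a free slot."""
        placed = []
        while self.pending:
            # Assumed policy: pick the worker with the most free slots.
            worker_id = max(self.free_slots, key=self.free_slots.get)
            if self.free_slots[worker_id] == 0:
                break
            task_id = self.pending.popleft()
            self.free_slots[worker_id] -= 1  # each task uses exactly one slot
            self.running[task_id] = worker_id
            placed.append((task_id, worker_id))
        return placed

    def on_job_request(self, task_ids):
        """Category 1: a new job arrives; queue its tasks and try to place them."""
        self.pending.extend(task_ids)
        return self._dispatch()

    def on_task_update(self, task_id):
        """Category 2: a worker reports completion; free the slot and reschedule."""
        worker_id = self.running.pop(task_id)
        self.free_slots[worker_id] += 1
        return self._dispatch()


scheduler = SlotScheduler({"0": 1, "1": 2})
first = scheduler.on_job_request(["m0", "m1", "m2", "r0"])
second = scheduler.on_task_update("m0")
```

In this run the cluster has three slots, so the fourth task waits until the completion update for m0 frees a slot.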
For the architecture, please see Master.
Worker¶
The Worker class implements the logic for a worker node in the cluster.
At a high level, the Worker class does the following:
Listens for task assignments from the Master on a port specified as a command line argument (PORT).
After simulating task execution, sends an update to the Master signifying that task execution completed successfully.
Implementation¶
Maintain a queue of completed tasks, which is polled so that completion updates can be sent to the Master.
Spawn threads to implement the functionality described above:
Spawn a thread to listen for incoming tasks from the Master.
Spawn a thread to simulate execution of tasks on the Worker.
Spawn a thread to poll the completed queue. Items taken out of this queue become part of the completion update sent to the Master.
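The three threads can be sketched as a small pipeline connected by queues. This is a simplified illustration under assumptions: the function names, the STOP sentinel, and the task fields (task_id, duration, status) are hypothetical, the socket listener is replaced by a list of tasks, and tasks run one at a time rather than one per slot.

```python
import queue
import threading
import time

STOP = None  # sentinel used to shut the pipeline down


def listen(incoming, task_queue):
    """Stand-in for the socket listener: hands each received task to the executor."""
    for task in incoming:
        task_queue.put(task)
    task_queue.put(STOP)


def execute(task_queue, completed_queue):
    """Simulate execution by sleeping, then mark the task as completed."""
    while True:
        task = task_queue.get()
        if task is STOP:
            completed_queue.put(STOP)
            break
        time.sleep(task["duration"])
        completed_queue.put(task["task_id"])


def report(completed_queue, sent_updates):
    """Poll the completed queue and build the updates that would go to the Master."""
    while True:
        task_id = completed_queue.get()
        if task_id is STOP:
            break
        sent_updates.append({"task_id": task_id, "status": "completed"})


task_queue = queue.Queue()
completed_queue = queue.Queue()
sent_updates = []
incoming = [
    {"task_id": "0_M0", "duration": 0.01},
    {"task_id": "0_R0", "duration": 0.01},
]

threads = [
    threading.Thread(target=listen, args=(incoming, task_queue)),
    threading.Thread(target=execute, args=(task_queue, completed_queue)),
    threading.Thread(target=report, args=(completed_queue, sent_updates)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```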
For the architecture, please see Worker.