virtmat.middleware.engine package
Submodules
virtmat.middleware.engine.wfengine module
A simple workflow engine
- class virtmat.middleware.engine.wfengine.WFEngine(launchpad, qadapter=None, wf_query=None, name=None, launchdir=None, unique_launchdir=True, sleep_time=30)
Bases: FWSerializable
A simple engine to manage workflows
- add_node(func, inputs, outputs=None, name=None, kwargs=None, category=None, fworker=None, qadapter=None)
Add a python function node to an existing workflow
- Parameters:
func (str) – a function name with an optional module name in the format ‘module.function’
inputs ([tuple]) – a list of positional arguments for the provided function. Every input is described by a tuple (fw_id, name, value) with the following elements:
    fw_id (int): the fw_id of a parent node providing the input; if the input is provided as a constant value, then None should be specified.
    name (str): the name of the input as provided in the list of outputs of the parent node.
    value: the value of the input; if output data from a parent node is used as input, then this should be set to None.
outputs ([str]) – names of the outputs
name (str, None) – name of the node
kwargs (dict, None) – a dictionary of keyword arguments for func
category (str, None) – job category, either ‘batch’ or ‘interactive’
fworker (FWorker, None) – fworker for executing the batch jobs
qadapter (CommonAdapter, None) – qadapter for submitting batch jobs
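The (fw_id, name, value) convention for inputs is easy to get wrong, so a minimal sketch may help. It only builds and checks the inputs list and does not touch a launchpad. The helper check_inputs, the fw_id 12, the names "energy" and "scale", and the function name 'mymodule.rescale' are all made up for illustration and are not part of WFEngine.

```python
from typing import Any, List, Optional, Tuple

# One input descriptor: (fw_id, name, value), as described for add_node.
NodeInput = Tuple[Optional[int], str, Any]


def check_inputs(inputs: List[NodeInput]) -> None:
    """Hypothetical helper (not part of WFEngine): enforce the documented rule
    that value is None whenever the input comes from a parent node."""
    for fw_id, name, value in inputs:
        if fw_id is not None and value is not None:
            raise ValueError(
                f"input {name!r}: value must be None when a parent fw_id is given"
            )


inputs: List[NodeInput] = [
    (12, "energy", None),   # output "energy" of parent node 12 (made-up fw_id)
    (None, "scale", 2.5),   # constant value, therefore no parent fw_id
]
check_inputs(inputs)

# With an existing engine object `wfe`, the call would then look roughly like:
# wfe.add_node("mymodule.rescale", inputs, outputs=["scaled_energy"],
#              name="rescale", category="interactive")
```

The commented call uses only parameters listed above; whether a name is required for constant inputs is an assumption of this sketch.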
- add_workflow(workflow=None, fw_id=None)
Add a workflow to the engine. Either a workflow object or a fw_id must be defined.
- Parameters:
workflow (Workflow, None) – a workflow object
fw_id (int, None) – a fw_id of a workflow existing on the launchpad
- append_wf_id(wf_id)
append a workflow id (wf_id) to the list of wf_ids
- cancel_job(fw_id, restart=False, pause=False)
Cancel the execution of a node in RESERVED or RUNNING state. Either restart or pause can be set to True if required.
- Parameters:
fw_id (int) – the fw_id of the node to cancel
restart (bool) – rerun after cancelling a RUNNING node
pause (bool) – pause after cancelling a RUNNING node
- exec_cancel(res_id)
Execute the slurm cancel command
- classmethod from_dict(*args, **kwargs)
- property fw_ids
get the current firework ids of the engine
- get_failed()
Get failed job ids
- Returns:
a list of fw_ids of failed jobs
- Return type:
([int])
- get_lost_jobs(time=14400)
Detect nodes that have been launched but not updated within the specified time. The state of such nodes is set to FIZZLED.
- Parameters:
time (int) – minimum time in seconds since the most recent update
- Returns:
a list of fw_ids of the lost runs
- Return type:
lost_fw_ids ([int])
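The time criterion can be illustrated with a small standalone sketch. It is not the engine's implementation: the function find_lost and the mapping of fw_ids to last-update timestamps are assumptions made for the example, and the sketch only selects candidates without setting any state to FIZZLED. Whether the real comparison is strict is likewise an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_lost(last_updates: Dict[int, datetime], time: int = 14400,
              now: Optional[datetime] = None) -> List[int]:
    """Return the fw_ids whose last update is older than `time` seconds."""
    now = now or datetime.now()
    limit = timedelta(seconds=time)
    return sorted(fw_id for fw_id, updated in last_updates.items()
                  if now - updated > limit)


# Made-up launch records: fw_id -> time of the most recent update
now = datetime(2024, 1, 1, 12, 0, 0)
last_updates = {
    1: now - timedelta(hours=5),
    2: now - timedelta(hours=1),
    4: now - timedelta(hours=6),
}
lost = find_lost(last_updates, now=now)
print(lost)  # [1, 4]
```

With the default of 14400 seconds (4 hours), only the nodes last updated 5 and 6 hours ago are reported.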
- get_unreserved_nodes(time=1209600)
Detect reserved nodes, i.e. nodes in the ‘RESERVED’ state within FireWorks, that have not been updated for a while. Possible inconsistent states in SLURM are ‘CANCELLED’, ‘FAILED’, ‘COMPLETED’, ‘OUT_OF_MEMORY’, ‘BOOT_FAIL’, ‘TIMEOUT’ and ‘DEADLINE’.
- Parameters:
time (int) – minimum time in seconds since the most recent update
- Returns:
a list of dictionaries containing the fw_ids, the reservation ids, the SLURM states and the launch directories of such reserved nodes
- Return type:
([dict])
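As a rough sketch of the selection described here, the following standalone function filters reserved nodes by age and by SLURM state. It assumes that both conditions must hold, which the description does not state explicitly. The function find_unreserved, the dictionary keys and the sample records are hypothetical; the real method queries FireWorks and SLURM itself.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# SLURM states listed above as inconsistent with a RESERVED node
INCONSISTENT_SLURM_STATES = {
    "CANCELLED", "FAILED", "COMPLETED", "OUT_OF_MEMORY",
    "BOOT_FAIL", "TIMEOUT", "DEADLINE",
}


def find_unreserved(reserved: List[dict], slurm_states: Dict[str, str],
                    time: int = 1209600,
                    now: Optional[datetime] = None) -> List[dict]:
    """Select stale reserved nodes whose SLURM job is in an inconsistent state."""
    now = now or datetime.now()
    limit = timedelta(seconds=time)
    result = []
    for node in reserved:
        state = slurm_states.get(node["reservation_id"])
        if now - node["updated_on"] > limit and state in INCONSISTENT_SLURM_STATES:
            result.append({
                "fw_id": node["fw_id"],
                "reservation_id": node["reservation_id"],
                "slurm_state": state,
                "launch_dir": node["launch_dir"],
            })
    return result


# Made-up records for illustration
now = datetime(2024, 3, 1)
reserved = [
    {"fw_id": 7, "reservation_id": "1001", "updated_on": datetime(2024, 2, 1), "launch_dir": "/tmp/a"},
    {"fw_id": 8, "reservation_id": "1002", "updated_on": datetime(2024, 2, 1), "launch_dir": "/tmp/b"},
    {"fw_id": 9, "reservation_id": "1003", "updated_on": datetime(2024, 2, 28), "launch_dir": "/tmp/c"},
]
slurm_states = {"1001": "TIMEOUT", "1002": "PENDING", "1003": "FAILED"}
unreserved = find_unreserved(reserved, slurm_states, now=now)
```

Only node 7 is selected: node 8 is still pending in SLURM, and node 9 was updated within the default window of 1209600 seconds (14 days).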
- launcher(stop_event)
The main loop of the launcher
- Parameters:
stop_event (threading.Event) – an object used to quit the launcher
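The role of stop_event can be sketched with the standard threading module. This is a generic pattern rather than the engine's code: it assumes that the loop launches nodes and then waits sleep_time seconds between cycles, and launch_ready is a hypothetical callback standing in for the actual launch step.

```python
import threading
from typing import Callable


def launcher(stop_event: threading.Event, sleep_time: float,
             launch_ready: Callable[[], None]) -> None:
    """Call launch_ready once per cycle until stop_event is set."""
    while not stop_event.is_set():
        launch_ready()
        # wait() returns as soon as the event is set, so stopping is prompt
        stop_event.wait(timeout=sleep_time)


launched = []
stop_event = threading.Event()


def launch_ready() -> None:
    """Stand-in for launching READY nodes; requests a stop after three cycles."""
    launched.append(len(launched))
    if len(launched) == 3:
        stop_event.set()


thread = threading.Thread(target=launcher, args=(stop_event, 0.01, launch_ready))
thread.start()
thread.join()  # returns once the loop has seen the stop event
print(launched)  # [0, 1, 2]
```

Using Event.wait instead of time.sleep lets a stop request interrupt the pause between cycles, which matters with a pause as long as the default sleep_time of 30 seconds.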
- logger = <Logger virtmat.middleware.engine.wfengine (ERROR)>
- property name
get the name of the engine
- qlaunch(fw_id)
Launch a batch node by submitting a job to the queuing system
- Parameters:
fw_id (int) – the fw_id of the node to launch
- remove_wf_id(wf_id)
remove a workflow id (wf_id) from the list of wf_ids
- remove_workflow(fw_id)
Remove a workflow from the engine (the workflow is not deleted from the launchpad)
- Parameters:
fw_id (int) – a fw_id of a node in the workflow to remove
- rerun_node(fw_id)
Rerun a workflow node. Only nodes in COMPLETED and FIZZLED states can be rerun.
- Parameters:
fw_id (int) – the fw_id of the node to rerun
- rlaunch(fw_id)
Launch an interactive node
- Parameters:
fw_id (int) – the fw_id of the node to launch
- show_launcher_status()
Check whether a launcher thread is running
- show_nodes_status()
Display the status summary of the nodes
- show_wf_status(add_io_info=True)
Display the status summary of the workflows
- start()
Start a launcher thread
- status_detail(*fw_ids)
Print a detailed status of specified nodes
- Parameters:
fw_ids ([int]) – One or more fw_ids of the nodes
- Returns:
a list of dictionaries describing the nodes
- status_summary()
Display a status summary of workflows and nodes
- stop(join=False)
Gracefully stop the launcher thread if it is running
- to_dict(*args, **kwargs)
- update_node(fw_id, update_dict)
Update (modify) a workflow node. Only nodes in WAITING, READY and FIZZLED states can be modified.
- Parameters:
fw_id (int) – the fw_id of the node to modify
update_dict (dict) – a dictionary with the updates to perform
- update_rerun_node(fw_id, update_dict)
Update (modify) and rerun a workflow node combined in one function. Only nodes in COMPLETED, WAITING, READY and FIZZLED states can be processed.
- Parameters:
fw_id (int) – the fw_id of the node to process
update_dict (dict) – a dictionary with the updates to perform
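The state restrictions documented for cancel_job, rerun_node, update_node and update_rerun_node can be collected in one lookup table. The table and the helper allowed_operations below are illustrative only and not part of the package; they restate the documented states so that the allowed operations for a node can be read off at a glance.

```python
from typing import Dict, List, Set

# Node states in which each method may be applied, as documented above
ALLOWED_STATES: Dict[str, Set[str]] = {
    "cancel_job": {"RESERVED", "RUNNING"},
    "rerun_node": {"COMPLETED", "FIZZLED"},
    "update_node": {"WAITING", "READY", "FIZZLED"},
    "update_rerun_node": {"COMPLETED", "WAITING", "READY", "FIZZLED"},
}


def allowed_operations(state: str) -> List[str]:
    """Hypothetical helper: list the methods applicable to a node in `state`."""
    return sorted(op for op, states in ALLOWED_STATES.items() if state in states)


print(allowed_operations("FIZZLED"))    # ['rerun_node', 'update_node', 'update_rerun_node']
print(allowed_operations("COMPLETED"))  # ['rerun_node', 'update_rerun_node']
print(allowed_operations("RUNNING"))    # ['cancel_job']
```

The states accepted by update_rerun_node are exactly the union of those accepted by update_node and rerun_node.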
- property wf_ids
get the workflow ids of the engine
- property wf_query
get the query for the engine
virtmat.middleware.engine.wfengine_jupyter module
A graphical user interface for WFEngine based on ipywidgets
- class virtmat.middleware.engine.wfengine_jupyter.WFEnginejupyter
Bases: object
A class for constructing a GUI for FireWorks
- add_node_button_clicked(bvar)
add node button is clicked
- add_nodes_button_clicked(bvar)
add nodes button is clicked
- add_workflow_button_clicked(bvar)
add workflows from a query or a file
- cancel_launch_button_clicked(bvar)
cancel launched (reserved or running) nodes
- commit_remove_workflow_button_clicked(bvar)
commit workflows removal from engine
- create_wf_id_select()
creates a new selector with updated workflow ids
- dump_engine_button_clicked(bvar)
dump the engine to file
- func_name = None
- job_category = None
- jqadapter = None
- logger = <Logger virtmat.middleware.engine.wfengine_jupyter (ERROR)>
- lpad_button_clicked(bvar)
load user defined launchpad
- manage_nodes_button_clicked(bvar)
Manage nodes button is clicked
- manage_workflows_button_clicked(bvar)
manage workflows button is clicked
- new_engine_button_clicked(bvar)
create new engine button is clicked
- node_id_select = None
- nodes_status_button_clicked(bvar)
nodes status summary
- qadapter_button_clicked(bvar)
load user defined qadapter
- remove_workflow_button_clicked(bvar)
remove workflows from engine
- rerun_node_button_clicked(bvar)
rerun selected nodes and print their new status
- resume_engine_button_clicked(bvar)
resume engine button is clicked
- rows_inputs = None
- rows_outputs = None
- size_inp = None
- size_out = None
- start_launcher_clicked(bvar)
start launcher button clicked
- status_button_clicked(bvar)
workflow status summary
- status_detailed_button_clicked(bvar)
status details about selected nodes
- stop_launcher_clicked(bvar)
stop launcher button clicked
- update_node_button_clicked(bvar)
update selected nodes
- update_rerun_node_button_clicked(bvar)
update and rerun selected nodes
- wf_id_select = None
- wfe = None
- virtmat.middleware.engine.wfengine_jupyter.add_workflow_method_changed(bvar)
select the method to add workflows from radio buttons
- virtmat.middleware.engine.wfengine_jupyter.clear_button_outputs()
Clear top buttons outputs
- virtmat.middleware.engine.wfengine_jupyter.clear_consoleoutput()
Clear outputs
- virtmat.middleware.engine.wfengine_jupyter.configure_button_clicked(bvar)
Configure button is clicked
- virtmat.middleware.engine.wfengine_jupyter.configure_engine_method_changed(bvar)
select engine configuration method from radio buttons
- virtmat.middleware.engine.wfengine_jupyter.manage_launcher_button_clicked(bvar)
manage launcher button is clicked
- virtmat.middleware.engine.wfengine_jupyter.new_workflow_button_clicked(bvar)
new workflow button is clicked
- virtmat.middleware.engine.wfengine_jupyter.remote_cluster_changed(bvar)
toggle the remote cluster checkbox
- virtmat.middleware.engine.wfengine_jupyter.resconfig_button_clicked(bvar)
resconfig button is clicked
virtmat.middleware.engine.wfengine_remote module
Launch workflow nodes on remote resources
- class virtmat.middleware.engine.wfengine_remote.WFEngineRemote(launchpad, qadapter, wf_query, host=None, user=None, conf='', **kwargs)
Bases: WFEngine
A subclass of WFEngine to manage remote workers
- Parameters:
host – hostname of the remote resource
user – username on the remote resource
conf – configuration command to set up the remote environment
Passwordless connection via SSH to the remote system must be enabled. Otherwise the following error message will occur: PasswordRequiredException: private key file is encrypted
- check_jobcancel(res_id)
Execute the slurm sacct command remotely
- exec_cancel(res_id)
Execute the slurm cancel command remotely
- classmethod from_dict(*args, **kwargs)
- launcher(stop_event)
Awake every sleep_time seconds and launch all READY nodes
- logger = <Logger virtmat.middleware.engine.wfengine_remote (ERROR)>
- setup_remote_configuration()
Create remote launch directory and copy all configuration files
- setup_remote_fworker()
Create configuration for remote worker
- setup_remote_launchpad()
Create launchpad file for remote worker
- setup_remote_qadapter()
Create qadapter file for remote worker
- slaunch(fw_id)
Launch a batch node on a remote resource
- to_dict(*args, **kwargs)