Resource configuration
The resource configuration (resconfig) describes all resources used in WFEngine: the computing resources provided by a batch system such as Slurm, the LaunchPad databases (MongoDB), and the runtime environments. The resconfig contains a list of workers and a default worker. Each worker corresponds to a computer or computing cluster used by WFEngine. If the cluster runs Slurm, its queues and computing resources are configured as well.
Location of resource configuration file
The default location of the resource configuration is $HOME/.fireworks/res_config.yaml. This path can be overridden by setting the environment variable RESCONFIG_LOC. The actual path is returned by the function resconfig.get_resconfig_loc().
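The lookup order described above can be sketched with the standard library alone. This is only an illustration of the documented behaviour, not the actual code of get_resconfig_loc(); the helper name resolve_resconfig_loc and its environ parameter are hypothetical.

```python
import os

# Default location as documented: $HOME/.fireworks/res_config.yaml
DEFAULT_LOC = os.path.join(os.path.expanduser('~'), '.fireworks', 'res_config.yaml')


def resolve_resconfig_loc(environ=None):
    """Return the path from RESCONFIG_LOC if it is set, else the default location.

    Hypothetical helper; 'environ' can be passed explicitly to try out the
    lookup without modifying the real process environment.
    """
    environ = os.environ if environ is None else environ
    return environ.get('RESCONFIG_LOC', DEFAULT_LOC)


# The override wins when the variable is set
print(resolve_resconfig_loc({'RESCONFIG_LOC': '/tmp/my_res_config.yaml'}))
# Without the variable, the default location is used
print(resolve_resconfig_loc({}))
```

Passing a dictionary instead of reading os.environ directly keeps the sketch easy to test.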
Create a resource configuration from scratch
To check whether a resconfig file already exists:
from os import path
from virtmat.middleware.resconfig import get_resconfig_loc
path.exists(get_resconfig_loc()) # False
If, as in this example, a resconfig does not yet exist, a new one can be created and written to the default resconfig location:
from virtmat.middleware.resconfig import ResConfig, set_defaults_from_guess
from virtmat.middleware.resconfig import get_resconfig_loc
cfg = ResConfig.from_scratch()
set_defaults_from_guess(cfg.default_worker)
cfg.to_file(get_resconfig_loc())
This creates a single worker configuration and sets it as the default worker.
Add a worker to an existing resconfig
If a resconfig with configured workers already exists, one can add a worker for the current system:
path.exists(get_resconfig_loc()) # True
cfg = ResConfig.from_file(get_resconfig_loc())
cfg.add_worker_from_scratch(default=False)
cfg.to_file(get_resconfig_loc())
To make the new worker the default worker, call add_worker_from_scratch() with default=True instead.
Worker configuration
Every computer or computing cluster is represented by a worker. The worker has a name and type. Workers of type SLURM have a list of queues and a list of group names that can be used for accounting when a job is submitted to some queue. A worker configuration can be created using the WorkerConfig class:
from virtmat.middleware.resconfig import WorkerConfig
wcfg = WorkerConfig(name='w_name')
In this case, the type, the queues and the accounts have to be set manually. A quicker way to configure a worker is to use the class method from_scratch():
wcfg = WorkerConfig.from_scratch()
In both cases, one has to set the default queue and default account:
# set the first queue as default:
wcfg.set_default_queue()
# set the current group (guid) as default:
wcfg.set_default_account()
One can also set a queue and a group explicitly:
wcfg.default_queue = qcfg
wcfg.default_account = 'group_name'
If qcfg is not in the list of queues, or group_name is not in the list of accounts, a ResourceConfigurationError is raised.
The from_scratch() method also sets up a list of environment modules provided on the computing cluster that is accessible via the modules attribute.
Besides accounts and queues, the class WorkerConfig has further attributes related to the runtime environment: environment modules, environment variables, default virtual environment, default launch directory, and default shell commands.
Queue configuration
The queue configuration accommodates computing resource configurations via the resources attribute. Further attributes are name, public, accounts_allow, accounts_deny, and groups_allow. A queue configuration is created using the QueueConfig class:
from virtmat.middleware.resconfig import QueueConfig
qcfg = QueueConfig()
With the get_resource() method a specific resource configuration can be retrieved:
print(qcfg.get_resource('time'))
A resource can be set with the set_resource() method, e.g. to set the default walltime to 5 minutes:
qcfg.set_resource('time', 'default', 5)
If the resource does not exist, it is created and added to the list of resources in the queue. The second argument, the resource type, must be one of minimum, maximum or default. Any other keyword raises a ValueError.
With the method validate_resource(), a resource can be validated, e.g. to check whether time of 4000 minutes may be requested in this queue:
qcfg.validate_resource('time', 4000)
If the resource value is outside the configured limits, a ResourceConfigurationError is raised.
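The kind of check performed by such a validation can be sketched as follows. This is a stand-alone illustration and not the library's implementation: the function validate and the exception ResourceLimitError are hypothetical stand-ins, and it is an assumption here that the limits are inclusive and that a limit of None means "no limit".

```python
class ResourceLimitError(Exception):
    """Hypothetical stand-in for the library's ResourceConfigurationError."""


def validate(value, minimum=None, maximum=None):
    """Raise ResourceLimitError if value lies outside [minimum, maximum].

    Assumptions of this sketch: bounds are inclusive, None disables a bound.
    """
    if minimum is not None and value < minimum:
        raise ResourceLimitError(f'{value} is below the minimum {minimum}')
    if maximum is not None and value > maximum:
        raise ResourceLimitError(f'{value} exceeds the maximum {maximum}')


# Illustrative limits for a walltime resource, in minutes
limits = {'minimum': 1, 'maximum': 100}

validate(5, **limits)  # within the limits, nothing happens

try:
    validate(4000, **limits)  # far above the maximum
except ResourceLimitError as err:
    print(err)
```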
Computing resource configuration
This configuration describes one single resource. The public attributes are name, minimum, maximum and default. Here is an example of creating a resource and adding it to the list of resources of a queue:
from virtmat.middleware.resconfig import ResourceConfig
time = ResourceConfig('time')
time.minimum = 1
time.maximum = 100
time.default = 5
qcfg.resources.append(time)
Configuration of environment modules
Environment modules are commonly used on computing clusters to set up the run-time environment of various software. Thus they can be used as a discovery source (registry) for software deployed on the cluster. The attributes of the ModuleConfig class are:
prefix (str): module prefix, if module name has prefix, else None
name (str): module name, may not be None
versions ([str]): available module versions, empty list if no versions specified
path (str): path to module file, if different from default modules path, else None
Example: The module files chem/gromacs/2022.6 and chem/gromacs/2022.5, both located in the default modules path, are represented in resconfig by the object mcfg = ModuleConfig(prefix='chem', name='gromacs', versions=['2022.6', '2022.5']). The command module load chem/gromacs/2022.6 is returned by mcfg.get_command('gromacs', '==2022.6'). If no version matches, as in mcfg.get_command('gromacs', '>2022.6'), the method returns None.
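How a version specifier such as '==2022.6' can be matched against a list of versions is sketched below. This is a self-contained illustration, not the code behind ModuleConfig.get_command(): the function load_command is hypothetical, only the operators ==, >=, <=, > and < with purely numeric dotted versions are handled, and picking the newest matching version is an assumption of this sketch.

```python
import operator
import re

# Supported comparison operators (an assumption of this sketch)
OPS = {'==': operator.eq, '>=': operator.ge, '<=': operator.le,
       '>': operator.gt, '<': operator.lt}


def version_key(version):
    """Turn '2022.6' into (2022, 6) so that versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def load_command(prefix, name, versions, spec):
    """Return a 'module load' command for the newest matching version, else None."""
    match = re.fullmatch(r'(==|>=|<=|>|<)\s*([\d.]+)', spec)
    if match is None:
        raise ValueError(f'unsupported version specifier: {spec}')
    compare, wanted = OPS[match.group(1)], version_key(match.group(2))
    matching = [v for v in versions if compare(version_key(v), wanted)]
    if not matching:
        return None
    best = max(matching, key=version_key)
    full_name = f'{prefix}/{name}' if prefix else name
    return f'module load {full_name}/{best}'


versions = ['2022.6', '2022.5']
print(load_command('chem', 'gromacs', versions, '==2022.6'))  # module load chem/gromacs/2022.6
print(load_command('chem', 'gromacs', versions, '>2022.6'))   # None
```

Comparing tuples of integers rather than strings avoids ranking '2022.10' below '2022.6'.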
The module configurations are added as lists to the modules and default_modules attributes of the WorkerConfig object, e.g.
wcfg.modules = [mcfg] # available modules
wcfg.default_modules = [mcfg] # module loaded by default
Modules from wcfg.modules can be added to a custom qadapter object using the module keyword, while the modules in wcfg.default_modules are added automatically to every qadapter.
Configuration of environment variables
Run-time environments often require setting environment variables in the shell. Per worker, i.e. per instance of the WorkerConfig class, one can define variables and their values, e.g.
wcfg.envvars = {'APP': 'app_name', 'PYTHONPATH': '/path/to/python/libs'}
wcfg.default_envvars = ['PYTHONPATH']
Variables from envvars can be selected for a custom qadapter object by passing the venvs keyword. The default_envvars are added automatically to every qadapter, using the corresponding values from envvars.
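The relation between the two attributes, a dictionary of values and a list of names selected from it, can be illustrated as follows. How the qadapter actually renders the variables is not described here, so the export lines and the helper export_lines are assumptions made purely for illustration.

```python
import shlex

envvars = {'APP': 'app_name', 'PYTHONPATH': '/path/to/python/libs'}
default_envvars = ['PYTHONPATH']


def export_lines(envvars, selected):
    """Build shell 'export' lines for the selected variable names.

    Hypothetical helper: every selected name must have a value in envvars.
    """
    missing = [name for name in selected if name not in envvars]
    if missing:
        raise KeyError(f'no value configured for: {missing}')
    # shlex.quote protects values that contain spaces or shell metacharacters
    return [f'export {name}={shlex.quote(envvars[name])}' for name in selected]


print(export_lines(envvars, default_envvars))
# ['export PYTHONPATH=/path/to/python/libs']
```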
Configuration of default virtual environment
A default Python virtual environment (such as venv and conda) can be configured using
import os
wcfg.default_venv = {'type': 'conda',
                     'name': os.environ.get('CONDA_DEFAULT_ENV'),
                     'prefix': os.environ.get('CONDA_PREFIX')}
for conda environments and
import sys
wcfg.default_venv = {'type': 'venv',
                     'name': sys.prefix.rsplit('/', maxsplit=1)[-1],
                     'prefix': sys.prefix}
for venv. A command that activates this environment at run time is generated and added to the qadapter.
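Which of the two dictionaries applies depends on how the running interpreter was set up. A minimal sketch that chooses between them using only the standard library is shown below; the function guess_default_venv is hypothetical and not part of the library, and the detection rules (CONDA_PREFIX pointing to the running interpreter, sys.prefix differing from sys.base_prefix) are assumptions of this sketch.

```python
import os
import sys


def guess_default_venv():
    """Return a default_venv-style dictionary for the running interpreter, or None."""
    conda_prefix = os.environ.get('CONDA_PREFIX')
    # Treat the interpreter as a conda one only if CONDA_PREFIX points to it
    if conda_prefix and os.path.realpath(conda_prefix) == os.path.realpath(sys.prefix):
        return {'type': 'conda',
                'name': os.environ.get('CONDA_DEFAULT_ENV'),
                'prefix': conda_prefix}
    # Inside a venv, sys.prefix differs from the base installation prefix
    if sys.prefix != sys.base_prefix:
        return {'type': 'venv',
                'name': os.path.basename(sys.prefix),
                'prefix': sys.prefix}
    return None  # plain system interpreter, nothing to activate


print(guess_default_venv())
```

The result, if not None, has the same keys as the dictionaries assigned to wcfg.default_venv above.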
Configuration of default shell commands
Sometimes the environment cannot be modified as needed through environment variables alone, and shell commands have to be executed instead. For example, setting umask 027 can be configured in the worker like this:
from virtmat.middleware.resconfig import CommandConfig
wcfg.default_commands = [CommandConfig('umask', ['027'])]
Configuration of default launch directory
The launch directory is where a Slurm job is run. By default this is the current working directory of the shell submitting the job. Another launch directory can be set with
wcfg.default_launchdir = '/path/to/launch/directory'
This directory is added to the qadapter and then used at run time.