SLURM client
Basic scripts can be submitted from Python with SlurmScriptRestClient, which takes three parameters:
url: URL of the SLURM REST API.
user_name: the user under which the SLURM job will be executed.
token: access token for the user, obtained for example by running scontrol token lifespan=<timeinseconds> on a SLURM node.
import os
import getpass
from pyslurmutils.client import SlurmScriptRestClient
url = os.environ.get("SLURM_URL")
token = os.environ.get("SLURM_TOKEN")
user_name = os.environ.get("SLURM_USER", getpass.getuser())
client = SlurmScriptRestClient(url=url, user_name=user_name, token=token)
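Because os.environ.get returns None for an unset variable, a missing SLURM_URL or SLURM_TOKEN would only surface later, when the client is used. A minimal sketch of an early check follows. Note that load_slurm_settings is a hypothetical helper, not part of pyslurmutils; it accepts the environment as an argument so it can be tried without a cluster.

```python
import getpass
import os


def load_slurm_settings(environ=None):
    """Collect the three client parameters, failing early if one is missing.

    Hypothetical helper for illustration; not part of pyslurmutils.
    """
    environ = os.environ if environ is None else environ
    required = ("SLURM_URL", "SLURM_TOKEN")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Fall back to the current login name only when SLURM_USER is unset.
    user_name = environ.get("SLURM_USER") or getpass.getuser()
    return {
        "url": environ["SLURM_URL"],
        "user_name": user_name,
        "token": environ["SLURM_TOKEN"],
    }
```

The keys match the keyword arguments used above, so the result could be passed on as SlurmScriptRestClient(**load_slurm_settings()).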
Submit and wait
When no shebang is provided, the script is assumed to be a bash script:
SCRIPT = """
echo 'Message to STDOUT'
>&2 echo 'Message to STDERR'
"""
job_id = client.submit_script(SCRIPT)
print(client.wait_finished(job_id))
The last line waits for the job to finish and prints its final status: COMPLETED, FAILED, CANCELLED or TIMEOUT.
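The shebang rule can be illustrated with a small standalone check. The sketch below only mirrors the behaviour described above (no shebang means bash). Note that interpreter_of is a hypothetical helper and says nothing about how pyslurmutils inspects scripts internally.

```python
def interpreter_of(script, default="bash"):
    """Return the interpreter named in the shebang line, or the default.

    Hypothetical helper: it mirrors the documented rule (no shebang means
    bash) and is not how pyslurmutils itself handles scripts.
    """
    if not script.startswith("#!"):
        return default
    first_line = script.splitlines()[0]
    # "#!/usr/bin/env python3" -> "python3", "#!/bin/sh" -> "sh"
    parts = first_line[2:].split()
    if not parts:
        return default
    if parts[0].endswith("/env") and len(parts) > 1:
        return parts[1]
    return parts[0].rsplit("/", 1)[-1]


BASH_SCRIPT = """
echo 'Message to STDOUT'
"""

# The shebang starts right after the opening quotes, on the first line.
PYTHON_SCRIPT = """#!/usr/bin/env python3
print("Message to STDOUT")
"""

print(interpreter_of(BASH_SCRIPT))    # bash
print(interpreter_of(PYTHON_SCRIPT))  # python3
```

In this sketch a shebang only counts when it forms the very first characters of the script, which is why PYTHON_SCRIPT has no leading newline.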
Job logging
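To see the standard output and standard error of a job, provide the log_directory parameter. The two streams are merged by default; this example also passes std_split=True, and its output shows them in separate .out and .err files.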
import os
log_directory = os.path.join(os.path.sep, "tmp_14_days", user_name, "slurm_logs")
client = SlurmScriptRestClient(
    url=url,
    user_name=user_name,
    token=token,
    log_directory=log_directory,
    std_split=True,
)
SCRIPT = """
echo 'Message to STDOUT'
>&2 echo 'Message to STDERR'
"""
job_id = client.submit_script(SCRIPT)
try:
    print(client.wait_finished(job_id))
    client.print_stdout_stderr(job_id)
finally:
    client.clean_job_artifacts(job_id)
The output looks like this:
COMPLETED
STDOUT: /tmp_14_days/<username>/slurm_logs/pyslurmutils.<hostname>.15119577.out
---------------------------------------------------------------------------
Message to STDOUT
STDERR: /tmp_14_days/<username>/slurm_logs/pyslurmutils.<hostname>.15119577.err
---------------------------------------------------------------------------
Message to STDERR
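The clean_job_artifacts method deletes the log files. Calling it in a finally block, as above, removes them even if waiting for the job or printing its output raises an exception.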
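Before they are cleaned up, the log files can also be located by hand. The sketch below assumes the naming pattern seen in the sample output, pyslurmutils.<hostname>.<job_id>.out and .err. That pattern is inferred from the output above rather than from a documented guarantee, and find_job_logs is a hypothetical helper.

```python
import glob
import os


def find_job_logs(log_directory, job_id):
    """Return the .out/.err files of a job found in log_directory.

    Assumes file names of the form pyslurmutils.<hostname>.<job_id>.<ext>,
    as inferred from the sample output; hypothetical helper.
    """
    found = {}
    for ext in ("out", "err"):
        pattern = os.path.join(log_directory, f"pyslurmutils.*.{job_id}.{ext}")
        matches = sorted(glob.glob(pattern))
        if matches:
            found[ext] = matches[0]
    return found
```

The result maps "out" and "err" to file paths, and a stream without a matching file is simply absent from the dictionary.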
Job properties
To get the job status (COMPLETED, FAILED, RUNNING, …) from a job ID (which is an integer)
print(client.get_status(job_id))
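For a job that is still running, the status can be polled until a final state is reached. A minimal sketch follows, assuming that the status is returned as a string and that the final states are the four listed for wait_finished. Note that poll_until_final is a hypothetical helper that accepts any callable, so a stand-in replaces the client here.

```python
import time

FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def poll_until_final(get_status, job_id, interval=1.0, timeout=60.0):
    """Call get_status(job_id) until it returns a final state.

    Hypothetical helper; FINAL_STATES is assumed from the states named in
    this document. Raises TimeoutError if no final state is seen in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in FINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout} s")
        time.sleep(interval)


# Stand-in for client.get_status so the sketch runs without a cluster.
_fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
print(poll_until_final(lambda job_id: next(_fake_states), 15119577, interval=0.01))
```

With a real client the call would be poll_until_final(client.get_status, job_id). When no custom polling interval or timeout is needed, wait_finished already covers this case.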
To get the job status with reason, description and exit code
print(client.get_full_status(job_id))
To get all job parameters (including the parameters used for job submission)
print(client.get_job_properties(job_id))