SLURM client#

Basic scripts can be submitted from Python using SlurmScriptRestClient, which takes three parameters:

  • URL of the SLURM REST API.

  • The user under which the SLURM job will be executed.

  • Access token for the user, obtained for example with the command scontrol token lifespan=<timeinseconds> on a SLURM node.

import os
import getpass
from pyslurmutils.client import SlurmScriptRestClient

url = os.environ.get("SLURM_URL")
token = os.environ.get("SLURM_TOKEN")
user_name = os.environ.get("SLURM_USER", getpass.getuser())

client = SlurmScriptRestClient(url=url, user_name=user_name, token=token)
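Because os.environ.get returns None for an unset variable, a missing SLURM_URL or SLURM_TOKEN would only surface later, when the client first contacts the server. The following is a minimal sketch of an early check. The helper name read_slurm_settings is hypothetical and not part of pyslurmutils; it only assumes the three environment variable names used above.

```python
import getpass
import os


def read_slurm_settings(env=None):
    """Collect the three client parameters from environment variables.

    Hypothetical helper for illustration, not part of pyslurmutils.
    """
    env = os.environ if env is None else env

    # The URL and the token have no sensible default, so fail early.
    missing = [name for name in ("SLURM_URL", "SLURM_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Fall back to the current login name, as in the example above.
    user_name = env.get("SLURM_USER") or getpass.getuser()

    return {"url": env["SLURM_URL"], "user_name": user_name, "token": env["SLURM_TOKEN"]}
```

The returned dictionary uses the same keyword names as the constructor call above, so it could be passed as SlurmScriptRestClient(**read_slurm_settings()).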

Submit and wait#

When no shebang is provided, the script is assumed to be a bash script:

SCRIPT = """
echo 'Message to STDOUT'
>&2 echo 'Message to STDERR'
"""

job_id = client.submit_script(SCRIPT)
print(client.wait_finished(job_id))

The last line waits for the job to finish and prints its final status: COMPLETED, FAILED, CANCELLED or TIMEOUT.
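In a larger program it is often more convenient to turn an unsuccessful final status into an exception than to print it. The sketch below assumes that the value returned by wait_finished converts to one of the plain strings listed above; check_final_status is a hypothetical helper, not part of pyslurmutils.

```python
# Final states listed above that indicate the job did not succeed.
FAILURE_STATES = {"FAILED", "CANCELLED", "TIMEOUT"}


def check_final_status(job_id, status):
    """Return the status if the job completed, raise otherwise.

    Hypothetical helper: assumes str(status) is one of the strings
    COMPLETED, FAILED, CANCELLED or TIMEOUT.
    """
    status = str(status)
    if status in FAILURE_STATES:
        raise RuntimeError(f"SLURM job {job_id} ended with status {status}")
    return status
```

It could be used as check_final_status(job_id, client.wait_finished(job_id)).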

Job logging#

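To see the standard output and standard error of a job, provide the log_directory parameter. The two streams are merged into one file by default; the example below passes std_split=True to write them to separate files: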

import os

log_directory = os.path.join(os.path.sep, "tmp_14_days", user_name, "slurm_logs")

client = SlurmScriptRestClient(
    url=url,
    user_name=user_name,
    token=token,
    log_directory=log_directory,
    std_split=True,
)

SCRIPT = """
echo 'Message to STDOUT'
>&2 echo 'Message to STDERR'
"""

job_id = client.submit_script(SCRIPT)
try:
    print(client.wait_finished(job_id))
    client.print_stdout_stderr(job_id)
finally:
    client.clean_job_artifacts(job_id)

The output looks like this:

COMPLETED

STDOUT: /tmp_14_days/<username>/slurm_logs/pyslurmutils.<hostname>.15119577.out
---------------------------------------------------------------------------
Message to STDOUT

STDERR: /tmp_14_days/<username>/slurm_logs/pyslurmutils.<hostname>.15119577.err
---------------------------------------------------------------------------
Message to STDERR

The clean_job_artifacts method can be used to delete the log files.
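The try/finally pattern above can be wrapped in a context manager so that the log files are removed even when waiting or printing raises. This is a minimal sketch: cleaned_job_artifacts is a hypothetical name, and the only client method it relies on is clean_job_artifacts as shown above.

```python
from contextlib import contextmanager


@contextmanager
def cleaned_job_artifacts(client, job_id):
    """Yield the job ID and delete the job's log files on exit.

    Hypothetical helper: `client` is any object with a
    clean_job_artifacts(job_id) method, such as the client above.
    """
    try:
        yield job_id
    finally:
        # Runs on normal exit and when the body raises.
        client.clean_job_artifacts(job_id)


# Intended usage, with the client created earlier:
#
#     with cleaned_job_artifacts(client, client.submit_script(SCRIPT)) as job_id:
#         print(client.wait_finished(job_id))
#         client.print_stdout_stderr(job_id)
```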

Job properties#

To get the job status (COMPLETED, FAILED, RUNNING, …) from a job ID (which is an integer):

print(client.get_status(job_id))

To get the job status with reason, description and exit code:

print(client.get_full_status(job_id))

To get all job parameters (including the parameters used for job submission):

print(client.get_job_properties(job_id))
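wait_finished covers the common case, but get_status can also be polled directly, for example to report each intermediate state while waiting. The sketch below is illustrative only: poll_status and its period, timeout and on_status parameters are hypothetical, and it assumes get_status returns a value whose string form is one of the state names shown above.

```python
import time

# Final states named in the "Submit and wait" section.
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def poll_status(client, job_id, period=1.0, timeout=None, on_status=None):
    """Poll client.get_status(job_id) until a final state is reached.

    Hypothetical helper, not part of pyslurmutils. `on_status`, when
    given, is called with every status that is observed.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = str(client.get_status(job_id))
        if on_status is not None:
            on_status(status)
        if status in FINAL_STATES:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status} after {timeout} s")
        time.sleep(period)
```

For example, poll_status(client, job_id, period=5, on_status=print) would print the state every five seconds until the job ends.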