Batch
=====

The purpose of this sub-command is to facilitate job submission on
clusters that use SLURM as their workload manager. Note that this tool
assumes that the SLURM script is properly prepared by the user and does
not modify it in any way. Like other cloudmesh sub-commands, ``batch``
has several sub-commands of its own:

.. code:: bash

    cms batch create-job JOB_NAME --slurm-script=SLURM_SCRIPT_PATH \
        --input-type=INPUT_TYPE --slurm-cluster=SLURM_CLUSTER_NAME \
        --job-script-path=SCRIPT_PATH --remote-path=REMOTE_PATH \
        --local-path=LOCAL_PATH [--argfile-path=ARGUMENT_FILE_PATH] \
        [--outfile-name=OUTPUT_FILE_NAME] [--suffix=SUFFIX] [--overwrite]
    cms batch run-job JOB_NAME
    cms batch fetch JOB_NAME
    cms batch test-connection SLURM_CLUSTER_NAME
    cms batch set-param slurm-cluster CLUSTER_NAME PARAMETER VALUE
    cms batch set-param job-metadata JOB_NAME PARAMETER VALUE
    cms batch list slurm-clusters [DEPTH [default:1]]
    cms batch list jobs [DEPTH [default:1]]
    cms batch remove slurm-cluster CLUSTER_NAME
    cms batch remove job JOB_NAME
    cms batch clean-remote JOB_NAME

The main sub-commands are:

- ``create-job``: used for creating a job configuration (this does not
  run the job automatically)
- ``run-job``: used for running a job configuration that was
  previously created
- ``test-connection``: used for testing the connection to a SLURM
  cluster
- ``set-param``: used for setting a parameter of a cluster or job
  configuration
- ``list``: used for listing the existing instances of an entity
- ``remove``: used for removing a cluster or job
- ``clean-remote``: used for cleaning the files of a job from a cluster

Each of these sub-commands is reviewed in the following sections with
examples.

Creating a job configuration
----------------------------

As the synopsis shows, this sub-command takes the largest number of
arguments and is the vital part of the ``batch`` tool. The parameters
are largely self-explanatory, but we review the important ones here:

- ``--slurm-script``: defines the path to the SLURM script that is
  going to be submitted to the SLURM cluster (a minimal example script
  is sketched after this list).
- ``--input-type``: defines the type of input of the application that
  is going to run on the cluster. This is important because if the
  program takes a file name as an argument, that file has to be
  transferred to the cluster as well. Possible values for this
  parameter are ``params`` and ``params+file``. Note that if you pass
  ``params+file``, you also have to specify ``--argfile-path``, which
  defines the path to the argument file.
- ``--slurm-cluster``: defines the name of a cluster that was
  previously defined in the cloudmesh YAML file.
- ``--job-script-path``: defines the path to the file that is going to
  be run on the SLURM cluster.
- ``--remote-path``: defines the path on the SLURM cluster to which the
  job files are copied, and where the job is run and its results are
  collected.
- ``--local-path``: defines the local path to which the results are
  saved.
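Since ``batch`` submits the SLURM script verbatim, the script has to be
complete before the job is created. The following is only a sketch of
what such a script might look like; the ``#SBATCH`` values and the way
the job script is invoked are illustrative assumptions, not the
contents of the shipped sample files:

.. code:: bash

    #!/bin/bash
    # Illustrative SLURM header; adjust the resources and limits
    # to your cluster's policies.
    #SBATCH --job-name=filein_fileout   # hypothetical job name
    #SBATCH --ntasks=1                  # a single task
    #SBATCH --time=00:05:00             # five-minute wall-clock limit

    # cms batch copies the job script and the argument file into the
    # same remote folder as this script, so relative paths work here.
    srun ./4_filein_fileout_script.sh test-script-argument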
Consider the following example:

.. code:: bash

    cms batch create-job SlurmTest1 --slurm-script=./4_filein_fileout.slurm \
        --input-type=params+file --argfile-path=./test-script-argument \
        --slurm-cluster=slurm-taito \
        --job-script-path=./4_filein_fileout_script.sh \
        --remote-path=~/tmp --local-path=../batch/sample_scripts/out \
        --overwrite

This creates a job that looks like this in the ``slurm_batch``
configuration file placed in the workspace directory::

    slurm_cluster:
      slurm-taito:
        name: taito
        credentials:
          sshconfigpath: ~/vms/sshconfig_slurm
    job-metadata:
      SlurmTest1:
        suffix: _20181206_19301175
        slurm_cluster_name: slurm-taito
        input_type: params+file
        raw_remote_path: ~/tmp
        slurm_script_path: ./4_filein_fileout.slurm
        job_script_path: ./4_filein_fileout_script.sh
        argfile_path: ./test-script-argument
        argfile_name: test-script-argument
        script_name: 4_filein_fileout_script.sh
        slurm_script_name: 4_filein_fileout.slurm
        remote_path: ~/tmp/job_20181206_19301175/
        remote_script_path: ~/tmp/job_20181206_19301175/4_filein_fileout_script.sh
        remote_slurm_script_path: ~/tmp/job_20181206_19301175/4_filein_fileout.slurm
        local_path: ../batch/sample_scripts/out

Testing the connection
----------------------

Note that the cluster information is already extracted and added to
this file. Therefore, unlike ``vcluster``, there is no need to add the
cluster manually. So far we have only added and updated the
configuration; the job is neither submitted nor run on the cluster.
Before doing that, let's test our connection to the cluster:

.. code:: bash

    cms batch test-connection slurm-taito
    # Slurm Cluster taito is accessible.

Running the Job
---------------

Now that we are sure that the ssh connection works fine, let's run the
job:

.. code:: bash

    cms batch run-job SlurmTest1
    # Remote job ID: 32846209

Despite the short output, this command does a lot of work behind the
scenes, including:

- creating the proper folder structure on the remote
- copying the SLURM script, as well as the job script and the argument
  files, if any
- submitting the job
- saving the job ID in the configuration file so that the results can
  be fetched later

Just for demonstration purposes, let's check the remote folder on the
cluster; all of the files, as well as the results, are available
there:

.. code:: bash

    taito-login3:~/tmp/job_20181206_19301175$ ll
    total 28
    drwxr-xr-x 2 4096 Dec  7 02:36 ./
    drwx------ 3 4096 Dec  7 02:35 ../
    -rwxr-xr-x 1  238 Dec  7 02:35 4_filein_fileout.slurm*
    -rw-r--r-- 1    0 Dec  7 02:36 4_filein_fileout.slurm.e32846209
    -rw-r--r-- 1  117 Dec  7 02:36 4_filein_fileout.slurm.o32846209
    -rwxr-xr-x 1   48 Dec  7 02:35 4_filein_fileout_script.sh*
    -rw-r--r-- 1   35 Dec  7 02:35 test-script-argument
    -rw------- 1   35 Dec  7 02:36 test-script-output
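The reported ID is the regular SLURM job ID, so if you want to watch
the job before fetching the results, you can log in to the cluster and
use the standard SLURM client tools with it (these are plain SLURM
commands, not part of ``cms batch``):

.. code:: bash

    # show the job while it is pending or running
    squeue -j 32846209

    # query SLURM's accounting records once the job has finished
    sacct -j 32846209 --format=JobID,JobName,State,Elapsed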
Downloading the Results
-----------------------

Now that the results are ready, we can fetch them using the following
command:

.. code:: bash

    cms batch fetch SlurmTest1
    collecting results
    Results collected from taito for jobID 32846209
    waiting for other results if any...
    All of the remote results collected.

The results are downloaded to the local path specified in the
configuration file:

.. code:: bash

    out$ ll job_20181206_19301175/
    total 1M
    drwxr-xr-x 2 corriel 1M Dec  6 19:40 ./
    drwxr-xr-x 3 corriel 1M Dec  6 19:40 ../
    -rw-r--r-- 1 corriel 0M Dec  6 19:40 4_filein_fileout.slurm.e32846209
    -rw-r--r-- 1 corriel 1M Dec  6 19:40 4_filein_fileout.slurm.o32846209
    -rw------- 1 corriel 1M Dec  6 19:40 test-script-output

Cleaning the remote
-------------------

Now that you are done, you can easily clean the remote using:

.. code:: bash

    cms batch clean-remote SlurmTest1
    Job SlurmTest1 cleaned successfully.

Get the list of the jobs and clusters
-------------------------------------

Naturally, after working with ``batch`` for a while, several jobs and
clusters accumulate in the configuration file. You can get the list of
the current jobs and clusters using the following commands:

.. code:: bash

    cms batch list slurm-clusters
    slurm-taito:
        name
        credentials

    cms batch list jobs
    SlurmTest1:
        suffix
        slurm_cluster_name
        input_type
        raw_remote_path
        slurm_script_path
        job_script_path
        argfile_path
        argfile_name
        script_name
        slurm_script_name
        remote_path
        remote_script_path
        remote_slurm_script_path
        local_path
        jobIDs

It is also possible to increase the depth of the reported information
by passing the desired depth as the next parameter:

.. code:: bash

    cms batch list slurm-clusters 2
    slurm-taito:
        name: taito
        credentials:
            sshconfigpath: ~/vms/sshconfig_slurm

Modifying the Configuration by Setting Parameters
-------------------------------------------------

If you want to modify or add a configuration parameter, there is no
need to edit the file directly. Instead, you can use the ``set-param``
command to set a key for both jobs and slurm-clusters. In the next
example we add a ``test-key``/``test-value`` pair to the
``slurm-taito`` cluster:

.. code:: bash

    cms batch set-param slurm-cluster slurm-taito test-key test-value
    slurm-cluster parameter test-key set to test-value successfully.

    cms batch list slurm-clusters 2
    slurm-taito:
        name: taito
        credentials:
            sshconfigpath: ~/vms/sshconfig_slurm
        test-key: test-value

Removing jobs and clusters
--------------------------

Finally, when you are done with a job, or when a cluster is no longer
accessible, you can easily remove it from the ``batch`` configuration
file using the following:

.. code:: bash

    cms batch remove slurm-cluster slurm-taito
    Slurm-cluster slurm-taito removed successfully.

Similarly, you can remove an obsolete job using the following command:

.. code:: bash

    cms batch remove job SlurmTest1
    Job SlurmTest1 removed successfully.
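As a recap, the complete life cycle demonstrated in this section
condenses to the following sequence of commands (using the same names
as above):

.. code:: bash

    # create the job configuration and verify the connection
    cms batch create-job SlurmTest1 --slurm-script=./4_filein_fileout.slurm \
        --input-type=params+file --argfile-path=./test-script-argument \
        --slurm-cluster=slurm-taito \
        --job-script-path=./4_filein_fileout_script.sh \
        --remote-path=~/tmp --local-path=../batch/sample_scripts/out
    cms batch test-connection slurm-taito

    # submit, collect the results, and tidy up
    cms batch run-job SlurmTest1
    cms batch fetch SlurmTest1
    cms batch clean-remote SlurmTest1
    cms batch remove job SlurmTest1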