Cluster Architecture
Head Node
When you ssh into Arjuna, you connect to the “Head Node”, also known as c001. From this machine, you can do the following:
- Transfer files between Arjuna and another machine (e.g. a compute node or another computer on the CMU network to which you have access)
- Download files from the internet
- Submit jobs to the cluster
- Monitor the status of existing jobs
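As a rough sketch of what those tasks look like in practice, the commands below use standard tools; the hostname shown is a placeholder, not Arjuna's actual address:

```bash
# From your local machine: transfer a file to Arjuna
# ("arjuna.example.edu" is a placeholder hostname)
scp data.tar.gz username@arjuna.example.edu:~/

# On the Head Node: download a file from the internet
wget https://example.com/dataset.tar.gz

# Submit a job to the cluster
sbatch job.sh

# Monitor the status of your existing jobs
squeue -u $USER
```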
DO NOT run compute jobs on the Head Node. Moreover, do not use the Head Node for anything other than submitting jobs to compute nodes. Unauthorized uses of the Head Node include, but are not limited to:
- Running Jupyter Notebooks to analyze data
- Webscraping
- Running Simulations
Authorized uses of the Head Node include, but are not limited to:
- Installing software for use on the compute nodes
- Moving data to and from Arjuna
- Submitting jobs
If the desired compute task is anything other than the trivial operations required for job submission (which can only be run on the Head Node), it should be run on a worker node or elsewhere.
Partitions
A partition can be specified by passing the `-p` or `--partition` flag to `srun`, `sbatch`, or `salloc`. By default, jobs are submitted to the `debug` partition.
Unless otherwise specified, jobs have a maximum runtime of 1 day and receive 1 GB of memory per requested CPU. See Allocating Resources for more information.
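As a sketch (the script name, partition, and resource numbers are illustrative, not recommendations), a batch script selecting a partition and overriding these defaults might look like:

```bash
#!/bin/bash
#SBATCH --partition=cpu       # same as -p cpu; jobs default to debug
#SBATCH --time=2-00:00:00     # override the 1-day default max runtime
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G      # override the 1 GB-per-CPU default

srun ./my_program             # placeholder for the actual workload
```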
| Partition | Count | Cores | Memory | Tmp Storage | Max Time |
|-----------|-------|-------|--------|-------------|----------|
| cpu | 58 | 56 | 128 GB | 100 GB | 7 days |
| highmem | 2 | 32 | 512 GB | 100 GB | 7 days |
| gpu | 27 | 64 | 128 GB | 100 GB | 14 days |
| debug | 2 | 64 | 128 GB | 100 GB | 10 minutes |
For more information on the partitions and their default settings, run `scontrol show partitions` on Arjuna.
We reserve 2 cores per node for system usage (e.g. the slurmd daemon). These cores are not available for jobs. For example, a node in the `gpu` partition has at most 62 cores available for jobs.
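For example, an interactive request for every schedulable core on a `gpu`-partition node would ask for 62, not 64 (a sketch; whether interactive jobs are permitted on this partition is cluster policy, not confirmed here):

```bash
# 64 physical cores minus the 2 reserved for system daemons
srun --partition=gpu --cpus-per-task=62 --pty bash
```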
GPU Nodes
Each GPU node has 4 NVIDIA K80 GPUs available for use. To request a GPU, use the `--gres=gpu:N` flag, where `N` is the number of GPUs requested.
See Generic Resource Scheduling for more information about requesting GPUs.
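As a sketch, a batch script requesting two of a node's four K80s might look like the following; the final command is a placeholder:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2          # request 2 of the node's 4 K80 GPUs

# Slurm typically exposes only the allocated GPUs to the job
nvidia-smi                    # placeholder: confirm the GPUs are visible
```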
Internet Access
Worker nodes cannot resolve domain names, and as a result most internet services may not function correctly on them. For example:
- Cloning a git repository from github.com
- Downloading packages using `spack`, `pip`, `conda`, or `Pkg.jl`
- Downloading files from the internet with `wget`, `rclone`, or `curl`
Users are encouraged to use the Head Node for these tasks.
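A common pattern, sketched below with placeholder names, is to fetch everything on the Head Node first, so the job itself runs without internet access:

```bash
# On the Head Node (which has internet access): fetch code and packages
git clone https://github.com/user/project.git    # placeholder repository
pip install --user -r project/requirements.txt

# Submit the job; worker nodes only read what is already on disk
sbatch project/job.sh
```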
Storage
Arjuna has 20 TB of RAID 6 storage.
Arjuna’s RAID storage is not meant for long-term storage of data. For long-term data storage, please use other resources such as CMU’s unlimited Google Drive access.
Users are allocated 350 GB of storage in their `/home` directory.
- If you exceed this, you have 7 days before further usage is blocked
- There is a hard cap at 500 GB
Once your grace period has elapsed (or you exceed the hard limit), additional writes will be blocked. This may cause jobs to fail.
To see your current disk usage, run `quota -s`.
Removing Temporary Files
Temporary files generated by jobs can quickly consume storage space. The following command recursively deletes all `*.gpw` files in your home directory (`~`).
`find ~ -name '*.gpw' -delete`
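Because `-delete` is irreversible, it can be worth previewing the matches first; the sketch below also checks which directories are using the most space:

```bash
# Preview what would be removed before deleting anything
find ~ -name '*.gpw' -print

# List home-directory contents by size, largest last
du -sh ~/* | sort -h
```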
Backing up Data
User data stored on Arjuna is NOT backed up by the Arjuna Admin Team. Users are responsible for backing up their data. See Data Backup for additional guidance.
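As one example, `rclone` (mentioned above) can copy data from the Head Node to Google Drive; this sketch assumes you have already created a Drive remote named `gdrive` via `rclone config`:

```bash
# One-time setup (interactive): rclone config
# Then copy a results directory to Google Drive from the Head Node
rclone copy ~/results gdrive:arjuna-backup/results --progress
```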