Introduction to High-Performance Computing: Cheatsheets for Queuing System Quick Reference

Key Points

Why use a Cluster?
  • High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.

  • These other systems can be used to do work that would either be impossible or much slower on smaller systems.

  • The standard method of interacting with such systems is via a command-line interface, such as the Bash shell.

Working on a remote HPC system
  • An HPC system is a set of networked machines.

  • HPC systems typically provide login nodes and a set of worker nodes (a connection example follows this list).

  • The resources found on independent (worker) nodes can vary in volume and type (amount of RAM, processor architecture, availability of network-mounted file systems, etc.).

  • Files saved on one node are available on all nodes.
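
  For instance, you typically reach a remote HPC system by connecting to one of its login nodes over SSH from your own computer; the username and hostname below are placeholders for your site's details.

      $ ssh yourUsername@cluster.example.org
      $ hostname        # prints the name of the node you landed on (a login node)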

Scheduling jobs
  • The scheduler handles how compute resources are shared between users.

  • Everything you do should be run through the scheduler.

  • A job is just a shell script, typically with extra directives for the scheduler (a minimal example follows this list).

  • If in doubt, request more resources than you will need.
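
  As a concrete illustration, here is a minimal batch script for Slurm (the scheduler named in the glossary below); the job name, partition, and resource amounts are placeholders, and other queuing systems use different directives.

      #!/bin/bash
      #SBATCH --job-name=example-job     # name shown in the queue
      #SBATCH --partition=standard       # placeholder partition/queue name
      #SBATCH --ntasks=1                 # number of tasks (processes)
      #SBATCH --time=00:10:00            # walltime limit (hh:mm:ss)
      #SBATCH --mem=1G                   # memory request

      echo "Running on $(hostname)"

  On Slurm you would submit this with sbatch and watch it with squeue -u yourUsername; the equivalent commands differ on PBS, Grid Engine, and other schedulers.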

Accessing software
  • Load software with module load softwareName

  • Unload all loaded software with module purge, or a single package with module unload softwareName (see the examples after this list).

  • The module system handles software versioning and package conflicts for you automatically.
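
  A short sketch of a typical session with the module system; the package name and version are hypothetical and will differ on your cluster.

      $ module avail               # list the software available on the system
      $ module load gcc/12.2.0     # load a specific package and version (hypothetical name)
      $ module list                # show what is currently loaded
      $ module unload gcc/12.2.0   # unload just this package
      $ module purge               # unload everything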

Transferring files
  • wget downloads a file from the internet.

  • scp transfers files to and from your computer (examples follow this list).
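
  A sketch of both tools; the URL, file names, username, and hostname are placeholders, and the scp commands are run from your own computer.

      $ wget https://example.org/data/input.tar.gz             # download onto whichever machine runs the command
      $ scp input.tar.gz yourUsername@cluster.example.org:~/   # copy a local file to your cluster home directory
      $ scp yourUsername@cluster.example.org:~/results.txt .   # copy a file back from the cluster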

Running a parallel job
  • Parallelism is an important feature of HPC clusters.

  • MPI parallelism is a common case.

  • The queuing system facilitates executing parallel tasks (a sketch follows this list).
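
  As an illustration, an MPI program is usually launched through the scheduler so that each task runs on the resources it was allocated. The sketch below assumes Slurm and an already-built program called mpi_hello; the module name is hypothetical, and some sites use mpirun or mpiexec instead of srun.

      #!/bin/bash
      #SBATCH --job-name=parallel-example
      #SBATCH --ntasks=4                  # request four MPI tasks
      #SBATCH --time=00:05:00

      module load openmpi                 # hypothetical module name
      srun ./mpi_hello                    # launches one copy of the program per allocated task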

Using resources effectively
  • The smaller your job (in cores, memory, and walltime), the faster it is likely to be scheduled; a sketch for checking what a finished job actually used follows this list.
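
  One way to size future requests is to compare what a finished job actually used with what you asked for. With Slurm, sacct can report this (the job ID below is a placeholder); the seff helper, where installed, prints a similar efficiency summary.

      $ sacct -j 1234567 --format=JobID,Elapsed,ReqMem,MaxRSS,TotalCPU
      $ seff 1234567      # efficiency summary, where available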

Using shared resources responsibly
  • Be careful how you use the login node.

  • Your data on the system is your responsibility.

  • Plan and test large data transfers.

  • It is often best to combine many files into a single archive file before transferring (see the example after this list).

  • Again, don’t run compute-heavy work on the login node.
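
  For the archiving point above, a common pattern is to bundle a directory into one compressed archive, transfer it, and unpack it on the other side; the directory, file, username, and hostname are placeholders.

      # on the cluster: bundle many files into a single compressed archive
      $ tar -czvf results.tar.gz results/

      # on your own computer: fetch the single archive, then unpack it
      $ scp yourUsername@cluster.example.org:~/results.tar.gz .
      $ tar -xzvf results.tar.gz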

Cheatsheets for Queuing System Quick Reference

Units and Language

A computer’s memory and disk are measured in units called Bytes (one Byte is 8 bits). As today’s files and memory have grown large by historical standards, volumes are noted using the SI prefixes: 1000 Bytes is a Kilobyte (kB), 1000 Kilobytes is a Megabyte (MB), 1000 Megabytes is a Gigabyte (GB), and so on.

History and common usage have, however, mixed this notation with a different meaning: when people say “Kilobyte”, they often mean 1024 Bytes, and in that spirit a Megabyte is 1024 Kilobytes.

To address this ambiguity, the International System of Quantities standardizes the binary prefixes (with a base of 2^10 = 1024) as Kibi (Ki), Mebi (Mi), Gibi (Gi), etc.
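
Concretely, the decimal (SI) and binary prefixes diverge quickly as sizes grow:

    1 kB = 1000 Bytes                 1 KiB = 1024 Bytes
    1 MB = 1,000,000 Bytes            1 MiB = 1,048,576 Bytes
    1 GB = 1,000,000,000 Bytes        1 GiB = 1,073,741,824 Bytes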

Glossary

The following list captures terms that need to be added to this glossary. This is a great way to contribute.

Accelerator
to be defined
Beowulf cluster
to be defined
Central processing unit
to be defined
Cloud computing
to be defined
Cluster
a collection of computers configured to enable collaboration on a common task by means of purposefully configured hardware (e.g., networking) and software (e.g., workload management).
Distributed memory
to be defined
Grid computing
to be defined
High availability computing
to be defined
High performance computing
to be defined
Interconnect
to be defined
Node
to be defined
Parallel
to be defined
Serial
to be defined
Server
to be defined
Shared memory
to be defined
Slurm
to be defined
Supercomputer
… “a major scientific instrument” …
Workstation
to be defined
Grid Engine
to be defined
Parallel File System
to be defined

Local resources