ARC Systems
At the centre of the ARC service are two high-performance compute clusters: arc and htc.
arc is designed for multi-node parallel computation.
htc is designed for high-throughput operation (lower core-count jobs).
htc is also a more heterogeneous system, offering different types of resources such as GPGPU computing and high-memory systems; nodes on arc are uniform. Users get access to both clusters automatically as part of the process of obtaining an account with ARC, and can use either or both.
For more detailed information on the hardware specifications of these clusters, see the tables below:
| Cluster | Description | Login Node | Compute Nodes | Minimum Job Size | Notes |
|---|---|---|---|---|---|
| arc | Our largest compute cluster. Optimised for large parallel jobs spanning multiple nodes; the scheduler prefers large jobs. Offers a low-latency interconnect (Mellanox HDR 100). | arc-login | CPU: 48-core Cascade Lake (Intel Xeon Platinum 8268 @ 2.90GHz). Memory: 384GB | 1 core | Non-blocking island size is 2,112 cores |
| htc | Optimised for single-core jobs and SMP jobs up to one node in size; the scheduler prefers small jobs. Also caters for jobs requiring resources other than CPU cores (e.g. GPUs). | htc-login | CPUs: mix of Broadwell, Haswell and Cascade Lake. GPUs: P100, V100, A100, RTX | 1 core | Jobs will only be scheduled onto a GPU node if requesting a GPU resource |
Operating system
The ARC systems use the Linux operating system (specifically CentOS 8), which is commonly used in HPC. We do not have any HPC systems running Windows (or macOS). If you are unfamiliar with using Linux, please consider:
Finding introductory Linux resources online (through Google/Bing/Yahoo etc.).
Working through our brief Introduction to Linux course.
Attending our Introduction to ARC training course (this does not teach you how to use Linux, but the examples will help you gain a greater understanding).
Capability cluster (arc)
The capability system (cluster name arc) has a total of 305 worker nodes of 48 cores each, some of which are co-investment hardware. These machines are available for general use, but may be subject to job time limits and/or may occasionally be reserved for the exclusive use of the entity that purchased them.
The arc system offers a total of 14,640 CPU cores.
All nodes have the following:
2x Intel Xeon Platinum 8268 CPUs. The Platinum 8268 is a 24-core, 2.90GHz Cascade Lake CPU, giving all nodes 48 CPU cores.
384GB memory
HDR 100 InfiniBand interconnect. The fabric has a 3:1 blocking factor with non-blocking islands of 44 nodes (2,112 cores).
The OS is CentOS Linux 8.1 and the scheduler is SLURM.
The login node for the system is ‘arc-login.arc.ox.ac.uk’, which allows logins from the University network range (including VPN).
The generally available partitions are:
| Partition | Nodes / cores | Nodes | Default run time | Maximum run time |
|---|---|---|---|---|
| short | 293 / 14,064 | arc-c[001-293] | 1 hour | 12 hours |
| medium | 242 / 11,616 | arc-c[046-287] | 12 hours | 2 days |
| long | 242 / 11,616 | arc-c[046-287] | 1 day | unlimited |
| devel | 2 / 96 | arc-c[302-303] | 10 minutes | |
| interactive | 2 / 96 | arc-c[304-305] | 1 hour | 4 hours |
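Since the scheduler is SLURM, partitions and run times such as those above are selected via `#SBATCH` directives in a batch script. The following is a minimal sketch of a multi-node job on arc; the job name, module name, and program are placeholders, not values taken from this page:

```shell
#!/bin/bash
# Illustrative SLURM batch script for the arc cluster.
# Module and program names below are hypothetical placeholders.
#SBATCH --job-name=mpi-example
#SBATCH --partition=medium       # one of: short, medium, long, devel, interactive
#SBATCH --nodes=2                # arc is optimised for multi-node jobs
#SBATCH --ntasks-per-node=48     # arc nodes have 48 cores each
#SBATCH --time=24:00:00          # must fit within the partition's maximum run time

# Load an MPI implementation (module name is a placeholder).
module load mpi

mpirun ./my_mpi_program
```

A script like this would typically be submitted from the login node with `sbatch myjob.sh` and monitored with `squeue`.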
Throughput cluster (htc)
The throughput system (cluster name htc) currently has 95 worker nodes, some of which are co-investment hardware. These machines are available for general use, but may be subject to job time limits and/or may occasionally be reserved for the exclusive use of the entity that purchased them. The hardware on the htc system is more heterogeneous than on the arc system.
49 of the nodes are GPGPU nodes. More information on how to access GPU nodes is available.
2 of the nodes are high-memory nodes with 3TB of RAM.
The OS is CentOS Linux 8.1 and the scheduler is SLURM.
The login node for the system is ‘htc-login.arc.ox.ac.uk’, which allows logins from the University network range (including VPN).
Details on the partitions are:
| Partition | Nodes / cores | Nodes | Default run time | Maximum run time |
|---|---|---|---|---|
| short | 93 / 3,716 | htc-c[001-046], htc-g[001-006,009-018,020-038,041-052] | 1 hour | 12 hours |
| medium | 61 / 2,808 | htc-c[001-004,006-046], htc-g[009-018,044-049] | 12 hours | 2 days |
| long | 61 / 2,808 | htc-c[001-004,006-046], htc-g[009-018,044-049] | 1 day | unlimited |
| devel | 1 / 28 | htc-g039 | 10 minutes | |
| interactive | 1 / 28 | htc-g040 | 1 hour | 4 hours |
Node CPU details are:
| Nodes | CPU | Cores per node | Memory per node | Interconnect |
|---|---|---|---|---|
| htc-c[005-006] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 96 | 3TB | HDR100 |
| htc-c[007-046] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 48 | 384GB | |
| htc-c047 | Intel E7-8860v3 (Haswell), 2.60GHz | 128 | 6TB | |
| htc-g[001-018] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 48 | 384GB | HDR100 |
| htc-g019 | AMD Epyc 7452 (Rome), 2.35GHz | 64 | 1TB | |
| htc-g[020-029] | Intel Silver 4210 (Cascade Lake), 2.20GHz | 20 | 256GB | |
| htc-g[030-040] | Intel Gold 5120 (Cascade Lake), 2.20GHz | 28 | 384GB | |
| htc-g[041-043] | Intel Silver 4112 (Cascade Lake), 2.60GHz | 8 | 192GB | |
| htc-g[044-049] | Intel E5-2698 v4 (Broadwell), 2.20GHz | 40 | 512GB | |
| htc-g[050-052] | Intel Silver 4208 (Cascade Lake), 2.10GHz | 16 | 128GB | HDR100 |
GPU Resources
ARC has a number of GPU nodes in the “htc” cluster.
Node GPU details are:
| Nodes | GPU type | GPUs per node | GPU memory | ECC | CUDA cores | CUDA compute capability | NVLink |
|---|---|---|---|---|---|---|---|
| htc-g[001-008] | V100 | 2 | 32GB | yes | 5120 | 7.0 | no |
| htc-g[009-014] | RTX8000 | 4 | 40GB | yes | 4608 | 7.5 | no |
| htc-g[015-019] | A100 | 4 | 40GB | yes | 6912 | 8.0 | no |
| htc-g[020-029] | Titan RTX | 4 | 24GB | no | 4608 | 7.5 | pairwise |
| htc-g[030-034] | P100 | 4 | 16GB | yes | 3584 | 6.0 | no |
| htc-g[035-036] | V100 | 4 | 16GB | yes | 5120 | 7.0 | no |
| htc-g[037-038] | V100 | 4 | 32GB | yes | 5120 | 7.0 | yes |
| htc-g[039-040] | V100 | 4 | 16GB | yes | 5120 | 7.0 | yes |
| htc-g[041-043] | Titan RTX | 4 | 24GB | yes | 4608 | 7.5 | pairwise |
| htc-g044 | V100 | 8 | 16GB | yes | 5120 | 7.0 | yes |
| htc-g[045-049] | V100-LS | 8 | 32GB | yes | 5120 | 7.0 | yes |
| htc-g[050-052] | RTXA6000 | 4 | 48GB | yes | 10,752 | 8.6 | yes |
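As noted earlier, jobs are only scheduled onto a GPU node when a GPU resource is requested. Under SLURM this is conventionally done with a `--gres` directive; the exact resource type strings defined on htc are not listed on this page, so the type name in the sketch below is a placeholder:

```shell
#!/bin/bash
# Illustrative GPU job for the htc cluster. --gres is standard SLURM
# syntax; the specific type string ("v100") is a hypothetical placeholder.
#SBATCH --partition=short
#SBATCH --gres=gpu:1             # request one GPU of any type
##SBATCH --gres=gpu:v100:1       # or, uncommented, request a specific GPU type
#SBATCH --time=01:00:00

nvidia-smi                       # report the GPU(s) allocated to the job
```

Without the `--gres` line, a job submitted to these partitions would run on a CPU-only node even if it launches GPU code.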
Storage
Our cluster systems share 2PB of high-performance GPFS storage; this holds per-cluster scratch file systems as well as project data storage.
On all nodes with HDR100 interconnect, project data storage is mounted natively; all other nodes access this storage via NFS.
Software
Users may find that the application they are interested in running has already been installed on at least one of the systems. Users are welcome to request the installation of new applications and libraries, or updates to already-installed applications, via our software request form.
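HPC systems commonly expose installed applications through an environment-modules system. Assuming ARC follows that convention (an assumption, not stated above), discovering and loading installed software on a login node might look like:

```shell
# Hypothetical session on a login node; package names are placeholders.
module avail            # list software installed on this cluster
module spider gromacs   # search for a specific package (Lmod systems only)
module load gromacs     # add the package to your environment
module list             # show currently loaded modules
```

Loading a module typically adjusts `PATH` and related environment variables so the application's binaries and libraries become available in the current shell and in batch jobs.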