ARC Systems

At the centre of the ARC service are two high-performance compute clusters, arc and htc.

  • arc is designed for multi-node parallel computation

  • htc is designed for high-throughput operation (lower core count jobs).

htc is also the more heterogeneous system, offering different types of resources such as GPGPU computing and high-memory systems; nodes on arc are uniform. Users get access to both clusters automatically as part of obtaining an account with ARC, and can use either or both.
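As a rough illustration of the split described above, the following Python sketch suggests a cluster from the shape of a job. The function name `suggest_cluster` and its parameters are hypothetical, and the rule is a simplification of the guidance above rather than ARC policy.

```python
def suggest_cluster(nodes: int, needs_gpu: bool = False,
                    needs_high_memory: bool = False) -> str:
    """Return 'arc' or 'htc' for a job, following the guidance above.

    Hypothetical helper: the decision rule is an assumption made for
    illustration, not something the scheduler enforces.
    """
    if nodes < 1:
        raise ValueError("a job needs at least one node")
    # GPUs and high-memory nodes are only described for htc.
    if needs_gpu or needs_high_memory:
        return "htc"
    # Multi-node parallel jobs suit arc; jobs up to one node suit htc.
    return "arc" if nodes > 1 else "htc"


print(suggest_cluster(nodes=8))                  # arc
print(suggest_cluster(nodes=1))                  # htc
print(suggest_cluster(nodes=1, needs_gpu=True))  # htc
```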

For more detailed information on the hardware specifications of these clusters, see the tables below:

| Cluster | Description | Login Node | Compute Nodes | Minimum Job Size | Notes |
|---|---|---|---|---|---|
| arc | Our largest compute cluster. Optimised for large parallel jobs spanning multiple nodes. Scheduler prefers large jobs. Offers low-latency interconnect (Mellanox HDR 100). | arc-login | CPU: 48 core Cascade Lake (Intel Xeon Platinum 8268 CPU @ 2.90GHz); Memory: 392GB | 1 core | Non-blocking island size is 2112 cores |
| htc | Optimised for single core jobs, and SMP jobs up to one node in size. Scheduler prefers small jobs. Also caters for jobs requiring resources other than CPU cores (e.g. GPUs). | htc-login | CPUs: mix of Broadwell, Haswell, Cascade Lake; GPU: P100, V100, A100, RTX | 1 core | Jobs will only be scheduled onto a GPU node if requesting a GPU resource. |

Operating system

The ARC systems use the Linux operating system (specifically CentOS 8), which is commonly used in HPC. We do not have any HPC systems running Windows or macOS. If you are unfamiliar with Linux, please consider:

  • Finding introductory Linux resources online (through Google, Bing, Yahoo, etc.).

  • Working through our brief Introduction to Linux course.

  • Attending our Introduction to ARC training course (this does not teach you how to use Linux but the examples will help you gain a greater understanding).

Capability cluster (arc)

The capability system (cluster name arc) has a total of 305 48-core worker nodes, some of which are co-investment hardware. These machines are available for general use, but may be subject to job time limits and/or may occasionally be reserved for the exclusive use of the entity that purchased them.

The ARC system offers a total of 14,640 CPU cores.

All nodes have the following:

  • 2x Intel Platinum 8268 CPU. The Platinum 8268 is a 24-core 2.90GHz Cascade Lake CPU, so every node has 48 CPU cores.

  • 384GB memory

  • HDR 100 InfiniBand interconnect. The fabric has a 3:1 blocking factor with non-blocking islands of 44 nodes (2112 cores).

  • OS is CentOS Linux 8.1. Scheduler is SLURM.
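The core counts quoted above follow directly from the node counts. The short Python sketch below reproduces the arithmetic; the constant names and the `fits_in_island` helper are hypothetical, and whether a job actually lands inside one non-blocking island depends on scheduler placement, which this check does not model.

```python
NODES = 305              # worker nodes in the arc cluster
CORES_PER_NODE = 2 * 24  # two 24-core CPUs per node
ISLAND_NODES = 44        # nodes per non-blocking island

total_cores = NODES * CORES_PER_NODE
island_cores = ISLAND_NODES * CORES_PER_NODE


def fits_in_island(cores_requested: int) -> bool:
    """True if the request is no larger than one non-blocking island."""
    return 0 < cores_requested <= island_cores


print(total_cores)           # 14640
print(island_cores)          # 2112
print(fits_in_island(1920))  # True
print(fits_in_island(4800))  # False
```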

Login node for the system is ‘arc-login.arc.ox.ac.uk’, which allows logins from the University network range (including VPN).

The generally available partitions are:

| Partition | Nodes / cores | Node list | Default run time | Maximum run time |
|---|---|---|---|---|
| short | 293 / 14,064 | arc-c[001-293] | 1 hour | 12 hours |
| medium | 242 / 11,616 | arc-c[046-287] | 12 hours | 2 days |
| long | 242 / 11,616 | arc-c[046-287] | 1 day | unlimited |
| devel | 2 / 96 | arc-c[302-303] | 10 minutes | |
| interactive | 2 / 96 | arc-c[304-305] | 1 hour | 4 hours |
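To show how the run-time limits in the partition table above might be used, here is a minimal Python sketch that picks the first general-purpose partition whose maximum run time covers a requested walltime. The names `ARC_MAX_RUN_TIME` and `pick_partition` are hypothetical, and the devel and interactive partitions are deliberately left out because they serve a different purpose.

```python
from datetime import timedelta

# Maximum run times copied from the table above; None means unlimited.
# Insertion order (shortest limit first) matters for the search below.
ARC_MAX_RUN_TIME = {
    "short": timedelta(hours=12),
    "medium": timedelta(days=2),
    "long": None,
}


def pick_partition(walltime: timedelta) -> str:
    """Return the first partition whose limit covers the walltime."""
    if walltime <= timedelta(0):
        raise ValueError("walltime must be positive")
    for name, limit in ARC_MAX_RUN_TIME.items():
        if limit is None or walltime <= limit:
            return name
    raise ValueError("no partition accepts this walltime")


print(pick_partition(timedelta(hours=3)))   # short
print(pick_partition(timedelta(hours=30)))  # medium
print(pick_partition(timedelta(days=5)))    # long
```

Choosing the shortest suitable partition is only one possible policy; the table shows that short also spans more nodes than medium or long.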

Throughput cluster (htc)

The throughput system (cluster name htc) currently has 95 worker nodes, some of which are co-investment hardware. These machines are available for general use, but may be subject to job time limits and/or may occasionally be reserved for the exclusive use of the entity that purchased them. The hardware on the HTC system is more heterogeneous than on the ARC system.

49 of the nodes are GPGPU nodes. More information on how to access GPU nodes is available.

2 of the nodes are High Memory nodes with 3TB of RAM.

OS is CentOS Linux 8.1. Scheduler is SLURM.

Login node for the system is ‘htc-login.arc.ox.ac.uk’, which allows logins from the University network range (including VPN).
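Because the scheduler is SLURM and GPU nodes are only allocated to jobs that request a GPU resource, a job script for htc needs an explicit GPU request. The Python sketch below assembles the header of such a script as text. The function `sbatch_header` is hypothetical; `--clusters`, `--partition`, `--time` and `--gres` are standard `sbatch` options, but the generic resource name `gpu` is an assumption about the site configuration and should be checked against the ARC GPU documentation.

```python
def sbatch_header(partition: str, time_limit: str, gpus: int = 0,
                  cluster: str = "htc") -> str:
    """Build the #SBATCH header lines of a job script as one string.

    time_limit uses SLURM's HH:MM:SS form, e.g. "01:00:00".
    """
    if gpus < 0:
        raise ValueError("gpus cannot be negative")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --clusters={cluster}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Assumed GRES name "gpu"; without this line no GPU is requested.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    return "\n".join(lines) + "\n"


print(sbatch_header("short", "01:00:00", gpus=1))
```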

Details on the partitions are:

| Partition | Nodes / cores | GPUs | Node list | Default run time | Maximum run time |
|---|---|---|---|---|---|
| short | 93 / 3,716 | 76x V100, 16x A100, 24x RTX8000, 12x RTXA6000, 20x P100, 52x Titan RTX | htc-c[001-046], htc-g[001-006,009-018,020-038,041-052] | 1 hour | 12 hours |
| medium | 61 / 2,808 | 48x V100, 16x A100, 24x RTX8000 | htc-c[001-004,006-046], htc-g[009-018,044-049] | 12 hours | 2 days |
| long | 61 / 2,808 | 48x V100, 16x A100, 24x RTX8000 | htc-c[001-004,006-046], htc-g[009-018,044-049] | 1 day | unlimited |
| devel | 1 / 28 | 4x V100 | htc-g039 | 10 minutes | |
| interactive | 1 / 28 | 4x V100 | htc-g040 | 1 hour | 4 hours |
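The node lists in the partition table above use SLURM's bracketed range notation. As a minimal sketch of how that notation expands, the hypothetical Python function `expand_nodelist` below handles a single prefix with one bracket group, which covers the forms used here; it is not a full implementation of SLURM's hostlist syntax. On a SLURM system, `scontrol show hostnames` performs the same expansion.

```python
import re
from typing import List

# One name prefix, optionally followed by [ranges], e.g. htc-g[001-006,009-018]
_NODELIST = re.compile(r"^([\w-]+?)(?:\[([\d,\-]+)\])?$")


def expand_nodelist(expr: str) -> List[str]:
    """Expand e.g. 'htc-g[001-003,009]' into individual node names."""
    match = _NODELIST.match(expr)
    if match is None:
        raise ValueError(f"unsupported node expression: {expr!r}")
    prefix, ranges = match.groups()
    if ranges is None:  # a single host such as 'htc-g039'
        return [prefix]
    names = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        end = end or start   # '009' alone is a one-node range
        width = len(start)   # preserve zero padding
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}")
    return names


short_nodes = (expand_nodelist("htc-c[001-046]")
               + expand_nodelist("htc-g[001-006,009-018,020-038,041-052]"))
print(len(short_nodes))  # 93
```

The count agrees with the 93 nodes listed for the short partition.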

Node CPU details are:

| Nodes | CPU | Cores per node | Memory per node | Interconnect |
|---|---|---|---|---|
| htc-c[005-006] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 96 | 3TB | HDR100 |
| htc-c[007-046] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 48 | 384GB | |
| htc-c047 | Intel E7-8860v3 (Haswell), 2.60GHz | 128 | 6TB | |
| htc-g[001-018] | Intel Platinum 8268 (Cascade Lake), 2.90GHz | 48 | 384GB | HDR100 |
| htc-g019 | AMD Epyc 7452 (Rome), 2.35GHz | 64 | 1TB | |
| htc-g[020-029] | Intel Silver 4210 (Cascade Lake), 2.20GHz | 20 | 256GB | |
| htc-g[030-040] | Intel Gold 5120 (Cascade Lake), 2.20GHz | 28 | 384GB | |
| htc-g[041-043] | Intel Silver 4112 (Cascade Lake), 2.60GHz | 8 | 192GB | |
| htc-g[044-049] | Intel E5-2698 v4 (Broadwell), 2.20GHz | 40 | 512GB | |
| htc-g[050-052] | Intel Silver 4208 (Cascade Lake), 2.10GHz | 16 | 128GB | HDR100 |

GPU Resources

ARC has a number of GPU nodes in the “htc” cluster.

Node GPU details are:

| Nodes | GPUs | #GPUs | GPU memory | ECC | CUDA cores | CUDA compute capability | NVLink |
|---|---|---|---|---|---|---|---|
| htc-g[001-008] | V100 | 2 | 32GB | yes | 5120 | 7.0 | no |
| htc-g[009-014] | RTX8000 | 4 | 40GB | yes | 4608 | 7.5 | no |
| htc-g[015-019] | A100 | 4 | 40GB | yes | 6912 | 8.6 | no |
| htc-g[020-029] | Titan RTX | 4 | 24GB | no | 4606 | 7.5 | pairwise |
| htc-g[030-034] | P100 | 4 | 16GB | yes | 3584 | 6.0 | no |
| htc-g[035-036] | V100 | 4 | 16GB | yes | 5120 | 7.0 | no |
| htc-g[037-038] | V100 | 4 | 32GB | yes | 5120 | 7.0 | yes |
| htc-g[039-040] | V100 | 4 | 16GB | yes | 5120 | 7.0 | yes |
| htc-g[041-043] | Titan RTX | 4 | 24GB | yes | 4606 | 7.5 | pairwise |
| htc-g044 | V100 | 8 | 16GB | yes | 5120 | 7.0 | yes |
| htc-g[045-049] | V100-LS | 8 | 32GB | yes | 5120 | 7.0 | yes |
| htc-g[050-052] | RTXA6000 | 4 | 48GB | yes | 10,752 | 8.6 | yes |
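When a job has a minimum GPU memory or GPU count requirement, the GPU table above can be searched for matching node groups. The Python sketch below copies the node names, GPU type, GPU count and GPU memory columns exactly as listed; the `GpuNodes` class and the `find_gpu_nodes` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GpuNodes:
    nodes: str       # node range as written in the table
    gpu: str         # GPU model
    count: int       # GPUs per node
    memory_gb: int   # memory per GPU


# Values copied from the GPU table above.
GPU_TABLE = [
    GpuNodes("htc-g[001-008]", "V100", 2, 32),
    GpuNodes("htc-g[009-014]", "RTX8000", 4, 40),
    GpuNodes("htc-g[015-019]", "A100", 4, 40),
    GpuNodes("htc-g[020-029]", "Titan RTX", 4, 24),
    GpuNodes("htc-g[030-034]", "P100", 4, 16),
    GpuNodes("htc-g[035-036]", "V100", 4, 16),
    GpuNodes("htc-g[037-038]", "V100", 4, 32),
    GpuNodes("htc-g[039-040]", "V100", 4, 16),
    GpuNodes("htc-g[041-043]", "Titan RTX", 4, 24),
    GpuNodes("htc-g044", "V100", 8, 16),
    GpuNodes("htc-g[045-049]", "V100-LS", 8, 32),
    GpuNodes("htc-g[050-052]", "RTXA6000", 4, 48),
]


def find_gpu_nodes(min_memory_gb: int = 0, min_gpus: int = 1) -> List[GpuNodes]:
    """Return table rows meeting both per-GPU memory and GPU count minimums."""
    return [row for row in GPU_TABLE
            if row.memory_gb >= min_memory_gb and row.count >= min_gpus]


for row in find_gpu_nodes(min_memory_gb=32, min_gpus=8):
    print(row.nodes, row.gpu)  # htc-g[045-049] V100-LS
```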

Storage

Our cluster systems share 2PB of high-performance GPFS storage; this holds per-cluster scratch file systems as well as project data storage.

On all nodes with HDR100 interconnect, project data storage is mounted natively; all other nodes access this storage via NFS.

Software

Users may find that the application they are interested in running has already been installed on at least one of the systems. Users are welcome to request the installation of new applications and libraries, or updates to already installed applications, via our software request form.