Once your account has been created, you will immediately have access to two persistent storage areas:
$HOME (/home/username) with a 15GB quota
$DATA (/data/projectname/username) sharing a 5TB quota with your project colleagues
Additionally, when you run a SLURM job, a per-job $SCRATCH and $TMPDIR for temporary data/workfiles is created.
$TMPDIR is local to a compute node
$SCRATCH is on a shared file system and available to all nodes in a job, if a job spans multiple nodes.
Using ARC $SCRATCH storage
Note: Both $SCRATCH and $TMPDIR are not persistent; they will be automatically removed on job exit. It is important that your job copies all files into your $DATA area before it exits; we will not be able to recover your data if you left it on $SCRATCH or $TMPDIR once a job finished.
As a rule we recommend that you use your $DATA area to store your data, but utilise the per job $SCRATCH or $TMPDIR area - especially for intermediate or temporary files. Generally you would copy all required input data at the start of your job and then copying results back to your $DATA area.
A simple example of how to do this would be:
#!/bin/bash # # After SBATCH lines in submission script... # # cd $SCRATCH || exit 1 # # Copy job data to $SCRATCH # rsync -av $DATA/myproject/input ./ rsync -av $DATA/myproject/bin ./ # # Job specific lines... # module load foss/2020b mpirun ./bin/my_software # # Copy data back from $SCRATCH to $DATA/myproject directory # rsync -av --exclude=input --exclude=bin ./ $DATA/myproject/
This example copies the directories
$DATA/myproject/bin into $SCRATCH (which will then contain directories
bin). The script then runs
./bin/my_software; and copies all files in the $SCRATCH directory - excluding directories
bin - back to
$DATA/myproject/ once the
The process is more straightforward if you only need to copy single input/ouput files when the application is centrally hosted, for example:
#!/bin/bash # # After SBATCH lines in submission script... # # # Load appropriate module, in this test case we are using Gaussian module load Gaussian/16.C.01 # Set input/output filenames # export INPUT_FILE=test397.com export OUTPUT_FILE=test397.log echo "copying input from $SLURM_SUBMIT_DIR/$INPUT_FILE ..." cd $SCRATCH || exit 1 cp $SLURM_SUBMIT_DIR/$INPUT_FILE ./ # Job specific application command line... # g16 < $INPUT_FILE > $OUTPUT_FILE # Copy output back from $SCRATCH to job directory # echo "copying output back to $SLURM_SUBMIT_DIR/ ..." cp $OUTPUT_FILE $SLURM_SUBMIT_DIR/
If you are unable to access either of these directories, please let us know.
By default your $HOME area will have a 15GB quota while the $DATA area will have a 5TB quota that is shared between yourself and the other members of your project.
To check your quota use the command:
This command will list both your home quota and the quota of shared project data areas that you are a member of.
We can provide more detailed statements of data area quota usage to project leaders on request.
Larger Data quotas (more than 5TB) are available on request as a chargeable service. Please contact ARC support for further information.
If you are a user of Anaconda virtual environments and find yourself over quota in $HOME, please check your conda package cache size. Information on this can be found here: Anaconda Package Cache
We do NOT currently create backups of the ARC shared file system (although the file system IS resilient to failures). We therefore strongly encourage you to keep copies of your files elsewhere, particularly when that data is critical to your research.
Snapshots have been configured to be generated on home directories. Snapshots provide easy access to older versions of files. This is useful if files have been accidentally deleted or overwritten. It does not, however, constitute a backup; old snapshots will not be kept indefinitely (max. two weeks for weekly snapshots).
Within your home directory, there is a .snapshot directory which contains the hourly, daily and weekly snapshots available. To list/examine the snapshots, simply ‘cd’ into $HOME/.snapshot and list the available directories:
cd $HOME/.snapshot ls -1tr
You will see a listing of all snapshots (reverse order, i.e. newest last):
weekly.2020-08-02_0015 weekly.2020-07-26_0015 daily.2020-08-06_0010 hourly.2020-08-07_1105 hourly.2020-08-07_1005 hourly.2020-08-07_0905 daily.2020-08-07_0010 hourly.2020-08-07_1305 hourly.2020-08-07_1205 hourly.2020-08-07_1405
To choose a particular snapshot, simply change into the relevant directory:
Within those directories you will essentially find a copy of your home directory as it was when the snapshot was taken.
If you’ve accidentally deleted a file in your home directory which existed earlier than the last snapshot, then you can retrieve the older copy from the snapshot. Simply find the version of the file you are after within the .snapshot structure, and copy it back into your home directory.
For example - assuming you have deleted a file ‘ARC-Introduction-2018-Hilary.pptx’ from folder $HOME/Documents by mistake. To recover it, the steps would be:
[$(arcus) Documents]$ pwd /home/ouit0622/Documents [$(arcus) Documents]$ ls -1 ARC-Introduction-2018-Hilary.pptx arc_job_submission_exercises arc_presentation MATLAB [$(arcus) Documents]$ rm ARC-Introduction-2018-Hilary.pptx [$(arcus) Documents]$ ls -1 arc_job_submission_exercises arc_presentation MATLAB [$(arcus) Documents]$ cd $HOME/.snapshot/ [$(arcus) .snapshot]$ ls -1tr weekly.2020-08-02_0015 weekly.2020-07-26_0015 daily.2020-08-06_0010 hourly.2020-08-07_1105 hourly.2020-08-07_1005 hourly.2020-08-07_0905 daily.2020-08-07_0010 hourly.2020-08-07_1305 hourly.2020-08-07_1205 hourly.2020-08-07_1405 [$(arcus) .snapshot]$ cd hourly.2020-08-07_1405 [$(arcus) hourly.2020-08-07_1405]$ pwd /home/ouit0622/.snapshot/hourly.2020-08-07_1405 [$(arcus) hourly.2020-08-07_1405]$ cd Documents [$(arcus) Documents]$ ls -1 ARC-Introduction-2018-Hilary.pptx arc_job_submission_exercises arc_presentation MATLAB [$(arcus) Documents]$ cp ARC-Introduction-2018-Hilary.pptx $HOME/Documents [$(arcus) Documents]$ $HOME/Documents/ [$(arcus) Documents]$ pwd /home/ouit0622/Documents [$(arcus) Documents]$ ls -1 ARC-Introduction-2018-Hilary.pptx arc_job_submission_exercises arc_presentation MATLAB
Note: Snapshots do not take up space in the file system, i.e. they do not count towards your quota. If you are trying to determine where in your home directory space is used, you must exclude the .snapshot directory from your commands as otherwise the information would be incorrect.