Installation of BG_Flood
Warning
BG_Flood is written in CUDA, a C++-based language created by NVIDIA to interact directly with their GPUs. Although the code can run on a CPU (for testing purposes, for example), it only performs well on NVIDIA GPUs. The best performance is observed on large NVIDIA GPUs on supercomputers.
The code has only two dependencies:
- CUDA
- netcdf
Windows 10 - 11
On Windows, you should be able to use the binaries/executable we make available with each release. Simply download and unzip the file into a suitable directory, then either add that folder to your PATH or move the DLLs and .exe to wherever you want to run the model (you might need to unblock the executable before first use).
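For example, from a command prompt in the folder containing the executable and DLLs (assuming the executable is named `BG_Flood.exe`; the parameter file name matches the run examples later on this page):

```
BG_Flood.exe BG_param.txt
```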
Build from source
To build BG_Flood from source on Windows you will need the following pre-installed:
- Visual Studio Community with C++ component installed
- A compatible [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
- Downloaded/cloned/forked source of the repo
- Netcdf developer install (i.e. netcdf.h and netcdf.lib)
Setup on Visual Studio
- start a new empty project
- add CUDA build dependencies to the project
- add NetCDF folder(s) to the include and library directories in the project properties
- add "netcdf.lib" to the input (Properties -> Linker -> Input)
- switch the "Generate Relocatable device code" to Yes (Properties -> CUDA C/C++ -> Common)
- disable deprecation warnings by adding _CRT_SECURE_NO_WARNINGS to the preprocessor definitions (Properties -> C/C++ -> Preprocessor)
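For reference, these project settings correspond roughly to the following nvcc options when building from a command prompt; this is only a sketch, and the NetCDF paths, source file list and output name are placeholders rather than the project's actual layout:

```
nvcc -rdc=true -D_CRT_SECURE_NO_WARNINGS ^
  -I"C:\netCDF\include" -L"C:\netCDF\lib" -lnetcdf ^
  <your .cu and .cpp source files> -o BG_Flood.exe
```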
Linux
Make sure you have the latest CUDA Toolkit, g++ and the NetCDF libraries installed.
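For example, on a Debian/Ubuntu-based system the compiler and the NetCDF development files can be installed from the package manager (package names below are for Debian/Ubuntu; CUDA itself is best installed from NVIDIA's own repositories to get a recent toolkit):

```bash
sudo apt-get install g++ libnetcdf-dev
```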
Note
Make sure the GPU driver being used is the Nvidia driver!
Do a quick command-line test to see if nvcc (the CUDA compiler) is available from your shell.
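For example:

```bash
nvidia-smi       # confirms the NVIDIA driver is active and lists the GPU(s)
which nvcc       # checks that the CUDA compiler is on your PATH
nvcc --version   # reports the CUDA Toolkit version
```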
If not, you may need to modify the CUDA path in the makefile (line 155):
Warning
The code can be compiled for multiple GPU architectures, but newer compilers do not support old GPUs (compute capability 2.0 is no longer supported). If needed, remove unsupported architectures at line 213 of the makefile.
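For reference, the architecture list is a set of `-gencode` flags passed to nvcc, typically of the form below (the variable name and the compute capabilities are illustrative only; check the actual list in the makefile):

```make
NVCCFLAGS += -gencode arch=compute_60,code=sm_60 \
             -gencode arch=compute_70,code=sm_70 \
             -gencode arch=compute_80,code=sm_80
```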
Then just type
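the standard makefile build command (assuming the default target builds the executable; add `-j` for a parallel build):

```bash
make
```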
Success
Many warnings will show up, but that is OK.
Supercomputers
The code can be run on local machines with an NVIDIA GPU, but it will get better performance on large GPUs. Below are examples of installation and running procedures on the HPC systems the developers used.
ESNZ supercomputer: Cascade
This machine is set up using Spack, and all tools need to be installed through it before compiling/running the code.
The PBS job manager is used (see `man pbs` for more information).
Compiling the code
```bash
. $(ls /opt/niwa/profile/spack_* | tail -1)
spack load netcdf-c@4.9.2%gcc@11.5.0 cuda@12.8.0
nclib=`nc-config --libdir`
export LD_LIBRARY_PATH="${nclib}:$LD_LIBRARY_PATH"
```
Note
The executable is copied to a `bin` directory two folders up: `cp BG_Flood ../../bin/x86_64/linux/release`. Check that you have write permission there!
Note
`spack load` doesn't set LD_LIBRARY_PATH, so the executable won't find libnetcdf at run time. It also doesn't set `LDFLAGS=-Wl,-rpath` (and the makefile doesn't honour LDFLAGS anyway), so the libnetcdf path isn't linked into the binary. Hence the LD_LIBRARY_PATH export above as a workaround for now.
Running the code
```bash
#!/bin/bash
#PBS -N *my_pbs_job_name*
#PBS -l select=1:ncpus=1:ngpus=1:mem=32gb:nodepool=a100p
#PBS -l walltime=01:00:00
#PBS -q a100q
#PBS -W umask=027

# Change to running directory if required
cd *my_case_dir*

# Loading needed packages
. $(ls /opt/niwa/profile/spack_* | tail -1)
spack load netcdf-c@4.9.2%gcc@11.5.0 cuda@12.8.0
nclib=`nc-config --libdir`
export LD_LIBRARY_PATH="${nclib}:$LD_LIBRARY_PATH"

# Launch of the solver
./BG_Flood BG_param.txt
```
Basic PBS commands
Based on the NASA HECC website.
The four most commonly used PBS commands, `qsub`, `qstat`, `qdel`, and `qhold`, are briefly described below. See `man pbs` for a list of all PBS commands.
qsub
To submit a batch job to the specified queue using a script:
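The general form, and a concrete example matching the GPU job script above, would be (the script name is a placeholder):

```bash
# general form
qsub -q queue_name -l resource_list job_script

# with the queue and resources already set in the script header, as in the example above
qsub run_BG_Flood.pbs
```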
There is only one GPU queue for the moment. The `resource_list` typically specifies the number of nodes, CPUs, amount of memory and wall time needed for this job.
See `man pbs_resources` for more information on what resources can be specified.
Note
If `-l resource_list` is omitted, the default resources for the specified queue are used. When `queue_name` is omitted, the job is routed to the default queue, which is the normal queue.
qstat
To display queue information:
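For example (the `queue_name` argument is optional in each case):

```bash
qstat -q     # summary listing of the queues
qstat -Q     # one-line status for each queue
qstat -fQ    # full details for each queue
```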
Each option uses a different format to display all of the available queues, their constraints and status. The `queue_name` is optional.
To display job status:
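A few typical invocations, using the flags described in the table below (the job ID is a placeholder):

```bash
qstat -a                # all jobs, whatever their state
qstat -u $USER          # only your own jobs
qstat -nr               # execution hosts of the running jobs
qstat -f 12345          # detailed information for one job
```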
| Flag | Description |
|---|---|
| `a` | Display all jobs in any status (running, queued, held) |
| `r` | Display all running or suspended jobs |
| `n` | Display the execution hosts of the running jobs |
| `i` | Display all queued, held or waiting jobs |
| `u username` | Display jobs that belong to the specified user |
| `s` | Display any comment added by the administrator or scheduler; typically used to find clues as to why a job has not started running |
| `f job_id` | Display detailed information about a specific job |
| `xf job_id` / `xu user_id` | Display status information for finished jobs (within the past 7 days) |
Tip
Some of these flags can be combined when you are checking the job status.
qdel
To delete (cancel) a job:
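For example, using the job identifier returned by `qsub` (and listed by `qstat`):

```bash
qdel job_id
```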
qhold
To hold a job:
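For example (the hold can later be released with `qrls`, as noted below):

```bash
qhold job_id
```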
Only the job owner or a system administrator with "su" or "root" privilege can place a hold on a job. The hold can be released using the `qrls` command.
For more detailed information on each command, see their corresponding man pages.
NESI (Maui-Mahuika)
Deprecated
The NeSI supercomputers have now been decommissioned and replaced by REANNZ's new generation of machines.
The code previously ran on the New Zealand eScience Infrastructure (NeSI). This national centre used a module system together with the Slurm job manager.
Compiling the code
The code needs to be compiled on the machine, using the sources from the GitHub repository. Due to the code's dependency on CUDA and NetCDF, two modules need to be loaded:
- On Maui:
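For example (module versions taken from the Maui job script below):

```bash
module load CUDA/11.4.1
module load netCDF-C++4/4.3.0-GCC-7.1.0
```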
- On Mahuika:
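For example (module versions taken from the Mahuika job script below):

```bash
module load netCDF-C++4/4.3.1-gimpi-2020a
module load CUDA/11.4.1
```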
Running the code
- Example of a Slurm file on Maui:
```bash
#!/bin/bash
#SBATCH --job-name=MY-TEST-NAME
#SBATCH --time=8:00:00
#SBATCH --account=MY-NESI-ACCOUNT
#SBATCH --partition=nesi_gpu
#SBATCH --gres=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=5GB

# Running directory (to be completed)
BGFLOOD=/nesi/project/XXXXXXXXXXXXXXX
cd ${BGFLOOD}

module load CUDA/11.4.1
module load netCDF-C++4/4.3.0-GCC-7.1.0

# Launching the executable
srun ./BG_Flood_Maui

echo "output_file = Output/${testname}/BGoutput-${reftime}.nc"
echo "end of setup_run_BG.sh"
```
- Example of a Slurm file on Mahuika:
```bash
#!/bin/bash
#SBATCH --job-name=MY-TEST-NAME
#SBATCH --time=05:00:00
#SBATCH --account=MY-NESI-ACCOUNT
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB

# Running directory (to be completed)
BGFLOOD=/nesi/project/XXXXXXXXXXXXXXX
cd ${BGFLOOD}

#module load netCDF-C++4/4.3.0-gimkl-2017a
module load netCDF-C++4/4.3.1-gimpi-2020a
module load CUDA/11.4.1

# Launching the executable
srun ./BG_Flood_Mahuika

echo "output_file = Output/${testname}/BGoutput-${reftime}.nc"
echo "end of setup_run_BG.sh"
```