Installation of BG_Flood
Warning
BG_Flood is written in CUDA, a C++-based language created by NVIDIA to interact directly with their GPUs. Although the code can run on a CPU (for testing purposes, for example), it only performs well on NVIDIA GPUs. The best performance is observed on large NVIDIA GPUs on supercomputers.
The code has only two dependencies:
- CUDA
- netcdf
Windows 10 - 11
On Windows, you should be able to use the binaries/executable we make available with each release. Simply download and unzip the file into a suitable directory, then either add that folder to your PATH or move the DLLs and .exe to wherever you want to run the model (you might need to unblock the executable before first use).
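For example, from a command prompt in the folder containing the executable and DLLs (assuming the executable is named `BG_Flood.exe`; the parameter file name matches the run examples later on this page):

```
BG_Flood.exe BG_param.txt
```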
Build from source
To build BG_Flood from source on Windows you will need the following pre-installed:
- Visual Studio Community with C++ component installed
- A compatible [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
- Downloaded/cloned/forked source of the repo
- Netcdf developer install (i.e. netcdf.h and netcdf.lib)
Setup on Visual Studio
- start a new empty project
- add CUDA build dependencies to the project
- add NetCDF folder(s) to the include and library directories in the project properties
- add "netcdf.lib" to the input (Properties -> Linker -> Input)
- switch the "Generate Relocatable device code" to Yes (Properties -> CUDA C/C++ -> Common)
- disable deprecation warnings by adding _CRT_SECURE_NO_WARNINGS to the preprocessor definitions (Properties -> C/C++ -> Preprocessor)
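For reference, these project settings correspond roughly to the following nvcc options when building from a command prompt; this is only a sketch, and the NetCDF paths, source file list and output name are placeholders rather than the project's actual layout:

```
nvcc -rdc=true -D_CRT_SECURE_NO_WARNINGS ^
  -I"C:\netCDF\include" -L"C:\netCDF\lib" -lnetcdf ^
  <your .cu and .cpp source files> -o BG_Flood.exe
```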
Linux
Make sure you have the latest CUDA Toolkit, g++ and the NetCDF libraries installed.
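For example, on a Debian/Ubuntu-based system the compiler and the NetCDF development files can be installed from the package manager (package names below are for Debian/Ubuntu; CUDA itself is best installed from NVIDIA's own repositories to get a recent toolkit):

```bash
sudo apt-get install g++ libnetcdf-dev
```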
Note
Make sure the GPU driver being used is the Nvidia driver!
Do a quick command-line test to see if nvcc (the CUDA compiler) is available from your shell.
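For example:

```bash
nvidia-smi       # confirms the NVIDIA driver is active and lists the GPU(s)
which nvcc       # checks that the CUDA compiler is on your PATH
nvcc --version   # reports the CUDA Toolkit version
```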
If not, you may need to modify the CUDA path in the makefile (line 155):
Warning
The code can be compiled for multiple GPU architectures, but newer compilers do not support old GPUs (compute capability 2.0 is no longer supported). If needed, remove unsupported architectures at line 213 of the makefile.
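For reference, the architecture list is a set of `-gencode` flags passed to nvcc, typically of the form below (the variable name and the compute capabilities are illustrative only; check the actual list in the makefile):

```make
NVCCFLAGS += -gencode arch=compute_60,code=sm_60 \
             -gencode arch=compute_70,code=sm_70 \
             -gencode arch=compute_80,code=sm_80
```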
Then just type
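the standard makefile build command (assuming the default target builds the executable; add `-j` for a parallel build):

```bash
make
```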
Success
Many warnings will show up, but that is OK.
Supercomputers
The code can be run on local machines with an NVIDIA GPU, but it will get better performance on large GPUs. Below are examples of installation and running procedures on the HPC systems the developers used.
ESNZ supercomputer: Cascade
This machine is set up using Spack, and all tools need to be installed through it before compiling/running the code.
The PBS job manager is used (see `man pbs` for more information).
Compiling the code
```bash
. $(ls /opt/niwa/profile/spack_* | tail -1)
spack load netcdf-c@4.9.2%gcc@11.5.0 cuda@12.8.0
nclib=`nc-config --libdir`
export LD_LIBRARY_PATH="${nclib}:$LD_LIBRARY_PATH"
```
Note
The executable is copied to a `bin` directory two folders up: `cp BG_Flood ../../bin/x86_64/linux/release`. Check that you have write permission there!
Note
`spack load` doesn't set LD_LIBRARY_PATH, so the executable won't find libnetcdf at run time. It also doesn't set `LDFLAGS=-Wl,-rpath` (and the makefile doesn't honour LDFLAGS anyway), so the libnetcdf path isn't linked into the binary. Hence the LD_LIBRARY_PATH export above as a workaround for now.
Running the code
```bash
#!/bin/bash
#PBS -N *my_pbs_job_name*
#PBS -l select=1:ncpus=1:ngpus=1:mem=32gb:nodepool=a100p
#PBS -l walltime=01:00:00
#PBS -q a100q
#PBS -W umask=027

# Change to running directory if required
cd *my_case_dir*

# Loading needed packages
. $(ls /opt/niwa/profile/spack_* | tail -1)
spack load netcdf-c@4.9.2%gcc@11.5.0 cuda@12.8.0
nclib=`nc-config --libdir`
export LD_LIBRARY_PATH="${nclib}:$LD_LIBRARY_PATH"

# Launch of the solver
./BG_Flood BG_param.txt
```
Basic PBS commands
Based on the NASA HECC website.
The four most commonly used PBS commands, `qsub`, `qstat`, `qdel`, and `qhold`, are briefly described below. See `man pbs` for a list of all PBS commands.
qsub
To submit a batch job to the specified queue using a script:
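The general form, and a concrete example matching the GPU job script above, would be (the script name is a placeholder):

```bash
# general form
qsub -q queue_name -l resource_list job_script

# with the queue and resources already set in the script header, as in the example above
qsub run_BG_Flood.pbs
```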
There is only one GPU queue for the moment. The `resource_list` typically specifies the number of nodes, CPUs, amount of memory and wall time needed for this job.
See `man pbs_resources` for more information on what resources can be specified.
Note
If `-l resource_list` is omitted, the default resources for the specified queue are used. When `queue_name` is omitted, the job is routed to the default queue, which is the normal queue.
qstat
To display queue information:
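For example (the `queue_name` argument is optional in each case):

```bash
qstat -q     # summary listing of the queues
qstat -Q     # one-line status for each queue
qstat -fQ    # full details for each queue
```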
Each option uses a different format to display all of the available queues, their constraints and status. The `queue_name` is optional.
To display job status:
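A few typical invocations, using the flags described in the table below (the job ID is a placeholder):

```bash
qstat -a                # all jobs, whatever their state
qstat -u $USER          # only your own jobs
qstat -nr               # execution hosts of the running jobs
qstat -f 12345          # detailed information for one job
```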
| Flag | Description |
|---|---|
| `a` | Display all jobs in any status (running, queued, held) |
| `r` | Display all running or suspended jobs |
| `n` | Display the execution hosts of the running jobs |
| `i` | Display all queued, held or waiting jobs |
| `u username` | Display jobs that belong to the specified user |
| `s` | Display any comment added by the administrator or scheduler; typically used to find clues as to why a job has not started running |
| `f job_id` | Display detailed information about a specific job |
| `xf job_id` / `xu user_id` | Display status information for finished jobs (within the past 7 days) |
Tip
Some of these flags can be combined when you are checking the job status.
qdel
To delete (cancel) a job:
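For example, using the job identifier returned by `qsub` (and listed by `qstat`):

```bash
qdel job_id
```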
qhold
To hold a job:
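For example (the hold can later be released with `qrls`, as noted below):

```bash
qhold job_id
```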
Only the job owner or a system administrator with "su" or "root" privilege can place a hold on a job. The hold can be released using the `qrls` command.
For more detailed information on each command, see their corresponding man pages.
NESI (Maui-Mahuika)
Deprecated
The NeSI supercomputers have now been decommissioned and replaced by REANNZ's new generation of machines.
The code previously ran on the New Zealand eScience Infrastructure (NeSI). This national centre used a module system together with the Slurm job manager.
Compiling the code
The code needs to be compiled on the machine, using the sources from the GitHub repository. Due to the code's dependency on CUDA and NetCDF, two modules need to be loaded:
- On Maui:
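For example (module versions taken from the Maui job script below):

```bash
module load CUDA/11.4.1
module load netCDF-C++4/4.3.0-GCC-7.1.0
```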
- On Mahuika:
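For example (module versions taken from the Mahuika job script below):

```bash
module load netCDF-C++4/4.3.1-gimpi-2020a
module load CUDA/11.4.1
```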
Running the code
- Example of a Slurm file on Maui:
```bash
#!/bin/bash
#SBATCH --job-name=MY-TEST-NAME
#SBATCH --time=8:00:00
#SBATCH --account=MY-NESI-ACCOUNT
#SBATCH --partition=nesi_gpu
#SBATCH --gres=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=5GB

# Running directory (to be completed)
BGFLOOD=/nesi/project/XXXXXXXXXXXXXXX
cd ${BGFLOOD}

module load CUDA/11.4.1
module load netCDF-C++4/4.3.0-GCC-7.1.0

# Launching the executable
srun ./BG_Flood_Maui

echo "output_file = Output/${testname}/BGoutput-${reftime}.nc"
echo "end of setup_run_BG.sh"
```
- Example of a Slurm file on Mahuika:
```bash
#!/bin/bash
#SBATCH --job-name=MY-TEST-NAME
#SBATCH --time=05:00:00
#SBATCH --account=MY-NESI-ACCOUNT
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB

# Running directory (to be completed)
BGFLOOD=/nesi/project/XXXXXXXXXXXXXXX
cd ${BGFLOOD}

#module load netCDF-C++4/4.3.0-gimkl-2017a
module load netCDF-C++4/4.3.1-gimpi-2020a
module load CUDA/11.4.1

# Launching the executable
srun ./BG_Flood_Mahuika

echo "output_file = Output/${testname}/BGoutput-${reftime}.nc"
echo "end of setup_run_BG.sh"
```