Big Data Transfers to HPC

Overview

Researchers on the HSRN can use its high-bandwidth connections to speed up big data transfers to and from High Performance Computing (HPC) resources. Specifically, the Greene Data Transfer Node (gdtn.hpc.nyu.edu) is directly connected to the HSRN and should be used for high-bandwidth transfers. The filesystems available on this node are /home, /scratch, and /archive. More information can be found in the HPC documentation.

This guide compares the speeds of different file transfer methods, contrasting a device on NYU-NET with one on the HSRN.

Please note that big data transfers are typically far more I/O-bound than network-bound. If you do not see high performance, examine system statistics on both ends of the transfer to determine the bottleneck.
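One quick way to check whether local storage is the limiting factor is to time a sequential read of the file you plan to send. The sketch below is a minimal illustration using only the Python standard library; the function name read_throughput_mb_s and the chunk size are assumptions made for this example, not part of any HPC tooling. Note that a file that was read recently may be served from the operating system's page cache, which inflates the result.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read `path` sequentially and return the observed rate in MB/s (10^6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing interval on very small files.
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Demo on a small temporary file; replace with the real file to be transferred.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
    try:
        print(f"Sequential read: {read_throughput_mb_s(tmp.name):.1f} MB/s")
    finally:
        os.remove(tmp.name)
```

If the measured read rate is well below the network rate you expect, the disk rather than the network is the likely bottleneck.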

Globus

Globus is the preferred method of transferring large files to HPC. To use it, you would typically install a Globus Personal Endpoint on the source machine, then use the Globus web console to initiate the transfer from the source machine to HPC's Globus Server Endpoint. To find that endpoint, search for the collection nyu#greene and select the appropriate filesystem target. See the HPC documentation for Globus.

Simply select the files you wish to transfer in the web UI, then click Start. This file transfer is automatically parallelized. Additional options such as Encryption are available in the Transfer & Timer Options.

Below is an example of a transfer over the HSRN and its performance (~16 Gbps).

Task Type: TRANSFER
Status: SUCCEEDED
Source: hsrn-ed10d-7e12
Destination: Greene scratch directory
Label: n/a
Request Time: 2023-12-21 22:27:54.762225 (UTC)
Completion Time: 2023-12-21 22:28:49.309727 (UTC)
Files Transfered: 84
Directories Transfered: 56
Bytes Transfered: 113571284237
Effective Speed: 2082062057 Bytes per Second
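The Effective Speed in the task summary above can be reproduced from the request and completion timestamps and the byte count. The following sketch shows that arithmetic and the conversion from bytes per second to gigabits per second; the function name effective_speed is an assumption made for this example.

```python
from datetime import datetime

# Timestamp layout used in the task summary (UTC, microsecond precision).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def effective_speed(request_time, completion_time, total_bytes):
    """Return the average rate in bytes per second between two timestamps."""
    start = datetime.strptime(request_time, FMT)
    end = datetime.strptime(completion_time, FMT)
    return total_bytes / (end - start).total_seconds()


# Values copied from the HSRN task summary.
bytes_per_second = effective_speed(
    "2023-12-21 22:27:54.762225",
    "2023-12-21 22:28:49.309727",
    113571284237,
)
# 1 byte = 8 bits; 1 Gbps = 10^9 bits per second.
gbps = bytes_per_second * 8 / 1e9
print(f"{bytes_per_second:,.0f} bytes/s is about {gbps:.1f} Gbps")
```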

Over NYU-NET (on a much smaller single file), we see transfer rates of about 95 MB/s (around 95% slower).

Task Type: TRANSFER
Status: SUCCEEDED
Source: hsrn-dev10a-7e12
Destination: Greene scratch directory
Label: n/a
Request Time: 2024-03-06 18:53:51.108966 (UTC)
Completion Time: 2024-03-06 18:57:38.851705 (UTC)
Files Transfered: 1
Directories Transfered: 0
Bytes Transfered: 21510801277
Effective Speed: 94452194 Bytes per Second
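The relative difference between the two Effective Speed values reported in this section can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name percent_slower is an assumption made for this example.

```python
def percent_slower(slow_rate, fast_rate):
    """Return how much slower `slow_rate` is than `fast_rate`, as a percentage."""
    return (1 - slow_rate / fast_rate) * 100


# Effective speeds in bytes per second, copied from the two task summaries.
hsrn_rate = 2082062057
nyunet_rate = 94452194

slower = percent_slower(nyunet_rate, hsrn_rate)
speedup = hsrn_rate / nyunet_rate
print(f"NYU-NET is {slower:.1f}% slower; HSRN is {speedup:.1f}x faster")
```

Keep in mind that the two transfers used different datasets (many files versus a single file), so the comparison is indicative rather than exact.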

SCP

Secure Copy (SCP) is a standard file transfer tool that uses SSH to transfer files over a network. It can be used to move files around the HSRN securely. However, it is single-threaded and typically slower than other options.

An example command such as scp <FILE> <NETID>@gdtn.hpc.nyu.edu:/scratch/<NETID>

would yield a rate similar to the following:

iot_23_datasets_small.tar.gz              100% 8940MB 311.2MB/s   00:28    
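To put the rates in this guide in perspective, the sketch below estimates how long a transfer would take at each observed rate. The 1 TB dataset size is a hypothetical value chosen for illustration, the rates are the single measurements reported in this guide, and the scp figure assumes that MB means 10^6 bytes. Actual throughput will vary with disk and network load.

```python
def estimate_seconds(size_bytes, rate_bytes_per_second):
    """Return the estimated transfer time in seconds at a constant rate."""
    return size_bytes / rate_bytes_per_second


# Rates observed in this guide, in bytes per second.
observed_rates = {
    "Globus over HSRN": 2082062057,
    "scp over HSRN": 311.2e6,  # assumption: 1 MB = 10^6 bytes
    "Globus over NYU-NET": 94452194,
}

# Hypothetical dataset size for illustration: 1 TB = 10^12 bytes.
dataset_bytes = 1e12

for method, rate in observed_rates.items():
    minutes = estimate_seconds(dataset_bytes, rate) / 60
    print(f"{method}: about {minutes:.0f} minutes for 1 TB")
```

As a sanity check, the same function applied to the 8940 MB file at 311.2 MB/s gives roughly 28.7 seconds, consistent with the 00:28 shown in the scp output.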

If you are transferring data over WAN, avoid SCP and use encrypted Globus instead.