NVIDIA DGX-1 CLUSTER

Introduction:

The NVIDIA DGX-1 is a deep learning system, architected for high throughput and high interconnect bandwidth to maximize neural network training performance. The core of the system is a complex of eight Tesla V100 GPUs connected in a hybrid cube-mesh NVLink network topology. In addition to the eight GPUs, DGX-1 includes two CPUs for boot, storage management, and deep learning framework coordination. DGX-1 is built into a three-rack-unit (3U) enclosure that provides power, cooling, networking, multi-system interconnect, and an SSD file system cache, balanced to optimize throughput and deep learning training time.

NVLink is an energy-efficient, high-bandwidth interconnect that enables NVIDIA GPUs to connect to peer GPUs or other devices within a node at an aggregate bi-directional bandwidth of up to 300 GB/s per GPU: over nine times that of current PCIe Gen3 x16 interconnections. The NVLink interconnect and the DGX-1 architecture’s hybrid cube-mesh GPU network topology enable the highest achievable data-exchange bandwidth between a group of eight Tesla V100 GPUs.
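For users on the system, the GPU peer topology can be inspected from ordinary user code. The following is a minimal sketch, assuming PyTorch is installed (a common framework on DGX systems; the exact software stack here may differ), that enumerates the visible GPUs and reports which pairs have direct peer-to-peer access. The check by itself does not distinguish NVLink from PCIe links; tools such as nvidia-smi topo -m report the actual link types.

# Minimal sketch (assumes PyTorch is available on the node): list the visible
# GPUs and report which pairs can access each other's memory directly.
# On a DGX-1 this is expected to show 8 Tesla V100 GPUs, with peer access
# typically reported for NVLink-connected pairs.
import torch

num_gpus = torch.cuda.device_count()   # expected: 8
print(f"Visible GPUs: {num_gpus}")

for i in range(num_gpus):
    peers = [j for j in range(num_gpus)
             if j != i and torch.cuda.can_device_access_peer(i, j)]
    print(f"GPU {i} ({torch.cuda.get_device_name(i)}): peer access to {peers}")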

Vendors:

OEM – NVIDIA Corporation

Authorized seller – LOCUZ Enterprise Solutions Ltd

Hardware Overview:

GPUs – 8 x Tesla V100
GPU Memory – 256 GB total system
CPU – Dual 20-core Intel Xeon E5-2698 v4 2.2 GHz
NVIDIA CUDA cores – 40,960
NVIDIA Tensor cores (on V100-based systems) – 5,120
System Memory – 512 GB 2,133 MHz DDR4 RDIMM
Storage – 4 x 1.92 TB SSD RAID-0
Network – Dual 10 GbE

Performance – 1 PFLOPS (mixed precision) 🔗 Read More

Tesla V100 GPU (NVLink) Performance | Single V100 GPU | Total (8 × V100 GPUs)
Double Precision | Up to 7.8 TFLOPS | Up to 62.4 TFLOPS
Single Precision | Up to 15.7 TFLOPS | Up to 125.6 TFLOPS
Deep Learning (Mixed Precision) | Up to 125 TFLOPS | Up to 1 PFLOPS
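The hardware figures above can be cross-checked from inside a session. Below is a minimal sketch, again assuming PyTorch is available (framework versions on the system may differ), that queries each GPU's name, memory, and streaming multiprocessor count rather than hard-coding the numbers from the tables.

# Minimal sketch (assumes PyTorch is available): query the visible GPUs and
# print the properties that correspond to the hardware overview above.
import torch

print("GPUs visible:", torch.cuda.device_count())      # expected: 8
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, "
          f"{props.total_memory / 2**30:.0f} GiB memory, "
          f"{props.multi_processor_count} SMs")         # V100: 80 SMs per GPU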
Software Overview:

Operating System – Ubuntu 16.04 (Linux x86_64 platform)

🔗Deep Learning Frameworks
🔗Software
🔗Job Submission System
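As a quick sanity check that the deep learning frameworks linked above can drive the V100 Tensor Cores, the following minimal sketch (assuming PyTorch is among the installed frameworks; substitute your framework of choice) runs a half-precision matrix multiply on GPU 0, the kind of operation behind the "Deep Learning (Mixed Precision)" figure.

# Minimal sketch (assumes PyTorch is installed): confirm the GPU supports
# Tensor Cores (compute capability 7.0 on V100) and run an FP16 matmul.
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
dev = torch.device("cuda:0")
print("Compute capability:", torch.cuda.get_device_capability(dev))  # (7, 0) on V100

a = torch.randn(4096, 4096, device=dev, dtype=torch.half)
b = torch.randn(4096, 4096, device=dev, dtype=torch.half)
c = a @ b                      # FP16 GEMM, eligible for Tensor Core execution
torch.cuda.synchronize()
print("Output shape:", tuple(c.shape))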
How to Use DGX-1:

Accessing the system:

The NVIDIA DGX-1 cluster has one login node, nvidia-dgx, through which users can access the cluster and submit jobs.
The machine is accessible for login using ssh from inside the IISc network:
ssh <computational_userid>@nvidia-dgx.serc.iisc.ac.in

The machine can be accessed after applying for basic HPC access, for which:

  • Fill in the online HPC application form here and submit it at Room 109, SERC.
  • The HPC application form must be duly signed by your Advisor/Research Supervisor.
Location of DGX-1 Cluster:

CPU Room – Ground Floor, SERC, IISc