Benefits of Cloud GPU over Consumer GPU
24/7 Uptime
Guaranteed quality in a server
Cloud GPU: Server-Grade Engineering
> Lower operating voltage for long-term reliability
> Zero-error testing at aggressive clocks
> Error Correction Code (ECC) for data integrity
Consumer GPU: PC-Grade Engineering
Reduced thermal stress for uncompromised reliability
Cloud GPU: Forced Air Cooling Design
> Designed for maximum airflow in a server
> Lower GPU temperature for reliability
> Lower power consumption
Consumer GPU: Active Fan Design
> Fan works against server airflow
> GPU runs 30-40% hotter, increasing the failure rate
> Higher power consumption
Higher data center availability and serviceability
Cloud GPU: Dynamic Page Retirement
> Monitors memory and retires bad pages with a simple reboot
Consumer GPU: N/A; a GPU with bad memory must be physically removed
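As a rough illustration of the Dynamic Page Retirement row, the sketch below reads the retired-page counters through `nvidia-smi`. It assumes an NVIDIA driver is installed and that the `retired_pages.*` query fields are available on the board in question. Boards without the feature usually report a non-numeric placeholder, which the parser maps to `None`. The helper names `parse_retired_pages` and `query_retired_pages` are hypothetical.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`; availability depends
# on the driver version and on whether the board supports page retirement.
FIELDS = [
    "index",
    "retired_pages.single_bit_ecc.count",
    "retired_pages.double_bit.count",
    "retired_pages.pending",
]


def parse_retired_pages(csv_text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        index, sbe, dbe, pending = (field.strip() for field in record)
        rows.append({
            "index": int(index),
            # Non-numeric placeholders (unsupported boards) become None.
            "single_bit": int(sbe) if sbe.isdigit() else None,
            "double_bit": int(dbe) if dbe.isdigit() else None,
            # True means a reboot is still needed to finish retiring pages.
            "reboot_pending": pending.lower() == "yes",
        })
    return rows


def query_retired_pages():
    """Run nvidia-smi and return parsed counters (requires an NVIDIA driver)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_retired_pages(out)
```

On a host with supported GPUs, `query_retired_pages()` returns one entry per device. An entry with `reboot_pending` set to `True` corresponds to the "simple reboot" step in the row above.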
Scalable Performance
Scalable application performance across nodes
Cloud GPU: GPUDirect RDMA
> Direct transfers between GPUs
> 67% lower latency
> 5X higher GPU-to-GPU MPI bandwidth
Consumer GPU: N/A
Strong scaling performance in a node
Cloud GPU: NVIDIA NVLink™
> 5X higher GPU-to-GPU bandwidth
> Linear strong scaling across many GPUs
Consumer GPU: N/A
Deploy large models
Cloud GPU: Up to 32 GB HBM2 (Tesla V100)
Consumer GPU: 4 GB to 12 GB
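To make the memory comparison concrete, the following sketch estimates whether a model's weights alone fit on a given card. It is a back-of-the-envelope calculation under stated assumptions: only parameter storage is counted (no activations, optimizer state, or framework overhead), capacities are treated as GiB, and the 80% `headroom` factor is an arbitrary illustrative choice. The function names are hypothetical.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
GIB = 2 ** 30


def weight_footprint_gib(num_params, dtype="fp16"):
    """Memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits(num_params, capacity_gib, dtype="fp16", headroom=0.8):
    """True if the weights fit within `headroom` (a fraction) of the capacity.

    The default headroom is an assumed safety margin, not a measured value.
    """
    return weight_footprint_gib(num_params, dtype) <= capacity_gib * headroom


if __name__ == "__main__":
    params = 6_000_000_000  # hypothetical 6-billion-parameter model
    for capacity in (12, 32):
        print(capacity, "GiB card, fp32 weights fit:", fits(params, capacity, "fp32"))
```

Under these assumptions, a 6-billion-parameter model in FP32 needs roughly 22 GiB for weights, which fits within the margin on a 32 GB card but not on a 12 GB one.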