NVIDIA Announces the H100 NVL — Maximum Memory Server Card for Large-Language Models
ChatGPT currently runs on A100 chips with 80GB of memory. NVIDIA thought this was too slow and developed the much faster H100 (twice as fast as the A100), which in this new variant has 94GB of memory. They then paired two of these GPUs, with high-speed connections between them, for a total of 188GB of memory.
Hardware is becoming more and more impressive.
NVIDIA is currently rolling out new products based on its Hopper and Ada Lovelace architectures, which were introduced last year. Today the company announced a new accelerator variant of the H100 designed specifically for large language model users: the H100 NVL.
The H100 NVL is a variant of NVIDIA’s H100 PCIe card aimed at a single market: large language models (LLMs). The card is unusual among NVIDIA’s server offerings in that it is essentially two H100 PCIe cards already connected together. The biggest difference, however, is its large memory capacity. The combined dual-GPU card has 188GB of HBM3 memory in total, 94GB on each card. That is more memory per GPU than any other NVIDIA product to date.
Memory capacity is the driving force behind this SKU. It is a major constraint for large language models such as the GPT family, whose parameters (175B for the largest GPT-3 model) will quickly fill an H100 accelerator. Standard H100s are limited to 80GB, so NVIDIA created a new H100 SKU with more memory per GPU.
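To make the memory pressure concrete, here is a rough back-of-the-envelope sketch. It counts only the bytes needed to hold the model weights. The 2-bytes-per-parameter (FP16) and 1-byte-per-parameter (INT8) figures are assumptions for illustration, not details from NVIDIA's announcement, and the function names are made up for this example. Real deployments also need room for activations and other runtime state, so actual requirements are higher.

```python
import math

GPT3_PARAMS = 175e9  # largest GPT-3 model, as cited in the article


def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes).

    Ignores activations, caches, and framework overhead (assumption).
    """
    return num_params * bytes_per_param / 1e9


def gpus_needed(total_gb: float, gb_per_gpu: float) -> int:
    """Smallest number of GPUs whose combined memory holds total_gb."""
    return math.ceil(total_gb / gb_per_gpu)


# Assumed precisions: FP16 = 2 bytes per parameter, INT8 = 1 byte.
fp16_gb = weights_gb(GPT3_PARAMS, 2)
int8_gb = weights_gb(GPT3_PARAMS, 1)

for label, per_gpu in [("80GB H100", 80), ("94GB H100 NVL GPU", 94)]:
    print(f"{label}: {gpus_needed(fp16_gb, per_gpu)} GPUs for FP16, "
          f"{gpus_needed(int8_gb, per_gpu)} GPUs for INT8")
```

Under these assumptions, the FP16 weights alone come to 350GB, far more than any single GPU holds, while an 8-bit copy (175GB) would just fit within the 188GB of one H100 NVL pair.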