Server Hardware Guide

Deep Learning is very computationally intensive, so you will need a fast CPU with many cores, right? Or is it maybe wasteful to buy a fast CPU? One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.

In my work on parallelizing deep learning I built a GPU cluster for which I needed to make careful hardware selections. Despite careful research and reasoning, I made my fair share of mistakes when I selected the hardware parts, and these often became clear to me only when I used the cluster in practice. Here I want to share what I have learned so you will not step into the same traps as I did.

GPU

This blog post assumes that you will use a GPU for deep learning. If you are building or upgrading your system for deep learning, it is not sensible to leave out the GPU. The GPU is the heart of deep learning applications – the improvement in processing speed is just too huge to ignore.

I have talked at length about GPU choice elsewhere, and the choice of your GPU is probably the most critical choice for your deep learning system. Generally, I recommend a GTX 680 from eBay if you lack money; a GTX Titan X (if you have the money; for convolution) or a GTX 980 (very cost effective; a bit limited for very large convolutional nets) as the best current GPUs; and a GTX Titan from eBay if you need cheap memory. I supported the GTX 580 before, but due to new updates to the cuDNN library, which increase the speed of convolution dramatically, all GPUs that do not support cuDNN have become obsolete, and the GTX 580 is such a GPU. If you do not use convolutional nets at all, however, the GTX 580 is still a solid choice.

CPU underclocking on MNIST and ImageNet: Performance is measured as time taken on 100 epochs of MNIST or half an epoch on ImageNet with different CPU core clock rates, where the maximum clock rate is taken as a baseline for each CPU. For comparison: Upgrading from a GTX 580 to a GTX Titan is about +20% performance; from GTX Titan to GTX 980 another +30% performance; GPU overclocking yields about +5% performance for any GPU.

CPU

To be able to make a wise choice for the CPU we first need to understand the CPU and how it relates to deep learning. What does the CPU do for deep learning? The CPU does little computation when you run your deep nets on a GPU, but your CPU does still work on these things:

Needed number of CPU cores

When I train deep neural nets with three different libraries I always see that one CPU thread is at 100% (and sometimes another thread will fluctuate between 0 and 100% for some time). This immediately tells you that most deep learning libraries – and in fact most software applications in general – use just a single thread. This means that extra CPU cores are of little use when you train on a single GPU. If you run multiple GPUs, however, and use parallelization frameworks like MPI, then you will run multiple programs at once and you will need multiple threads as well. You should be fine with one thread per GPU, but two threads per GPU will result in better performance for most deep learning libraries; these libraries run on one core, but sometimes call functions asynchronously, for which a second CPU thread will be utilized. Remember that many CPUs can run multiple threads per core (that is true especially for Intel CPUs), so that one core per GPU will often suffice.
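The rule of thumb above (one thread per GPU is enough, two is better) can be written down as a quick check. This is only a minimal sketch: the function names `recommended_cpu_threads` and `cpu_is_sufficient` are made up for illustration, and `os.cpu_count()` reports logical CPUs (hardware threads), which is the quantity the paragraph talks about.

```python
import os


def recommended_cpu_threads(num_gpus, threads_per_gpu=2):
    """Threads suggested by the rule of thumb: 1 per GPU is fine, 2 is better."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_gpus * threads_per_gpu


def cpu_is_sufficient(num_gpus, available_threads=None, threads_per_gpu=2):
    """Compare the available hardware threads against the rule of thumb."""
    if available_threads is None:
        # os.cpu_count() counts logical CPUs, i.e. threads, not physical cores.
        available_threads = os.cpu_count() or 1
    return available_threads >= recommended_cpu_threads(num_gpus, threads_per_gpu)


for gpus in (1, 2, 4):
    low = recommended_cpu_threads(gpus, threads_per_gpu=1)
    high = recommended_cpu_threads(gpus, threads_per_gpu=2)
    print(f"{gpus} GPU(s): {low} to {high} CPU threads")
```

For example, a quad-core CPU that runs two threads per core offers eight threads, which by this rule covers four GPUs at two threads each.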

CPU and PCI-Express

It’s a trap! Some new Haswell CPUs do not support the full 40 PCIe lanes that older CPUs support – avoid these CPUs if you want to build a system with multiple GPUs. Also make sure that your processor actually supports PCIe 3.0 if you have a motherboard with PCIe 3.0.

CPU cache size

As we shall see later, CPU cache size is rather irrelevant further along the CPU-GPU-pipeline, but I included a short analysis section anyway so that we make sure that every possible bottleneck is considered along this pipeline and so that we can get a thorough understanding of the overall process.

CPU cache is often ignored when people buy a CPU, but generally it is a very important piece in the overall performance puzzle. The CPU cache is a very small amount of on-chip memory, very close to the CPU, which can be used for high speed calculations and operations. A CPU often has a hierarchy of caches, which stack from small, fast caches (L1, L2) to slow, large caches (L3, L4). As a programmer, you can think of it as a hash table, where every entry is a key-value pair, and where you can do very fast lookups on a specific key: If the key is found, one can perform fast read and write operations on the value in the cache; if the key is not found (this is called a cache miss), the CPU will need to wait for the RAM to catch up and will then read the value from there – a very slow process. Repeated cache misses result in significant decreases in performance. Efficient CPU caching procedures and architectures are often critical to CPU performance.

How the CPU determines its caching procedure is a very complex topic, but generally one can assume that variables, instructions, and RAM addresses that are used repeatedly will stay in the cache, while less frequent items do not.
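To make the hash table picture concrete, here is a toy model of a cache sitting in front of RAM. It is a sketch for intuition only, not how real hardware works: the class `ToyCache` is hypothetical, the least-recently-used eviction rule is an assumption chosen for simplicity, and real caches operate on fixed-size cache lines with their own replacement policies.

```python
from collections import OrderedDict


class ToyCache:
    """Hash-table-like cache with least-recently-used eviction (an assumption)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # stands in for slow RAM
        self.entries = OrderedDict()        # key: address, value: cached data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # mark as recently used
            return self.entries[address]
        # Cache miss: fetch from the slow backing store and keep a copy.
        self.misses += 1
        value = self.backing_store[address]
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return value


ram = {address: address * 10 for address in range(100)}

# Working set of 4 addresses fits into a cache of 4 entries.
small = ToyCache(capacity=4, backing_store=ram)
for _ in range(3):
    for address in range(4):
        small.read(address)
print("fits in cache:   ", small.hits, "hits,", small.misses, "misses")

# Working set of 8 addresses does not fit: each entry is evicted before reuse.
large = ToyCache(capacity=4, backing_store=ram)
for _ in range(3):
    for address in range(8):
        large.read(address)
print("too big for cache:", large.hits, "hits,", large.misses, "misses")
```

In the first loop only the initial pass misses; in the second loop every read misses, because the data that is read repeatedly is larger than the cache.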

In deep learning, the same memory is read repeatedly for every mini-batch before it is sent to the GPU (the memory is just overwritten), but it depends on the mini-batch size whether its memory can be stored in the cache. For a mini-batch size of 128, we have 0.4 MB and 1.5 MB for MNIST and CIFAR, respectively, which will fit into most CPU caches; for ImageNet, we have more than 85 MB for a mini-batch, which is much too large even for the largest cache (L3 caches are limited to a few MB).
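The mini-batch sizes quoted here can be reproduced with a short calculation. The sketch below assumes 32-bit floats (4 bytes per value), 28x28 grayscale images for MNIST and 32x32 RGB images for CIFAR. The ImageNet resolution of 256x256 RGB is purely an assumption, since the text does not say which image size its 85 MB figure is based on. The helper name `minibatch_bytes` is made up for illustration.

```python
def minibatch_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Memory needed to hold one mini-batch of images (4 bytes = float32)."""
    return batch_size * height * width * channels * bytes_per_value


# (height, width, channels) per image; the ImageNet entry is an assumption.
datasets = {
    "MNIST": (28, 28, 1),
    "CIFAR": (32, 32, 3),
    "ImageNet": (256, 256, 3),
}

for name, (height, width, channels) in datasets.items():
    size = minibatch_bytes(128, height, width, channels)
    # Reported in units of 2**20 bytes, which matches the figures in the text.
    print(f"{name}: {size / 2**20:.1f} MB")
```

Under these assumptions MNIST and CIFAR come out at about 0.4 MB and 1.5 MB, and ImageNet lands well above 85 MB, far beyond any L3 cache.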
