There are two primary factors to consider when choosing a data science workstation: which tools and techniques you use the most and the size of your data sets.
When it comes to data science frameworks, higher core counts don’t always translate into better performance. NumPy, SciPy, and scikit-learn don’t scale well past 18 cores. On the other hand, HEAVY.AI (formerly OmniSci) will take all the cores it can get.
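If a framework stops scaling beyond a certain core count, one way to experiment is to cap the number of threads its numerical backend may use. The sketch below is an illustration, not a recommendation from the testing described here. It assumes your NumPy/SciPy build honors the `OMP_NUM_THREADS` environment variable (common for OpenMP-based backends) and that the variable is set before those libraries are imported. The helper name `capped_thread_count` and the default cap of 18 are hypothetical choices taken from the figure quoted above.

```python
import os


def capped_thread_count(cap: int = 18) -> int:
    """Return the number of threads to use: available cores, limited to `cap`."""
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    available = os.cpu_count() or 1
    return max(1, min(available, cap))


# Set the cap only if the user has not already chosen a value.
# This must run before importing NumPy, SciPy, or scikit-learn to take effect.
os.environ.setdefault("OMP_NUM_THREADS", str(capped_thread_count()))
```

Whether capping threads actually helps depends on the workload, so compare runtimes with and without the cap before settling on a value.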
All of the Intel-based data science workstations use Intel® Core™, Intel® Xeon® W, and Intel® Xeon® Scalable processors that excel at data science workloads in real-world tests. You’ll get best-in-processor-family performance from all of them, which makes memory capacity your most important choice.
Data science frameworks can balloon a data set's memory footprint to two to three times its original size. To get your baseline memory needs, examine your typical data sets and multiply by three. If you can work with 512 GB or less, you can get excellent performance in a desktop machine. If your data sets tend to be above 500 GB, you’ll want a tower with 1.5 TB of memory or more.
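The multiply-by-three rule can be written as a short calculation. This is a minimal sketch: the function names are hypothetical, and treating every memory need above 512 GB as a "tower" case is an assumption, since the guidance above only names the 512 GB and 1.5 TB points.

```python
def baseline_memory_gb(dataset_gb: float, expansion_factor: float = 3.0) -> float:
    """Estimate working memory (GB) as data set size times an expansion factor."""
    if dataset_gb < 0 or expansion_factor <= 0:
        raise ValueError("dataset_gb must be >= 0 and expansion_factor must be > 0")
    return dataset_gb * expansion_factor


def suggested_form_factor(dataset_gb: float, desktop_limit_gb: float = 512.0) -> str:
    """Map the estimated memory need to a rough form factor (assumed cutoff)."""
    need = baseline_memory_gb(dataset_gb)
    return "desktop" if need <= desktop_limit_gb else "tower"


if __name__ == "__main__":
    for size_gb in (100, 500):
        need = baseline_memory_gb(size_gb)
        print(f"{size_gb} GB data set -> about {need:.0f} GB memory ({suggested_form_factor(size_gb)})")
```

For a 500 GB data set the estimate is 1,500 GB, which matches the 1.5 TB tower figure above.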
GPU accelerators shine at deep learning model training and large-scale deep learning inference. However, for the bulk of data science work—data prep, analysis, and classic machine learning—those GPUs sit idle because most Python libraries for data science run natively on the CPU. You do need a graphics adapter to drive your displays, but not a GPU appliance.
The cloud won’t give you the best performance unless you’re running on a dedicated VM or a bare metal server. Cloud instances present themselves as a single node, but on the back end, things are highly distributed. Your workload and data get split across multiple servers in multiple locations. This creates processing and memory latencies that degrade runtime. Plus, working with large data sets and graphs through a remote desktop is not an ideal experience.
Keeping the workload and data local, on a single machine, can deliver much better performance and a more fluid and responsive work experience.
You can, but you’ll burn immense amounts of time watching data shuffle between storage, memory, and the CPU. If you’re working in a professional environment, upgrading to an Intel® data science laptop or midrange desktop can be a time-saver. We intentionally tested and specced Intel® Core™-based data science laptops so that students, beginners, and AI makers could have an affordable option for developing and experimenting with open source AI tools.
You can run Python-based data science tooling faster on a standard PC using Intel-optimized libraries and distributions. They’re all part of the free Intel AI Kit.
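As an illustration of what using an optimized library can look like, the sketch below enables Intel Extension for Scikit-learn if it is installed. It assumes the `scikit-learn-intelex` package (imported as `sklearnex`) is the library you want to try. The wrapper function `enable_intel_sklearn` is a hypothetical helper, and the code falls back to stock scikit-learn when the package is missing.

```python
def enable_intel_sklearn() -> bool:
    """Patch scikit-learn with Intel-optimized implementations when available.

    Returns True if the patch was applied, False if sklearnex is not installed.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Package not installed: keep using stock scikit-learn.
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    patched = enable_intel_sklearn()
    print("Intel-optimized scikit-learn enabled" if patched else "Using stock scikit-learn")
```

Call the helper before importing any scikit-learn estimators, since the patch only affects estimators imported afterward.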