[{"categories":["Complete AI Workstation Configuration Guide"],"content":"This guide is part of a series on BIOS optimization, with reference to official AMD documentation. Its purpose is to help you optimize the BIOS settings of your AMD EPYC™ 7002 Series processor-based system for the best possible AI workload performance. References: High Performance Computing (HPC) Tuning Guide for AMD EPYC™ 7002 Series Processors; AMD EPYC™ 9005 BIOS \u0026 Workload Tuning Guide. Characteristics of AI Workloads and Optimization Goals AI workloads, especially deep learning, place extremely high demands on computing resources, mainly in three areas: High floating-point computation: It is necessary to make full use of dedicated instructions like AVX2 and keep the core frequency at the highest possible level. High memory bandwidth and capacity: The loading and processing of large-scale datasets are highly sensitive to memory speed and NUMA topology. Efficient data transfer: Fast interconnects (e.g., PCIe) are crucial for offloading computing tasks to accelerators such as GPUs. Given these characteristics, the following BIOS settings need special attention. Key BIOS Settings Explained in Detail 1. NUMA Configuration (NPS - NUMA Nodes Per Socket) Goal: Optimize data locality and memory bandwidth. AI workloads are extremely sensitive to data access patterns. Recommendation: For most AI workloads, it is recommended to set NPS to 2 or 4. NPS=2: Creates two NUMA domains per processor socket. This effectively utilizes memory bandwidth and is an excellent choice for balancing performance and versatility. NPS=4: Creates four NUMA domains per processor socket. This setting is suitable for highly parallel workloads where the dataset for each NUMA node is relatively small. By explicitly binding processes to specific NUMA nodes, you can maximize data locality. 
How to check: On Linux, use the numactl --hardware or hwloc-ls commands to verify the resulting NUMA topology. Note: In practice, however, NPS=1 has proven to be a better choice for inference tasks. 2. Core Performance and Power Settings Goal: Ensure that CPU cores always run stably at the highest frequency and minimize latency. Performance mode: Set the “Determinism Slider” or a similar performance mode option to “Performance” or “Max Performance.” This prioritizes performance over energy efficiency, keeping the core clock high. C-states: Set “Global C-state Control” or “C-States” to “Disabled.” C-states are CPU power-saving modes. While they save energy, they introduce latency when cores need to wake up from a low-power state, which can affect the consistency of AI training. P-states: Look for settings like “P-states Control,” “DF P-states,” or “APBDIS.” Disable P-states or ensure that the processor is locked in its highest performance state (P0). This prevents the CPU frequency from dynamically adjusting with the load, ensuring the highest clock speed. 3. Memory Settings Goal: Maximize memory bandwidth while reducing latency. Memory speed: Configure the memory to run at the speed that gives the lowest latency supported by the motherboard and DIMMs, for example, DDR4-2933 MT/s. 3200 MT/s is not recommended because the Infinity Fabric clock on the 7002 series stays synchronized with the memory clock only up to DDR4-2933, and keeping the two clocks synchronized is the better choice for low latency. Memory interleaving: Ensure that memory interleaving is configured to provide the best bandwidth. Generally, populating all 8 memory channels per socket provides the maximum bandwidth. For specific settings, refer to your system’s official documentation. 4. PCIe and IOMMU Goal: Ensure optimal communication between GPUs and other accelerators. PCIe generation: Set the PCIe slot used for the GPU to its highest supported generation and link width (e.g., PCIe Gen4 x16). 
Above 4G Decoding: Be sure to enable this option. This is crucial for systems with a large amount of GPU memory, allowing the system to correctly identify and map I/O memory over 4GB. IOMMU: If you are usin","date":"2025-08-23","objectID":"/en/ai-server/fb8e9b2/:0:0","tags":["AI-Workstation","BIOS Tuning"],"title":"BIOS Optimization","uri":"/en/ai-server/fb8e9b2/"},{"categories":["Complete AI Workstation Configuration Guide"],"content":"This series of AI workstations features a combination of SSD and HDD drives, and it’s crucial to allocate and optimize these resources effectively. System: Ubuntu-24.04 Linux System Partition Policy Resource Allocation Strategy The core idea is to use the fastest SSD for the system and frequently used applications, the second fastest SSD for high-performance data or backups, and the HDD for large-capacity cold storage. SSD 1 (Ubuntu system is already installed, 1TB) Usage: Operating system, applications, and a small amount of frequently used personal files. Allocation Plan: / (Root Directory): Allocate 100GB - 200GB. This will contain the Ubuntu system itself, all installed software, and some system caches. This size is more than enough for most users, even with a large number of applications installed. /home (User Home Directory): Allocate the remaining space on SSD 1 to /home. If you want to place a part of the /home partition on the HDD (e.g., only large media files), you can adjust this as needed. However, to maximize personal file access speed, it’s recommended to keep most of your frequently used personal files here. SWAP (Swap Space): Not recommended. SSD 2 (Empty, 1TB) Usage: High-performance workloads, important projects, virtual machines, game libraries, frequently accessed large datasets, fast backups, or as an extension of SSD 1 for high-speed storage. Allocation Plan: A single partition, mounted to a custom directory such as /mnt/ssd2 or /data/ssd_fast. Format it with the ext4 file system. 
Specific Usage Examples: Virtual Machine Images: If you run multiple virtual machines, putting them here will provide optimal performance. Large Game Libraries: Install your Steam library or other games on this drive. Video Editing/Graphic Design Workspace: Temporarily store project files and rendering outputs here. Code Repositories/Development Environment: If your projects depend on a lot of file I/O. Cloud Sync Folders: For services like Dropbox or Nextcloud, if your sync directory is large and requires fast access. Cache or Temporary Directories: For applications with very large caches, such as the Docker image storage directory. HDD (Empty, 4TB) Usage: Large-capacity storage, infrequently used data, data that doesn’t require high speed, archives, media files (movies, music), and long-term backups. Allocation Plan: A single partition, mounted to a custom directory like /mnt/hdd_storage or /data/archive. Format it with the ext4 file system. Specific Usage Examples: Movie, TV show, and music libraries. Photo archives. System backups (e.g., Timeshift backup target). Infrequently used old project files. Archived software installation packages. Installation Process Guide Planning from the start is the most effective approach, as it avoids the complexities of modifying existing partitions and migrating data. This is highly recommended. Back up all important data! We can’t stress this enough! Create a bootable Ubuntu installation USB drive. Boot from the USB drive and select “Something else” for manual partitioning. Example Partitioning Steps: SSD 1 (1TB): EFI System Partition (ESP): 512MB, FAT32, mount point /boot/efi. / (Root Directory): 100GB - 200GB, ext4, mount point /. /home (User Home Directory): Remaining space (approx. 800GB - 900GB), ext4, mount point /home. SSD 2 (1TB): /mnt/ssd2_data (Custom Directory): The entire 1TB, ext4, mount point /mnt/ssd2_data (you can also choose /data or another name you prefer). 
HDD (4TB): /mnt/hdd_archive (Custom Directory): The entire 4TB, ext4, mount point /mnt/hdd_archive (you can also choose /data_archive or /media/storage, etc.). Recommended Root (/) Partition Space Deciding how much space to reserve for the root (/) directory is a common question in Ubuntu reinstallation scenarios. This directory contains core operating system files, most installed applications, and various system configurations and temporary files. Given your setup with two 1TB SSDs and one 4TB HDD, a reasonable recom","date":"2025-08-19","objectID":"/en/ai-server/1b15ccc/:0:0","tags":["AI-Workstation","Partition Policy"],"title":"Linux System Partition Policy","uri":"/en/ai-server/1b15ccc/"},{"categories":["Complete AI Workstation Configuration Guide"],"content":"When it comes to choosing an operating system for AI workloads, Windows and Linux are two sides of the same coin, each with unique advantages and pain points. This summary is based on my personal experience over half a year, documenting a journey of switching back and forth between the two systems, which ultimately led to a valuable conclusion for any AI professional. Linux-vs-Windows The First Encounter: Ubuntu 22.04 LTS I initially chose Ubuntu 22.04 LTS for my AI workstation, a common choice in the AI field. The installation was smooth, and the interface felt modern. However, the honeymoon phase was short-lived, as a series of small but frustrating issues began to surface: Remote Desktop: Windows RDP worked, but the need to enter a randomly generated password every time was inefficient. Other remote tools were unstable due to resolution and configuration problems. Hardware Control: Managing the NVIDIA GPU fan speed became a major headache. The default temperature control policy was highly impractical, and none of the widely shared “fixes” online seemed to work, consuming a significant amount of time. 
Development Environment: The complex version dependencies between Python and PyTorch caused configuration issues. Desktop Experience: Configuring a stable Chinese input method was a struggle. Familiar Windows apps like Notepad++ had a poor user experience, and compatibility issues with Snap applications were frequent (e.g., VS Code cursor-input misalignment, PyCharm’s package manager failing to refresh). These seemingly minor issues accumulated, severely impacting my workflow. When the NVIDIA fan issue remained unresolved for an entire holiday week, I started to reconsider my choice. Round 1: The “Sweet Spot” of Switching Back to Windows Acting on impulse, I reinstalled Windows 11. The entire process took just two hours. Everything was plug-and-play, and the GPU fan curve was easily configured with the official vendor software. This seamless experience was a stark contrast to the struggles I had with Linux. Python version management and CUDA setup were also familiar and straightforward, with the whole development environment up and running in a day or two. Windows’ advantage was clear: an extremely low entry barrier and unparalleled compatibility. You can easily use a vast ecosystem of commercial software without worrying about hardware drivers. However, after three or four months, Windows’ “black screen on login” and occasional high CPU usage issues reignited my desire to switch back to Linux. In particular, a long-running compression task that pegged the CPU at 100% caused all fans to spin at full speed. When I discovered that there were very few IPMI control solutions for Windows, I realized that for heavy, stable workloads like AI, Windows wasn’t a sustainable long-term solution. Round 2: A New Beginning by Sticking with Linux Determined to solve all the issues, I returned to Linux. This time, I learned my lesson and chose the more forward-looking Ubuntu 24.04 LTS. 
The installation experience was surprisingly smooth, and by opting to install the NVIDIA driver during the setup, everything worked perfectly on the first try. The persistent NVIDIA fan control problem that had plagued me was now easily solved with a simple Coolbits=4 setting. The Ubuntu 24.04 desktop experience is also a step up from 22.04, on par with Windows 11. At that moment, I finally understood that Linux’s initial pain isn’t insurmountable; you just need to find the right path. Windows vs. Linux: A Deep-Dive for AI Workloads Based on this challenging journey, I’ve summarized the key differences for AI workloads. This breakdown should help you make an informed decision. Comparison Metric Windows Linux Ease of Use Extremely high: Easy to install, plug-and-play, with rich application and hardware compatibility. Extremely low: Steep learning curve, requires manual configuration for basic functions, prone to user-error crashes. AI Environment Setup Difficult","date":"2025-08-18","objectID":"/en/ai-server/80b0c46/:0:0","tags":["AI-Workstation","OS"],"title":"AI Workstation OS Selection","uri":"/en/ai-server/80b0c46/"},{"categories":["Complete AI Workstation Configuration Guide"],"content":"For geeks who are passionate about AI development, having a powerful and highly expandable AI workstation is key to improving efficiency and exploring cutting-edge technology. This article shares a carefully planned and practically verified hardware selection plan, aiming to build a powerful machine with the ultimate cost-performance that can handle both daily use and future AI workload upgrades. AI workstation overall effect The core positioning of this workstation is very clear: a main machine that can completely replace Windows, with AI development as its primary task. To cope with future diverse AI workloads, we must reserve enough “room for growth” for the hardware configuration. 
This means it must be able to support at least four graphics cards, and the speed of all PCIe interfaces must not become a bottleneck. Therefore, our core requirement is: the CPU must provide at least 4 × 16 = 64 PCIe 4.0 lanes to ensure the efficiency of multi-card parallel computing. Core Components: The Perfect Balance of Performance and Scalability CPU: AMD EPYC™ 7542—The “King of Cost-Performance” in the Second-Hand Market AMD-EPYC-7000 CPU Faced with the strict PCIe lane requirements mentioned above, we quickly focused on the server-grade AMD EPYC series. Its high number of SerDes lanes and affordable second-hand price make it a standout in the workstation field. After considering the response speed for daily use, we chose the AMD EPYC™ 7542 processor, which has 32 cores and 64 threads. Its boost frequency of up to 3.4GHz ensures multi-tasking capabilities while also balancing single-core performance, truly achieving the best of both worlds. Motherboard: Supermicro H12SSL-i—Born for AI Supermicro-H12SSL-i PCB Having chosen a powerful EPYC CPU, the matching motherboard is naturally no ordinary board. The Supermicro H12SSL-i motherboard stands out with its astonishing expandability: it provides 5 PCIe 4.0 x16 slots and 2 PCIe 4.0 x8 slots, perfectly meeting the needs of multi-card deployment. Supermicro-H12SSL-i Functional Block Diagram At the same time, its 8 DIMM memory slots support up to 2TB of ECC memory, which provides a solid memory guarantee for loading large language models in the future (such as using llama.cpp) and completely eliminates “memory anxiety.” The advanced features of a server motherboard, such as IPMI, do come with a learning curve, but the effort is undoubtedly worthwhile. Memory: From “Sufficient” to “Feeding” LLMs DDR4 ECC RDIMM RAM The initial configuration was 4 sticks of 32GB DDR4 3200MHz ECC memory, totaling 128GB. 
While this seemed quite sufficient at the time, in today’s LLM era, it is already stretched thin. Therefore, we strongly recommend upgrading the memory capacity to 512GB or even 1TB, which will greatly improve the efficiency of processing large models. Graphics Card: RTX-4060-Ti-16G—The “Sweet Spot” Card for AI Workloads NVIDIA RTX-4060-Ti-16G For AI workloads, video memory capacity is often more important than absolute performance. The NVIDIA RTX-4060-Ti-16G, with its large 16GB of video memory and relatively low power consumption, has become the best choice for initial investment. Its cost-performance for AI workloads is second to none. Initially, you can configure one card and then, based on workload needs, gradually increase to four cards or replace them with higher-end graphics cards. Power Supply, Case, and Cooling: Ensuring Stable Operation Power Supply: Considering the future power needs of four cards at full load, we resolutely chose the Great Wall (长城) 2000W EPS2000BL consumer-grade power supply. As a core component, the stability of the power supply is critical, so this part must be new. Case: To accommodate the huge server motherboard, multiple graphics cards, and a powerful cooling system, we chose the Phanteks PK620 XL full-tower workstation case, which provides ample space for all future upgrades. Cooling: Stability is the cornerstone of long-term work. We c","date":"2025-08-07","objectID":"/en/ai-server/1558208/:0:0","tags":["AI-Workstation","hardware"],"title":"The Ultimate Guide to Cost-Effective AI Workstation Hardware Selection","uri":"/en/ai-server/1558208/"}]