Introduction
Building a compute server for high-VRAM AI tasks (such as video generation models like Hunyuan, which can require 48–60 GB of VRAM) demands careful GPU selection. Memory per dollar (VRAM per $) is a critical metric here – we want the most GB of VRAM for the least cost – without sacrificing too much in performance or multi-GPU scalability. All GPUs discussed are NVIDIA (for maximal framework compatibility and support with tools like Exo) and have ≥24 GB VRAM (with 48–80 GB options for larger models). We compare consumer, prosumer, and datacenter cards on VRAM/$, current pricing (new vs used), multi-GPU use, and other considerations (power, cooling, etc.), then recommend options for various budget tiers.
Key Considerations for High-VRAM GPUs
Before diving into specific GPUs, keep in mind the following factors:
- VRAM per Dollar: We evaluate cost using both new retail prices and used market prices, as many high-VRAM GPUs (especially older datacenter cards) offer great value when bought used. For each GPU, we calculate an approximate cost per GB of VRAM to compare value. Used cards often provide much lower $/GB (sometimes under $10/GB on older models (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News)) than new cards. However, older/cheaper cards may have lower computational performance or limited support for newer optimizations.
- Minimum VRAM: All GPUs here have at least 24 GB VRAM, which is the baseline for training or running large AI models. For 48–60 GB requirements, you can either use a single GPU with that much memory, or combine multiple 24 GB GPUs in a multi-GPU setup (with model partitioning across GPUs). Tools like Exo can partition models across devices, effectively leveraging combined memory without needing NVLink, although high interconnect bandwidth (e.g. NVLink) can help.
- Multi-GPU Compatibility: If you plan to use 2, 4, 6, or 8 GPUs together, consider the physical form factor, cooling, and interconnect:
- NVLink Bridges: Some GPUs support NVLink bridging (usually 2 GPUs) to directly connect GPUs with a high-bandwidth link, enabling faster communication and even memory pooling in certain workloads. For example, two RTX A6000 cards can be NVLinked to provide a combined 96 GB memory space (RTX A6000 ADA - no more NV Link even on Pro GPUs? - Raytracing - NVIDIA Developer Forums). NVLink is supported on many professional GPUs (RTX A6000, Quadro RTX 6000/8000, older Titan/RTX 3090, etc.), but the newest consumer/prosumer cards (RTX 40-series and RTX 6000 Ada) have dropped NVLink (RTX A6000 ADA - no more NV Link even on Pro GPUs? - Raytracing - NVIDIA Developer Forums). In multi-GPU setups without NVLink, GPUs still work in parallel but cannot directly share memory; frameworks like Exo or distributed training libraries must partition the model and exchange data over PCIe or system memory.
- Physical Form Factor & Cooling: Consumer GPUs (e.g. RTX 3090/4090) often have triple-fan open coolers and 2.5–3 slot sizes, making them tricky to pack densely. Professional/datacenter GPUs typically use blower-style or passive cooling and dual-slot or single-slot designs, better suited for tight multi-GPU chassis. Ensure adequate case airflow or consider water-cooling if using many consumer cards in one server.
- Power Requirements: High-end GPUs can draw 300–450 W each, so multi-GPU systems need robust PSUs. For instance, 4× RTX 3090 (350 W TDP each) can consume ~1400 W just for the GPUs, whereas older datacenter cards like Tesla P40 draw ~250 W each (still ~1000 W for 4). Plan power accordingly (and remember 8 GPUs can easily exceed standard ATX PSU capacities, often requiring multiple PSUs or server PSUs).
- Performance vs Architecture: All of these GPUs will run AI workloads, but newer architectures (Turing, Ampere, Ada) have features like Tensor Cores and faster memory that dramatically improve throughput for mixed-precision AI (FP16/FP8) and tensor operations. Older cards (Pascal and earlier) lack Tensor Cores and have slower memory, so even with large VRAM their speed may be much lower for the same task. In fact, a Tesla P40 (Pascal) runs FP16/INT8 workloads much slower – one community report noted FP16 on Pascal is about 1/64th the speed of a 4090 (Nvidia Tesla P40 and SDXL? : r/StableDiffusion - Reddit). Older GPUs may also not support newer compute features or might require using higher precision (and thus more memory) for the same model; see the memory-estimate sketch after this list. (E.g. Maxwell-generation 24 GB cards effectively have to use FP32 for many ops, losing the memory advantage of lower precision (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News).) In short, Pascal or newer is recommended for modern deep learning frameworks (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News) to ensure compatibility with FP16/INT8 acceleration and software support.
- Exo Compatibility: Exo (the distributed AI cluster software) is designed to work with a variety of devices (NVIDIA GPUs, even phones, etc.), and it dynamically partitions models across available resources. Any CUDA-capable NVIDIA GPU with sufficient memory should work with Exo. However, to fully benefit, the GPUs should have driver support for modern CUDA versions. NVIDIA is beginning to phase out support for pre-RTX architectures (Maxwell/Pascal/Volta are moving to legacy support in new CUDA releases) (Nvidia to wind down CUDA support for Maxwell and Pascal - Reddit). This doesn't immediately break support, but it means no new optimizations for those architectures. For longevity and best Exo performance, prefer GPUs from the Turing generation or newer if possible.
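To make the precision point concrete, here is the back-of-the-envelope sketch referenced in the Performance vs Architecture item: it estimates how much VRAM a model needs at a given precision and how many 24 GB cards that implies. The ~20% overhead factor and the example model sizes are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope VRAM estimate: parameters x bytes-per-parameter,
# padded by an assumed ~20% overhead for activations/KV cache (illustrative only).
import math

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
OVERHEAD = 1.2  # assumption; real overhead depends on the model and runtime

def vram_needed_gb(params_billion: float, precision: str) -> float:
    """Approximate VRAM (GB) needed to hold the weights at a given precision."""
    return params_billion * BYTES_PER_PARAM[precision] * OVERHEAD

def gpus_needed(params_billion: float, precision: str, vram_per_gpu_gb: int = 24) -> int:
    """How many GPUs of a given size are needed just to fit those weights."""
    return math.ceil(vram_needed_gb(params_billion, precision) / vram_per_gpu_gb)

for model_b in (13, 30, 70):          # hypothetical model sizes, in billions of parameters
    for prec in ("fp32", "fp16"):     # Pascal often forces fp32; Turing+ can use fp16
        gb = vram_needed_gb(model_b, prec)
        print(f"{model_b}B @ {prec}: ~{gb:.0f} GB -> {gpus_needed(model_b, prec)}x 24GB GPUs")
```

The output makes the trade-off obvious: a model that fits on two 24 GB cards in FP16 can need four or more of the same cards if the architecture forces FP32.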
With those factors in mind, let’s examine specific GPU options and their value metrics.
High-VRAM GPU Options (24 GB and Up)
We break down the options into two classes: 24 GB GPUs (which are common in consumer/prosumer and some pro cards) and 48+ GB GPUs (mostly professional/datacenter cards). All prices are approximate (in USD) and current as of Q1 2025, and VRAM-per-dollar “value” is computed for both new and used pricing where available.
24 GB GPUs (Consumer & Prosumer Cards)
These cards offer 24 GB VRAM per GPU. They are often the best bang-for-buck for most users, especially on the used market. Multiple 24 GB GPUs can be combined in a server to reach 48 GB or more total VRAM. Key options:
- NVIDIA GeForce RTX 3090 (24 GB, Ampere): A popular consumer GPU from 2020 with 24 GB GDDR6X. As a gaming card, it also performs very well in AI (it has Tensor Cores and large memory bandwidth). New RTX 3090s originally launched at $1,499 MSRP; at present new stock is scarce and marked up (~$1,866 on Amazon (RTX 3090 Price Tracker US - Mar 2025)). Used RTX 3090s are plentiful, often around $800–$900 on eBay (RTX 3090 Price Tracker US - Mar 2025). That equates to roughly $35–40 per GB (used) – a strong value for a modern architecture. The RTX 3090 supports NVLink (you can bridge 2 cards for faster GPU-GPU transfers or potentially memory pooling in certain pro apps). Multi-GPU setups of 2–4× 3090 are common in workstations (though you'll need a big chassis and ample cooling). Notes: 350 W TDP; usually 3-slot coolers (some blower models exist from OEMs); requires NVIDIA's consumer drivers (which support up to 4 GPUs in Linux). This card is often recommended for budget-conscious ML builds (Cheap used GPUs for multi-GPU hobby ML rig - GPU - Level1Techs Forums), as it "provides 24GB of VRAM, cheap if used" (Cheap used GPUs for multi-GPU hobby ML rig - Level1Techs Forums).
- NVIDIA GeForce RTX 3090 Ti (24 GB, Ampere): A slightly faster version of the 3090 with the same VRAM. It has a higher TDP (450 W) and was very expensive at launch ($1,999). As of 2025, new units are ~$1,990 (RTX 3090 Ti Price Tracker US - Mar 2025 - Best Value GPU) (scarce supply), and used ones around $1,150–$1,200 (RTX 3090 Ti Price Tracker US - Mar 2025 - Best Value GPU). That's roughly $50/GB (used) – worse value than a 3090. Unless you find a 3090 Ti at a great discount, the standard 3090 is usually the better value pick, as the 3090 Ti is only ~5–10% faster. NVLink is supported (2-way). Notes: 450 W TDP, triple-slot cooling on most; slightly newer silicon than the 3090 but the same 24 GB of VRAM.
- NVIDIA GeForce RTX 4090 (24 GB, Ada Lovelace): The current flagship consumer GPU (2022). It has 24 GB GDDR6X and significantly higher compute performance than the 3090 (more CUDA cores, 4th-gen Tensor Cores). MSRP is $1,599; real-world new prices are ~$1,650–$1,700 for base models (GPU Price Index 2025: Lowest price on every graphics card from ...). Used 4090s are not much cheaper (often $1,500 or more, as demand is high; in some regions second-hand 4090s even approach new prices (What is 4090 ideally worth in 2025? : r/nvidia - Reddit)). That puts cost around $65–70/GB (new). Value per GB is lower than last-gen used cards, but you're paying for top-tier performance. The 4090 notably has no NVLink, so multi-GPU usage is limited to software-based parallelism (which is usually fine for AI workloads, just with no unified memory). Despite that, 2× or 4× 4090 setups are viable for parallel workloads (many AI researchers build multi-4090 rigs). Cooling is a concern: 4090s are 450 W and bulky; for multi-GPU you'd likely use PCIe extenders or water cooling, or limit to 2 per machine. Notes: Best raw performance, but the lowest VRAM/$ among 24GB cards. Great when you need maximum speed and 24 GB is sufficient, but for purely maximizing memory per dollar, consider older cards.
- NVIDIA Titan RTX (24 GB, Turing): A prosumer card from 2018 (Turing architecture) with 24 GB GDDR6. It was originally priced at $2,499 (targeted at researchers/prosumers) (GEFORCE RTX 3090 vs NVIDIA TITAN RTX - Linus Tech Tips). It's essentially the predecessor to the 3090 (similar performance to an RTX 3080 in modern terms, but with double the VRAM). On the used market in 2025, Titan RTX cards can be found around ~$800–$1,000 (roughly $35–$42/GB, similar to a used 3090). The Titan RTX does support NVLink (2-way bridge for 48 GB addressable, which some ML frameworks can utilize) and uses a dual-slot blower cooler – convenient for multi-GPU in a tower. Notes: 280 W TDP, blower fan (no external airflow needed), and no artificial driver limits on usage (though it uses the Game Ready or Studio driver, not a Quadro driver). Performance is about 40% lower than an RTX 3090 in CUDA tasks (GeForce RTX 3090 vs TITAN RTX - Technical City), so a cheaper 3090 is usually a better deal if available. But in constrained markets or certain pro workflows, the Titan RTX is still a solid 24GB card. (The Quadro RTX 6000 24GB is essentially the same hardware with ECC memory and pro drivers – those are also found used at ~$1,100 (NVIDIA QUADRO RTX 6000 24gb GPU for sale online | eBay), typically not worth the extra cost over a Titan/3090 unless you specifically need Quadro features.)
- NVIDIA RTX A5000 (24 GB, Ampere): A professional workstation GPU from 2021, essentially an Ampere-based successor to the Quadro series. 24 GB GDDR6 with ECC. New price ~$2,000 (nvidia rtx a5000 gpu 24gb 8192 cuda cores memory interface 384 ...); used price about $1,300–$1,500 (Best Prices for NVIDIA RTX A5000 - CoinPoet.com). That's ~$54/GB used, on par with a used 4090 – so not particularly high value unless you need its specific features. The A5000's advantage is a blower cooler (2-slot) and a lower 230 W TDP, making it easier to use in a multi-GPU workstation than a 3090/4090. It also supports NVLink (you can link 2× A5000 for 48 GB combined). In terms of performance, the A5000 is slower than a 3090 (it has about 8192 CUDA cores vs. 10496 on the 3090), roughly ~70% of the 3090's throughput in FP16. Notes: A good option for multi-GPU setups in tight spaces or if you find a good deal used, but otherwise a used 3090 offers similar or better performance and value.
- NVIDIA Tesla P40 (24 GB, Pascal): A datacenter GPU from 2016 (Pascal generation) that has become a surprising budget champion for VRAM-heavy workloads. It features 24 GB of GDDR5 and was designed for inference tasks. While its compute performance is far below modern GPUs, the P40 can be found extremely cheaply on the used market – around $150–$200 (often ~$180 on eBay (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News)). That's an incredible ~$7.5 per GB of VRAM (used), making it one of the best VRAM/$ options available. In fact, one enthusiast noted the Tesla P40 (24GB) delivers about 70% of the throughput of an RTX 4090 for large language model inference, at roughly 1/8th the cost (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News). Multiple P40s can be installed in a system (they do not support NVLink/SLI for memory pooling – NVLink was not available on this model), but Exo or other software can still partition models across them (a minimal partitioning sketch follows at the end of this subsection). P40s are Pascal architecture, so they lack Tensor Cores and have limited FP16 performance. They will run modern frameworks, but often have to use FP32 for calculations (e.g. a Pascal GPU might need almost twice the VRAM to run the same model in FP32 that newer GPUs run in FP16). Still, for hobbyists on a shoestring budget who need 24 GB of VRAM, the P40 is very attractive. Notes: 250 W TDP, passive cooling (no fan onboard) – you must provide airflow (a server chassis, or a fan zip-tied to the card as many do (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News)). Requires NVIDIA's datacenter driver (or the Linux driver) since it has no video outputs. Do not go older than Pascal – the previous-gen Tesla M40 (Maxwell, 24GB) is also dirt cheap (~$100) but it cannot handle modern low-precision workloads (a 24GB M40 effectively behaves like a 6GB card for 16-bit models (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News) and has no CUDA support beyond sm_5.x, which is being deprecated). So the Tesla P40 is the lowest-end recommendable GPU for 24GB VRAM on a budget (Cheap used GPUs for multi-GPU hobby ML rig - GPU - Level1Techs Forums).
(Other 24GB-class cards not detailed above include the NVIDIA Quadro P6000 (Pascal, 24GB) and the Tesla P100 (Pascal, 16GB per card, so below our 24GB floor). The Quadro P6000 performs like a P40 with a blower cooler; it's usually more expensive than the P40 on secondary markets, so we focused on the P40. Similarly, the newer RTX 6000 Ada Generation (48GB) has a smaller 24GB Ada sibling in the datacenter NVIDIA L4, but those run ~$2K+ for 24GB at the moment (Cheapest (and best) used Nvidia GPU with 48GB VRAM for doing AI (LLM/SD)? : r/homelab), so they don't compete on value.)
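Before moving on to the 48GB cards, here is the partitioning sketch promised above: a minimal example of splitting one model across two 24 GB GPUs. This is not Exo itself (Exo discovers devices and partitions models with its own engine); it simply illustrates the general layer-splitting technique using Hugging Face transformers/accelerate, and the model ID and per-GPU memory caps are placeholder assumptions:

```python
# Minimal sketch: split one large model across two 24GB GPUs (e.g. 2x RTX 3090 or 2x P40).
# Requires: pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "some-org/some-30b-model"  # hypothetical; any checkpoint that fits ~40-45GB in fp16

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,            # fp16 halves memory vs fp32 (see precision notes above)
    device_map="auto",                    # accelerate places layers on cuda:0 and cuda:1 automatically
    max_memory={0: "22GiB", 1: "22GiB"},  # leave headroom below each card's 24GB
)

inputs = tokenizer("A quick test prompt", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Layers are executed in sequence, so activations cross the PCIe bus at the partition boundary; without NVLink this adds some latency, but it is usually acceptable for inference.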
48 GB (and Higher VRAM) GPUs (Professional & Datacenter)
If you need 48 GB or more per GPU, you’re looking at professional/datacenter-class products. These have fantastic memory capacity, but generally very high prices, so the VRAM-per-dollar is lower. Often, it can be more cost-effective to use two 24GB GPUs (if your software can split the load) than one 48GB card. Still, certain scenarios (large models that are hard to parallelize, or simpler setup) call for a single GPU with 48+ GB. Here are the main options:
- NVIDIA Quadro RTX 8000 (48 GB, Turing): Released in 2018 as a pro visualization GPU, this card has 48 GB GDDR6. It was ~$10,000 at launch, but as Turing ages, prices have come down. On the used market, Quadro RTX 8000 cards go for around $3,000–$4,000 in early 2025 (PNY NVIDIA Quadro RTX 8000, Black, Green, Silver (VCQRTX8000 ...)) (NVIDIA Quadro RTX 8000 48GB GPU GDDR6 Express PCIe x16 Video Graphics Card | eBay). (We see listings at ~$3,999 open-box (NVIDIA Quadro RTX 8000 48GB GPU GDDR6 Express PCIe x16 Video Graphics Card | eBay), and some as low as $2,996 in new condition from resellers (PNY NVIDIA Quadro RTX 8000, Black, Green, Silver (VCQRTX8000 ...)).) That's roughly $62–$83 per GB used – about double the $/GB of a used 3090, but you get the convenience of 48GB on one card. It has a blower cooler (2-slot) and a 260 W TDP, making it feasible to stack multiple in a workstation or server. NVLink is supported: two RTX 8000s can be bridged for 96 GB of total addressable memory (useful for huge models, though NVLink on Turing gives ~100 GB/s bandwidth, so not as fast as pooling on an 80GB A100, for instance). Notes: Still a powerful card (4608 CUDA cores, comparable to RTX 2080 Ti performance). Often the cheapest single-card 48GB solution if bought used. Great for content creation as well (it has display outputs and pro drivers), if that matters.
- NVIDIA RTX A6000 (48 GB, Ampere): A pro GPU launched in 2020, effectively the Ampere successor to the Quadro RTX 8000. It has 48 GB GDDR6 ECC and 10,752 CUDA cores (similar core count to a 3090, but lower clocked). New, the RTX A6000 retails around $6,000 (prices vary; ~$5,487 at retail vendors (Computer Graphics Cards Nvidia Quadro - Walmart)). On the used market, the A6000 still commands about $3,200–$3,500 ([W] used RTX A6000 : r/homelabsales - Reddit) (some eBay sales around $4K (Rtx A6000 48GB - eBay), but also enthusiasts in homelab forums looking at ~$3.2K deals ([W] used RTX A6000 : r/homelabsales - Reddit)). That's about $65–$73 per GB – similar to the RTX 8000's range. The RTX A6000 has a dual-slot blower (300 W TDP) and notably supports NVLink (2-way) just like the RTX 8000. Two A6000s can be connected for 96 GB combined at 112 GB/s (NVIDIA RTX A6000 For Powerful Visual Computing) (NVIDIA RTX A6000 + HW Sync - PNY Technologies). Many research labs liked A6000s for multi-GPU servers due to their large VRAM and lower power draw than A100s. Notes: For someone with a high budget, a used A6000 offers one of the best 48GB options (faster and more efficient than a Quadro RTX 8000, at similar cost). One caution: the newer "RTX 6000 Ada" (Ada Lovelace generation) removed NVLink (RTX A6000 ADA - no more NV Link even on Pro GPUs? - Raytracing - NVIDIA Developer Forums) – so if you anticipate needing NVLink/pooled memory, the Ampere A6000 is preferable to the Ada 6000 (the latter still has 48GB but no direct GPU-GPU bridge).
- NVIDIA Tesla A40 (48 GB, Ampere): A datacenter GPU corresponding to the RTX A6000 (GA102 chip with 48GB). It's a passively-cooled 300 W card for servers. Specs are very similar to the A6000 (10,752 CUDA cores, 48GB ECC). The A40 was primarily sold to enterprise, and on the secondary market it's rarer than the A6000. Recent price checks show them around $4,500–$5,000 used (e.g. an eBay sale at ~$4,900 ([PC] NVIDIA Tesla A40 48GB : r/homelabsales), with knowledgeable sellers suggesting ~$4,200 as a "steal" price ([PC] NVIDIA Tesla A40 48GB : r/homelabsales)). That's roughly $90+/GB. Unless you get a bargain, an A6000 is usually a better deal than an A40 for essentially the same capabilities. Notes: Only consider it if you have a line on cheap pulls from servers. Requires external cooling (designed for 4U servers with high airflow). NVLink is supported (likely 2-way, similar to the A6000). Because the A40 is headless, it's ideal for pure compute servers (no video outputs).
- NVIDIA Quadro GV100 (32 GB, Volta): An older (2017) professional GPU with 32 GB of HBM2 memory. It's basically the workstation version of the Tesla V100. This card has 5,120 CUDA cores and first-gen Tensor Cores (great for FP16). At launch it was extremely expensive (~$9k), but in 2025 you might find one around $1,500–$2,000 used (Nvidia Quadro GV100 Volta 32GB HBM2 Graphics AI Workstation ...) (NVIDIA Quadro GV100 32GB HBM2 PCI-E 3.0 x16 Volta Graphics ...). For example, Amazon lists one at $1,949 (Nvidia Quadro GV100 Volta 32GB HBM2 Graphics AI Workstation ...), and some sites show used Quadro GV100s around $1,300 if you can snag a deal (graphics card - Quadro GV100 - 32 GB - 3ME26AA - Zones). At ~$1.5k, that's about $46/GB, which is actually better value per GB than the 48GB Ampere cards (and similar to a used 3090's $/GB). Performance-wise, a GV100 (Volta) won't beat an Ampere or Ada card, but it does have Tensor Cores and 32 GB of fast HBM2, making it still very capable for AI training. NVLink is supported (you could pair two for 64GB). Notes: Dual-slot blower, 250 W. Requires NVIDIA's pro drivers (or the Linux driver). It could be a sweet spot for someone who needs >24GB on one card but can't afford $3k+ for 48GB – if you find one under $2k, it's an excellent mid/high-end value for 32 GB of VRAM. (Tesla V100 PCIe 32GB cards have similarly dropped in price: some used around $1.9k (Best Prices for NVIDIA Tesla V100 - CoinPoet.com). Just ensure it's the PCIe version with a cooler, not the SXM form factor.)
- NVIDIA Tesla V100 32 GB (Volta): The datacenter accelerator (PCIe card) with 32 GB HBM2. Performance is the same as the Quadro GV100. These show up used from retired servers, with prices in the ~$2,000 range as noted above. If you find one significantly cheaper (e.g. $1,500), it's a great high-VRAM workhorse. Note that most are passively cooled, so they need server airflow or a fan mod (the only actively-cooled 32GB Volta consumer-style card was the ultra-limited Titan V CEO Edition).
- NVIDIA A100 (40 GB or 80 GB, Ampere): These are datacenter/HPC GPUs (PCIe and SXM4 form factors). They are extremely powerful (3rd-gen Tensor Cores, etc.) and come in 40GB and 80GB memory configurations. However, cost is very high – well beyond the others. A100 40GB PCIe cards still go for ~$7k+ new, and used perhaps ~$4k–$5k when decommissioned (not common yet). The 80GB version is even more (well over $10k new). In terms of VRAM/$, that's $100–$125 per GB or worse (for example, an Alibaba listing had ~$2,000 for a 32GB V100 but $8,000 for an 80GB A100, i.e. $100/GB (NVIDIA A40 Enterprise Tensor Core 48GB 190W - Viperatech) ([PC] NVIDIA Tesla A40 48GB : r/homelabsales - Reddit)). Unless you truly need the single-card performance of the A100, it is not a good value option for a self-built server. You could get 4 used RTX 3090s (96GB total across them) for the price of one A100 40GB. However, for completeness: if money is no object and you need 80GB on one card (to avoid multi-GPU splitting overhead), an A100 80GB or the newer H100 80GB (Hopper) would be the top choices. Just note that these often require specialized servers (for cooling and power), and some features may not be fully utilized outside HPC environments. For instance, the A100 supports NVLink and even NVSwitch (in HGX systems) to pool memory with high bandwidth, but if you're building a one-off server, you likely won't have an NVSwitch fabric – you'd be limited to 2-way NVLink on PCIe A100s (which requires three NVLink bridges for two cards per NVIDIA's documentation (NVIDIA RTX A6000 / RTX A5500 / RTX A5000 / A40 (A100) | A15958)). In short, the A100/H100 are fantastic but poor in VRAM-per-$; they target enterprise buyers.
- Others (Instinct, etc.): Our NVIDIA focus precludes AMD cards, but it's worth a quick note that some AMD GPUs like the MI60 (32GB) are mentioned in communities as budget alternatives (Cheapest (and best) used Nvidia GPU with 48GB VRAM for doing AI (LLM/SD)? : r/homelab) – however, their software support (ROCm) and performance for AI are very different. Since we focus on NVIDIA/Exo, we have not detailed those.
The table below summarizes the key specs and metrics for selected GPUs:
Comparison of NVIDIA GPUs (VRAM per Dollar Focus)
| GPU | VRAM | Price (New) | Price (Used) | Value (VRAM per $) | Multi-GPU Scaling | Notes |
|---|---|---|---|---|---|---|
| Tesla P40 (Pascal) | 24 GB | N/A (datacenter only) | ~$180 (Hacker News) | 0.133 GB/$ (≈$7.5/GB) | Yes (PCIe only; no NVLink) | 250W, passive (needs server airflow). Pascal: no Tensor Cores, weak FP16. Best raw VRAM/$ for budget builds. |
| Tesla M40 (Maxwell) | 24 GB | N/A | ~$100 (Hacker News) | 0.240 GB/$ (≈$4.2/GB) | Yes (PCIe only; no NVLink) | 250W, passive. Maxwell: no FP16 acceleration and CUDA support is being deprecated; not recommended despite the price. |
| Titan RTX (Turing) | 24 GB | $2,499 (launch) | ~$900 ($800–1,000) | ~0.027 GB/$ (≈$37/GB) | Yes (NVLink 2-way for 48GB) | 280W, blower cooler. Originally the prosumer flagship; still decent compute but surpassed by the 30-series. Often replaced by a used 3090 at similar VRAM and lower cost (GeForce RTX 3090 vs TITAN RTX - Technical City). |
| GeForce RTX 3090 (Ampere) | 24 GB | $1,499 MSRP (current ~$1,650) (GPU Price Index 2025: Lowest price on every graphics card from ...) | ~$850 (RTX 3090 Price Tracker US - Mar 2025) | 0.028 GB/$ (≈$35/GB used) | Yes (NVLink 2-way) | 350W, 3-slot cooler (some blower variants). Outstanding perf/$ for 24GB (Ampere Tensor Cores). Popular for multi-GPU; ensure adequate cooling and PSU. |
| GeForce RTX 3090 Ti | 24 GB | $1,999 MSRP | ~$1,180 (RTX 3090 Ti Price Tracker US - Mar 2025 - Best Value GPU) | 0.020 GB/$ (≈$49/GB used) | Yes (NVLink 2-way) | 450W, large cooler. ~10% faster than the 3090 but much higher cost and power; only consider if priced near a 3090. |
| GeForce RTX 4090 (Ada) | 24 GB | $1,599 MSRP (street ~$1,650) (GPU Price Index 2025: Lowest price on every graphics card from ...) | ~$1,200–$1,400 (varies) | ~0.020 GB/$ (≈$66/GB new; ~$50/GB used) | Yes (PCIe only; no NVLink) | 450W, 3-slot. Fastest single GPU overall; 24GB is the max on consumer Ada. Good multi-GPU via software (Exo, etc.), but no memory pooling. |
| RTX A5000 (Ampere) | 24 GB | ~$2,000 (nvidia rtx a5000 gpu 24gb 8192 cuda cores memory interface 384 ...) | ~$1,350 (Dell - NVIDIA Quadro RTX A5000 (24GB GDDR6) Graphics Card) | 0.018 GB/$ (≈$54/GB used) (Best Prices for NVIDIA RTX A5000 - CoinPoet.com) | Yes (NVLink 2-way) | 230W, 2-slot blower, ECC. Easy to stack, but a used 3090 outperforms it for less. |
| Tesla V100 / Quadro GV100 (Volta) | 32 GB | ~$8,000+ (launch) | ~$1,500–$2,000 | ~0.018 GB/$ (≈$55/GB used) | Yes (NVLink 2-way; NVSwitch in HGX) | 250W, blower or passive. 1st-gen Tensor Cores (strong FP16). A very solid 32GB deal if found near $1.5k (graphics card - Quadro GV100 - 32 GB - 3ME26AA - Zones); an NVLink bridge combines two to 64GB. |
| Quadro RTX 8000 (Turing) | 48 GB | ~$10,000 (launch; later ~$5,500 retail) | ~$3,000–$4,000 (eBay) | ~0.012 GB/$ (≈$83/GB at $4k) | Yes (NVLink 2-way for 96GB) | 260W, 2-slot blower. Turing (~RTX 2080 Ti-class compute). Often the cheapest single-card 48GB option when bought used. |
| NVIDIA RTX A6000 (Ampere) | 48 GB | ~$5,000–$6,000 (Computer Graphics Cards Nvidia Quadro - Walmart) | ~$3,200 ([W] used RTX A6000 : r/homelabsales - Reddit) | ~0.015 GB/$ (≈$67/GB used) | Yes (NVLink 2-way for 96GB) | 300W, 2-slot blower, ECC. Ampere flagship pro GPU; NVLink allows a combined 96GB (RTX A6000 ADA - no more NV Link even on Pro GPUs? - Raytracing - NVIDIA Developer Forums). Expensive but one of the best 48GB options. |
| NVIDIA A100 (Ampere) | 40 or 80 GB | ~$10K–$15K (varies) | ~$5K+ (limited used supply) | ~0.008 GB/$ (≈$125/GB at ~$5K for 40GB) | Yes (NVLink 2-way on PCIe; 8-way via NVSwitch in HGX servers) | 250–300W, passive (datacenter only). Extremely fast; the 80GB variant has massive VRAM, but $/GB is prohibitive for self-builds. |
| GeForce GTX/RTX (others, <24 GB) | – | – | – | – | – | GPUs under 24GB VRAM (e.g. 16GB, 12GB cards) are excluded; 24GB is the minimum considered for high-VRAM builds. |
Table Notes: Price (New) reflects either MSRP or typical current new price. Price (Used) is an estimate from recent secondary market listings. “VRAM per $” is calculated as GB divided by price in USD (higher is better); numbers in parentheses show equivalent $ cost per 1 GB. Multi-GPU scaling indicates if NVLink is supported for that GPU (which can aid in scaling or memory pooling).
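For transparency, the value column is just that arithmetic. A small sketch reproducing it for a few of the used-market price points quoted above (approximate figures from this article, not live quotes):

```python
# VRAM-per-dollar as used in the table: GB / price (higher is better),
# plus the inverse $ per GB. Prices are the approximate used figures quoted above.
used_prices = {
    "Tesla P40 (24GB)":       (24, 180),
    "RTX 3090 (24GB)":        (24, 850),
    "RTX A6000 (48GB)":       (48, 3200),
    "Quadro RTX 8000 (48GB)": (48, 4000),
}

for name, (vram_gb, price_usd) in used_prices.items():
    gb_per_dollar = vram_gb / price_usd
    print(f"{name:24s} {gb_per_dollar:.3f} GB/$  (${price_usd / vram_gb:.1f}/GB)")
```

Running it reproduces the table's ordering: the P40 at ~0.133 GB/$ dwarfs everything else, the used 3090 sits around 0.028 GB/$, and the 48GB pro cards land near 0.012–0.015 GB/$.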
Recommendations by Budget Tier
Finally, here are some recommendations depending on your budget and needs, keeping in mind value (VRAM/$) as the priority:
- Budget Build (Under $500): Your best bet is to go with used datacenter cards from the Pascal era. The Tesla P40 24GB stands out as the top budget option – at roughly $180 each (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News), you could even get two for under $400 and have 48GB total (split 24+24). Performance will not match newer GPUs, but you'll be able to load very large models on a shoestring budget. Do plan for cooling (P40s need server-style airflow or DIY fan solutions) and note the power draw (250W each). If you need 48GB in a single workload, Exo can partition the model across two P40s (no NVLink needed, just slightly slower inter-GPU communication via PCIe). Bottom line: For pure VRAM per dollar, several Tesla P40s cannot be beat – they're a common community choice for running large language models cheaply (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News). Just avoid going older than Pascal (e.g., Tesla M40 or K80) despite their low price, as their practical usability for AI is very limited (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News).
- Mid-Range (Around $1,000): In this range, a used RTX 3090 24GB is an excellent choice. At ~$800–$900, it provides a great balance of large VRAM, high speed, and modern features (RTX 3090 Price Tracker US - Mar 2025). A single 3090 can handle most 24GB-requiring tasks and even some 30+ GB models with memory optimization. If you need more than 24GB, consider getting two RTX 3090s (total ~$1,500–$1,700) – this gives you 2×24GB and very strong performance. Two 3090s in NVLink could even present a combined 48GB to certain applications (though not all frameworks use NVLink to treat it as one memory pool; many will still require a manual model split). Another mid-range path is older professional cards around $1k: for example, a Quadro RTX 6000 24GB (~$1,100 used) (NVIDIA QUADRO RTX 6000 24gb GPU for sale online | eBay) or even a Quadro GV100 32GB. The Quadro GV100/Tesla V100 32GB offers slightly more memory (32GB) and good compute – if you find one near $1k, that's a steal and arguably the best mid-budget single-GPU solution (32GB on one card, decent speed) (Cheapest (and best) used Nvidia GPU with 48GB VRAM for doing AI (LLM/SD)? : r/homelab). However, those deals can be sporadic. Generally, for simplicity, a 24GB GeForce like the 3090 is the go-to in this budget; it's widely available and has the benefit of standard cooling (just ensure your case has space and airflow for a big card).
- High-End (>$2,000): If budget permits a few thousand dollars, you have access to the 48GB-class GPUs. A used Quadro RTX 8000 (48GB) in the ~$3K range would allow you to run 48GB models on a single card, no multi-GPU gymnastics needed (PNY NVIDIA Quadro RTX 8000, Black, Green, Silver (VCQRTX8000 ...)). Similarly, a used RTX A6000 (48GB) for ~$3.2K–$3.5K is a more powerful option ([W] used RTX A6000 : r/homelabsales - Reddit) – it's Ampere-based and roughly on par with a 3090 in raw performance, but with double the memory. These cards shine for tasks like large diffusion models or video generation where 48GB on one GPU can simplify the workflow. They also typically use less power per GPU than running two consumer cards. If your aim is to build a multi-GPU server with 4–8 GPUs and money is less of an issue, you might consider mixing both approaches: e.g., 4× RTX 3090s vs 2× RTX A6000s – both give 96GB total VRAM. The 4×3090 route might be cheaper (~$3.4k total used) and faster in aggregate compute, but the 2×A6000 route (~$6–7k total) gives you two very large-memory GPUs (each can hold a 48GB model, or 96GB combined via NVLink for a single enormous model). It really depends on your workload. For pure value even at the high end, multiple consumer cards tend to win out in price/performance, whereas the professional cards win in convenience and sometimes reliability. One thing to note about RTX 4090s in multi-GPU: if you're spending ~$4k, you could get ~3× 4090 cards. They would outperform any of the options above in raw throughput, but you'd still be limited to 24GB per card (72GB total across 3). That might actually be a compelling high-end setup if your models can be split three ways by Exo (which is designed for distributed inference). So, for a high-end budget, consider 2–4× RTX 4090 vs single or dual 48GB pro cards: the former gives more total compute; the latter gives more VRAM per GPU. As of 2025, the RTX 4090 at ~$1.6k new is actually better value than the pro 48GB cards in $/GB (the $/GB is similar, but you get massive speed too). It's only the memory limit that might sway you to pro GPUs if your models truly need >24GB on one device.
- Extreme (Money-no-object): If you have a very large budget (tens of thousands), you might look at enterprise solutions like NVIDIA A100 80GB GPUs or even an NVIDIA H100. For example, an 8-GPU server with A100 80GBs would give you 640 GB of total VRAM (!) and the ability to tackle enormous models. However, the cost is astronomical and not remotely $/GB efficient – these are typically only justified in corporate or research contexts. With projects like Exo enabling distributed clusters, an alternative for an "extreme" setup is to network multiple cheaper servers together. For instance, instead of buying one $10k A100, one could buy 5–6 used 3090s and split them across two machines, using Exo's cluster capabilities to utilize all of them on a single workload. That gives a huge combined VRAM (5×24 = 120 GB) and plenty of compute, albeit with some network overhead. The key point: scaling out with many mid-range GPUs can be more cost-effective than chasing the absolute top-of-line GPU. Unless your use-case specifically demands the fastest single GPU or the largest single GPU memory, it's usually better to spend on multiple "value" GPUs and leverage multi-GPU software (which Exo makes easier).
Conclusion and Final Thoughts
For a high-VRAM AI compute server, NVIDIA GPUs offer a spectrum of choices. On the low end, older server GPUs like the Tesla P40 provide unmatched VRAM-per-dollar for those willing to tinker (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News). In the mid-range, the GeForce RTX 3090 (and its peers) strike an excellent balance of 24GB memory, strong performance, and reasonable cost ( RTX 3090 Price Tracker US - Mar 2025) – making it a workhorse for many AI enthusiasts. At the high end, 48GB professional cards (RTX 8000, A6000) become desirable if your workloads truly need that much memory on each GPU, though you pay a premium for the convenience (PNY NVIDIA Quadro RTX 8000, Black, Green, Silver (VCQRTX8000 ...). Multi-GPU considerations are crucial: think about how you will cool and power multiple cards, and whether your software (Exo in this case) can effectively utilize several GPUs without NVLink. The good news is that Exo’s model-parallel approach is specifically meant to “run larger models than you would be able to on any single device” by splitting them optimally (GitHub - exo-explore/exo: Run your own AI cluster at home with everyday devices ️⌚), so it will help you make the most of whatever GPUs you have – even if they are a mix of different types.
In summary, for best value at each tier:
- Budget: Tesla P40 24GB – for roughly the price of a mid-range gaming GPU, you could get 2–3 P40s and have 48–72GB of VRAM (just be mindful of their slower speed and higher power draw) (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News).
- Mid-range: Used RTX 3090 24GB – a single 3090 is often the smartest purchase, or two of them if you need ~48GB total ( RTX 3090 Price Tracker US - Mar 2025). They’re modern, fast, and plentiful second-hand.
- High-end: Used 48GB pro GPU (RTX 8000/A6000) – if your budget allows, getting that 48GB on one card is luxurious and sometimes necessary. Among these, take whichever you find at the better price (Turing RTX 8000 vs Ampere A6000); the Ampere A6000 has the edge in performance if prices are close (Top GPUs in 2024 for Machine Learning Projects: Find Your Perfect Fit) (Which GPU Is The Best? RTX 4090, RTX 6000 Ada, RTX 3090, or ...). For pure performance at a high budget, consider multiple 4090s, but remember you're capped at 24GB per card.
Finally, rest assured that all NVIDIA GPUs discussed are compatible with multi-GPU setups and with Exo. The main differences are in how easily they integrate (NVLink availability, drivers, etc.), not whether they function at all. Exo will see your CUDA-capable GPUs and partition workloads as needed. So, choose the GPU (or mix of GPUs) that fits your budget, and you can leverage Exo to unify their power. With the right planning, you’ll maximize your VRAM per dollar and have a system capable of tackling those memory-hungry AI models. Good luck with your server build!
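As a final sanity check before pointing Exo (or any other framework) at the machine, it is worth confirming that every card shows up with its full VRAM. A quick PyTorch snippet (just an illustration, not part of Exo) does the job:

```python
# Quick check that all installed GPUs are visible with their expected VRAM.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA devices visible - check drivers and PCIe seating.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i}  {props.name}  {props.total_memory / 1024**3:.1f} GiB  "
          f"compute capability {props.major}.{props.minor}")
```

If a card is missing or reports less memory than expected, fix the driver or hardware issue first; no partitioning software can use VRAM the system cannot see.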
Sources: High-VRAM GPU pricing and performance data were referenced from NVIDIA’s official specs and community reports, including used-market price snapshots ( RTX 3090 Price Tracker US - Mar 2025) (Sorry for the off topic question, but does anyone know how to buy consumer hardw... | Hacker News) ([W] used RTX A6000 : r/homelabsales - Reddit), forum discussions on multi-GPU setups (Cheap used GPUs for multi-GPU hobby ML rig - GPU - Level1Techs Forums), and NVIDIA documentation on NVLink and GPU features (RTX A6000 ADA - no more NV Link even on Pro GPUs? - Raytracing - NVIDIA Developer Forums). These ensure up-to-date and real-world insights into the best GPU values for AI workloads.