GPU Plotting is Real – and Very Fast

  • November 21, 2023

Summary

  • GPU plotting is here from Chia and madMAx – and it is very fast. K=32 plots in 1.5 minutes on a single high-end GPU + 256GB of RAM.
  • GPU performance and i/o bandwidth are growing at a rapid pace
  • Plot grinding (spoofing plots with a GPU) can theoretically be performed once phase 1 takes under 28 seconds, but it isn’t economical and requires a lot of continuous energy
  • Chia blockchain has constants set up to prevent plot grinding. The ultimate protection against plot grinding without hurting honest plotters is to keep it uneconomical
  • Chia is proposing a CHIP to reduce the plot filter to ensure that plot grinding remains uneconomical for the foreseeable future

Intro

GPU plotting is now real, with a few community plotters in the wild, and it is very fast. We are seeing single plot times of 80-90 seconds with higher-end PCIe 4.0 x16 GPUs (RTX 3090 or 4070 Ti), which is about half of the fastest single plot time recorded on a CPU (Intel’s brand new 4th gen Xeon Scalable processors, codename Sapphire Rapids). This is exciting because a GPU is much more accessible and significantly less expensive than a top-of-the-line server. This development lowers the barrier of entry for Chia farming, as plotting is typically one of the hardest parts of the entire process. GPU plotting will also reduce the energy consumption for plotting by up to 3x. The temporary memory requirement for a k=32 plot doesn’t fundamentally change: an uncompressed phase 1 of plotting still needs around 256GB of temporary storage.

Most of these GPU plotters operate by taking a table, splitting it up into buckets, similar to how CPU plotters work, and passing those buckets to the GPU to perform the sorting, matching, and compression functions required in the plotting phases. Modern cards perform these functions very quickly, so the limiter ends up being either raw GPU performance, PCIe bandwidth, or memory bandwidth. With a smaller amount of DRAM, disk I/O is needed to offload to a temporary file. This becomes a performance bottleneck and slows down the process by 2-3x, but it is still much faster than plotting on a desktop CPU.

GPU performance

GPUs have been getting faster every generation thanks to the increased number of GPU compute units and the power efficiency of the silicon itself. The latest-generation NVIDIA RTX 4090 has almost 2x the compute capability of the 3090, and performance grew at an average CAGR (compound annual growth rate) of 25% between 2016 and 2022.

Figure 1: CUDA Performance over time

PCI Express (PCIe) is the mainstream general-purpose I/O interconnect for high-speed compute, storage, and network components. It is also getting a whole lot faster: alongside high-performance SSDs, CXL is now moving to the PCIe bus for cache-coherent memory expansion, accelerators, AI, and 400Gb/s networking. PCI-SIG expects I/O bandwidth to double every three years or so. PCIe 5.0 devices are now emerging, where a x16 link can do 64GB/s in each direction, similar to a few channels of DDR5 memory.

Figure 2: PCI Express Bandwidth. Source: PCI-SIG

Say, for instance, that we have infinite compute capability and perfect software efficiency and are limited only by PCIe bandwidth. To complete phase 1 of a plot, around 500GB of data needs to be sent to the GPU, and about 360GB sent back to the host CPU. This puts the theoretical limit for a PCIe 4.0 x16 card at about 20 seconds and for a PCIe 5.0 x16 card at about 10 seconds. With a proof of concept (POC), we actually measured around 24 seconds to transfer the exact amount of data required for phase 1 on a PCIe 4.0 GPU. The time from the release of a PCIe spec to many vendors shipping high-volume production parts is usually about three years. The PCIe 5.0 base specification was released in 2019. We are now seeing both AMD and Intel support PCIe 5.0 on consumer platforms, as well as on the recently launched server platforms. The first PCIe 5.0 devices, like SSDs, are hitting the market this year.

Figure 3: Phase 1 data transfer time
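
The bandwidth-limited estimate can be reproduced with a short calculation. This is only a sketch under stated assumptions: the two transfer directions overlap fully (PCIe is full duplex), so the larger host-to-GPU transfer sets the bound, and the effective throughput values of 25 GB/s (PCIe 4.0 x16) and 50 GB/s (PCIe 5.0 x16) are hypothetical. They sit below the nominal link rates and were picked because they reproduce the roughly 20 and 10 second figures quoted in the text; they are not measurements. The function name is illustrative.

```python
# Data moved during phase 1 of a k=32 plot, as quoted in the text (GB).
TO_GPU_GB = 500.0
TO_HOST_GB = 360.0


def phase1_transfer_seconds(effective_gb_per_s: float, full_duplex: bool = True) -> float:
    """Lower bound on phase 1 time when only the PCIe link limits throughput."""
    to_gpu = TO_GPU_GB / effective_gb_per_s
    to_host = TO_HOST_GB / effective_gb_per_s
    # Full duplex: both directions run concurrently, so the slower one dominates.
    # Otherwise the two transfers are simply added up.
    return max(to_gpu, to_host) if full_duplex else to_gpu + to_host


# ASSUMPTION: effective per-direction throughput, chosen to match the text.
ASSUMED_EFFECTIVE_GB_PER_S = {"PCIe 4.0 x16": 25.0, "PCIe 5.0 x16": 50.0}

for link, bandwidth in ASSUMED_EFFECTIVE_GB_PER_S.items():
    print(f"{link}: {phase1_transfer_seconds(bandwidth):.1f} s")
```

Setting `full_duplex=False` models the pessimistic case where the two directions cannot overlap, which pushes a PCIe 4.0 x16 card above the 28 second window under the same assumed throughput.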

Cheaper and more efficient plotting

GPUs make plotting 2.5x more efficient than the previous most efficient method, in-memory plotting on a CPU, and 5x more efficient than desktop plotting. We measure plotting efficiency as energy (kWh) per terabyte plotted. This means farmers can get their space plotted faster, reduce the electricity cost of plotting, and reduce the global energy consumption for plotting.

Consider the example below:

  • AMD 5950X, PCIe 4.0 x16, 128GB DDR4, 2x 980 Pro NVMe, 3060Ti
  • 265W during plotting
  • 4 minute k=32
  • 31.4 TB/day at 0.2kWh per TB
  • Total cost to replot 200TB is 6.37 days and $5.67
    • Compare this to a high-end desktop CPU today with bladebit disk, at 0.64kWh/TB and 26.5 days!

Figure 4: Measured GPU power while plotting

You can compare estimated replotting costs and energy in a spreadsheet here.
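
The figures in the example follow from the measured power draw and the plotting rate. The sketch below redoes that arithmetic. The electricity price of $0.14 per kWh is an assumption: the text does not state it, but it is the value that reproduces the quoted $5.67. The function name is illustrative.

```python
def replot_estimate(power_watts: float, tb_per_day: float,
                    farm_tb: float, usd_per_kwh: float):
    """Return (kWh per TB, days to replot, electricity cost in USD)."""
    kwh_per_day = power_watts / 1000 * 24      # energy drawn in one day of plotting
    kwh_per_tb = kwh_per_day / tb_per_day      # plotting efficiency
    days = farm_tb / tb_per_day                # wall-clock time for the whole farm
    cost_usd = farm_tb * kwh_per_tb * usd_per_kwh
    return kwh_per_tb, days, cost_usd


# 265 W, 31.4 TB/day and 200 TB come from the example above.
# ASSUMPTION: 0.14 USD/kWh is inferred, not stated in the text.
kwh_per_tb, days, cost_usd = replot_estimate(265, 31.4, 200, 0.14)
print(f"{kwh_per_tb:.2f} kWh/TB, {days:.2f} days, ${cost_usd:.2f}")
```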

A bit too fast – Plot Grinding

A grinder can begin creating a plot after a signage point is released, and attempt to complete the plot before the infusion point. This is detailed on the Chia docs site. The grinder then deletes the plot after obtaining the quality rating (or after submitting the proof if it’s eligible). This would allow them to create a plot that automatically passes the filter, effectively allowing them to farm without storing any space. This only becomes feasible if phase 1 of a plot can be completed in less than 28 seconds (before the infusion). This isn’t exactly an attack that poses a security risk to the network; it is an attempt to use compute instead of space. We refer to this as “plot grinding”.

Say a plot that passes the plot filter can be created in under 28 seconds. For the challenge it targets, that plot is the equivalent of holding as many plots as the plot filter constant (512 today), but while the GPU is grinding that plot it must ignore the challenges from the other two signage points that pass in the meantime. This gives a leverage factor of ⅓ * the plot filter constant. The leverage factor is equivalent to the number of plots being spoofed, so we can easily calculate the amount of spoofed space in TiB or TB by multiplying by the k=32 plot size. There is no double dipping on compression, because compression doesn’t apply to phase 1, which is the part needed for plot grinding.

The real leverage comes when a plot is created in under 18.75 seconds (realistically, there are probably a few seconds of overhead for filter grinding and other steps, so in practice the target is probably more like 15 seconds). It would seem that this leverage is ⅔ * the plot filter, because you miss one out of every 3 signage points, but there is a trick. At the second signage point, a plot is generated that can pass the filters for both the first and second challenges: the grinder creates many candidate plot IDs (by creating many BLS keys) and runs them through the SHA256 filter hash until one meets the criteria for passing both filters. At time t[2] you start plotting something whose filter passes challenges 1 and 2 (c1 & c2), and at time t[4] something that passes c3 and c4. This trick can be extended once phase 1 fits within the signage point interval of 9.375 seconds, where all three challenges can be attempted if a plot ID is created whose hash meets the criteria for passing all three filters (today a 1 in 512^3 chance per candidate ID). Thankfully, doing this today requires a very large cluster of GPUs and a tremendous amount of power, and it is not economical even with the extended leverage.

 

phase 1 time (seconds) | plot filter | leverage factor (plots) | space spoofed (TiB)
-----------------------|-------------|-------------------------|------------------------
> 28.125               | 512         | N/A                     | 0
28.125                 | 512         | 171                     | 16.9
18.75                  | 512         | 512                     | 50.7
9.375                  | 512         | 1536                    | 152.0
t < 9.375              | 512         | 9.375 / t * 3 or 3.5    | Plots * 101.3GiB / 1024

Table 1: Plot Grinding Leverage and Spoofed Space vs. Plot Time
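
The numeric rows of Table 1 can be regenerated from three inputs: the signage point interval of 9.375 seconds, the plot filter of 512, and a k=32 plot size of 101.3 GiB. The sketch below is an illustration with hypothetical function names, not code from any Chia tool. It deliberately leaves the last row (t < 9.375) unmodelled, because the table gives only a formula for it.

```python
SP_INTERVAL = 9.375     # seconds between signage points, from the text
PLOT_FILTER = 512       # current plot filter constant
K32_PLOT_GIB = 101.3    # size of one k=32 plot, from Table 1


def leverage_plots(phase1_seconds: float, plot_filter: int = PLOT_FILTER) -> float:
    """Number of plots spoofed by one grinder for a given phase 1 time."""
    if phase1_seconds > 3 * SP_INTERVAL:    # slower than 28.125 s: grinding fails
        return 0.0
    if phase1_seconds > 2 * SP_INTERVAL:    # one challenge out of every three
        return plot_filter / 3
    if phase1_seconds > SP_INTERVAL:        # two-challenge plot ID trick
        return float(plot_filter)
    if phase1_seconds < SP_INTERVAL:
        raise ValueError("t < 9.375 s is not modelled in this sketch")
    return 3.0 * plot_filter                # all three challenges attempted


def spoofed_space_tib(plots: float) -> float:
    """Convert a plot count into TiB of spoofed space."""
    return plots * K32_PLOT_GIB / 1024


for t in (30.0, 28.125, 18.75, 9.375):
    plots = leverage_plots(t)
    print(f"{t:>7} s: {plots:7.1f} plots, {spoofed_space_tib(plots):6.1f} TiB")
```

The `plot_filter` parameter shows why the filter is the natural knob: every value the function returns scales linearly with it.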

 

Today the filter is 512, so spoofing 512 plots is equivalent to around 55 terabytes of storage. The profitability of this is exactly the same as having 55 terabytes of honest space, but it requires more hardware cost (capital expenditures) and tremendously more energy! 55 terabytes is three 18TB hard drives at 5.6W each, for a total of 16.8W. A mid-range GPU consumes 150-200W, plus the platform power (motherboard, CPU, DRAM), putting a single GPU system at about 20x the power of running disks.

Figure 5: Low plot times favor honest plotters reducing energy cost, but too fast enables plot grinding

Plot Grinding Economics

As with Chia farming and other cryptocurrency mining, a total cost of ownership (TCO) model can help you understand the costs of plot grinding. The equipment required to attempt plot grinding would be a workstation platform, 256GB of DRAM, and multiple PCIe 4.0 GPUs, and its cost can be estimated easily. The speed of plot creation determines how much space can be spoofed, and the netspace and XCH price then determine the profitability.

If the GPUs are already owned, then power costs (operational expenditures) are the only cost. The profitability of spoofing capacity has to be greater than the cost of the electricity to run.
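
For the already-owned-GPU case, the comparison reduces to daily reward income versus daily electricity cost. The sketch below is a minimal model with hypothetical names. Every number in the usage example (netspace, network rewards per day, XCH price, system power, electricity price) is a placeholder chosen only to show the shape of the calculation, not data from the article or the network.

```python
TIB_PER_EIB = 1024 * 1024


def daily_grinding_margin_usd(spoofed_tib: float, netspace_eib: float,
                              network_xch_per_day: float, xch_price_usd: float,
                              system_watts: float, usd_per_kwh: float) -> float:
    """Daily reward income minus electricity cost for one grinding system."""
    # Expected share of rewards equals the share of netspace being spoofed.
    share = spoofed_tib / (netspace_eib * TIB_PER_EIB)
    income = share * network_xch_per_day * xch_price_usd
    # The system runs continuously, so it draws power 24 hours a day.
    power_cost = system_watts / 1000 * 24 * usd_per_kwh
    return income - power_cost


# HYPOTHETICAL placeholder inputs; replace them with current values.
margin = daily_grinding_margin_usd(
    spoofed_tib=50.7, netspace_eib=20.0, network_xch_per_day=9216.0,
    xch_price_usd=40.0, system_watts=350.0, usd_per_kwh=0.14)
print(f"daily margin: ${margin:.2f}")
```

A negative margin means the electricity alone costs more than the spoofed space earns. Because `spoofed_tib` scales with the plot filter, lowering the filter lowers the income term directly.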

Plot grinding really gets concerning for Chia if it can be profitably performed on a single GPU, because a desktop with a PCIe x16 slot is readily and cheaply available. Servers and workstations that support PCIe 4.0 are still fairly expensive compared to a desktop. Most GPU miners for other coins don’t have these setups readily available.

Plot grinding Spreadsheet here

Plot Filter

Plot grinding is an attempt to turn PoST into PoW. Thankfully, Bram anticipated this, and Chia has many constants that protect against it. The important ones are the block time, the minimum k size on the network, and the plot filter. The most important protection is to make plot grinding thoroughly uneconomical without hurting honest plotters by requiring more resources and time to plot. Profitability is proportional to the plot filter! A reduction in the plot filter instantly makes plot grinding infeasible for years to come.

Here is a reminder of how the plot filter works in farming.

  • Farmer receives a challenge from VDF
  • Farmer sends signage point to harvester
  • Harvester applies plot filter to reduce the i/o required on disk (1/512)
  • For plots that pass the filter, harvester performs proof quality check
  • If quality meets required iterations (from difficulty) then proof of space is good
  • Fetch entire proof of space
plot filter bits = sha256(plot_id + challenge_hash + sp_hash)
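
The filter check can be sketched in a few lines. The hash input follows the formula above. The pass condition, that the leading 9 bits of the digest are zero (2^9 = 512), is an assumption made here to match the 1/512 pass rate in the text, and the function names are illustrative rather than taken from the Chia codebase.

```python
import hashlib


def plot_filter_bits(plot_id: bytes, challenge_hash: bytes, sp_hash: bytes) -> bytes:
    """SHA256 over the concatenated inputs, as in the formula above."""
    return hashlib.sha256(plot_id + challenge_hash + sp_hash).digest()


def passes_plot_filter(plot_id: bytes, challenge_hash: bytes, sp_hash: bytes,
                       zero_bits: int = 9) -> bool:
    """ASSUMPTION: a plot passes when the top `zero_bits` bits are all zero."""
    digest = plot_filter_bits(plot_id, challenge_hash, sp_hash)
    return int.from_bytes(digest, "big") >> (256 - zero_bits) == 0


# Synthetic inputs for illustration only; real values come from the chain.
challenge_hash = hashlib.sha256(b"example challenge").digest()
sp_hash = hashlib.sha256(b"example signage point").digest()
plot_ids = [i.to_bytes(32, "big") for i in range(10_000)]

passing = sum(passes_plot_filter(p, challenge_hash, sp_hash) for p in plot_ids)
print(f"{passing} of {len(plot_ids)} synthetic plot IDs pass (about 1 in 512 expected)")
```

Nothing in the check depends on the plot contents, only on the plot ID and the two hashes, which is why the filter result is known before any plotting work is done.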

The filter is extremely effective at reducing disk I/O. The farming storage workload analysis showed that a disk farming Chia is 99.75% idle and only consumes 0.5 IOPS, around 350x lower than the random seek capability of a modern hard drive. The constant of 512 was chosen on the conservative side to make farming as energy efficient as possible. The only downside of the plot filter is the leverage it gives plot grinding, since the filter can be calculated before a plot is created. A 2x or 4x decrease in the filter would increase disk I/O by the same factor, which is not an issue for a modern hard drive. The filter does have a unique interaction with plot compression, though: every plot that passes the filter has to go through a decompression step before the full proof of space is fetched, generating the missing matches on the fly during the proof quality check. A decrease in the filter would increase the number of plots that pass the filter, but fortunately the impact is universal and hits large farmers with high compression levels the hardest.
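
The trade-off is linear in both directions, which a few lines of arithmetic make concrete. This sketch uses only the 0.5 IOPS and 350x figures quoted in the text; the function name and the candidate filter values are illustrative.

```python
BASE_FILTER = 512
BASE_DISK_IOPS = 0.5        # farming load per disk at filter 512, from the text
SEEK_HEADROOM_AT_BASE = 350  # drive seek capability vs. farming load, from the text


def filter_scaling(new_filter: int) -> dict:
    """Effect of moving the plot filter from 512 to `new_filter`."""
    io_factor = BASE_FILTER / new_filter
    return {
        "disk_iops": BASE_DISK_IOPS * io_factor,               # more plots pass, more lookups
        "seek_headroom": SEEK_HEADROOM_AT_BASE / io_factor,    # remaining margin on the drive
        "grinding_leverage": new_filter / BASE_FILTER,         # relative to today
    }


for candidate in (512, 256, 128):
    print(candidate, filter_scaling(candidate))
```

Halving the filter halves the grinding leverage while the disk still operates far below its seek capability.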

By Design – Keeping PoST Energy Efficient

There are many knobs in PoST that can prevent plot grinding. We evaluated changing the minimum k size on the network (32 today), reducing the plot filter, adding more plotting tables to the Chia proof of space, and changing the Chia Proof of Space algorithm entirely. Although we will not discuss the other proposals here, we did thoroughly analyze the options.

Summary: We recommend reducing the plot filter over time to ensure plot grinding is never economically viable

In the rows are the criteria we are attempting to impact, and in the columns are protocol changes we can implement. This is more for folks to learn how Bram and I thought through this plot grinding problem, and the protocol changes that we can tune to prevent it.

Bram & JM’s rationale.

 

criteria                    | Proposal 1: filter reduction | Proposal 2: increase k      | Proposal 3: plot grouping | Proposal 4: increase tables | Proposal 5: enhance harvesting
----------------------------|------------------------------|-----------------------------|---------------------------|-----------------------------|-------------------------------
economic plot grinding      | proportional                 | proportional                | proportional              | maybe small improvement     | no effect
51% attack plot grinding    | proportional                 | proportional plus threshold | proportional              | maybe small improvement     | no effect
honest plotting cost        | none                         | proportional                | no effect                 | significant                 | no effect
required replotting         | none                         | yes                         | yes/some                  | yes                         | no effect
honest harvesting cost      | proportional                 | no                          | no effect                 | small increase              | small improvement
custom plotting advantage   | no                           | yes                         | no effect                 | small increase              | no effect
custom harvesting advantage | yes (e.g. GPU)               | no                          | no effect                 | qualitative improvement     | significant improvement
hard fork                   | yes                          | no                          | no                        | yes                         | no
min plot size               | no effect                    | proportional                | proportional              | increase                    | no effect

 

Filter reduction: the viability of plot grinding decreases proportionally with every effective halving of the filter. Decreasing the filter will impact disk I/O (which is already extremely low) and plot decompression (CPU/GPU cycles on the harvester).

Increase K size: raise the minimum to k=33 and increase it every 2-3 years to keep pace with computing improvements.

New Proofs of Space (Increase tables, plot grouping): significant changes to the proof of space format or consensus. During the research for proof of space, many plot formats were evaluated, and the current Chia plot format represents the best of those 2 years of research. There are consensus options, like requiring proofs of space to come in pairs, and plot format changes, like increasing the number of tables. Neither of these has the same magnitude of effect as changing constants like the filter or the k value.

Enhance Harvesting: move the decompression to the farmer, rather than the harvester.

Criteria

  • Economic Plot Grinding: spoofing fewer plots makes plot grinding proportionally less profitable, as xch rewards scale with effective space
  • 51% attack plot grinding: make it extremely expensive and infeasible (not enough available resources) to 51% the network via plot grinding
  • Honest plotting cost: reduce the physical compute, memory, and storage resources required to create a plot. Plotting efficiency (energy utilization, time) scales with resource use and is largely driven by the minimum k size.
  • Required replotting: will the change invalidate old plots
  • Honest harvesting cost: does the change make harvesting and farming require more resources and energy consumption
  • Custom advantage for plotting: will a custom solution offer significant gains over a regular plotter
  • Hard fork: is the change backward compatible with the network. If something was not valid in the current consensus, and we are making it valid, this requires a hard fork.
  • Minimum plot size: ideally the file size is as small as possible without impacting the security of the network, to incentivize more participants

Summary

GPU plotting has emerged in the Chia network, and it is much faster and more efficient than traditional CPU plotting. This will reduce the energy consumption for plotting by 2-5x and significantly decrease the plotting time for farmers, lowering the barrier of entry to Chia and increasing decentralization. GPU performance and I/O bandwidth are increasing rapidly, and future GPUs will be able to perform phase 1 of plotting in under 28 seconds, making plot grinding possible even though it is not economical. However, the Chia blockchain has constants set up to prevent plot grinding, and we propose a CHIP to further reduce the profitability of plot grinding in the future.

Introducing the CHIP for Plot filter

Plot grinding can theoretically be attempted with either two PCIe 4.0 x16 GPUs at full bandwidth, or one PCIe 5.0 GPU. The latter is not yet available from AMD or NVIDIA. We recommend changing the plot filter to stop plot grinding from ever becoming economically viable.
Plot grinding, the plot filter, and the proposed reduction thereof are outlined in the newly proposed CHIP.

original source – www.chia.net/2023/01/20/gpu-plotting-is-real-and-very-fast/
