Advances in storage technology have led to extremely large, fast devices. However, I/O speeds are not keeping pace, and an increasing share of time is spent transferring data between storage devices and central processing units (CPUs). Computational storage, a form of near-data processing, could alleviate these data movement bottlenecks in edge infrastructure.
Computational storage moves a variety of functions, such as data filtering, compression, and image processing, closer to the edge where data is physically stored, according to Ian F. Adams, a research scientist on Intel Labs' Advanced Storage Research team.
“We are moving computation or functionality closer to storage, either in a storage server or in some cases all the way into a physical device, such as a solid-state drive (SSD),” said Adams. “It's a form of moving the compute to the data, except we're focusing on storage servers and storage devices.”
Computational storage allows parallel computation, which reduces I/O traffic and eases other constraints on existing compute, memory, storage, and I/O. For example, in a simple system with storage and computation servers communicating over a network, I/O links can quickly become clogged, starving the compute nodes and slowing down traffic. With computational storage, commands are sent to storage, returning much smaller results. This reduces the network load and improves speed, opening up additional resources and flexibility in how computations are distributed throughout the entire system.
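The traffic savings can be sketched in a few lines. This is an illustrative comparison, not Intel's implementation: it contrasts a conventional flow, where the whole object crosses the network to be processed on a host, with an offloaded flow, where only a small result comes back.

```python
import hashlib

# Hypothetical sketch (names are illustrative): contrast a conventional
# read-then-compute flow with a computational-storage flow.

OBJECT = b"some large stored object " * 40_000  # ~1 MB of data on "storage"

def host_side_checksum(stored: bytes) -> tuple[str, int]:
    """Conventional flow: ship the whole object to the host, hash there."""
    bytes_on_wire = len(stored)              # entire object crosses the network
    return hashlib.sha256(stored).hexdigest(), bytes_on_wire

def offloaded_checksum(stored: bytes) -> tuple[str, int]:
    """Computational-storage flow: hash inside the storage target and
    return only the 32-byte digest."""
    digest = hashlib.sha256(stored).hexdigest()
    bytes_on_wire = 32                       # only the result crosses the network
    return digest, bytes_on_wire

host_digest, host_traffic = host_side_checksum(OBJECT)
off_digest, off_traffic = offloaded_checksum(OBJECT)
assert host_digest == off_digest             # same answer...
print(f"traffic reduced {host_traffic / off_traffic:.0f}x")  # ...with far less I/O
```

The result is identical either way; what changes is how many bytes contend for the network link.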
Block-Compatible Design
However, computational storage has remained an elusive goal, according to Adams. While minimizing data movement by placing computation close to storage has quantifiable benefits, previous attempts have failed to take root in the industry. They either require a departure from the widespread block protocol to one that is more computationally friendly, or they introduce significant complexity on top of the block protocol.
“We participated in many of these attempts and have since concluded that neither a departure from nor a significant addition to the block protocol is needed. Instead, we’ve used a block-compatible design based on virtual objects, which makes numerous offloads possible,” said Adams.
Adams and his team designed an extension to existing block protocols that does not interfere with existing legacy applications, while providing a mechanism for higher-level applications and stacks to offload their computations. This extension is built around virtual objects. Like a real object, such as a file, a virtual object contains the metadata needed to process the data. These virtual objects communicate file and object concepts to low-level block storage for both input and output using the execute (EXEC) command. This command specifies the operation, such as image inference or a checksum. Because results are returned to the application as regular data, far less data is transmitted, significantly reducing traffic.
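The idea can be sketched as follows. The names and structures here are illustrative assumptions rather than the team's actual protocol definitions: a virtual object is just metadata (a list of block extents) that lets the storage target interpret raw blocks as a file, and EXEC names an operation to run over it in place.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 512

# Simulated block device: a flat bytearray addressed in 512-byte blocks.
device = bytearray(BLOCK_SIZE * 64)
device[0:26] = b"hello computational world\n"

@dataclass
class VirtualObject:
    """Illustrative virtual object: metadata mapping a file-like object
    onto raw block extents of (start_block, length_in_bytes)."""
    extents: list

def exec_command(vobj: VirtualObject, op: str) -> bytes:
    """Illustrative EXEC: run an operation inside the storage target and
    return only the (small) result as regular data."""
    data = b"".join(
        bytes(device[s * BLOCK_SIZE : s * BLOCK_SIZE + n])
        for s, n in vobj.extents
    )
    if op == "checksum":
        return hashlib.sha256(data).digest()      # 32 bytes back, not the object
    if op == "word_count":
        return str(len(data.split())).encode()    # tiny result back
    raise ValueError(f"unknown op: {op}")

vobj = VirtualObject(extents=[(0, 26)])
print(exec_command(vobj, "word_count"))  # b'3'
```

Because the virtual object carries everything the target needs to interpret the blocks, the host's READ/WRITE path is untouched and legacy applications are unaffected.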
The team found that numerous offloads are possible using virtual objects, including checksum, word count, search, compress, video transcoding, image transformation, sorting, and merging. The same techniques can be applied to other forms of block storage, including disk drives, SSDs, disk arrays, and clustered storage systems.
Improves Bitrot Detection
The team tested the virtual object approach by evaluating the checksum offloads for bitrot detection in a non-erasure coded application, according to a paper by the team.
In bitrot detection, object data is hashed and compared against a previously computed hash to detect corruption from a variety of sources, including software bugs or latent sector errors. In traditional object storage with smaller direct attached storage devices, this overhead is relatively minimal. However, the trend toward larger storage devices and “diskless” storage stacks requires more throughput to effectively catch object corruption. This means that bitrot detection, traditionally a purely local operation, now incurs significant network traffic, which can contend with latency-sensitive application traffic.
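The core of a scrub pass is simple to sketch. This is a toy illustration, not the team's implementation: hash the stored object and compare against the hash recorded at write time; with checksum offload, the hash is computed inside the storage target, so only the recorded hash and a pass/fail result cross the network.

```python
import hashlib

def scrub(stored_data: bytes, recorded_hash: str) -> bool:
    """Return True if the object is intact, False if corruption is detected.
    With checksum offload, the fresh hash is computed inside the storage
    target rather than after shipping the object over the network."""
    return hashlib.sha256(stored_data).hexdigest() == recorded_hash

original = b"object payload" * 1000
recorded = hashlib.sha256(original).hexdigest()  # hash recorded at write time

assert scrub(original, recorded)                 # intact object passes

corrupted = bytearray(original)
corrupted[0] ^= 0x01                             # flip one bit: simulated bitrot
assert not scrub(bytes(corrupted), recorded)     # corruption is caught
```

A single flipped bit changes the digest entirely, which is why a hash comparison reliably catches silent corruption.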
In experiments, the team integrated the checksum offload into object storage stacks. In addition to a 99% reduction in network traffic, scrubbing performance improved up to twofold.
Adams and his team are also investigating several other application-specific offloads, including machine learning applications on visual data. In this experiment, the virtual objects method reduced the bandwidth required in an image inference pipeline by 90%. By performing image preprocessing inside the storage target and returning the smaller image ready for use in the inference engine, the amount of traffic in the inference pipeline was reduced.
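The effect can be illustrated with a toy stand-in for real preprocessing (which would typically involve decode, resize, and normalization): downsample a stored grayscale image inside the storage target so that only the small, inference-ready image crosses the network.

```python
# Illustrative sketch, not the team's pipeline: nearest-neighbor
# downsampling of a stored grayscale image inside the storage target.

W, H = 256, 256
stored_image = bytes((x + y) % 256 for y in range(H) for x in range(W))

def preprocess_in_storage(img: bytes, w: int, h: int, step: int = 4) -> bytes:
    """Keep every `step`-th pixel in each dimension, shrinking the
    payload roughly step*step-fold before it leaves the storage target."""
    return bytes(
        img[y * w + x] for y in range(0, h, step) for x in range(0, w, step)
    )

small = preprocess_in_storage(stored_image, W, H)
reduction = 1 - len(small) / len(stored_image)
print(f"{reduction:.0%} less data returned")  # 94% less data returned
```

The inference engine receives only the reduced image, so the savings compound across every image in the pipeline.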
The virtual objects approach may provide many advantages, including block compatibility and a stateless protocol. This method does not modify the host file system or the block storage system READ/WRITE path. Instead, the application is given the option of using the stateless EXEC command. This simplifies the design and improves scalability. In addition, the solution extends to erasure coded block storage, and leverages the existing security mechanisms in the file and operating systems.
“Because this approach is scalable, cost-effective, and ecosystem-compatible, we believe that virtual objects offer a practical method for computational storage,” said Adams, whose team is one of many groups at Intel investigating computational storage methods.
Intel also is engaged in other computational storage efforts, including co-chairing the newly formed NVM Express computational storage task group, and participating in the Storage Networking Industry Association (SNIA) computational storage technical working group. Intel also is building exploratory prototypes collaboratively with customers and ecosystem partners to research and test the value of bringing compute closer into the storage layer in real-world data center use cases.
Adams will present a video demonstration on computational storage at Intel’s virtual booth at SC2020 on November 17-19.