Intel® oneAPI Deep Neural Network Developer Guide and Reference
Single op partition on CPU
This example demonstrates how to build a graph with a single op and run it on CPU.
Example code: cpu_single_op_partition.cpp
Key take-aways from this example:
how to build a single-op partition quickly
how to create an engine, allocator and stream
how to compile a partition
how to execute a compiled partition
Some assumptions in this example:
The workflow is demonstrated without verifying the correctness of the results
Unsupported partitions must be handled by the user (see the sketch below)
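For example, a user can query whether the library supports a partition before compiling it. The following is a minimal sketch, not part of the original example; it assumes a partition object part (created as shown later in this example) and a hypothetical user-side fallback helper:
// Sketch only: check partition support before compilation.
if (!part.is_supported()) {
    // Fall back to the user's own kernels for the ops in this partition.
    // run_with_user_kernels() is a hypothetical helper, not a library API.
    run_with_user_kernels(part);
}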
Public headers
To start using oneDNN Graph, we must include the dnnl_graph.hpp header file in the application. All the C++ APIs reside in namespace dnnl::graph.
#include <iostream>
#include <memory>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <assert.h>
#include "oneapi/dnnl/dnnl_graph.hpp"
#include "example_utils.hpp"
#include "graph_example_utils.hpp"
using namespace dnnl::graph;
using data_type = logical_tensor::data_type;
using layout_type = logical_tensor::layout_type;
using dim = logical_tensor::dim;
using dims = logical_tensor::dims;
cpu_single_op_partition_tutorial() function
Build Graph and Get Partitions
In this section, we create a partition that contains a single MatMul op directly, skipping the steps of building a graph and getting partitions from it.
Create the MatMul op (dnnl::graph::op) and attach attributes to it, including transpose_a and transpose_b.
logical_tensor matmul_src0_desc {0, data_type::f32};
logical_tensor matmul_src1_desc {1, data_type::f32};
logical_tensor matmul_dst_desc {2, data_type::f32};
op matmul(0, op::kind::MatMul, {matmul_src0_desc, matmul_src1_desc},
{matmul_dst_desc}, "matmul");
matmul.set_attr<bool>(op::attr::transpose_a, false);
matmul.set_attr<bool>(op::attr::transpose_b, false);
Compile and Execute Partition
In a real use case, users such as frameworks should provide device information at this stage. In this example, we simply use a self-defined device to simulate that behavior.
Create a dnnl::engine and set a user-defined dnnl::graph::allocator on it.
allocator alloc {};
dnnl::engine eng
= make_engine_with_allocator(dnnl::engine::kind::cpu, 0, alloc);
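Here a default-constructed allocator is used. If an application needs to route the library's allocations through its own memory manager, the allocator can instead be built from host allocate/deallocate callbacks. The sketch below is illustrative only; my_allocate and my_deallocate are hypothetical names, and it additionally requires <cstdlib>:
// Sketch only: an allocator built from user-provided host callbacks.
void *my_allocate(size_t size, size_t alignment) {
    (void)alignment; // a production allocator should honor the alignment
    return std::malloc(size);
}
void my_deallocate(void *buf) {
    std::free(buf);
}
allocator user_alloc {my_allocate, my_deallocate};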
Create a dnnl::stream on the engine.
dnnl::stream strm {eng};
Skip building a graph and getting partitions from it; instead, directly create a single-op partition from the MatMul op.
partition part(matmul, dnnl::engine::kind::cpu);
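The inputs and outputs passed to compilation below are logical tensors that carry concrete shapes and layouts, reusing the IDs (0, 1, 2) given when the op was created. A minimal sketch follows; the shapes are illustrative and not part of the original snippet:
// Sketch only: input/output logical tensors with concrete shapes for
// compilation. The IDs must match the ones used at op creation time.
dims src0_dims {128, 64};
dims src1_dims {64, 32};
dims dst_dims {128, 32};
std::vector<logical_tensor> inputs {
        logical_tensor(0, data_type::f32, src0_dims, layout_type::strided),
        logical_tensor(1, data_type::f32, src1_dims, layout_type::strided)};
std::vector<logical_tensor> outputs {
        logical_tensor(2, data_type::f32, dst_dims, layout_type::strided)};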
Compile the partition into a compiled partition with the input and output logical tensors.
compiled_partition cp = part.compile(inputs, outputs, eng);
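The inputs_ts and outputs_ts arguments used below are dnnl::graph::tensor objects that bind the logical tensors to real memory buffers. A minimal sketch, assuming the inputs and outputs vectors from the previous step and leaving buffer initialization aside:
// Sketch only: bind data buffers to the logical tensors. get_mem_size()
// reports the number of bytes each tensor requires. If an output used
// layout_type::any, query cp.query_logical_tensor(2) for the actual
// layout first.
std::vector<float> src0_data(inputs[0].get_mem_size() / sizeof(float));
std::vector<float> src1_data(inputs[1].get_mem_size() / sizeof(float));
std::vector<float> dst_data(outputs[0].get_mem_size() / sizeof(float));
std::vector<tensor> inputs_ts {tensor(inputs[0], eng, src0_data.data()),
        tensor(inputs[1], eng, src1_data.data())};
std::vector<tensor> outputs_ts {tensor(outputs[0], eng, dst_data.data())};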
Execute the compiled partition on the specified stream.
cp.execute(strm, inputs_ts, outputs_ts);
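Execution is submitted to the stream; before reading the results back, wait for the stream to finish:
strm.wait();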