
Intel Graph Analytics Solutions



Intel® Graph Builder for Apache Hadoop* Software v2

Intel® Graph Builder for Apache Hadoop* Software v2 simplifies creation of graph data models, enabling data scientists to focus on solving business problems, rather than formatting data. Prebuilt libraries automate workflows for cleaning and transforming data and constructing graph models with high throughput parallel processing using Hadoop. Once built, the graph models can be operated on using a wide variety of graph databases, analytic engines, and visualization tools. By automating laborious custom workflows while substantially removing the complexities of cluster computing for constructing graphs from Big Data, Intel Graph Builder helps speed the time to insight for data scientists, using powerful graph analytics.
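To make the three stages named above (cleaning, transforming, and constructing a graph model) concrete, here is a minimal plain-Python sketch on a handful of in-memory rows. It is not Intel Graph Builder's API and does not run on Hadoop; the input rows, the `user:`/`product:` vertex naming, and the `clean` and `build_graph` helpers are all hypothetical, chosen only to show what each stage does to the data.

```python
from dataclasses import dataclass, field

# Hypothetical raw input: (user, product, rating) rows, some of them dirty.
RAW_ROWS = [
    ("  Alice ", "Book", "5"),
    ("bob", "book", "3"),
    (None, "Lamp", "4"),        # missing user: dropped by the null check
    ("alice", "LAMP", "  2 "),
]

@dataclass
class Graph:
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (src, dst, properties)

def clean(rows):
    """Cleaning stage: drop rows with nulls, then trim and lowercase strings."""
    for row in rows:
        if any(v is None for v in row):
            continue
        user, product, rating = (v.strip().lower() for v in row)
        # Transformation stage: convert the rating text into an integer feature.
        yield user, product, int(rating)

def build_graph(rows):
    """Construction stage: each row becomes a user->product edge with a rating property."""
    g = Graph()
    for user, product, rating in rows:
        src, dst = f"user:{user}", f"product:{product}"
        g.vertices.update({src, dst})
        g.edges.append((src, dst, {"rating": rating}))
    return g

graph = build_graph(clean(RAW_ROWS))
print(sorted(graph.vertices))  # ['product:book', 'product:lamp', 'user:alice', 'user:bob']
print(len(graph.edges))        # 3
```

Normalizing strings before building vertices is what lets "Alice" and "alice" collapse into one vertex; on real data at scale, Graph Builder runs these stages as parallel Hadoop jobs instead of a single Python loop.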

Intel Graph Builder libraries utilize the popular Apache Pig* scripting environment to simplify the data preparation pipeline, from data import and cleansing to feature transformation to graph construction. In order to process Big Data, data scientists typically employ MapReduce* programming techniques. Programming using MapReduce can be complicated for unfamiliar users and time-consuming for experienced ones. With Intel Graph Builder and the Pig environment, the process is simpler than writing a Java* application and creating custom routines. This lets data scientists more easily operate on data at scale and ensure it is clean, formatted properly, and transformed into the desired features with all graph connections properly assembled.
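For readers unfamiliar with MapReduce, the following in-memory simulation shows the map, shuffle, and reduce phases needed for even a simple graph task: turning a text file of edges into per-vertex adjacency lists. It is an illustration of the programming model only, assuming a hypothetical "src,dst" line format; on Hadoop the mapper and reducer would be separate Java classes plus job configuration, whereas in Pig the same grouping can be written as a few LOAD, GROUP, and FOREACH statements.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one "src,dst" edge per line, as a text file on HDFS might hold.
LINES = ["a,b", "a,c", "b,c", "c,a"]

def map_phase(line):
    """Mapper: emit a (source vertex, destination vertex) key/value pair."""
    src, dst = line.split(",")
    yield src, dst

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: collapse each vertex's neighbors into a sorted adjacency list."""
    return key, sorted(values)

mapped = chain.from_iterable(map_phase(line) for line in LINES)
adjacency = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(adjacency)  # {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
```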

Intel Graph Builder can create several types of graphs to represent a range of real-world problems, including property graphs, where vertices and edges may carry additional pertinent information. The final graph output can be consumed by a wide range of graph analytics and visualization tools through the widely supported Resource Description Framework (RDF). Intel Graph Builder also includes a connector that parallelizes loading of the graph output into the Aurelius Titan* open source graph database, which further speeds the graph processing pipeline through its final stage.
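As an illustration of what RDF output of a property graph can look like, this sketch serializes vertex properties and labeled edges as N-Triples, one common line-based RDF syntax. The base IRI, the vertex and property names, and the `property_graph_to_ntriples` helper are hypothetical; Graph Builder's actual IRI scheme and serialization are not described on this page. Edge properties are left out because expressing them in RDF requires extra modeling (for example, reification).

```python
# Hypothetical base IRI; Graph Builder's real naming scheme is not specified here.
BASE = "http://example.org/graph/"

def iri(local):
    """Wrap a local name in angle brackets as an absolute IRI."""
    return f"<{BASE}{local}>"

def literal(value):
    """Quote a value as an N-Triples string literal, escaping special characters."""
    text = (str(value)
            .replace("\\", "\\\\")   # escape backslashes first
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r"))
    return f'"{text}"'

def property_graph_to_ntriples(vertices, edges):
    """Emit one triple per vertex property and one per labeled edge."""
    lines = []
    for vid, props in vertices.items():
        for key, value in props.items():
            lines.append(f"{iri(vid)} {iri(key)} {literal(value)} .")
    for src, label, dst in edges:
        lines.append(f"{iri(src)} {iri(label)} {iri(dst)} .")
    return lines

vertices = {"alice": {"name": "Alice"}, "book1": {"title": "Graphs"}}
edges = [("alice", "purchased", "book1")]
for line in property_graph_to_ntriples(vertices, edges):
    print(line)
```

Each vertex property becomes a triple whose object is a literal, while each labeled edge becomes a triple whose object is another vertex IRI. That split is what lets generic RDF tools read the same output.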

Titan is scalable and can operate over multiple nodes, so parallel loading can be extended across multiple clustered database nodes. Because Intel Graph Builder is open source, users can extend its bulk-load capabilities to other graph database offerings.
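One general way to spread a bulk load across clustered database nodes is to hash-partition edges by source vertex and have one worker per node commit its partition in batches. The sketch below shows that idea only; it does not use Titan's interfaces, and `NUM_NODES`, `BATCH_SIZE`, and the `load_partition` stand-in (which counts edges instead of writing them) are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3    # hypothetical number of database nodes
BATCH_SIZE = 2   # hypothetical commit batch size

def node_for(vertex_id):
    """Stable hash partitioning: a given source vertex always maps to the same node."""
    # crc32 is used instead of hash() because Python randomizes string hashes per run.
    return zlib.crc32(vertex_id.encode()) % NUM_NODES

def partition(edges):
    """Group edges into one bucket per node, keyed by the source vertex."""
    buckets = {n: [] for n in range(NUM_NODES)}
    for edge in edges:
        buckets[node_for(edge[0])].append(edge)
    return buckets

def load_partition(node, edges):
    """Stand-in loader: split one node's edges into fixed-size commit batches."""
    batches = [edges[i:i + BATCH_SIZE] for i in range(0, len(edges), BATCH_SIZE)]
    # A real loader would open one transaction per batch here; this sketch only counts.
    return node, len(edges), len(batches)

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    results = list(pool.map(lambda item: load_partition(*item), partition(edges).items()))
print(results)  # (node, edge count, batch count) per node; counts depend on the hashes
```

Keying partitions by source vertex keeps all outgoing edges of a vertex on one worker, which avoids two workers contending for the same vertex during a load.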


Solutions Built with Intel® Graph Builder for Apache Hadoop Software v2

Functionality | Details
--- | ---
Hadoop* components required | Built on Apache Hadoop 1.2.1 (MapReduce*, Pig* 0.12.0, HBase 0.94.12)
User programming environment | Apache Pig scripts and/or Java* UDF extensions in Apache Pig
Input format parsing types | Pig libraries for XML, CSV, TSV, and JSON (plus other user-defined formats)
Types of graphs supported | Directed and undirected graphs; multirelational graphs with vertex and edge properties and labeled edges
Data cleansing | Pig libraries for string manipulation, null checks, table manipulations, and common math operators
Output format | RDF triples (on HDFS); edge list (text on HDFS)
Graph database connector | Aurelius* Titan graph database
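The table lists both directed and undirected graphs and a text edge-list output. The sketch below shows why the distinction matters when writing an edge list: in an undirected graph, (a, b) and (b, a) are the same edge and should be written once. The tab-separated `src, dst, label` layout and the `to_edge_list` helper are assumptions for illustration; the exact edge-list format Graph Builder writes to HDFS is not specified here.

```python
def to_edge_list(edges, directed=True):
    """Format (src, dst, label) edges as tab-separated text lines.

    For undirected graphs, each endpoint pair is normalized to sorted order so
    (a, b) and (b, a) collapse into one edge. Exact duplicates are dropped in
    both modes.
    """
    seen = set()
    lines = []
    for src, dst, label in edges:
        if not directed:
            src, dst = sorted((src, dst))
        key = (src, dst, label)
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"{src}\t{dst}\t{label}")
    return "\n".join(lines)

edges = [("a", "b", "knows"), ("b", "a", "knows"), ("a", "c", "likes")]
print(to_edge_list(edges, directed=True))   # three lines: both "knows" directions kept
print(to_edge_list(edges, directed=False))  # two lines: "a b knows" and "a c likes"
```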



Volume, Velocity, Variety, and Value

Intel is working closely with the open source community and ecosystem partners to ensure the value and potential of Big Data are accessible to everyone. With new software tools, enterprises and research initiatives will be able to access data analytics capabilities and operational efficiencies previously accessible only to large service providers.

Intel’s commitment to Big Data includes:

  • Providing advanced performance and optimizations for applications running on Intel® Xeon® E5 and Intel® Xeon® E7 processors
  • Gathering and integrating data from multiple data sources (structured and unstructured) in a common file system
  • Enabling disparate databases, software tools, and software stack layers to work together
  • Lowering the barriers to adoption for organizations and developers entering the arena of Big Data
  • Speeding response times and throughput for graph analytics and machine learning at full scale
  • Enabling cloud, networking, and storage for Big Data stores

Intel Distribution for Apache Hadoop Software

The Intel Distribution for Apache Hadoop software is the only distribution built from the silicon up to enable the widest range of data analysis on Apache Hadoop. It is the first with hardware-enhanced performance and security capabilities, and the only open source platform for Big Data with support from a Fortune 100 company. The code has been optimized for the latest hardware platform technologies, including crypto acceleration, SSD storage, and 10GbE networking, enabling deployments that support data confidentiality with minimal encryption overhead. Management capabilities have also been added to make Hadoop easier to deploy and operate.

Delivering Real-Time Performance and Manageability for Enterprise-Ready Big Data

Big Data and Graph Analytics
Applying graph analytics to Big Data with Apache Hadoop* streamlines data analysis and yields powerful insights.


Intel Graph Builder for Apache Hadoop* Software - Video

Intel® Graph Builder for Apache Hadoop* Software v2 automates many data-preparation tasks and quickly readies data for powerful analysis.

