For the first part of this discussion, we will use a simple three-level abstraction to describe grid hardware:
- Node - a computer in the traditional sense: a desktop or laptop PC, or a server in any form (a free-standing pedestal, a rack module, or a blade), containing one or more CPUs in an SMP, NUMA, or ccNUMA configuration [1].
- Cluster - a collection of related nodes.
- Grid - a collection of clusters.
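The three levels above can be sketched as a small data model. This is only an illustration of the abstraction: the class and field names (`Node`, `Cluster`, `Grid`, `cpus`, `total_cpus`) are hypothetical and are not taken from any grid middleware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A single computer: a desktop, laptop, or server (hypothetical model)."""
    name: str
    cpus: int = 1  # a node contains one or more CPUs


@dataclass
class Cluster:
    """A collection of related nodes."""
    name: str
    nodes: List[Node] = field(default_factory=list)

    def total_cpus(self) -> int:
        # Sum the CPUs of every node in this cluster.
        return sum(node.cpus for node in self.nodes)


@dataclass
class Grid:
    """A collection of clusters."""
    name: str
    clusters: List[Cluster] = field(default_factory=list)

    def total_cpus(self) -> int:
        # Sum the CPUs of every cluster in this grid.
        return sum(cluster.total_cpus() for cluster in self.clusters)


def build_example_grid() -> Grid:
    """Build a made-up two-cluster grid for illustration."""
    rack = Cluster("rack-a", [Node("blade-1", cpus=2), Node("blade-2", cpus=4)])
    lab = Cluster("lab-b", [Node("desktop-1", cpus=1)])
    return Grid("example", [rack, lab])
```

Calling `build_example_grid().total_cpus()` walks grid, cluster, and node in turn, which mirrors how capacity rolls up through the three levels.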
The nodes in a cluster are connected via some fast interconnect technology. Before the introduction of InfiniBand* and PCI Express* technologies, there was a tradeoff between a relatively high-performance, single-sourced, expensive technology and an economical, standards-based, but lower-performance technology. Ethernet, a technology designed for networking, is commonly used in cost-constrained clusters. This setup introduces bottlenecks in parallel applications that require tight node-to-node coordination. The adoption of InfiniBand-based interconnects promises to remove this tradeoff.
The clusters in a grid can be connected via LAN technology, constituting an intra-grid (a grid deployed within departmental boundaries), or via WAN technology, constituting an inter-grid that can span the whole globe.
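The distinction between the two kinds of grid can be expressed as a simple rule over the links that join the clusters. The sketch below is illustrative only: `LinkType` and `classify_grid` are hypothetical names, and treating a grid with mixed LAN and WAN links as an inter-grid is an assumption, since the text does not address the mixed case.

```python
from enum import Enum
from typing import Iterable


class LinkType(Enum):
    LAN = "LAN"
    WAN = "WAN"


def classify_grid(cluster_links: Iterable[LinkType]) -> str:
    """Label a grid by the kind of links that connect its clusters.

    Assumption (not from the text): one WAN link is enough to make
    the grid an inter-grid; otherwise it is an intra-grid.
    """
    if any(link is LinkType.WAN for link in cluster_links):
        return "inter-grid"
    return "intra-grid"
```

For example, `classify_grid([LinkType.LAN, LinkType.LAN])` yields `"intra-grid"`, while adding a single `LinkType.WAN` link changes the result to `"inter-grid"`.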
This model includes boundary cases as particular instances. A grid consisting of exactly one cluster is exemplified by a cluster that is accessible to a large community and front-ended with grid middleware. Through Web services technology, users in an HPC shop can submit jobs for execution through a single, local interface without even realizing that a job may end up being executed thousands of miles away. In this way, the supporting IT department can optimize costs across a number of facilities around the world, including outsourced service providers.
Conversely, a large cluster, even one that contains thousands of nodes, may not be a grid if it does not have the infrastructure and processes that characterize a grid. Remote access may need to be accomplished through relatively limited OS utilities such as rlogin or telnet, or through customized Web interfaces [2].
A grid made up of single nodes reduces to the setup used in cycle scavenging. Cycle scavenging is discussed in another article in this series, Part 2: Usage Models.
This three-level node-cluster-grid model encompasses grids of greater complexity through recursion: grids of grids are possible, including grids with functional specialization. This specialization can happen at the lower levels for technical reasons (e.g., a grid might consist of nodes of a certain memory size) or for economic reasons (e.g., a grid might be deployed at a certain geographical location because of cost considerations).
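The recursive reading of the model can be sketched by letting a grid contain either clusters or other grids. As before, this is a minimal illustration under assumed names (`Node`, `Cluster`, `Grid`, `members`, `count_nodes`, `depth`), not a description of any particular grid software.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    name: str


@dataclass
class Cluster:
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Grid:
    name: str
    # Members may be clusters or other grids; this is what makes
    # the model recursive (a grid of grids).
    members: List[Union[Cluster, "Grid"]] = field(default_factory=list)


def count_nodes(unit: Union[Cluster, Grid]) -> int:
    """Count the nodes under a cluster or, recursively, under a grid."""
    if isinstance(unit, Cluster):
        return len(unit.nodes)
    return sum(count_nodes(member) for member in unit.members)


def depth(unit: Union[Cluster, Grid]) -> int:
    """Nesting depth: a cluster counts as 1, a grid as 1 plus its deepest member."""
    if isinstance(unit, Cluster):
        return 1
    return 1 + max((depth(member) for member in unit.members), default=0)
```

A top-level grid built from two site grids, each holding one cluster, has a depth of 3, and `count_nodes` still returns the total number of nodes regardless of how deeply the grids are nested.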