Summary/Story at a Glance
People say 90% of the hard work in Big Data is data wrangling and data engineering. Even after that work is done, hard barriers remain. One such systemic barrier is access: not everyone can obtain big data, and not everyone can afford the compute environment needed to use it. Beyond these obvious, explicit barriers lies a deeper, implicit one that fundamentally restricts who can access, interpret, and derive value from big data: the domain knowledge needed to use and interpret data effectively. Without access, users cannot gain even basic familiarity with the data, so building domain knowledge is a nonstarter.
What if a technology made big data available to anyone with a simple laptop, helping them gain familiarity and build expertise, and thereby democratizing data-driven insights?
Dr. Jian Huang studies Visualization as a Service (VaaS) and leads a oneAPI Center of Excellence on that topic at the University of Tennessee, Knoxville. SINC America is their latest effort to use VaaS to democratize access to big data about society and empower user exploration.
→ Sign up for free and explore the SINC America application.
Overcoming the Complexities of Multi-Sourced Data
SINC America solves some chronic problems in the field:
- There is a mismatch between data collection efforts. Federal agencies and prominent foundations have many ongoing efforts to collect data nationwide and release it to the public. The US Census Bureau, Centers for Disease Control and Prevention (CDC), Federal Emergency Management Agency (FEMA), United States Department of Agriculture (USDA), and Agency for Healthcare Research and Quality (AHRQ) are just a few of the most well-known examples. Beyond the national efforts, there are many local efforts to collect data on topics important to each local community. Fundamentally, these datasets are collected with different levels of granularity, such as state, county, census tract, zip code, school district, or even just on points (i.e., a sparse set of geographic locations). Traditionally, efforts to fuse such disparate datasets have had limited scalability due to cost. SINC uses visualization as the medium of integration, such that users can visualize diverse data seamlessly in one location, regardless of the varying granularities in disparate data collection efforts.
- The existing databases are big and continue to grow in size and complexity. SINC America already includes 4,800 variables describing every location in the United States, and that set is still expanding. While the value of big data is obvious to users, simply releasing the data is no longer a feasible way to democratize access, because no user can get value out of big data without sufficient compute power. With SINC, users need only a simple laptop, or even just a tablet, to access the data. They can visualize and interact freely with the full SINC dataset, and they can enter an address to query by location. Whether a user is creating a research report about their local community, studying the severity or distribution of a social issue, or evaluating options for a company relocation or retirement destination, SINC America empowers users to visualize hard numbers as they derive insights.
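To make the granularity mismatch concrete, here is a minimal sketch in Python. It relies on one well-known property of US Census identifiers: an 11-digit census tract GEOID is built from a 2-digit state code, a 3-digit county code, and a 6-digit tract code, so tract-level counts can be rolled up to counties by prefix. The function names and the sample values are hypothetical and are not part of SINC America, which integrates data visually rather than through this kind of fusion.

```python
from collections import defaultdict


def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its nested parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],   # 2-digit state FIPS code
        "county": geoid[:5],  # state code + 3-digit county code
        "tract": geoid,       # full tract identifier
    }


def roll_up_to_county(tract_counts: dict) -> dict:
    """Sum additive tract-level counts into county-level totals."""
    totals = defaultdict(int)
    for geoid, count in tract_counts.items():
        totals[split_tract_geoid(geoid)["county"]] += count
    return dict(totals)


# Made-up counts for illustration only; "48" is the state code for Texas.
sample_tract_counts = {
    "48453001100": 1200,
    "48453001200": 800,
    "48491020100": 500,
}
county_totals = roll_up_to_county(sample_tract_counts)
print(county_totals)
```

This only works because tracts nest inside counties and because the variable is an additive count. Rates and medians cannot be summed this way, and units such as zip codes or school districts do not nest inside tracts or counties at all, which is part of why fusing such datasets has traditionally been costly.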
Scalable, Easy-Access Visualization as a Service
The VaaS technology focuses on scalable parallel rendering in standard cloud environments. Similar rendering tasks that rely on parallel compute acceleration typically run on conventional HPC systems. Because elastic clouds are heterogeneous and change dynamically at runtime, conventional methods for parallel scalability are insufficient for cloud-based parallel rendering. The VaaS technology resolves that challenge with new swarm-based methods for parallel rendering. In addition, the Center of Excellence (CoE) has developed a technology toolchain that supports flexible deployment to public clouds, such as AWS, and to on-premises clusters. The toolchain is publicly available as an open-source project, Substrate. The development of SINC America has leveraged Intel® Tiber™ AI Cloud and Intel® Rendering Toolkit.
The CoE focused on developing Substrate as the toolchain for deploying rendering services based on the Intel Rendering Toolkit. In that context, the team used Intel Rendering Toolkit and Intel Tiber AI Cloud extensively: the former as the core rendering engine and the latter as the testing platform, including support for Intel GPUs in the rendering services. The CoE’s VaaS technology is demonstrated through cloud-based 3D renderings of a large-scale city scene from a dataset called Model America and of a large-scale terrain scene of the Great Smoky Mountains National Park that highlights species presence and biodiversity.
As a demo of the VaaS cloud architecture serving the geosocial use case, SINC America provides succinct APIs so that geographic and social data visualization can become interactively available and scalable at low cost. In this respect, SINC America demonstrates the scalability and versatility of cloud-based insight delivery, while also working as an independently useful tool for anyone curious to ask questions about their communities.
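The article does not describe how the swarm-based methods work, so the following Python sketch does not attempt to reproduce them. It only illustrates, under stated assumptions, why heterogeneity matters: when workers of different speeds pull image tiles from a shared queue instead of receiving a fixed, equal share, a slow node does not hold up the whole frame. All names here (`render_tile`, `worker`, `render_frame`) are hypothetical, the "rendering" is a placeholder computation, and threads stand in for cloud nodes.

```python
import queue
import threading
import time


def render_tile(tile):
    """Placeholder for real rendering: returns a fake value for the tile."""
    x, y = tile
    return x * 31 + y


def worker(name, delay, tiles, results, lock):
    """Pull tiles until the queue is empty; `delay` simulates a slower node."""
    while True:
        try:
            tile = tiles.get_nowait()
        except queue.Empty:
            return  # every tile was queued up front, so empty means done
        time.sleep(delay)
        value = render_tile(tile)
        with lock:
            results[tile] = (name, value)


def render_frame(grid=4, worker_delays=None):
    """Render a grid x grid frame with workers of different simulated speeds."""
    if worker_delays is None:
        worker_delays = {"fast": 0.001, "slow": 0.005}
    tiles = queue.Queue()
    for x in range(grid):
        for y in range(grid):
            tiles.put((x, y))
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, delay, tiles, results, lock))
        for name, delay in worker_delays.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


frame = render_frame()
tiles_per_worker = {}
for worker_name, _ in frame.values():
    tiles_per_worker[worker_name] = tiles_per_worker.get(worker_name, 0) + 1
print(tiles_per_worker)
```

Which worker renders which tile varies from run to run, but every tile is rendered exactly once and the faster worker usually ends up with the larger share. A static, equal split would instead leave the frame waiting on the slowest node.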
Dr. Huang’s research group, Seelab, at the University of Tennessee, plans to maintain SINC America long-term. The group invites users with additional public-domain datasets to contribute to the ongoing growth of the full SINC dataset, so that society as a whole can benefit from easy access to, and visually compelling representation of, its vast collection of geosocial information.
Figure 1: Sample visualizations of Austin, Texas at the census tract level from SINC America. The variables shown are from the US Census American Community Survey. The full list of SINC America’s variables is accessible at: https://sinc.seelab.org/variables. Users can freely query a location by address or by directly interacting with the map.
Performance Results & Benefits
Using VaaS technology, SINC America has made it trivial for users to interact with data that used to be hard to use, static, and disconnected. Compared with users independently writing standard monolithic Python code to render a single variable in the SINC dataset, SINC’s parallel rendering system delivers rendering results 50x faster. This speed also lets users effortlessly choose multiple variables to visualize, form hypotheses, and compare the variables to identify potential correlations. The VaaS architecture also lowers development cost dramatically: developing the SINC America system took only 2 person-months on a half-time basis.
Solution Summary
SINC America was created to demonstrate the vision of oneAPI: faster prototyping, lower deployment cost, flexible iteration, and ease of integration. As demonstrated, this technological vision also enables broader social impact. It lets users and developers think bigger about what could be “affordable” to all, opening new frontiers in democratizing access. As SINC America continues to develop, especially in combination with the latest AI research at the University of Tennessee, it will carry the oneAPI vision further.
Acknowledgement
The University of Tennessee oneAPI Center of Excellence would like to acknowledge the great capabilities enabled by the Intel Tiber AI Cloud and the support from the engineering support team. The CoE’s work on VaaS has benefited considerably from the oneAPI Rendering Toolkit, which is now open source.