As a data scientist with a PhD in physics, Intel® Software Innovator Daniel Whitenack has a knack for using his physics background to solve industry problems through data-driven applications and data infrastructure. With his focus on Google’s* Go language and his work on the open source project Pachyderm, Daniel has quickly begun to thrive within the innovator program. Daniel took time out of his busy schedule to let us learn a bit more about him and what he does.
What got you started in technology?
I remember when we got our first desktop computer at home. I loved it. Mostly I played games, but I really enjoyed spending time on the computer. Then in high school I started taking some independent study courses at our small school with the guy running IT. He taught us some Visual Basic, networking, etc. I think that was really the start that got me thinking about the things I could do with technology.
Tell us about your background.
My background is originally in physics. I did my Ph.D. in theoretical/computational physics at Purdue University, and, during that time, I developed a new many-body theory for calculating excited states of atoms and molecules. Eventually, however, I decided not to stay in academia. I did some consulting for a couple of years before I stumbled upon this world of data science. I learned that data scientists were using the skills I had from physics in industry to solve interesting and challenging problems. That really appealed to me. Ever since, I have been developing machine learning models, data-driven applications, and data infrastructure.
Tell us about Pachyderm.
Pachyderm is an open source project for data pipelining and data versioning. You can create scalable data pipelines in any language or framework and even combine languages or frameworks in a unified data pipeline (e.g., modeling in Python and visualization in R). Further, by versioning the input/output of each stage of your pipeline, you have complete reproducibility and provenance. You can know with exact certainty what model produced what result and what training data was used to train that model, for example.
What projects are you working on now?
Right now I have quite a few projects going on. Of course, I'm working on the Pachyderm core and the corresponding ecosystem (examples, docs, tooling, etc.), but, outside of Pachyderm, I am working on a Go kernel for Jupyter*, and some Go wrappers for Intel® machine learning libraries. I'm also working on a lot of online and video content around containerized data science, data pipelining on modern infrastructure, and machine learning techniques.
Tell us about a technology challenge you’ve had to overcome in a project.
I think most of the technology problems I have experienced have been around workflow management. As a data scientist, you are constantly juggling data sources, formats, scripts, code, etc. It is easy to lose track of things and have breakdowns in integrity. This was particularly a problem at one position towards the beginning of my career as a data scientist. Eventually, a lot of these issues were solved by embracing modern architecture, languages, and frameworks like Docker and Pachyderm (I was actually a user before becoming a contributor and employee!).
What trends do you see happening in technology in your field?
I see a trend that is putting a lot of pressure on data scientists and data engineers to develop a better working environment and merge their roles to some degree. Data scientists should be able to produce innovations and efficiently push those into production pipelines. That is, they should have more end-to-end ownership of their workflows. This will greatly reduce inefficiencies and general friction within data-focused organizations.
How can Intel help innovators like you succeed?
Intel has already provided me with amazing support as an innovator. Because I work full time on an open source project, community involvement and collaborations are essential to my work. Intel has provided me with the opportunity to speak at relevant events, get feedback from the community, and collaborate with other innovators. Moreover, Intel provides me with access to and support on technologies such as Intel® Data Analytics Acceleration Library (Intel® DAAL), Intel® Math Kernel Library (Intel® MKL), and Intel® Nervana. These collaborations will continue to be valuable to the development of Pachyderm and my other projects.
What are you looking forward to doing with Intel as a part of the Innovator program?
In the near term, I'm looking forward to collaborating with Intel® Nervana on a great workshop at this year's PyCon. I think it is going to be amazing. Other than that, I'm looking forward to creating some really interesting blog posts and getting connected with other members of the community.
Outside of technology, what is your creative outlet? Hobbies, etc.?
I love being outside and playing music. Ideally, I love playing music outside. I'm a big old-time music buff, and I play banjo, mandolin, guitar, and a few other instruments.
Want to learn more about the Intel® Software Innovator Program?
You can read about our innovator updates, get the full program overview, meet the innovators, and learn more about innovator benefits. We also encourage you to check out Developer Mesh to learn more about the various projects that our community of innovators is working on.