In this episode, Darren talks to returning guest Gretchen Stewart, Chief Data Scientist, Public Sector, Intel, about operationalizing AI projects.
Gretchen is an excellent example of someone who continually learns and adapts. Her undergraduate degree is in mathematics. She has a Master's degree in business, and she completed a program at Harvard just a few years ago focusing on data science, which led to her position as Chief Data Scientist at Intel in the public sector. She has worked in the technology field for over 20 years, starting with software engineering, and spent 15 years in the federal space.
She finds working in the public sector especially rewarding because it makes a difference in everyday citizens’ lives. In addition, the federal government has the most data on the planet, so it’s perfect for someone who loves to be awash in data and continue to learn more.
There are many terms surrounding AI. First, it's essential to understand the relationship between artificial intelligence (AI) and machine learning operations (ML ops). ML ops is a set of techniques within the broader field of AI; it is a subset. ML algorithms derive their strength from an ability to learn from available data, primarily through either supervised or unsupervised learning.
The simple difference between supervised and unsupervised learning is the data label. In supervised learning, the datasets are labeled: what the data represents is already mapped out, which makes classification and prediction much easier. In unsupervised learning, you are trying to find patterns in the data; the machine learns to create relationships between data points based on common features, similarities, or differences.
An example of supervised learning would be an online store recommending an item that a customer might want to buy based on their shopping history or a streaming service recommending a movie based on someone’s viewing habits.
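The distinction can be sketched in a few lines of plain Python. This is a toy illustration with made-up data, not any specific product's implementation: a nearest-neighbor classifier stands in for supervised learning (labels are known), and a crude two-group k-means stands in for unsupervised learning (the machine groups unlabeled points by similarity).

```python
# Toy sketch (hypothetical data): the only difference between the two
# settings is whether labels accompany the data points.

def nearest_neighbor_predict(labeled, query):
    """Supervised: labels are known, so we can classify a new point."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    point, label = min(labeled, key=lambda pl: dist(pl[0], query))
    return label

def kmeans_2(points, iters=10):
    """Unsupervised: no labels; points are grouped by similarity."""
    c1, c2 = points[0], points[-1]          # crude initial centroids
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Supervised: purchase history labeled "bought" / "skipped"
history = [((1, 1), "bought"), ((1, 2), "bought"), ((5, 5), "skipped")]
print(nearest_neighbor_predict(history, (2, 1)))   # → bought

# Unsupervised: unlabeled viewing times fall into two natural clusters
print(kmeans_2([1.0, 1.2, 0.9, 8.0, 8.3, 7.9]))
```

In the supervised case, the answer comes from existing labels; in the unsupervised case, the groups emerge from the data itself, which is exactly the label distinction described above.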
Many terms now carry the suffix "ops." For example, people say "DL ops" for deep learning operations, deep learning being a subset of machine learning. Why the "ops"? ML ops is not yet as sophisticated as DevOps; rather, it borrows from the widely adopted DevOps approach to creating and customizing applications. People are trying to develop a set of practices that optimize the reliability and efficiency of machine learning design, development, and execution. The result would be almost like a marketplace where you can create and operate custom applications and then share them with others.
Many models and algorithms are already optimized and available in tools such as Converge.io or C3 AI. These methodologies and technologies can help you streamline your machine learning models. Many tools, either open source or vendor-created, make creating, designing, executing, and managing machine learning workflows much more accessible.
AI development is similar to where software development was 30 years ago. Many of the steps are still manual and will hopefully be operationalized soon.
In previous episodes, Darren and Gretchen discussed how many AI and ML projects are science experiments done once. Then the data scientist moves on to something else, and it’s never operationalized. Counter to this, ML ops is moving toward deploying the model to provide real value after training and learning.
Some companies are explicitly leveraging those tools. Domino Labs, for example, comes close to creating that marketplace. Work done in the public sector, say, object detection or clustering classification on nuclear subs, could be applicable in the Air Force or other branches, so that work could be cataloged to help operationalize and build agile environments. You could leverage some algorithms and weight them differently depending on the results, tweaking them based on differences in datasets. At the very least, there are starting points, commonalities, and shared tools.
Security is always a concern with open-source software and models, and AI presents unique circumstances. For example, how do you know the developer hasn't trained a facial recognition model to ignore their own face? There is now an expectation that people document things, for example, where a dataset came from.
There is also the issue of ethics and responsibility. The Tay chatbot and the bias found in facial recognition programs were prominent examples of AI gone awry without malicious intent. For a long time in machine learning, a single person did the work and produced the results. Now, the idea is that you need a diverse team of people in different capacities with different worldviews.
The first conference to discuss AI and ML was held in 1956 at Dartmouth College. The truth is that many AI basics, such as logistic regression, linear regression, and clustering algorithms, are math equations that have been around for a long time. Of course, brilliant frameworks such as TensorFlow have since been built on top, but the basics were, and still are, the foundation. We've added compute, storage, 5G, and other unique capabilities. Once you've done all the training, you've got the data and information next to the technology as opposed to having to bring it all to the technology. Bringing the technology to the data opens up some fun and exciting problems we can now solve.
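The point that these basics are long-standing math is easy to demonstrate. Here is a minimal sketch, with hypothetical data, of ordinary least squares, the centuries-old equation behind "linear regression" in every modern framework: the slope is just the covariance of x and y divided by the variance of x.

```python
# Minimal sketch: closed-form ordinary least squares, the classical math
# behind linear regression. Data below are hypothetical.

def fit_line(xs, ys):
    """Least squares fit: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]           # points lying exactly on y = 2x + 1
print(fit_line(xs, ys))     # → (2.0, 1.0)
```

No framework is required; TensorFlow and its peers add scale, automation, and hardware acceleration on top of equations like this one.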
But conversations about how the model was trained, what the original data was, and how to account for model drift must always be happening. After a time, you need to retrain; maybe you need to bring in a different algorithm, or weight the current one differently, to stay accurate as the data grows larger and more diverse. This is all good because it increases your level of accuracy.
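A drift check of the kind described here can be sketched very simply. This is a hypothetical illustration, not a production monitoring system: compare the model's accuracy on fresh labeled data against the accuracy recorded at deployment, and flag the model for retraining when it degrades past a tolerance.

```python
# Hypothetical drift check: retrain when live accuracy falls more than a
# tolerance below the accuracy measured at deployment. All values invented.

DEPLOY_ACCURACY = 0.92      # assumed accuracy recorded at training time
DRIFT_TOLERANCE = 0.05      # retrain if we lose more than 5 points

def needs_retraining(predictions, truths):
    """Compare live accuracy on fresh labeled data to the baseline."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    live_accuracy = correct / len(truths)
    return live_accuracy < DEPLOY_ACCURACY - DRIFT_TOLERANCE

# The model starts to drift as newer, more diverse data arrives:
preds  = ["a", "a", "b", "b", "a", "b", "a", "a", "b", "a"]
truths = ["a", "b", "b", "a", "a", "b", "b", "a", "b", "a"]
print(needs_retraining(preds, truths))   # → True (live accuracy 0.7)
```

Real ML ops platforms automate this loop, watching input distributions as well as accuracy, and kicking off retraining pipelines automatically; the core idea is the same comparison.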
So with the movement toward ML ops, you can do this continuously. Just as software development moved toward continuous integration and deployment, the same will start happening in AI and ML, with models updated continuously and becoming more and more accurate.
Notices and Disclaimers
Intel® technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.