In this episode of the Open at Intel podcast, host Katherine Druckman spoke with Andreea Munteanu from Canonical about Data Science Stack, the company's out-of-the-box machine learning solution for data scientists. They talked about Kubernetes, Kubeflow, and why open source is the future of AI. Enjoy this transcript of their conversation.
“I ask, what's the problem that you want to solve? You've been tasked to enable a GenAI application in your organization or in your team, but what's the problem that you're solving?”
— Andreea Munteanu, AI/ML & MLOps Product Manager, Canonical
Katherine Druckman: Hey, Andreea, thank you for joining me in my little KubeCon fishbowl here. I appreciate you taking the time out of the event to come and talk to me.
Andreea Munteanu: Hello. Thank you for taking the time. I'm super excited to be here.
Katherine Druckman: Please introduce yourself and tell us what you do at Canonical and what you're doing here at KubeCon.
Andreea Munteanu: I think I'll start by introducing Canonical itself. We've been around for 20 years, but people often know us as Ubuntu; Canonical is the publisher of Ubuntu. Beyond that, we have open source solutions across all areas of the stack. I'm the AI/ML product manager at Canonical, so I look after our AI/ML portfolio at all scales, from workstations to data centers all the way to the edge. Our main objective is to enable organizations to deploy their ML projects in production at any scale.
Katherine Druckman: What are you featuring here, and what are you hoping to talk to people about while you're here at KubeCon?
Andreea Munteanu: When it comes to KubeCon itself, and when you look at the industry, Kubernetes is the platform. Organizations are looking at cloud native applications to run on top of it for different use cases, AI being one of the major ones. I came to KubeCon to gather feedback, to see where people are, where developers are, and what the challenges are. We are also part of the community; we're active members of some of the CNCF projects. Kubeflow itself is part of the CNCF, so meeting the community again is important for me. And last but not least, it's important to see what's coming up next: the challenges, the trends, and how the landscape is evolving.
Katherine Druckman: Tell us a little bit about your project. I believe it's called Data Science Stack?
Andreea Munteanu: Yes, that’s right.
Katherine Druckman: Something fairly recently released, if I'm correct.
Data Science Stack Introduction
Andreea Munteanu: Correct. For those who are not familiar, Data Science Stack is an out-of-the-box solution that enables data scientists to get an ML environment up and running in only three commands on Ubuntu. It's a solution that we launched in September this year, so it's fairly recent, and it's a tool that aims to lower the barrier to entry for data scientists and machine learning engineers. I'll start with a bit of backstory. I started in data science 10 years ago, and building models was nice, but we were spending far more time tinkering with the tooling, integrating tools, and setting up our own ML environments because of all sorts of compatibility issues and version constraints. Ten years later, that hasn't changed, and there are still jokes and reports on the market about how data scientists spend 80% of their time on tooling rather than building models.
That was the mindset we had when we started working on Data Science Stack: lower the barriers to entry and make it seamless for everyone. It's built from open source tools and it's fully open source. You get access to Jupyter Notebooks to develop models and to MLflow for experiment tracking and model registry, and it's just easy. That was the idea. I remember a couple of our beta users saying, "But it's just easy." Exactly. It also takes away the burden of compute power: you don't have to worry about how you access the GPUs on your workstation. It just works.
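For readers who want to try it, the sketch below shows roughly what that three-command flow looks like. This is an assumption based on Canonical's public Data Science Stack documentation rather than a verified recipe: it presumes snapd and a MicroK8s cluster are already in place, and the exact flags and notebook image tag may differ.

```bash
# Install the Data Science Stack CLI (assumes snapd is available).
sudo snap install data-science-stack

# Point DSS at a local MicroK8s cluster; this deploys MLflow and the
# supporting services into it.
dss initialize --kubernetes-config "$(sudo microk8s config)"

# Launch a Jupyter Notebook server; the image tag here is illustrative.
dss create my-notebook --image=kubeflownotebookswg/jupyter-scipy:v1.9.0
```

Development then happens in the browser, against the notebook server DSS just created.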
Community and Collaboration
Katherine Druckman: Interesting. Given that it's a relatively new project, tell me about the community around it. Do you get a lot of contributions from outside of Canonical?
Andreea Munteanu: It depends on how you define contributions. It's a project, or a product nowadays, that we launched just two months ago. But the most valuable contributions we got were feedback on how to improve it, from improving the CLI to gaps in the documentation. We received a lot of feedback and a lot of community support. For us, the next step is to work much more with the Ubuntu community on one hand, and on the other hand with other communities such as OPEA and the CNCF, to get even more support and to help developers who are aiming to move toward data science get acquainted with it.
Katherine Druckman: You mentioned OPEA, the Open Platform for Enterprise AI. How does all of this fit together? These are all open source projects we're talking about, so how much cross-pollination is there between them? Do they work together? Can I use all of these things together?
Andreea Munteanu: I think I'll start with how we work with OPEA. The truth is that Canonical and OPEA have very similar missions that go hand in hand: enable open source tooling for AI projects in enterprises in a secure, reliable, scalable manner, and so on. Data Science Stack helps organizations get started without worrying about anything else. What OPEA brings on top of it, or in addition to it, is GenAI use cases that can be rolled out in production more easily, along with reference architectures and blueprints that are highly appreciated. I used the RAG blueprint in some of the public talks that I've done, and everyone loves it. It brings light to the world.
One other place where we intersect is the user experience, the same easy user experience that we have. You get Data Science Stack in three commands on Ubuntu. You should get some of the applications that OPEA features as seamlessly as possible, in three commands, if possible even fewer, and then be able to roll them out in production. Furthermore, if I look a step further, Canonical's portfolio includes more than Data Science Stack. We have MicroK8s and Canonical Kubernetes. We have Kubeflow as an MLOps platform, and those are tools that enable OPEA's use cases to be optimized, run in production, and scaled easily. That's the second pillar where the cross-pollination happens, of course.
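To make the Kubeflow part of that portfolio concrete, here is a minimal sketch of an MLOps pipeline written with the Kubeflow Pipelines SDK (kfp v2). The component and pipeline names are hypothetical stand-ins, not anything Canonical or OPEA ships; the point is only the shape: small containerized steps chained into a graph, compiled to a spec a Kubeflow cluster can run.

```python
# A toy two-step pipeline with the Kubeflow Pipelines SDK (kfp v2).
# Requires: pip install kfp
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def preprocess(raw_text: str) -> str:
    # Stand-in for real feature engineering.
    return raw_text.strip().lower()

@dsl.component(base_image="python:3.11")
def train(features: str) -> str:
    # Stand-in for real model training.
    return f"model trained on: {features}"

@dsl.pipeline(name="toy-mlops-pipeline")
def toy_pipeline(raw_text: str = "Some Training Data"):
    prep_task = preprocess(raw_text=raw_text)
    train(features=prep_task.output)

if __name__ == "__main__":
    # Emits a pipeline spec that could be uploaded to a Kubeflow instance.
    compiler.Compiler().compile(toy_pipeline, "toy_pipeline.yaml")
```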
Getting Started with Generative AI
Katherine Druckman: Let's say I'm a developer, or I'm part of an engineering team that's tasked with creating a new generative AI application, or maybe I am tasked with adding some generative AI capability to an existing application. How do I get started with these tools? Can you walk me through just a little bit of that?
Andreea Munteanu: That's a good question. Whenever data scientists come to me, and this probably comes from my own background as well, I ask, "What's the problem that you want to solve? You've been tasked to enable a GenAI application in your organization or in your team, but what's the problem that you're solving?" If you don't have a problem to solve, just don't do it. But assuming you have a problem, say you need to parse 20 years' worth of customer service history to create a chatbot. The next step is to ensure that you have the data, which obviously you do, and then look at the existing infrastructure, because building a data center for just one use case, or getting stuck because you can't begin experimenting without enough compute power, is not the way to go.
Try to find resources internally for the compute power that you need. Often you just need to look on the internet, and on OPEA's website in particular, because they have examples for RAG, which probably solves your problem. For those who are not familiar, RAG stands for retrieval-augmented generation. Look for those examples, try to deploy them on your machine or on the infrastructure you have, and fine-tune them with a smaller subset of data, not with the 20 petabytes you have. Just take a smaller one. That's for GenAI use cases. If you have a traditional ML problem, say you need to build a recommender system for whatever reason, use DSS. Look around for recommender system examples, build on a small subset of data, optimize the model, and after that, scale it up to roll it into production.
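To ground that advice, here is a deliberately tiny, self-contained sketch of the retrieve-then-generate loop at the heart of RAG. Everything in it is hypothetical and simplified: a real system would use learned embeddings, a vector store, and an actual LLM endpoint (OPEA's RAG blueprint being one option), not the bag-of-words retrieval and placeholder generator shown here.

```python
# Toy RAG loop: rank a small corpus against the query, then hand the best
# match to a generator as grounding context. Illustrative only.
import math
from collections import Counter

CORPUS = [
    "To reset your router, hold the reset button for ten seconds.",
    "Refunds are processed within five business days of approval.",
    "Our support line is open Monday through Friday, 9am to 5pm.",
]

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Return the k corpus snippets most similar to the query.
    q = embed(query)
    return sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Placeholder for an LLM call that would answer using the context.
    return f"Q: {query}\nGrounded on: {context[0]}"

question = "How do I reset my router?"
print(generate(question, retrieve(question)))
```

The shape is the point: answer quality comes from what gets retrieved, which is why the advice above starts from your own data, and a small subset of it, rather than from the model.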
Andreea's Journey into Data Science
Katherine Druckman: Tell me a little bit more about you. How did you get into the data science field and then work your way into the open source community?
Andreea Munteanu: That's a very interesting story. I have a telco engineering background. I'm originally Romanian, and in Romania back then, data science and machine learning were not a thing. If you were good at mathematics and wanted a good job and a good future, you did engineering. So, I said, telco engineering it is. After two years at university, I realized it wasn't for me; I didn't have enough patience for what was going on there. For my very first job, I joined a telco company. I knew I wouldn't like it, but my manager at the time said, "Here's this data. Why don't you play around with it?" I started playing and building models, a lot of models for optimizing the radio layer based on the data and the parameters they had, and it grew steadily from there.
However, over the years, I gathered a lot of frustrations about what data scientists struggle with when it comes to data and tooling. So I said, "Okay, I want to change something," and that's when I moved toward product management. How did I get into open source? Along the way, I started using Ubuntu, and in general I used a lot of open source tools. The ML space is heavily open source. If you think of Jupyter Notebook, MLflow, PyTorch, TensorFlow, they're all open source, so it came naturally to be active in the open source communities, and the moment I joined Canonical and Ubuntu, it felt like home.
The Future of AI and Open Source
Katherine Druckman: How would you describe your feelings about the importance of openness as it relates to AI development? I have my own personal feelings, but I'm so heavily biased toward openness because I've been doing it for so long. I wonder what your thoughts are, coming from a little bit of a different path.
Andreea Munteanu: I think the future of AI is open source. Now, everyone might say, "Hey, you're biased. You work at an open source company; you've been there for almost five years. Of course you like open source." But the truth is that whenever I look at the latest innovations in AI, compared to any other industry or any other big revolution from the past, everything is open. Everything is out there, so people can try it and contribute to it. People make it an important thing, or they kill it because it's not a good idea or they just don't like it. So I think the future of AI is going to be open source. At the same time, I think it's going to be open source in a secure manner, and that's where it becomes interesting, because innovation in the past often wasn't that open because of security concerns, concerns that are being addressed these days with confidential computing. That's why I find it exciting that there aren't going to be many barriers to staying open in the AI/ML space.
Katherine Druckman: That is interesting, because there is a conversation to be had about protecting one's data, data being the topic of interest here. The conversation around openness in AI is an evolving one, I think, and an interesting one. But I personally think that everyone's interests can be preserved while still maintaining openness.
Andreea Munteanu: At the same time, I do think that... Now I'm going to be devil's advocate.
Katherine Druckman: Yes, please.
Andreea Munteanu: We need a bit more regulation, not regulation as in laws, but a bit more guidance. It often feels…
Katherine Druckman: Guardrails?
Andreea Munteanu: Yes. It feels a bit like the Wild West.
Katherine Druckman: Sure.
Andreea Munteanu: So, the guardrails, a bit of guidance on how to use them. Innovative technologies such as confidential computing are going to help organizations feel more comfortable with the openness that AI/ML brings to the world, I think.
Katherine Druckman: That's an interesting perspective. What do you hope to see happen with the platforms you're working on, like the Data Science Stack, like your involvement in the OPEA project, all of those things? What do you hope to see in the next six months and the next year and even beyond?
Andreea Munteanu: I'm a dreamer.
Katherine Druckman: Awesome.
Andreea Munteanu: But when it comes to what I hope for, it's what I hoped right when I started: that we provide solutions that are accessible to everyone, in an easy manner, at all scales, from students who are getting started all the way to large organizations deploying AI/ML at scale. I hope to see a better user journey for developers in general, and I think that's where Canonical and OPEA are going to work well together in fostering that path. We're on the right path for sure, but there is still room to improve and there are still question marks. These days I look at the inference story and AI at the edge. Last year everyone was talking about GenAI; this year everyone is talking about inference. Building clarity, and building solutions that are easy to use and easy to adopt at different scales, is something I hope to see. And I always give this example: when I was at university, I ended up using Raspberry Pis because they were so easy, so cool, and so accessible. That's my dream for our AI/ML portfolio: for it to become as loved as the Raspberry Pi is.
Katherine Druckman: The Raspberry Pi of machine learning. I love it. Is there anything that you wanted to talk about that we didn't get to?
Andreea Munteanu: I think we talked a lot about Data Science Stack and about OPEA. One thing that we didn't talk much about is the data center part. When it comes to data centers and to DSML (data science and machine learning) platforms, organizations and developers often think of all the tools that run on the cloud and forget that there are projects such as Kubeflow that can easily enable developers to build their MLOps pipelines, automate their workloads, and do fine-tuning. There's also nothing to stop them from contributing to the project. Another thought that I have, and maybe a bit of a regret, is that I wish I had started contributing way earlier in my open source career.
Katherine Druckman: A lot of people feel that way. I feel that way, too.
Encouraging Open Source Contributions
Andreea Munteanu: I became active once I started working in the industry, in my last year of university. But the opportunities you have when you're younger, to contribute and to engage with so many like-minded people in the open source space, and especially in the AI/ML space, are incredible.
Katherine Druckman: It is, yeah. I sympathize. I understand completely because in the past, I was, let's say, a user of a tool for many years before I considered contributing. And then I did, and it was incredibly rewarding. But there is a little bit of a barrier in terms of working up the courage sometimes, I think, to contribute.
Andreea Munteanu: That's a challenge that we have in the Kubeflow community, because it's quite a massive project. It often feels difficult. And one thing that we were discussing was that contributions are not just code per se.
Katherine Druckman: Yes, absolutely not.
Andreea Munteanu: Contributions are documentation. Contributions are feedback towards the developers on how the platform is used. Contributions are suggestions or feature requests as well. Contributions are bug reports. And those things are so easy to do because even if you're a user, you bump into problems. How are we going to know your problems if you don't let us know about your problems?
Katherine Druckman: I think that's a really excellent point, and I'd like to see people get more involved in the projects that they rely on, because again, if you rely on a thing, it is in your best interest to do everything you can to see that project or tool succeed. I'd like to see people take the initiative to try and contribute where they can.
Conclusion and Final Thoughts
Andreea Munteanu: Interestingly enough, a bit later today we're going to have a panel discussion about how to contribute to KServe, a project that focuses on model inference and model serving. I think it's important to look at, because if I think of where the trend and the market are going, toward AI at the edge, it's going to make a big difference. We were a group of people discussing the challenges we faced when we started contributing. There were all sorts of challenges, from ‘oh, I don't think I'm good enough to write code’ to ‘the time zone wasn't good for me, so I avoided joining community calls.’ In the end, we all found our way through. But it takes a bit of guidance to know, especially when you're young, that there are way too many ways to contribute for you to say, ‘oh, I cannot.’
Katherine Druckman: Absolutely. Well, thank you so much. I really appreciate it. This has been a really great discussion and I've learned a few things. I appreciate you sharing your machine learning expertise.
Andreea Munteanu: Thank you very much, and thanks for having me. I really enjoyed the discussion.
Katherine Druckman: You've been listening to Open at Intel. Be sure to check out more about Intel’s work in the open source community at Open.Intel, on X, or on LinkedIn. We hope you join us again next time to geek out about open source.
About the Guest
Andreea Munteanu, AI/ML & MLOps Product Manager, Canonical
Andreea Munteanu helps organizations drive scalable transformation projects with open source AI. She leads AI at Canonical, the publisher of Ubuntu. With a background in data science across industries like retail and telecommunications, she helps enterprises make data-driven decisions with AI.
About the Host
Katherine Druckman, Open Source Security Evangelist, Intel
Katherine Druckman, an Intel open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate and former digital director of Linux Journal, she's a long-time champion of open source and open standards, and a software engineer and content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism. Find her on LinkedIn.