What do scaling AI, vector databases, open source, and German fax machines have in common? Host Katherine Druckman chats with Zilliz's Stephen Batifol about all this and more in this episode of Open at Intel.
“But for normal software engineers, they don't really know what they need or how to do it. Suddenly, everything is like... everyone talks about embeddings, everyone talks about LLMs, everyone talks about F1 score, and you're like, ‘What the heck is that?’"
— Stephen Batifol, Developer Advocate at Zilliz
Katherine Druckman: Hey, Stephen, thank you for joining me. I really appreciate it. I appreciate you taking a little time to educate our listeners about, well, honestly, not just about the work that you do, but about who you are.
Stephen Batifol: Thank you very much for having me as well.
Katherine Druckman: I think we kind of do similar-type work, which is funny. I've had this conversation recently, right? You are a... What is your title exactly? It's developer relations of some sort, but what is your exact title?
Stephen Batifol: Developer advocate. Simple.
Katherine Druckman: There you go. Okay.
Stephen Batifol: Straight to the point.
Katherine Druckman: Yes. There are so many words for this. How did you end up in this role? How did you get here?
The MLOps Community in Berlin
Stephen Batifol: Yeah, so it's been quite a journey to arrive here, I would say. I think the first developer advocate I met was when I was living in Paris, so something like seven years ago. He was working for Algolia, which is a search company, and they were doing meetups and stuff in Paris. And then I learned about the role and I was like, "Oh, that's a cool role, actually." But then I continued my life being an engineer, working in the ML world.
And about two years ago now, I moved to Berlin and I started creating meetups because it was right after COVID. There were basically no meetups; everything was online. And I created the MLOps Community meetup in Berlin. I don't know if you know this community; it's quite a big community in the ML world.
Katherine Druckman: I do not.
Stephen Batifol: Well, like, I don't know, 23,000 members.
Katherine Druckman: Wow! Okay. Well, that is quite a community.
Stephen Batifol: Worldwide. So then, yeah, I created the Berlin chapter for the meetups and hackathons and stuff, doing that on the side, doing podcasts as well and everything. And then I was like, "That's fun. Maybe that should be my job."
Katherine Druckman: Yes, "Maybe someone will pay me to do this."
Stephen Batifol: Exactly.
Katherine Druckman: This is fabulous. Right? Yeah, yeah.
Joining Zilliz and Working with Milvus
Stephen Batifol: I was looking for a job after I quit my job, my previous one, and I found a job at Zilliz where I am now, and I'm officially a developer advocate.
Katherine Druckman: You are all Milvus, all the time.
Stephen Batifol: Exactly.
Katherine Druckman: Yeah. It's funny how we all have similar stories. Most of us come at this by a more circuitous route, but it's like one day your life is an IDE, and the next day, or the next couple of weeks, suddenly you're like, "Oh, this is different. I get to talk to humans. This is fun."
Stephen Batifol: Yeah.
Katherine Druckman: You've been involved in the AI/ML community for quite a while.
Stephen Batifol: Very.
Katherine Druckman: Tell me a little bit about what attracted you to Milvus, the project, but also Zilliz, the company behind it, and how that fits in with your background.
Stephen Batifol: Basically, I saw the job ad on LinkedIn, and Milvus actually belongs to the Linux Foundation; it was donated a couple of years ago now. Back then in my previous job, I was working on the machine learning platform, which was fully running on Kubernetes and everything open source. I talked a couple of times at KubeCon, I saw the whole ecosystem, and then I was like, "Oh, actually, that's a cool one. Working in open source and having this whole ecosystem is really interesting." So that's how I first became interested in Milvus.
And then what interested me even more was the scale it can reach. Milvus, which is a vector database, for the people who don't know, can really scale up to billions and billions of vectors. That's really how it was created. And that, for me, was one of the most interesting parts. Also, having some very cool customers, like Bosch, or IKEA, that was really cool.
And then Zilliz really got my interest after I interviewed with different people, but in particular with the CEO and founder. When I interviewed with him, it was supposed to be 30 minutes, and we talked for 50, and it was like... he's a founding engineer of Oracle Database. We're talking like 20 years ago, you know?
Katherine Druckman: Mm-hmm. More.
Stephen Batifol: Even more, yeah. I was probably even there. And then he was like, "Okay, but now I want to create my own thing." And then he came to the idea of, "Actually, there's nothing in unstructured data, so can I create a decent company there?" And that's truly how I'd describe it: this American dream of creating a company that then has a big impact on everyone else. I got really attracted to that, and I've been there since March now.
Fun and Creative Demos
Katherine Druckman: Awesome. The type of work that you do lends itself to a lot of creativity, right? You're constantly making really cool demos, and I wondered if you could just tell us a little bit about what are the most fun applications?
Stephen Batifol: I think it depends. Usually the demos that I build, I basically build what I want to talk about and something that is…
Katherine Druckman: Ah, at Open Source Summit, you used it?
Stephen Batifol: Yes.
Katherine Druckman: Ah, the topic of your upcoming talk. How cool.
Stephen Batifol: And then it's really like, "Okay, I have a demo and I want to either showcase and prove that what I'm talking about is actually real." Making slides is easy, but it's actually writing the code and having the demo that is impressive usually for people, so I always try to do live demos. And the other part is when you work in AI at the moment, every day there is something new. Every week there's a new model.
Katherine Druckman: Yes, that is accurate, yeah.
Stephen Batifol: Multiple times a week, there are new models, so it's also like, "Okay, I've got to check out what it can do and what it can't do." And then I have my own problems, like, I'm very lazy. So then I try to create a solution for that, and then if it works, then I have a demo with it. I have one fun demo, which is chatting with the Berlin parliament.
Katherine Druckman: I like this.
Stephen Batifol: I don't know if you know, but in Germany, nothing can be in English. All the official documents have to be in German, and I was quite interested in knowing what they were talking about. But then it's not only German, it's official political German, so it's boring.
Katherine Druckman: Right, yes. All of the words are, like, 20 letters.
Stephen Batifol: Exactly. And then I was like, "Oh, actually, I want to know what they talk about," so then I built a whole demo. There is a multilingual demo where I can ask questions in English and in German about what's happening, and then it tells me what's happening. And then I have a fun fact, which is that Berlin still uses fax machines.
Katherine Druckman: Wow. Okay.
Stephen Batifol: Yeah, yeah, yeah.
Katherine Druckman: I haven't even thought about a fax machine in a really long time.
Stephen Batifol: Exactly. Well, and they still use them. And then I was really wondering, okay, Berlin has, if I remember correctly, something like 350 services as part of the administration. And then I was like, "How many of those actually require the use of a fax machine?" And the answer is 189 require the use of a fax machine on a daily basis. Yeah.
Katherine Druckman: That is fascinating. See, this was not what I expected to learn today. I thought I would learn something new. This was not what I expected.
Stephen Batifol: Not that, yeah.
Katherine Druckman: This is great! Oh, I like that.
Stephen Batifol: So usually, yeah, that's kind of the way I do demos or either…
Katherine Druckman: That's really fun.
Stephen Batifol:... I'm wondering about something, or I know that a lot of people have been wondering about it, so then it's like, "Okay, I learned that two weeks ago. I'm going to show you now a bit."
Katherine Druckman: Tell me this. I mean, I would guess that when you're doing your demos you experiment with new models. Being that you work for a company that supports Milvus, maybe you swap in a different vector database, different pieces here and there just to experiment around and see what fits together and what fits for your use case? Now that you're not building production apps anymore, how do you maintain that connection with the developers who you need to communicate with? What are their needs and how do you meet them?
Stephen Batifol: Yeah, I think that's a very good question. It's something that I'm trying to balance at the moment. I have those MVPs that I'm building, and then everything is cool, everything works. But then to relate to them and to be able to understand them, I also build demos where things run at scale and run on Kubernetes, for example.
Last week, I was working on my AWS cluster, on my EKS cluster on AWS, and then I installed Milvus there fully, and then I was pushing 30 million vectors just to see how it works at scale. And, by following the documentation and checking out, "Oh, I know that because I work for Milvus, but it's not written in the documentation, so how can we fix that?" I also had a problem at one point on how to import some vectors. I had to ask the team, but then it's like I have access to the team, but you know other people don't. So, it's usually that.
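The batched ingestion Stephen describes, pushing tens of millions of vectors into a collection, can be sketched roughly like this. It is only an illustration: `insert_batch` is a placeholder for a real client call (for example, pymilvus's `MilvusClient.insert`), and the dimensions and counts are scaled-down assumptions, not his actual workload.

```python
import numpy as np

# Sketch of batched vector ingestion. `insert_batch` stands in for a real
# vector-database insert call; all sizes are scaled down for illustration.
DIM = 8        # embedding dimension (assumed; real embeddings are larger)
TOTAL = 1_000  # stand-in for the 30 million vectors mentioned above
BATCH = 100    # vectors per insert call

inserted = []  # stand-in for the remote collection

def insert_batch(rows):
    """Placeholder for a client call such as MilvusClient.insert."""
    inserted.extend(rows)

rng = np.random.default_rng(0)
for start in range(0, TOTAL, BATCH):
    n = min(BATCH, TOTAL - start)
    vectors = rng.standard_normal((n, DIM)).astype(np.float32)
    insert_batch([{"id": start + i, "vector": vectors[i]} for i in range(n)])

print(len(inserted))  # 1000
```

The point of batching is simply that one request per vector does not survive contact with 30 million vectors; the loop shape stays the same whatever client you swap in.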
And then otherwise, it's really trying to put myself in the shoes of others and then check, like, "Okay, how would you do that?" People ask me during meetups; I do a lot of events, so then people ask me those questions. Either I know, or if I don't, I'm going to look, and then I'm going to learn more about it and really nerd out about something in particular, I would say.
Challenges in the AI/ML Community
Katherine Druckman: Yeah, I always think that if you can fix some documentation, that's one of the most heroic things you can do in our world. I'm wondering, given that you interface with all these developers, what are the big problems that developers in this space are most concerned about right now?
Stephen Batifol: I think one problem is that people don't know what they want or what they need. In the whole AI world, as we call it now, no one was using vectors, and basically none of the people now working in AI were working in natural language processing before. You had those data scientists really working on NLP, and they've been working on it for years, so to them, everything's very clear. But for normal software engineers, they don't really know what they need or how to do it. Suddenly, everything is like... everyone talks about embeddings, everyone talks about LLMs, everyone talks about F1 score, and you're like, "What the heck is that?"
Katherine Druckman: Re-ranking them, yep.
Stephen Batifol: Exactly. I think that's the main problem. Then the second one is really that you have to optimize for search quality. And then it's like, okay, how do you do that? How do you do filtering? How do you make sure you get good results? Because in my previous job, we were working with Snowflake. You want some data? You do your query. That's it. Either you get the data, or you don't. If you do that in a vector database, what happens is that, yeah, you will get something.
Katherine Druckman: Something.
Stephen Batifol: Maybe the distance will be far, but you will get something, and I think that's also the thing people have to get used to a bit. But I think those are the main ones. It's also people don't know what they want up until they see it either on LinkedIn or on Hacker News or something. Then they see it and they're like, "Hey, do you have that?" But they don't know if they need it. It's more like someone wrote about it and then they want it.
Katherine Druckman: Yes, we're like kids in a candy store. Right?
Stephen Batifol: Exactly.
Katherine Druckman: There's again, like you say, there's something new coming out every week.
Stephen Batifol: And they're like, "Yes. What? You don't support that? What?"
The Importance of Open Source
Katherine Druckman: "We've got to try this new thing." Yeah, it's funny. You mentioned a word, “distance.” I've been thinking about how the concept of distance is so important in this kind of work. This segues me into thinking about something else I wanted to talk about, which is the idea of openness. We're open source people here. You're working on an open source vector database under the umbrella of the Linux Foundation, a very established open source project. Why do you think, for developers, is being part of a more open ecosystem important? How does that serve their needs better than the alternative?
Stephen Batifol: I'm going to take my other hat, which was my previous job, like, you know, ML platform. Back then when I joined my previous company, a food delivery company that now belongs to DoorDash, there was no ML platform, and so I had to look for those. Back then, either you go with Vertex AI on Google, or you go with SageMaker on AWS, or something like that, which is fine, but then you're also always dependent on what they do for your platform.
That's the main problem I had. I was like, "Okay. But then maybe I want to implement something new and then I don't want to wait on them. Or I want to contribute directly. Maybe we have something that is very special for us that I can help implement."
I think those are the main points where open source is very important. Also, you have more transparency, and you tend to have more security because more people can have a look at it. Those are the main things. And, I mean, let's also be honest. Engineers sometimes love to build things and to have a look at them, and then maybe it would be cheaper if you were to use something else. But then it's like, "Yeah, but no, it's my thing." You know?
Katherine Druckman: That's fair.
Stephen Batifol: I think you also have a better integration. At least for us, we had… everything was open source. The whole platform was open source. And then you have better integration with other tools like DataHub, or other tools that you wouldn't have because they're not part of the Google Cloud stack or AWS.
Katherine Druckman: I feel like I've been in the open source world for so long that I sometimes have trouble identifying with the alternative, right? It's like, "This is all I know."
I personally think it's very important to create a thriving and sustainable open software ecosystem. Open source software, open AI, that's a conversation we could maybe have about the definition, or maybe we can save it. But regardless, the concept of openness is an important one. Still, it's worth asking: is it the easier path? Is it the most cost-effective path?
Stephen Batifol: Yeah. I don't know if it's the easier one, to be honest. I think, in the long term, it might be, but short term, clearly not usually. I mean, let's have a look. If you need to use a really big database for a data warehouse, either you can go open source, the whole Spark and everything, but then you can also use Snowflake or Databricks. And sometimes... So those tend to be expensive, but then they can also save you so much time as well. It can be... It's all about the balance, I feel like. I think in the long term, then it makes it easier… or you're less dependent on them, so that's the good part. But also, I feel like it depends on the engineers you hire and the kind of engineers you hire. Not as the skill, but more like are they…
Katherine Druckman: Sure, sure. How innovative are they?
Stephen Batifol: Exactly. Or do they…
Katherine Druckman: Are they interested in innovating or just kind of following the thing?
Stephen Batifol: Exactly.
Katherine Druckman: Yeah. To me that's always, again, a benefit of maintaining that kind of interoperability that you get with open ecosystems is…
Stephen Batifol: Exactly.
Katherine Druckman: ... it lends itself, I think, to creativity and innovation and staying on top of whatever the goal is with whatever it is you're building. But again, I'm a little bit biased.
Stephen Batifol: Yeah, but I feel like, I mean, at least when I talk to people, what they like is that they can also have a look at the code. Especially now, it gets easier with different systems that can actually explain it to you. Back then, it was really hard sometimes to understand what was happening. I feel like now, if you want to have an overview of what's happening in the code, you can just ask an LLM and then they'd be like, "Hey, this is great. Da, da, da, dat."
Katherine Druckman: Unless it has guardrails, I guess, right, that won't let you do that?
Stephen Batifol: Yeah.
Open Source Summit Presentations
Katherine Druckman: I don't know, maybe. Oh, that's funny. We hinted at something a little bit ago and I want to make sure we talk a little bit about it, or as much as you'd like to talk about it, which is your upcoming Open Source Summit presentation, but by the time this comes out, you may have already given it. Tell us what you're working on for it.
Stephen Batifol: I am planning on showcasing how Milvus works at scale, and that means different things. Milvus and vector databases in general come under a lot of pressure from people, especially Hacker News people. You know the famous, "Oh, why do you even have Dropbox? You can use a USB key." We kind of have the same for vector databases. "Why even use a vector database when you can use… I don't know… NumPy cosine?" I get that a lot, which is fair; it's a very valid question, and one I also love. The talk is basically me trying to answer when it makes sense. And then, if it makes sense, what can you do with it?
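The "NumPy cosine" baseline from that Hacker News question really is this simple: exact brute-force nearest-neighbor search, one dot product per stored vector. A sketch with made-up random data:

```python
import numpy as np

# Exact brute-force cosine search: fine for thousands of vectors,
# O(n * d) per query, which is what stops scaling at billions.
def cosine_search(db: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `db` most cosine-similar to `query`."""
    db_norm = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = db_norm @ q_norm          # one dot product per stored vector
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)
q = vectors[7] + 0.01 * rng.standard_normal(128).astype(np.float32)
print(cosine_search(vectors, q, k=3))  # index 7 should rank first
```

For 10,000 vectors this runs in milliseconds; the case for a vector database starts when the collection, the memory, or the query rate outgrows a single `argsort`.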
So, I start the talk like that, and then the idea is that, okay, so you have some vector database, which is cool, but then, how do you actually reach the scale? How do you reach 10 billion, 100 billion vectors? That's also another point. Because we have some customers with 100 billion, so it's not like I'm inventing that, you know?
And then I'm going to talk about how it works, how the system is distributed. That's the plan. I also plan to have a demo. The demo will use Zilliz Cloud just because I'm a bit lazy.
Katherine Druckman: That's fair.
Stephen Batifol: And Zilliz Cloud is like... it's basically hosted Milvus, so it just makes my life easier. And then I have a demo where I have, I don't know, some millions of vectors and I'm going to show how fast it can be, even on millions of vectors, and the different things you can do.
I really try to go into the details about the different indexes as well, because sometimes people talk a lot about HNSW, which, if you're not familiar with the whole vector world, is the typical index algorithm. But then you have other indexes, and then you have some GPU indexes supported by NVIDIA. I know we have some with Intel as well, working on some specific Intel CPUs. It's also talking about those, being like, "Okay, you have the very famous one, but what if you have 100 billion vectors? Or what if you have, I don't know, 20 euros per month to spend only on your vector database? Then you might choose another index." That's basically what I'm going to talk about.
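The trade-off Stephen sketches, the famous index versus cheaper alternatives once scale or budget bites, might look roughly like this. The index names are real Milvus index types, but the parameter values and the selection heuristic are illustrative assumptions, not recommendations:

```python
# Illustrative index choices in the style of Milvus index parameters.
# Index type names (HNSW, IVF_FLAT, FLAT) are real Milvus index types;
# the exact parameter values here are assumptions, not tuned settings.
index_options = {
    # Graph index: fast queries, high recall, memory-hungry; the famous one.
    "HNSW": {"metric_type": "COSINE",
             "params": {"M": 16, "efConstruction": 200}},
    # Inverted-file index: coarser recall, much cheaper in memory.
    "IVF_FLAT": {"metric_type": "COSINE", "params": {"nlist": 1024}},
    # Exact search, no index structure; fine for small collections.
    "FLAT": {"metric_type": "COSINE", "params": {}},
}

def pick_index(n_vectors: int, budget_constrained: bool) -> str:
    """Toy heuristic mirroring the talk: scale and budget drive the choice."""
    if n_vectors < 100_000:
        return "FLAT"          # brute force is fine at this size
    return "IVF_FLAT" if budget_constrained else "HNSW"

print(pick_index(50_000, False), pick_index(10_000_000, True))
```

The real decision also involves recall targets, latency budgets, and hardware (the GPU and CPU-specific indexes he mentions), but scale and cost are the two axes the toy heuristic captures.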
Future of AI and LLMs
Katherine Druckman: Interesting. Aside from that, beyond this very cool demo that you will be giving, what are you most excited about? What are you excited to work on and what are you excited to talk to people about in the next year or so? What do you really hope to see happen in the next year, even?
Stephen Batifol: One thing I'm excited about is, not for the AI part to slow down, but to actually have things running in production. A lot of people…
Katherine Druckman: The maturity level, yeah.
Stephen Batifol: The maturity level. Yeah, exactly. A lot of people before have been like, "Yeah, you build this POC, you build that, blah, blah, blah," but no one… The customers I talk to, the enterprises, are actually just starting now to build their things at scale. That's one thing I'm quite interested in.
Another part is agents. LLM agents, I find them fascinating. I don't know if you played with them, like Claude Artifacts, for example. It's mind-blowing what it can do. That's something I'm really interested in in the future to see what agents will be able to do, how much software engineering is going to change as well, or at least like POC levels. I feel like that's a good part.
And, yeah, also, how much search is going to improve. Now we have mostly multimodal data and different things, but most of the searches are happening on text, and I want to see how much better we can get with voice. And same with video. Video can be divided into images, but that's what I'm more excited about.
Katherine Druckman: Right. Yeah, the more natural, human-like the interaction becomes…
Stephen Batifol: Exactly.
Katherine Druckman: ... it kind of evolves in an interesting way. But yeah, that's really cool. I should mention also, by the way, that my colleague, Ezequiel Lanza, and I are also co-presenting in the AI track at Open Source Summit. That's going to be really fun, and what made me think of it, because I should have thought of it anyway because isn't that why we're here is to plug the other things that we do, but also you mentioned just this kind of rapid speed of innovation, that you're looking forward to the maturity level. Everyone's just out there playing with stuff and it's aggressively innovative, which is great. It's very cool and it's hard to keep up with, but what are the consequences of that? We don't really have standards yet. We need to get there.
We're working on something. Zilliz is actually a partner, I should mention. We've talked about this quite a bit, but it's the thing that consumes a lot of my brain space lately. Have you heard of this? It's the Open Platform for Enterprise AI.
Stephen Batifol: Yes, yes.
Katherine Druckman: Yes. Okay, so…
Stephen Batifol: How do you pronounce it? Opea? Opia? Opya?
Katherine Druckman: Well, the official pronunciation that was voted upon at a community event was OPEA, but that sounds like the letters O-P-A.
Stephen Batifol: Yeah.
Katherine Druckman: Most people just say, "OPEA." I don't know. I think this is one of those…
Stephen Batifol: OPEA, okay.
Katherine Druckman: ... if this is going to be GIF-JIF thing, I don't know. We'll see. But officially, it's “OPEA” and the concept of our talk is this should address that issue of standards, or it should at least help. And again, we maintain that creating these standards together via community collaboration is the way to go, right?
Stephen Batifol: Yeah.
Katherine Druckman: Because we are all out there pioneering and building and being really creative, but you get in this, when you develop in silos, you get increased complexity, and you get madness. If we hold together, we will sort through the madness. That's what we're talking about.
Stephen Batifol: Before, you had the whole MLOps world, so it kind of got standardized. You had your DAG, you had some kind of MLflow in the middle or whatever, and then you were serving your models. And now it's like, yeah, everything is, "Just run it, no standards, we don't care," which is refreshing a bit, but now I think we can have a bit of both. You know?
Katherine Druckman: Yes, right? Yeah. It's like we're in the AI party phase, the machine learning party phase.
Stephen Batifol: Exactly.
Katherine Druckman: And it's like, well, at some point, we're going to have to get some order here and clean up our mess, clean up our technical debt or whatever it is that we're creating. Right?
Stephen Batifol: Exactly. All those notebooks, time to put them in functions.
Katherine Druckman: Yeah. It's funny. Okay. Well, cool. Is there anything else that you wanted to talk about that we didn't get to?
Stephen Batifol: No, I think, yeah, I'm really excited for the standards and different things to come up, I think, in the future.
Katherine Druckman: Yeah. Yeah, me too. We'll get to that teenage phase any day now.
Stephen Batifol: I think I'm excited about the LLMs as well. Now it's the race to bring the prices down so they can kill the smaller LLMs, basically. But I feel like it's quite interesting. We'll see. Then in the future, maybe some smaller, more focused ones, instead of those gigantic ones that are expensive.
Katherine Druckman: More purpose-built, more specific.
Stephen Batifol: Yeah.
Katherine Druckman: It's going to be an interesting year, two, five, I don't know, forever.
Stephen Batifol: Five, six. Yeah, whatever.
Katherine Druckman: The world is changing rapidly.
Stephen Batifol: We still use fax machines here, so no worries.
Katherine Druckman: Yes, I know! Except for in the German government where they use fax machines, and that is the most interesting bit of trivia I've learned today, so thank you for that. Thank you so much for joining me and chatting and nerding out a bit, and I will see you in Vienna.
Stephen Batifol: Thank you very much.
About the Guest
Stephen Batifol, Developer Advocate, Zilliz
Stephen Batifol is a developer advocate at Zilliz. He previously worked as a machine learning engineer at Wolt, where he created and worked on the ML Platform, and previously as a data scientist at Brevo. Stephen studied computer science and artificial intelligence. He is a founding member of the MLOps.community Berlin group, where he organizes meetups and hackathons. He enjoys boxing and surfing.
About the Host
Katherine Druckman, Open Source Security Evangelist, Intel
Katherine Druckman, an Intel open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she's a long-time champion of open source and open standards. She is a software engineer and content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism. Find her on LinkedIn.