Understanding Observability with OpenTelemetry


Join us as we sit down with Austin Parker, director of open source at Honeycomb.io, to discuss observability with OpenTelemetry. He explains its importance in cloud native software, reflects on the OpenTelemetry project's growth and community contributions, and shares insights on its evolution, adoption, and impact on the software industry.
 

“If you think about it, lock-in is really bad for businesses because it lets them just kind of rest on their laurels. If I'm only competing with myself and not everyone else, then why am I going to innovate?”

— Austin Parker, Director of Open Source, honeycomb.io 

 

Katherine Druckman: Hey, Austin, thank you for taking some time out of your KubeCon to sit with me in what I like to call the fishbowl. 
 

Austin Parker: Yeah, it's really exciting because I feel like having an audience for podcasting is really interesting. Podcasting, that famously visual medium, so everyone can see us waving at people as they wait in line for their coffee. 
 

Katherine Druckman: Yes, it's great. So first, tell us who you are and what you do, and then why are you here at KubeCon? 
 

Austin Parker: Great questions, all three. My name is Austin Parker, and I am director of open source at Honeycomb.io. I am also one of the early contributors and maintainers on a project called OpenTelemetry, which is a CNCF incubating project. And our goal is to make high-quality telemetry a built-in part of cloud native software. 
 

Katherine Druckman: Fabulous. For anyone who doesn't know, could you tell us what incubating means? 
 

Austin Parker: Yeah, that's a great question. In the CNCF, projects have one of three maturity levels. The initial one is sandbox, which is where newer, smaller projects go and live. Then as you grow and mature, you move into incubating, and finally you reach graduated status. Projects like Prometheus or Kubernetes are examples of graduated projects. 
 

Katherine Druckman: Helm. 
 

Austin Parker: Helm. Where you have many, many users and you're in production everywhere. OpenTelemetry is a very mature incubating project. We are in production at probably tens of thousands of organizations worldwide. Many you've heard of, including companies like Microsoft. 
 

Katherine Druckman: I've heard of that. 
 

Austin Parker: Yeah, little… 
 

Katherine Druckman: Little startups. 
 

Austin Parker: Little startups in Washington. But yeah, a lot of companies are using OpenTelemetry, including financial ones, places like eBay, Shopify, Intuit, JPMorgan Chase, Vanguard, yeah, the list goes on and on and on. And so, it's becoming more and more natively integrated into other software. If you're a JavaScript person, maybe you saw this year Next.js 15 released, and it has built-in OpenTelemetry support, right? We're seeing it really gain a lot of adoption, and I think that is a sign of how useful it is and how much we're filling this need for observability in the cloud native community. 
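For the curious, here is a minimal sketch of what that built-in support looks like, assuming the @vercel/otel convenience package that the Next.js documentation points to; the file name follows the Next.js instrumentation convention, and the service name is a placeholder:

```typescript
// instrumentation.ts: Next.js calls register() once at server startup,
// which is where OpenTelemetry gets wired up.
import { registerOTel } from "@vercel/otel";

export function register() {
  // "my-next-app" is a placeholder; it becomes the service name
  // attached to all telemetry this app emits.
  registerOTel({ serviceName: "my-next-app" });
}
```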

The Importance of Observability 

Katherine Druckman: Awesome. Tell me, if you could talk to the ideal target audience for OpenTelemetry, who is that? Who do you want to gather together and really get them excited about the project? 
 

Austin Parker: I mean, really everyone, right? If you're writing software, you need observability. Let's even take it back a little bit and talk about observability. We use the word a lot, but what does it actually mean? 
 

Katherine Druckman: Yeah, what does it mean? 
 

Austin Parker: You'll see a lot of definitions. I like the sciencey one, right? It comes out of control theory, which asks: how do you control a system, any kind of system, how do you actually understand what the system is doing? The classical definition of observability is your ability to understand the inner workings of a system based on its outputs. If you think about software, if you think about microservices or any kind of service, all software is just a set of boxes that we put something into. 
 

Katherine Druckman: Schrödinger’s microservice. 
 

Austin Parker: Right. You put a value in, and then a value comes out. And wouldn't it be great if you could figure out what was happening that got you from input to output? Take the simplest microservice example: an adder service that takes two parameters, A and B, and then returns C, which is the value of A and B added together. 

Okay, what if one out of 5,000 times, instead of getting A plus B, you get A plus B plus one? How do you figure out why it's happening? You can run it as much as you want on your local machine. You'll never see the exact circumstances that cause that sort of random error. The idea behind observability is that we need really high-quality, detailed data about the inner workings of our services and systems to understand what's going on in production, and to figure out the weird emergent behavior that happens in systems as they get bigger and more complex. 
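To make that concrete, here is a minimal sketch of what instrumenting that hypothetical adder service could look like with the OpenTelemetry JavaScript API. The attribute names are made up for illustration, and a real deployment would also need an SDK and exporter configured, since the bare API is a no-op by default:

```typescript
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("adder-service"); // hypothetical service

// Record the inputs and output of every call as span attributes, so that
// when one request in 5,000 mysteriously returns A + B + 1, you can query
// production telemetry for spans where result != a + b, rather than trying
// to reproduce the bug on your laptop.
function add(a: number, b: number): number {
  return tracer.startActiveSpan("add", (span) => {
    span.setAttribute("adder.a", a); // the "adder.*" names are illustrative
    span.setAttribute("adder.b", b);
    const result = a + b;
    span.setAttribute("adder.result", result);
    span.end();
    return result;
  });
}
```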

Challenges and Innovations in Observability 

Katherine Druckman: We talked about how many companies are here, startups and established companies alike, working in the area of observability. Why is there so much innovation in that area right now? 
 

Austin Parker: That's a really good question. I think there are two or three reasons. One is that over the past five or six years, we've really seen this revolution in observability where everyone has been like, oh, actually, yeah, we do need this. Applications get more complex, your systems get bigger as you add in Kubernetes and you're going to the cloud and you're adding in serverless, and all this stuff increases the complexity of a system to the point where it's not enough to just have some logs. It's not enough to just have a bunch of metrics. You need something more. You need to be able to take all of this data that you're getting from your system, your logs and your metrics and your traces and continuous profiles, and you need to organize that data. You need to be able to analyze it across multiple different axes. 

You have your system logs, your application logs, your system metrics, your app metrics, you're in the cloud, and you have managed services. Then those services are going to spit out a bunch of logs and metrics. And a lot of people are now adopting distributed tracing, right? Where you have a way of looking at an entire transaction, entire request as it moves through a system. You can think of it as logs with additional kind of context and structure. And all of these things together are extremely valuable and make a big difference in how you can operate your system safely at scale and understand it at scale. But looking at them in isolation makes it really challenging. I think a really great analogy here is, imagine if I told you to describe an elephant, but I'm going to have you describe the elephant by sending three people into a room. And they can only go in one at a time, and each of them can only use one sense, right? 

One person can only hear the elephant, they can't see it, they can't touch it. The second person can only see it, but they can't touch it, they can't hear it. And the third person can only touch it, right? They can't see it or hear it. Each of them has just one sense; between them, they cover three. And then they come out of the room and you tell them, "All right, tell me what you saw. Tell me what's in there." And the person that could hear but not see is going to give you a completely different answer than the person that saw it but couldn't hear or touch it. And you, as the person that is getting that information, you have to synthesize it. 

You have to say, okay, the person that saw it told me it's really tall and it's gray. And then this other person told me that it kind of makes snuffling noises, and the person that touched it said it was rough. So that could be an elephant. Sure. But that could be so many other things. And if you're thinking about operating your systems from that perspective I have my metrics and I have my logs and I have my traces, I have these three pillars. If you only have these pillars and they're not connected, then you're the person trying to figure that out after the fact. 

To really get ahead of this, rather than having to look at those pillars independently, what if one person went in with all their senses and could tell you, “oh, it's a big gray thing, it's got a trunk and it's rough and it's stompy,” whatever. That's the OpenTelemetry idea: we need all of this data, and it needs to be really structured. We need really defined schemas and conventions about what data things emit, the shape of that data, and what the various metadata in it means. And by giving you that kind of data, you can get much higher quality observability than you would have by looking at all this stuff independently. And I think that is why you're seeing a lot of innovation in the space, because now OpenTelemetry is here to give us that high-quality data and make it easy for people to get. 
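As a small sketch of what those defined schemas look like in practice: OpenTelemetry publishes semantic conventions, agreed-upon attribute names with agreed-upon meanings, so every SDK and backend interprets the data the same way. The span below uses attribute names from the published HTTP conventions; the tracer name and route are placeholders:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout-service"); // placeholder name

// "http.request.method", "http.route", and "http.response.status_code"
// come from OpenTelemetry's HTTP semantic conventions: because every
// implementation agrees on these names, a span from a Go service and a
// span from a Node service describing the same request look the same.
tracer.startActiveSpan("GET /checkout", (span) => {
  span.setAttributes({
    "http.request.method": "GET",
    "http.route": "/checkout",
    "http.response.status_code": 200,
  });
  span.setStatus({ code: SpanStatusCode.OK });
  span.end();
});
```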

The OpenTelemetry Community 

Katherine Druckman: Very cool. So, we're open-source people here, right? We're talking about an open-source project, and open-source projects are built by humans, a lot of them. And hopefully there's a community around the project. Tell me a little bit about the OpenTelemetry community. 
 

Austin Parker: It's getting big. I was pulling numbers on contributors yesterday, and I think we actually just hit a new record. 1,283 unique contributors in the past month, which is a very big number! We pretty continually look at our graphs of people that are coming in and opening PRs, and we have a lot of new people coming in. It slowly grows over time. You can kind of look at the line. I think there's really two different groups though. There's the end user community, and that's really big and diffuse. 

That's your SREs, DevOps people, developers, those are the people that are using OpenTelemetry. They're embedding it in their library or they're integrating it into their software, right? They're running Kubernetes and they're using OTel and they just want things that work. And then the contributor and sort of maintainer community, it's definitely smaller and it's definitely more biased towards people that work in the industry. One of the things that I think is really cool about OpenTelemetry that you don't necessarily see often is that it is really a community of rivals, right? A lot of the original maintainers and governance and everything came from competing vendors in the space. 
 

Katherine Druckman: Sure. Yeah. 
 

Austin Parker: Right now, actually, we have a booth here that you can go look at, and the top 10 contributing companies for the year are up there. It's Splunk, Honeycomb, Elastic, Microsoft, I'd have to look, but it's a bunch of companies that are competing with each other, right? 
 

Katherine Druckman: Sure. It's common in open-source projects. 

Challenges with Vendor Lock-In 

Austin Parker: It's super common. And I think one of the things that we've done is that we've actually really leaned into this because ultimately, it's really good for all our users. Before OpenTelemetry, you generally had to use some proprietary agent, and it kind of locked you into a given vendor. You would pick vendor X and install their agent. And then if you wanted to go to vendor Y, you had to rip out the vendor X agent and put in the new one. And hopefully, vendor Y supports all of your libraries and supports your version of whatever thing you're running. And hopefully you didn't build any custom integrations on top of that, because now you have to migrate those to vendor Y's agent. This probably surprises people, but in a lot of cases, vendors don't actually like lock-in. In some ways, some vendors do. But if you think about it, lock-in is actually really bad for businesses because it lets them just rest on their laurels. 

If I don't have to worry about competing, if I'm only competing with myself and not everyone else, then why am I going to innovate, right? It makes more sense to just focus on getting more value out of the people that are already using me versus doing new things. And it's also very expensive to maintain all of that bespoke instrumentation and duplicate it across all these different languages. All the vendors in the space kind of came to this conclusion of: "Hey, this is just commodity data, right? We don't need to lock people in. We can actually build more value for our customers." And by having a vendor-agnostic framework to work with, we can compete more with other people in the space. It becomes easier to switch, right? And it kind of works both ways: if it's easy to get out, it's also easy to get in. 
 

Katherine Druckman: Okay, I could see that. 
 

Austin Parker: Right? 
 

Katherine Druckman: Interesting. Keeping projects of any stage or any size going takes a lot of humans. And in many cases, that means a lot of new blood. Eventually people get a little burned out, they switch priorities, or they move on to other things. How do you go about encouraging new contributions? 

Encouraging New Contributions 

Austin Parker: Oh, that's a great question. I think we're still figuring it out honestly, but some of it is, we are definitely looking for how we can broaden what the project is doing. How can we make it more accessible for people that aren't observability or telemetry nerds? Because I think that's one of the real challenges you have, especially with projects like OpenTelemetry or anything that requires a really high level of domain expertise. For example, there are parts of OpenTelemetry that are super math nerd stuff. Things like calculating sampling rates or algorithms to define histograms. And these are things that do matter, but to really get into them, you have got to really want to be in it. You have to go to bed with copies of calculus textbooks under your pillow. I love the people that work on this stuff, by the way, they're dear friends of mine. 
 

Katherine Druckman: The best kind of nerd, yeah. 
 

Austin Parker: But they are the best kinds of nerds because they're deeply into this one specific thing, and they talk to me about it, and I'm like, "Okay, that sounds cool. You lost me at lambda-P values or whatever, Q values. But okay." It's hard to find people like that, right? It's very difficult to just find people off the street that are going to jump into things like that. What we're trying to do is expand the scope of the project in very deliberate ways so that we can meet people where they are, right? Because maybe this is interesting to you, and you want to start, and you want to do something and be a part of the community. So over time, we've done things like participate in various intern programs like Outreachy to have targeted contributions. One thing we started doing this year is localizing our documentation. 
 

Katherine Druckman: Okay. Excellent. 
 

Austin Parker: That's been a really powerful way to get new contributors in. We literally don't have a ton of contributors who speak languages other than English, right? So you have to bring people in. You have to find them and nurture them. But the cool thing about it is, a lot of times people that speak different languages are not in America. They're in other parts of the world. So you have to think about... 

 
Katherine Druckman: Time zones. 
 

Austin Parker: Time zones and meetings, but there's a flip side to it. Now you have people in these other time zones, so now you have to start doing meetings in those time zones. But that means there are people in those time zones who maybe didn't feel super welcome because everything was Pacific Time, but now there's a local time, right? There are office hours in their time zone, and they can come in, they can ask questions, and we can get them connected, right? We can start to make more deliberate decisions about how we're going to increase the scope or the size of the contributor community by piggybacking off of simpler things. Like, "hey, we want to localize documentation." That's one example. I think the other thing that we're trying to do is get better about recognizing contribution. 
 

Katherine Druckman: Oh, great. 

Recognizing Community Contributions 

Austin Parker: We're actually announcing it later today, but we have our first community stars, we call them OpenTelemetry stars, and it's just people that got nominated by the community for their contributions to OpenTelemetry. That could be anything. It could be that you wrote a really great blog post that helped me understand this thing. Or you helped me with this issue in Slack, or your Stack Overflow answer was really helpful. So being able to recognize people that are putting in that kind of work, it's something we’ve wanted to do for a while, just never got around to it, I guess. But we're starting that this year. I was really impressed with how many responses we had! I think people want to recognize people. 
 

Katherine Druckman: Sure. Yeah, yeah, yeah. 
 

Austin Parker: But you have to ask, as a maintainer, as a leader, you have to give people ways to give that feedback. 
 

Katherine Druckman: Yeah. I love it. I think it is really important to show gratitude and appreciation for the people that keep projects going. And my personal feeling is, if you use a thing and it is critical to your work, you should contribute. Because you want to sit at the table, and you want to have some kind of influence on the direction of the project. And frankly, you want to see it keep going. If all of those things help drive that, I think that's fabulous. 

Final Thoughts 

Austin Parker: I think the trickiest thing so far has been getting people over that bridge from end user community to contributor community. And if anyone listening to this has any brilliant ideas or has cracked the code there, feel free to look me up on Bluesky. 
 

Katherine Druckman: Yeah, I think a lot of people are working to solve this. A lot of people in the open source world. 
 

Austin Parker: It's tricky. I think we could talk for an hour specifically about this one thing, right? 

 
Katherine Druckman: Right, that's fair. 

 
Austin Parker: I've been to big companies, small companies. There is so much that goes into how organizations approach open source. 
 

Katherine Druckman: Someday we'll solve this one. But you know what? As I said in a previous episode, just raising the right questions is half the battle. 
 

Austin Parker: Exactly. Yeah. 
 

Katherine Druckman: Well, thank you so much. I will let you get back to the rest of your KubeCon, and I appreciate it. Until next time! 
 

Austin Parker: Thank you so much. This was super fun! 
 

Katherine Druckman: You've been listening to Open at Intel. Be sure to check out more about Intel’s work in the open source community at Open.Intel, on X, or on LinkedIn. We hope you join us again next time to geek out about open source.  

About the Guest 

Austin Parker, Director of Open Source, honeycomb.io 

Austin Parker is director of open source at honeycomb.io, an OpenTelemetry maintainer and governance member, author of several books, and all-around great person.

About the Host

Katherine Druckman, Open Source Security Evangelist, Intel  

Katherine Druckman, an Intel open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she's a long-time champion of open source and open standards. She is a content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism. Find her on LinkedIn.
