Niki Manoledaki and Stephanie Hingtgen from Grafana discuss their open source community roles and contributions toward environmental sustainability. Niki serves as a co-chair of the Green Reviews Working Group within the CNCF Environmental Sustainability Technical Advisory Group, focusing on promoting energy and carbon efficiency. Stephanie works on both the open source Grafana project and Grafana Cloud, emphasizing the value of contributing to open source.
"Even on a software level, one initiative we started having was looking at, okay, when you're deploying a new feature, how much more memory and CPU are you using with this new feature?" — Stephanie Hingtgen
In this podcast, we discuss the importance of energy consumption metrics in technology, the use of Kubernetes for event-driven autoscaling through KEDA, and efforts to enhance operational and environmental efficiency. Niki and Stephanie also share insights on scaling applications and the relationship between cost reduction and environmental sustainability, introducing several projects like Karpenter and Kepler.
Katherine Druckman: Niki and Stephanie, thank you both so much for coming and talking to me because I know you're very busy giving some talks and connecting with the community as you do at these in-person events. I know you both work for Grafana, and you both have some interesting community roles that I wondered if you could introduce a little bit.
Niki Manoledaki: I'm part of the Green Reviews Working Group of the CNCF Environmental Sustainability TAG, which is a technical advisory group. As part of that, over the past few years, we've been promoting and advocating for energy and carbon efficiency and monitoring. There's also some relationship with cost and general resource utilization.
Katherine Druckman: And Stephanie?
Stephanie Hingtgen: Thanks so much for having us on. I work at Grafana on the open source part as well as Grafana Cloud.
Open Source Contributions and Projects
Katherine Druckman: It's nice to get paid to contribute to open source projects. It's more and more common the older I get. But it used to be a rare privilege to get to do that. A lot of times open source projects in the olden days were passion projects, right? And they were communities outside of the things that people did for their day jobs. And it's nice that now we're at a place where it's valued appropriately. So, let's start with the environmental sustainability group because I think that's pretty interesting. There is a moral benefit to keeping sustainability in mind, but there are so many other ones that I think people don't realize. I don't think a lot of people involved in this technology immediately think about the amount of power that we are all consuming when we are doing all this massive compute that we rely on for our day-to-day lives. So, tell us a little bit about that. Give us a picture of just how important that work is.
Niki Manoledaki: It's a new... it's an emerging field, and I think we're still building the use cases. What I would say, though, is that energy consumption especially can be converted into carbon if you have information about the geolocation where your application is running in the cloud. But starting with energy: it's actually a utilization metric. There are different types of metrics that you can use for utilization, and energy consumption is one of them.
Katherine Druckman: Performance around energy has always been a concern. People don't want to spend money, right?
Niki Manoledaki: But now the attribution to pods, for example, is new, especially in Kubernetes and cloud native environments. There are also top-level requirements, where companies are trying to reduce their carbon footprint. One way to do that is through carbon monitoring and reduction. I gave a talk last year comparing the energy consumption of Flux versus Argo. So, energy consumption can be one way to compare tools, for example, to decide which one you're going to use as a developer or in a platform capacity. I'm in the platform department at Grafana, and one thing we want to know is: what's the utilization per team? One way to measure that would be through energy, for example, which then maps directly to carbon as well.
Exploring Environmental Sustainability in Tech
Katherine Druckman: Before we talked, we had some emails going back and forth, which is lovely. And you provided me with a lot of really nice information about what it is that you do, and I appreciate that because not everybody does. You threw out a few project names, and I'm not familiar with any of them. It's been a while since I've had my hands on any role in this infrastructure. But tell me about KEDA. I don't know what it is or what it does. Could you tell us a little bit about that and what your experience is?
Stephanie Hingtgen: So, KEDA is Kubernetes Event-Driven Autoscaling, and one thing that's cool about it is that the standard HPA (HorizontalPodAutoscaler) in Kubernetes can only scale on CPU or memory, but KEDA allows you to scale on anything. So, you can scale on things like Prometheus metrics, or if your queue is getting too backed up, you could scale on that.
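[Editor's note: As a rough illustration of what Stephanie describes, a KEDA ScaledObject can drive scaling from a Prometheus query instead of CPU or memory. This is a minimal sketch; the deployment name, query, and threshold are hypothetical.]

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: queue-worker             # the Deployment to scale (hypothetical)
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(queue_depth{job="queue-worker"})  # scale up when the queue backs up
        threshold: "100"           # target value per replica
```

KEDA creates and manages the underlying HPA from this resource, so the scaling signal no longer has to be CPU or memory.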
Katherine Druckman: How does that fit into your work at Grafana?
Stephanie Hingtgen: We recently implemented an HPA using KEDA. Before, we had just done VPA (Vertical Pod Autoscaler), and it's a little tricky to scale on both. That's how KEDA came into the picture; it allowed us to scale on a signal that wasn't correlated with what the vertical autoscaler was already adjusting.
Katherine Druckman: Let’s talk about scaling. Those of us who've been in the technology world for a while remember a very different world where things were less reliable than they are today. Does anybody remember the fail whale? Anyway, so scaling is not a new challenge, for sure, but I feel like we're getting better at it. What do you think? And how do these tools that you work with come into play?
Niki Manoledaki: In the platform department, my team operates KEDA and VPA. VPA stands for Vertical Pod Autoscaler; it's part of SIG Autoscaling, where SIG is a special interest group. The projects that encompass Kubernetes, like the scheduler or the autoscalers, such as the Vertical Pod Autoscaler, all have their SIGs, or special interest groups. Later on, the TAGs (technical advisory groups) broke off from the SIGs because the TAGs take theme-based approaches, like TAG Observability, TAG Security, and TAG Environmental Sustainability.
Katherine Druckman: Can you contribute back upstream a lot to many of these projects that you use, and a lot of the communities are involved in? I know you must take a leadership role on environmental sustainability, but what other kinds of contributions can you make?
Niki Manoledaki: The sustainability work is also a passion project for me. I can do that as part of my day-to-day, but it's a little in-between for me. Last month, I contributed to the Kubernetes descheduler upstream, because there was a missing feature.
Katherine Druckman: I remember the first time I had an actual pull request merged into an open source project, and I felt I'd finally arrived. There are so many other ways to contribute, but it feels good because you're contributing to a thing that people use and depend on, and you're making the world a better place.
Niki Manoledaki: Especially for Kubernetes. For me, it's the first time I'm directly contributing to a Kubernetes project. I've been working with Kubernetes for a handful of years. So, for me, it's a career and personal achievement.
Katherine Druckman: Does it take a little bit of courage to work up the nerve to open a pull request? Because you worry that everybody's going to see that... I don't know.
Stephanie Hingtgen: Especially when it's a project that you don't work on day-to-day. Now, I've gotten used to opening pull requests out in the wild, but when I go to a different open source project and I don't know the maintainers sometimes ... so it's putting yourself out there.
Katherine Druckman: There's a little bit of diplomacy to that too, right? Because you're like, here's a thing that I would like to see happen. So, there's a thing that happens, a human element that's important, and it's the nicer side sometimes. Not always, but these days, it's usually the nicer, human interaction part of open source communities. Then you open the PR and people comment and review your code, and it turns out maybe you do know a thing. And it's like, wow, I know a thing. And it actually worked. And then it got merged. And, yeah, it feels pretty great.
Niki Manoledaki: It took five reviewers for my PR because it was large, and it took two weeks to review. It was intimidating. Also, with the diplomacy part, especially for Kubernetes, CNCF Slack is so important.
Scaling and Autoscaling: Insights and Challenges
Katherine Druckman: It used to be IRC back in the olden days, but now it's very much Slack. Slack and GitHub—that's where open source happens. So, again, you mentioned another project called Karpenter. Tell us a little bit about that. What is that?
Niki Manoledaki: Karpenter, I think, was originally an AWS project, and now it's been donated to the CNCF, I think to SIG Autoscaling, the same SIG as the descheduler and VPA. I could be wrong; the ecosystem is hard to keep up with.
Katherine Druckman: Its massive, and that's a whole other episode, right there.
Niki Manoledaki: On the Grafana blog, we have more information about Karpenter. And can we link a resource? It's an autoscaler that helps bin pack and land workloads on the right nodes. I have less experience with it, but it's definitely one of the cost optimization tools that we've seen a lot of success with.
Stephanie Hingtgen: It has helped a lot with reducing the number of nodes we are running in our clusters. My team specifically works on a unique workload in our cloud infrastructure. One problem we ended up having is that we ran a fair number of things with one replica, and the way Karpenter does autoscaling is through the eviction API. So, if you only have one replica, all of a sudden that replica goes away. We ended up building what we call packing peanuts: we deploy these small little pods that are the first to be evicted from the node, and their rescheduling spins up the new node in time for our true workload to move over to it. We had to do a few tinkering things here and there to make it work.
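[Editor's note: The "packing peanuts" Stephanie mentions resemble the common cluster-overprovisioning pattern: low-priority placeholder pods that hold spare capacity and are evicted first, so replacement capacity is already warming up before the real workload has to move. A sketch, with all names and sizes made up:]

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: packing-peanut
value: -10                  # lower than any real workload, so these evict first
preemptionPolicy: Never
globalDefault: false
description: Placeholder pods that reserve headroom on nodes
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: packing-peanuts
spec:
  replicas: 3
  selector:
    matchLabels:
      app: packing-peanut
  template:
    metadata:
      labels:
        app: packing-peanut
    spec:
      priorityClassName: packing-peanut
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing; only reserves resources
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```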
Katherine Druckman: Did you run into headwinds, limitations, or problems?
Stephanie Hingtgen: That was the main one we ran into. For some of our workloads, we ended up scaling them up to two replicas and adding a pod disruption budget, which ensures both replicas aren't evicted at the same time.
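[Editor's note: A PodDisruptionBudget for the two-replica setup Stephanie describes might look like this (names hypothetical); it caps voluntary evictions so at least one replica stays up while Karpenter consolidates nodes:]

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 1            # never voluntarily evict below one running replica
  selector:
    matchLabels:
      app: my-service
```

Pairing this with pod anti-affinity or topology spread constraints keeps the two replicas on different nodes in the first place.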
Cost vs. Environmental Sustainability
Niki Manoledaki: Fun fact: there is work being done by someone in the Environmental Sustainability TAG to make Karpenter carbon aware.
Katherine Druckman: Oh, okay.
Niki Manoledaki: That's new. We don't use that yet, but it is happening, and growing in momentum.
Katherine Druckman: We keep kind of hinting at it, but let's talk about the relationship between cost and environmental sustainability. Do you feel like it's a side benefit or that most people consider environmental sustainability and concern a side benefit of reducing costs? That it’s a second-class citizen a little bit?
Niki Manoledaki: Definitely. I think we lead with cost, and then sustainability is a nice-to-have and a side effect of that. So, in many ways, it is directly related. In other ways, it is not related at all or has the opposite effect. The happy medium everyone wants to reach is where there is a positive effect on both, and that's usually doing more with less. Classic cost optimization: bin packing, right-sizing, descheduling, deleting unused resources, monitoring and reducing idleness. This makes everyone happy in terms of cost, emissions, and efficiency.
Katherine Druckman: Are there any examples of things that are cost-cutting measures that don't relate to sustainability?
Niki Manoledaki: Savings plans are beneficial for reducing costs but don't really affect sustainability. On the flip side, one thing that might be better for sustainability but not for cost is moving to a region with a lower carbon footprint, like the Nordic countries, because there's more renewable energy. However, these regions may be more expensive in terms of what the cloud service providers offer there. So, there's an inverse relationship.
Katherine Druckman: My mom always likes to say that nothing adds up faster than a column of small numbers. There's something to think about in terms of cost and sustainability: these small savings in various spots add up tremendously, especially when you're talking about massive scale. And I would imagine that this conversation involves shifting left, which is addressing things like security, documentation, and everything else earlier on. Now, of course, one might argue, if everything shifts left, then what's to the right? That's another episode. Anyway, I wondered if you had any wisdom about how you address cost and sustainability savings along the various points, what those points are, and where you need to revisit. It almost seems the same way you would think about security vulnerabilities, right?
Stephanie Hingtgen: You can start from the node level and reduce the number of nodes that you're running in general, and bin pack with that. Then, on a software level, one initiative we started was looking at what happens when you're deploying a new feature: how much more memory and CPU are you using with this new feature? Are there ways you can make that code more efficient and reduce your impact? Because at scale, like you were saying, it can explode exponentially.
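[Editor's note: One lightweight way to watch a new feature's footprint, as Stephanie suggests, is to track per-pod CPU and memory from the standard cAdvisor metrics and compare the numbers before and after a deploy. A sketch as Prometheus recording rules; the rule names are made up, the metric names are the standard cAdvisor ones:]

```yaml
groups:
  - name: feature-footprint
    rules:
      # 5-minute average CPU usage per pod, in cores
      - record: pod:container_cpu_usage_seconds:rate5m
        expr: sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
      # current working-set memory per pod, in bytes
      - record: pod:container_memory_working_set_bytes:sum
        expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
```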
Katherine Druckman: We rely on all these technologies at a massive scale, right? We don't even know or see all the places where Kubernetes is used in our daily lives, or Linux, or any other open source software that's achieved wide adoption. And so, when you think about the enormity of the ecosystem, all of those places can be a point of failure or not in terms of cost savings or sustainability, and if we get it wrong, there are consequences. Spending money, or burning power has a negative impact. So, another thing that you mentioned is Kepler, and I wanted to ask you about it.
Niki Manoledaki: So, Kepler surfaces energy metrics from the kernel. If it doesn't have access to the kernel, for example in a cloud environment, it has a pre-trained model, already trained on different types of compute, so it can detect the architecture and make some educated guesses about kernel-level energy consumption. That part is not new. What is new is the part where it attributes that energy consumption to a pod. I believe it goes through the process ID in the kernel, the PID, if it has access to it, and associates energy consumption per pod. So, it has a cool mechanism, especially on bare metal. It's very impressive. It's a sandbox project in the CNCF, and we use it a lot in the Sustainability TAG as well.
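[Editor's note: Kepler exposes its per-container energy estimates as Prometheus counters, so they can be aggregated like any other utilization metric. A sketch as a Prometheus recording rule; the metric and label names follow recent Kepler releases and may differ in your version:]

```yaml
groups:
  - name: energy
    rules:
      # joules per second (watts) attributed to each namespace
      - record: namespace:kepler_joules:rate5m
        expr: sum by (container_namespace) (rate(kepler_container_joules_total[5m]))
```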
Katherine Druckman: I'm going to pivot a little bit. I'd like to know more about how each of you got involved in open source software. What was your point of entry?
Personal Journeys into Open Source Software
Stephanie Hingtgen: I used Grafana at my previous company and started contributing to it there. Then I ended up joining Grafana to work on it, and I was excited about that. Here, we're encouraged to contribute whenever we can.
Katherine Druckman: That reminds me of my own path. And how about you, Niki?
Niki Manoledaki: I co-founded an open source application for NGOs that distribute donated items. It was a very simple application, called DropApp at the time; it later turned into BoxWise. That was my first experience with open source. Then, when I went into the cloud native world, I was a maintainer of the EKS CLI while I was at Weaveworks, in collaboration with AWS. Kubernetes ships new features and releases, EKS implements them, and the EKS CLI has to maintain feature parity. I learned a lot through that experience, especially with facilitating communities and responding to contributors: PRs, new issues, feature requests, help requests, and bug reports.
Closing Thoughts on Open Source and Sustainability
Katherine Druckman: It’s a lot of underappreciated glue work, as we call it, right? It goes to keeping these things running that we all depend on. I appreciate both of you coming in and doing this. I enjoyed this. I think sustainability is an important topic. We just did an interview with Marlow Weston at Intel. So, it's something I'm learning more about, and I'm trying to understand. I feel like I understand the impact. It's just that there are a lot of moving parts. I hope that other people appreciate the work you are doing.
Niki Manoledaki: Marlow is a big part of the community. With wonderful people like you, Marlow, and so many other amazing people to work with, I love the open source community.
Katherine Druckman: Yeah, me too. Open source people are the best people. I say that a lot, and people who are listening are tired of hearing it, but it is the truth. Thank you both so much. You've been listening to Open at Intel. Be sure to check out more from the Open at Intel podcast at open.intel.com/podcast and at Open at Intel on Twitter. We hope you join us again next time to geek out about open source.
Guests
Niki Manoledaki is a software engineer, environmental sustainability advocate, keynote speaker, meetup organizer, and community facilitator. She advocates for environmental sustainability in the CNCF as a lead of the CNCF Environmental Sustainability TAG where she co-chairs the Green Reviews WG.
Stephanie Hingtgen is a senior software engineer II at Grafana Labs. As a member of the Grafana as a Service team, her focus has been on orchestrating thousands of Grafana instances in Kubernetes for Grafana Cloud. Her previous experience includes developing a private cloud platform to provision Kubernetes resources for engineers at Comcast.
About the Author
Katherine Druckman, Intel’s open source security evangelist, hosts the podcasts Open at Intel, Reality 2.0, and FLOSS Weekly. A security and privacy advocate, software engineer, and former digital director of Linux Journal, she's a long-time champion of open source and open standards. She is a software engineer and content creator with over a decade of experience in engineering, content strategy, product management, user experience, and technology evangelism.