AI Inferencing Everywhere: Scaling Enterprise AI from Core to Edge

As AI moves into production, enterprises must solve for distributed execution across core and edge environments. This conversation explores how infrastructure is evolving to support scalable, real-time AI inferencing.

AI is moving out of the lab and into real-world environments, and the challenge is no longer building models; it's running them everywhere.

At NVIDIA GTC, hosts Patrick Moorhead and Daniel Newman sit down with Vlad Rozanovich of Lenovo and Jon Alexander of Akamai to explore how distributed infrastructure is enabling enterprise AI inferencing from the core to the edge.

The conversation unpacks how enterprises are shifting from centralized AI architectures to highly distributed environments where performance, consistency, and security must hold across locations. Through Lenovo’s collaboration with Akamai, AI workloads are being deployed on infrastructure that spans data centers to edge locations, redefining how organizations think about cloud, latency, and execution. As new performance metrics like time-to-first-token gain importance, the discussion highlights how infrastructure decisions directly impact real-world AI outcomes.

Key Takeaways

🔹 AI deployment is shifting from centralized models to distributed inferencing environments
🔹 Time-to-first-token is emerging as a critical performance metric for AI workloads
🔹 Unified infrastructure is key to maintaining consistency across core and edge
🔹 Lenovo and Akamai are redefining what “cloud” means in the AI era
🔹 Edge AI is enabling faster, more secure, real-time decision-making

As AI scales, the ability to execute models consistently across distributed environments will define enterprise success.

Watch the full conversation at sixfivemedia.com and subscribe to our YouTube channel for more insights from NVIDIA GTC 2026.


Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript

Vlad Rozanovich:
So when you look at environments of what Akamai is trying to do to create that kind of agentic edge, it's: how do you make sure the performance, the latency, the sovereignty, the data management portion of that holds? It is so critical, because we're starting to see now that enterprises are actually valuing this performance level.

Patrick Moorhead: 

The Six Five is On The Road here at NVIDIA GTC 2026 in San Jose. We are in the Lenovo booth here. Daniel, the show has been pretty much everything we expected. It's AI on all the time. Sure, we've moved from LLMs to agentic, but now we're all about making it a reality.

Daniel Newman: 

Well, yeah, I mean, as we ushered in, in a practical sense, the Blackwell era, Office 4.6, GPT 5.4, your favorite for Black City computer. You're building right now, aren't you?

Patrick Moorhead: 

I would if I could.

Daniel Newman: 

So we ushered that in, but then we came here, and now we get to hear about the Rubin era. And I'm pretty sure we were told that all that Blackwell now needs to be upgraded. So it's been fun to have some of these conversations with all those people that are going to be doing the upgrading.

Patrick Moorhead: 

But then out of the other side of the mouth, we're talking about that Pascal is still over capacity and prices are going up.

Daniel Newman: Ampere?

Patrick Moorhead: 

Yeah.

Daniel Newman: 

A100?

Patrick Moorhead:

Well, A100 and then Pascal. Crazy. I know it is. Hey, one of the things that has been a key thread here in the conversations we've been having, particularly in this booth, is there's one thing to say, hey, we're architecting it, here's the paper on it. And the other thing is actually going off and scaling AI. Yeah. One thing we also talk a lot about, and it's easy for us to talk about as analysts, is, oh, look at the edge, look at the edge. Well, there's a lot of different definitions of the edge depending on who you talk to, but one for certain that we can all agree on is CDNs, right? If you've ever pulled in any content anywhere, and I know my website goes through a CDN, they have become pretty much the fabric sitting on the edge of fast response. And I can't imagine two better people to talk through that than Vlad. Vlad, great to see you.

Vlad Rozanovich: 

You too, Dan. I feel like I'm a regular now with you guys.

Patrick Moorhead: 

You're the third host. But the guest of the moment here is Jon from Akamai. Thank you.

Jon Alexander: 

Great to see you. No, it's a pleasure to be here. Thanks for the invitation.

Patrick Moorhead: 

How many GTCs has this been for you?

Jon Alexander: 

This is my first. Quite an experience.

Patrick Moorhead: 

What?

Jon Alexander:

 It's awesome. Yeah. No, this is all new for me.

Patrick Moorhead:

Yeah, yeah. My first one was in 2011. Wow. But I was part of the chip business, so of course I would be here. He was only in his late 50s at that point.

Daniel Newman: 

So I look good. I'm Brian Johnson, actually. So nice to meet you. He's doing great. Jon, Akamai is in a pivot, right? I mean, there was a time when Akamai and CDN were synonymous, but now you're this inference edge cloud. You've got this story, and clearly it's blowing up. So we talk about AI inference anywhere and everywhere, you know. How does that proliferate in your world? How is AI decision-making at the edge helping time to value? And what else is really driving that?

Jon Alexander:

 And by the way, that pivot for your business. Long question. A lot of parts.

Daniel Newman: 

A lot of parts. That's so Daniel.

Jon Alexander: 

I'll start with part one. Take them in order. Important transition for us. Akamai started off as a CDN business, built out a very large distributed edge platform. We layered security on top of that and built a very large, successful security business. And now we see our customers coming to us and saying they're making a pivot into AI, and they want us to help support them. So a lot of this is customer-driven for us. And we see that the industry is at this interesting pivot now where the foundational models have been trained, they exist. Services like ChatGPT obviously drove huge adoption; Claude Code has been immensely successful. And now what we're seeing is our typical enterprise customer, the typical enterprise that works with Akamai for CDN and security, adopting AI in their public-facing services. They've been using it internally to optimize things like their coding assistants, maybe enhance some of their business processes. Now they're bringing it into their public-facing services, and that's where Akamai can really help them. And so we view our place in the ecosystem as a place to distribute that inference, bring it closer to end users where it needs to run. It needs to be performant, low latency, real time. That's what end users expect. And not every AI use case needs that, so maybe we should be clear: we're focused on the use cases that need to be highly distributed. That's where we've got our strength, and that's what we're focused on. So as we build out the next wave of our infrastructure, bringing GPUs onto the Akamai platform, it's enabling the distributed workloads.

Patrick Moorhead:

Yeah, I've been following the edge. I think Daniel said I started working in the 1950s with mainframes. I was mid-career. OK. But then the edge was 3270 tubes going directly into a cluster controller into a mainframe, right? Thankfully, we've evolved through client-server, web, and cloud environments to distribution where it makes sense. And one of the things that keeps the edge from getting even more bifurcated is you have to have a management and execution layer where there's consistency. What do you want this edge node to do versus the mothership, versus the endpoint, and how does that all sync? So, Vlad, this question's for you. It's a 10-parter. No, seriously, how are you together creating a unified execution environment where this node on the edge knows what it needs to do, and this one in either the hyperscaler, the enterprise, or the sovereign cloud knows what to do?

Vlad Rozanovich: 

Well, Pat, one of the things, you know, we're here at GTC, and a lot of what you hear about is these vertically integrated large language model, frontier model, GPU compute systems. The edge is not that. The edge is all about exactly what Jon just said. How do you look at low latency? How do you look at potential sovereignty of that data? And if you look at an environment like Akamai, which is what, probably over, you know, 4,000 or so... That's right, 4,400 now. See, 4,400 data centers. Think about that concept. And some of those data centers are probably in colos, some may be in enterprises, some may be at Akamai-owned facilities. And you're going to have situations where maybe your power requirements are going to be different for some. Maybe you're going to have different requirements from a global services perspective. And so from a Lenovo perspective, these are the type of customers we love. Because the global Lenovo network, operating in 180 countries worldwide, having manufacturing sites in 30 locations around the world, and having service locations across all of these countries, gives us an opportunity to service customers like Akamai, where we understand what their edge requirements are. And now when you start looking at things like NVIDIA RTX 6000s, we're making sure that we're creating and building the most optimized server for Akamai, so that they can actually go roll out that server in the most efficient, most power-effective, and most performant way that services their customers. Makes sense.

Daniel Newman: 

So Vlad, I want to rehash something we've talked about before because not everybody, sadly, not everybody watches every one of our shows. They should though. They should though. I have. They really should. And you have. Yes, you have. I do. Thank you. I comment a lot. Thank you. We said that at the exact same time.

Patrick Moorhead: 

Some people think we're the same person. So, I get that.

Daniel Newman: 

AI performance, meanwhile. Historically measured in flops, right? And now we're seeing, you know, remember we went from flops to tops. Yes. Now it's flops to tops to performance per watt, per dollar, per time to first token. We've talked about time to first token. But I'd love for you to kind of reiterate this, because it seems like this is the new metric everybody is chasing. Why is that becoming the benchmark that everyone cares about? And what is your view on what infrastructure needs to deliver that?

Vlad Rozanovich: 

Well, and this is interesting. When you start looking at that large language space, time to first token is paramount, because you are living within a certain power envelope and asking how much performance you can get out of it. However, when you start looking at the edge, it's really performance per use case. It's not about time to first token, because it's not about creating that token; it is performance to use case. And so when you look at environments of what Akamai is trying to do to create that kind of agentic edge, it's: how do you make sure the performance, the latency, the sovereignty, the data management portion of that holds? The performance level is important because we're starting to see now that enterprises are actually valuing this performance level. If Akamai's customers have some type of an agentic model or some kind of a simulation, and the latency is too long, you know what? The enterprise customers are going to get disappointed. And so making sure you have that performance level at the edge is so critical. And it's important for us to make sure that we're working closely with companies like NVIDIA to say, here are the power level requirements that we think we can get to, to have the most optimal performance machine at that edge. Yeah.
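Since time-to-first-token comes up repeatedly in this conversation, here is a minimal sketch (not from the discussion) of how TTFT is typically measured against a streaming inference endpoint. The `stream_tokens` generator below is a hypothetical stand-in for a real model server's token stream; in production you would time the first chunk returned by your inference API instead.

```python
import time

def stream_tokens():
    # Stub generator standing in for a streaming inference API;
    # a real deployment would yield tokens from the model server.
    for tok in ["Hello", ",", " world"]:
        time.sleep(0.01)  # simulated per-token generation latency
        yield tok

def time_to_first_token(stream):
    """Return (ttft_seconds, tokens): latency until the first token
    arrives, plus the full list of streamed tokens."""
    start = time.perf_counter()
    ttft = None
    tokens = []
    for tok in stream:
        if ttft is None:
            # First token has arrived: record elapsed wall-clock time.
            ttft = time.perf_counter() - start
        tokens.append(tok)
    return ttft, tokens

ttft, tokens = time_to_first_token(stream_tokens())
print(f"TTFT: {ttft * 1000:.1f} ms over {len(tokens)} tokens")
```

The same pattern applies to any chunked HTTP or gRPC response: start the clock at request time and stop it on the first received chunk, separately from total generation time.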

Patrick Moorhead: 

So whether we're talking to CIOs or people who are in infrastructure, we talk about scaling AI. It's classic, right? You do experiments. You do POCs. Then you scale it. Then you rinse and repeat about 5,000 times for what you do. And hopefully, you end up where you want to be. The two of you are at both of those levels. The two of you helped each other scale your AI, particularly at Akamai. And I'm curious, can you talk a little bit about how that came to be, how the two of you worked together to scale, and for bonus points, how you're doing that to help your ultimate end customers?

Jon Alexander: 

Yeah, absolutely.

Daniel Newman: 

I'll begin and then… The seven-part question. You're like trying to break… Because I asked the first one, I'm like, this is long. And ever since then, every question you've asked has gotten longer. Dan, I just want to be like you.

Patrick Moorhead: 

So, fair.

Jon Alexander: 

So, first of all, we've had a very long relationship with Lenovo. And so that's the first thing. We build very deep relationships. We co-design from the ground up. Vlad talked about the power as one example. There's a lot that goes into this. But certainly with the GPUs, they consume a lot of power. And so cooling those servers, making the most effective use of the data center space, and obviously making effective use of power is incredibly important to us. And this is something that Lenovo did amazingly well for us: an incredibly efficient, dense server that we were able to cool, and it's an air-cooled chassis as well. So we're kind of trying to ride that curve in the moment, where a lot of our facilities aren't ready for liquid cooling. You mentioned we're in 4,400 locations globally. That ranges from tens of kilowatts up to tens of megawatts. Right. And so obviously in the larger, more modern facilities we've got liquid cooling available, but we're in a lot of telco networks around the world, deployed inside the ISP last-mile network. And so air cooling with relatively low power density is all that's available. So that was one thing. Supply chain, time to market. I mean, this industry, when you look around, everything going on here is moving incredibly fast, even if you go back six months. When we announced the Akamai Inference Cloud at GTC DC, the first time we really made a commitment to launch our AI product lines, that was six short months ago. And so we called Vlad up shortly after and said, hey, can you ship us a huge number of servers, and we need them on site quickly. And that's where Lenovo was able to deliver. They own so much of the supply chain, the distribution centers, and we've got our teams really well coordinated to make sure that we're able to deliver to customers quickly. So I think that's the final piece: how do we translate this into meeting customer needs?
Customers can't tell us where they're going to be in three months' time. And so the requirements, people are knocking on the door every day saying, hey, I need this over here, and someone else needs something else and something else. And so we need to be able to respond very, very quickly. Lenovo's been a great partner and has been able to adapt to those fast-changing requirements.

Vlad Rozanovich: 

Makes sense. I think one of the other things we saw when we first started working together is that Akamai has such a requirement on security. Jon mentioned it early on: when they started doing CDNs and then layering security on top of that, the process that Lenovo and Akamai went through on the security side was years, years in the making. I mean, really getting down into: let's make sure that we look at firmware requirements from a security perspective. Let's make sure we are operating in a way that recognizes that Akamai's customers' privacy and security is probably the most important part of their business. And it's one of the reasons they're industry leaders at it. And so that process alone, of making sure that we understand and Akamai understands our firmware, our stack, all of the inner workings of the server, not just the hardware and the physical cooling and the power, but actually understanding the firmware level to ensure that this is going to be the most secure server that we can deliver, that was an important part of the conversation as well.

Daniel Newman: 

So if Akamai is becoming this next-generation distributed cloud, maybe the world's most distributed cloud... I think you do. Yeah, I think so. And you're powering that infrastructure. Is there a new term? Do we need to change? Is there a new definition of what cloud is in this AI era, Vlad?

Vlad Rozanovich: 

Boy, I think it's a combination of the agentic secure edge, right? Because that's really what we're trying to accomplish. Ace. Ace. There we go. There's only two hard problems.

Patrick Moorhead: 

You heard it here first, folks.

Jon Alexander: 

Naming and caching. But naming's number one. I'll let the marketing team come up with the names.

Vlad Rozanovich: 

Yeah, see, I'm not the marketing guy.

Daniel Newman: 

Well, I figured it was fair to ask you a tough question. I think this CMO guy is out there somewhere in the crowd. Probably is. He's probably judging or wondering if he's safe.

Vlad Rozanovich: 

I'm surprised he hasn't texted me. He may have.

Daniel Newman: 

You turned your notifications off. Well, Vlad, I appreciate you and Jon taking the time to chat with us a little bit. Of course. I hope you both have a wonderful GTC. And Vlad, third chair. Third. I love it.

Vlad Rozanovich: 

I'm going to be a co-host. You are. The tri force. Now wait a second.

Patrick Moorhead: 

By the way, trust me, it doesn't take a lot of skill to do what we do.

Vlad Rozanovich:

 You guys do a great job. And bringing stories like what Akamai is doing on the cloud and on the edge, bringing that to your viewers is really awesome. It is great.

Daniel Newman: 

Well, congratulations on all the progress. Have a great rest of your show. And of course, all of you out there, have a great show as well. We appreciate you being part of this Six Five. We are on the road here at GTC 2026 in San Jose. Stay with us for all our coverage here on The Six Five. Check out all the other great conversations we had right here in this booth and of course across GTC. But we gotta go for now. We'll see you all later.
