Emerging Trends & the Future of Tech by Sharbani Roy

Sharbani Roy
Vice President of AI Services

Automatic Summary

The Future of AI: Insights from Arm's Sharbani Roy

Welcome to our exploration of the evolving landscape of artificial intelligence (AI), as shared by Sharbani Roy from Arm. With decades of experience spanning Amazon, Google, and now Arm, Sharbani emphasizes not only the technological advancements in AI but also the importance of human-centric solutions. This article provides an overview of the key concepts, trends, and foundational elements driving the future of AI.

Understanding the Evolution of AI

The journey of AI began with rigid, rule-based systems where developers had to program every possible scenario. The landscape dramatically changed with the advent of machine learning (ML), allowing systems to learn from data and identify patterns autonomously. This transition has revolutionized AI capabilities in areas like:

  • Natural language understanding
  • Image recognition
  • Complex game strategies (like Chess and Go)

Furthermore, advancements in deep learning and neural networks have mimicked human brain structures, enabling breakthroughs in image and speech recognition. We are now in the era of generative AI, where AI systems learn from unstructured data to create new content across modalities such as text, images, and music.

Key Areas Driving AI Innovation

As we delve deeper into this technological transformation, four critical areas emerge as driving forces:

  1. Compute: The evolution of compute paradigms to support increasingly complex models.
  2. Data: The significance of quality data in training models effectively.
  3. Models: The innovation in model architectures, including small language models and multimodal systems.
  4. Applications: The practical use of AI across diverse sectors and devices.

The Role of Compute and Data in AI

To build efficient models, especially in generative AI, we require not just raw data but also superior training methodologies. The convergence of multiple types of data (text, images, and code) enhances the richness of AI responses. When examining compute capabilities, it is essential to consider:

  • Programmability: Modern compute must be adaptable and scalable.
  • Optimization: From silicon to software stacks, everything must be finely tuned.
  • Real-time inference: Enabling quicker responses and smarter device interactions.

Applications of AI: Moving Towards the User

The evolution of applications is steering AI from centralized and cloud-heavy structures to more decentralized, user-focused solutions. Today, applications can be built and deployed with seamless efficiency:

  • AI can be accessed directly within browsers, enabling real-time collaboration.
  • Autonomous systems or agents require minimal human input to complete tasks.
  • Applications are expected to operate intelligently, whether on smartphones, embedded systems, or cloud environments.

For instance, Arm's recent Accuracy Super Resolution (Arm ASR) technology boosts graphics performance while conserving battery life, demonstrating how AI can enhance user experiences even on mobile devices.

The Shift to Device-Native ML

As the focus shifts towards device-native machine learning, developers are moving away from oversized models. Instead, there's a growing trend of utilizing smaller, more efficient models that can run effectively on consumer hardware. An example of this innovation is:

  • Stable Audio Open Small: A generative audio model designed to operate efficiently on devices with limited resources.

This paradigm shift emphasizes the need to rethink scale not as a constraint, but as an innovative opportunity, leading developers to create solutions that run seamlessly anywhere.
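To make the idea of shrinking models more concrete, here is a minimal, self-contained sketch of symmetric INT8 weight quantization, the general idea behind the INT8/FP16 mix used by models like Stable Audio Open Small. The production technique is far more sophisticated; the helper functions and the weight values below are purely illustrative.

```python
# Illustrative sketch: symmetric INT8 quantization maps float weights
# onto the signed 8-bit range, shrinking storage roughly 4x versus FP32
# at the cost of a small, bounded rounding error.

def quantize_int8(weights):
    """Map float weights onto [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.03, 0.5]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

print(q)  # [82, -127, 3, 50]
# The reconstruction error is bounded by half a quantization step.
print(max(abs(a - b) for a, b in zip(weights, approx)))
```

Per-tensor symmetric scaling like this is the simplest scheme; real deployments typically use per-channel scales and keep error-sensitive layers in FP16, which is the kind of mixed-precision trade-off the talk alludes to.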

Building for the Future: Inclusive and Human-Centric AI

As we conclude our discussion, it's essential to consider the implications of these insights. The question posed by Sharbani remains crucial:

What big problems are you passionate about solving?

Whether you are developing the next generation of enterprise agents, mobile applications, or creative AI tools, the foundations laid in compute, modeling, and user-centric design are here to guide your efforts. A few final thoughts:

  • Prioritize accessibility and context in AI applications.
  • Develop systems that are inclusive and deeply human-centric.
  • Embrace the challenge of building scalable solutions that can adapt to diverse user needs.

Let's build a future where technology works for everyone, everywhere.


Video Transcription

Thank you so much. Hey, everybody. I'm Sharbani, and I work on AI and developer platforms at Arm. Over the last few decades, I've evolved my career to focus on the intersection of systems, platforms, and people, from building cutting-edge consumer products to developer ecosystems to scaling state-of-the-art AI for real-world impact at companies like Amazon, Google, and now Arm.

So what's inspired me along the way is the opportunity to build not just technology, but momentum for others, creating tools, pathways, and platforms that enable more people to thrive in tech. So over the next fifteen to twenty minutes, I'm gonna dig into a quick overview of the technical foundations of AI, a little bit of history and the key areas behind these breakthroughs, some insights into a few trends in AI and their impact on the tech landscape that might be relevant for you, and some of the emerging tech in key breakthrough areas.

But before we get started, I wanna ask y'all something. What are the real needs for real people that you're passionate about solving? Each of you is taking the time to listen to me today and all of the other amazing speakers because you've probably got some problems top of mind that you're really, really wrestling with. So we can all be on top of those latest trends, really deep in the tech, but it's also important to always root ourselves back into the big problems that we're working on. So I encourage you to hold this top of mind as I take us through a few different areas. So, quick overview: evolution of AI. Let's ground this conversation in how we got here. In the early days, AI systems were entirely rule-based. Developers had to explicitly program every possible scenario: if this, then that. That made them extremely rigid and limited in handling real-world unpredictability.

But machine learning algorithms enable AI systems to learn from data, identifying patterns and making predictions without the need for explicit programming. This shift has revolutionized AI's capabilities, allowing it to excel in tasks like natural language understanding, image recognition, and even playing complex games like chess and Go. A few decades ago, there was a notable shift toward machine learning, which significantly enhanced the flexibility and applications of AI. So at the forefront of this transformation are deep learning and neural networks. Deep learning is a subset of ML and involves neural networks with multiple layers mimicking the structure of a human brain. And this has led to remarkable achievements, particularly in fields like image and speech recognition.

So neural networks can now identify objects in images with exceptional accuracy, paving the way for applications such as autonomous vehicles, facial recognition systems, and medical imaging analysis. And now we've entered the age of generative AI, where systems can learn from unstructured data of all modalities and generate new content across multiple modalities such as images, text, and even music, and begin chaining these together to better understand the context and actions we wanna take.

This represents a new frontier in AI development, promising even greater capabilities and possibilities for the future. Seems pretty simple, right? So when I think about all of the future technology, especially in AI, it's critical to understand the underlying trends that have converged to drive today's innovation. These aren't isolated breakthroughs. They're compounding forces that set the stage for real inflection points. So I think about four key underlying areas: compute, data, models, and applications. So, spoiler alert, I'm not gonna go left to right here. We've already briefly touched on some of the model evolution, so feel free to pause and rewind if you missed that section. But what's really exciting today in modeling is how the compute paradigm is evolving to keep pace with increasingly complex AI models.

So beyond LLMs, we're seeing a ton of innovation in small language models, or SLMs, multimodal models, and everybody's favorite, agentic frameworks. And all of these are pushing the boundaries of what models can do. They require energy-aware compute, full-stack optimization, and even new toolchains designed for things like autonomous agents. We're also seeing a blending of engineering and science workflows, fundamentally changing what the jobs to be done are, what the workflows to be done are, and what tools are to be used. Separate disciplines, engineering and science, are now intersecting at the model level. So the next item I wanna go into is data. You can't have great models without great data, and that includes not just the raw data, but how you're training the models, providing top-quality evals, or evaluations, and helping the models learn from the best data at the right time.

So this is a big trend behind generative AI, because for meaningful applications further up the stack, we need multimodal data, which means a wide range of inputs and outputs: images, text, code, and even video, sometimes mixing it all together. This allows AI systems to engage with information more holistically, producing richer, more context-aware responses, leading to new possibilities in various industries. And then we're gonna go to compute. So you wanna do all of the things that I just talked about, but you wanna do it really fast and on really complicated data. You need a lot of power and compute. And it's really more like a black box to many software developers and even scientists applying the models, but that's kind of by design. It's not just about horsepower. Modern compute is incredibly complex, from heterogeneous silicon to multilayered software stacks.

To support the scale of today's models and the agentic workflows of tomorrow, compute has to be programmable, scalable, and optimized from silicon all the way up. And that's what enables real-time inference, lower latency, and smarter on-device AI, which is what feeds meaningful applications. So put simply, I think about applications as how developers and researchers and end customers can practically use this technology. Major improvements in accessibility have come from progress in infrastructure, compute, and especially user experience, which is fundamentally changing, and we're gonna dig into that a little bit later. Now, I mean, I remember doing research twenty, twenty-five years ago, and often you'd need remote access to a supercomputer located in another state just to run some very basic models. But today, applications are simplifying AI development and deployment, making it easier for individuals and businesses of all sizes to leverage AI capabilities.

For example, today, I can spin up a model in my browser, collaborate in real time, and deploy to production, sometimes all within a single IDE, or integrated development environment. Even more impressive, I can sometimes just talk to that environment. I mean, think about how many of us are just chatting directly with our favorite large model. And that's how we've come so far with natural interfaces and agent-powered workflows. This is what's lowering the barrier to AI-native development: tools and platforms that simplify the stack, reduce friction, and enable everyone from startups to enterprises to build with AI. Now, you've all probably been talking about AI agents, which is super exciting. And these are autonomous systems that can complete tasks, make decisions, even collaborate with other agents, all with minimal human input, ideally.

These are becoming the building blocks for the next generation of applications, and that's the trajectory we're on. It's what's really enabling this next wave of AI-native applications. So let's dig a little deeper into some customer applications. Whether in the cloud or at the edge, end consumers want their applications to work wherever they are. Think about your favorite application you've just been using. It's probably on your smartphone, maybe one of your connected devices. But every time you're using it, you probably expect it to be smarter and smarter. Oh, and probably a lot faster, real time or near real time. So maybe, when you're thinking about your favorite applications, you're thinking about advanced graphics and rich real-time experiences. Maybe you've been watching a movie, or maybe you've been playing some of your favorite mobile games.

Your phones are smaller than, you know, a big data center that you might access in the cloud, and you need to be shrinking models or leveraging key tech like graphics upscaling with AI to have optimal performance on your device. And this is one of the big trends: how can you make more things operate on your device? So for example, at Arm, we recently released Arm Accuracy Super Resolution, which boosts frames per second, or FPS, by up to 30% while maintaining visuals using less battery. We do that by rendering certain stages of a frame at a lower resolution and subsequently upscaling them. So Arm ASR reduces your GPU workload, which is one form of compute, and power consumption, while enabling complex graphics and lighting effects on mobile devices. So how are some of these trends showing up in other real applications?
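As a toy illustration of the render-low-then-upscale idea: Arm ASR itself is a far more advanced temporal upscaler, and the `shade` function and nearest-neighbour filter below are stand-ins of my own, but the sketch shows why shading fewer pixels saves GPU work while still producing a full-resolution output.

```python
# Toy sketch: render a frame at half resolution, then upscale it.
# Shading cost scales with pixel count, so a 4x4 render does roughly
# a quarter of the work of an 8x8 render.

def shade(x, y):
    """Stand-in for an expensive per-pixel shading computation."""
    return (x * 31 + y * 17) % 256

def render(width, height):
    """Shade every pixel of a width x height frame."""
    return [[shade(x, y) for x in range(width)] for y in range(height)]

def upscale_nearest(img, factor):
    """Nearest-neighbour upscale: each low-res pixel fills a block."""
    return [
        [img[y // factor][x // factor]
         for x in range(len(img[0]) * factor)]
        for y in range(len(img) * factor)
    ]

full = render(8, 8)            # 64 shading calls
low = render(4, 4)             # 16 shading calls, about 4x less work
up = upscale_nearest(low, 2)   # back to an 8x8 output image

print(len(up), len(up[0]))     # 8 8
```

Real upscalers recover the lost detail by reusing information from previous frames and motion vectors instead of simply replicating pixels, which is what lets them hold visual quality at the lower shading cost.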

We're seeing advancements in graphics, shrinking models in generative AI, and all of it leading to a robust potential future for agentic AI. Things that are emerging on the scene that are probably on the tip of your tongue are how models hold more and more advanced levels of context, such as with Model Context Protocol, or MCP, when interacting with LLMs, and the future of agentic support and software ecosystems that will be able to take care of increasingly complex tasks.

In productivity and communications, which we're all using every day, real-time agents are running inference on phones and laptops and not just responding, but actually planning. And this is really the shift to consider in applications: from AI as an API call to AI as a runtime environment, one that adapts to context, device, and interaction. So what about all of these underlying areas that we just talked about and some of these application examples do we really need to consider? What data do we need to make these a reality? How do our model strategies need to react? And what are the underlying compute trends that will power all of this? So when I think about models, we can talk a little bit about that. We mentioned that applications seem to be smarter and faster, but also available wherever consumers are, especially on their device, in their pockets, in their hands.

For a long time, we were chasing bigger models, hundreds of billions of parameters, and we still have huge models coming out that are revolutionizing the industry. But in many use cases, especially for a much larger majority of developers, especially on device, that's not what we need. We see a big trend moving towards small language models. So an example that we recently have been working on is in collaboration with Stability AI, where we helped them launch Stable Audio Open Small, which is a fully open source generative audio model designed to run efficiently on consumer hardware. It uses a combination of INT8 and FP16 quantization, which is basically a strategy for how you can make models smaller and run in compute-constrained environments. And it can generate ten seconds of stereo audio in about seven seconds. And it runs entirely on device with less than three gigabytes of RAM and no Internet connection.

So imagine: you can be generating music, loops, textures, and effects. And it's built with deployment in mind, not just research. It's showing us what is possible when we rethink scale as a constraint to innovate around, not a target to chase. And that's not just a performance stat or a cool technique. This is a shift in philosophy. Many developers are moving from centralized, compute-heavy systems to lightweight, creative, accessible models that can run anywhere. And data. So going back to data, it's so important. This is really the key to everything. Good data in, great model out. But how we handle it is kinda changing. So sometimes, you don't have access to the data you want because it's on device.

So going back to the device side, we're saying that sometimes, you know, the data needs to stay on device for a number of reasons. So why? Why would we think that would happen? Well, there's a number of reasons. We talked about lower latency when you want a true real-time experience. Like, we're on this video call, and I want certain things to be happening in real time, or if I'm taking a video or photos. There's a bunch of stuff that I might wanna stay on my device. I'm sure if y'all pause for a moment, you can probably think of a few examples for yourself. There's a lot around improved privacy as well. And also, not having to have a dependency on the cloud. Think about when you lost your connectivity, like when you're on a long commute, on the subway, you get stuck in a tunnel, or you're on a plane.

So here at Arm, we sometimes like to do things behind the scenes to make things just kinda work better in the apps for you further up the stack. So take, for example, KleidiAI, which is a core piece of enabling AI to run efficiently at the edge without depending on cloud compute or specialized accelerators. It brings serious AI capabilities to the most pervasive compute platform in the world, the Arm CPU. So, quick stat: about 99% of all smartphones are powered by Arm. And we expect that close to 50% of all new server chips shipped to top hyperscalers in 2025 will be Arm-based. It's pretty good reach. So our Kleidi software is designed to optimize AI model execution on those Arm CPUs, again, which are super pervasive, especially for on-device, low-latency inference in mobile, embedded, and edge computing environments.

It's optimized with your favorite ML frameworks like PyTorch, ONNX Runtime, LiteRT, which used to be TF Lite, or MediaPipe, which run directly on Arm CPUs. And so that means that the data never has to leave the device, and inference can happen in milliseconds even on low-power hardware. This opens up AI to more contexts to help you do everyday things that you need around productivity: capturing memories, instantly improving the photos and videos you take. But you can also think about the breadth of applications that this opens up, like rural health, personal journaling, personal medical tracking, and even offline AI agents. We're also seeing new data pipelines that will support continuous learning and evaluation, so not just batch training. And this will become essential for agentic workflows that adapt in the real world. Now, compute. So compute used to mean buy a bigger chip.

Now it means optimize the entire stack. And this shift from cloud to edge is pushing us to rethink: how do we use CPUs, GPUs, and NPUs together? How do we design runtimes that are lightweight but flexible? And how do we scale down models without sacrificing usefulness? So at Arm, we're focused on device-native ML as much as possible. That means quantized models, modular runtimes, instruction tuning, and a developer stack that abstracts the hardware but still unlocks its performance. We're working across the ecosystem to make this real, be it in data centers or on your favorite connected devices or even in your car. We support your favorite technology, all working on agentic, multimodal, and fast systems on top of optimized compute stacks. Arm powers AI everywhere.

The Arm AI compute platform is a scalable, power-efficient, and high-performance computing foundation designed to enable AI everywhere, across all those devices that we talked about. And it combines Arm CPUs, GPUs, NPUs, and optimized software to accelerate AI workloads across diverse applications. And it isn't about exotic hardware. It's about making everyday chips more useful, more intelligent, and leveraging software to unlock the power of hardware. So, pulling this all back together, what do we do with all of this? We build for scale, for accessibility, and for real-world context. Go back to that question I asked you at the very beginning: what is the big problem that you're trying to solve? Whether you're obsessing over building the next enterprise agent, a mobile app that works offline, or a creative tool that writes music, the foundations are here.

Compute that scales, tooling that abstracts complexity, models that adapt, and data that lives close to the user, to the consumers. But most importantly, let's remember who we're building for. Build systems that are technically excellent, yes, but also context-aware, inclusive, and deeply human. Let's go build the future together for everyone, everywhere.