Accelerating value delivery by empowering autonomous teams

Jennifer McVicker
Senior Enterprise Technical Architect
Automatic Summary

Accelerating Time to Market with DevOps Culture Strategies and Tools

Hello everyone, I am Jen McVicker, a senior enterprise technical architect at Atlassian with over 25 years of experience in the IT industry. In this blog post, I will share how adopting DevOps culture strategies and tools can significantly reduce time to market and boost developer productivity and satisfaction.

Understanding the DevOps culture

In the realm of software engineering, there has traditionally been a see-saw struggle between the frequency of deployments (the development side) and site reliability (the operations side). Merging these two seemingly diametrically opposed groups into a cohesive unit with a shared set of goals was proposed back in 2009 by Patrick Debois, who coined the term DevOps.

The importance of high-performing autonomous teams

Autonomous teams are cross-functional groups that can execute swiftly because they do not have to wait on other teams. These teams are independent, value-aligned, and empowered, which reduces dependencies and speeds up problem resolution. Such teams need a deep level of trust to function cohesively and produce substantial dividends.

Adopting an outcome over output approach

Rather than focusing solely on output, rewarding outcomes becomes a significant difference-maker. People are incentivized to deliver what gets measured because they are held accountable for those metrics. Therefore, it's crucial to measure the right things.

Overcoming the continuous delivery vs continuous deployment confusion

While continuous delivery and continuous deployment might appear to be similar concepts, they are distinct. Continuous delivery means that software changes can be released on demand, quickly yet safely. Continuous deployment extends this principle and automates the release to production. An efficient continuous delivery process is a prerequisite for successful continuous deployment.

Implementing the right tools

Software sprawl, a common issue with loosely coupled architectures, can be managed effectively with a tool such as Atlassian's Compass, which brings clarity to your software development life cycle (SDLC). It surfaces the status and risk level of each service and documents the dependencies between components.
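
To make that concrete, a service catalog entry might capture metadata roughly like the following (an illustrative model, not Compass's actual schema):

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Illustrative catalog record for one component in the SDLC."""
    name: str
    owner_team: str
    status: str                           # e.g. "active" or "deprecated"
    risk_tier: int                        # 1 = most critical, 4 = least critical
    depends_on: list[str] = field(default_factory=list)


catalog = [
    ServiceEntry("checkout", "payments", "active", risk_tier=1,
                 depends_on=["product-catalog", "payment-gateway"]),
    ServiceEntry("search", "discovery", "active", risk_tier=2,
                 depends_on=["product-catalog"]),
]

# Which services are affected if the product catalog changes?
print([s.name for s in catalog if "product-catalog" in s.depends_on])
```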

Conquering the challenge with DevOps

By integrating loosely coupled architectures with continuous delivery and documenting associated risks and dependencies, you can pave the way toward increased deployment frequency. Empowering autonomous teams to focus on outcomes rather than output also results in faster feedback loops, ultimately boosting productivity and customer satisfaction.

Key DevOps Takeaway

The most crucial takeaway from this blog post is that you get what you measure, so ensure you are measuring the right things. Delving deeper into DevOps requires exploration beyond this post, so feel free to get in touch if you wish to empower your teams with mature DevOps tools and processes.

Your time is greatly appreciated. I am absolutely thrilled to have had this chance to share my experiences with all of you through this post.


Video Transcription

Hi, everyone. Thanks for joining me here today. I'm Jen McVicker, and I'm a senior enterprise technical architect at Atlassian. I've been in IT for over 25 years and I've done everything from hands-on development to program management and agile coaching, but I've only got 20 minutes here.

So I'm gonna jump right in. Today I'd like to share how you can reduce time to market and increase developer satisfaction and productivity by adopting DevOps culture strategies and tools. So let's jump right in with DevOps culture. In the world of software engineering, there has historically been constant tension between deployment frequency on the development side and site reliability on the operations side. When you have separate operations and development teams, this can result in a tug of war. The operations team is trying to limit deployments, because deployments introduce change, which always carries some risk of failure. But on the other hand, product development teams are trying to deliver value as quickly as possible to start capitalizing on improvements. So how can we resolve this struggle?

Back in the late 2000s, the software development and operations communities started raising concerns about this tug of war. They realized that these apparently diametrically opposed groups could work much more effectively if they worked together as a cohesive unit with a shared set of goals.

The term DevOps was coined by Patrick Debois in 2009 to represent this new way of working. But it's not as simple as just mashing together two groups and telling them they're now a team. You've heard of forming, storming, norming, and performing, right? It's called Tuckman's model.

It describes the life cycle of a team. When a new team is created, they don't immediately begin working together as a cohesive unit; they don't know each other well enough yet to have developed trust. This is when it's important to develop working agreements, because sooner or later conflict starts to arise within the group. And this is actually a good thing. It means that your team is starting to trust each other. When disagreements and conflict crop up, it's perfectly normal.

As long as you have team agreements in place to ensure that everyone treats each other with respect and that each person's opinion is considered valuable, you can harness that conflict to drive the collective team to deliver better outcomes. Why? Because the knowledge of the collective group is greater than that of any one person. When the group starts to really coalesce as a team, they'll begin sparring on problems and coming up with solutions that take into account the diversity of experience among all team members. But teams need stability to develop this level of trust. That means not swapping team members in or out very often, because every time the team changes, they're going to regress for a little bit. But if you're lucky enough to have a long-lived, stable, autonomous team that has developed close bonds, they'll get to that final stage: performing. This is where you start to see real dividends pay off. I snuck something in there. Did you notice? Autonomous. What does that word mean?

In terms of software development, autonomous teams are independent. An autonomous team is made up of a cross-functional group of people who can move quickly because they don't have to wait on other teams for dependencies. Ideally, all the skills needed to perform the work are encapsulated within the team. The key is to eliminate as many dependencies on other teams as possible while keeping your team at a reasonable size, about 5 to 10 people. Now, another key element here is value alignment. By this, I mean that the team is responsible for the entire flow of a stream of value to the customer. A classic example is an e-commerce solution. It needs to include the product catalog, the mechanism for adding products to the shopping cart, and the checkout process, but it might also include things like customer tracking to gather data around customers who abandon their carts. And finally, autonomous teams are empowered: empowered to experiment, learn, and pivot as needed without a lot of bureaucracy, and empowered to decide how they will work and what tasks to prioritize.

Most of all, they're empowered to decide the best way to achieve the outcomes they've been asked to deliver. Now, this involves a lot of trust that the team will deliver on their objectives, and it sounds scary to give up control. But if you empower your teams to be autonomous, you will be astounded at what they can accomplish. There's a secret to getting the best results from an autonomous team, though, and I just mentioned it: outcomes over output. Simply put, rewarding outcomes over output means that you're measuring the right things. People are incentivized to deliver what gets measured because they're held accountable to those metrics and reviews. So make sure you're measuring the right things.

Let's take an example. You ask your team to build an avatar feature for the user profile section of your community website, because you think customers will come back to the site more frequently if they can see each other's faces, or cartoon robot faces. The team takes a few sprints to deliver the feature because they need to implement a new backend process to optimize and store the images. So you finally launch the feature, and it results in a very mild uptick in monthly active users for a few months, but then it drops back down. All the effort that went into building that feature may ultimately not produce the desired outcome. What if instead you asked the team to increase customer engagement on the website? Do you see the difference? In the first case, you're telling the team how you want them to solve the problem; in the second, you're telling the team the problem you want them to solve. How do they know how to solve the problem, though? Well, it's not easy, but it is simple: experiments. Much like with AI-generated art, you are unlikely to get the outcome you're looking for on the first try. If the past couple of decades have taught us anything, and I hope they have, it's that we have to iterate: learning to capitalize on the things that work and pivoting to something different when the feedback tells us we're going down the wrong path, until we get to the outcome we ultimately want.

This is the power of experimentation and autonomy. When the people who are closest to the work are also closest to the feedback, they can learn from that data and make the best decision about what experiment to try next, without having to get approvals and buy-in from three levels of management and a dozen other stakeholders first. By making teams accountable for outcomes instead of output, you incentivize them to quickly learn what works and what doesn't, which naturally leads to a learning-centered culture.

The faster you can get feedback from customers, the faster you can learn what works and what doesn't. So let's take a look at how to get that feedback faster. Almost a decade ago, Google decided to research why some companies have high-performing software teams and why others fall short. Google's research team, known as DORA, released the first State of DevOps report in 2014, and in it they introduced four key metrics that correlate with strong engineering performance.

Those metrics are deployment frequency, which measures how often you deploy changes to production; change lead time, which measures how long it takes from the time a developer starts writing code to the time it is released to end users; change failure rate, which measures what percentage of changes pushed to production cause a failure; and mean time to recovery, which measures how long it takes to restore service after a failure.

Together, they're known as the DORA metrics. And you'll notice that these are metrics that tie directly to strategic outcomes. Deployment frequency and change lead time point to how quickly you can run an experiment and get feedback; change failure rate and mean time to recovery tie directly to code quality and site reliability. Remember, you get what you measure, so measure the right things. By focusing on these four metrics, your organization can reduce downtime and development cost while delivering value to your customers more rapidly.
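
To make those definitions concrete, here is a minimal sketch of how the four metrics could be computed from a team's deployment records. The data model and field names are my own illustration (in practice you would pull this data from your CI/CD system and incident tracker), not an official DORA or Atlassian calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One change pushed to production (illustrative fields only)."""
    first_commit_at: datetime               # a developer starts writing the code
    deployed_at: datetime                   # the change is released to end users
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def average(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas) if deltas else timedelta()


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a reporting window.
    Assumes at least one deployment in the window."""
    failures = [d for d in deployments if d.failed]
    recovered = [d for d in failures if d.restored_at is not None]
    return {
        # How often changes reach production
        "deployment_frequency_per_day": len(deployments) / window_days,
        # Code started -> released to end users
        "change_lead_time": average([d.deployed_at - d.first_commit_at for d in deployments]),
        # Share of production changes that cause a failure
        "change_failure_rate": len(failures) / len(deployments),
        # Failure -> service restored
        "mean_time_to_recovery": average([d.restored_at - d.deployed_at for d in recovered]),
    }
```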

And while all four of these are important, I'm really just gonna focus on two of them today: deployment frequency and change failure rate. By focusing on just these two elements, you can build a robust process that mitigates risk and reduces time to market, resulting in faster feedback loops.

So let's take a look at a common bottleneck for deployment frequency that I see in a lot of organizations: the change approval board, or CAB. CABs are usually made up of leaders from across the different areas of the company's technology stack. Those CAB meetings often don't happen more than once or twice a week, because it's really expensive to have a lot of highly paid people sitting around to give the thumbs up or down on a laundry list of changes. Because of this, many changes are delayed by several days before they're actually released into production. So what's the solution? Do we eliminate the CAB altogether? Heck, no. CABs serve a very important purpose. Their primary focus is site reliability. Remember that yellow ops side of our infinite loop?

The CAB is a key player there. Not only do they review proposed deployments in order to identify potential downstream effects or dependencies, they can also ensure that resources from any affected systems, such as database or network admins, will be on call to address problems that might crop up during deployment.

And this is really critical for complex deployments that affect many areas of an application. Ideally, though, those dependencies and downstream effects are already known and communicated throughout the development process, and the people deploying the changes would have the ability to resolve any problems arising with the change, because that change would be isolated from other applications and services.

So how can we reach this ideal state? The first step is a loosely coupled architecture. Now, a loosely coupled architecture is a fancy way of saying that the software is made up of multiple independent services, sometimes called components, that interact with each other through APIs.

For instance, let's say you have an online store. You might have a service that allows people to search for products. That search service would call your product catalog service, and it does so by calling a specific URL, which is the catalog's API endpoint, passing in search criteria such as a keyword. The catalog service API then returns zero or more records that match the search criteria. Now, the search service doesn't care how the items are stored in the catalog, or how they get added, removed, or updated. All the search service cares about is that when it sends a keyword to that URL, specific values will be returned, such as the name, price, and thumbnail image of each item. This also means that the catalog service can be updated at any time without affecting the search service at all, as long as the parameters that get sent and the values returned by the API stay the same. You could add in more product information, such as customer ratings, or even swap out the database altogether for a new one; as long as those API endpoints don't change and the same values are returned, the search service will continue to function.
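
As a rough sketch of that contract (the endpoint URL, parameter name, and response fields here are made up for illustration, not a real catalog API), the search service only needs to know how to call the endpoint and what shape the response takes:

```python
import requests  # third-party HTTP client, used here just for illustration

# Hypothetical endpoint; the real catalog service would publish its own contract.
CATALOG_SEARCH_URL = "https://catalog.internal.example.com/api/v1/products"


def search_products(keyword: str) -> list[dict]:
    """Ask the catalog service for products matching a keyword.

    The search service depends only on the request parameters and the
    returned fields (name, price, thumbnail_url), not on how the catalog
    stores, adds, removes, or updates its items.
    """
    response = requests.get(CATALOG_SEARCH_URL, params={"keyword": keyword}, timeout=5)
    response.raise_for_status()
    return [
        {
            "name": item["name"],
            "price": item["price"],
            "thumbnail_url": item["thumbnail_url"],
        }
        for item in response.json()  # zero or more matching records
    ]
```

As long as the catalog keeps honoring that request-and-response shape, its internals, or even its database, can change without the search service noticing.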

So there are major benefits to a loosely coupled architecture, but there is one big drawback: it can turn into software sprawl. Software sprawl occurs when the number of applications or software components within an environment rapidly grows and changes. Sometimes this term is used in reference to the tools that your organization uses, but today I'm talking about the applications and services that your organization creates. Sprawl can make it really difficult for traditional software project management to scale, including CABs.

And it can be a nightmare for engineering teams to keep track of where dependencies exist, who owns what service, and when changes are being deployed. So how can we solve this? Well, hang tight, we're gonna get there in just a minute. But first we need to take a little detour and talk about service tiers. Let's face it: not every service is as critical as the next. Service tiers are a great way to manage risk, so I'm gonna do a quick walkthrough here of what service tiers are, so we're all on the same page. Tier one includes your most critical services. Any downtime for a tier one service is going to have a significant impact on customers or the company's bottom line. An example of a tier one service could be a login service or your credit card processor. Tier two services are still very important to the day-to-day operations. A tier two failure may cause serious degradation of the customer experience, but it doesn't completely block customers from interacting with the service. Tier two might include that search service we just talked about: customers could still add items to the cart by browsing through categories on the website, even if they can't search for specific products. Tier three services have a very minor impact on customers.

In fact, customers might not even notice, or the effects are limited to internal systems. A tier three service might be something like that avatar display we talked about earlier. And finally, we come to tier four. These services have no significant effect on customers or internal business users. A tier four service could be something like a sales report: if it fails to generate, a short-term failure isn't going to have a major impact, even though that report is important to the business; it just might take a little longer to run it.
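
As a minimal sketch of how a team might record those tiers alongside its services (the enum and the example assignments are hypothetical, not a standard), something like this is enough to drive later decisions about review and approval; the same idea comes back near the end when we talk about preapproved changes.

```python
from enum import IntEnum


class ServiceTier(IntEnum):
    """Lower number = more critical; a failure has a bigger impact."""
    TIER_1 = 1  # e.g. login, credit card processing: downtime hits customers or revenue hard
    TIER_2 = 2  # e.g. product search: serious degradation, but customers can still check out
    TIER_3 = 3  # e.g. avatar display: minor or purely internal impact
    TIER_4 = 4  # e.g. a sales report: no significant short-term effect


# Hypothetical tier assignments for the services discussed in this talk.
SERVICE_TIERS = {
    "login": ServiceTier.TIER_1,
    "credit-card-processor": ServiceTier.TIER_1,
    "search": ServiceTier.TIER_2,
    "avatar-display": ServiceTier.TIER_3,
    "sales-report": ServiceTier.TIER_4,
}
```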

So we have one more point to cover before we move on to tools, and that's the difference between continuous delivery and continuous deployment. These are related concepts that are often mistakenly used interchangeably. Continuous delivery is defined by Google as the ability to release software changes on demand quickly, safely, and sustainably. It does not mean that the code has actually been deployed to production; rather, it means that the main trunk of your repository, in your staging environment, must be ready to be deployed to a live production environment at any time. Any testing or scanning that needs to happen before code is released must be done before the code is pushed to that preproduction environment. Remember our change failure rate? This is how we move the needle on that metric: by ensuring that our code is deployment-ready and fully tested before merging it into our staging environment. Continuous deployment takes this one step further and automates the release to production.

Now, that doesn't necessarily mean that as soon as the code is released to staging, it gets deployed to production. Deployments might run on a schedule, such as at the end of each sprint or at the end of each business day. But an automated deployment is dependent on a strong continuous delivery process.
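
Here is a minimal sketch of that distinction, assuming a generic pipeline (the check and deploy steps are passed in as placeholders rather than any particular CI/CD product's API): continuous delivery means every merge produces a releasable build, while continuous deployment is just the policy that releases it automatically.

```python
from typing import Callable

# Illustrative pipeline logic only; the checks and deploy steps are supplied
# by whatever build and deployment tooling you already use.


def on_merge_to_main(
    commit: str,
    checks: list[Callable[[str], bool]],           # unit tests, security scans, integration tests...
    deploy_to_staging: Callable[[str], None],
    deploy_to_production: Callable[[str], None],
    continuous_deployment: bool,
) -> bool:
    """Continuous delivery: every change must pass all checks before it reaches
    staging, so the main trunk is releasable on demand at any time.
    Continuous deployment: the same releasable build is also pushed to
    production automatically rather than on demand or on a schedule."""
    if not all(check(commit) for check in checks):
        return False                                # trunk must stay deployable, so the change is rejected
    deploy_to_staging(commit)                       # continuous delivery stops here: ready to release
    if continuous_deployment:
        deploy_to_production(commit)                # continuous deployment releases automatically
    return True


# Example run with stand-in steps:
on_merge_to_main(
    "abc123",
    checks=[lambda c: True],                        # pretend all tests and scans pass
    deploy_to_staging=lambda c: print(f"staged {c}"),
    deploy_to_production=lambda c: print(f"released {c}"),
    continuous_deployment=False,                    # release on demand or on a schedule instead
)
```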

OK, now we can talk about tools. Remember our nemesis, software sprawl? Well, software sprawl can be tamed by adopting a tool to catalog and document metadata about a service. At Atlassian, we use Compass. This is a tool that we built specifically for the purpose of bringing clarity to a loosely coupled microservices architecture. Other companies have built similar tools, like OpsLevel or Backstage, because we're all trying to solve the same problem: a lack of visibility into the status and risk level of services and the connections between them, for instance, identifying the service tier, whether or not the service is active, and the dependencies between different components.

Now, suppose you're working on a service with dependencies on several other services. Rather than waiting to deploy all the changes at once, you can have each service deployed to production separately behind a feature flag, and then enable all the changes at once when you're ready to launch. Feature flags can minimize risk by allowing you to deploy changes to production in a dormant state and enable them later, and they give you the flexibility to quickly back out changes that aren't performing the way you expect.
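
A minimal sketch of that feature-flag pattern (the in-memory flag store here stands in for a real feature-flag service, which would let you flip flags without redeploying) looks like this:

```python
# Hypothetical in-memory flag store; a real system would use a feature-flag
# service so flags can be flipped without a redeploy.
FLAGS = {"new-checkout-flow": False}   # the code is in production, but dormant


def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)


def checkout(cart: list[str]) -> str:
    if is_enabled("new-checkout-flow"):
        return f"new checkout for {len(cart)} items"   # the change being rolled out
    return f"legacy checkout for {len(cart)} items"    # existing behavior stays the default


# Each dependent service can deploy its changes separately, dormant behind the flag.
# When every piece is in production, flip the flag to launch them all together:
FLAGS["new-checkout-flow"] = True
# ...and flip it back off to quickly back out a change that isn't performing as expected.
```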

So let's look at this all holistically now. When software is designed with a loosely coupled architecture, we reduce the risk of introducing changes to a single service in the application. If we assign an appropriate tier to the service and document any dependencies, we can identify which services are low risk. When the DevOps team is practicing continuous integration and continuous delivery, we know that code is in a consistently deployable state, having already passed security scanning and testing. And with feature flags, we can control when the new change takes effect separately from the process of deployment.

And if that DevOps team is responsible for the end-to-end life cycle of the service and is held accountable for the quality of their deployments through tracking the change failure rate, they are naturally going to focus on ensuring that this metric is as favorable as possible.

If all these things are true, there's no real benefit to having the CAB review changes for low-tier services prior to deployment. Even if a deployment does fail, you've ensured that at the very worst it's only gonna have a minor effect on customers or internal business processes, and it can be rolled back quickly using feature flags.

By implementing a continuous delivery process for services with a loosely coupled architecture and documenting risk levels and dependencies, you can create a set of standards for preapproved changes, which will unlock the ability to deliver value to customers on a more frequent basis. And by enabling autonomous teams that are empowered to solve business problems, focusing on outcomes over output, we can ensure that they're incentivized to learn quickly from mistakes, take ownership of the success of the application, and create fast feedback loops.
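
As one illustration of what such a standard for preapproved changes could look like in code (the specific rule below is hypothetical, not a recommendation from Atlassian or DORA), the decision reduces to a small policy check over facts the pipeline can gather automatically:

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Facts a pipeline could gather automatically about a proposed change."""
    service_tier: int            # 1 = most critical, 4 = least critical
    passed_all_checks: bool      # tests and scans passed before staging (continuous delivery)
    behind_feature_flag: bool    # can be switched off quickly without a redeploy
    dependencies_documented: bool


def needs_cab_review(change: ChangeRequest) -> bool:
    """Hypothetical preapproval rule: low-tier, fully tested, flagged,
    documented changes ship without waiting for the next CAB meeting."""
    preapproved = (
        change.service_tier >= 3
        and change.passed_all_checks
        and change.behind_feature_flag
        and change.dependencies_documented
    )
    return not preapproved


# A tier three change that meets the standard skips the CAB queue entirely.
print(needs_cab_review(ChangeRequest(3, True, True, True)))   # False
# A tier one change still goes through the CAB for site-reliability review.
print(needs_cab_review(ChangeRequest(1, True, True, True)))   # True
```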

Now, if you only take one thing away from this session, I hope it's this: you get what you measure. So measure the right things. We've really just scratched the surface of DevOps here today; there is so much more than can be covered in a 20-minute presentation. So if you're interested in helping your teams develop more mature DevOps tools and processes, feel free to scan this QR code to send me an email, or look me up on LinkedIn. Thank you so much for your time today. I am so grateful to have had the opportunity to present at the WomenTech conference and to share my experience with all of you. I hope to hear from you soon.