ShipTalk - Lean into pipelines for speed, efficiency, and reliability - Bryan Finster - Defense Unicorns

ShipTalk - SRE, DevOps, Platform Engineering, Software Delivery

ShipTalk is the podcast series on the ins, outs, ups, and downs of software delivery. This series dives into the vast ocean of software delivery, bringing aboard industry tech leaders, seasoned engineers, and insightful customers to navigate through the currents of the ever-evolving software landscape. Each session explores the real-world challenges and victories encountered by today’s tech innovators.

Whether you’re an Engineering Manager, a Software Engineer, or an enthusiast with an interest in software delivery, you’ll gain invaluable insights and equip yourself with the knowledge to sail through the complex waters of software delivery.

Our seasoned guests are here to share their stories, shining a light on the do's, don’ts, and the “I wish I knew” of the tech world. If you would like to be a guest on ShipTalk, send an e-mail to podcast@shiptalk.io. Be sure to check out our sponsor's website - Harness.io

February 22, 2023 • Jim Hirschauer • Season 2 • Episode 1

In this episode of ShipTalk (The SRE Edition), Bryan Finster discusses how to ensure reliability and resiliency across the SDLC by making everything part of the pipeline.

Introductions
Just for fun #1 - Bryan's favorite hobby
Main topic - Leaning into delivery pipelines to ensure reliability and resiliency while creating efficiencies.
Just for fun #2 - Bryan's worst IT mess up
Closing

Jim Hirschauer: 0:10

Hey everybody. Welcome to ShipTalk, the SRE edition. I'm Jim Hirschauer, your host for today. ShipTalk is a DevOps podcast brought to you by Harness, the software delivery platform, and the SRE edition focuses on reliability topics. My guest today is Bryan Finster. Thanks for joining us, Bryan. You're from Defense Unicorns, and welcome to the show.

Bryan Finster: 0:31

Yeah, thanks so much for having me, Jim.

Jim Hirschauer: 0:32

Bryan, why don't you take a minute to share your background? Tell us a little bit about Defense Unicorns, and talk about any other projects you might wanna fill us in on.

Bryan Finster: 0:40

Sure. So I've been a software engineer for nearly 30 years now. I still do that most of the time in supply chain software, which really kind of informs my attitudes towards delivering software because, you know, it's a supply chain problem, just a different kind of thing. For the last several years I've been working on how we improve the flow of software delivery, focusing on implementing continuous delivery as a workflow. You know, what are the challenges around that? I spent a long time at Walmart doing that, and then I got stolen away by Defense Unicorns. At Defense Unicorns, we're trying to solve some really hard software supply chain problems. You know, how do you deliver a hardened, secure solution to top secret environments that are air-gapped, where you don't have access to normal tools, that sort of thing? We're just really trying to solve that problem space. If you can solve that delivery problem, everything else is easy.

Jim Hirschauer: 1:39

Yeah, that's a very difficult problem for sure.

Bryan Finster: 1:41

But I love working on really hard problems, and this is a really fun one.

Jim Hirschauer: 1:45

Great. Sounds exciting. All right, so look, I appreciate the background. I feel like you're gonna drop some knowledge on us today. Before we get there, though, on this podcast we like to have a little bit of fun first. So what I'd love to do is hear about your hobbies. Everybody has hobbies outside of work. Why don't you pick one of your favorite hobbies and fill us in?

Bryan Finster: 2:05

Well, motorcycling, I think, is one of these things that I really geek out about. You're not gonna catch me saying this brand is the best, cuz I'm like a nerd about motorcycles. New bikes come out and I'll go and try to test ride 'em, not because I wanna buy one, just cause, like, hey, what's this new bike like right now? Yeah, I love motorcycles. And I love distance riding. I ride a BMW GS Adventure, and I've taken that thing on thousand-mile trips in a day. You know, it's fun. It still kind of gets the adrenaline junkie part of me going as well. I love diving into corners.

Jim Hirschauer: 2:40

Nice. Yeah. Well, I don't know a whole lot about motorcycles. I definitely know that BMW makes some fantastic motorcycles for sure. And it sounds like you have an adventure bike. Those are the ones that you can kind of take on-road and off-road, right?

Bryan Finster: 2:51

Yes. But I'm realistic. They make bikes that are slightly more reliable than Harley-Davidsons and much less reliable than Hondas. You just need to know where you're sitting in the reliability space.

Jim Hirschauer: 3:03

Good to know. So any, any fun stories to share about riding your motorcycle on these long trips?

Bryan Finster: 3:08

Just, you know, I love getting there and I love going places where I can be challenged. You know, there's a road on the North Carolina-Tennessee border called the Dragon, US 129: 318 corners in 11 miles. And taking a big adventure bike like that, and a big guy like me, I'm six six, I'm not a small person, and diving into those corners and just challenging myself: how do I do this corner better? How do I do this corner better? I mean, it's a lot like how I try to deliver software. It's, you know, small incremental improvement all the time.

Jim Hirschauer: 3:40

Awesome, awesome. Diving into corners.

Bryan Finster: 3:42

Yeah. You know, I sometimes try to use this as an analogy, because there are habits that you have to develop. Motorcycles are very counterintuitive. In a car, sometimes the safest thing to do is to hit the brakes. If you're coming into a corner too fast in a car, you wanna slow down and reduce your energy coming into that corner so you don't run off the corner. The problem is that if you apply habits you learned in a car to a motorcycle, it doesn't work the same way. You need to understand the physics of the motorcycle. I mean, motorcycles turn by leaning, right? And if you hit the brakes on a motorcycle, the bike stops leaning; it stands up. So if you're coming into a corner too fast on a motorcycle, you need to just be smooth, lean harder, and go through the corner, because the most dangerous thing you can do is slam on the brakes, stand the bike up, and then run right off the corner. And this is a lot like the behavior I see when people are trying to transition to continuous delivery workflows: they'll hit a bump along the way and they'll start slowing down delivery, which means that the delivered batches get bigger. But the safety of CD comes from delivering smaller batches and getting faster feedback, and it's the same sort of mindset. It's counterintuitive, but you need to deliver more frequently in smaller chunks rather than slow everything down and panic because you hit a roadblock.

Jim Hirschauer: 5:14

Yeah, I love that analogy. So, leaning into your software delivery practices, that's a great segue into the main topic for today's show. You know, on the SRE edition of ShipTalk, we like to focus on reliability and resiliency. Right now we're having some conversations industry-wide about efficiency, and on the show we did last week, I spoke to Matt Schillerstrom about efficiency. I wanna have a similar conversation with you today and maybe expand into the software delivery portions of this. So, you know, overall efficiency is a really hot topic right now, especially given the current economic environment. As companies are looking for different ways to reduce their costs and increase productivity, automation, reliability, and resiliency are becoming top priorities. So I'd love to hear from you: what's your take on how companies can be successful, given the fact that efficiency and reliability are often working against each other?

Bryan Finster: 6:11

Well, I think that a mistake a lot of companies make is they say, "We're gonna become more efficient so that we can be more productive," and by that they mean generate more output. But, you know, the perspective I have, like I said, is that I've been a developer for a long, long time. Except for a portion of that time when operational responsibility was stripped away from me, where, to improve developer productivity, support of my application went to another team, I have had operational responsibility for what I build. And I don't mean I own the infrastructure and all the tools and everything. I mean that I'm building the application, and if it breaks, I'm the one getting woken up in the middle of the night to fix the application. And ops is the reason for all of this. We want to become more efficient about how we deliver software so that we can make smaller batch sizes of work, so that we can fix production faster. The smaller batches of work allow us to uncover the pain points preventing us from delivering high-quality change. It could be process, it could be how we test, whatever, but shoving down those batch sizes uncovers all that pain. To do that, anytime we find waste in our flow of delivery, we need to remove that waste: handoffs to other teams, lack of access to product information we need, whatever that is. How do we improve the supply chain of communication and optimize our processes so that we can keep shrinking that change set size down, so that every single day we build the muscle memory of fixing production? I mean, this is the thing about CD. It's not about speed. It's about building that muscle memory. We are delivering daily, hopefully multiple times a day if we're on a high-performing team, so that when we wake up at three o'clock in the morning... oh, and I left this out: we're only ever using our emergency flow. We're using our hot fix process to deliver every single change, because we are validating that our hot fix process works very, very well. We're stressing that system of delivery to uncover breaks during normal business hours, so we don't hit those breaks in the middle of the night when something breaks, because something will break when we're asleep. And I focus on CD because I wanna go back to sleep at 3:05 when I woke up at three o'clock with a system that's down. I've spent too many years on pager, with too many sleepless nights, not to do continuous delivery.

Jim Hirschauer: 8:53

So you said something really interesting. You said that it's about ops. It's about being able to recover your systems as quickly as possible. Yes. But what does that mean for Dev? Because at the end of the day, you also said in your time in Dev, you were responsible for what you deployed. Right. That software that you deploy.

Bryan Finster: 9:10

I am. I am still a dev. And here's just the hard truth. I was talking to Andrew Clay Shafer about this yesterday. This whole thing about coddling developers, taking away responsibility for the things that they build so they don't feel the impact of their decisions, drives poor quality. If an organization wants high-quality software, they will give ownership to teams. And ownership means taking responsibility for your architectural and tech stack decisions. You don't wind up with a team voluntarily building something with five different languages if they also have to fix it in the middle of the night. They're gonna optimize for clean code and clean architecture that's easy to understand, simple, and effective. They're gonna optimize for not being woken up, and that's what drives quality. When you separate those two things, you kill that quality feedback loop. Developers don't know that they're breaking stuff, and some will just start playing. But I'm a developer, and I have no problem holding my peers accountable for being professionals and taking responsibility for the work that we do.

Jim Hirschauer: 10:22

And how does that translate into the software delivery life cycle as a whole? You mentioned continuous delivery, but when we talk about reliability and resiliency, that shouldn't start in production, right? By the time you get to production, whatever's there is there, and it's gonna be as reliable as you've made it. And whatever users are gonna do to it, they'll do. So where does it start, and how do you make that part of your software delivery life cycle?

Bryan Finster: 10:45

Well, it starts with your pipeline. The purpose of a CD pipeline is not to deliver software. The purpose of a CD pipeline is to prevent bad software from being delivered. So we need to identify, number one, what is our definition of deliverable? Can we even describe it? If we can't, we already have a problem. We need to have a definition of deliverable: a certain level of security, a certain level of performance for what it is we're delivering, for this particular problem we're trying to solve. And then we codify that in the pipeline. Then we start shipping down the pipeline to find out how wrong we are about the definitions in the pipeline, and we keep hardening our pipeline. When I've worked with teams in the past on this, I tell 'em, look, you're a product team. You own this business problem and solving this business problem, but your primary product is the pipeline delivering it. Whenever something breaks, because it will, your response is: where do we put something in the pipeline to quickly make that not happen again? And this comes to how we have to design efficient tests that are capable of giving us a signal when things break, and about where they broke, so we can fix it quickly before it goes to production. And when things break in production, we put another test in place that will keep efficiently doing that. We need to keep track of how long it takes to run that pipeline, because if it takes you four hours to run your pipeline, then when there's an emergency you're gonna come up with some hot fix process that's not well tested and pour gasoline on a dumpster fire. I know. I've done it. So you're constantly going, okay, we're gonna ship, we're gonna find out what breaks, and we're gonna just continue improving the things that break.
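As a rough illustration of codifying a definition of deliverable, the idea can be sketched in Python as a list of named checks that a build must pass before it ships, plus a budget on how long the pipeline itself may take. This is a minimal sketch, not any team's actual pipeline: the check names, the thresholds, the `build` dictionary keys, and the 60-minute budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """One codified clause of the team's definition of deliverable."""
    name: str
    passed: Callable[[dict], bool]


# Hypothetical clauses and thresholds; a real team would agree on its own.
DEFINITION_OF_DELIVERABLE = [
    Check("all tests pass", lambda b: b["failed_tests"] == 0),
    Check("no critical vulnerabilities", lambda b: b["critical_vulns"] == 0),
    Check("p95 latency within budget", lambda b: b["p95_latency_ms"] <= 300),
]

# Hypothetical budget: the pipeline must stay fast enough that the same
# path can be used for an emergency fix.
MAX_PIPELINE_MINUTES = 60


def evaluate(build: dict) -> list[str]:
    """Return the names of failed checks; an empty list means deliverable."""
    return [c.name for c in DEFINITION_OF_DELIVERABLE if not c.passed(build)]


def pipeline_too_slow(minutes: float) -> bool:
    """Flag pipeline runs that exceed the agreed duration budget."""
    return minutes > MAX_PIPELINE_MINUTES


if __name__ == "__main__":
    build = {"failed_tests": 0, "critical_vulns": 1, "p95_latency_ms": 250}
    print("blocked by:", evaluate(build))
    print("too slow:", pipeline_too_slow(240))
```

When something breaks in production, the response described above maps to appending one more `Check` to the list, so the same failure is caught before the next delivery.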

Jim Hirschauer: 12:32

When you put all that together, it sounded like you were saying our pipelines are not just about delivering new features. Right. They're there to make sure that we're delivering quality software to our end users.

Bryan Finster: 12:41

They're our safety net. And we own our safety net. We will make mistakes. The automation is there to help us avoid making them twice.

Jim Hirschauer: 12:52

So, from that perspective, it sounds like you're saying that resiliency engineering should be part of this overall process, and not only in production, but baked into your software delivery pipelines as well.

Bryan Finster: 13:04

Everything. If you're starting a new team, working on a new product, or even if you're just starting a new service for a product that exists, feature zero is your pipeline. If you don't have a way to get to production to invalidate your assumptions about what you're building, everything else you do is waste, and you're just piling defect on top of defect without knowing it.

Jim Hirschauer: 13:28

You know, it's interesting. We were just working with a company out in Australia that was releasing a new product. This is an established company, but they're releasing a new product, and their take on resiliency engineering was that they needed to get it implemented before they released their new product. Honestly, for me, that was the first time I had ever heard a company make that statement. Their opinion was that if they launched this new product and it provided a poor customer experience, that would be worse for them than not launching the product at all. They didn't want to create that bad first impression.

Bryan Finster: 14:01

I totally agree. But you also need to figure out how to get this into production to validate some of your assumptions about it running in a production environment, without also exposing it to people in a way that's not good. I mean, you want to deploy. There are many, many times when I'm deploying to production things that nobody but I know about. It's being tested by the fact that it's not blowing up and dying in that production environment. One of the things I work on is minimumcd.org. I'm a contributor there, and we have just a list of problems to solve. It's not an implementation, "do CD this way." It's "here are problems to solve, and if you solve these problems, now you're doing continuous delivery." One of those problems is production-like test environments, but it's production-like. The odds of you having a test environment that matches production are zero. So getting something out there in a way where you can do some validation that it will actually run and doesn't break anything, without hurting anything while you do it, is critical.
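One common way to deploy code to production without exposing it to users is a percentage-based feature flag. The sketch below is a swapped-in illustration of that general technique, not something described in the episode; the function names and the idea of hashing a user ID into 100 buckets are assumptions made for the example.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets.

    The user sees the new code path only if their bucket number is
    below `percent`. With percent=0 the code is deployed but dark.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def handle_request(user_id: str, new_path_percent: int = 0) -> str:
    """Route a request to the old or new path (hypothetical handler)."""
    if in_rollout(user_id, new_path_percent):
        return "new"  # newly deployed behaviour, visible to a slice of users
    return "old"      # existing behaviour, the default for everyone else


if __name__ == "__main__":
    # Deployed but not exposed: every user still gets the old path.
    print(handle_request("alice", new_path_percent=0))
    # Fully rolled out: every user gets the new path.
    print(handle_request("alice", new_path_percent=100))
```

Because the bucket is derived from a hash of the user ID, the same user gets the same answer on every request, which keeps their experience stable while the percentage is raised.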

Jim Hirschauer: 15:06

So is this validation, is this strictly the realm of chaos engineering type of solutions? Or is there more than that that goes into it?

Bryan Finster: 15:14

I mean, it's far more than that. Not only is it resilient, but does it perform? Is it functional? What tests can we throw at it in production to validate that it does in production what it should do? I mean, there's all sorts of mistakes we can make. We could have configurations set incorrectly. We only find out when we deliver.

Jim Hirschauer: 15:33

Yeah. So you're advocating testing in production?

Bryan Finster: 15:37

Absolutely. But testing in production doesn't mean we only test in production. It means we also test in production. I think people really need to understand that all of this is an integrated system. People see these different words being used. They'll hear agile and DevOps and continuous delivery, and they'll think that these are all different things, and they're not. What we're trying to do is implement a system of delivery that gives us rapid feedback from idea all the way to delivery, and at every single step in between, rapid feedback that we have a problem.

Jim Hirschauer: 16:12

Bryan, I love your viewpoint on this holistic way of thinking about software delivery, because in reality, this is what our end users experience. At the end of the day, everything that led up to delivering that software to an end user is what their customer experience is.

Bryan Finster: 16:31

You know, and by the way, support, documentation, those are all features. Those are all part of the product. Make sure they're good.

Jim Hirschauer: 16:39

Absolutely. Yeah, I couldn't agree more. There are so many times where I've personally been looking at docs to try and figure out how to use this thing, and what I'm looking for isn't documented. All right, great. So listen, let's take a quick transition here, because on this podcast we also do another fun segment at the end. This is one of the things I love to ask, and I've found that people really do like talking about it, because at the end of the day we're all humans and we mess up. And when we work in IT, sometimes we have some pretty interesting mess-ups. So, Bryan, I'd love to know, what's your worst IT mess-up?

Bryan Finster: 17:15

Well, I had to write down a list, because I've been doing this for a while and I've messed up. And this particular mess-up is one of the reasons why I say that nobody should ever have access to production at all, except through a pipeline. I was on a production system in the wrong path. I was in bin. I mistyped a command cause I was trying to clear a log file, and I was like, why is it taking so long to delete that log file?

Jim Hirschauer: 17:50

Did you delete your whole operating system?

Bryan Finster: 17:53

Well, it was the primary application, which consisted of about 800 different discrete programs. It was the warehouse management system. So I was deleting the warehouse management system from that box, which was out on the edge. I rapidly went to another box, you know, in another DC, and started FTPing applications across, hoping that we had no version differences that were gonna kill us. Then I crossed my fingers and just didn't tell anybody. Never, never touch production.

Jim Hirschauer: 18:24

So did it work out? Did you get away with it?

Bryan Finster: 18:26

I got away with it, yeah. You know, there was another time when I didn't get away with it, but my tech lead covered for me. I deleted an application that only existed in one distribution center and that calculated tax for Argentina. So for 30 days, Argentina didn't have any tax calculation going on, and that was... yeah. Never touch production.

Jim Hirschauer: 18:48

Yeah, I agree. So I used to be a systems administrator a long time ago, and we worked in those production systems all the time. It made me nervous even though I was really comfortable in there and I had, you know, multi-factor authentication to get to the root user on Unix. Wow, did you have to be careful in there.

Bryan Finster: 19:06

My office mate at a previous company was tech lead for public stores, and he thought he was deleting the inventory table from a development system. He deleted a production inventory table, and from that point on, I always had a different color screen on my terminal emulator for production versus development.

Jim Hirschauer: 19:23

Oh, that is brilliant. Absolutely. So, yeah, a lesson for everyone listening: if you have any access to production systems and non-production systems, triple verify which one you're on before running that command, especially if it's a destructive command.

Bryan Finster: 19:38

Absolutely. Have visual verification somewhere: "PROD" in giant letters, or a colored screen, something, cuz you will make a mistake.
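The visual-verification habit can be approximated in a small helper script. The following is a minimal sketch under stated assumptions: the environment names, the `banner` and `confirm_destructive` helpers, and the rule of retyping the environment name are all hypothetical, and the colors rely on standard ANSI escape codes being supported by the terminal.

```python
RED_BANNER = "\033[41;97m"  # ANSI escape: bright white text on a red background
RESET = "\033[0m"           # ANSI escape: restore default colors


def banner(env: str) -> str:
    """Return a loud red banner for production, a plain label otherwise."""
    label = f" {env.upper()} "
    if env == "prod":
        return f"{RED_BANNER}{label}{RESET}"
    return label


def confirm_destructive(env: str, typed: str) -> bool:
    """Allow a destructive command on prod only if the operator
    retyped the environment name exactly; other environments pass."""
    if env != "prod":
        return True
    return typed == env


if __name__ == "__main__":
    env = "prod"  # hypothetical: would normally come from host configuration
    print(banner(env))
    answer = input(f"Type '{env}' to run this destructive command: ")
    print("proceeding" if confirm_destructive(env, answer) else "aborted")
```

A guard like this only reduces the odds of a slip; the stronger position argued earlier in the conversation is to have no direct production access at all and to make changes only through the pipeline.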

Jim Hirschauer: 19:46

Absolutely. All right. Well, Bryan, it's been wonderful talking to you. Thank you so much for being our guest on the show today. Again, I love your idea of looking at software delivery from a holistic perspective. I completely agree with it. Thank you for sharing your humanity with us and, you know, the mess-ups that you've had. We've all done it, so we're all in the same boat with you. And to everyone listening, if you're interested in being a guest on ShipTalk, if you're an SRE or if you're in a DevOps-related role, just feel free to send us an email. Send that email to podcast@shiptalk.io and we'll get back to you. That's all for now. Until next time.

Bryan Finster: 20:29

Thanks, Jim.

Jim Hirschauer: 20:30

Thanks, Bryan.

People on this episode

Dewan Ahmed

Host

Ravi Lachhman

Host