The New World of Software Operations

Operations Has Changed

No technologist in the history of computing has lived in a time other than one of change. What is unique about this one is the velocity of that change: new paradigms in computing arrive every few years. Three decades ago, enterprises could plan their software strategy decades in advance. Today, architects can look at most a decade ahead without worrying about being left behind by competitors. I.T. administrators built empires on the administration and operation of physical servers. Programmers would only touch production infrastructure when it was absolutely critical, and they generally had little interaction with operations staff. Today, developers deploy Docker containers, getting closer to the production environment than ever before. Applications aren't viewed as large "box" servers but as processes in massive compute clouds. In the old world, servers had labels on them indicating which software they were responsible for running. Today, walking down the aisle won't even tell you which quadrant of a data center an application is running in.

My Experiences Thus Far

I got my start in technology in a small-town hospital IT office. They were just starting the journey toward modernization, which in their case meant virtualization. I learned many things, most importantly support: not just technical support for machines, but the heart of support, which is supporting people with technology. That job was purely vendor support, meaning we owned none of the software; we bought all of it. I had no developers to go to, just a support phone number to call. I found early on that being friendly, learning why the user had the problem and not just how, and pursuing the deeper cause of issues so they didn't happen again led to much better outcomes and happier users. That role cemented in me the drive to make technology work for people, not just to make technology do work for people. In other words, it should be easy to use, available, and reliable. As I transitioned into a software engineering role, I found myself slowly drifting away from taking ownership of production. That's when I fell in love with DevOps/SRE roles, because they merge support with development. Operations took on a different definition, focused more on offering infrastructure to support developers and customers.

Currently I work as a Guest Reliability Engineer at Target HQ in Minneapolis, Minnesota. We're working to transform the organization's traditional support infrastructure into a modern approach built on platform-based technology offerings and modern application design principles. It's an exciting time, and I am thrilled to be at the very start of a journey that will take time to bear fruit.

It Takes Belief

Writing software is a mad rush to slap keys into something resembling working software. The politics around it are built on trust: trust in developers to deliver the software actually requested, and trust in product managers to provide accurate cost and time figures. Systems like Agile aim to give management a way to make that trust easier to earn, but it is always an unpredictable journey, and it is often difficult to give solid numbers until after the project is released.

Operations has traditionally been different. You call the help desk when things are broken. Productivity is easily measured by ticket volume and time to restoration. Migrating to a workflow that blends development and operations (DevOps) is very hard. Moving from physical and virtual machines to containers, or from on-premises to the cloud, is one kind of migration. Moving from traditional operations to SRE/GRE/DevOps is a completely different beast. It's not just a technical challenge to transform operations staff into developers focused on automation, reliability engineering, and helping developers write more maintainable systems. It's also a matter of changing how organizations think about support. Your operations engineer is no longer focused only on short-term reliability (uptime). They are also focused on long-term reliability, such as automating fixes for recurring problems, building self-help tools, and collecting telemetry to help prioritize development teams' backlogs.

In the long term, this leads to scalable, resilient, and well-designed software. In the short term, it is a tumultuous time of change. The metrics engineers are measured by change, and the stories leadership used to tell (productivity via ticket work, time on the phone, etc.) must be adjusted to account for the solutions the engineers develop. This change in story takes belief from the entire organization. The larger organization must believe that an operations engineer who is not on the phones is still delivering value and moving the organization toward the long-term goal. The engineers must believe that automated, self-help, and metrics-based operations are not only possible but achievable by the team.

It Takes Hard Work

All great things are made by many people working hard, and a few working harder. Leaders must hire highly motivated engineers and help foster innovation. Engineers must push beyond what they believe they know and can do, and begin doing more. With the organization's belief and leadership's trust behind them, I fully believe operations engineers can deliver world-class solutions for this new world of technology.

Hard work means more than code. It means navigating the channels of an organization. Engineers and leaders alike must engage the entire technical organization, leveraging every resource and building partnerships. It takes time to schedule and attend those meetings. It requires skills engineers typically don't believe they have, and pushing yourself on those soft skills is hard work in a technical profession.

Conclusions

I believe the days are numbered for basic help desk roles. There will always be a place for client support, but in a world of containers, Platform as a Service, and Functions as a Service, operations teams are becoming so much more than downtime mitigation. They are engineers focused on the infrastructure of their products, engineering solutions that protect the trust customers place in those products.

I love writing software. I have a true passion for infrastructure programming and for designing reliable systems that empower others, and that's why I work in operations today. Organizations across the world are beginning to adopt this mindset, and we're seeing an explosion in developer productivity and more reliable platforms. This has me excited and invested in staying in infrastructure organizations, helping them move to the next "level" of what operations can be.

There is going to be pushback. Legacy beliefs about operations will persist for a long time. I believe, though, that value speaks for itself. The value operations engineers provide is invisible to users (a person on Facebook never thinks "Wow! What low latency!"), but developers will see the difference. If management extends trust, empowers innovation, and modernizes the metrics it uses to measure operations staff, the entire organization will follow.

It's an exciting time to be in technology, and I am very happy to be on teams moving toward a better tomorrow. I hope this sheds some light on what operations is transforming into, and what we can do as engineers to push forward! If you'd like me to write more on the subject or have a topic idea, leave a comment below.

Have a great day!

James

It is the responsibility of leadership to provide opportunity, and the responsibility of individuals to contribute.
- William Pollard