Sunday, September 26, 2021

Paul Osman thinks about long-term strategies, OpenTelemetry, and boring systems

We’re kicking off an interview series, called Level-Up, with standout engineering leaders to learn what’s top of mind for them. Check out the full interview at the bottom of this post. And let us know who we should talk to next!

More and more teams are developing software, rushing to embrace new ways to stay current and competitive. But, according to Paul Osman, Staff Platform Engineer at Honeycomb, the best systems are actually pretty boring. If you’ve got a system that’s working for you, why mess with it?

That might make Osman sound like he’s content with legacy solutions, but that’s far from the case. When it comes to Ops, SDKs, and libraries, Osman looks for the best possible solution for the use case as well as the team, encouraging engineers to thoughtfully consider all their options.

At Honeycomb, he’s been tasked with developing a long-term strategy for client libraries and SDKs. In a previous role as a Senior Engineering Manager at Under Armour, he led infrastructure teams, specifically focusing on Kubernetes and microservices.

Osman and I actually worked together at PagerDuty where he led their internal platform team at the time. I came to know Osman as someone who prolifically picked up new tools and technologies. More importantly, he had (and still has) a phenomenal intuition of what good developer tooling feels like.

Recently, I got to sit down with Osman to talk about telemetry, architecture, and how to build long-term Ops strategies. He filled me in on his perspective: he’s excited about OpenTelemetry and is interested in how teams can gain better observability of their data. Osman’s insights are instructive for any engineer looking to become an adaptable, effective leader, so I’ve compiled the top takeaways from our conversation.

Building a long-term strategy for client libraries and SDKs

Osman has a lot of experience developing infrastructure and leading teams. At Under Armour, he was responsible for leading all of the infrastructure teams with a focus on migrating to Kubernetes. In previous roles at PagerDuty and SoundCloud, he led platform teams.

This experience, which made Osman an expert in client libraries and SDKs, led him to his current role at Honeycomb. When Osman joined, engineers were maintaining libraries in a variety of languages: Ruby, Go, Python, and Java. “My job was to come on board, figure out what we were going to continue to support, and develop a long-term strategy for client libraries and SDKs,” he said. “We also needed to figure out how to build a team to support that strategy.”

With the help of other folks at Honeycomb (Osman gave a shout-out to Liz Fong-Jones), Osman helped formulate a strategy for the company’s approach to ingesting observability data that relied on OpenTelemetry.

“We wanted to make it as easy as possible to get data into Honeycomb,” he said. “We wanted to meet customers wherever they were, so we leveraged OpenTelemetry and made our own client SDKs. This made it easy for customers to put instrumentation into code. We also built integrations that would take logs from different applications and ship them off to Honeycomb.”

Driving towards OpenTelemetry

OpenTelemetry is key to Osman’s work at Honeycomb, but he has a bigger-picture perspective on its place in Ops. He sees it as a great story of combined efforts, and he is also hopeful that it can change how engineers think about telemetry altogether.

“OpenTelemetry is one of those great stories in open source: two communities recognized that they were serving the same audience and then decided to combine their efforts,” said Osman. “OpenTelemetry specifically grew out of two different projects with very similar goals, which were to create an open-source ecosystem around telemetry data, specifically, tracing, metrics, and logs.”

Osman and his team at Honeycomb have embraced OpenTelemetry wholeheartedly. He calls them “big fans.” For Osman, OpenTelemetry represents possibility. “I dream of a future where telemetry is not something engineers think a lot about. Instead, it’s baked into the framework,” he said. “I’m really hoping that through an effort like this, we can get closer to a world like that.”

Prioritizing observability, not just data

For a long time, metrics, logging, and tracing have been treated as equivalent to observability. Recently, however, there’s been a shift toward taking observability back to its original definition from control theory: how well a system’s internal state can be inferred from its outputs.

To Osman, metrics, logging, and tracing were born out of what engineers had available. Today, engineers have much more at their fingertips. “I really don’t like companies selling the idea that if you have these three things, you have observability,” he said. “They do give you data, but how well that data represents the internal state of your system is the degree to which you really have observability.”

Honeycomb’s goal is to transcend all three. “Honeycomb, at its heart, is an ultra-wide event store. I love seeing people’s eyes light up when they experience what Honeycomb can show them in terms of observability. Our model is simple: we accept keys and values, which our users can embed. They can have as many as possible though, then have an ultra-wide table that represents their data in what we call a Honeycomb. But that ultra-wide part lets people slice and dice over a huge number of dimensions, quickly getting to the root of the issue.”
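To make the "keys and values" model concrete, here is a minimal sketch of wide events in plain Python. It is not Honeycomb's API or storage format. The field names (`service`, `endpoint`, `status`, and so on) and the `slice_by` helper are hypothetical, chosen only to show how one flat record per unit of work can be filtered and grouped along any dimension.

```python
from collections import defaultdict

# Hypothetical wide events: each is one flat dict of keys and values.
# Events do not need to share the same set of keys.
events = [
    {"service": "checkout", "endpoint": "/pay", "status": 500, "duration_ms": 812, "region": "us-east"},
    {"service": "checkout", "endpoint": "/pay", "status": 200, "duration_ms": 95, "region": "us-east"},
    {"service": "checkout", "endpoint": "/cart", "status": 200, "duration_ms": 40, "region": "eu-west"},
    {"service": "search", "endpoint": "/q", "status": 200, "duration_ms": 120, "region": "eu-west", "cache_hit": False},
]


def slice_by(events, dimension, where=None):
    """Count events per value of `dimension`, keeping only events that
    match every key/value pair in `where` (exact match)."""
    where = where or {}
    counts = defaultdict(int)
    for event in events:
        if all(event.get(key) == value for key, value in where.items()):
            # Events missing the dimension are grouped under None.
            counts[event.get(dimension)] += 1
    return dict(counts)


# "Slice and dice": the same events, grouped along different dimensions.
errors_by_endpoint = slice_by(events, "endpoint", where={"status": 500})
events_by_region = slice_by(events, "region")
```

Because every event carries all of its context in one record, a question like "which endpoint is producing the 500s?" needs no join across separate metrics, logs, and traces. It is a filter and a group-by over whichever keys happen to be present.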

The best systems are pretty boring

One of Osman’s favorite things about Honeycomb is how simple the architecture is. “I don’t think there’s such a thing as cool or uncool architecture; there’s just architecture that works,” he said. “One of the things I really appreciate about Honeycomb and the way we’ve built our systems is that it’s all boring. Our system is not that complicated.”

He shared that the company leverages a few internally built services, which are all dog-themed. Retriever is a columnar data store that stores events, Shepherd is the ingest service that persists events, and Doberman enforces usage restrictions, like a guard dog. To Osman, it’s not about having the fanciest Ops solutions. The best ones are simple, and they just work.
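For readers unfamiliar with the term, a columnar store keeps the values of each field together rather than keeping each event together. The sketch below is a toy illustration of that layout in plain Python, not a description of how Retriever works. The `to_columns` helper and the sample fields are hypothetical.

```python
def to_columns(events):
    """Turn a list of event dicts into a dict mapping each field name
    to a list of values, one entry per event (None where absent)."""
    keys = sorted({key for event in events for key in event})
    return {key: [event.get(key) for event in events] for key in keys}


# Hypothetical events with differing sets of keys.
events = [
    {"endpoint": "/pay", "duration_ms": 812},
    {"endpoint": "/cart", "duration_ms": 40, "user_tier": "pro"},
]

columns = to_columns(events)

# A query over one field only has to read that field's column.
slowest = max(columns["duration_ms"])
```

The appeal of this layout for wide events is that a query touching two or three fields does not pay for the many other fields each event may carry.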

“Boring technology always wins,” said Osman.

How companies should adopt Kubernetes

Osman is well-versed in everything Kubernetes, and shared his wisdom on how companies should think about adopting it. Here are a few guiding principles he shared:

  • Simple architecture is the best architecture. If you have solutions that work, you don’t need to suddenly embrace Kubernetes just because it’s the industry standard. “If the old and well-known thing is working for you and it’s not a limiter, why would you change it?” he said.
  • Make sure your engineers are comfortable. If you have engineers who are really good with Kubernetes and it’s what they do the fastest, then embrace it. But if you’ll need to train engineers to get up to speed, it may not be worth the investment, especially if you have a solution that’s working. “If you’ve got a group of six engineers, and they’re all Kubernetes experts, then go to Kubernetes. It’ll probably be fast for you,” he said. “But if you’ve got a team that’s used to using configuration management and VMs, then stick with that until it’s a bottleneck.”
  • Recognize the power of an industry standard. Although Osman appreciates all that Kubernetes has to offer, he also recognizes that it has become an industry standard partly due to great marketing. It may be the right solution for you, but it’s not the only one.

The job of an engineer: Solving business problems

Engineers love to solve problems with their work. But according to Osman, an engineer’s job is to solve business problems with as little code (or work) as possible. “At the end of the day, the customers who love to use the tools are paying your salary,” he said. “That doesn’t mean your work isn’t important, but you do need to focus on critical areas that satisfy customers like reliability, availability, and making sure the services are running smoothly.”

The experience of an engineer can be humbling, especially when you’re building things that you’re not an expert on. “The people you work with who are using your systems are going to be the experts,” said Osman. “You can do the best job you can scoping out the use cases, but the other engineers in your company that are using your service are going to find all the edge cases and surprising ways that your solution doesn’t fit their needs.”

Osman sees this as an exciting opportunity to adopt a service mentality, where engineers get excited about learning how people want to use their service. There’s always room to improve a system.

Interview

Check out my full interview with Paul Osman to learn more about his perspective. We talked about Ops, OpenTelemetry, and the role of an engineer.


