
Blog

Reflections on Reading “Introduction to Reliable and Secure Distributed Programming”


I’ve had Introduction to Reliable and Secure Distributed Programming sitting on my bookshelf for years, silently whispering, “Read me when you have time.” Of course, “later” always seemed like the right time. As software developers, we often prioritize practical, hands-on books that help solve immediate problems—topics like Kubernetes, Kafka or mastering another layer of C++ intricacies. But in hindsight, neglecting foundational theory is a mistake.

I began my journey with distributed systems over a decade ago. Back then, I wish someone had handed me this book and insisted I dive in immediately. It’s the kind of resource that can help set the foundation for anyone venturing into the complexities of distributed computing. Instead, I learned through trial, error, and practical exposure, which, while valuable, left gaps that only became apparent when I finally picked up this book.

The book doesn’t hold back: it dives deeply into the theoretical underpinnings of distributed systems. Despite my experience, I found several topics both fascinating and challenging. The discussions of randomized consensus with a common coin and of hierarchical consensus stood out to me, offering insights that are as practical as they are thought-provoking. These concepts, while grounded in theory, can influence how we design real-world systems.

Sabbatical and Distributed Systems


View from the rooftop of the apartment, where we settled down. I love running up on the left side of the river—quite a nice trail run.

Working at a unicorn company for four years was an exhilarating experience, though an incredibly exhausting one. Tackling significant projects that few have ever attempted means Google can’t offer much help. You are surrounded by highly talented colleagues, many of whom you may never encounter again in one place, while facing a relentless pace and high expectations from leadership, and eventually it takes a toll. After a while, you just need to pause, rest, and reflect. That’s why I left Motional: to take several months to recharge, spend time with family, and reconnect with my hobbies.

The irony is, after more than 20 years in IT, I’ve become somewhat of a workaholic, so proper rest eludes me. Just last weekend, I found myself speaking at the DevFest ’24 conference, sharing insights on best practices in project development with the local IT community. The event exceeded all my expectations. It’s been incredibly gratifying to see how much Kyrgyzstan’s IT sector has grown since I left in 2003. I sincerely hope this growth continues and that, eventually, talented engineers will want to stay here or return to the country instead of trying their luck abroad, as I did. However, I’m also acutely aware that I’m now overqualified for most roles in the Kyrgyz job market, which is bittersweet. So, my journey will most likely take me elsewhere after my sabbatical.

It’s funny (or maybe a bit sad, depending on how you look at it), but my workaholic side insists that taking it easy isn’t enough. Now, with plenty of free time, I’ve decided to tackle something I’ve always wanted to dive deeper into: Distributed Systems theory. Yes, the theory itself—not just the practical implementation that most developers focus on. So, I have a Plan.

TIL: how to debug randomly hanging Python applications

Usually, if a Python-based application hangs, you either read the logs or grab one of the pdb-based tools, attach to the application, and use the Python console to investigate. The approach is straightforward; for example, you install pdb-attach and add a few lines to your application:

import pdb_attach
pdb_attach.listen(50000)

and expect that the "magic" will just work:

> python -m pdb_attach <PID> 50000
(Pdb) YOU HAVE PDB SESSION HERE

But sometimes the magic is broken, and my theory (I didn't search for proof) is that this is due to the GIL. So sometimes no PDB prompt appears after you attach to the application. In my case, the application hung in a multiprocessing.Process call in which I ran a gRPC server. The gRPC server didn't react to the termination request, so the process could not stop, and, as an aggravating circumstance, all of this happened inside a PyTest run that hung in roughly 1 of 20 executions.
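As an illustration of the failure mode described above (a child process that does not react to a termination request), here is a minimal sketch of a defensive teardown that escalates from SIGTERM to SIGKILL. It is not the fix used in the original investigation. The names `stubborn_worker`, `stop_process`, and `demo` are hypothetical, the worker is a stand-in for the gRPC server, and the sketch assumes a POSIX system with the "fork" start method.

```python
import multiprocessing
import signal
import time


def stubborn_worker(ready):
    # Hypothetical stand-in for a child that ignores termination requests.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()  # tell the parent the handler is installed
    while True:
        time.sleep(0.1)


def stop_process(proc, timeout=2.0):
    """Send SIGTERM first, then escalate to SIGKILL if the child is still alive."""
    proc.terminate()  # SIGTERM, which the child may ignore
    proc.join(timeout)
    if proc.is_alive():
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.join()
        return "killed"
    return "terminated"


def demo():
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    ready = ctx.Event()
    proc = ctx.Process(target=stubborn_worker, args=(ready,))
    proc.start()
    ready.wait(5)  # avoid racing with the child's signal setup
    return stop_process(proc, timeout=1.0), proc.exitcode
```

A teardown like this keeps a test run from blocking forever, but it only hides the symptom; finding out where the child is stuck still requires a debugger.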

This is a general issue with PDB-based debuggers, which means other tools such as pyrasite-shell and the PyTest PDB integration don't work either. The only option here is GDB for Python, which is surprisingly amazing! First of all, you need to install the Python extension for GDB.

sudo apt-get install python3.9-dbg

Then you can attach GDB to your Python application, which is a regular process, and explore the call stack!

> gdb

(gdb) attach <PID>
(gdb) py-bt
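The interactive session above can also be scripted, which is handy when the hang shows up only occasionally, for example in CI. The following is a minimal sketch, assuming gdb is on the PATH, the Python debug package is installed so that `py-bt` is available, and the current user is allowed to attach to the target process. The helper names `build_gdb_command` and `dump_python_stack` are hypothetical.

```python
import shutil
import subprocess


def build_gdb_command(pid, all_threads=True):
    """Build a non-interactive gdb invocation that prints Python-level stacks."""
    backtrace = "thread apply all py-bt" if all_threads else "py-bt"
    # -p attaches to a running process, -batch exits after the commands run,
    # -ex executes a single gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", backtrace]


def dump_python_stack(pid, timeout=60):
    """Attach gdb to a live process and return everything it printed."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed or not on PATH")
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # gdb may exit non-zero if attaching is not permitted
    )
    return result.stdout + result.stderr
```

Calling `dump_python_stack(pid)` with the PID of the hanging process returns the gdb output as a string that can be written to a log. Depending on the system's ptrace settings, attaching may require elevated privileges.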

If you are not on an APT-based Linux distribution, search for the proper instructions here.