Release It! - 2nd edition, by Michael T. Nygard
Review
A great book describing various cases and techniques that allow your software to be more resilient. It is full of good tips and insights into what makes systems crash or slow down.
The book starts with the basics: the network stack, connections, and what usually makes systems unresponsive. It then goes on to describe how a flaw in one part of a system can affect the other parts.
The book contains many good pointers and ideas for creating distributed systems, covering integrations and how to build a system that can evolve, rather than having to be rebuilt from the ground up.
A few takeaways
Design your software for production, not just to pass QA
It's important to have the final production environment in mind when you design your software - it does not help that your services run smoothly and without errors in development if they won't run in production. For example, think about the load the service will end up taking in production; the development environment won't have as many users as the production environment.
Failure in software is inevitable, be ready for when it happens (instrumentation etc.)
It's impossible to design and code a system that won't fail - anything can happen. What matters is being ready when it does. This means you should always have proper instrumentation in place, so you can find and debug the issue as fast as possible.
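To make "proper instrumentation" a bit more concrete, here is a minimal sketch of my own (not taken from the book): a decorator that logs how long a call took and logs the stack trace when it fails. The names `instrumented` and `fetch_orders`, and the logger name, are made up for illustration; a real setup would more likely send metrics and traces to a dedicated monitoring system.

```python
import functools
import logging
import time

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("instrumentation")


def instrumented(func):
    """Log the duration of each call, and the stack trace if it fails."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("%s failed after %.1f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s succeeded in %.1f ms", func.__name__, elapsed_ms)
        return result

    return wrapper


@instrumented
def fetch_orders(customer_id):
    # Placeholder for a real call to a database or another service.
    return [{"customer": customer_id, "order": 1}]
```

With something like this in place, a slow or failing call leaves a trace in the logs that you can search for when production misbehaves.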
Circuit breaker
The book talks about using circuit breakers. These are especially needed when integrating with other systems. The idea is that once you see errors in an integration, a circuit breaker kicks in and tells your system that there is an issue with the integration. On subsequent calls, only a few connections are let through, and these test whether the integration is still broken.
Your own system can then take action depending on whether the circuit is broken or OK.
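The idea can be sketched in a few lines of code. This is my own simplified, single-threaded illustration and not the book's implementation; the class name, the `failure_threshold` and `reset_timeout` parameters, and their default values are all assumptions. After too many consecutive failures the circuit opens and calls are rejected right away. Once the timeout has passed, a trial call is let through to test whether the integration works again.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before a trial call
        self.clock = clock                          # injectable, which helps in tests
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of waiting on a broken integration.
            raise CircuitOpenError("integration is marked as broken")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # A successful call closes the circuit and resets the counter.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call in half-open state re-opens the circuit at once.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

The calling code can catch `CircuitOpenError` and fall back to something sensible, for example a cached response or a friendly error message. A production version would also need to be thread-safe and to limit how many trial calls are let through at the same time.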
The platform's goal is to enable its customers! (Other teams)
The platform and the platform team are there to enable their customers to deliver software and features. Usually a platform's customers are other teams. So the goal is to make delivering value to the end clients as easy as possible. For example, make build pipelines and packaging seamless and a no-brainer for the other teams.
The development environment is the production environment for the job of creating software.
In many companies the development environment is not highly prioritized. This means that the development environment could be flawed, underpowered, or simply not working for periods of time. This is not good practice, because the development environment can be seen as the production environment for the job of creating software. To deliver new software and features, the developers need to test them in a development environment.
If it is down or inaccessible, it might be harder or even impossible to do proper testing of the new features.
Include vulnerability scanning of dependencies in your build pipelines
There are now automated ways of scanning your dependencies for vulnerabilities, so this should be a no-brainer to include in your build pipelines. For example, scan Docker images and NuGet dependencies.
Chaos Engineering
The book also touches on chaos engineering, which is the practice of intentionally introducing outages or failures in your production systems. The idea is that no matter how many safeguards you add, an error in production can still have huge consequences. We also know that failures will inevitably happen. It is therefore better to try to be in control of the process, for example by shutting down instances running in production to observe what happens. If everything works as expected, the instance will boot up again and no one will notice. If not, you now have the data needed to fix a flaw.
Summary
All in all a great book, with many inspiring ideas and takeaways that I can use directly in my day-to-day work.