There are generally two types of software development: one tailored for individual clients, and the other designed for an entire marketplace. It is essential that code changes have a clear, singular alignment—either to benefit a specific client or to serve the broader market.
Balancing these two directions is critical. Silverchair's architectural philosophy is built around solving this challenge: we strive to capture the scalability benefits of a unified platform while also delivering customization for clients whose business cases and users require it.
Ensuring Stability through Robust Testing
To maintain platform stability and reduce development friction, Silverchair has invested in a comprehensive testing framework. This framework includes:
- Unit Testing: Unit tests validate small, isolated pieces of code. In a multi-client platform, this validation ensures that foundational code remains consistent and behaves as expected across different modules. By automating these tests, developers can catch issues early, preventing minor bugs from escalating into larger system failures. These tests are run via continuous integration on every code commit. If the tests don't all pass, the code does not get merged.
- System Testing: While unit tests focus on individual components, system tests ensure that the entire platform functions cohesively. For a platform serving multiple clients, this step is critical to ensure that services, APIs, and modules interact seamlessly without performance degradation or conflict. These tests are run daily and monitored by teams to ensure no regressions have occurred that were not caught in unit testing.
- Regression Testing: In a multi-client platform, even minor changes can ripple across various clients. Regression tests ensure that new updates don’t inadvertently break existing functionality. Automated regression tests allow us to detect issues early in the development lifecycle, which helps keep bugs from reaching production and defends against instability across all clients. At Silverchair, we run these tests every two hours, and we continue to invest in this framework so that we can diagnose "real" failures more quickly and eliminate spurious ones (tests that fail even though the code under test is working as intended).
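One common way to separate "real" failures from spurious ones is to re-run a failed test a few times and see whether the failure repeats. The sketch below illustrates that general idea only; it is not a description of Silverchair's tooling, and `classify_failure`, `run_test`, and the default of three reruns are hypothetical choices made for the example.

```python
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Re-run an already-failed test and label the failure.

    `run_test` is a hypothetical callable that executes one test and
    returns True on pass, False on failure.
    """
    results = [run_test() for _ in range(reruns)]
    if not any(results):
        # Failed on every rerun: treat it as a genuine regression.
        return "real"
    # Passed at least once: the original failure is likely spurious.
    return "flaky"


if __name__ == "__main__":
    # A test that always fails is reported as a real failure.
    print(classify_failure(lambda: False))
```

A failure labeled "flaky" here still deserves attention, since an unreliable test erodes trust in the suite, but it can be routed differently from a confirmed regression.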
Catching regressions early is particularly valuable in a multi-tenant environment, where an issue introduced for one client could have far-reaching effects on others. By leveraging automation, we reduce the likelihood of widespread issues and ensure consistent platform reliability. To further these efforts, we are developing automated regression test runs that we can use to attribute any failed tests to a single code commit.
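Attributing a failed test to a single commit can be approached as a binary search over the commits merged since the last passing run, the same idea behind `git bisect`. The following is a minimal sketch under stated assumptions, not our actual implementation: `first_failing_commit` and `passes` are hypothetical names, the commit list is assumed to be ordered oldest to newest, and the test is assumed to stay broken once a commit breaks it.

```python
from typing import Callable, Optional, Sequence


def first_failing_commit(
    commits: Sequence[str],
    passes: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest commit at which the test fails, or None.

    `commits` is ordered oldest to newest. `passes` is a hypothetical
    callable that checks out a commit, runs the test, and returns True
    if it passes. Assumes the failure persists after it is introduced.
    """
    if not commits or passes(commits[-1]):
        return None  # nothing to attribute: the newest commit passes
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] fails
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(commits[mid]):
            lo = mid + 1  # the break was introduced after mid
        else:
            hi = mid  # mid already fails; look at mid or earlier
    return commits[lo]


if __name__ == "__main__":
    history = ["a1", "b2", "c3", "d4", "e5"]
    broken_from = history.index("c3")  # pretend c3 introduced the bug
    culprit = first_failing_commit(
        history, lambda c: history.index(c) < broken_from
    )
    print(culprit)
```

Because each step halves the search range, a window of a few dozen commits needs only a handful of test runs to isolate the offending change.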
Learning and Adapting from Failures
Throughout my career as a software architect, I’ve learned that no system is immune to errors or failures. What distinguishes great software providers isn't a lack of mistakes, but the ability to learn from them and strengthen their platforms to avoid future problems.
As I often say, "If you're writing software, you're writing bugs." So, while we have invested in the tests and protections described above, we know that bugs can still make it through to production. That is the nature of the work at this scale, and attempting to catch everything can be prohibitively resource-intensive or reduce efficiency. We want to find significant issues in lower environments and resolve them there, but when we miss something material and it reaches production, we can use these tools to fix it quickly, learn, and adapt.
For us, the key is to support developers in failing forward—and doing so quickly. By creating efficiencies in our processes, we empower our teams to navigate this Zone of Dramatic Complexity with confidence, ensuring that the Silverchair Platform remains resilient and adaptive in an ever-evolving landscape.