In 2018, AKF Partners shared a list of questions we used to conduct technical due diligence engagements for our clients. It proved to be popular and continues to be one of the most viewed blog posts on our site.
We continually review and improve our models and tools, and we have made some changes to our technical due diligence question list. Some of the terminology refers to AKF models, and a quick visit to the AKF Scale Cube will help orient you for the scalability questions. Without further ado, here it is.
Scalability X Axis
1. Are load balancers or ELBs used to distribute requests across multiple endpoints?
2. Is session state stored in the browser or a separate tier?
3. Is there an appropriate separation of reads and writes using master/slave databases (or other X-axis capabilities - e.g. NoSQL Quorums)? (A minimal sketch follows this list.)
4. Are object caches utilized?
5. Does the client/target leverage edge caching (CDN/browser caching)?
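The read/write separation asked about in question 3 can be illustrated with a minimal Python sketch. The DSNs and the simple keyword-based routing below are hypothetical placeholders, not a recommended production implementation; most teams would rely on a database proxy or their ORM's routing support instead.

```python
# Minimal sketch of X-axis read/write separation: writes go to the primary,
# reads are spread across read replicas. All DSNs are hypothetical.
import itertools

PRIMARY_DSN = "postgresql://primary.db.internal:5432/app"        # handles all writes
REPLICA_DSNS = [
    "postgresql://replica-1.db.internal:5432/app",                # read-only copies
    "postgresql://replica-2.db.internal:5432/app",
]

_replica_cycle = itertools.cycle(REPLICA_DSNS)                    # naive round-robin

def dsn_for(statement: str) -> str:
    """Route a SQL statement to the primary for writes, to a replica for reads."""
    first_keyword = statement.lstrip().split(None, 1)[0].upper()
    is_write = first_keyword in {"INSERT", "UPDATE", "DELETE"}
    return PRIMARY_DSN if is_write else next(_replica_cycle)

if __name__ == "__main__":
    print(dsn_for("SELECT name FROM users WHERE id = 42"))        # -> a replica
    print(dsn_for("UPDATE users SET name = 'x' WHERE id = 42"))   # -> the primary
```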
Scalability Y Axis
1. Are services (e.g. login, signup, checkout) separated across different servers? (See the sketch after this list.)
2. Is data (e.g. login, signup, checkout data) sharded across different databases?
3. Are services sized appropriately with consideration of needs (e.g. availability, frequency of change, skill sets)?
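A minimal sketch of the service separation in question 1, assuming hypothetical per-service hostnames. In practice this routing usually lives in a load balancer or API gateway rather than in application code; the point is simply that each service can be addressed, deployed, and scaled independently.

```python
# Minimal sketch of Y-axis splitting: login, signup, and checkout are separate
# services on separate hosts. Hostnames are hypothetical placeholders.
SERVICE_ENDPOINTS = {
    "login":    "https://login.internal.example.com",
    "signup":   "https://signup.internal.example.com",
    "checkout": "https://checkout.internal.example.com",
}
DEFAULT_ENDPOINT = "https://monolith.internal.example.com"  # anything not yet split out

def endpoint_for(path: str) -> str:
    """Pick the backing service for a request path such as '/checkout/cart'."""
    service = path.strip("/").split("/", 1)[0]
    return SERVICE_ENDPOINTS.get(service, DEFAULT_ENDPOINT)

if __name__ == "__main__":
    print(endpoint_for("/checkout/cart"))   # -> checkout service
    print(endpoint_for("/login"))           # -> login service
```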
Scalability Z Axis
1. Are endpoints (web, app servers) dedicated to a subset of similar data (e.g. users, SKU, content)? (See the sketch after this list.)
2. Are databases dedicated to only a subset of similar data?
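As a minimal sketch of Z-axis splitting, the snippet below maps each user to one of several hypothetical shards by hashing the user id, so each database holds only a subset of similar data. A real deployment would also need a plan for resharding as the shard count changes.

```python
# Minimal sketch of Z-axis splitting: each shard holds only a subset of users.
# Shard names and the shard count are hypothetical.
import hashlib

SHARDS = ["users-shard-0", "users-shard-1", "users-shard-2", "users-shard-3"]

def shard_for_user(user_id: str) -> str:
    """Map a user id to a stable shard via a hash of the id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", shard_for_user(uid))
```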
Fault Isolation (Swim Lanes)
1. Are only asynchronous calls made across swim lanes (i.e. between different services)?
2. Does each service retrieve data only from the database dedicated to that service?
Disaster Recovery
1. Have SPOFs been eliminated?
2. Are delayed replica databases (or frequent, tested snapshots) used to protect from logical corruption?
3. Are active-active data centers, multiple AZs or regions utilized?
4. Are data centers (if colo or owned DCs) located in geographically low-risk areas?
Cost Effectiveness
1. Is the architecture void of stored procedures?
2. Are the web, application, and database servers on physically separate tiers or separate virtual instances?
3. Is the cloud utilized for peak or seasonal demand (if colo or owned DCs)? Or, are auto-scaling techniques in place (if cloud-hosted)?
4. Is the system void of costly 3rd party technology required to scale?
5. Are you buying small/goldfish-sized (vs. over-sized/thoroughbred) hardware?
6. Do you use virtualization, and if so, for the right reasons (headcount reduction)?
7. Would it be simple for you to switch cloud providers? Database providers? Other vendors?
8. Are data centers located in low cost areas (if colo or owned DCs)?
9. Does the client route only necessary traffic through a firewall (i.e. keep non-PII/PCI traffic out of it)?
10. Is 15 to 25% of development effort committed to paying down tech debt?
11. Is the codebase free of customer-specific code and data structures?
Process Product Management
1. Is there a product management team or person that can make decisions to add, delay, or deprecate features?
2. Does the product management team have ownership of business goals?
3. Does the team develop success criteria, OKRs, or KPIs that help to inform feature decisions?
4. Does the team use an iterative discovery process to validate market value and achieve goals?
Process PDLC
1. Does the team use success metrics to determine when a goal has been reached?
2. Does the team use any relative sizing method to estimate effort for features/story cards?
3. Does the team utilize a velocity measure to improve team time to market?
4. Does the team use burn-down charts, metrics, and retrospectives to measure the progress and performance of iterations?
5. Does the team measure engineering efficiency to identify and take action on improvement opportunities?
6. Is there a Definition of Done in place?
Process Development
1. Does the client/target use a branching approach that allows a production bug-fix branch to be easily identified and used for rapid deployment?
2. Does the client/target use a feature-branch approach? Can a single feature/engineer block a release?
3. Does the team have documented coding standards that are applied?
4. Are engineers conducting code reviews against defined standards, or is there automation in the dev pipeline that validates against those standards?
5. Is open source licensing actively managed and tracked?
6. Are engineers writing unit tests (code coverage)?
7. Is the automated testing coverage greater than 75%?
8. Does the team utilize continuous integration?
9. Is load and performance testing conducted before releasing to a significant portion of users or is the testing built into the development pipeline?
10. Does the team deploy small payloads frequently rather than large payloads infrequently?
11. Does the team utilize continuous deployment?
12. Is any containerization (e.g. Docker) and orchestration (e.g. Kubernetes) in place?
13. Are feature flags, where a feature is enabled outside of a code release, in use? (See the sketch after this list.)
14. Does the team have a mechanism that can be used to roll back (wire on/off, DDL/DML scripted and tested, additive persistence tier, no "select *")?
15. Does the client/target utilize a Joint Application Design process for large features that brings together engineering and ops for a solution or do they have experientially diverse teams?
16. Does the client/target have documented architectural principles that are followed?
17. Does the client/target utilize an Architecture Review Board where large features are reviewed to uphold architectural principles?
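Question 13 (feature flags) can be sketched as follows. The flag file path and flag name are hypothetical; many teams use a flag service or database-backed configuration instead of a local file, but the idea is the same: the flag state lives outside the code release, so a feature can be switched on or off without a deploy.

```python
# Minimal sketch of a feature flag check. The flag state is read at runtime
# from outside the release artifact; path and flag names are hypothetical.
import json
from pathlib import Path

FLAG_FILE = Path("/etc/myapp/feature_flags.json")   # e.g. {"new_checkout": true}

def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Return the current state of a flag, failing closed if the file is unreadable."""
    try:
        flags = json.loads(FLAG_FILE.read_text())
    except (OSError, json.JSONDecodeError):
        return default                               # missing or malformed file
    return bool(flags.get(flag_name, default))

if __name__ == "__main__":
    if is_enabled("new_checkout"):
        print("serving new checkout flow")
    else:
        print("serving legacy checkout flow")
```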
Process Operations
1. Is there meaningful logging done by the application, aggregated in a searchable format, regularly mined and utilized to diagnose issues?
2. Are business metrics monitored and used to determine whether a problem exists?
3. Are system level monitors and metrics used to determine where and what the problem may be?
4. Are synthetic monitors in use against your key transaction flows?
5. Are incidents centrally logged with appropriate details?
6. Are problems separated from incidents and centrally logged?
7. Is a process exercised for expediting and effectively communicating sev 1 incident resolution?
8. Are alerts sent in real time to the appropriate owners and subject matter experts for diagnosis and resolution?
9. Is there a single location where all production changes (code and infrastructure) are logged and available when diagnosing a problem?
10. Are postmortems conducted on significant problems and are actions identified and assigned and driven to completion?
11. Is availability measured in terms of true customer impact?
12. Are Quality of Service meetings held where customer complaints, incidents, SLA reports, postmortem scheduling, and other necessary information are reviewed/updated daily?
13. Do operational look-back meetings occur monthly or quarterly, where themes are identified for architectural improvements?
14. Does the client/target know how much headroom is left in the infrastructure?
15. Does the client/target know how much time is left until capacity is exceeded? (A minimal sketch of this calculation follows the list.)
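Questions 14 and 15 reduce to a simple calculation once utilization, capacity, and growth are measured. The sketch below uses hypothetical numbers and assumes linear growth, which real traffic rarely follows, but it illustrates what "headroom" and "time until capacity is exceeded" mean in practice.

```python
# Minimal sketch of headroom and time-to-capacity estimates. All numbers are
# hypothetical; a real analysis would model seasonality and growth curves.
def headroom(capacity: float, current_usage: float) -> float:
    """Remaining capacity as a fraction of total capacity."""
    return max(capacity - current_usage, 0.0) / capacity

def months_until_exhausted(capacity: float, current_usage: float,
                           monthly_growth: float) -> float:
    """Months until usage reaches capacity, assuming linear growth."""
    if monthly_growth <= 0:
        return float("inf")
    return max(capacity - current_usage, 0.0) / monthly_growth

if __name__ == "__main__":
    cap, used, growth = 10_000.0, 7_200.0, 350.0     # e.g. peak requests/sec
    print(f"headroom: {headroom(cap, used):.0%}")                                   # 28%
    print(f"months to capacity: {months_until_exhausted(cap, used, growth):.1f}")   # 8.0
```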
Organization Knowledge and Alignment
1. Do architects have both engineering/development and infrastructure experience?
2. Are teams perpetually seeded, fed, and weeded?
3. Are teams aligned with services or features that are in pods or swim lanes?
4. Are Agile teams able to act autonomously with a satisfactory TTM?
5. Are measurable business goals, OKRs and KPIs visible and commercialized with teams?
6. Are teams comprised of members with all of the skills necessary to achieve their goals?
7. Have architects designed for graceful failures by thinking about scale cube concepts?
8. Does leadership think about availability as a feature by setting aside investment for debt and scaling?
9. Does the client have a satisfactory engineer to QA tester ratio?
Security
1. Is there a set of approved and published information security policies used by the organization?
2. Has an individual who has final responsibility for information security been designated?
3. Are security responsibilities clearly defined across teams (i.e., distributed vs completely centralized)?
4. Are the organization's security objectives and goals shared and aligned across the organization?
5. Has an ongoing security awareness and training program for all employees been implemented?
6. Is a complete inventory of all data assets maintained with owners designated?
7. Has a data categorization system been established, with data classified in terms of legal/regulatory requirements (PCI, HIPAA, SOX, etc.), value, sensitivity, etc.?
8. Has an access control policy been established which allows users access only to network and network services required to perform their job duties?
9. Are the access rights of all employees and external party users to information and information processing facilities removed upon termination of their employment, contract or agreement?
10. Is multi-factor authentication used for access to systems where the confidentiality, integrity or availability of data stored has been deemed critical or essential?
11. Is access to source code restricted to only those who require access to perform their job duties?
12. Are the development and testing environments separate from the production/operational environment (i.e., they don't share servers, are on separate network segments, etc.)?
13. Are network vulnerability scans run frequently (at least quarterly) and vulnerabilities assessed and addressed based on risk to the business?
14. Are application vulnerability scans (penetration tests) run frequently (at least annually or after significant code changes) and vulnerabilities assessed and addressed based on risk to the business?
15. Are all data classified as sensitive, confidential or required by law/regulation (i.e., PCI, PHI, PII, etc.) encrypted in transit?
16. Is testing of security functionality carried out during development?
17. Are rules regarding information security included and documented in code development standards?
18. Has an incident response plan been documented and tested at least annually?
19. Are encryption controls being used in compliance with all relevant agreements, legislation and regulations? (i.e., data in use, in transit and at rest)
20. Do you have a process for ranking and prioritizing security risks?
21. Do you have an IDS (Intrusion Detection System) solution implemented?
22. Do you have an IPS (Intrusion Prevention System) solution implemented?
Why the Changes?
Technology evolves, engineers innovate, entrepreneurs create. A static checklist will not improve with age like wine. Keep an eye out for future blog posts discussing the changes we made in more detail.
Want to learn more?
Contact us; we would be happy to discuss how we have helped hundreds of clients over the years.