Case Study 03 · Engineering Risk

The incident was the warning.
The real risk was six months ahead.

A production incident on a Friday afternoon revealed that two environments with no security controls were six months away from a planned scale-up at a firm serving defence-sector clients. This case study covers how one signal was used to identify systemic risk, drive remediation through a team under delivery pressure, and move from zero controls to a treated risk environment.

11 Controls from zero
5 Material risks treated
4 Accounts downgraded
0 Plaintext credentials remaining

The Environment and Its Stakes

The role was sole Business Analyst, embedded inside a five-person product and engineering team at a cybersecurity research firm whose client base included defence-sector organisations and former military personnel. The team was building a commercial vulnerability intelligence platform while simultaneously delivering bespoke threat intelligence solutions to existing clients.

The team was operating without a dedicated security manager, without formal SDLC security processes, and under significant delivery pressure. Everyone — including non-technical stakeholders — had admin privileges on the Azure Databricks production workspace. The data pipeline and web application environments were the operational backbone of everything being built and delivered.

Six months from the point of the incident, an organisational scale-up was on the roadmap: more team members, a commercial product release, and a growing client base in regulated sectors. The conditions were set for a material security failure. What was missing was someone to see it before it happened.

⚠️
The core tension: A small, high-performing team racing against a product roadmap had accepted security debt as the cost of speed. That was a reasonable decision at formation. Six months from a planned scale-up, with defence-sector clients to serve, it was no longer reasonable. The question was not whether to act, but whether anyone would see the signal in time to act before it became a crisis.
6 months to planned team and product scale
0 formal security controls across either environment at the point of the incident

What the Incident Revealed

On a Friday afternoon, a routine data transformation was underway in Azure Databricks: ingesting the MITRE ATT&CK TAXII dataset as part of building the threat intelligence pipeline. An attempt to mount Azure Blob Storage directly onto a notebook, in preparation for the silver layer transformation, accidentally disrupted the shared production pipelines.

Pipeline failure emails started arriving immediately. The incident was escalated to the data architect on Slack, a call was made, the screen was shared, and the sequence of actions was walked through. He rolled back the change within the hour. No client data was affected. Blast radius contained.

It could have been treated as an isolated mistake. But being embedded inside the team meant seeing what had actually happened — not just the incident, but the conditions that made it possible. Everyone had admin access to production. API keys were hardcoded. Nobody was reviewing code before it went live. Six months from scaling, that was a signal, not a slip.

The incident itself was not catastrophic. What it exposed was. Without environment separation, any team member — at any privilege level — could disrupt production at any time. As the team scaled, that risk would not stay linear. It would compound.

Three categories of systemic risk, identified across both environments, needed to be addressed before the scaling milestone:

🎯
Operational and reputational risk. Defence-sector clients depended on the reliability of the data products. Any disruption — even operational rather than security-driven — risked damaging those relationships and threatening existing contracts.
🔒
Access control and privilege risk. Universal admin access across both environments meant the blast radius of any mistake — or any malicious action — was as large as it could possibly be. There was no segmentation, no least privilege, and no change gate.
⚙️
Credential and supply chain risk. API keys were hardcoded in plaintext across notebooks and configuration files. Python and JavaScript dependencies were untracked and unscanned. Either could have introduced a silent, persistent compromise.
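To make the credential risk concrete, here is a minimal sketch of how plaintext keys in notebooks or configuration files could be surfaced by a simple scan. It is an illustration only, not the tooling the team used: the two patterns, the `scan_source` function, and the sample text are all hypothetical, and a real scan would rely on a maintained rule set.

```python
import re
from typing import NamedTuple

# Hypothetical patterns for illustration only; a real scanner would use a
# maintained rule set rather than two hand-written expressions.
SECRET_PATTERNS = {
    "assigned_secret": re.compile(
        r"""(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*["']([^"']{8,})["']"""
    ),
    "azure_storage_key": re.compile(r"AccountKey=[A-Za-z0-9+/=]{20,}"),
}


class Finding(NamedTuple):
    line_number: int
    rule: str


def scan_source(text: str) -> list[Finding]:
    """Return one finding per (line, rule) that looks like a plaintext credential."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(number, rule))
    return findings


# Made-up sample content: one hardcoded key, one safe environment lookup,
# and one connection string with an embedded account key.
sample = '''
import requests
API_KEY = "sk_live_0123456789abcdef"
token = os.environ["FEED_TOKEN"]
conn = "DefaultEndpointsProtocol=https;AccountKey=abcdefghijklmnopqrstuvwxyz012345=="
'''

for finding in scan_source(sample):
    print(finding)
```

The line that reads the token from the environment is not flagged, which is the behaviour wanted once credentials live in a secret store rather than in source.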

The Approach

Rather than raising everything at once and overwhelming a team already under pressure, a deliberate prioritisation decision was made. The full picture was documented — five material risks across two environments — and recommendations were sequenced around what was most critical to address before the scaling milestone.

The three most urgent issues — no sandbox environment, no least privilege enforcement, and hardcoded credentials — were raised first. Separate conversations were had with the data architect and the product manager, explaining the risks in business terms, connecting them to the scaling timeline, and proposing that remediation be embedded directly into the sprint backlog.

1. Signal identification: Incident treated as diagnostic, not isolated mistake
2. Risk assessment: 5 material risks across 2 environments
3. Stakeholder alignment: Business case made to PM and data architect
4. Sprint integration: Remediation tracked via Jira by severity
5. Validation: Control effectiveness confirmed across both environments

Five Material Risks Across Two Environments

Risk statements follow the threat-vulnerability-consequence format aligned with ISO 27005 and NIST SP 800-30. Inherent ratings reflect the state of the environment before any controls were implemented. Each risk maps directly to the controls that treat it.

Critical
Operational disruption via uncontrolled production change
There is a risk that an authorised user exploits the absence of environment separation and change authorisation controls, resulting in unintended modification or disruption of production systems and interruption to client deliverable timelines.
CTL-DP-001 · CTL-DP-002 · CTL-DP-006
Critical
Privilege escalation via excessive access rights
There is a risk that an internal or external actor exploits the absence of least privilege enforcement across system administration roles, resulting in unauthorised access to production systems beyond the scope of their designated function.
CTL-DP-003 · CTL-DP-005 · CTL-APP-003
High
Unauthorised access via exposed credentials
There is a risk that a malicious external actor exploits API credentials stored in plaintext within source code and configuration files, resulting in unauthorised access to data pipeline infrastructure and potential exfiltration of proprietary threat intelligence data.
CTL-DP-004 · CTL-DP-007
Medium
Insider threat — unauthorised data exfiltration
There is a risk that a malicious or negligent insider exploits excessive access privileges, resulting in unauthorised exfiltration or disclosure of client personally identifiable information.
CTL-DP-003 · CTL-DP-005 · CTL-APP-003 · CTL-DP-007
Medium
Supply chain compromise via unmanaged dependencies
There is a risk that a malicious or compromised third-party software package is introduced without detection, resulting in integrity compromise of data products or unauthorised access to application infrastructure.
CTL-APP-001 · CTL-APP-002
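The risk-to-control mapping above can also be held as structured data, which makes it easy to ask the reverse question: which risks depend on a given control? The sketch below is one possible representation, not the format of the actual Confluence register. The `Risk` class and `risks_by_control` function are hypothetical names; the titles, ratings, and control IDs are copied from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    title: str
    inherent_rating: str
    controls: tuple[str, ...]


# Titles, inherent ratings, and control IDs as listed in the case study.
REGISTER = [
    Risk("Operational disruption via uncontrolled production change", "Critical",
         ("CTL-DP-001", "CTL-DP-002", "CTL-DP-006")),
    Risk("Privilege escalation via excessive access rights", "Critical",
         ("CTL-DP-003", "CTL-DP-005", "CTL-APP-003")),
    Risk("Unauthorised access via exposed credentials", "High",
         ("CTL-DP-004", "CTL-DP-007")),
    Risk("Insider threat: unauthorised data exfiltration", "Medium",
         ("CTL-DP-003", "CTL-DP-005", "CTL-APP-003", "CTL-DP-007")),
    Risk("Supply chain compromise via unmanaged dependencies", "Medium",
         ("CTL-APP-001", "CTL-APP-002")),
]


def risks_by_control(register: list[Risk]) -> dict[str, list[str]]:
    """Invert the register: map each control ID to the risk titles it treats."""
    index = defaultdict(list)
    for risk in register:
        for control in risk.controls:
            index[control].append(risk.title)
    return dict(index)


# Print each control with the number of risks that rely on it.
for control, titles in sorted(risks_by_control(REGISTER).items()):
    print(control, len(titles))
```

A control that appears against more than one risk, such as CTL-DP-003, is one whose failure would reopen several risks at once, so it is a natural candidate for closer validation.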

11 Controls Across Four Security Layers

Controls were structured across the access, application, data, and governance layers — ensuring the risk surface was addressed systematically. The balance skews preventive, reflecting the pre-scale nature of the work. Detective and governance controls were layered in to sustain the controls over time rather than treating remediation as a one-off exercise.

💡
Controls were implemented through influence without formal authority. The gaps were identified, the recommendations made, and the tracking and validation owned throughout. The data architect and engineers executed the technical implementation. That boundary is intentional and honest — it reflects how risk governance actually operates inside a small engineering team.
Data pipeline — Azure Databricks · Azure Blob Storage · GitLab
CTL-DP-001 | Preventive | Environment separation | NIST CM-2 · ISO A.8.31
CTL-DP-002 | Preventive | Change authorisation control | NIST CM-3 · ISO A.8.32
CTL-DP-003 | Preventive | Least privilege access control | NIST AC-6 · ISO A.8.2
CTL-DP-004 | Preventive | Cryptographic credential protection | NIST IA-5 · ISO A.8.24
CTL-DP-005 | Detective | Periodic access rights review | NIST AC-2 · ISO A.8.5
CTL-DP-006 | Corrective | Corrective action tracking | NIST CA-5 · ISO A.5.36
CTL-DP-007 | Governance | Risk treatment documentation | NIST RA-3 · ISO A.5.8
Web application — GitHub · Supabase · Visual Studio Code
CTL-APP-001 | Detective | Software component vulnerability management | NIST RA-5 · ISO A.8.8
CTL-APP-002 | Detective | Front-end dependency vulnerability assessment | NIST SI-2 · ISO A.8.8
CTL-APP-003 | Preventive | Data store access control | NIST AC-3 · ISO A.8.3
CTL-APP-004 | Preventive | Credential storage security | NIST IA-5(1) · ISO A.8.5
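Environment separation (CTL-DP-001) was delivered here as a separate sandbox workspace. As a complementary illustration of the same idea at code level, the sketch below shows a guard that refuses to run an experimental operation outside the sandbox. This is not something the case study describes the team building: the `PIPELINE_ENV` variable, the `sandbox_only` decorator, and the `mount_storage` placeholder are all assumptions made for the example.

```python
import functools
import os

ENV_VAR = "PIPELINE_ENV"  # hypothetical variable name, not from the case study


def sandbox_only(func):
    """Refuse to run the wrapped operation unless the environment is the sandbox."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Fail closed: if the variable is missing, assume production.
        environment = os.environ.get(ENV_VAR, "production")
        if environment != "sandbox":
            raise PermissionError(
                f"{func.__name__} is blocked in '{environment}'; run it in the sandbox"
            )
        return func(*args, **kwargs)

    return wrapper


@sandbox_only
def mount_storage(container: str) -> str:
    # Placeholder for an experimental storage mount; no real mount happens here.
    return f"mounted {container}"
```

Defaulting to production when the variable is unset is a deliberate choice in this sketch: a misconfigured session is then blocked rather than silently trusted.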

From Zero Controls to a Treated Risk Environment

Dimension | Before | After
Data pipeline — Azure Databricks · Azure Blob Storage · GitLab
Environment | Single production workspace. All development, testing, and deployment in one environment with no separation. | Sandbox workspace mirroring production. All development isolated before promotion to production.
Change control | No review process. Code pushed directly to production by any team member at any time. | GitLab merge request workflow. Senior review and approval required before every deployment.
Access rights | Universal admin across all users, including stakeholders who only needed read access. | Least privilege enforced. 4 accounts downgraded to role-appropriate permissions.
Credentials | API keys hardcoded in plaintext inside notebooks and configuration files. Visible to all team members. | All credentials migrated to Azure Key Vault. Zero plaintext credentials remaining.
Dependencies | No tracking or scanning of Python packages. Vulnerable dependencies undetected. | Dependabot enabled on GitHub. Known vulnerabilities in dependencies flagged automatically.
Web application — GitHub · Supabase · Visual Studio Code
Access control | Supabase permissions unreviewed. No row-level security enforced on client data. | Permissions reviewed. Row-level security implemented restricting data by authenticated user context.
Credential protection | Password storage practices unverified. Risk of plaintext credential exposure in the event of a breach. | Passwords stored with cryptographic hashing and salting. Plaintext exposure eliminated.
JS dependencies | No visibility over vulnerabilities in JavaScript packages used in the front-end application. | npm audit reports generated and reviewed. Findings triaged for remediation priority.
Governance — across both environments
Risk visibility | No documented risk register. Material risks unquantified, untracked, and unowned. | 5 material risks documented with inherent and residual ratings. Confluence risk register maintained.
Remediation tracking | No formal tracking of control gaps or remediation ownership. No sprint integration. | Jira tickets for every gap. Ownership assigned, sprint-prioritised, validated on closure.
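The credential protection entry in the table refers to salted password hashing. The case study does not say which algorithm the platform applies, so the sketch below simply illustrates the general technique using PBKDF2 from the Python standard library. The function names and the iteration count are assumptions chosen for the example, not values taken from the environment described here.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor for this sketch only


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a digest from the password using a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

Because each password gets its own random salt, two users with the same password end up with different stored digests, and only the salt and digest ever need to be stored.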

What I Would Do Differently at Scale

This work was self-directed, delivered without a senior security mentor, and built proportionately for a team of five under delivery pressure. The following is an honest assessment of the gaps and what a more mature organisation would do differently — alongside a distillation of how the work was driven without formal authority.

How this was driven without formal authority
1. Identified systemic risks from a single operational incident; nobody asked to look further
2. Sequenced recommendations deliberately to avoid overwhelming a team under delivery pressure
3. Secured buy-in from the product manager and data architect to get remediation into the sprint
4. Owned Jira tracking and Confluence documentation throughout, keeping risk visible and accountable continuously
5. Validated control effectiveness across both environments post-implementation
What I would do differently at scale
1. Measure control effectiveness, not just remediation closure: track whether controls are working through access drift detection, change authorisation rates, and credential hygiene monitoring
2. Evaluate scope boundaries explicitly: document what is in and out of scope and justify every exclusion rather than allowing gaps to emerge by default
3. Understand the organisation's biggest manual burden before recommending tooling: build the business case for automation at the point where manual processes no longer scale
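As one example of what measuring control effectiveness could look like, the sketch below checks for access drift by comparing each user's granted permissions against a baseline for their role. The role names, permission labels, user names, and the `find_access_drift` function are all hypothetical; a real check would read assignments from the platform's own access APIs.

```python
# Hypothetical role baseline: the permissions each role is expected to hold.
ROLE_BASELINE = {
    "data_architect": {"read", "write", "admin"},
    "engineer": {"read", "write"},
    "stakeholder": {"read"},
}


def find_access_drift(assignments: dict[str, tuple[str, set[str]]]) -> dict[str, list[str]]:
    """Return, per user, any granted permissions beyond the role baseline."""
    drift = {}
    for user, (role, granted) in assignments.items():
        allowed = ROLE_BASELINE.get(role, set())  # unknown roles get no baseline
        excess = sorted(set(granted) - allowed)
        if excess:
            drift[user] = excess
    return drift


# Made-up snapshot of current assignments.
current = {
    "user_a": ("data_architect", {"read", "write", "admin"}),
    "user_b": ("engineer", {"read", "write"}),
    "user_c": ("stakeholder", {"read", "write", "admin"}),
}

print(find_access_drift(current))  # {'user_c': ['admin', 'write']}
```

Run on a schedule, a check like this turns the periodic access rights review into a measurable signal rather than a one-off remediation task.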

Technical Stack

Data pipeline
Azure Databricks · Azure Blob Storage · Azure Key Vault · GitLab · Power BI · Python · SQL
Web application
GitHub · Supabase · Visual Studio Code · Dependabot · npm audit
Governance & tracking
Jira · Confluence
Standards & frameworks
NIST SP 800-30 · NIST SP 800-53 · ISO 27001 · ISO 27005 · MITRE ATT&CK