Case Study 01 · OT Vulnerability Intelligence

Most vulnerability reports tell you everything is critical. Yours probably does too.

270 CVEs were identified across a legacy OT environment, and CVSS and EPSS scoring flagged the majority of them as critical. The V-Score, built on asset context and live threat intelligence, brought that number down to 22. That gap tells the real story. Raw severity scores applied without environmental context do not tell you what is genuinely dangerous in your specific environment. They produce a list that looks thorough but obscures the decisions that actually matter, and in a resource-constrained OT setting, that distinction has real operational consequences.

270 CVEs identified
22 Material findings
92% Noise reduction

The Project and Its Purpose

The firm had developed a novel vulnerability scoring system — the V-Score — that improved on CVSS and EPSS by incorporating real-time threat intelligence from researcher forums and social media feeds. Unlike static severity scores, the V-Score adjusted dynamically as the threat landscape changed, reflecting actual threat actor behaviour rather than theoretical vulnerability severity.

The V-Score was already in use with defence-sector clients. The next step was academic validation: a rigorous, independent demonstration of its effectiveness against a real environment. The firm approached a UK research university, which accepted the proposal and made available its OT security laboratory, a lab-scale water treatment facility used by PhD researchers specialising in critical systems security.

⚙️
The core question: When the same set of vulnerabilities is scored by CVSS, EPSS, and the V-Score, does the prioritisation differ — and if so, by how much? The answer would determine whether dynamic, threat-intelligence-driven scoring represented a material improvement over the static severity ratings that most vulnerability management programmes relied on.

The role, as the sole Business Analyst embedded in the R&D team, was to build the solution that would apply the V-Score to a real OT asset inventory and make the comparison visible to external academic stakeholders. The V-Score methodology itself was developed and validated separately by the data engineering team.

One further note on scope. The project was designed and delivered for a research-based OT environment, and it delivered what that context required. The stakeholders were satisfied with the output. But a contained laboratory setting and a live operational facility are not the same problem, and treating the methodology as directly transferable would be a mistake. The section "What Would Change in a Live OT Environment" reflects on what would need to change in a live environment, not because the project fell short, but because understanding the boundary conditions of your own work is part of doing it properly.

The Problem with Static Scoring

CVSS assigns a severity score when a vulnerability is disclosed. It measures theoretical severity (attack complexity, privileges required, potential impact) without asking whether any threat actor is actively pursuing it, or whether the asset it sits on matters to the organisation. The published score stays fixed unless whoever assigned it, typically the vendor or the NVD, revises it.

This creates a structural problem for any team trying to manage vulnerability risk in practice.

Severity is not risk. A CVE scoring 10.0 on CVSS may pose minimal actual risk if no threat actor is actively exploiting it.

The core flaw in static scoring

Treating severity as a proxy for risk produces backlogs that are unmanageable and remediation decisions that are systematically wrong. A vulnerability with a maximum CVSS score and zero active exploitation carries less real-world risk than one with a moderate score that a ransomware group picked up last week.

In the water treatment laboratory, this problem was immediately visible. With 270 CVEs mapped across 17 legacy software components, many of them unpatched because the researchers could not update them without disrupting ongoing work, a CVSS-driven approach would have generated a critical backlog that no small team could meaningfully act on.

💡
The V-Score proposition: By incorporating real-time signals about actual threat actor behaviour — forum discussions, proof-of-concept publications, active exploitation activity — the V-Score could distinguish between vulnerabilities that were theoretically severe and those that were genuinely material to the environment right now. The same vulnerability could score low on Monday and high by Friday if a ransomware group picked it up overnight.

The Approach

The work was structured across three sequential phases, preceded by a stakeholder workshop that scoped the work to the laboratory's human-machine interface (HMI): building a reliable asset inventory, constructing the data model and pipeline, and delivering a vulnerability intelligence solution in Power BI that enabled direct scoring comparison.

1. Stakeholder Workshop: UK research university, HMI scoped
2. Asset Inventory: 17 components, assumptions documented
3. Data Modelling: CPE to CVE to CVSS / EPSS / V-Score
4. Pipeline Build: Medallion architecture, Azure Databricks
5. Dashboard Delivery: Power BI, DAX measures, scoring comparison

The First Problem: Incomplete Data

The human-machine interface (HMI) hosted between 15 and 20 software applications (17 in the final inventory), including Windows CE, Siemens, and Oracle tools alongside OT-specific components. Many had incomplete asset information: missing version numbers, missing brand names, or both. Without accurate asset data, CPE IDs could not be assigned, and without CPE IDs, the vulnerability matching that underpinned the entire comparison would be unreliable.

This was not just a data quality problem. It was a methodological risk: if the foundation was wrong, the 92% finding would be meaningless. Before writing a single line of code, the asset inventory problem was resolved through two agreed and documented assumptions.

A1. Latest CPE ID as fallback. Where the software brand and name were known but the version was missing or unknown, the latest available CPE ID was used as a substitute. This assumption carried lower risk for OT-specific tools, which had sparse CVE coverage regardless of version. The methodological risk was concentrated in well-known components (Windows CE, Siemens, Oracle) where version mattered more and CVE coverage was dense.
A2. Null record for unidentifiable components. Where there was insufficient information to identify a CPE ID at all, the asset was recorded as null, preserving visibility without fabricating accuracy. Stakeholders retained the ability to update these records if version or brand information was later identified. Accuracy was not assumed; absence was made explicit.
📋
Both assumptions were agreed with stakeholders and documented with justifications before the data pipeline was built. The scoring comparison was only credible if the asset inventory it rested on was methodologically defensible. Getting the foundation right was not a preliminary step — it was the most important risk management decision of the project.
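
The two assumptions can be sketched in Python as a small CPE assignment routine. This is an illustration only, not the project's code: the `Asset` fields, the `KNOWN_CPES` lookup, the vendor and product names, and the `assign_cpe` helper are all hypothetical, and a real lookup would draw on the NVD CPE dictionary. The sketch also assumes that a stated version with no dictionary entry is treated like a missing version.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical CPE dictionary extract: (vendor, product) -> CPE IDs, oldest first.
KNOWN_CPES = {
    ("examplevendor", "examplehmi"): [
        "cpe:2.3:a:examplevendor:examplehmi:1.0:*:*:*:*:*:*:*",
        "cpe:2.3:a:examplevendor:examplehmi:2.0:*:*:*:*:*:*:*",
    ],
}


@dataclass
class Asset:
    name: str
    vendor: Optional[str] = None
    product: Optional[str] = None
    version: Optional[str] = None


def assign_cpe(asset: Asset) -> Tuple[Optional[str], str]:
    """Return (cpe_id, basis) so every assignment records why it was made."""
    if not asset.vendor or not asset.product:
        return None, "A2: insufficient information, recorded as null"
    key = (asset.vendor.strip().lower(), asset.product.strip().lower())
    candidates = KNOWN_CPES.get(key, [])
    if not candidates:
        return None, "A2: no CPE ID identifiable, recorded as null"
    if asset.version:
        for cpe in candidates:
            # Index 5 of the colon-separated CPE 2.3 string is the version.
            if cpe.split(":")[5] == asset.version.strip():
                return cpe, "exact version match"
    # Version missing or unmatched: fall back to the latest known CPE ID.
    return candidates[-1], "A1: latest CPE ID used as fallback"


inventory = [
    Asset("HMI runtime", "ExampleVendor", "ExampleHMI", "1.0"),
    Asset("HMI runtime (version unknown)", "ExampleVendor", "ExampleHMI"),
    Asset("Unlabelled utility"),
]
for asset in inventory:
    print(asset.name, assign_cpe(asset))
```

Returning the basis alongside the CPE ID keeps each assumption auditable per record, which matters when stakeholders later supply the missing version or brand.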

Medallion Architecture in Azure Databricks

The asset inventory was stored in Azure Blob Storage and processed through a three-layer medallion architecture in Azure Databricks using Python and SQL. The data model was designed and validated by the data architect before implementation began.

Bronze layer — raw ingestion

Raw asset inventory data ingested as-is from Azure Blob Storage, alongside pre-processed CVSS, EPSS, and V-Score datasets maintained by the data engineering team. No transformation at this stage — the bronze layer preserves the source data in its original state.

Silver layer — transformation and cleansing

Asset inventory data cleansed and transformed in line with the two documented assumptions:

  • Null handling for components with insufficient identification data
  • Latest CPE ID mapping applied where brand and software name were known
  • Duplicate removal and standardisation across all asset records

The silver layer produced a clean, consistent asset inventory with explicit records for every component on the HMI — including those with incomplete data, which remained visible rather than being discarded.
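
A minimal sketch of the silver-layer cleansing described above, written in plain Python so it runs anywhere. The project itself used Azure Databricks, so treat this as an analogy, not the actual notebook code. The field names (`asset_name`, `vendor`, `product`, `version`), the `identified` flag, and the helper names are assumptions made for illustration.

```python
FIELDS = ("asset_name", "vendor", "product", "version")


def standardise(record):
    """Trim whitespace, lower-case text fields, and turn blanks into None."""
    cleaned = {}
    for field in FIELDS:
        value = (record.get(field) or "").strip().lower()
        cleaned[field] = value or None
    # Flag whether enough is known to attempt a CPE ID mapping later.
    cleaned["identified"] = bool(cleaned["vendor"] and cleaned["product"])
    return cleaned


def to_silver(bronze_rows):
    """Standardise every row, then drop exact duplicates, keeping first seen."""
    seen = set()
    silver = []
    for row in bronze_rows:
        record = standardise(row)
        key = tuple(record[field] for field in FIELDS)
        if key in seen:
            continue
        seen.add(key)
        silver.append(record)
    return silver


bronze = [
    {"asset_name": "HMI Runtime ", "vendor": "ExampleVendor",
     "product": "ExampleHMI", "version": "2.0"},
    {"asset_name": "hmi runtime", "vendor": " examplevendor",
     "product": "examplehmi", "version": "2.0"},
    {"asset_name": "Legacy logger", "vendor": "", "product": None,
     "version": None},
]
silver = to_silver(bronze)
for record in silver:
    print(record)
```

The incomplete record survives cleansing with `identified` set to `False`, mirroring the decision to keep unidentifiable components visible instead of discarding them.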

Gold layer — joins and views

SQL scripts written to join the asset inventory to the vulnerability, CVSS, EPSS, and V-Score tables, producing a single denormalised gold view for Power BI to consume.

  • Asset inventory to Vulnerability table on CPE ID
  • Vulnerability table to CVSS table on CVE ID
  • Vulnerability table to EPSS table on CVE ID
  • Vulnerability table to V-Score table on CVE ID

The vulnerability table acted as the hub — it held both CPE ID and CVE ID, making the entire model joinable across all scoring dimensions from a single asset record.
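
The join path can be illustrated with an in-memory SQLite database driven from Python. This is a sketch under stated assumptions: the table and column names, the placeholder CPE and CVE identifiers, and every score value (including the V-Score scale) are invented for the example, SQLite stands in for Databricks SQL, and the use of `LEFT JOIN` to keep unidentified assets in the view is a design assumption, not something the source specifies.

```python
import sqlite3

CPE = "cpe:2.3:a:examplevendor:examplehmi:2.0:*:*:*:*:*:*:*"  # placeholder

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset_inventory (asset_name TEXT, cpe_id TEXT);
CREATE TABLE vulnerability (cpe_id TEXT, cve_id TEXT);
CREATE TABLE cvss (cve_id TEXT, cvss_score REAL);
CREATE TABLE epss (cve_id TEXT, epss_score REAL);
CREATE TABLE v_score (cve_id TEXT, v_score REAL);
""")

# Placeholder rows only; none of these identifiers or scores are real.
conn.executemany("INSERT INTO asset_inventory VALUES (?, ?)",
                 [("example hmi runtime", CPE), ("unidentified logger", None)])
conn.executemany("INSERT INTO vulnerability VALUES (?, ?)",
                 [(CPE, "CVE-0000-0001"), (CPE, "CVE-0000-0002")])
conn.executemany("INSERT INTO cvss VALUES (?, ?)",
                 [("CVE-0000-0001", 9.8), ("CVE-0000-0002", 9.1)])
conn.executemany("INSERT INTO epss VALUES (?, ?)",
                 [("CVE-0000-0001", 0.02), ("CVE-0000-0002", 0.61)])
conn.executemany("INSERT INTO v_score VALUES (?, ?)",
                 [("CVE-0000-0001", 1.5), ("CVE-0000-0002", 8.7)])

# The vulnerability table is the hub: CPE ID on one side, CVE ID on the other.
conn.execute("""
CREATE VIEW gold_vulnerability AS
SELECT a.asset_name, a.cpe_id, v.cve_id,
       c.cvss_score, e.epss_score, s.v_score
FROM asset_inventory AS a
LEFT JOIN vulnerability AS v ON v.cpe_id = a.cpe_id
LEFT JOIN cvss AS c ON c.cve_id = v.cve_id
LEFT JOIN epss AS e ON e.cve_id = v.cve_id
LEFT JOIN v_score AS s ON s.cve_id = v.cve_id
""")

rows = conn.execute(
    "SELECT * FROM gold_vulnerability ORDER BY asset_name, cve_id"
).fetchall()
for row in rows:
    print(row)
```

Because a null CPE ID never matches in a join condition, the unidentified asset appears once in the view with empty vulnerability and score columns, so its absence of data stays visible to the reporting layer.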

Layer | Purpose | Tools
Bronze | Raw ingestion: asset inventory and scoring datasets | Azure Blob Storage
Silver | Cleansing, null handling, CPE ID mapping, deduplication | Azure Databricks, Python, SQL
Gold | Joined denormalised view across all scoring dimensions | Azure Databricks, SQL
Reporting | Vulnerability intelligence dashboard, DAX measures, scoring comparison | Power BI

The Scoring Comparison

DAX measures were written in Power BI to calculate the percentage of vulnerabilities flagged as critical by each scoring system across the full 270-CVE dataset. The comparison produced a striking result.

CVSS: ~92% flagged as critical. Static severity-based scoring. Score assigned at disclosure, does not reflect current threat actor activity. Theoretically useful; operationally overwhelming.

EPSS: ~92% flagged as critical. Probabilistic exploitability scoring. Improves on CVSS but does not capture live threat actor signals or active exploitation context in real time.

V-Score: 8% flagged as critical. Dynamic threat-intelligence-driven scoring. Incorporates real-time signals from researcher forums and social media. Adjusts continuously as the threat landscape changes.
The finding: Of 270 CVEs mapped across 17 legacy OT software components, the V-Score identified 22 vulnerabilities of genuine material concern — those with active threat actor signals in the real world. CVSS and EPSS flagged the same dataset as overwhelmingly critical, producing an unmanageable backlog with no actionable prioritisation signal. The difference was not marginal. It was the difference between a list no team could act on and a list that enabled informed attack surface reduction decisions.
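
The arithmetic behind the headline figures, and the kind of per-system percentage the DAX measures calculated, can be sketched in Python. The functions and sample records are hypothetical. The CVSS cut-off of 9.0 is the standard lower bound of the Critical band in CVSS v3.x, while the EPSS and V-Score cut-offs are placeholders, because the source does not state which thresholds were used.

```python
# CVSS v3.x rates 9.0 to 10.0 as Critical. The EPSS and V-Score cut-offs
# are hypothetical placeholders, not the thresholds used in the project.
THRESHOLDS = {"cvss": 9.0, "epss": 0.5, "v_score": 7.0}


def pct_flagged(records, system):
    """Percentage of records at or above the cut-off for one scoring system."""
    flagged = sum(1 for record in records if record[system] >= THRESHOLDS[system])
    return 100.0 * flagged / len(records)


def noise_reduction(total, material):
    """Percentage of findings removed when only material ones are kept."""
    return 100.0 * (total - material) / total


# Invented sample rows, shaped like the gold view, for illustration only.
sample = [
    {"cve_id": "CVE-0000-0001", "cvss": 9.8, "epss": 0.02, "v_score": 1.5},
    {"cve_id": "CVE-0000-0002", "cvss": 9.1, "epss": 0.61, "v_score": 8.7},
    {"cve_id": "CVE-0000-0003", "cvss": 9.4, "epss": 0.70, "v_score": 2.0},
    {"cve_id": "CVE-0000-0004", "cvss": 7.5, "epss": 0.01, "v_score": 0.5},
]
for system in THRESHOLDS:
    print(system, pct_flagged(sample, system))

# Figures from the case study: 270 CVEs, 22 material findings.
print(round(noise_reduction(270, 22)))   # 92
print(round(100.0 * 22 / 270))           # 8
```

With the case study's own numbers, 248 of 270 CVEs drop out of the priority list, which rounds to the 92% noise reduction quoted, and the remaining 22 make up the 8% the V-Score flagged.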

Why This Is a Risk Problem, Not Just a Vulnerability Problem

Vulnerability management is often treated as a technical function — scan, score, patch. But the prioritisation of vulnerabilities is fundamentally a risk decision, and risk decisions require a different kind of reasoning than severity scoring alone.

Business Risk
What is the organisation trying to protect, and what happens if it fails?

Revenue, reputation, regulatory standing, operational continuity. This is the language of the board and the C-suite. Risk at this level is expressed in financial and operational terms, not CVE IDs.

Asset
What is the criticality of the asset to the business?

An asset has value because it supports a process, stores data, or enables a service. The criticality of the asset determines how much the organisation should invest in protecting it. Not all assets are equal.

Attack Surface
What are the ways a threat actor could compromise this asset?

Software vulnerabilities are one component — alongside misconfigurations, weak access controls, insecure network exposure, and human factors. CVEs are a subset of the attack surface, not the whole picture.

Threat
Who is interested in this asset, and are they active right now?

A vulnerability without a threat actor interested in exploiting it carries limited risk, regardless of its CVSS score. The V-Score reintroduced the threat dimension that CVSS systematically excluded.

The translation gap between vulnerability data and risk intelligence sits between the attack surface and asset levels. It is where technical findings either become actionable business decisions or get lost in noise. That translation — from 270 CVEs to 22 material vulnerabilities — was what the dashboard delivered. Not a technical output. A risk decision-support tool.

🔁
The continuous monitoring principle: A risk register reviewed annually operates on a false assumption — that the risk landscape documented twelve months ago is the one the organisation is living in today. It is not. The same vulnerability can be low priority on Monday and critical by Friday. The V-Score project demonstrated that dynamic, threat-intelligence-driven scoring is not just more accurate than static scoring — it is the mechanism that makes continuous risk management operationally possible.
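
One way to picture the Monday-to-Friday shift is to compare two score snapshots and surface only the CVEs that have newly crossed a materiality threshold. The sketch below is hypothetical throughout: the threshold value, the score scale, the CVE identifiers, and the `newly_material` helper are assumptions, and it says nothing about how the V-Score itself is computed.

```python
MATERIAL_THRESHOLD = 7.0  # hypothetical cut-off on an assumed score scale


def newly_material(previous, current, threshold=MATERIAL_THRESHOLD):
    """CVE IDs at or above the threshold now that were below it before.

    A CVE missing from the previous snapshot is treated as previously 0.0.
    """
    return sorted(
        cve_id
        for cve_id, score in current.items()
        if score >= threshold and previous.get(cve_id, 0.0) < threshold
    )


# Invented snapshots keyed by placeholder CVE IDs.
monday = {"CVE-0000-0001": 1.5, "CVE-0000-0002": 8.7}
friday = {"CVE-0000-0001": 9.2, "CVE-0000-0002": 8.7, "CVE-0000-0003": 2.0}

print(newly_material(monday, friday))  # ['CVE-0000-0001']
```

Reporting only the changes, instead of the full list each time, is what keeps a frequently refreshed score usable by a small team.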

What Would Change in a Live OT Environment

The approach was appropriate for a contained laboratory proof of concept with a clearly defined scope and academic stakeholders. In a live operational environment, four things would need to change.

1. Asset discovery would need to be automated. Manual inventory building is not viable at the scale of an operational facility. OT-specific tooling, such as Claroty or Dragos, monitors the network and enumerates assets automatically, removing the dependence on human-maintained records that may be incomplete or outdated. In a live environment with interconnected systems, many of which may not have been updated or documented in years, passive discovery is not a convenience. It is a prerequisite for any credible risk picture.
2. Unknown assets become risk findings, not data gaps. In a live environment, an unidentified software component is not a null record: it is shadow OT. Unknown assets in operational environments are among the most common attack vectors, exploited precisely because defenders do not know they exist. They require escalation, ownership, and tracked resolution, not a placeholder in a spreadsheet.
3. The version assumption would require formal engineer sign-off. Using the latest CPE ID as a fallback is defensible in a lab with sparse CVE coverage. In a live facility, where a Siemens SCADA component version mismatch could mean the difference between a genuine critical and a false one with real operational consequences, every assumption requires formal validation by the OT engineers or systems integrators who built and maintain the environment.
4. Power BI would not be sufficient for data volumes at scale. The reporting layer would need to be a cloud-integrated platform designed for continuous ingestion and refresh at operational scale, with detailed discussions required on feasibility, integration, stakeholder ownership, and the network connectivity implications of connecting OT systems to a cloud environment.
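
The second change, treating unknown assets as findings instead of null records, could look something like the following. This is a hypothetical sketch: the `Finding` fields, the status values, the owner label, and the `escalate_unknown_assets` helper are invented to show the shape of the idea, not taken from any tool or from the project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Finding:
    asset_name: str
    category: str
    owner: str
    status: str
    raised_on: date


def escalate_unknown_assets(inventory, owner, today):
    """Raise one open, owned finding for every asset without a CPE ID."""
    return [
        Finding(asset["asset_name"], "unidentified asset", owner, "open", today)
        for asset in inventory
        if asset.get("cpe_id") is None
    ]


# Invented inventory rows with a placeholder CPE ID.
inventory = [
    {"asset_name": "example hmi runtime",
     "cpe_id": "cpe:2.3:a:examplevendor:examplehmi:2.0:*:*:*:*:*:*:*"},
    {"asset_name": "unidentified logger", "cpe_id": None},
]
findings = escalate_unknown_assets(inventory, "OT engineering lead", date(2025, 1, 6))
for finding in findings:
    print(finding)
```

Giving each unknown asset an owner and a status is what turns a gap in the data into something that can be tracked to resolution.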

Academic Validation and Publication

The university stakeholders — including the head of computing, the OT security professor, and two PhD researchers — were impressed by both the V-Score comparison and the threat landscape extension built at the CTO's request, which visualised APT groups targeting water sector facilities using MITRE ATT&CK data and a Microsoft researcher-maintained APT dataset.

The partnership continued beyond the initial workshop. The work contributed directly to a peer-reviewed publication validating the V-Score methodology:

📄
Publication: Renney, H., Chenchiah, I.V., Nethercott, M., Paligadu, R., & Lang, J. An Open and Adaptable Approach to Vulnerability Risk Scoring. Journal of Cyber Security, Vol. 7, No. 1, 2025. doi.org/10.32604/jcs.2025.064958

The data architect continues to use the project as a case study in procurement conversations. Findings ownership sat with the head of computing at the university. Remediation was out of scope — the project delivered the threat context needed for informed attack surface reduction decisions, not a remediation plan.

Technical Stack

🛠
Data engineering: Azure Blob Storage (raw ingestion), Azure Databricks (medallion architecture, bronze/silver/gold), Python (transformation scripts), SQL (gold layer joins and views).

Reporting: Power BI (vulnerability intelligence dashboard, DAX measures, scoring comparison visualisations, threat landscape page).

Threat intelligence: MITRE ATT&CK (APT data), Microsoft researcher-maintained APT dataset (campaigns, TTPs, sectors, countries of origin).

Standards & frameworks: CVSS, EPSS, V-Score (proprietary scoring methodology), CPE / NVD, MITRE ATT&CK, CAPEC.