# Tool design

## Concepts

The sbom-cve-check tool is built around three types of concepts:
 - [Database](database.md)
 - [SBOM](sbom.md)
 - [Export](export.md)

For each concept, [plugins](plugins.md) can be registered to add custom
functionality.

### CVE database

A {term}`CVE database` contains {term}`CVE` entries. An {term}`annotation`
entry is a special kind of CVE entry, which is contained in an
{term}`Annotation database`.

Each database class is automatically registered into a registry, thanks to the
`@register_vuln_db('...')` decorator. The `register_vuln_db` function parameter
is the type name to register. The list of built-in CVE database type names is
provided in the [CVE database](database.md#database-type) section.

The diagram below shows the class diagram associated with CVE databases.

:::{note}
Not all classes are listed, only the most important ones.
:::

```mermaid
classDiagram

    class VulnDatabase

    class GitVulnDatabase
    VulnDatabase <|-- GitVulnDatabase

    class CveListVulnEntry
    class CveListVulnDatabase
    GitVulnDatabase <|-- CveListVulnDatabase
    CveListVulnDatabase o-- CveListVulnEntry

    class NvdVulnEntry
    class NvdFkieVulnDatabase
    GitVulnDatabase <|-- NvdFkieVulnDatabase
    NvdFkieVulnDatabase o-- NvdVulnEntry

    class AnnotDatabase
    VulnDatabase <|-- AnnotDatabase
    class GitAnnotDatabase
    AnnotDatabase <|-- GitAnnotDatabase

    class OpenVexAnnotEntry
    class OpenVexAnnotDatabase
    GitAnnotDatabase <|-- OpenVexAnnotDatabase
    OpenVexAnnotDatabase o-- OpenVexAnnotEntry


    class Spdx3AnnotEntry
    class Spdx3AnnotDatabase
    AnnotDatabase <|-- Spdx3AnnotDatabase
    Spdx3AnnotDatabase o-- Spdx3AnnotEntry

    class YoctoAnnotEntry
    class YoctoAnnotDatabase
    AnnotDatabase <|-- YoctoAnnotDatabase
    YoctoAnnotDatabase o-- YoctoAnnotEntry

    class VulnDbEntry
    class VulnRecordDbEntry
    VulnDbEntry <|-- VulnRecordDbEntry
    VulnRecordDbEntry <|-- CveListVulnEntry
    VulnRecordDbEntry <|-- NvdVulnEntry

    class AnnotDbEntry
    VulnDbEntry <|-- AnnotDbEntry
    AnnotDbEntry <|-- OpenVexAnnotEntry
    AnnotDbEntry <|-- Spdx3AnnotEntry
    AnnotDbEntry <|-- YoctoAnnotEntry

    class GitDatabase
    GitAnnotDatabase o-- GitDatabase
    GitVulnDatabase o-- GitDatabase
```

To speed up the lookup of CVEs that are associated with a component, the various
databases are indexed. For each database, an index is created, which maps a
CPE product name (and only that part of the CPE) to the list of potentially
associated CVE identifiers.
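
As an illustration, such an index could be built as in the sketch below. The
`build_index` helper and its input shape are hypothetical, the CVE identifiers
are placeholders, and the naive `split(":")` assumes CPE 2.3 strings that
contain no escaped colons.

```python
from collections import defaultdict


def build_index(entries):
    """Map each CPE product name to the CVE identifiers that mention it.

    entries: iterable of (cve_id, list of CPE 2.3 strings).
    """
    index = defaultdict(set)
    for cve_id, cpes in entries:
        for cpe in cpes:
            # CPE 2.3 layout: cpe:2.3:<part>:<vendor>:<product>:<version>:...
            fields = cpe.split(":")
            if len(fields) > 4:
                # Only the product name is used as key; the vendor is ignored.
                index[fields[4]].add(cve_id)
    return {product: sorted(ids) for product, ids in index.items()}


index = build_index([
    ("CVE-0000-0001", ["cpe:2.3:a:openssl:openssl:3.0.0:*:*:*:*:*:*:*"]),
    ("CVE-0000-0002", ["cpe:2.3:a:other:openssl:*:*:*:*:*:*:*:*"]),
])
```

Because the vendor is not part of the key, both placeholder entries end up
under `openssl`, which is why the result is only a list of *potentially*
associated CVEs.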

For git databases (`GitVulnDatabase` or `GitAnnotDatabase`), the index of the
database is stored to disk at the root of the git tree with the following name:
`.sbom-cve-check-cache-index.json`. The index is stored to disk because indexing
the database takes time. So that a stale cache file can be detected and
invalidated, the file contains the associated commit hash identifier and the
hash of the relevant configuration values.
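
A cache validity check along these lines might look like the following sketch.
The `commit` and `config_hash` field names, the use of SHA-256, and both
helper functions are assumptions; the actual layout of the cache file is not
described here.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Hash configuration values in a stable (key-sorted) form."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_cache_valid(cache: dict, head_commit: str, config: dict) -> bool:
    """The cache is reusable only if both the commit and the config match."""
    return (
        cache.get("commit") == head_commit
        and cache.get("config_hash") == config_digest(config)
    )


config = {"some_option": True}  # hypothetical configuration values
cache = {"commit": "abc123", "config_hash": config_digest(config), "index": {}}
```

Any new commit in the git tree or any change to a hashed configuration value
makes the check fail, which forces the index to be rebuilt.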

The various database indexes are merged into a unique index stored in RAM.

The logic to obtain the applicable CVEs for a {term}`component` identifier is
described in the [section below](#find-applicable-cve).

### SBOM

A {term}`SBOM` provides, among other things, the components contained in
the image deployed to the device.

Each SBOM class is automatically registered into a registry, thanks to the
`@register_sbom('...')` decorator. The `register_sbom` function parameter
is the type name to register. The list of built-in SBOM type names is provided
in the [SBOM supported formats](sbom.md#supported-formats) section.

The diagram below shows the class diagram associated with SBOMs:

```mermaid
classDiagram
    class Sbom {
        +Component components
        +components_grouped_by_id()
        +write_to_file()
    }

    class Component {
        +name
        +version
        +identifiers
        +description
        +compiled_sources
    }

    class Spdx3Sbom
    class Spdx3Component

    Sbom <|-- Spdx3Sbom
    Sbom o-- Component
    Spdx3Sbom o-- Spdx3Component
```

### Export

The tool provides multiple kinds of export types. The list of built-in export
type names is provided in the [export formats](export.md#export-formats)
section.

Each export class is automatically registered into a registry, thanks to the
`@register_export('...')` decorator. The `register_export` function parameter
is the type name to register.

The diagram below shows the class diagram associated with export types:

```mermaid
classDiagram
    class BaseExport {
        +process_export()
        -_is_vuln_filtered()
    }

    class CsvExport
    class Spdx3Export
    class YoctoCveCheckExport

    BaseExport <|-- CsvExport
    BaseExport <|-- Spdx3Export
    BaseExport <|-- YoctoCveCheckExport
```

The `process_export()` function works as follows:
 - The list of build "recipes" is retrieved from the [SBOM](#sbom) with
   `iterate_component_builds()`. This function returns `CompBuild` objects,
   each containing one or more components with the same vendor and product
   name, and with the same version. Typically, multiple components share the
   same identifier when a package is split into sub-packages.
 - Then, for each group of components with the same identifier and the same
   version, a search for applicable CVEs is performed as described in the
   [subsection below](#find-applicable-cve). Here, the term "applicable" means
   that the CVE is associated with this component identifier regardless of
   the component version.
 - For each CVE identifier found, a special vulnerability object named
   `ComputedVulnInfo` is created. This object aggregates the various sources
   of information from the CVE databases that were registered. This object is
   also responsible for computing the
   [VEX assessment](#compute-vex-assessment), as described in the subsection
   below, associated with the previously mentioned group of components.
 - The `_is_vuln_filtered()` method of the `BaseExport` class makes it possible
   to filter out (exclude) annotations from the export. The various filters
   are described in the [export options](export.md#export-options).
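
The grouping performed in the first step can be sketched as follows. This is
only an illustration of the idea: the simplified `Component` class, the single
string identifier, and the `group_builds` helper are hypothetical and do not
reproduce `iterate_component_builds()` or `CompBuild`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    identifier: str  # simplified here to one "vendor:product" string


def group_builds(components):
    """Group components that share the same identifier and version."""
    groups = defaultdict(list)
    for comp in components:
        groups[(comp.identifier, comp.version)].append(comp)
    return dict(groups)


builds = group_builds([
    Component("openssl", "3.0.13", "openssl:openssl"),
    Component("libssl3", "3.0.13", "openssl:openssl"),  # sub-package
    Component("zlib", "1.3.1", "zlib:zlib"),
])
```

The two sub-packages of the same recipe end up in a single group, so the CVE
lookup and the VEX assessment are computed once for both.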

## Find applicable CVE

In this section, the term "applicable" means that the CVE is associated with
one of the {term}`component identifiers<Component identifier>` regardless of the
component version. It does not mean that the component is vulnerable.

To find applicable CVEs associated with one component, the following strategy
is used:
 - First retrieve the component identifiers associated with this component,
   which are obtained, for example, from {term}`CPE`. This list of component
   identifiers is enriched by adding associated component identifiers as
   specified in the [products database](config.md#products).
 - A unique list of product names is obtained from these component identifiers,
   and these component identifiers are grouped by product name.
 - For each product name, the database index, which is described in the
   [CVE database](#cve-database) subsection, is queried to obtain a unique set
   of potential associated CVEs. Some CVEs in this list may not be applicable,
   since we only looked for the product name, without checking for the vendor
   part of the CPE.
 - In some databases, for the same CVE identifier, the information used to
   identify the component is sometimes incomplete; for example, the vendor
   part may be missing, but another database may have all the information
   needed to confirm, without a doubt, whether the component is applicable.
   To be able to confirm that the CVE is applicable (or not), the following
   algorithm is used (for each CVE):
    - For each annotation database entry, check whether the annotation is
      applicable to this component: the version **and** the component
      identifier must match. If they do not, ignore this annotation: the
      information it contains will not be used to determine whether the
      vulnerability is applicable to the component being checked.
    - For each component product name, get the component identifiers (from CPE)
      specified in CVEs that loosely match, which means that they have the same
      product name without checking for other fields (vendor, ...).
    - Still for each component product name, keep only the best associated CVE
      component identifiers: If there are CVE component identifiers with the
      vendor part, only keep those, otherwise take all the identifiers found.
    - Then check if one of the component identifiers fully matches with the CVE
      component identifiers that were kept. If there is a match, consider that
      this CVE identifier is applicable to the component to check.
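
The last three steps of this algorithm can be sketched as below. The `CompId`
type and the `is_cve_applicable` function are hypothetical, identifiers are
reduced to a vendor and a product, and the sketch assumes that a CVE
identifier without a vendor "fully matches" on the product name alone.

```python
from typing import NamedTuple, Optional


class CompId(NamedTuple):
    vendor: Optional[str]
    product: str


def is_cve_applicable(component_ids, cve_ids):
    """Check one CVE against the identifiers of one component."""
    for product in {c.product for c in component_ids}:
        # Loose match: same product name, other fields not checked.
        loose = [c for c in cve_ids if c.product == product]
        # Keep only the best identifiers: those with a vendor, if any exist.
        with_vendor = [c for c in loose if c.vendor is not None]
        kept = with_vendor or loose
        # Full match against the identifiers that were kept.
        for comp in component_ids:
            if comp.product != product:
                continue
            for cve in kept:
                # Assumption: a missing vendor on the CVE side matches any vendor.
                if cve.vendor is None or cve.vendor == comp.vendor:
                    return True
    return False


component = [CompId("openssl", "openssl")]
```

For example, if one database lists only the product `openssl` and another
lists the vendor `other` for the same CVE, the vendor-qualified identifier is
preferred and the CVE is not considered applicable to `openssl:openssl`.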

## Compute VEX assessment

To compute the CVE VEX assessment associated with one or more component
identifiers and one version, the following algorithm is used:
 - Retrieve the CVE database entries associated with the CVE identifier to
   check, and group these entries by priority:
    - For CVE database entries that are annotations, check that the component
      version is exactly the same as one of the versions specified in the
      annotation. If there is no match, do not use this CVE entry (annotation).
    - For CVE database entries that are not annotations (typically entries
      from the NVD or CVE List databases), always take all entries.
 - First, regardless of the database priority, if any CVE database entry
   indicates that the vulnerability is rejected, indicate that the component
   is not affected, and provide this assessment.
 - For each group of CVE database entries, which were grouped by priority,
   execute the following checks, starting with the group with the highest
   priority:
    - If the CVE database entry is an annotation, take the VEX assessment
      provided by this annotation if there is any.
    - Otherwise, "merge" the CVE database entries: retrieve all
      the version ranges applicable to the component. From these
      version ranges compute the VEX assessment, which is described in more
      detail in the [subsection below](
      #compute-vex-assessment-from-semantic-version-ranges).
    - For the current database priority, if no assessment was computed or if
      the component is considered vulnerable, check whether the sources
      affected by the vulnerability are compiled, as described in more detail
      in the [subsection below](#compute-vex-assessment-from-compiled-sources).
 - If no assessment could be computed, repeat this process with CVE database
   entries of lower priority. In the end, if no assessment could be computed,
   generate a default assessment indicating that the databases do not contain
   enough version information. This default assessment considers that the
   component is affected by (vulnerable to) this CVE.

:::{note}
To simplify this explanation, the computation of obsolete assessments was
ignored.
:::
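
The overall control flow can be summarized with the skeleton below. It is a
simplified sketch, not the tool's code: the `DbEntry` class, the string
assessment values, and the convention that a larger number means a higher
priority are assumptions, the version range computation is passed in as a
callback, and the annotation version check and the compiled sources check are
left out.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Optional, Sequence


@dataclass
class DbEntry:
    priority: int                 # assumption: larger means higher priority
    is_annotation: bool = False
    rejected: bool = False
    vex: Optional[str] = None     # assessment carried by an annotation


def compute_vex(
    entries: Sequence[DbEntry],
    from_version_ranges: Callable[[Sequence[DbEntry]], Optional[str]],
) -> str:
    # A rejected CVE wins regardless of the database priority.
    if any(e.rejected for e in entries):
        return "not_affected"

    ordered = sorted(entries, key=lambda e: e.priority, reverse=True)
    for _, group_iter in groupby(ordered, key=lambda e: e.priority):
        group = list(group_iter)
        # Annotations provide their assessment directly, if they have one.
        for entry in group:
            if entry.is_annotation and entry.vex is not None:
                return entry.vex
        # Other entries are merged and assessed from their version ranges.
        others = [e for e in group if not e.is_annotation]
        result = from_version_ranges(others) if others else None
        if result is not None:
            return result

    # No group produced an assessment: default to affected.
    return "affected"
```

Passing the version range logic as a callback keeps the priority fallback
readable on its own; the real computation is described in the subsections
below.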

### Compute VEX assessment from semantic version ranges

From the group of CVE database entries with the same priority, retrieve the
list of version range groups applicable to the component being assessed.

Each version range group contains the following information:
 - The date of the last modification.
 - The organization provider (typically a {term}`CNA`).
 - The version ranges. Each range can either be a range of vulnerable versions
   or a range of unaffected versions.

Currently, the version ranges are retrieved from the following sources:
- From the **NVD database**, ranges are retrieved from `configurations` →
  `nodes` → `cpeMatch`, if the CPE applies to the component being assessed.
- From the **CVE List database**, version ranges are retrieved from all
  containers (CNA or ADP):
  - From `cpeApplicability` → `nodes` → `cpeMatch`, if the CPE applies to the
    component being assessed.
  - From `affected` → `versions`, if the version range applies to the component
    being assessed:
    - If one of `affected` → `cpes` matches the component being assessed.
    - Or, if the {term}`component identifiers<Component identifier>`, derived
      from `affected` → `vendor`, `product`, and/or `packageName` fields by
      querying the [products database](config.md#products), matches the
      component being assessed.

Outdated {term}`ADP` entries are excluded: If a {term}`CNA` has updated its
entry more recently than the ADP, the ADP entry is ignored.
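
The ADP exclusion rule can be illustrated with the short sketch below. The
`RangesGroup` class, its field names, and the `drop_outdated_adp` helper are
hypothetical, and the sketch compares each ADP entry against the most recent
CNA modification date.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RangesGroup:
    provider_kind: str        # "CNA" or "ADP"
    last_modified: datetime


def drop_outdated_adp(groups):
    """Ignore ADP groups that are older than the latest CNA update."""
    cna_dates = [g.last_modified for g in groups if g.provider_kind == "CNA"]
    if not cna_dates:
        return list(groups)
    newest_cna = max(cna_dates)
    return [
        g for g in groups
        if g.provider_kind != "ADP" or g.last_modified >= newest_cna
    ]


cna = RangesGroup("CNA", datetime(2024, 6, 1))
old_adp = RangesGroup("ADP", datetime(2024, 1, 1))
new_adp = RangesGroup("ADP", datetime(2024, 7, 1))
```

Here `old_adp` would be dropped because the CNA entry is more recent, while
`new_adp` would be kept.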

The component version is compared against the retrieved version ranges. For
each range, the tool checks if the component version is within, smaller than,
or greater than the range.
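
A minimal sketch of this three-way comparison is given below. It assumes
purely numeric dotted versions and an exclusive end limit by default; the
`Position` enum and the `locate` function are hypothetical names, and real
version strings (suffixes, release candidates, ...) need more careful
parsing.

```python
from enum import Enum
from typing import Optional


class Position(Enum):
    BELOW = "smaller than the range"
    WITHIN = "within the range"
    ABOVE = "greater than the range"


def parse_version(text: str) -> tuple:
    # Assumption: versions are dotted integers such as "1.2.3".
    return tuple(int(part) for part in text.split("."))


def locate(
    version: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    end_inclusive: bool = False,
) -> Position:
    v = parse_version(version)
    if start is not None and v < parse_version(start):
        return Position.BELOW
    if end is not None:
        e = parse_version(end)
        if v > e or (v == e and not end_inclusive):
            return Position.ABOVE
    # A missing limit means the range is open on that side.
    return Position.WITHIN
```

For instance, `locate("1.2.3", "1.0", "1.3")` yields `Position.WITHIN`,
whereas `locate("1.3", "1.0", "1.3")` yields `Position.ABOVE` because the end
limit is treated as the first fixed version.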

If the range is provided by the kernel.org CNA, the following additional
logic applies to check if a version is within the range:
- If the component version is outside the range and smaller than the
  lower limit:
  - Accept the range if the component version is in the same branch as the
    range's lower limit:
    - The start limit version lacks a patch part (branch start point).
    - Or, the major and minor parts of the start limit version match the
      component version's major and minor parts.
- If the component version is outside the range and greater than the end
  limit:
  - Accept the range if there is no end limit and the start limit is in the
    same branch as the component version:
    - The start limit version lacks a patch part (branch start point).
    - The major part of the start limit version is **2** (covers 2.4 or 2.6
      kernel versions).
    - The major and minor parts of the start limit version match the
      component version's major and minor parts.
  - Otherwise, accept the range if the end limit is in the same branch as the
    component version.
- If the component version is inside the range:
  - Accept the range if the component version lacks a patch part (version from
    main branch).
  - Accept the range if the end limit's major and minor parts match the
    component version's major and minor parts.
  - Otherwise, accept the range if the start limit is in the same branch as
    the component version.
- If the range is not "accepted", it is ignored.
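
To make the notion of "branch" concrete, the sketch below implements only the
first case above (component version smaller than the lower limit). The helper
names are hypothetical and the versions are assumed to be dotted integers,
where a branch is identified by its major and minor parts.

```python
def parse_version(text):
    # Assumption: versions are dotted integers such as "6.1.20".
    return tuple(int(part) for part in text.split("."))


def lacks_patch(version):
    """True for a branch start point such as "6.1"."""
    return len(parse_version(version)) < 3


def same_branch(a, b):
    """True if both versions share the same major and minor parts."""
    return parse_version(a)[:2] == parse_version(b)[:2]


def accept_range_below_start(component_version, start_limit):
    """Case where the component version is smaller than the lower limit."""
    return lacks_patch(start_limit) or same_branch(component_version, start_limit)
```

With these helpers, a range starting at `6.1.20` is accepted for a `6.1.10`
component, but ignored for a `5.15.100` component.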

Finally, the **VEX assessment** is computed using the following rules (evaluated
in order):
- If the component version is within an unaffected range provided by the
  kernel.org CNA, indicate the vulnerability is **fixed**.
- If the component version is within a vulnerable range with a valid end
  limit, indicate the vulnerability is **affecting** the component.
- If the component version is within an unaffected range, indicate the
  vulnerability is **fixed**.
- If the component version is within a vulnerable range without an end
  limit, indicate the vulnerability is **affecting** the component **only if**:
  - The highest fixed version discovered is not within this range.
  - Or the component version is smaller than the highest fixed version
    discovered.
- Indicate the vulnerability is **fixed** if:
  - The component version is outside a vulnerable range and greater
    than the end limit of that range.
  - And both versions (component and range end limit) are from the same
    branch.
- Indicate the vulnerability **does not impact** the component (appeared in a
  more recent version) if:
  - The component version is outside a vulnerable range and smaller than
    the lower limit of that range.
  - And both versions (component and range lower limit) are from the same
    branch, or the component version is not outside another vulnerable range
    (greater).
- Indicate the vulnerability **does not impact** the component if the component
  version is smaller than the lowest discovered vulnerable version.
- Indicate the vulnerability is **fixed** if the component version is outside
  a vulnerable range and greater than the end limit of that range.
- Indicate the vulnerability is **fixed** if:
  - The range was provided by the kernel.org CNA and the default status is
    unaffected.
  - And the component version is not associated with any version ranges
    (not in the branch covered by any version ranges).
- Indicate the vulnerability is **affecting** the component if the component
  version is outside a non-vulnerable range.
- Indicate the vulnerability is **fixed** if the CVE database only provides
  specific vulnerable versions (no ranges).
- Otherwise, no VEX assessment is provided.


If an **affected assessment** is generated, the status note provided is
`version-in-range` and the following statement is provided:
 - `May need backporting (fixed from x.y.z)` if the component version falls
   within a vulnerable range, and the upper limit is from the same branch as
   the component version.
 - `Needs backporting (originally fixed in x.y.z)` if the version with the
   original fix is known.
 - `Needs backporting (fixed from x.y.z)` if a fixed version was discovered.
 - `Mitigation action unknown` otherwise.

If an **affected assessment** is generated because the component version is
outside a non-vulnerable range, the status note provided is
`version-maybe-in-range` and the default statement is:
`Check if really vulnerable`.

If a **fixed assessment** is generated, the following status note is provided:
 - If no fixed version associated with the component version was found:
   - `version-not-in-range: Fixed from version x.y.z` if the highest fixed
     version was found.
   - `version-not-in-range: Originally fixed in x.y.z` if the version with the
     original fix is known.
   - `version-not-in-range` otherwise.
 - If the component version is in a branch where the fix is considered
   backported: `cpe-stable-backport: Backported in x.y.z`.
 - `fixed-version: Originally fixed in x.y.z` if the version with the original
   fix is known, and the discovered fixed version is not from the same branch as
   the component version, and the component version is greater than this
   original fixed version.
 - `fixed-version: Fixed from version x.y.z` otherwise.

If a **fixed assessment** is generated because the vulnerability appeared in a
more recent version, the following status note is provided:
`version-not-in-range: Only affects x.y.z onwards`.

### Compute VEX assessment from compiled sources

If the vulnerability database entry (typically from CVE List database) provides
a list of affected sources (program files), and if the SBOM provides compiled
source files for this component, check if we can ignore this vulnerability.

If either condition is not met, do not provide any assessment.

For each listed affected source file, check if the affected file path is a
"suffix" of one of the listed compiled file paths. A suffix match is used
because the compiled file path is usually prefixed with the build directory.

If none of the affected source files is found in the list of compiled files,
then indicate that the component is not affected with the following assessment:
 - Status notes: `not-applicable-config`.
 - Statement: `Source code not compiled by config`.
 - Justification: "vulnerable code not present".

Otherwise, do not provide any assessment.
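
The check described in this subsection can be sketched as follows. The
function names and the dictionary used for the result are hypothetical, and
comparing whole path components (rather than raw strings) is an assumption
about what "suffix" means here.

```python
from pathlib import PurePosixPath


def is_path_suffix(affected: str, compiled: str) -> bool:
    """True if the affected path is a trailing part of the compiled path."""
    a = PurePosixPath(affected).parts
    c = PurePosixPath(compiled).parts
    return len(a) <= len(c) and c[len(c) - len(a):] == a


def assess_from_compiled_sources(affected_sources, compiled_sources):
    # Both lists are required, otherwise no assessment is provided.
    if not affected_sources or not compiled_sources:
        return None
    for affected in affected_sources:
        if any(is_path_suffix(affected, c) for c in compiled_sources):
            return None  # an affected file is compiled: no assessment
    return {
        "status_notes": "not-applicable-config",
        "statement": "Source code not compiled by config",
        "justification": "vulnerable code not present",
    }


compiled = ["/build/work/linux/net/ipv4/tcp.c"]  # made-up build path
```

With this input, an affected file `drivers/foo/bar.c` leads to the
"not affected" assessment, while `net/ipv4/tcp.c` leads to no assessment.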