FLOP:CVE Monitoring

From Funtoo
Revision as of 18:33, April 8, 2020 by D4g33z (talk | contribs)
Created on
2020/01/21
Original Author(s)
d4g33z
Git sources (for cloning)
Link
Status
Reference Bug
FL-6938

Funtoo Linux Optimization Proposal: CVE Monitoring

Let's monitor the Common Vulnerabilities and Exposures (CVE) list and flag packages in the current portage tree accordingly. Posting bugs on jira.funtoo.org for affected packages could be automated to a significant extent.

cver: A Tool for Monitoring CVEs

Summary

Not all ebuilds are created equal. They are updated at different rates according to their popularity in the tree of available packages, and this is generally fine: heavily used packages get updated frequently, and their vulnerabilities are generally dealt with. Unpopular ebuilds can languish, and no one really cares. However, an ebuild with a significant vulnerability should be updated whether it is popular or not, because as long as it can be installed it represents a potential attack vector.

Identifying ebuilds with an associated CVE will bring them to the head of the queue for pull requests and updates. The update should often be trivial, because the vulnerability has been fixed upstream and released as a new hotfix version. Otherwise, we can fork and provide our own mitigation, merging with upstream again when a new release comes out (if one ever does).

The cver (pronounced ça-veer) tool is built around Redis-cached MongoDB collections that are regularly updated with newly filed CVEs. The tool queries the collections to produce the text data needed to fill the fields of a newly created security vulnerability issue on the Funtoo bug tracker. The data can be output in various formats (currently just formatted text on stdout), and eventually submitted directly to the bug tracker via its REST API.

Architecture

The architecture is simple:

┌─────────┐                                 
│redis    │      ┌────┐                     
│┌───────┐│      │jira│──────────┐          
││mongoDB││      └──┬─┘          │          
│└───────┘│         │            │          
└────┬────┘         │            │          
     │              │        *********      
     │     ┌───┐    │     ***         ***   
     ├─────┤dev│──────────*  discussion *   
     │     └─┬─┘    │     ***         ***   
     │       │      │        *********      
     │       │      │                       
     │     ┌─┴─┐    │                       
     ├─────┤bot│────┘                       
     │     └───┘                            
     │                                      
     │                                      
     │     ┌───┐                            
     └─────│usr│                            
           └───┘
  • A dev uses the tool to query the redis cache of the CVE data held in the mongoDB, update and administer the mongoDB, create reports for discussion, and control a bot.
  • The bot can query the redis cache and create issues to post via the REST API of jira.
  • A user can query the redis cache and create reports.
  • Discussion produces issues to be posted to jira.
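
As a rough illustration of the bot's last step, the sketch below turns one CVE document into a JSON body of the shape that Jira's issue creation endpoint (POST /rest/api/2/issue) accepts. The function name build_issue_payload, the project key "FL", the issue type "Bug", and the sample document are all assumptions made for this example and are not taken from cver. Nothing is sent over the network.

```python
import json


def build_issue_payload(cve, package, project_key="FL", issue_type="Bug"):
    """Build the JSON body for creating one Jira issue from a CVE document.

    project_key and issue_type are assumed values; adjust them to the tracker.
    """
    references = "\n".join(f"* {url}" for url in cve.get("references", []))
    description = (
        f"{cve['id']} (CVSS {cve.get('cvss', 'n/a')})\n\n"
        f"{cve.get('summary', '')}\n\n"
        f"References:\n{references}"
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"{package}: {cve['id']}",
            "description": description,
            "issuetype": {"name": issue_type},
        }
    }


# Hypothetical CVE document; the keys mirror those stored by cve-search.
example_cve = {
    "id": "CVE-0000-0000",
    "cvss": 7.5,
    "summary": "Placeholder summary for an example package.",
    "references": ["https://example.org/advisory"],
}

payload = build_issue_payload(example_cve, "app-misc/example")
print(json.dumps(payload, indent=2))
```

Building the payload separately from sending it keeps the tool stateless and lets a dev review the text before the bot posts anything.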

Algorithm

The cvedb.cves collection provided by cve-search has the following estimated schema (see variety, a schema estimator for mongodb):

+--------------------------------------------------------------------------------+
| key                              | types    | occurrences | percents           |
| -------------------------------- | -------- | ----------- | ------------------ |
| Modified                         | Date     |      136539 | 100.00000000000000 |
| Published                        | Date     |      136539 | 100.00000000000000 |
| _id                              | ObjectId |      136539 | 100.00000000000000 |
| access                           | Object   |      136539 | 100.00000000000000 |
| assigner                         | String   |      136539 | 100.00000000000000 |
| cvss                             | Number   |      136539 | 100.00000000000000 |
| cwe                              | String   |      136539 | 100.00000000000000 |
| id                               | String   |      136539 | 100.00000000000000 |
| impact                           | Object   |      136539 | 100.00000000000000 |
| references                       | Array    |      136539 | 100.00000000000000 |
| summary                          | String   |      136539 | 100.00000000000000 |
| vulnerable_configuration         | Array    |      136539 | 100.00000000000000 |
| vulnerable_configuration_cpe_2_2 | Array    |      136539 | 100.00000000000000 |
| vulnerable_product               | Array    |      136539 | 100.00000000000000 |
| access.authentication            | String   |      128583 |  94.17309340188518 |
| access.complexity                | String   |      128583 |  94.17309340188518 |
| access.vector                    | String   |      128583 |  94.17309340188518 |
| cvss-time                        | Date     |      128583 |  94.17309340188518 |
| cvss-vector                      | String   |      128583 |  94.17309340188518 |
| impact.availability              | String   |      128583 |  94.17309340188518 |
| impact.confidentiality           | String   |      128583 |  94.17309340188518 |
| impact.integrity                 | String   |      128583 |  94.17309340188518 |
+--------------------------------------------------------------------------------+

An important key in the collection is vulnerable_product. It contains an array of Common Platform Enumeration (CPE) names for the affected pieces of software, which can potentially be matched (along with the affected versions) to packages in the Funtoo portage meta-repo.

This is the bird's eye view of what a CPE is:

CPE is a structured naming scheme for information technology systems, software, and packages. Based upon the generic syntax for Uniform Resource Identifiers (URI), CPE includes a formal name format, a method for checking names against a system, and a description format for binding text and tests to a name.
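
To make the naming scheme concrete, here is a minimal sketch that splits a CPE name into its leading fields. It assumes the entries in vulnerable_product use the CPE 2.3 formatted-string binding (thirteen colon-separated components starting with cpe:2.3). The Cpe type and the parse_cpe23 function are names invented for this example.

```python
import re
from typing import NamedTuple


class Cpe(NamedTuple):
    part: str      # "a" application, "o" operating system, "h" hardware
    vendor: str
    product: str
    version: str


def parse_cpe23(cpe: str) -> Cpe:
    """Parse the first fields of a CPE 2.3 formatted string."""
    # Split on colons that are not escaped with a backslash.
    fields = re.split(r"(?<!\\):", cpe)
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    part, vendor, product, version = fields[2:6]
    return Cpe(part, vendor, product, version)


parsed = parse_cpe23("cpe:2.3:a:openssl:openssl:1.0.2k:*:*:*:*:*:*:*")
print(parsed.product, parsed.version)
```

The product field is the natural candidate for the single token that gets compared with package names, and the version field is what would be compared with the versions available in the meta-repo.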

Thus, filtering packages by CVE requires a map between package names and CPEs. The current algorithm is the simplest possible: if a CVE has a list of CPEs, each CPE is reduced to a single token, and an exact match against package names is attempted across the whole meta-repo using app-portage/eix. If there is a match, a jira issue can be constructed and reported. Even this simple algorithm produces quite a few matches, but it also misses very significant issues when the CPEs were not added properly to the CVE database entry. FL-6938 is a case in point: it was not filed with a CPE for sys-apps/portage (does one exist?), so the algorithm skipped right over it. A more sophisticated algorithm would also do regular expression matching on the summary key of the CVE, perhaps matching on the string 'Gentoo Portage', and produce a report for discussion and eventual posting to jira.
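
The two matching strategies described above can be sketched as follows. This is an illustration under stated assumptions, not the cver code: the PACKAGES dictionary is a hand-written stand-in for the name lookup that the tool performs with app-portage/eix, the CVE documents are placeholders, and the function names are invented here.

```python
import re

# Stand-in for the eix lookup: short package name -> matching atoms.
PACKAGES = {
    "openssl": ["dev-libs/openssl"],
    "portage": ["sys-apps/portage"],
}


def product_token(cpe):
    """Return the product field of a CPE 2.3 string, or None if malformed."""
    fields = cpe.split(":")
    if len(fields) > 4 and fields[0] == "cpe" and fields[1] == "2.3":
        return fields[4]
    return None


def match_by_cpe(cve, packages=PACKAGES):
    """Exact match of each CPE product token against package names."""
    hits = set()
    for cpe in cve.get("vulnerable_product", []):
        hits.update(packages.get(product_token(cpe), []))
    return sorted(hits)


def match_by_summary(cve, patterns):
    """Fallback: regular expressions applied to the summary text."""
    summary = cve.get("summary", "")
    return sorted(
        atom
        for pattern, atom in patterns.items()
        if re.search(pattern, summary, re.IGNORECASE)
    )


# Placeholder documents, not real CVE entries.
with_cpe = {
    "id": "CVE-0000-0001",
    "summary": "Placeholder flaw in an example library.",
    "vulnerable_product": ["cpe:2.3:a:openssl:openssl:1.0.2k:*:*:*:*:*:*:*"],
}
without_cpe = {
    "id": "CVE-0000-0002",
    "summary": "Placeholder flaw in Gentoo Portage.",
    "vulnerable_product": [],
}

print(match_by_cpe(with_cpe))
print(match_by_cpe(without_cpe))
print(match_by_summary(without_cpe, {r"Gentoo Portage": "sys-apps/portage"}))
```

The second document shows the failure mode: with an empty CPE list the exact match finds nothing, while the summary pattern still flags the package for discussion.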

Once a match is made, the cve-search collection and the portage package database (via app-portage/eix) can be combined to produce the data appropriate for a report.

The correct pattern for this is probably a truth table, with the exact matching algorithm above being one example of the generalized predicates that are applied to each CVE document in the cvedb. A table pairing packages and predicates can then be interpreted via custom logical operations to yield sets of packages to consider for further discussion or immediate issue creation.
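
One possible reading of the truth-table idea is sketched below. Each predicate takes a CVE document and a package atom and returns a boolean; the table records every predicate's answer for every (CVE, package) pair, and a rule over a row decides what happens next. The predicate names, the select helper, and the sample data are all hypothetical.

```python
import re


def cpe_exact(cve, package):
    """True if any CPE product field equals the package's short name."""
    name = package.split("/")[-1]
    for cpe in cve.get("vulnerable_product", []):
        fields = cpe.split(":")
        if len(fields) > 4 and fields[4] == name:
            return True
    return False


def summary_mentions(cve, package):
    """True if the package's short name appears as a word in the summary."""
    name = re.escape(package.split("/")[-1])
    summary = cve.get("summary", "")
    return re.search(rf"\b{name}\b", summary, re.IGNORECASE) is not None


PREDICATES = {"cpe_exact": cpe_exact, "summary_mentions": summary_mentions}


def truth_table(cves, packages, predicates=PREDICATES):
    """Map (cve id, package) to a row of predicate name -> bool."""
    return {
        (cve["id"], package): {
            name: predicate(cve, package) for name, predicate in predicates.items()
        }
        for cve in cves
        for package in packages
    }


def select(table, rule):
    """Return the sorted (cve id, package) pairs whose row satisfies rule."""
    return sorted(key for key, row in table.items() if rule(row))


# Placeholder documents, not real CVE entries.
cves = [
    {"id": "CVE-0000-0001", "summary": "Placeholder flaw in OpenSSL.",
     "vulnerable_product": ["cpe:2.3:a:openssl:openssl:1.0.2k:*:*:*:*:*:*:*"]},
    {"id": "CVE-0000-0002", "summary": "Placeholder flaw in Gentoo Portage.",
     "vulnerable_product": []},
]
table = truth_table(cves, ["dev-libs/openssl", "sys-apps/portage"])

file_now = select(table, lambda row: row["cpe_exact"])
discuss = select(table, lambda row: row["summary_mentions"] and not row["cpe_exact"])
print(file_now)
print(discuss)
```

Keeping the predicates and the rules separate means a new heuristic only adds a column to the table, and the policy for "file immediately" versus "discuss first" stays a one-line logical expression.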



State

The cver tool is currently stateless: it takes some bytes and it makes some bytes. We should probably keep it that way. A disk cache for the LRU-memoized Python function eix_xml might be nice. It would have to be wiped whenever eix is updated, of course.
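
A minimal sketch of such a disk cache follows. The signature of eix_xml is not given here, so the example wraps a stand-in function instead; the disk_cached decorator, the mtime_stamp helper, and the cache layout are assumptions. Invalidation works by mixing a "stamp" into the cache key: if the stamp is derived from something that changes when eix is updated (for instance the modification time of its database file), old entries are simply never read again. This differs slightly from wiping, since stale files stay on disk until removed.

```python
import functools
import hashlib
import os
import pickle
from pathlib import Path


def mtime_stamp(path):
    """Stamp derived from a file's modification time (path chosen by caller)."""
    return str(os.stat(path).st_mtime_ns)


def disk_cached(cache_dir, stamp):
    """Memoize in memory (LRU) and on disk; stamp() invalidates old entries."""
    cache_dir = Path(cache_dir)

    def decorator(func):
        @functools.lru_cache(maxsize=128)
        def cached(stamp_value, *args):
            raw_key = repr((func.__name__, stamp_value, args)).encode()
            path = cache_dir / (hashlib.sha256(raw_key).hexdigest() + ".pickle")
            if path.exists():
                # Only load pickles from a directory you control.
                return pickle.loads(path.read_bytes())
            result = func(*args)
            cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result

        @functools.wraps(func)
        def wrapper(*args):
            # Arguments must be hashable, as with functools.lru_cache.
            return cached(stamp(), *args)

        return wrapper

    return decorator
```

A hypothetical use would be decorating the query function with disk_cached(some_cache_dir, lambda: mtime_stamp(some_eix_file)), where both paths are supplied by the caller. The tool itself stays stateless in the sense that the cache can be deleted at any time without changing its output.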