Add support for Windows containers (#181)
* Use newer version of container libraries

Signed-off-by: Philippe Ombredanne <[email protected]>

* Use new container-inspector structures

Signed-off-by: Philippe Ombredanne <[email protected]>

* Add minimal support for Windows containers

Signed-off-by: Philippe Ombredanne <[email protected]>

* Update Windows package getter

    * The windows_helper module from scancode is not available on pypi

Signed-off-by: Jono Yang <[email protected]>

* Use newer version of container libraries

Signed-off-by: Philippe Ombredanne <[email protected]>

* Update call to windows_helper to win_reg

Signed-off-by: Jono Yang <[email protected]>

* Create new pipeline for Windows Docker images

    * Create Windows specific tag_uninteresting_windows_codebase_resources function

Signed-off-by: Jono Yang <[email protected]>

* Add function to find packages at well-known paths

    * Update tests

Signed-off-by: Jono Yang <[email protected]>

* Add step to tag known software in pipeline

    * Change name of Docker step from "find_images_linux_distro" to "find_images_os_and_distro"

Signed-off-by: Jono Yang <[email protected]>

* Get version from path in tag_known_software #238

    * Update docstrings
    * Pin fetchcode dep

Signed-off-by: Jono Yang <[email protected]>

* Troubleshoot regex patterns #238

Signed-off-by: Jono Yang <[email protected]>

* Report Program File contents as packages #238

Signed-off-by: Jono Yang <[email protected]>

* Update Windows-specific regex

    * Add more file names and file extensions to be ignored
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>

* Do not ignore .mui files #238

Signed-off-by: Jono Yang <[email protected]>

* Filter using extension field rather than path #238

Signed-off-by: Jono Yang <[email protected]>

* Update scanpipe/pipes/docker.py

Create issue to track extraction issue

See #251

Signed-off-by: Philippe Ombredanne <[email protected]>

* Fix scancode-toolkit pinned version in base.txt #238

Signed-off-by: Jono Yang <[email protected]>

* Create pipeline step to tag ignorable files #252

Signed-off-by: Jono Yang <[email protected]>

* Update formatting #238

Signed-off-by: Jono Yang <[email protected]>

* Generalize regex expressions #238

    * Modify regex used for Windows container analysis so it can be used outside the context of a Windows Docker image
    * Update tests

Signed-off-by: Jono Yang <[email protected]>

* Create new pipes for ignoring files #238

    * Create pipes that ignore media files and data files with no clues
    * Update test results

Signed-off-by: Jono Yang <[email protected]>

* Add more file extensions to ignore #238

Signed-off-by: Jono Yang <[email protected]>

* Bump dep versions #238

Signed-off-by: Jono Yang <[email protected]>

* Update docstring #238

    * Use InstalledWindowsProgram object instead of Package

Signed-off-by: Jono Yang <[email protected]>

* Improve regex used in tag_known_software #238

    * Update tests with more paths to test regex patterns

Signed-off-by: Jono Yang <[email protected]>

* Adjust code for consistency across the codebase #181

Signed-off-by: Thomas Druez <[email protected]>

* Address PR comments #238

    * Use re.match instead of re.split
    * Rename WindowsDocker pipeline to DockerWindows
    * Set the default value of the q_objects argument for tag_installed_package_files to be a tuple

Signed-off-by: Jono Yang <[email protected]>

* Add is_media field to CodebaseResource #238

    * Update test results

Signed-off-by: Jono Yang <[email protected]>

* Simplify tag_media_files_as_uninteresting() #238

    * Update test

Signed-off-by: Jono Yang <[email protected]>

* Refine windows pipes #238

Signed-off-by: Thomas Druez <[email protected]>

Co-authored-by: Jono Yang <[email protected]>
Co-authored-by: Thomas Druez <[email protected]>
3 people authored Aug 4, 2021
1 parent b5dbd57 commit a7e3897
Showing 14 changed files with 1,107 additions and 375 deletions.
18 changes: 18 additions & 0 deletions scanpipe/migrations/0011_codebaseresource_is_media.py
@@ -0,0 +1,18 @@
+# Generated by Django 3.2.6 on 2021-08-03 18:27
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('scanpipe', '0010_codebaseresource_is_key_file'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='codebaseresource',
+            name='is_media',
+            field=models.BooleanField(default=False),
+        ),
+    ]
1 change: 1 addition & 0 deletions scanpipe/models.py
@@ -1036,6 +1036,7 @@ class Type(models.TextChoices):
     is_text = models.BooleanField(default=False)
     is_archive = models.BooleanField(default=False)
     is_key_file = models.BooleanField(default=False)
+    is_media = models.BooleanField(default=False)

     class Compliance(models.TextChoices):
         OK = "ok"
6 changes: 3 additions & 3 deletions scanpipe/pipelines/docker.py
@@ -35,7 +35,7 @@ def steps(cls):
         return (
             cls.extract_images,
             cls.extract_layers,
-            cls.find_images_linux_distro,
+            cls.find_images_os_and_distro,
             cls.collect_images_information,
             cls.collect_and_create_codebase_resources,
             cls.collect_and_create_system_packages,
@@ -63,9 +63,9 @@ def extract_layers(self):
         if errors:
             self.add_error("\n".join(errors))

-    def find_images_linux_distro(self):
+    def find_images_os_and_distro(self):
         """
-        Finds the linux distro of input images.
+        Finds the operating system and distro of input images.
         """
         for image in self.images:
             image.get_and_set_distro()
81 changes: 81 additions & 0 deletions scanpipe/pipelines/docker_windows.py
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: Apache-2.0
+#
+# http://nexb.com and https://github.com/nexB/scancode.io
+# The ScanCode.io software is licensed under the Apache License version 2.0.
+# Data generated with ScanCode.io is provided as-is without warranties.
+# ScanCode is a trademark of nexB Inc.
+#
+# You may not use this software except in compliance with the License.
+# You may obtain a copy of the License at: http://apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software distributed
+# under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+#
+# Data Generated with ScanCode.io is provided on an "AS IS" BASIS, WITHOUT WARRANTIES
+# OR CONDITIONS OF ANY KIND, either express or implied. No content created from
+# ScanCode.io should be considered or used as legal advice. Consult an Attorney
+# for any legal advice.
+#
+# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
+# Visit https://github.com/nexB/scancode.io for support and download.
+
+from scanpipe.pipelines.docker import Docker
+from scanpipe.pipes import docker
+from scanpipe.pipes import rootfs
+from scanpipe.pipes import windows
+
+
+class DockerWindows(Docker):
+    """
+    A pipeline to analyze Windows Docker images.
+    """
+
+    @classmethod
+    def steps(cls):
+        return (
+            cls.extract_images,
+            cls.extract_layers,
+            cls.find_images_os_and_distro,
+            cls.collect_images_information,
+            cls.collect_and_create_codebase_resources,
+            cls.collect_and_create_system_packages,
+            cls.tag_known_software_packages,
+            cls.tag_uninteresting_codebase_resources,
+            cls.tag_program_files_dirs_as_packages,
+            cls.tag_empty_files,
+            cls.scan_for_application_packages,
+            cls.scan_for_files,
+            cls.analyze_scanned_files,
+            cls.tag_data_files_with_no_clues,
+            cls.tag_not_analyzed_codebase_resources,
+        )
+
+    def tag_known_software_packages(self):
+        """
+        Flag files from well-known software packages by checking common install paths.
+        """
+        windows.tag_known_software(self.project)
+
+    def tag_uninteresting_codebase_resources(self):
+        """
+        Flag files that are known to be uninteresting.
+        """
+        docker.tag_whiteout_codebase_resources(self.project)
+        windows.tag_uninteresting_windows_codebase_resources(self.project)
+        rootfs.tag_ignorable_codebase_resources(self.project)
+        rootfs.tag_media_files_as_uninteresting(self.project)
+
+    def tag_program_files_dirs_as_packages(self):
+        """
+        Report the immediate subdirectories of `Program Files` and `Program
+        Files (x86)` as packages.
+        """
+        windows.tag_program_files(self.project)
+
+    def tag_data_files_with_no_clues(self):
+        """
+        If a file is a data file and has no clues towards its origin, mark as
+        uninteresting.
+        """
+        rootfs.tag_data_files_with_no_clues(self.project)
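
Reviewer note: `DockerWindows` reuses most of the inherited `Docker` steps, returns its own ordered tuple from `steps()`, and overrides `tag_uninteresting_codebase_resources`. A minimal sketch of that pattern follows. The `BasePipeline`, `LinuxLike`, and `WindowsLike` classes and the `run` method are hypothetical stand-ins invented for illustration, not the scanpipe API, and the real pipeline runner may execute steps differently.

```python
class BasePipeline:
    """Hypothetical stand-in for a pipeline base class (not the scanpipe API)."""

    def __init__(self):
        self.log = []

    @classmethod
    def steps(cls):
        raise NotImplementedError

    def run(self):
        # Call each step in the declared order, passing this instance as `self`.
        for step in self.steps():
            step(self)
        return self.log


class LinuxLike(BasePipeline):
    @classmethod
    def steps(cls):
        return (cls.extract, cls.tag_uninteresting)

    def extract(self):
        self.log.append("extract")

    def tag_uninteresting(self):
        self.log.append("tag_uninteresting:linux")


class WindowsLike(LinuxLike):
    @classmethod
    def steps(cls):
        # Reuse the inherited `extract` step and insert a new step before tagging.
        return (cls.extract, cls.tag_known_software, cls.tag_uninteresting)

    def tag_known_software(self):
        self.log.append("tag_known_software")

    def tag_uninteresting(self):
        # Override the inherited step with platform-specific behaviour.
        self.log.append("tag_uninteresting:windows")
```

Because `steps()` refers to methods through `cls`, the subclass picks up its own override of `tag_uninteresting` without the parent needing to know about it.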
1 change: 0 additions & 1 deletion scanpipe/pipes/docker.py
@@ -61,7 +61,6 @@ def extract_layers_from_images(project, images):
     Returns the `errors` that may happen during the extraction.
     """
     errors = []
-
     for image in images:
         image_dirname = Path(image.extracted_location).name
         target_path = project.codebase_path / image_dirname
53 changes: 52 additions & 1 deletion scanpipe/pipes/rootfs.py
@@ -20,6 +20,7 @@
 # ScanCode.io is a free software code scanning tool from nexB Inc. and others.
 # Visit https://github.com/nexB/scancode.io for support and download.

+import fnmatch
 import logging
 import os
 from functools import partial
@@ -28,12 +29,14 @@
 from django.db.models import Q

 import attr
+from commoncode.ignore import default_ignores
 from container_inspector.distro import Distro

 from scanpipe import pipes
 from scanpipe.pipes import alpine
 from scanpipe.pipes import debian
 from scanpipe.pipes import rpm
+from scanpipe.pipes import windows

 logger = logging.getLogger(__name__)

@@ -48,6 +51,7 @@
     "opensuse": rpm.package_getter,
     "opensuse-tumbleweed": rpm.package_getter,
     "photon": rpm.package_getter,
+    "windows": windows.package_getter,
 }


@@ -188,7 +192,7 @@ def has_hash_diff(install_file, codebase_resource):

 def scan_rootfs_for_system_packages(project, rootfs, detect_licenses=True):
     """
-    Given a `project` Project and an `rootfs` RootFs, scan the `rootfs` for
+    Given a `project` Project and a `rootfs` RootFs, scan the `rootfs` for
     installed system packages, and create a DiscoveredPackage for each.
     Then for each installed DiscoveredPackage file, check if it exists
@@ -336,3 +340,50 @@ def tag_uninteresting_codebase_resources(project):

     qs = project.codebaseresources.no_status()
     qs.filter(lookups).update(status="ignored-not-interesting")
+
+
+def tag_ignorable_codebase_resources(project):
+    """
+    Using the glob patterns from commoncode.ignore of ignorable files/directories,
+    tag codebase resources from `project` if their paths match an ignorable pattern.
+    """
+    lookups = Q()
+    for pattern in default_ignores.keys():
+        # Translate glob pattern to regex
+        translated_pattern = fnmatch.translate(pattern)
+        # PostgreSQL does not like parts of Python regex
+        if translated_pattern.startswith("(?s"):
+            translated_pattern = translated_pattern.replace("(?s", "(?")
+        lookups |= Q(rootfs_path__icontains=pattern)
+        lookups |= Q(rootfs_path__iregex=translated_pattern)
+
+    qs = project.codebaseresources.no_status()
+    qs.filter(lookups).update(status="ignored-default-ignores")
+
+
+def tag_data_files_with_no_clues(project):
+    """
+    Tags CodebaseResources that have a file type of `data` and no detected clues
+    to be uninteresting.
+    """
+    lookup = Q(
+        file_type="data",
+        copyrights=[],
+        holders=[],
+        authors=[],
+        licenses=[],
+        license_expressions=[],
+        emails=[],
+        urls=[],
+    )
+
+    qs = project.codebaseresources
+    qs.filter(lookup).update(status="ignored-data-file-no-clues")
+
+
+def tag_media_files_as_uninteresting(project):
+    """
+    Tags CodebaseResources that are media files to be uninteresting.
+    """
+    qs = project.codebaseresources.no_status()
+    qs.filter(is_media=True).update(status="ignored-media-file")
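
Reviewer note: the glob-to-regex step in `tag_ignorable_codebase_resources` can be tried outside Django. The sketch below uses only the standard library; `glob_to_portable_regex` and `matches_any` are hypothetical helper names, and matching is done with Python's `re` module purely for demonstration, so it says nothing about how PostgreSQL evaluates the resulting pattern.

```python
import fnmatch
import re


def glob_to_portable_regex(pattern):
    """
    Translate a glob pattern to a regex and drop the inline `s` flag,
    similar to the replacement done in `tag_ignorable_codebase_resources`.
    """
    translated = fnmatch.translate(pattern)
    if translated.startswith("(?s"):
        # "(?s:...)" becomes "(?:...)", a plain non-capturing group.
        # Only the leading occurrence is replaced here (count=1).
        translated = translated.replace("(?s", "(?", 1)
    return translated


def matches_any(path, patterns):
    """Case-insensitive check of `path` against each translated glob pattern."""
    return any(
        re.match(glob_to_portable_regex(pattern), path, re.IGNORECASE)
        for pattern in patterns
    )
```

For example, `matches_any("build/module.PYC", ["*.pyc"])` is truthy because the match is case-insensitive, in the same spirit as the `iregex` lookup in the diff.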
1 change: 0 additions & 1 deletion scanpipe/pipes/scancode.py
@@ -104,7 +104,6 @@ def get_resource_info(location):

     # Missing fields on CodebaseResource model returned by `get_file_info`.
     unsupported_fields = [
-        "is_media",
         "is_source",
         "is_script",
         "date",
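
Reviewer note: removing `"is_media"` from `unsupported_fields` means the value is no longer discarded before it reaches the new `CodebaseResource.is_media` column. The sketch below shows one plausible way such a list could be applied. The `drop_unsupported_fields` helper and the sample `file_info` dict are assumptions for illustration; the actual body of `get_resource_info` is not shown in this diff and may differ.

```python
def drop_unsupported_fields(file_info, unsupported_fields):
    """Return a copy of `file_info` without the keys in `unsupported_fields`."""
    return {
        key: value
        for key, value in file_info.items()
        if key not in unsupported_fields
    }


# Hypothetical scan result; real `get_file_info` output has more keys.
file_info = {
    "name": "logo.png",
    "is_media": True,
    "is_source": False,
    "date": "2021-08-03",
}

# Before this commit: "is_media" is listed, so it is dropped.
before = drop_unsupported_fields(
    file_info, ["is_media", "is_source", "is_script", "date"]
)
# After this commit: "is_media" survives and can be stored on the model.
after = drop_unsupported_fields(file_info, ["is_source", "is_script", "date"])
```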