feat: cluster list fetchers and cluster resource fetcher #15

tmishina · 2020-08-13T00:54:22Z

Tick to sign-off your agreement to the Developer Certificate of Origin (DCO) 1.1

What

This pull request adds a feature to fetch resources from Kubernetes clusters (see #9 for details).
It also adds the capability to store lists of Kubernetes clusters in an evidence locker, taken either from a BOM (Bill of Materials) or from the APIs of cloud service providers.

Why

The resources in a Kubernetes cluster contain various types of evidence. For example, Pod specs represent the configuration of applications, and ConfigMaps contain the configuration of the Kubernetes cluster itself. Fetching the resources of a Kubernetes cluster is therefore an important capability for compliance evidence validation of Kubernetes clusters.

How

This PR contains two main functionalities.

cluster list fetcher: store the list of clusters
- kube/fetchers/fetch_cluster_list.py: copy BOM (Bill of Materials) specified in an auditree config file into an evidence locker
- ibm_cloud/fetchers/fetch_cluster_list.py: fetch the list of clusters managed by IBM Cloud (IBM Kubernetes Service or IKS, and Red Hat Openshift Kubernetes Services or ROKS) by invoking the ibmcloud CLI tool
cluster resource fetcher: fetch resources from the listed clusters by invoking kubectl CLI tool
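
As a rough illustration of the second functionality, the sketch below shows one way a resource fetcher could shell out to kubectl and parse the result. It is not the code in this PR: the function names `build_kubectl_get` and `fetch_resources`, the `runner` parameter, and the choice of `--all-namespaces -o json` are assumptions made for the example.

```python
import json
import subprocess


def build_kubectl_get(resource, kubeconfig=None):
    """Build the argument list for listing one resource type as JSON.

    Hypothetical helper; the real fetcher may use different options.
    """
    cmd = ['kubectl', 'get', resource, '--all-namespaces', '-o', 'json']
    if kubeconfig:
        cmd += ['--kubeconfig', kubeconfig]
    return cmd


def fetch_resources(resource, kubeconfig=None, runner=subprocess.run):
    """Run kubectl and return the list of resource objects.

    `runner` is injectable (an assumption of this sketch) so the
    function can be exercised without a real cluster.
    """
    cp = runner(
        build_kubectl_get(resource, kubeconfig),
        capture_output=True,
        text=True,
        check=True,
    )
    # `kubectl get ... -o json` returns a List object with an `items` array.
    return json.loads(cp.stdout)['items']
```

Passing the command as a list of arguments avoids shell quoting issues, and the injectable runner keeps the parsing logic testable offline.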

Test

Tests of the cluster list fetchers (kubernetes and ibm_cloud) and the cluster resource fetcher passed against IBM Cloud clusters (both IKS and ROKS).

Context

new feature: cluster list/resource fetcher #9

alfinkel · 2020-08-14T22:46:07Z

@tmishina - I will have a look on Monday. In the meantime, can you read the DCO and check the box in the PR description if you agree?

tmishina · 2020-08-15T10:45:09Z

@alfinkel thank you, I've checked the DCO box, and I will rebase onto main (git rebase main).

alfinkel

I reviewed through the "IKS" cluster lists fetcher but you can apply README and other general comments to the entire PR. Address/answer the questions raised and I'll continue the review of the other fetcher after that.

alfinkel · 2020-08-17T19:43:35Z

arboretum/common/errors.py

+# -*- mode:python; coding:utf-8 -*-
+# Copyright (c) 2020 IBM Corp. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Common error classes."""
+
+
+class CommandExecutionError(RuntimeError):
+    """Represents error at executing command."""
+
+    def __init__(self, cmd, stdout, stderr, returncode):
+        """Initialize an instance.
+
+        Initialize an instance with the return values of the command.
+        """
+        self.__cmd = cmd
+        self.__stdout = stdout
+        self.__stderr = stderr
+        self.__returncode = returncode
+
+    def __str__(self):
+        """Get information about the command line and its result."""
+        return (
+            f'Error running command: {self.cmd}\n'
+            f'returncode: {self.returncode}\n'
+            f'stdout: {self.stdout}\n'
+            f'stderr: {self.stderr}'
+        )
+
+    @property
+    def cmd(self):
+        """Get command line text."""
+        return self.__cmd
+
+    @property
+    def stdout(self):
+        """Get standard out text of the command."""
+        return self.__stdout


I'd call this exceptions.py to more closely mirror the naming in the framework.

It also seems unnecessary to explicitly use the @property decorator here.

You should also just use one underscore rather than two for private attributes but I think we don't really need them in this class.

Suggest changing this class to this and renaming as exceptions.py:

    # -*- mode:python; coding:utf-8 -*-
    # Copyright (c) 2020 IBM Corp. All rights reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    """Common custom exception classes."""


    class CommandExecutionError(RuntimeError):
        """System command executed exception class."""

        def __init__(self, cmd, stdout, stderr, returncode):
            """
            Initialize an instance.

            Initialize an instance with the return values of the command.
            """
            self.cmd = cmd
            self.stdout = stdout
            self.stderr = stderr
            self.returncode = returncode
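
For illustration, here is a minimal sketch of how the simplified class could be raised and inspected by a caller. The `super().__init__()` call and its message are assumptions added for this example (the suggestion above does not include them), and the command values are made up.

```python
class CommandExecutionError(RuntimeError):
    """System command executed exception class."""

    def __init__(self, cmd, stdout, stderr, returncode):
        # Assumption: forward a short message to RuntimeError so that
        # str(err) and err.args are populated.
        super().__init__(f'Error running command: {cmd}')
        # Plain public attributes; no @property wrappers are needed
        # because nothing is computed or validated on access.
        self.cmd = cmd
        self.stdout = stdout
        self.stderr = stderr
        self.returncode = returncode


try:
    raise CommandExecutionError('false', '', 'boom', 1)
except CommandExecutionError as err:
    # Callers read the attributes directly.
    summary = (err.cmd, err.returncode, err.stderr)
```

Plain attributes stay readable by callers exactly as the properties were, with less code to maintain.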

alfinkel · 2020-08-17T19:52:18Z

arboretum/common/utils.py

+
+
+def run_command(cmd, secrets=None):
+    """Run commands in a system."""


Suggested change

-    """Run commands in a system."""
+    """Execute system command."""

alfinkel · 2020-08-17T19:57:09Z

arboretum/common/utils.py

+def run_command(cmd, secrets=None):
+    """Run commands in a system."""
+    if type(cmd) == str:
+        cmd = cmd.split(' ')


What else would it be if not a string? Also, you should use isinstance() instead of type() but I think this entire if block is not necessary if cmd will always be a string.

my intention for cmd is that it should accept str or list of str because subprocess.Popen accepts str (if shell=True) or list of str (if shell=False). By accepting both types, user of this function does not need to care about the internal implementation.

To clearly show the intention (and apply your comment about type and isinstance), I plan to modify the code as follows.

        """
        Execute system command.

        :param cmd: a space-separated string or a list of string
        :param secrets: a text which should be masked in log text.

        :returns: standard output of the command.
        """
        if isinstance(cmd, str):
            cmd = cmd.split(' ')
        elif not isinstance(cmd, list):
            raise TypeError('given command line was neither '
                            f'a space-separated string nor list of string: {cmd}')

> my intention for cmd is that it should accept str or list of str because subprocess.Popen accepts str (if shell=True) or list of str (if shell=False)

But that's not what's happening in run_command currently. There's no logic to optionally set the shell=True parameter for Popen and it will always be acting on a list of arguments anyway based on the previous and proposed versions of run_command because if cmd comes in as a string, you turn it into a list of arguments.

1. If you truly want run_command to dynamically execute cmd either as a string or as a list of arguments, then when isinstance(cmd, str) is true you should pass shell=True to the Popen call.

2. However, if you want to continue to break cmd into arguments, then you should use shlex.split rather than str.split.

Either solution (1) or (2) will achieve:

> ...By accepting both types, user of this function does not need to care about the internal implementation.
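
To illustrate why shlex.split is preferable to str.split for option (2), here is a small sketch. The command string is a made-up example and nothing is executed; it only compares how the two functions tokenize a quoted argument.

```python
import shlex

# Hypothetical command with a quoted argument containing a space.
cmd = 'kubectl get pods -l "app=my app"'

# str.split breaks the quoted argument apart and keeps the quote characters.
naive = cmd.split(' ')

# shlex.split follows shell quoting rules and keeps the argument whole.
parsed = shlex.split(cmd)

print(naive)   # ['kubectl', 'get', 'pods', '-l', '"app=my', 'app"']
print(parsed)  # ['kubectl', 'get', 'pods', '-l', 'app=my app']
```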

A few other things...

I think you should look into using subprocess.run() rather than directly using Popen.

You can then leverage https://docs.python.org/3/library/subprocess.html#subprocess.CalledProcessError for error handling

It's not necessary to raise a TypeError if cmd is not a string or list.
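
A minimal sketch of the subprocess.run() approach, assuming Python 3.7+ for `capture_output` and `text`. The helper name `run_checked` is hypothetical, and the child process is the current Python interpreter so the example does not depend on any external tool.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command given as a list of str and return its stdout.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so no custom exception class is required.
    """
    cp = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return cp.stdout


# Successful command: stdout is returned.
ok = run_checked([sys.executable, '-c', 'print("hello")'])

# Failing command: the exception carries cmd, returncode, stdout and stderr.
try:
    run_checked([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as e:
    failure = (e.returncode, e.cmd[0] == sys.executable)
```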

Based on your comments and the subprocess documentation, I plan to change the implementation as follows. I really appreciate your comments and suggestions on this, thanks.

- use subprocess.run() instead of subprocess.Popen()
- assume only list[str] for cmd to leverage functionality (quoting, etc.) of subprocess.run()
- delegate error handling to subprocess.run(), and use subprocess.CalledProcessError instead of the custom exception class CommandExecutionError
  - to do so, externalize the masking feature to another function
- other changes (accept further parameters and return stderr in addition to stdout for future use)

    import subprocess


    def mask_secrets(text, secrets):
        """
        Replace secret words in a text with `***`.

        :param str text: a string which may contain secret words.
        :param list[str] secrets: secret word list.

        :returns: masked text.
        """
        for s in secrets:
            text = text.replace(s, '***')
        return text


    def run_command(cmd, input_text=None, timeout=None):
        """
        Execute system command.

        This is a wrapper for `subprocess.run()`.
        Example 1: `run_command(['echo', '-n', 'hello'])` returns `('hello','')`.
        Example 2: `run_command(['cat'], input_text='hello')` returns `('hello','')`.
        Use `subprocess.run()` if other complicated parameters (e.g., encoding) should be specified.

        :param list[str] cmd: command line arguments
        :param str input_text: text for standard input of command
        :param int timeout: timeout for command in seconds

        :raises subprocess.CalledProcessError: if the command finishes with non-zero returncode.
        :raises subprocess.TimeoutExpired: if timeout expires.
        :raises TypeError: if some of `cmd` element is not a `str`.
        :raises IndexError: if length of `cmd` is zero.

        :returns: a tuple of standard output and standard error of the command.
        """
        cp = subprocess.run(
            cmd,
            input=input_text,
            text=True,
            timeout=timeout,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
            shell=False
        )
        return cp.stdout, cp.stderr
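
One way the externalized masking could be combined with subprocess.CalledProcessError when logging a failure is sketched below. The helper `describe_failure`, the message layout, and the secret value are all hypothetical; the proposal above only states that masking moves to a separate function.

```python
import subprocess
import sys


def mask_secrets(text, secrets):
    """Replace each secret word in text with `***`."""
    for s in secrets:
        text = text.replace(s, '***')
    return text


def describe_failure(err, secrets):
    """Hypothetical helper: build a log-safe message from a CalledProcessError."""
    cmd_text = ' '.join(err.cmd)
    return mask_secrets(
        f'Error running command: {cmd_text}\n'
        f'returncode: {err.returncode}\n'
        f'stderr: {err.stderr}',
        secrets,
    )


secret = 'hypothetical-api-key'  # made-up value for the example
try:
    # The child process ignores the extra argument and exits with code 2.
    subprocess.run(
        [sys.executable, '-c', 'import sys; sys.exit(2)', secret],
        capture_output=True,
        text=True,
        check=True,
    )
except subprocess.CalledProcessError as err:
    message = describe_failure(err, [secret])
```

Masking the assembled message, rather than the command alone, also covers secrets that the command echoes to stderr.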

alfinkel · 2020-08-17T20:00:28Z

arboretum/common/utils.py

+    if p.returncode != 0:
+        secrets = secrets or []
+        for s in secrets:
+            cmd = cmd.replace(s, '***')


I don't think this is right. After you split it, cmd will be a list, which does not have a replace method. Did you test this? Any code that isn't a fetcher or a check, such as evidence classes and common utilities, should have a unit test.
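
A sketch of the kind of unit test that would catch this, assuming a hypothetical helper `mask_command` that masks secrets per argument. The helper and the key value are illustrations only and are not part of the PR.

```python
import unittest


def mask_command(cmd, secrets=None):
    """Hypothetical helper: mask secrets in each argument of a command list."""
    masked = []
    for arg in cmd:
        for s in secrets or []:
            arg = arg.replace(s, '***')
        masked.append(arg)
    return masked


class MaskCommandTest(unittest.TestCase):
    """Tests for masking secrets in a command given as a list of str."""

    def test_list_has_no_replace(self):
        # This is the failure mode in the reviewed code: list.replace does not exist.
        with self.assertRaises(AttributeError):
            ['ibmcloud', 'login'].replace('x', '***')

    def test_masks_each_argument(self):
        cmd = ['ibmcloud', 'login', '--apikey', 'hypothetical-key']
        self.assertEqual(
            mask_command(cmd, ['hypothetical-key']),
            ['ibmcloud', 'login', '--apikey', '***'],
        )

    def test_no_secrets_leaves_command_unchanged(self):
        self.assertEqual(mask_command(['ls', '-l']), ['ls', '-l'])
```

The tests can be run with `python -m unittest` against the module that contains them.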

alfinkel · 2020-08-17T20:06:58Z

arboretum/ibm_cloud/README.md

+
+* Class: [ClusterListFetcher][fetch-cluster-list]
+* Purpose: Write the list of IBM Cloud clusters to the evidence locker.
+* Behavior: Log in to IBM Cloud using `ibmcloud login` command, and save the result of `ibmcloud cs cluster ls` command.


This really isn't the behavior content we're looking for. You shouldn't explicitly state the command that's getting executed, because we'll have to update this README any time the command changes. We may also move away from the command and towards the API in the future, if and when it stabilizes, while the behavior of the fetcher would remain the same. I think something that communicates the behavior in a broader (more generic) way would be better.

I've changed the description to a more abstract wording.
89f20fd#diff-fa9c900ac18eaea9441dab042c0ad0c9

Behavior: Log in to IBM Cloud and save the list of clusters bound with specified account.

alfinkel · 2020-08-17T20:29:56Z