-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cluster list fetchers and cluster resource fetcher #15
Conversation
51e549c
to
02ba30c
Compare
@tmishina - I will have a look on Monday. In the mean time, can you read the DCO and check the box in the PR description if you agree? |
@alfinkel thank you, I've checked the box of DCO. and I will perform |
02ba30c
to
5d97cf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed through the "IKS" cluster lists fetcher but you can apply README and other general comments to the entire PR. Address/answer the questions raised and I'll continue the review of the other fetcher after that.
arboretum/common/errors.py
Outdated
# -*- mode:python; coding:utf-8 -*- | ||
# Copyright (c) 2020 IBM Corp. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Common error classes.""" | ||
|
||
|
||
class CommandExecutionError(RuntimeError): | ||
"""Represents error at executing command.""" | ||
|
||
def __init__(self, cmd, stdout, stderr, returncode): | ||
"""Initialize an instance. | ||
|
||
Initialize an instance with the return values of the command. | ||
""" | ||
self.__cmd = cmd | ||
self.__stdout = stdout | ||
self.__stderr = stderr | ||
self.__returncode = returncode | ||
|
||
def __str__(self): | ||
"""Get information about the command line and its result.""" | ||
return ( | ||
f'Error running command: {self.cmd}\n' | ||
f'returncode: {self.returncode}\n' | ||
f'stdout: {self.stdout}\n' | ||
f'stderr: {self.stderr}' | ||
) | ||
|
||
@property | ||
def cmd(self): | ||
"""Get command line text.""" | ||
return self.__cmd | ||
|
||
@property | ||
def stdout(self): | ||
"""Get standard out text of the command.""" | ||
return self.__stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I'd call this exceptions.py to more closely mirror the naming in the framework.
- It also seems unnecessary to explicitly use the
@property
decorator here. - You should also just use one underscore rather than two for private attributes but I think we don't really need them in this class.
Suggest changing this class to this and renaming as exceptions.py:
# -*- mode:python; coding:utf-8 -*-
# Copyright (c) 2020 IBM Corp. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Common custom exception classes."""
class CommandExecutionError(RuntimeError):
"""System command executed exception class."""
def __init__(self, cmd, stdout, stderr, returncode):
"""
Initialize an instance.
Initialize an instance with the return values of the command.
"""
self.cmd = cmd
self.stdout = stdout
self.stderr = stderr
self.returncode = returncode
arboretum/common/utils.py
Outdated
|
||
|
||
def run_command(cmd, secrets=None): | ||
"""Run commands in a system.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Run commands in a system.""" | |
"""Execute system command.""" |
arboretum/common/utils.py
Outdated
def run_command(cmd, secrets=None): | ||
"""Run commands in a system.""" | ||
if type(cmd) == str: | ||
cmd = cmd.split(' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else would it be if not a string? Also, you should use isinstance()
instead of type()
but I think this entire if
block is not necessary if cmd
will always be a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my intention for cmd
is that it should accept str
or list of str
because subprocess.Popen
accepts str
(if shell=True
) or list of str
(if shell=False
). By accepting both types, user of this function does not need to care about the internal implementation.
To clearly show the intention (and apply your comment about type
and isinstance
), I plan to modify the code as follows.
"""
Execute system command.
:param cmd: a space-separated string or a list of string
:param secrets: a text which should be masked in log text.
:returns: standard output of the command.
"""
if isinstance(cmd, str):
cmd = cmd.split(' ')
elif not isinstance(cmd, list):
raise TypeError('given command line was neither '
f'a space-separated string nor list of string: {cmd}')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my intention for cmd is that it should accept str or list of str because subprocess.Popen accepts str (if shell=True) or list of str (if shell=False)
But that's not what's happening in run_command
currently. There's no logic to optionally set the shell=True
parameter for Popen
and it will always be acting on a list of arguments anyway based on the previous and proposed versions of run_command
because if cmd
comes in as a string, you turn it into a list of arguments.
- If you truly want
run_command
to dynamically either executecmd
as a string orcmd
as a list of arguments thenif isinstance(cmd, str):
is true you should applyshell=True
as a parameter to thePopen
call. - However, if you want to continue to break
cmd
into arguments then you should useshlex.split
rather thanstr.split
.
Either solution (1) or (2) will achieve:
...By accepting both types, user of this function does not need to care about the internal implementation.
A few other things...
- I think you should look into using subprocess.run() rather than directly using
Popen
. - You can then leverage https://docs.python.org/3/library/subprocess.html#subprocess.CalledProcessError for error handling
- It's not necessary to raise a TypeError if cmd is not a string or list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to your comments and the document of subprocess, I plan to change the implementation as follows. I really appreciate your comments and suggestions on this, thanks.
- use
subprocess.run()
instead ofsubprocess.Popen()
- assume only
list[str]
forcmd
to leverage functionality (quoting, etc.) ofsubprocess.run()
- delegate error handling to
subprocess.run()
, and usesubprocess.CalledProcessError
instead of custom exception classCommandExecutionError
- to do so, externalize the masking feature to another function
- other changes (accept further parameters and return stderr in addition to stdout for future use)
def mask_secrets(text, secrets):
"""
Replace secret words in a text with `***`.
:param str text: a string which may contain secret words.
:param list[str] secrets: secret word list.
:returns: masked text.
"""
for s in secrets:
text = text.replace(s, '***')
return text
def run_command(cmd, input_text=None, timeout=None):
"""
Execute system command.
This is a wrapper for `subprocess.run()`.
Example 1: `run_command(['echo', '-n', 'hello'])` returns `('hello','')`.
Example 2: `run_command(['cat'], input='hello')` returns `('hello','')`.
Use `subprocess.run()` if other complicated parameters (e.g., encoding)
should be specified.
:param list[str] cmd: command line arguments
:param str input_text: text for standard input of command
:param int timeout: timeout for command in seconds
:raises subprocess.CalledProcessError: if the command finishes with
non-zero returncode.
:raises subprocess.TimeoutExpires: if timeout expires.
:raises TypeError: if some of `cmd` element is not a `str`.
:raises IndexError: if length of `cmd` is zero.
:returns: a tuple of standard output and standard error of the command.
"""
cp = subprocess.run(
cmd,
input=input_text,
text=True,
timeout=timeout,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
check=True,
shell=False
)
return cp.stdout, cp.stderr
arboretum/common/utils.py
Outdated
if p.returncode != 0: | ||
secrets = secrets or [] | ||
for s in secrets: | ||
cmd = cmd.replace(s, '***') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is right. After you split it, cmd
will be a list which does not have a replace
function. Did you test this? There should be a unit test here for any code that isn't a fetcher or a check like evidence and common utilities.
arboretum/ibm_cloud/README.md
Outdated
|
||
* Class: [ClusterListFetcher][fetch-cluster-list] | ||
* Purpose: Write the list of IBM Cloud clusters to the evidence locker. | ||
* Behavior: Log in to IBM Cloud using `ibmcloud login` command, and save the result of `ibmcloud cs cluster ls` command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really isn't the behavior content we're looking for. You shouldn't explicitly state the command that's getting executed because we'll have update this README anytime any sort of change to the command happens. We may also in the future move away from the command and towards using the API if/when it stabilizes but the behavior of the fetcher would remain the same. I think something that communicates the behavior in a more broad (generic) way would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the description into more abstract expression.
89f20fd#diff-fa9c900ac18eaea9441dab042c0ad0c9
Behavior: Log in to IBM Cloud and save the list of clusters bound with specified account.
"""Fetch IBM Cloud cluster list.""" | ||
for account in self.config.get( | ||
'org.ibm_cloud.cluster_list.config.account'): | ||
return json.dumps({account: self._get_cluster_list(account)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense. You're issuing a return statement inside the for loop so the loop will only execute one time no matter how many accounts you've specified. Did you test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added multi-account test and fixed this issue. Thank you for pointing out this.
def _get_cluster_list(self, account): | ||
|
||
# get credential for the account | ||
api_key = getattr(self.config.creds['ibm_cloud'], account + '_api_key') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use f-strings whenever possible.
api_key = getattr(self.config.creds['ibm_cloud'], f'{account}_api_key')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed here and another in arboretum/kubernetes/fetchers/fetch_cluster_resource.py.
auditree-arboretum/arboretum/kubernetes/fetchers/fetch_cluster_resource.py
Lines 110 to 112 in caf5c12
api_key = getattr( | |
self.config.creds['ibm_cloud'], f'{account}_api_key' | |
) |
try: | ||
cluster_list = run_command(cmd) | ||
except CommandExecutionError as e: | ||
if e.returncode == 2: # "2" means no plugin error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you certain that "2" always means a no plugin error and are you certain that the returncode will be an integer? BTW this is where you can construct the logger you had in your setUpClass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the return code is indeed an integer then your comment of # "2"...
is a bit confusing. Perhaps # RC: 2 == no plugin
might be clearer to future selves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for your suggestion I applied this to two files.
auditree-arboretum/arboretum/ibm_cloud/fetchers/fetch_cluster_list.py
Lines 63 to 67 in caf5c12
if e.returncode == 2: # RC: 2 == no plugin | |
logger.warning( | |
'Kubernetes service plugin missing. ' | |
'Attempting to install plugin...' | |
) |
auditree-arboretum/arboretum/kubernetes/fetchers/fetch_cluster_resource.py
Lines 140 to 143 in caf5c12
if e.returncode == 2: # RC: 2 == no plugin | |
self.logger.warning( | |
'Kubernetes service plugin missing. ' | |
'Attempting to install plugin...' |
run_command('ibmcloud plugin install kubernetes-service') | ||
cluster_list = run_command(cmd) | ||
else: | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just raise
should suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed two lines (same block as #15 (comment))
One other point... I think you should perform a logout (as well as a login) from ibmcloud for each account you process. |
@alfinkel I investigated steps to fetch resources from an IBM Cloud cluster using its REST API. Does this make sense? Get access token and refresh token using API keyhttps://cloud.ibm.com/apidocs/iam-identity-token-api
returns: {
"access_token": "eyJ...",
"refresh_token": "OKDMQ...",
"ims_user_id": 0000000,
"token_type": "Bearer",
"expires_in": 3600,
"expiration": 1598322312,
"scope": "ibm openid"
}
Get cluster listhttps://cloud.ibm.com/apidocs/kubernetes#getclusters
returns: [
{
"id": "bpfn...",
"serverURL": "https://serverurl:9999",
"type": "kubernetes",
...
},
...
]
(for IKS clusters only) Get token for clusterhttps://cloud.ibm.com/apidocs/kubernetes#getclusterconfig
extracting the bearer token for the cluster.
(note: yq is a wrapper for yaml files to pass it to (for OpenShift clusters only) Get token for clusterhttps://cloud.ibm.com/docs/openshift?topic=openshift-access_cluster
Get resource of a clusterFor example,
The option |
You're going to use Python FYI, the fetcher base class has a session method that wraps |
@alfinkel I pushed draft code which uses IBM Cloud API instead of @alfinkel @drsm79 I have two concerns about externalizing common functions regarding IBM Cloud APIs. I created a utility function file.
|
You can put |
I guess there's still the benefit that you won't need to explicitly set the token for every call as it will be part of the session headers upon construction of the session object. |
@alfinkel OK, from your comment and the document of requests I realized that using |
I think we should approach this like the "plugins" for kube, since GCP, AWS, Azure... will have their own iam implementations. So, I disagree with:
Unless it's named something like |
Currently, ComplianceFetcher.session() does not support renewal of the session, and therefore raw requests.session() is also used at fetching resources.
@alfinkel currently two fetchers use Is there a way to renew @drsm79 refactored.
|
I'll add a fix shortly to the ComplianceFetcher.session(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused as to what you're trying to achieve with the cluster resource "fetcher" work. Can you explain why you're dynamically loading another module mid fetch and why that module while containing "fetch_" methods doesn't contain a fetch class?
I think we may be at a point where you should split this PR up into two PRs. One PR for fetching the cluster list and one PR for the cluster resource as it seems like this review is starting to go a bit sideways.
module_name = ( | ||
'arboretum.kubernetes.fetchers.' | ||
f'fetch_cluster_resource_{cltype}' | ||
) | ||
function_name = 'fetch_cluster_resource' | ||
try: | ||
module = import_module(module_name) | ||
except ModuleNotFoundError: | ||
self.logger.error( | ||
'Failed to load plugin for cluster type "%s": %s', | ||
cltype, | ||
module_name | ||
) | ||
raise | ||
try: | ||
fetcher = getattr(module, function_name) | ||
except AttributeError: | ||
self.logger.error( | ||
'Failed to load expected funtion "%s" ' | ||
'in module "%s"', | ||
function_name, | ||
module_name | ||
) | ||
raise | ||
resources[cltype] = fetcher(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that I said that I wouldn't look at the cluster resource fetcher until the cluster list was done but what is happening here? Dynamic loading of modules doesn't seem right.
@@ -0,0 +1,159 @@ | |||
"""Feching cluster resouce from IBM Cloud.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..and what's the story here? This doesn't even include a fetcher class.
ComplianceAsCode/auditree-framework#57 allows for the fetcher session object to be reset. Once merged auditree-framework v1.2.6 will have the change. |
@alfinkel thank you for your suggestion, then shall we split this PR (to be closed) into two PRs? I have created a new branch When the new PR is merged, I will create another PR for branch @drsm79 Please tell me if you have any preference about this change, thanks! |
Re. the plugins - wouldn't doing this via different fetchers (which maybe share a base class) suffice? It does mean that people would need to configure multiple fetchers (kube fetcher, openshift fetcher etc..) but I think it's more extensible, and gives us the "plugin" behaviour in a more "native" manner. |
Maybe? We'll see once you've submitted the new PR whether we're just some administrative operations away from an approval. BTW, I'm hoping that your new PR will already be rebased and squashed. As far as the version goes, you don't need to bump it because an auto release is not being performed but you should add an update to the CHANGES.md detailing your changes under an |
What
This pull request provides a feature to fetch resources of kubernetes clusters achieves (see #9 in details).
The PR also includes a capability to store lists of kubernetes clusters into an evidence locker from BOM (Bill of Materials) and APIs of cloud service providers.
Why
The resources in a kubernetes cluster contain various types of evidences; for example,
spec
of Pods represents configuration of applications, ConfigMap contains the configuration for the kubernetes cluster itself, and threfore fetching the resources of a kubernetes cluster is important capability for compliance evidence validation of kubernetes clusters.How
This PR contains two main functionalities.
kube/fetchers/fetch_cluster_list.py
: copy BOM (Bill of Materials) specified in an auditree config file into an evidence lockeribm_cloud/fetchers/fetch_cluster_list.py
: fetch the list of clusters managed by IBM Cloud (IBM Kubernetes Service or IKS, and Red Hat Openshift Kubernetes Services or ROKS) by invoking theibmcloud
CLI toolkubectl
CLI toolTest
kubernetes
andibm_cloud
) and cluster resource fetcher were passed for the IBM Cloud clusters (both IKS and ROKS)Context