Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smart submit mode #110

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
917cdb9
v1 - test
olszowski Jun 3, 2018
0e55c15
refactoring + tests
olszowski Jun 11, 2018
2b0736a
smart submit
olszowski Jun 18, 2018
32c6c04
fixing file name for stdout and stderr
olszowski Jun 26, 2018
acdd1c7
removing testing code
olszowski Jun 26, 2018
843574e
ignore invalid lines from command output
olszowski Jun 26, 2018
60ce443
fixing command call
olszowski Jun 26, 2018
b06d075
fixing script generation
olszowski Jun 30, 2018
d2cfc6a
improved log message
olszowski Jun 30, 2018
08c05e6
Merge remote-tracking branch 'mcpartools/master' into feature/smart_s…
olszowski Jun 30, 2018
3532112
removing unnecessary files
olszowski Jun 30, 2018
50385a3
import for reduce method
olszowski Jun 30, 2018
f3f23c3
fixing missing error class in python 3.x
olszowski Jun 30, 2018
02f00d5
removing unnecessary whitespaces
olszowski Jun 30, 2018
9cdeadd
include j2 file in manifest
olszowski Jun 30, 2018
371f2c1
fixing submit file
olszowski Jul 17, 2018
2f91474
improved options format, configurable utilisation and ratio
olszowski Jul 22, 2018
b925a95
configurable partition
olszowski Jul 22, 2018
99830f7
fix partition support
olszowski Jul 25, 2018
c33cd2e
improved logging
olszowski Jul 25, 2018
6b6814e
experimental: append sbatch log info to the end of file instead of
olszowski Jul 25, 2018
615f718
log nodes in order
olszowski Jul 25, 2018
df19d7c
fixing python3 vs python2 ways to deal with std out from subprocess
olszowski Jul 26, 2018
c1619af
Merge remote-tracking branch 'mcpartools/master' into feature/smart_s…
olszowski Jul 29, 2018
ab27fd8
change string in tests to bstring to better match real output
olszowski Jul 29, 2018
4f3a8a8
python 2 vs 3 test fix
olszowski Jul 29, 2018
76935ab
python 2 vs 3 test fix
olszowski Jul 29, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion MANIFEST.in
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
recursive-include mcpartools *.py *.sh *.json
recursive-include mcpartools *.py *.sh *.j2 *.json
exclude *.sh *.yml *.ini *.txt *.spec .gitkeep
exclude versioneer.py LICENSE CODE_OF_CONDUCT.md
prune docs
Expand Down
129 changes: 77 additions & 52 deletions mcpartools/generatemc.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,61 +14,86 @@ def main(args=sys.argv[1:]):
"""
import mcpartools
parser = argparse.ArgumentParser()
parser.add_argument('-V', '--version',
action='version',
version=mcpartools.__version__)
parser.add_argument('-v', '--verbose',
action='count',
default=0,
help='Give more output. Option is additive, '
'and can be used up to 3 times')
parser.add_argument('-q', '--quiet',
action='count',
default=0,
help='Be silent')
parser.add_argument('-w', '--workspace',
type=str,
help='workspace directory')
parser.add_argument('-m', '--mc_run_template',
type=str,
default=None,
help='path to optional MC run script')
parser.add_argument('-s', '--scheduler_options',
type=str,
default=None,
help='optional scheduler options: path to a file or list of options in square brackets')
parser.add_argument('-e', '--mc_engine_options',
type=str,
default=None,
help='optional MC engine options: path to a file or list of options in square brackets')
parser.add_argument('-x', '--external_files',
nargs='+', # list may be empty
type=str,
help='list of external files to be copied into each job working directory')
parser.add_argument('-b', '--batch',
type=str,
default=None,
choices=[b.id for b in SchedulerDiscover.supported],
help='Available batch systems: {}'.format([b.id for b in SchedulerDiscover.supported]))
parser.add_argument('-c', '--collect',
type=str,
default='mv',
choices=Options.collect_methods,
help='Available collect methods')
parser.add_argument('-p', '--particle_no',
dest='particle_no',
metavar='particle_no',
type=int,
required=True,
help='number of primary particles per job')
parser.add_argument('-j', '--jobs_no',
type=int,
required=True,
help='number of parallel jobs')

general_args = parser.add_argument_group(title=None, description='General options')
config_args = parser.add_argument_group(title=None, description='Generator configuration options')
mc_args = parser.add_argument_group(title=None, description='Monte Carlo options')
smart_args = parser.add_argument_group(title=None, description='Smart submit options.')

parser.add_argument('input',
type=str,
help='path to input configuration')
# TODO add grouping of options

general_args.add_argument('-V', '--version',
action='version',
version=mcpartools.__version__)
general_args.add_argument('-v', '--verbose',
action='count',
default=0,
help='Give more output. Option is additive, '
'and can be used up to 3 times')
general_args.add_argument('-q', '--quiet',
action='count',
default=0,
help='Be silent')

config_args.add_argument('-w', '--workspace',
type=str,
help='workspace directory')

config_args.add_argument('-m', '--mc_run_template',
type=str,
default=None,
help='path to optional MC run script')
config_args.add_argument('-s', '--scheduler_options',
type=str,
default=None,
help='optional scheduler options: path to a file or list of options in square brackets')
config_args.add_argument('-e', '--mc_engine_options',
type=str,
default=None,
help='optional MC engine options: path to a file or list of options in square brackets')
config_args.add_argument('-x', '--external_files',
nargs='+', # list may be empty
type=str,
help='list of external files to be copied into each job working directory')
config_args.add_argument('-b', '--batch',
type=str,
default=None,
choices=[b.id for b in SchedulerDiscover.supported],
help='Available batch systems: {}'.format([b.id for b in SchedulerDiscover.supported]))
config_args.add_argument('-c', '--collect',
type=str,
default='mv',
choices=Options.collect_methods,
help='Available collect methods')

mc_args.add_argument('-p', '--particle_no',
dest='particle_no',
metavar='particle_no',
type=int,
required=True,
help='number of primary particles per job')
mc_args.add_argument('-j', '--jobs_no',
type=int,
required=True,
help='number of parallel jobs')

smart_args.add_argument('--smart',
action='store_true',
help='smart mode on (only for slurm)')
smart_args.add_argument('--partition',
default="plgrid",
help='partition (default: %(default)s)')
smart_args.add_argument('--utilisation',
type=float,
default=0.5,
help='cluster node utilisation (default: %(default)s), 1 stands for 100%%')
smart_args.add_argument('--ratio',
type=float,
default=1.08,
help='multiply utilisation by ratio (default: %(default)s) to contain all jobs')

args = parser.parse_args(args)

if args.quiet:
Expand Down
46 changes: 42 additions & 4 deletions mcpartools/generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,11 +92,31 @@ def __init__(self, args):
# no checks needed - argparse does it
self.batch = args.batch

self.smart_options = SmartOptions(args)

def is_smart_enabled(self):
return self.smart_options.is_smart_enabled()

@property
def valid(self):
return self._valid


class SmartOptions:
def __init__(self, args):
self.is_smart = args.smart
self.utilisation = args.utilisation
self.ratio = args.ratio
self.partition = args.partition
self.nodes = None

def is_smart_enabled(self):
return self.is_smart is True

def set_nodes(self, nodes):
self.nodes = nodes


class Generator:
def __init__(self, options):
self.options = options
Expand Down Expand Up @@ -128,7 +148,10 @@ def run(self):
if scheduler_class: # if not empty
# list should have only 1 element - that's why we call scheduler_class[0] (list is not callable)
self.scheduler = scheduler_class[0](self.options.scheduler_options)
logger.info("Using: " + self.scheduler.id)
if self.options.is_smart_enabled():
logger.info("Using: " + self.scheduler.id + " (smart mode)")
else:
logger.info("Using: " + self.scheduler.id)
else:
logger.error("Given scheduler: \'%s\' is not on the list of supported batch systems: %s",
self.options.batch, [supported.id for supported in SchedulerDiscover.supported])
Expand All @@ -138,7 +161,7 @@ def run(self):
self.generate_workspace()

# generate submit script
self.generate_submit_script()
self.generate_submit_script(smart=self.options.smart_options)

# copy input files
self.copy_input()
Expand Down Expand Up @@ -191,15 +214,30 @@ def generate_workspace(self):
self.scheduler.write_main_run_script(jobs_no=self.options.jobs_no, output_dir=self.workspace_dir)
self.mc_engine.write_collect_script(self.main_dir)

def generate_submit_script(self):
def generate_submit_script(self, smart):
if smart.is_smart_enabled():
from mcpartools.scheduler.smart.slurm import get_cluster_state_from_os
cluster_state = get_cluster_state_from_os(partition=smart.partition)
smart.set_nodes(
cluster_state.get_nodes_for_scheduling(
int(self.options.jobs_no), smart.utilisation, smart.ratio))
self._log_selected_nodes(smart)

script_path = os.path.join(self.main_dir, self.scheduler.submit_script)
logger.debug("Preparation to generate " + script_path)
logger.debug("Jobs no " + str(self.options.jobs_no))
self.scheduler.write_submit_script(
main_dir=self.main_dir,
script_basename=self.scheduler.submit_script,
jobs_no=self.options.jobs_no,
workspace_dir=self.workspace_dir)
workspace_dir=self.workspace_dir,
smart=smart)

def _log_selected_nodes(self, smart):
logger.info("Nodes selected for smart mode submit:")
nodes_tuple = tuple(smart.nodes)
for node in sorted(set(smart.nodes)):
logger.info("Node " + node + " used " + str(nodes_tuple.count(node)) + " times")

def copy_input(self):
indir_name = 'input'
Expand Down
39 changes: 27 additions & 12 deletions mcpartools/scheduler/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,22 +26,37 @@ def __init__(self, scheduler_options):
submit_script = 'submit.sh'
main_run_script = 'main_run.sh'

def submit_script_body(self, jobs_no, main_dir, workspace_dir):
def submit_script_body(self, jobs_no, main_dir, workspace_dir, smart):
from pkg_resources import resource_string
tpl = resource_string(__name__, self.submit_script_template)
self.submit_script = tpl.decode('ascii')

log_dir = os.path.join(main_dir, "log")
if not os.path.exists(log_dir):
os.mkdir(log_dir)

return self.submit_script.format(options_args=self.options_args,
jobs_no=jobs_no,
log_dir=log_dir,
script_dir=workspace_dir,
calculate_script_name='main_run.sh',
main_dir=main_dir,
collect_script_name='collect.sh')
if smart.is_smart_enabled():
from jinja2.environment import Template
tpl = resource_string(__name__, self.smart_submit_script_template)

self.submit_script = tpl.decode('ascii')

return Template(self.submit_script).render(options_args=self.options_args,
jobs_no=jobs_no,
log_dir=log_dir,
workspace_dir=workspace_dir,
nodes=smart.nodes,
partition=smart.partition)
else:
tpl = resource_string(__name__, self.submit_script_template)

self.submit_script = tpl.decode('ascii')

return self.submit_script.format(options_args=self.options_args,
jobs_no=jobs_no,
log_dir=log_dir,
script_dir=workspace_dir,
calculate_script_name='main_run.sh',
main_dir=main_dir,
collect_script_name='collect.sh')

def main_run_script_body(self, jobs_no, workspace_dir):
from pkg_resources import resource_string
Expand All @@ -51,12 +66,12 @@ def main_run_script_body(self, jobs_no, workspace_dir):
jobs_no=jobs_no)
return self.main_run_script

def write_submit_script(self, main_dir, script_basename, jobs_no, workspace_dir):
def write_submit_script(self, main_dir, script_basename, jobs_no, workspace_dir, smart):
script_path = os.path.join(main_dir, script_basename)
fd = open(script_path, 'w')
abs_path_workspace = os.path.abspath(workspace_dir)
abs_path_main_dir = os.path.abspath(main_dir)
fd.write(self.submit_script_body(jobs_no, abs_path_main_dir, abs_path_workspace))
fd.write(self.submit_script_body(jobs_no, abs_path_main_dir, abs_path_workspace, smart))
fd.close()
os.chmod(script_path, 0o750)
logger.debug("Saved submit script: " + script_path)
Expand Down
30 changes: 30 additions & 0 deletions mcpartools/scheduler/data/smart_submit_slurm.sh.j2
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#!/bin/bash

# Log file submit.log will be created in the same directory submit.sh is located
# submit.log is for storing stdout and stderr of sbatch command, for log info from individual jobs see {log_dir:s} directory
LOGFILE="$(cd $(dirname $0) && pwd)/submit.log"
echo -n "" > "$LOGFILE"

# Create temporary files for parsing stdout and stderr output from sbatch command before storing them in submit.log
OUT=`mktemp`
ERR=`mktemp`
# On exit or if the script is interrupted (i.e. by receiving SIGINT signal) delete temporary files
trap "rm -f $OUT $ERR" EXIT
{% for node_id in nodes %}
sbatch {{options_args}} --partition={{partition}} --nodelist={{node_id}} -n1 --output="{{log_dir}}/output_{{loop.index0}}_{{node_id}}.log" --error="{{log_dir}}/error_{{loop.index0}}_{{node_id}}.log" --parsable {{workspace_dir}}/{{"job_{0:04d}".format(loop.index)}}/run.sh >> $OUT 2>> $ERR
{% endfor %}
echo "Saving logs to $LOGFILE"

# If sbatch command ended with a success log following info
if [ $? -eq 0 ] ; then
echo "Job ID: `cat $OUT | cut -d ";" -f 1`" > "$LOGFILE"
echo "Submission time: `date +"%Y-%m-%d %H:%M:%S"`" >> "$LOGFILE"
fi

# If output from stderr isn't an empty string then log it as well to submit.log
if [ "`cat $ERR`" != "" ] ; then
echo "---------------------" >> "$LOGFILE"
echo "ERROR MESSAGE" >>"$LOGFILE"
echo "---------------------" >> "$LOGFILE"
cat $ERR >> "$LOGFILE"
fi
1 change: 1 addition & 0 deletions mcpartools/scheduler/slurm.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,5 +11,6 @@ def __init__(self, options_content):
JobScheduler.__init__(self, options_content)

submit_script_template = os.path.join('data', 'submit_slurm.sh')
smart_submit_script_template = os.path.join('data', 'smart_submit_slurm.sh.j2')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why .j2 suffix ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a jinja2 template, I wanted this to be explicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


main_run_script_template = os.path.join('data', 'run_slurm.sh')
Empty file.
Loading