Commit

Added --all capability. Cleaned up user's --details output.
The --all capability works now: it goes through every host group and finds each node without the duplication caused by interconnected host groups. The output of a user's --details is now less cluttered and more readable. It displays all jobs running on the nodes where the specified user's jobs are running, and it also displays virtual memory. The screen width used with --visual or -v changes slightly, from 120 to 119 characters, and only 39 cores are drawn per line instead of 40. With split panes in Tmux the old width did not line up, so one fewer core is drawn per line.
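
The de-duplication described above can be sketched in isolation. This is a minimal illustration, not the committed code: the function name collect_unique_machines and the sample strings are hypothetical, and the sample only assumes that `qconf -shgrp_tree` prints host-group names containing '@' followed by the nodes that belong to them. The real output is not reproduced here.

```python
def collect_unique_machines(tree_outputs):
    """Return node names, skipping the nodes of any host group already seen.

    tree_outputs: iterable of strings, each standing in for the text that
    one `qconf -shgrp_tree <hostgroup>` call would print (assumed layout).
    """
    machines = []
    seen_groups = set()
    read_nodes = False  # True while we are inside a host group not seen before
    for text in tree_outputs:
        for token in text.split():
            if '@' in token:
                # Host-group name: only read its nodes the first time it appears.
                read_nodes = token not in seen_groups
                seen_groups.add(token)
            elif read_nodes:
                machines.append(token)
            # Otherwise the node belongs to a host group that was already processed.
    return machines


# Hypothetical sample: @all nests @gpu and @debug, which are then listed again on their own.
sample = [
    "@all\n   @gpu\n      node1\n      node2\n   @debug\n      node3",
    "@gpu\n   node1\n   node2",
    "@debug\n   node3",
]
result = collect_unique_machines(sample)
print(result)
```

Note that this approach de-duplicates per host group, not per node: a node listed directly under two different host groups would still appear twice. Tracking seen node names in a second set would be one way to cover that case.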
CodyKank committed May 18, 2017
1 parent fa78acb commit ae62811
Showing 4 changed files with 79 additions and 40 deletions.
6 changes: 2 additions & 4 deletions README
@@ -41,16 +41,14 @@
have to do is change around the names of the general_access/debug queues in the python
script to what your needs are.

* node-search.py will search through information gathered from qstat about the
UGE environment, including information about nodes, users, user-lists, host-
* node-search.py will search through information gathered from qstat and other tools
about the UGE environment, including information about nodes, users, user-lists, host-
groups, and it can even give a visualization of a host-group.

* This script is assuming (1) Python 3 is installed, tested with python 3.6.0 and 3.4.0,
(2) the subprocess module is installed [should be by default], and (3) you
have qconf and qstat working and configured.

* To do:
Clean up formatting on -u --details options.
Add jobs to Host groups --details output
Make pending_jobs OO.
Check -all option for duplicate nodes!
13 changes: 7 additions & 6 deletions node_search.1
@@ -1,12 +1,13 @@
.\" Manpage for node_search.py.
.\" Contact [email protected] to correct typos or errors.
.TH man 1 "17 MAY 2017" "1.0.2" "node_search man page"
.TH man 1 "18 MAY 2017" "1.0.3" "node_search man page"
.SH NAME
node_search \- get node, user, or host-group information from Univa Grid Engine
.SH SYNOPSIS
\fBnode_search.sh \fR[\fB-h\fR, \fB--help\fR] [\fB-d\fR, \fB--debug\fR] [\fB-g\fR, \fB--general_access\fR]
[\fB-H\fR, \fB --hosts\fR] [\fB-hostname\fR] [\fB-u\fR, \fB--user username\fR] [\fB-uf username\fR] [\fB-U\fR]
[\fB-v\fR, \fB--visual\fR]
\fBnode_search.sh \fR[\fB-h\fR, \fB--help\fR] [\fB-d\fR, \fB--debug\fR] [\fB-a\fR, \fB--all\fR]
[\fB-g\fR, \fB--general_access\fR] [\fB-H\fR, \fB --hosts\fR] [\fB-hostname\fR]
[\fB-u\fR, \fB--user username\fR] [\fB-uf username\fR] [\fB-U\fR] [\fB-v\fR, \fB--visual\fR]

.SH DESCRIPTION
\fRnode_search is a python 3 script which gathers information from the Univa Grid Engine and displays
that information to stdout, which can then be redirected as desired. The following information can be obtained:
@@ -94,9 +95,9 @@ Examples below.
\fB--details\fR depends on what it is modifying. If it is modifying the debug or long queues or a
specifed host-group, then the output will show all of the machines/nodes that belong to that queue
or host-group. Each one of those nodes will also show its core usage (used vs total). If \fB--details
\fR is modifying a user, then all of the nodes that the user has jobs running on will be displayed,
\fRis modifying a user, then all of the nodes that the user has jobs running on will be displayed,
along with every other job running on that specific node. If the user has any pending jobs then those
will be shown as well. If \fB--details\fR is modifying the \fB-uf \fRoption, then it will display
will be shown as well. If \fB--details \fRis modifying the \fB-uf \fRoption, then it will display
all of the nodes which belong to the host-groups the specified user has access to.

." END OPTIONS !!
96 changes: 67 additions & 29 deletions node_search.py
@@ -44,6 +44,7 @@ def __str__(self):
def add_job(self, job):
"""Method to add an instance of class Job to the job_list of a node."""
self.job_list.append(job)
self.num_jobs += 1

def set_load(self, load):
"""Method to set the sys-load for a node"""
@@ -317,7 +318,7 @@ def main():
else:
process_user(sys.argv[2])
elif (sys.argv[1] == '-a' or sys.argv[1] == '--all'):
if len(sys.argv) != 2:
if len(sys.argv) > 3:
print('Argument Error. To see info on every host group just type "node_search.sh --all" or use "-a".')
show_usage(23)
desired_host = "all"
@@ -394,8 +395,8 @@ def find_host_groups(user_name, detail_switch):
if ul in line:
host_user_list.append(line.split('=')[0])
# Manually adding these, as they do not have user_lists associated with them (open to everyone) !!!
host_user_list.append(DEBUG_QUEUE_HOSTGROUP)
host_user_list.append(GENERAL_ACCESS_QUEUE_HOSTGROUP)
host_user_list.append('@' + DEBUG_QUEUE_HOSTGROUP)
host_user_list.append('@' + GENERAL_ACCESS_QUEUE_HOSTGROUP)

if detail_switch:
print_duser_host(host_user_list, user_name, user_list)
@@ -516,30 +517,46 @@ def process_host(desired_host):
print('Error: Arg syntax error with: ' + sys.argv[2])
show_usage(23)
elif len(sys.argv) < 3:
print_host_info(total_cores, used_cores, total_nodes, empty_nodes, desired_host, disabled_cores, disabled_nodes)
print_host_info(total_cores, used_cores, total_nodes, empty_nodes, desired_host, disabled_cores,
disabled_nodes)
else:
print('Error: Too many args')
show_usage(23)
return
#^----------------------------------------------------------------------------- process_host(desired_host)

def getAllMachines():
"""Function to get all of the machines UGE can find and return a list of them as strings."""
"""Function to get all of the machines UGE can find and return a list of them as strings.
Returns: List of strings."""

validHosts = (subprocess.getoutput("qconf -shgrpl").split())
machineList = []
processedHGList = []
readNodes = False
for host in validHosts:
hostMachineList = ((subprocess.getoutput("qconf -shgrp_resolved " + str(host))).split())
for machine in hostMachineList:
machineList.append(machine)
hostMachineList = ((subprocess.getoutput("qconf -shgrp_tree " + str(host))).split())
for element in hostMachineList:
if '@' in element: # If it is a HG name
if element not in processedHGList:
processedHGList.append(element)
readNodes = True # We haven't seen this HG yet, process the nodes
else:
readNodes = False # We've already seen this HG, don't process nodes
elif readNodes:
machineList.append(element)
else: # readNodes == False and '@' not in element(not HG), already counted this node!
continue # We've already seen this node within the hostgroup, don't count it.
return machineList
#^----------------------------------------------------------------------------- getAllMachines()

def draw_queue(total_nodes, total_cores, used_cores, empty_nodes, desired_host, disabled_cores, disabled_nodes, node_list, free_cores):
"""Method to draw the queue on the screen. Will use '[]' to represent a core."""
def draw_queue(total_nodes, total_cores, used_cores, empty_nodes, desired_host,
disabled_cores, disabled_nodes, node_list, free_cores):
"""Method to draw the queue on the screen. Will use '[]' to represent a core.
Returns: Nothing, draws to stdout."""

if total_cores > 400:
screen_size = 120
cores_per_row = 40
screen_size = 119
cores_per_row = 39
else:
screen_size = 100
cores_per_row = 30
@@ -560,7 +577,8 @@ def draw_queue(total_nodes, total_cores, used_cores, empty_nodes, desired_host,
print('Total Nodes:'.ljust(int(screen_size/2)),end ="")
print('{0}'.format(str(total_nodes)).ljust(int(screen_size/2)))
print('-'.center(screen_size, '-'))
print(('[0] = Open Core' + PRINT_INDENT + '[~] = Used Core' + PRINT_INDENT + '[#] = Disabled/Err Core').center(screen_size) + '\n')
print(('[0] = Open Core' + PRINT_INDENT + '[~] = Used Core' + PRINT_INDENT + \
'[#] = Disabled/Err Core').center(screen_size) + '\n')

#Drawing representation of the Queue
drawn_cores = 0
@@ -587,12 +605,14 @@ def draw_queue(total_nodes, total_cores, used_cores, empty_nodes, desired_host,
if drawn_cores == cores_per_row:
drawn_cores = 0
print('')
print('\n' + '-'.center(100,'-'))
print('\n' + '-'.center(screen_size,'-'))
return
#^----------------------------------------------------------------------------- draw_queues(. . .)

def print_detailed_host(total_cores, used_cores, total_nodes, empty_nodes, desired_host, disabled_cores, disabled_nodes, node_list):
"""Prints detailed version of the designated host. Will print every node along with the totals."""
def print_detailed_host(total_cores, used_cores, total_nodes, empty_nodes, desired_host,
disabled_cores, disabled_nodes, node_list):
"""Prints detailed version of the designated host. Will print every node along with the totals.
Return: Nothing, writes to stdout."""

print('\nDetailed info pertaining to: ' + desired_host)
print('Total Nodes: {0}'.format(str(len(node_list))) + ' (some may be disabled!)')
@@ -603,15 +623,16 @@ def print_detailed_host(total_cores, used_cores, total_nodes, empty_nodes, desir
for node in node_list:
cores = str(node.get_used()) + '/' + str(node.get_total())
if node.get_disabled_switch():
disabled = 'd'
disabled = 'Unavailable'
else:
disabled = ''
print((PRINT_INDENT + node.get_name()).ljust(int(TERMWIDTH/2)) + PRINT_INDENT + (str(cores).rjust(5,' ') \
+ PRINT_INDENT + disabled))
return
#^----------------------------------------------------------------------------- print_detailed_host(. . .)

def print_host_info(total_cores, used_cores, total_nodes, empty_nodes, desired_host, disabled_cores, disabled_nodes):
def print_host_info(total_cores, used_cores, total_nodes, empty_nodes, desired_host,
disabled_cores, disabled_nodes):
"""Prints the information from the process_host function in a pretty* format"""

print(str(desired_host).ljust(TERMWIDTH))
@@ -685,14 +706,16 @@ def process_user(user_name):
temp_node.set_used_mem(used_mem)
temp_node.set_free_mem(free_mem)

# In qstat -F, qf:min_cpu . . . . is the last item before the jobs are listed, 28 is how many char's that string is (don't want it)
# In qstat -F, qf:min_cpu . . . . is the last item before the jobs are listed,
# 28 is how many char's that string is (don't want it)
node_stat= host[host.find('qf:min_cpu_interval=00:05:00') + 28\
:host.find('\n---------------------------------------------------------------------------------\n')]
# There is always an extra '\n' in here, so subtract 1 to get rid of it
num_jobs = len(node_stat.split('\n')) -1
# If there are any jobs, parse them and gather info
if num_jobs > 0:
# Python is non-inclusive for the right operand, and we want to skip another extra '\n' so start at 1, and want to go num_jobs
# Python is non-inclusive for the right operand, and we want to
# skip another extra '\n' so start at 1, and want to go num_jobs
for i in range(1, num_jobs + 1):
info = node_stat.split('\n')[i].split()
temp_job = Job(info[2], info[3], info[7])
@@ -712,25 +735,42 @@ def process_user(user_name):

if len(sys.argv) == 4:
if sys.argv[3] == '--details':
print_detailed_user(node_list, pending_list, user_name)
print_detailed_user(final_list, pending_list, user_name)
else:
print('Error: Arg syntax error with: ' + args[4])
print('Error: Arg syntax error with: ' + sys.argv[3])
show_usage(23)
else:
print_short_user(final_list, pending_list, user_name)
#^----------------------------------------------------------------------------- process_user(user_name)

def print_detailed_user(node_list, pending_list, user_name):
"""Prints detailed version, as in all of the nodes the specified user's jobs are on along with other
user's jobs which are running on that node. Will also print all of user's pending jobs(if any)."""
users' jobs which are running on that node. Will also print all of user's pending jobs(if any).
Upon completion, will exit."""

user_pend = []
if len(pending_list):
for j in range(3, len(pending_list)):
for j in range(1, len(pending_list)):
user_pend.append(pending_list[j])

print("=".center(TERMWIDTH,"="))
print("=".ljust(TERMWIDTH - 1) + "=")
print("=" + ("Detailed Node information for jobs corresponding to {0}."\
.format(user_name)).center(TERMWIDTH - 2) + "=")
print("=".ljust(TERMWIDTH - 1) + "=")
print("=".center(TERMWIDTH, '=') + "\n")

for node in node_list:
print(node)
print(node.name + (str(node.used_cores) + "/" + str(node.total_cores)).rjust(int(TERMWIDTH/2)))
print('-'.center(TERMWIDTH,"-"))
print("JobID".ljust(10) + "JobName".ljust(20) + "User".ljust(20) + "MaxVmem".ljust(20) + \
"Cores".ljust(10))
for job in node.job_list:
print(job.id.ljust(10) + job.name.ljust(20) + job.user.ljust(20) + job.max_mem.ljust(20) + \
job.cores.ljust(10))

print('') # Simple newline

if len(user_pend):
print('\n' + '#'.center(TERMWIDTH, '#'))
print("{0}'s pending jobs:".format(user_name).center(TERMWIDTH))
@@ -746,7 +786,7 @@ def print_short_user(node_list, pending_list, user_name):

user_pend = []
if len(pending_list):
for j in range(3, len(pending_list)):
for j in range(1, len(pending_list)): # Skipping first, as its a print header and not job.
user_pend.append(pending_list[j])

job_count = 0
@@ -773,8 +813,6 @@ def print_short_user(node_list, pending_list, user_name):
+ job.get_name().ljust(int(TERMWIDTH/4)) + PRINT_INDENT +str(job.get_core_info()).ljust(int(TERMWIDTH/4))\
+ job.get_max_mem().ljust(int(TERMWIDTH/4)))
job_count += 1
this_nodeJobs += 1
print('User Jobs on node: ' + str(this_nodeJobs))
print("----\n{0}'s Total Running Jobs: {1}\n".format(user_name, str(job_count)))

if len(user_pend):
@@ -803,7 +841,7 @@ def show_usage(exit_code):
print(" -uf, [user_name]".ljust(int(TERMWIDTH/2)) + "show which host groups are available to specified user.".ljust(int(TERMWIDTH/2)))
print(" -U".ljust(int(TERMWIDTH/2)) + "show a list of all users currently recognized by the Univa Grid Engine.".ljust(int(TERMWIDTH/2)))
print("Optional arguments:".ljust(int(TERMWIDTH/2)))
print(" --details".ljust(int(TERMWIDTH/2)) + "flag which can be passed after a user or a hostname to specify a detailed output.".ljust(int(TERMWIDTH/2)))
print(" --details".ljust(int(TERMWIDTH/2)) + "flag which can be passed to certain args for a detailed output.".ljust(int(TERMWIDTH/2)))
print(" -v, --visual".ljust(int(TERMWIDTH/2)) + "flag which can be passed after a host name for a visual queue.".ljust(int(TERMWIDTH/2)) + '\n')
print('Examples:')
print(' {0} -d'.format(SCRIPT_NAME).ljust(int(TERMWIDTH/2)) + '[--debug] could also be used'.ljust(int(TERMWIDTH/2)))
4 changes: 3 additions & 1 deletion node_search.sh
@@ -14,5 +14,7 @@ export LOADEDMODULES=$LOADEDMODULES:opt_local/1.0:gcc/6.2.0:python/3.6.0
export _LMFILES_=$_LMFILES_:/afs/crc.nd.edu/x86_64_linux/Modules/modules/system_modules/opt_local/1.0:/afs/crc.nd.edu/x86_64_linux/Modules/modules/development_tools_and_libraries/gcc/6.2.0:/afs/crc.nd.edu/x86_64_linux/Modules/modules/development_tools_and_libraries/python/3.6.0
export PATH=$PATH:/opt/crc/p/python/3.6.0/gcc/6.2.0/bin:

/afs/crc.nd.edu/user/c/ckankel/Public/node_search/node_search.py $@
# Calling the node_search.py script. Piping through less so if the results are longer than one page,
# it will be opened with less instead of just dumping to the screen.
/afs/crc.nd.edu/user/c/ckankel/Public/node_search/node_search.py $@ | less -F
