TypeError when parsing slurm_stderr in ex_hpc #47
Comments
Oops. Hmm, I wonder if that is something with Python 3 that made it do a more proper distinction between strings and byte streams, or something. Could it be that the stderr in this case contains some extra (non-string) characters, like for colors or something? Anyhow, it seems to me that your fix should be safe.
Do you have access to the
So, regarding the bytes/string issue, I think you are right, it's a Python 3 thing. In particular, I found this answer, which says:
I have been looking at what's the best way to convert to a string; some people suggest
Regarding the
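In a nutshell, the difference looks something like this (the stderr contents below are just a placeholder, not the real output):

```python
# Placeholder contents, just to illustrate the Python 3 bytes-vs-str behaviour:
err = b"example stderr text"   # subprocess returns stderr as bytes under Python 3

str(err)             # -> "b'example stderr text'" (the repr, b'' wrapper and all)
err.decode("utf-8")  # -> 'example stderr text'    (an actual str)
```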
Cool, TIA!
Ah, yes, I think I see why: the shell expansion of the variable will happen when the command is first issued, which is on the login node. SLURM will then take care of executing that already-expanded command (since it is prepended by the SLURM submission part), so by the time the job runs, the hostname has already been filled in with the login node's name.
Not sure what is the best way to fix that ... perhaps by putting your host-lookup in a separate shell script. Then that should only be executed once, on the compute node.
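Something along these lines, I mean (a rough standalone illustration, with made-up file and script names rather than the actual code from your gist):

```python
import subprocess

# When a command string containing $(hostname) goes through a shell here --
# i.e. on the login node, where the command is first issued -- the expansion
# happens immediately, before SLURM ever sees the command:
cmd = 'echo sed "s/foo/$(hostname)/g" input.txt'
print(subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout)
# -> this machine's hostname is already baked into the sed expression

# If the host lookup lives inside a wrapper script instead (say,
# replace_with_hostname.sh containing: sed "s/foo/$(hostname)/g" "$1" > "$2"),
# the submitted command contains no $(hostname) at all, so the expansion can
# only happen when the script itself runs on the compute node:
cmd = 'bash replace_with_hostname.sh input.txt output.txt'
print(cmd)   # nothing to expand here; `hostname` runs inside the script later
```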
Thanks for the tip on the shell expansion. I was able to fix it as you suggested, by putting the command in a bash script. I will write up a draft for a wiki page with the example.
PR submitted!
Hi,
Apologies for opening so many issues today. I have adapted the wiki example for SLURM, and have changed it such that it replaces foo with the hostname where the job is running. Code is available at this gist. I was testing it on a SLURM cluster, and got a TypeError when parsing slurm_stderr. I made a quick fix by changing line 147 of slurm.py from self.log_slurm_info(stderr) to self.log_slurm_info(str(stderr)). It works, but I have no idea whether it will break in other circumstances.
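A more explicit variant might be to decode the bytes rather than wrap them in str(), roughly like this (completely untested, and the encoding / error handling is just a guess):

```python
# Untested sketch of an alternative around line 147 of slurm.py: decode the
# bytes explicitly so that log_slurm_info() always receives a plain str.
# The utf-8 / errors="replace" choice is an assumption, not something I have
# verified against the cluster's actual output.
if isinstance(stderr, bytes):
    stderr = stderr.decode("utf-8", errors="replace")
self.log_slurm_info(stderr)
```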
That said, my job apparently still runs on the login node and does not reach the queue; if you know why that may be the case, let me know.
Cheers,
Pietro