Script from nlp suite doesn't work with $1 #719

Open
nikpag opened this issue Jun 6, 2024 · 1 comment

Collaborator

nikpag commented Jun 6, 2024

Running the 1_1.sh script from the nlp suite under PaSh crashes the compiler with a ValueError.

#!/bin/bash
# tag: count_words

IN=${IN:-$SUITE_DIR/inputs/pg}
OUT=${1:-$SUITE_DIR/outputs/1_1/}
ENTRIES=${ENTRIES:-1000}
mkdir -p "$OUT"

for input in $(ls ${IN} | head -n ${ENTRIES} | xargs -I arg1 basename arg1)
do
    # cat $IN/$input | tr -c 'A-Za-z' '[\n*]' | grep -v "^\s*$" | sort | uniq -c > $1/${input}.out
    cat $IN/$input > $1/$input.out
done

echo 'done';
# rm -rf "$OUT"

I invoke PaSh like this (my $PASH_TOP is /home/nick/pash):

IN=$PASH_TOP/evaluation/benchmarks/nlp/inputs/test/in OUT=$PASH_TOP/evaluation/benchmarks/nlp/outputs/test/out ENTRIES=2 $PASH_TOP/pa.sh -d 1 -w 2 $PASH_TOP/evaluation/benchmarks/nlp/scripts/test.sh $PASH_TOP/evaluation/benchmarks/nlp/outputs/test/other_out

Note that the $OUT parameter and $1 are slightly different (.../out vs .../other_out), to differentiate between the two in error messages. Running PaSh like this produces the following traceback:

Traceback (most recent call last):
  File "/home/nick/pash/compiler/pash_compiler.py", line 96, in compile_ir
    ret = compile_optimize_output_script(ir_filename, compiled_script_file, args, compiler_config)
  File "/home/nick/pash/compiler/pash_compiler.py", line 112, in compile_optimize_output_script
    optimized_ast_or_ir = compile_optimize_df_region(candidate_df_region, args, compiler_config)
  File "/home/nick/pash/compiler/pash_compiler.py", line 160, in compile_optimize_df_region
    asts_and_irs = compile_candidate_df_region(df_region, config.config)
  File "/home/nick/pash/compiler/pash_compiler.py", line 206, in compile_candidate_df_region
    compiled_asts = compile_asts(candidate_df_region, fileIdGen, config)
  File "/home/nick/pash/compiler/ast_to_ir.py", line 60, in compile_asts
    expanded_ast = expand_command(ast_object, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 437, in expand_command
    return ast_match(command, expand_cases, exp_state)
  File "/home/nick/pash/python_pkgs/shasta/ast_node.py", line 824, in ast_match
    return cases[type(ast_node).NodeName](*args)(ast_node)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 408, in <lambda>
    lambda ast_node: expand_simple(ast_node, exp_state)),
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 448, in expand_simple
    node.redir_list = expand_redir_list(node.redir_list, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 459, in expand_redir_list
    redir_list[i] = expand_redir(r, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 464, in expand_redir
    file_arg = expand_arg(redirection.arg, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 304, in expand_arg
    new = expand_arg_char(arg_char, quoted, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 345, in expand_arg_char
    return expand_var(fmt=arg_char.fmt,
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 360, in expand_var
    _type, value = lookup_variable(var, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 215, in lookup_variable
    expanded_var = lookup_variable_inner(var, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 221, in lookup_variable_inner
    value = lookup_variable_inner_core(varname, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 231, in lookup_variable_inner_core
    value = lookup_variable_inner_unsafe(varname, exp_state)
  File "/home/nick/pash/python_pkgs/sh_expand/expand.py", line 240, in lookup_variable_inner_unsafe
    _type, value = exp_state.variables.get(varname, [None, None])
ValueError: too many values to unpack (expected 2)

The most recent call is in this function of the expand.py file:

def lookup_variable_inner_unsafe(varname, exp_state: ExpansionState):
    ## TODO: Is it in there? If we have -u and it is in there.
    _type, value = exp_state.variables.get(varname, [None, None])
    return value

Logging the result of the exp_state.variables.get() call (without unpacking it into _type and value) shows these two values:

  • /home/nick/pash/evaluation/benchmarks/nlp/outputs/test/other_out
  • (None, '61.txt')

So the problem lies with the first positional parameter passed to pa.sh (the one given after the script to parallelize): it appears to be stored as a bare string rather than a 2-tuple, unlike (None, '61.txt'). Unpacking that string into _type and value iterates over its characters, which raises the ValueError.
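
To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of error outside PaSh. The `variables` dict is a hypothetical stand-in for exp_state.variables, and `lookup` and `try_lookup` are illustrative names, not part of sh_expand. It assumes the table holds a bare string for "1" and a proper pair for "input", as the logged values suggest.

```python
# Hypothetical stand-in for exp_state.variables; not the real sh_expand structure.
variables = {
    # Well-formed (type, value) pair, as seen in the log.
    "input": (None, "61.txt"),
    # Bare string instead of a pair, as seen in the log for $1.
    "1": "/home/nick/pash/evaluation/benchmarks/nlp/outputs/test/other_out",
}


def lookup(varname):
    # Same unpacking pattern as lookup_variable_inner_unsafe in the issue.
    _type, value = variables.get(varname, [None, None])
    return value


def try_lookup(varname):
    # Return the value, or a description of the error if unpacking fails.
    try:
        return lookup(varname)
    except ValueError as exc:
        return f"ValueError: {exc}"


print(try_lookup("input"))    # the pair unpacks cleanly
print(try_lookup("1"))        # the string is unpacked character by character and fails
print(try_lookup("missing"))  # the default [None, None] unpacks cleanly
```

Only the bare string entry fails, because a string longer than two characters cannot be unpacked into two names.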

Running the script with a named variable instead of $1 works without any problems.
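
One possible stopgap, sketched below, would be to normalize entries before unpacking them. This is only an illustration and not the sh_expand fix: `as_type_value_pair` and `lookup_value` are hypothetical helpers, and the sketch assumes that a bare entry should be treated as a value of unknown type. The proper fix is more likely at the point where positional parameters are written into the variable table, which this issue has not yet identified.

```python
def as_type_value_pair(entry):
    """Normalize a variable-table entry to a (type, value) pair.

    Hypothetical helper: assumes a bare (non-pair) entry is a value
    whose type is unknown.
    """
    if entry is None:
        return (None, None)
    if isinstance(entry, (list, tuple)) and len(entry) == 2:
        return (entry[0], entry[1])
    # Assumption: anything else is a bare value with no recorded type.
    return (None, entry)


def lookup_value(variables, varname):
    # Unpacking is now safe for pairs, bare values, and missing names.
    _type, value = as_type_value_pair(variables.get(varname))
    return value


# Hypothetical table mirroring the two values logged in this issue.
table = {
    "input": (None, "61.txt"),
    "1": "/home/nick/pash/evaluation/benchmarks/nlp/outputs/test/other_out",
}
print(lookup_value(table, "input"))
print(lookup_value(table, "1"))
print(lookup_value(table, "missing"))
```

A guard like this would hide the inconsistency rather than explain it, so it is better suited to confirming the diagnosis than to a permanent change.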

Contributor

BolunThompson commented Dec 30, 2024

This slightly modified script works for me on Ubuntu 20.04 and 24.04, so it seems fixed. I can’t think of any relevant changes we’ve made to the sh-expand code in the meantime, though.

test.sh

#!/bin/bash

IN=${IN:-$SUITE_DIR/inputs/pg}
ENTRIES=${ENTRIES:-1000}

for input in $(ls ${IN} | head -n ${ENTRIES} | xargs -I arg1 basename arg1)
do
    # cat $IN/$input | tr -c 'A-Za-z' '[\n*]' | grep -v "^\s*$" | sort | uniq -c > $1/${input}.out
    cat $IN/$input > $1/$input.out
done

cat $1/*

$ mkdir test
$ echo a > test/a
$ echo b > test/b
$ mkdir other_out
$ IN=./test $PASH_TOP/pa.sh test.sh other_out/
a
b
