run_validate_11.sh script gets stuck on first test case #92

Open
pawani2v opened this issue Feb 24, 2023 · 14 comments

pawani2v commented Feb 24, 2023

I have prepared my shared library and am now running run_validate_11.sh, but validate11 does not return even after it has run on all the images.

Checking installation of required packages [SUCCESS]
Looking for core implementation library in /home/i2v/Downloads/NIST/frvt/11/lib.[SUCCESS] Found core implementation library /home/i2v/Downloads/NIST/frvt/11/lib/libfrvt_11_i2v_001.so.
Attempting to compile and link /home/i2v/Downloads/NIST/frvt/11/lib/libfrvt_11_i2v_001.so against test harness.
Scanning dependencies of target validate11
[ 33%] Building CXX object src/testdriver/CMakeFiles/validate11.dir/home/i2v/Downloads/NIST/frvt/common/src/util/util.cpp.o
[ 66%] Building CXX object src/testdriver/CMakeFiles/validate11.dir/validate11.cpp.o
[100%] Linking CXX executable ../../../bin/validate11
[100%] Built target validate11
[SUCCESS] Built executable in /home/i2v/Downloads/NIST/frvt/11/bin.
------------------------------
 Running 1:1 validation
------------------------------
Checking for hard-coded config directory
Exiting with status 0

How I know that validate11 does not return control to the run_testdriver.sh script: I added a std::cout to validate11.cpp just before the return and changed the switch code to:

/* Added just before the final return in validate11.cpp */
std::cout << "Exiting with status " << exitStatus << std::endl;
return exitStatus;

/* Modified fork loop (excerpt as posted; the rest of the switch is not shown) */
for (auto &inputFile : inputFileVector) {
    /* Fork */
    switch (fork()) {
    case 0: /* Child */
        if (action == Action::CreateTemplate) {
            exitStatus = createTemplate(
                implPtr,
                inputFile,
                outputDir + "/" + outputFileStem + ".log." + to_string(i),
                templatesDir,
                role);
        } else if (action == Action::CreateMultiTemplates) {
            exitStatus = createMultiTemplates(
                implPtr,
                inputFile,
                outputDir + "/" + outputFileStem + ".log." + to_string(i),
                templatesDir,
                role);
        } else if (action == Action::Match) {
            exitStatus = match(
                implPtr,
                inputFile,
                templatesDir,
                outputDir + "/" + outputFileStem + ".log." + to_string(i));
        }
        break;
    case -1: /* Error */
        cerr << "Problem forking" << endl;
        break;
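
The excerpt assigns exitStatus in the child and then breaks. In C++, a break inside a switch leaves only the switch, so unless code that is not shown returns or exits, the child carries on to the next loop iteration and forks again. Below is a minimal sketch of a fork-and-wait pattern in which each child exits right after its work. It assumes hypothetical processSplit and runSplits helpers; neither is part of validate11.cpp.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <vector>

// Hypothetical stand-in for createTemplate()/match(): returns the status
// a worker would report for one input split.
int processSplit(int splitIndex) {
    return splitIndex == 2 ? 3 : 0;  // pretend split 2 fails with status 3
}

// Forks one child per split and returns the exit statuses in split order.
// -1 marks a child that could not be forked or did not exit normally.
std::vector<int> runSplits(int numSplits) {
    assert(numSplits >= 0);

    std::vector<pid_t> children;
    for (int i = 0; i < numSplits; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: leave immediately with the worker's status, so it
            // never reaches the next loop iteration or the wait code below.
            _exit(processSplit(i));
        }
        children.push_back(pid);  // pid is -1 if fork() failed
    }

    // Parent: collect every child so that none is left running.
    std::vector<int> statuses;
    for (pid_t pid : children) {
        int status = 0;
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            statuses.push_back(WEXITSTATUS(status));
        else
            statuses.push_back(-1);
    }
    return statuses;
}
```

Calling runSplits(4) forks four children and reports the simulated failure of split 2 through its exit status, while the parent itself returns normally.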

I can see "Exiting with status 0" printed, but I never see the echo that I added to the run_testdriver.sh script just below retEnroll=$?.

The echo I added to the script: echo "retEnroll: $retEnroll"

If I open htop and kill the process, the script prints [SUCCESS] and proceeds, but the same issue occurs in the next steps. What is causing this, and how do I resolve it?

I have run this on my development machine and also on a virtual machine that I created from the image provided by NIST:
My Machine:
image

Virtual machine:
image

The issue occurs on both machines.


RomanShnurov commented Mar 1, 2023

Same thing here, but with the run_validate_1N script. Any updates?

pawani2v (Author) commented Mar 1, 2023

@RomanShnurov No updates. I haven't been able to resolve it or find out what the issue is.

pawani2v changed the title from "run_validate_11.sh gets stuck on first test case" to "run_validate_11.sh script gets stuck on first test case" on Mar 1, 2023

pawani2v (Author) commented Mar 1, 2023

@mlngan, can you please help us out?

mlngan (Collaborator) commented Mar 1, 2023

Can you please try and run the provided null implementation and let us know if you encounter the same issue?

@RomanShnurov

@mlngan
I didn't encounter the problem when I ran the provided null implementation.
But I don't understand how my code and @pawani2v's code can affect the script when we can see that our programs terminate successfully.


xsacha commented Mar 1, 2023

Sounds like a genuine problem in your implementation.

Check permissions of the folder.
Also make sure your app is resilient to a moved config directory, which is what that first test is.

@RomanShnurov

@xsacha, thanks for the reply!
But how would the folder permissions help in our situation?
And what is the configuration folder? Is it the folder where my models (and other config files) are supposed to be stored?


xsacha commented Mar 1, 2023

From what I remember, the test makes the folder unreadable (permissions) and tests that you cope with this situation.

The folder you use should be the one that is provided to the API. You shouldn't hardcode the name. You should also respond with an appropriate error if you cannot access it. That's what this test is.

If your app crashed or otherwise failed due to not checking the folder, you might need to manually reset the permissions first.
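
The advice above can be sketched as a small check. It assumes a hypothetical checkConfigDir helper and Status enum that stand in for whatever the real API expects; neither name comes from the FRVT headers.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical status codes for this sketch; the real API defines its own.
enum class Status { Success, ConfigError };

// Checks that the directory handed to the implementation exists and can be
// listed and traversed, instead of assuming a fixed, hard-coded location.
Status checkConfigDir(const std::string &configDir) {
    if (configDir.empty())
        return Status::ConfigError;

    // The path must exist and must be a directory.
    struct stat info;
    if (stat(configDir.c_str(), &info) != 0 || !S_ISDIR(info.st_mode))
        return Status::ConfigError;

    // R_OK: the directory can be listed. X_OK: files inside can be reached.
    if (access(configDir.c_str(), R_OK | X_OK) != 0)
        return Status::ConfigError;

    return Status::Success;
}
```

An implementation would run a check like this on the directory it receives at initialization and return its error status rather than crashing or falling back to a fixed path.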

pawani2v (Author) commented Mar 2, 2023

@xsacha thanks for responding. I don't think the config folder is the issue, since I am able to load the models (I can see this from the print statements in my code). CreateTemplate also runs for all test methods (again confirmed with a print statement), and as I mentioned in the original question, I added the line std::cout << "Exiting with status " << exitStatus << std::endl; to validate11.cpp just before the return. It prints, but after that there is no response.

pawani2v (Author) commented Mar 2, 2023

@mlngan These are the errors I get when I run the provided null implementation in the virtual machine created from the image provided by NIST. I get the same errors on my machine.

image
image

mlngan (Collaborator) commented Mar 2, 2023

@pawani2v It appears that you've modified the validation script/source code. It would be a good idea to run the null implementation with the original, unmodified scripts/code. The expected order of tests and output should look like the below:


Running 1:1 validation

Checking for hard-coded config directory [SUCCESS]
Creating Enrollment Templates (Single Process) [SUCCESS]
Creating Enrollment Templates on Multiple Images per Subject (Single Process) [SUCCESS]
Creating Enrollment Templates (Multiple Processes) [SUCCESS]
Creating Verification Templates (Multiple Processes) [SUCCESS]
Matching Templates (Multiple Processes) [SUCCESS]
Creating Verification Templates for Multiple Persons Detected in an Image (Single Process) [SUCCESS]
Matching Multi-person Templates (Single Process) [SUCCESS]
Sanity checking validation output [SUCCESS]
Creating submission package [SUCCESS]

@RomanShnurov

I found the problem. I'm using ONNX Runtime for inference and forgot to set IntraOpNumThreads on Ort::SessionOptions.

Ort::SessionOptions options;
options.SetIntraOpNumThreads(1);
...
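
One likely reason the thread setting matters, assuming the inference runtime creates its worker threads when the implementation is initialized and the harness calls fork() afterwards: POSIX fork() copies only the calling thread into the child, so work handed to a thread pool in the child is never picked up. The sketch below uses a plain std::thread as a stand-in for such a pool. The name workerRunsInChild is hypothetical, and nothing here is taken from ONNX Runtime or the FRVT harness.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Returns true if a worker thread started before fork() is still making
// progress inside the child process.
bool workerRunsInChild() {
    std::atomic<long> ticks{0};
    std::atomic<bool> stop{false};

    // Stand-in for a runtime's internal thread pool created at init time.
    std::thread worker([&] {
        while (!stop) {
            ++ticks;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // Make sure the worker is actually running before forking.
    while (ticks == 0)
        std::this_thread::yield();

    pid_t pid = fork();
    assert(pid != -1);  // fork() itself is assumed to succeed in this sketch
    if (pid == 0) {
        // Child: only the thread that called fork() exists here, so the
        // child's copy of the counter is not expected to change.
        long before = ticks;
        usleep(100 * 1000);
        _exit(ticks != before ? 1 : 0);  // 1 means the worker is alive
    }

    // Parent: read the child's verdict from its exit status.
    int status = 0;
    bool alive = false;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        alive = WEXITSTATUS(status) == 1;

    stop = true;
    worker.join();
    return alive;
}
```

If the function returns false, the worker did not carry over into the child. A single-threaded configuration such as SetIntraOpNumThreads(1) would then avoid the hang because, presumably, the work runs on the calling thread instead of waiting on pool threads.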

pawani2v (Author) commented Mar 3, 2023

@RomanShnurov did you get it running after this change?

@RomanShnurov

@pawani2v, yep, all tests passed after this change.

4 participants