PWA periodically return invalid JSON for same query #214
Comments
So far, I haven't been successful in recreating this behavior on my development host. Is there any more information about the servers that are exhibiting this issue [e.g., host OS & version, version of PWA installed, installation method (docker or RPM)]? |
Hi @grigutis, I've seen this problem when I updated our PWA to the latest version here at RNP. We have a machine with CentOS 7.9 where we run the docker containers using this docker-compose file. |
@DanielNeto I'm still not able to reproduce this. Would you be able to get some logs for me? This should do it:
Run that, reproduce the problem, then you can kill those jobs and attach the logs to this issue. |
@DanielNeto Actually, maybe logs won't be necessary after all. I finally was able to reproduce this. It is only appearing when configs have tests that use a disjoint topology. |
Thanks to a user in Slack, I now know how to reliably reproduce this problem. It apparently only occurs when the app is under load. For example:
$ ab -n 100 -c 2 https://psconfig.opensciencegrid.org/pub/config/opn-all
and while that is running, do
$ for i in `seq 1 10` ; do curl -s https://psconfig.opensciencegrid.org/pub/config/opn-all | wc -c ; done
If it's working correctly, you should see the same byte count for all 10 iterations. If it's not, the counts will differ. I've also been reading a book about Node.js Design Patterns and came across something that sounds like it might be what is causing this issue.
I think the problem lies somewhere here, but that's just a hunch. I see that Promise is being overridden in Mongoose, but I'm not sure yet whether that has anything to do with it. |
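The byte-count check from the reproduction steps can also be scripted. The following is a minimal sketch, not part of PWA: `summarizeResponses` and `sampleUrl` are hypothetical helper names, and `sampleUrl` assumes Node 18 or later for the global `fetch`. It groups response bodies by SHA-256 digest, so two responses that happen to have the same length but different content are still reported as different.

```javascript
const crypto = require('crypto');

// Group response bodies by SHA-256 digest so differing payloads stand out.
// (Hypothetical helper, not part of PWA.)
function summarizeResponses(bodies) {
  const groups = new Map();
  for (const body of bodies) {
    const digest = crypto.createHash('sha256').update(body).digest('hex');
    const entry = groups.get(digest) || { bytes: Buffer.byteLength(body), count: 0 };
    entry.count += 1;
    groups.set(digest, entry);
  }
  return {
    consistent: groups.size <= 1, // true when every body was identical
    variants: [...groups.values()], // one { bytes, count } entry per distinct body
  };
}

// Fetch the same URL `n` times in sequence and summarize the bodies.
// Assumes Node 18+ (global fetch); run it while a load generator such as
// `ab` is hitting the same endpoint.
async function sampleUrl(url, n) {
  const bodies = [];
  for (let i = 0; i < n; i++) {
    const res = await fetch(url);
    bodies.push(await res.text());
  }
  return summarizeResponses(bodies);
}
```

Calling `sampleUrl` with the config URL and `10` while `ab` is running should resolve to `consistent: true` on a healthy server; more than one entry in `variants` indicates the behavior described in this issue.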
Just to give some more details about this … A colleague and I took a deeper look at this, and when the issue appears, the host_groups_details variable is not being fully populated before the psconfig JSON object is returned. We're not sure exactly where the error is happening, because the nested async functions and anonymous callbacks make the code very confusing to follow, but in general, the flow goes like this (all in meshconfig.js): exports.generate
We made several attempts to fix the problem, but none succeeded, and we came to the conclusion that rewriting the "/config/:url" route from scratch was probably the best way forward. |
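To illustrate the general failure mode described above (a result object being returned before its asynchronous lookups have filled it in), here is a small self-contained sketch. It is not the actual meshconfig.js code: `lookupGroup`, `generateTooEarly`, and `generateWhenComplete` are hypothetical names, and the lookup is simulated with `setImmediate`. In this simplified version the early return always yields an empty object; in a real server under load, some lookups may finish in time and others not, which would give a partially populated result.

```javascript
// Hypothetical stand-in for a database lookup that completes on a later tick.
function lookupGroup(name, callback) {
  setImmediate(() => callback(null, { name, hosts: [name + '-host'] }));
}

// Problematic shape: returns right away, before any lookup callback has run.
function generateTooEarly(groupNames) {
  const hostGroupsDetails = {};
  groupNames.forEach((name) => {
    lookupGroup(name, (err, details) => {
      if (!err) hostGroupsDetails[name] = details;
    });
  });
  return hostGroupsDetails; // still empty at this point
}

// Safer shape: hand the result to `done` only after every lookup has reported.
function generateWhenComplete(groupNames, done) {
  const hostGroupsDetails = {};
  let pending = groupNames.length;
  let failed = false;
  if (pending === 0) return done(null, hostGroupsDetails);
  groupNames.forEach((name) => {
    lookupGroup(name, (err, details) => {
      if (failed) return; // an earlier lookup already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      hostGroupsDetails[name] = details;
      pending -= 1;
      if (pending === 0) done(null, hostGroupsDetails);
    });
  });
}
```

The second function tracks outstanding lookups with a counter, which is one conventional way to wait for a set of callbacks without changing the callback style of the surrounding code.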
Just wondering about the status on this. For OSG/WLCG we are worried that "variable" configs coming from PWA could be part of the problems we are seeing. We can track how often this is occurring using our CheckMK monitoring. For psconfig-itb see https://psetf.aglt2.org/etf/check_mk/index.py?start_url=%2Fetf%2Fpnp4nagios%2Findex.php%2Fgraph%3Fhost%3Dpsconfig-itb%26srv%3Dpsconfig-itb_stats%26theme%3Dmultisite%26baseurl%3D%2Fetf%2Fcheck_mk%2F%26view%3D4 and for psconfig see https://psetf.aglt2.org/etf/check_mk/index.py?start_url=%2Fetf%2Fpnp4nagios%2Findex.php%2Fgraph%3Fhost%3Dpsconfig%26srv%3Dpsconfig_stats%26source%3D0%26theme%3Dmultisite%26baseurl%3D%2Fetf%2Fcheck_mk%2F%26view%3D4 |
I'm still working on it, but I would appreciate any help. I'm working in the issue-214 branch, and the problem seems to be in meshconfig.js. I suspect either in the I'm trying to rewrite the callbacks into promises (async/await) to make the code flow clearer, but this is proving to be a real pain. |
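As a rough idea of what a callback-to-promise conversion can look like, the sketch below wraps a callback-style lookup with Node's `util.promisify` and waits for all lookups with `Promise.all` before building the result. The names `fetchGroup`, `fetchGroupAsync`, and `collectHostGroupDetails` are hypothetical and do not come from the issue-214 branch; the lookup itself is simulated.

```javascript
const { promisify } = require('util');

// Hypothetical callback-style lookup, standing in for a database query.
function fetchGroup(name, callback) {
  setImmediate(() => callback(null, { name, hosts: [name + '-host'] }));
}

// promisify works for functions whose last argument is an (err, value) callback.
const fetchGroupAsync = promisify(fetchGroup);

// Resolve only after every lookup has finished; reject if any lookup fails.
async function collectHostGroupDetails(groupNames) {
  const entries = await Promise.all(
    groupNames.map(async (name) => [name, await fetchGroupAsync(name)])
  );
  return Object.fromEntries(entries);
}
```

Because the function only resolves once `Promise.all` has settled, a route handler that awaits it cannot send the response while lookups are still outstanding.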
If you run the
psconfig validate URL
command against a PWA URL, you will get a validation error about 1 in 10 times. For some reason, PWA occasionally returns different JSON for the same call. We have seen this in multiple different contexts:
We need to identify what is causing the JSON to change and become invalid, and then determine the best way to fix it.