TypeLoadException or BadImageFormatException under heavy multi-threaded proxy generation #193
Comments
@bluerobotch since you are the only one with a repro and DynamicProxy has been used in a lot of multi-threaded web apps for many years this might not be too easy to track down. Could you pull out fuslogvw to see if you can get any more info on the fusion failure. Thanks. |
@jonorossi thanks for the quick reply. We already tried to get more information by enabling the fusion log, but unfortunately it does not contain any records concerning the above problem. We also enabled the Castle Core log, but it did not provide further information. It looks like a type loading issue and not a problem with locating the assembly. During investigation we hit the TypeLoadException in one of two identical tests (an xunit theory), where the first one failed and the second one succeeded. The first (failing) test created the proxy type and the invocation type. The second (succeeding) test took both types from the cache. |
@bluerobotch could you include the |
Hello @jonorossi, thanks for the fast answer and support. I created the issue on Moq. Yes, it is the same problem described in devlooped/moq#246. @bluerobotch and I work on the same project. Together we have been trying to find the root cause since February, looking into the code of Castle and Moq, without any luck. You can use the stack trace from devlooped/moq#246. It's never the same test that fails, but as an example trace it should work for the TypeLoadException. |
@bluerobotch @Schaeri |
@bluerobotch @Schaeri scratch most of that, I see you've included the Moq version in this issue. The question about other usage is still relevant. |
Yes, we have other usages of Castle Core inside our project. We need Castle Core for our WCF infrastructure and to extend our dependency injection framework. This infrastructure is also covered by some unit tests. As a test, I eliminated the Castle Core dependency from our project and ran our unit tests again. So far the error has not occurred, but I have to run the tests overnight to see whether this really is the cause. What could cause this error when we also use Castle Core inside our project? Or what are we doing wrong when using Castle Core? |
@Schaeri at the moment I have no idea, just trying to guess things that might be different with your project which is somehow unique with no other reports of this problem. There might be a defect in DynamicProxy when two ProxyGenerators are used at the same time and this is only surfacing because of how they are set up during unit tests. Let me know the results of that overnight unit test run, and anything else your project is doing that you think might be out of the ordinary (e.g. mocking COM interfaces). |
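To make the "two ProxyGenerators used at the same time" hypothesis concrete, here is a minimal sketch that drives two generator instances from parallel threads. It is only an assumption about what such a scenario could look like, not a known repro; `IExample`, `ReturnTrueInterceptor`, and the iteration count are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Castle.DynamicProxy;

// Hypothetical interface, not taken from the reporters' project.
public interface IExample
{
    bool DoSomething();
}

// Stateless interceptor that supplies a return value instead of calling a target.
public sealed class ReturnTrueInterceptor : IInterceptor
{
    public void Intercept(IInvocation invocation) => invocation.ReturnValue = true;
}

public static class Program
{
    public static void Main()
    {
        // Two independent generators, each with its own default module scope and type cache.
        var generators = new[] { new ProxyGenerator(), new ProxyGenerator() };
        var interceptor = new ReturnTrueInterceptor();

        // Alternate between the generators from many threads at once.
        Parallel.For(0, 64, i =>
        {
            var proxy = generators[i % 2]
                .CreateInterfaceProxyWithoutTarget<IExample>(interceptor);

            if (!proxy.DoSomething())
            {
                throw new InvalidOperationException("Interceptor value was not returned.");
            }
        });
    }
}
```

Given how rarely the failure shows up in this thread, a single run of something like this passing would say very little either way.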
Unfortunately the error still occurs. Currently I have no clue what we are doing that is special enough to provoke such an error. @bluerobotch and I will sit together and rethink the situation. Maybe we will have another idea about what could be special about our project. |
@Schaeri no worries, at least it narrows it down by exclusion. Unfortunately I'm pretty swamped at the moment otherwise I'd have accepted your request for a remote session. I assume since you have been looking at this since February you have a workaround, maybe running your tests without parallelisation? |
No, at the moment we don't have a workaround. When we switched from xunit 1.x to 2.x we did a lot of performance optimizations to run our tests as fast as possible. Executing our tests sequentially would mean adapting our whole project structure to reach the same build and integration times. But yesterday I was able to reproduce one of the issues we have with Castle in a very simple example. The attached example contains 1001 interfaces and 1001 test classes. Each test class mocks a single interface and sets up the DoSomething method on it. Then it calls the method and makes sure it returns the value that was set up. This is done in a theory with the values true and false. Run.bat configures the xunit runner with the parameters we use in our project (run test classes in parallel) and repeats the test execution until the tests fail. They will fail with a BadImageFormatException thrown inside Castle.Core (see screenshot). To reproduce the error, the tests must run for 1 to 8 hours, but it happens on every try. Run.bat can be started more than once to increase the chance of hitting the error sooner. In most cases we have to deal with the TypeLoadException, but the BadImageFormatException also occurs sometimes, and I think both exceptions have the same origin. |
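For readers without the attachment, one of the 1001 interface/test-class pairs described above might look roughly like the sketch below. The names `IService0001` and `Service0001Tests` are hypothetical, and the `bool` return type of `DoSomething` is an assumption inferred from the theory values; the actual attachment may differ.

```csharp
using Moq;
using Xunit;

// Hypothetical stand-in for one of the 1001 interfaces.
// The bool return type is an assumption based on the true/false theory data.
public interface IService0001
{
    bool DoSomething();
}

// Hypothetical stand-in for the matching test class.
public class Service0001Tests
{
    [Theory]
    [InlineData(true)]
    [InlineData(false)]
    public void DoSomething_ReturnsConfiguredValue(bool expected)
    {
        // Accessing mock.Object makes Moq request a proxy type from DynamicProxy.
        var mock = new Mock<IService0001>();
        mock.Setup(s => s.DoSomething()).Returns(expected);

        bool actual = mock.Object.DoSomething();

        Assert.Equal(expected, actual);
    }
}
```

With many such classes running in parallel, each one asks for a different proxy type at roughly the same time, which is presumably what puts pressure on type generation.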
@Schaeri great to hear you've been able to put together a repro. I've had it running now on my Windows VM for just over 8 hours and it hasn't failed. I'll run another few copies to try to push it to fail. What is your machine and build server set up, i.e. how many CPUs, are they virtualised. |
@Schaeri just got a BadImageFormatException! Different message which definitely indicates something screwy.
|
Currently we run our tests in a loop on our developer notebooks to analyze the issue and to increase the chance of hitting the error. They have an Intel i7 with 4 cores and hyper-threading. The OS is Windows 7 64-bit and the .NET Framework is 4.6.1 (394271). The errors on our developer notebooks and build servers seem to be the same, but our build agents run on VMware vSphere 5.5 on Intel Xeon CPUs. The framework version on our build agents is the same. I have run the sample many times (at least 10) and got the exception from the screenshot every time. But on our build server I have also seen the error System.BadImageFormatException : Could not load file or assembly. The problem seems to show up in different forms. |
@jonorossi we saw that you have labeled our issue as a bug. Can we contribute somehow to help resolve it? And again, we appreciate your help and fast response. |
@Schaeri I haven't actually confirmed it is a bug in DP rather than the .NET runtime (i.e. still don't know what causes it), however I could reproduce it. The next week for me is going to be busy so I won't be able to look into it, we need to get to the bottom of the problem obviously before we can fix it. |
Guys, just an update on what I found when I tried to parallelise the tests for Castle Core on win10-x64 (host install) with multi-framework targeting. The ModuleScope class was throwing errors because of I/O collisions between threads when saving strong-named assemblies. The reason is that it uses fixed file names. I hacked this locally to use dynamic assembly naming. After that, multi-threaded proxy generation worked and tests started passing. The trade-off, unfortunately, is that you cannot easily leverage strong-named friend assemblies via InternalsVisibleTo attributes. Unless this has been solved, you might want to create a test that batters ModuleScope in a multi-threaded context calling SaveAssembly(true). My 2 cents anyway... |
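The local hack itself is not shown in this thread. As a rough illustration of the "dynamic assembly naming" idea only, the sketch below gives each `ModuleScope` its own assembly and file names through the public constructor, without modifying Castle. `UniqueScopeFactory` and the naming pattern are hypothetical, and the six-argument constructor should be checked against the Castle.Core version in use.

```csharp
using System;
using Castle.DynamicProxy;

// Hypothetical helper, not part of Castle.Core.
public static class UniqueScopeFactory
{
    // Builds a ProxyGenerator whose ModuleScope uses per-instance
    // assembly and file names instead of the shared defaults.
    public static ProxyGenerator CreateGenerator(bool savePhysicalAssembly)
    {
        string suffix = Guid.NewGuid().ToString("N");
        string assemblyName = "DynamicProxyGenAssembly2_" + suffix; // assumed naming pattern
        string fileName = "CastleDynProxy2_" + suffix + ".dll";     // assumed naming pattern

        var scope = new ModuleScope(
            savePhysicalAssembly,
            false,          // disableSignedModule
            assemblyName,   // strong-named assembly name
            fileName,       // strong-named module path
            assemblyName,   // weak-named assembly name
            fileName);      // weak-named module path

        return new ProxyGenerator(new DefaultProxyBuilder(scope));
    }
}
```

As noted above, unique assembly names come at a cost: an `InternalsVisibleTo` attribute that names the default proxy assembly would no longer match, so internal types could not be proxied this way.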
Hello @Fir3pho3nixx. Can you attach a patch for your ModuleScope class fix? I would like to check whether it solves the problem we have. Thanks for the help. |
Here is my *** Edited/Removed for TL;DR *** |
Thanks for the patch. I applied your changes and built Castle.Core and Moq with the modifications. I included the updated package in my previously posted CastleProblem project and have now tested the change for over a week. Sadly, I still get the TypeLoadExceptions from time to time. The problem seems to occur equally often with and without the modification. Any other idea what could cause the problem? Thanks for the support. |
Will download Castle Problem zip file and start digging around to see if I can replicate this. |
Thanks for the support. The error is very rare. The tests must run for several hours before it occurs. If you need any help, let me know. |
Just an update on this issue, I raised a new one here: #253 They could be related or not. Would like to check this out first before I come back to this. Will let you know what the outcome is. |
Thanks for the support. Yes, let me know what the outcome is, or when you have a patch/pre-release for further testing. |
https://github.com/BrunoJuchli/Core/tree/ModulePerProxy has run for 96hours / 4 days straight without one failure. @stakx |
@BrunoJuchli - Interesting news! It appears that after many months and some wrong turns, we're getting somewhere now, even though most of that time is now spent waiting for several days. 😄 I might be able to whip up a simple implementation of
I'll send a PR your way if I get around to it. |
@BrunoJuchli - didn't send a PR, but I wrote this program (Gist) which mimics DynamicProxy's interceptors and invocations. I've run 4 instances of that program for 6 hours, but that wasn't enough to repro the issue. Maybe you can still use parts of it. |
@stakx I looked at your program and, although I noticed it does some parallelism, I still ended up thinking 'how is it going to repro the issue if it isn't doing locking?' The reason it's not doing locking, AFAICS, is that it assumes it's safe to skip it based on a) the order in which the types are generated and b) the fact that there aren't going to be multiple threads trying to write to the same type. I haven't logically figured out why I am thinking this, but it still nags at me: could it fail to repro because what it is doing is somehow too linearized compared to the other repro we have? |
Or is it just because they are intercepts of interface calls..? |
Ideas for mixing it up: |
@TimLovellSmith: That Gist was simply trying to repro the issue under the assumption that neither Your first post seems somewhat contradictory: "too linearized" is what you'd get with locking, yet there's none of that, as you noticed, so "too linearized" doesn't appear to be a likely reason for why the repro fails to fail. (Unless I've made a mistake with the use of the TPL.) It was a quick try, but since it wasn't successful, I'm not going to invest much more time there (but feel free to run further experiments, we can use all the help we can get here!). Chances of a successful minimal repro are perhaps higher if we go back to DynamicProxy and start taking it apart as much as we can... DynamicProxy does a lot more stuff than my repro, and who knows which of these many differences is responsible for the errors!? Btw., if you want to run further experiments based on my Gist, I suspect it would be important to let it involve types that force Ref.Emit to rewrite e.g. method signatures (IL metadata tokens referring to types are always module-specific, so they cannot be reused in a different module). My current suspicion is that this is one place where things might be going wrong, and my repro likely doesn't cause any signature rewriting. A start would be to have the 1,000 interface types in the repro code from which the generated proxy types in the dynamic module inherit. |
It seems the issue I am experiencing with integration tests running in parallel is related to this |
@iberodev Are you experiencing it sporadically, or can it be reproduced every time when running tests in parallel? |
@BrunoJuchli I experience this sporadically, unfortunately. I suspect it has something to do with the assembly loading and Autofac. Funnily enough, when I experience it, if I restart my PC often the problem goes away (like right now.. of course! :/ ) |
I ran https://github.com/stakx/CastleCore193Repro/tree/master/NetCore/NetCoreRepro again yesterday, after switching to .NET Core runtime 2.2.5. After 6 hours, 2 out of 32 processes had failed with a TypeLoadException for type '' (the name is empty quotes). |
So, I've tested .NET 4.8 with Castle.Core 4.4: 32 processes, on my machine with 32 logical cores, for 14 hours. 5 of them failed. I've got the following exceptions:
I couldn't debug them with VS2019, it hung when loading... |
I have a very similar issue when calling the following code in one of the tests:
The error I am getting always concerns a mocked interface that was used in one of the tests run before the above code: It happens somewhat randomly and, in my case, also when running non-parallel, but as mentioned it seems that some "mocked" tests in the same project have to run before this code is executed to make it happen. My guess is that a previous test created some "proxy" types and as soon as that test class is finished they are "unloaded" again, which makes them unavailable for a later test. I am using xunit 2.4.1 and Moq 4.17.2 (which uses Castle 5.0.0). |
Thanks for chiming in!
This seems unlikely. The .NET runtime cannot unload single types. The .NET Framework can unload |
Having the same issue here: FakeItEasy/FakeItEasy#1910 with FakeItEasy 7.3.1 and Castle.Core 4.3.1. |
I was exploring the codebases of FakeItEasy and Castle DynamicProxy. I have it failing a couple of times per day. Any ideas what I should be adding to those logs? |
@maxcherednik, if you read the whole thread above, you'll find that we last suspected a bug in the runtime. To diagnose this problem further, one would presumably have to set up the runtime for debugging (being able to get full stack traces and step into its source, etc.). I didn't manage to get a stable, reliable CoreCLR dev environment at the time; perhaps someone else will be luckier. I can't really give any more precise advice without resuming that work myself, unfortunately. |
@stakx sorry for the stupid questions. I have read the whole thread - I see you guys having fun here and I am totally late to the party. From the history I see:
Since I am new to the party I am trying to double check and validate certain ideas. I have added some logs to the failing tests so that we could inspect the generated types. I see some strange behavior which might give us a lead. |
@maxcherednik, a few points in random order:
|
Indeed seems the same.
Might be. I am still trying to leave as much of a trace as possible for whoever is going to fix it on the CLR side. For example, the bug might be in the type generation or in the invocation. I am currently inclined towards the type generation. From the logs it is clear that the generic method of the generated type differs from the generic method of the interface. |
We use Moq 4.5.10 with Castle.Core 3.3.3 and xUnit 2.1.0 for our unit tests. For xUnit we have the option activated to only have one app domain for all tests. We also enabled parallelization of test execution in xUnit.
Sporadically we get TypeLoadExceptions like the following example:
System.TypeLoadException : Could not load type 'Castle.Proxies.Invocations.IExecutorAccessor_Prepare' from assembly 'DynamicProxyGenAssembly2, Version=0.0.0.0, Culture=neutral, PublicKeyToken=a621a9e7e5c32e69'. at Castle.Proxies.IExecutorAccessorProxy.Prepare[TExecutionUnit](TExecutionUnit executionUnit) ....
We reviewed the source code of Moq to ensure there is no threading issue with the ProxyGenerator caching. But Moq looks fine.
When we disable xUnit test parallelization, we don't get TypeLoadExceptions anymore. For this reason we think it is a race condition in Castle.Core or even in the .NET Framework.
As far as we see, the missing type is the type that is created for the invocation by InterfaceProxyWithoutTargetContributor.GetInvocationType. The TypeLoadException seems to be raised in the code that is emitted for the invocation in MethodWithInvocationGenerator.BuildProxiedMethodBody. The invocation type is built before it is used in the emitted code. Therefore we do not see a reason why the type cannot be loaded.
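To illustrate the call path described above, here is a minimal sketch using Castle directly rather than Moq. The interface shape is an assumption reconstructed from the stack trace (`Prepare[TExecutionUnit](TExecutionUnit executionUnit)`); the `void` return type and `LoggingInterceptor` are hypothetical.

```csharp
using System;
using Castle.DynamicProxy;

// Assumed shape, reconstructed from the stack trace; the real interface may differ.
public interface IExecutorAccessor
{
    void Prepare<TExecutionUnit>(TExecutionUnit executionUnit);
}

// Hypothetical interceptor that prints which invocation object reaches it.
public sealed class LoggingInterceptor : IInterceptor
{
    public void Intercept(IInvocation invocation)
    {
        // The runtime type of the invocation depends on the Castle.Core version.
        Console.WriteLine(invocation.GetType().FullName + " -> " + invocation.Method.Name);
    }
}

public static class Program
{
    public static void Main()
    {
        var generator = new ProxyGenerator();

        // Generates (or fetches from the cache) the proxy type for the interface.
        var proxy = generator.CreateInterfaceProxyWithoutTarget<IExecutorAccessor>(
            new LoggingInterceptor());

        // The emitted method body creates the invocation object and passes it
        // to the interceptor; per the report, this is where the
        // TypeLoadException is raised.
        proxy.Prepare("unit");
    }
}
```

The printed type name is left unspecified on purpose, since it varies between Castle.Core versions.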