Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relo records not being created properly with inlining and findOrFabricateShadowSymbol #20669

Closed
cjjdespres opened this issue Nov 22, 2024 · 10 comments · Fixed by #20686
Closed

Comments

@cjjdespres
Copy link
Contributor

cjjdespres commented Nov 22, 2024

While running acmeair with -Xaot:forceAOT, I noticed that adding disableInlining in the cold run reduced the number of AOT failures by around 200, including roughly 150 compilationSymbolValidationManagerFailure failures (which are assertion failures in the SVM). If I run with TR_svmAssertionsAreFatal=1, I get a crash in TR_J9SharedCacheVM::getClassFromSignature() in this method:

J9::SymbolReferenceTable::findOrFabricateShadowSymbol(

Compiling this method without fatal assertions does show that the failure occurs during

[     4] O^O INLINER: Inlining qwerty java/util/Locale.equals(Ljava/lang/Object;)Z into java/util/ResourceBundle$CacheKey.equals(Ljava/lang/Object;)Z with a virtual guard kind=DirectMethodGuard type=NonoverriddenTest partialInline=0

The assertion is "method ... should have already been validated", and from looking at findOrFabricateShadowSymbol() it does seem like the first method of containingClass wouldn't necessarily have been validated already. A relo record should be created for this method, or some other way should be found to get declaringClass.


(EDIT: that other issue is only somewhat related to this, and the issue there wouldn't have been caused by the error here).

Actually, this might be related to #20529 (comment) as well. I was confused why certain values couldn't be looked up in a beholder constant pool while TR_RelocationRecordValidateStaticClassFromCP was being validated, and noticed that there weren't any prior records with an internalVMFunctions call that would actually resolve the entry at that index, unlike other collections of records. I suspect that if the first method of containingClass coincidentally happens to be validated already, the SVM compilation will succeed, but then will fail to load because a record isn't created for definingClass and some time later something called classOfStatic() on definingClass. I mention this because that ValidateStaticClassFromCP record was created while inlining java/util/Locale.equals(Ljava/lang/Object;)Z too, in what looks like the same the same place as the assertion failure here. (The method being compiled was different, of course).

Now that I think about it, getClassFromSignature() isn't called with isVettedForAOT=true and that would cause an arbitrary class validation to be created without SVM, so the declaringClass in findOrFabricateShadowSymbol() is probably missing a validation record as well.

Copy link

Issue Number: 20669
Status: Open
Recommended Components: comp:vm, comp:test, comp:gc

@cjjdespres
Copy link
Contributor Author

@mpirvu This is the failure I was seeing during my testing, and @dsouzai this could be related to the constant pool lookup failures I was talking about, which might have been a bug after all.

@dsouzai
Copy link
Contributor

dsouzai commented Nov 22, 2024

I suspect that if the first method of containingClass coincidentally happens to be validated already, the SVM compilation will succeed, but then will fail to load because a record isn't created for definingClass

I'm not sure how that could happen. The reason is because, if a method gets a SVM validation record, it is
done via TR::SymbolValidationManager::addMethodRecord which calls appendClassChainInfoRecords(record->definingClass(), chainInfo); [1]. If the defining class was not already validated, this will define a new ID as part of the ClassChainRecord record.

That said, I had run into a "should have already been validated" error as part of my AOT MH work, because there was a mismatch between the JIT and AOT FrontEnd/ResolvedMethod APIs [2]. Maybe something similar is happening here? You'd have to use a trace log to find the source of the method that is missing the validation.

[1]

appendClassChainInfoRecords(record->definingClass(), chainInfo);

[2] 9a779e1

@dsouzai
Copy link
Contributor

dsouzai commented Nov 22, 2024

Now that I think about it, getClassFromSignature() isn't called with isVettedForAOT=true and that would cause an arbitrary class validation to be created without SVM, so the declaringClass in findOrFabricateShadowSymbol() is probably missing a validation record as well.

Because getClassFromSignature() is not called with isVettedForAOT=true, in non-SVM AOT it will return NULL, which I think just means the compiler treats this symbol as unresolved. So, it's fine that there's no validation record here.

@cjjdespres cjjdespres changed the title Relo records not being created properly with inlining and findOrCreateShadowSymbol Relo records not being created properly with inlining and findOrFabricateShadowSymbol Nov 22, 2024
@cjjdespres
Copy link
Contributor Author

I'm not sure how that could happen.

What I mean is that in this issue, the method here:

TR::SymbolValidationManager *svm = comp->getSymbolValidationManager();
SVM_ASSERT_ALREADY_VALIDATED(svm, method);
validated = svm->addClassByNameRecord(j9class, getClassFromMethodBlock(method));

was not previously validated, so the SVM assert triggers and the AOT warm compilation fails. It looks like that's because firstMethod here:

// As yet JITServer does not support getClassFromSignature() based directly
// on J9ConstantPool*, but it does support it using TR_OpaqueMethodBlock*.
// Find an arbitrary method (if there is one) defined by the same class that
// declares the field, and use that method to getClassFromSignature(), since
// it will have the desired constant pool.
//
// The class containing the field is highly likely to declare at least a
// constructor, and if it doesn't, then it seems that it's not possible to
// instantiate it in the usual way (new, dup, invokespecial <init>), so the
// performance of accesses to its instance fields is especially unlikely to
// matter.
//
TR_J9VM *fej9 = reinterpret_cast<TR_J9VM *>(fe());
if (fej9->getNumMethods(containingClass) > 0)
{
auto *firstMethod =
static_cast<TR_OpaqueMethodBlock*>(fej9->getMethods(containingClass));
TR_OpaqueClassBlock *declaredClass = fej9->getClassFromSignature(
signature, (int32_t)strlen(signature), firstMethod);
if (declaredClass != NULL)
sym->setDeclaredClass(declaredClass);
}

is just the first method in the pool, and there wouldn't be any reason for a method record to have been created for it at this point. My idea was that in #20529 (comment), the method would coincidentally have been validated earlier, so compilation wouldn't have failed. I think that could happen, but after looking at the compilation logs again for #20529 (comment), I realized that the problem there wouldn't have occurred in findOrFabricateShadowSymbol. I was getting the logs of a couple methods mixed up, so the actual issue there is a bit different than what I described. Sorry. The sorts of relocation failures I was talking about do relate to using getClassFromSignature() instead of consulting a constant pool like in the failing code above, but I might as well describe exactly what's going on in that other issue.

@cjjdespres
Copy link
Contributor Author

I've tested this a bit more, and I should mention that I get some (but not a very large number of) compilationSymbolValidationManagerFailures with only -Xaot:forceAOT, and also with both -Xjit:disableInlining -Xaot:forceAOT,disableInlining. It's when I run with -Xjit:disableInlining -Xaot:forceAOT that I get as many as I mentioned in the first comment.

@dsouzai
Copy link
Contributor

dsouzai commented Nov 27, 2024

is just the first method in the pool, and there wouldn't be any reason for a method record to have been created for it at this point.

Ah I see, well this is fairly easy to solve, but maybe not the easiest to make it clean. After

    auto *firstMethod = 
       static_cast<TR_OpaqueMethodBlock*>(fej9->getMethods(containingClass)); 

you need to make a call similar to

bool
J9::Compilation::validateTargetToBeInlined(TR_ResolvedMethod *implementer)
   {
   if (self()->getOption(TR_UseSymbolValidationManager)
       && self()->compileRelocatableCode())
      {
      return self()->getSymbolValidationManager()->addMethodFromClassRecord(implementer->getPersistentIdentifier(),
                                                                            implementer->classOfMethod(),
                                                                            -1);
      }
   return true;
   }

where you call addMethodFromClassRecord. That way it creates a validation record for that first method.

@cjjdespres
Copy link
Contributor Author

Ah I see, well this is fairly easy to solve, but maybe not the easiest to make it clean.

Yeah, I started doing something similar and it was a bit inelegant. I decided to take a stab at implementing getClassFromSignature() with a constant pool for JITServer/AOT instead, in #20686, since that comment in findOrFabricateShadowSymbol() said it would be a better choice if it were supported.

@dsouzai
Copy link
Contributor

dsouzai commented Nov 27, 2024

Yeah saw your PR after I made my comment, and what you're doing now is definitely much cleaner than what I suggested.

Copy link

Issue Number: 20669
Status: Closed
Actual Components: comp:jit:aot
Actual Assignees: No one :(
PR Assignees: cjjdespres

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants