Weird css in some rows of ACA search results #4254

Open
sergiobaiao opened this issue Nov 22, 2024 · 20 comments

Comments

@sergiobaiao

sergiobaiao commented Nov 22, 2024

Describe the bug
I'm seeing some weird ACA search results in my environment, like in the picture below:

image

Some rows in the results are showing "n class='aca-highlight'>" within the title, description and/or content. Apparently this only happens with my own Alfresco data, since @aborroy was not able to replicate this with his own local data, but did replicate it with data from my Alfresco server.

Environment

  • ACA version: 5.2.0
  • Alfresco 23.2 from alfresco-docker-installer with this .env

ALFRESCO_CE_TAG=23.2.1
SEARCH_CE_TAG=2.0.9.1
SHARE_TAG=23.2.1
ACA_TAG=4.4.1
POSTGRES_TAG=15.6
#POSTGRES_TAG=14.4
MARIADB_TAG=11.3.2
TRANSFORM_ENGINE_TAG=5.1.0
ACTIVEMQ_TAG=5.18-jre17-rockylinux8

To Reproduce
Steps to reproduce the behavior:

  1. Go to ACA
  2. Click on search icon and perform a search
  3. Scroll down to search results
  4. See error

I don't think this will happen to everyone. Maybe it's related to wrong character encoding in the metadata of the search results, which may be wrongly escaping quotes or other characters. Also, the last line of each row is showing a lot of weird characters.

Screenshots
ACA Search Results:
image

Share Search Results: (same user and same search term)
image

P.S. I can provide a Consumer-only user with restricted access to anyone who wants to test this against my Alfresco server.

@aborroy

aborroy commented Nov 25, 2024

@DenysVuika my guess is that there is some error in this method:

https://github.com/Alfresco/alfresco-content-app/blob/5.2.0/projects/aca-content/src/lib/components/search/search-results-row/search-results-row.component.ts#L157C1-L161C4

Some inputs, like a nested "aca-highlight" input would create this issue.

A better approach could be to use HTML elements for this replacement. Something similar to:

private stripHighlighting(highlightedContent: string): string {
  if (!highlightedContent) return '';
  
  const parser = new DOMParser();
  const doc = parser.parseFromString(highlightedContent, 'text/html');
  const highlights = doc.querySelectorAll('.aca-highlight');
  
  highlights.forEach(highlight => {
    const parent = highlight.parentNode;
    if (parent) {
      while (highlight.firstChild) {
        parent.insertBefore(highlight.firstChild, highlight);
      }
      parent.removeChild(highlight);
    }
  });
  
  return doc.body.innerHTML;
}

@sergiobaiao
Author

While researching a probable cause today, I added this to log what I was receiving from the search results:
const highlights: SearchEntryHighlight[] = this.node.entry['search']?.['highlight'];
let name = this.node.entry.name;
console.log("name", name);
const properties = this.node.entry.properties;
console.log("properties", properties);
let title = properties?.['cm:title'] || '';
console.log("title", title);
let description = properties?.['cm:description'] || '';
console.log("description", description);
let content = '';
console.log("content", content);

and this is the search result:
image

If you look at what I've circled in red on the right side, the description field has a lot of back/forward slashes, and this entire piece is being stripped from the search result on the left side. I'm not sure whether the slashes are being correctly escaped, and this may be one of the causes.

@sergiobaiao
Author

I tried the DOMParser-based stripHighlighting suggested above, and it still shows the same error.

@aborroy

aborroy commented Nov 25, 2024

In that case, the fix should be something like:

private stripHighlighting(highlightedContent: string): string {
  if (!highlightedContent) return '';
  
  // Escape special regex characters in prefix and postfix
  const escapedPrefix = this.highlightPrefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const escapedPostfix = this.highlightPostfix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  
  return highlightedContent
    .replace(new RegExp(escapedPrefix, 'g'), '')
    .replace(new RegExp(escapedPostfix, 'g'), '');
}

@sergiobaiao
Author

sergiobaiao commented Nov 26, 2024

I tried the regex-escaping version suggested above as well, and it didn't work. I think we need to look for wrongly escaped characters not only in the prefix/postfix, but also across the entire title/name/description/content.
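One possible reason an exact-match strip has no visible effect: the fragment reported in the issue description (n class='aca-highlight'>) is not the full prefix, so a pattern built from the full prefix never matches it. The sketch below illustrates this under assumptions. The marker values and the helper names (assumedPrefix, assumedPostfix, escapeForRegExp, stripExactMarkers) are hypothetical and are not taken from the ACA source.

```typescript
// Assumed marker values for illustration only; the real ones come from ACA's search configuration.
const assumedPrefix = "<span class='aca-highlight'>";
const assumedPostfix = '</span>';

// Escape characters that have special meaning in a regular expression.
function escapeForRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical free-function version of the exact-match strip discussed above.
function stripExactMarkers(content: string, prefix: string, postfix: string): string {
  if (!content) return '';
  return content
    .replace(new RegExp(escapeForRegExp(prefix), 'g'), '')
    .replace(new RegExp(escapeForRegExp(postfix), 'g'), '');
}

// A well-formed snippet is cleaned completely.
const wellFormed = stripExactMarkers(
  "Annual <span class='aca-highlight'>report</span>",
  assumedPrefix,
  assumedPostfix
);

// A snippet with a truncated prefix keeps the fragment, because only the postfix matches.
const truncated = stripExactMarkers(
  "Annual n class='aca-highlight'>report</span>",
  assumedPrefix,
  assumedPostfix
);

console.log(wellFormed); // Annual report
console.log(truncated);  // Annual n class='aca-highlight'>report
```

If this is what is happening, the leftover fragment would already be in the snippet before any stripping runs.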

@sergiobaiao
Author

My dev just found out that the function:

highlights?.forEach((highlight) => {
  switch (highlight.field) {
    case 'cm:name':
      name = highlight.snippets[0];
      break;
    case 'cm:title':
      title = highlight.snippets[0];
      break;
    case 'cm:description':
      description = highlight.snippets[0];
      break;
    case 'cm:content':
      // Template literal; the backticks were lost when the comment was rendered.
      content = `...${highlight.snippets[0]}...`;
      break;
    default:
      break;
  }
});

is inserting the wrong code (n class="aca-highlight">). The highlight is imported from @alfresco/js-api.
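To narrow down which snippets arrive malformed, a small diagnostic can compare how many complete prefixes and postfixes each snippet contains. This is only a sketch: inspectHighlightMarkers, countOccurrences and MarkerReport are hypothetical names, and the default marker values ("<span class='aca-highlight'>" and "</span>") are assumptions, not values read from ACA or @alfresco/js-api.

```typescript
// Hypothetical diagnostic helper; not part of ACA or @alfresco/js-api.
interface MarkerReport {
  prefixCount: number;
  postfixCount: number;
  balanced: boolean;
}

// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (!needle) return 0;
  return haystack.split(needle).length - 1;
}

// Report whether a snippet has as many complete prefixes as postfixes.
// The default marker values are assumptions for illustration.
function inspectHighlightMarkers(
  snippet: string,
  prefix: string = "<span class='aca-highlight'>",
  postfix: string = '</span>'
): MarkerReport {
  const prefixCount = countOccurrences(snippet, prefix);
  const postfixCount = countOccurrences(snippet, postfix);
  return { prefixCount, postfixCount, balanced: prefixCount === postfixCount };
}

// Example: a snippet whose prefix was truncated has more postfixes than prefixes.
const report = inspectHighlightMarkers("Annual n class='aca-highlight'>report</span>");
console.log(report); // { prefixCount: 0, postfixCount: 1, balanced: false }
```

Calling it on highlight.snippets[0] inside each case of the switch would show whether the fragment is already present in the API response, before ACA touches it.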

@sergiobaiao
Author

any workaround on this?

@MichalKinas
Contributor

@sergiobaiao @aborroy I will investigate this issue. Can you please provide a PDF and a search phrase that I could use to reproduce it on my local env? This would be very helpful, as I've never seen this issue before.

@sergiobaiao
Author

I can do better... Is there a way I can contact you directly, so I can send you a username/password for my Alfresco server?

@MichalKinas
Contributor

Yes you can message me on MS Teams, Slack or send me an email -> [email protected]

@sergiobaiao
Author

Hi Michal, did you receive my email?

@MichalKinas
Contributor

Hello Sergio, yes I did, thank you! I was able to access the server and replicate the issue, I'm investigating the code now

@MichalKinas
Contributor

Okay, I analyzed what's going on there, and it seems that this is an issue with the search engine. Instead of providing the proper prefix <span class='aca-highlight'> for every highlight, it seems to apply n class='aca-highlight'> in some cases, which breaks the rendering in ACA. I will forward this issue to the BE team to analyze it.
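Until the search engine is fixed, a client-side stopgap could normalize truncated prefixes before the snippet is rendered. The following is a minimal sketch under assumptions, not the ACA implementation: normalizeHighlightPrefix, markerTails and escapeRegExp are hypothetical helpers, and it assumes the truncation always cuts leading characters from the prefix while leaving everything from the first space onward intact, as in the fragment reported above.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the candidate tails of the prefix, longest first, stopping at the
// first whitespace so very short tails (such as ">") are never considered.
function markerTails(prefix: string): string[] {
  const firstSpace = prefix.search(/\s/);
  const lastStart = firstSpace < 0 ? 0 : firstSpace;
  const tails: string[] = [];
  for (let start = 0; start <= lastStart; start++) {
    tails.push(prefix.slice(start));
  }
  return tails;
}

// Hypothetical workaround: replace every full or truncated prefix with
// `replacement` (the full prefix by default; pass '' to strip it instead).
function normalizeHighlightPrefix(
  snippet: string,
  prefix: string = "<span class='aca-highlight'>", // assumed marker value
  replacement: string = prefix
): string {
  if (!snippet) return '';
  const pattern = markerTails(prefix).map(escapeRegExp).join('|');
  // A replacer function avoids special handling of "$" in the replacement.
  return snippet.replace(new RegExp(pattern, 'g'), () => replacement);
}

const repaired = normalizeHighlightPrefix("Annual n class='aca-highlight'>report</span>");
console.log(repaired); // Annual <span class='aca-highlight'>report</span>
```

This is a heuristic: a tail such as n class='aca-highlight'> cannot be told apart from ordinary text that ends in "n" directly before a truncated marker, and any characters the search engine dropped from the surrounding text are not recovered. It only restores well-formed markup for display.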

@MichalKinas
Contributor

@sergiobaiao could I use one of the PDFs for which the issue is observed and attach it to the bug report for the BE team? So that they can reproduce it locally in their ACS instance? Also please let me know which search engine you're using in your instance, is it SOLR or ES?

@sergiobaiao
Author

sergiobaiao commented Dec 6, 2024

Yes you can, these are public domain documents. We're using SOLR. You can forward that login info too if needed.

@MichalKinas
Contributor

Okay so I was able to reproduce this issue locally with Solr, I created the bug for the BE team and passed the info for them

@sergiobaiao
Author

is there a link for me to follow the bug report?

@MichalKinas
Contributor

Yes -> https://hyland.atlassian.net/browse/ACS-9073 sorry I forgot to put it here

@sergiobaiao
Author

sergiobaiao commented Dec 16, 2024 via email

@MichalKinas
Contributor

I'm sorry, but we cannot grant you access there. This bug will be picked up by the BE team as soon as possible, and I will update you here on the progress and planned release date.
