I am marking this as a feature request to add it as an extra normalization option. In the meantime, you can achieve the equivalent with a replacement and a regular expression. Here is an example (untested):
<urlNormalizer class="GenericURLNormalizer">
<normalizations>
<!-- Your current normalizations here -->
</normalizations>
<replacements>
<replace>
<match>(.*?)(#[^\/]*)$</match>
<replacement>$1</replacement>
</replace>
</replacements>
</urlNormalizer>
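To check the pattern outside the crawler, here is a minimal Java sketch that applies the same regular expression to the two URLs from this issue. The class and method names are hypothetical, and it assumes the replacement is applied with standard `java.util.regex` semantics; how GenericURLNormalizer applies replacements internally is not verified here.

```java
import java.util.regex.Pattern;

public class TrailingFragmentDemo {

    // Same pattern as the <match> element above: a '#' followed by
    // zero or more non-slash characters, anchored at the end of the URL.
    private static final Pattern TRAILING_FRAGMENT =
            Pattern.compile("(.*?)(#[^\\/]*)$");

    // Hypothetical helper: drops the fragment only when no '/' follows the '#'.
    static String stripTrailingFragment(String url) {
        return TRAILING_FRAGMENT.matcher(url).replaceFirst("$1");
    }

    public static void main(String[] args) {
        // SPA-style route: a '/' follows the '#', so the URL is left unchanged.
        System.out.println(stripTrailingFragment(
                "https://forces.ca/en/events/#/details/14742"));

        // Trailing anchor: the fragment is removed.
        System.out.println(stripTrailingFragment(
                "https://forces.ca/en/career/emergency-medicine/#sec-training"));
    }
}
```

The first URL should print unchanged and the second should print as https://forces.ca/en/career/emergency-medicine/ with the fragment dropped.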
We previously used a similar pattern, but we’ll give yours a try—it might be safer than the one we developed since we are not experts in Java regex. :)
3.1.0-SNAPSHOT was just released and now supports a new normalization rule: removeTrailingFragment. It behaves the same as removeFragment, except that it only treats a hash sign (#) as the start of a fragment when it appears after the last URL segment (/...).
https://opensource.norconex.com/commons/lang/v2/apidocs/com/norconex/commons/lang/url/URLNormalizer.html?is-external=true#removeFragment--
The removeFragments option in the URL normalizer seems to be removing the pound sign (#) in URLs used in Single Page Application (SPA) schemes.
For example, the URL https://forces.ca/en/events/#/details/14742 should remain intact, even when the removeFragments option is applied.
I believe the fragment should only be removed if it appears at the end of the URL, after the last forward slash, like in https://forces.ca/en/career/emergency-medicine/#sec-training.
Thank you.