I work for a project that validates its links using this library. One link that is frequently validated is the HTML spec at https://html.spec.whatwg.org/. This page has one of the bigger HTML files on the web but node-html-parser was able to parse it well in approximately 23 seconds on my local machine until release 5.3.2.
Consider this example:
const HTMLParser = require('node-html-parser');
const nFetch = require('node-fetch');

async function parseHTMLSpec() {
  try {
    const response = await nFetch('https://html.spec.whatwg.org/');
    const html = await response.text();
    console.log('Fetched HTML. Attempting to parse...');
    console.time('parseHTMLSpec');
    const parsedHTML = HTMLParser.parse(html);
    console.timeEnd('parseHTMLSpec');
    console.log('HTML parsed successfully.');
    console.log('Title:', parsedHTML.querySelector('title').text);
  } catch (error) {
    console.error('Error occurred:', error);
  }
}

parseHTMLSpec();
With node-html-parser 5.3.1, this outputs the following:
Fetched HTML. Attempting to parse...
parseHTMLSpec: 23.415s
HTML parsed successfully.
Title: HTML Standard
With node-html-parser 5.3.2, this hangs indefinitely, printing only the following even after running for hours:
Fetched HTML. Attempting to parse...
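Since the hang never ends on its own, a reproduction can be easier to work with when the parse runs under a deadline. The following is a minimal sketch, not part of the original report: `runWithTimeout` and `parseWorkerSource` are hypothetical names, and it assumes that moving the parse into a `worker_threads` worker and terminating it after a timeout is acceptable for your setup.

```javascript
// Sketch: run a CPU-bound task in a worker thread and give up after a deadline.
// runWithTimeout and parseWorkerSource are illustrative names, not library APIs.
const { Worker } = require('node:worker_threads');

function runWithTimeout(workerSource, workerData, timeoutMs) {
  return new Promise((resolve, reject) => {
    // eval: true treats workerSource as the worker's script text.
    const worker = new Worker(workerSource, { eval: true, workerData });

    // If the worker has not replied by the deadline, stop it and reject.
    const timer = setTimeout(() => {
      worker.terminate();
      reject(new Error(`timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    worker.once('message', (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });
    worker.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    // No-op if the promise has already settled above.
    worker.once('exit', (code) => {
      clearTimeout(timer);
      reject(new Error(`worker exited with code ${code} before replying`));
    });
  });
}

// Worker body: parse the HTML passed in workerData and report the elapsed time.
// Assumes node-html-parser is installed where the script is run.
const parseWorkerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const HTMLParser = require('node-html-parser');
  const start = Date.now();
  const root = HTMLParser.parse(workerData.html);
  const title = root.querySelector('title');
  parentPort.postMessage({ ms: Date.now() - start, title: title ? title.text : null });
`;
```

With the `html` string fetched as in the example above, `await runWithTimeout(parseWorkerSource, { html }, 120000)` would either resolve with the elapsed time and title or reject once two minutes have passed. The two-minute limit is an arbitrary choice that leaves headroom over the roughly 23 seconds seen with 5.3.1.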
stalgiag changed the title from "Regression: Versions >= v5.3.2 are unable to parse complex HTML" to "Regression: Versions >= v5.3.2 are unable to specific link" on Sep 24, 2024
stalgiag changed the title from "Regression: Versions >= v5.3.2 are unable to specific link" to "Regression: Versions >= v5.3.2 are unable to parse specific link" on Sep 24, 2024
Sorry for the bad experience.
I released a beta version, [email protected], but I could not test it due to high memory usage. Could you test it for me? Thank you.
I tested [email protected] and I also ran out of memory with this error: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory.
I tested on both Node 20.10.0 and Node 18.18.1. Note that this does not happen on <v5.3.2 using the same machine.