Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Ensure that full processor can seek to earlier bookmarks #7649

Closed
wants to merge 26 commits into from
Closed
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
f9777aa
Add failing test case
sirreal Oct 25, 2024
5d75bb7
Work on fix
sirreal Oct 25, 2024
541b4f6
wip
sirreal Oct 25, 2024
dc3f1e6
try fixes
sirreal Nov 5, 2024
89aa01b
Fix the issue
sirreal Nov 5, 2024
737bf92
Restore original bookmark matching
sirreal Nov 5, 2024
cb63bbc
Lint: remove empty line
sirreal Nov 5, 2024
744cf01
Add commentary around state resetting in seek
sirreal Nov 6, 2024
6106a56
Use do-while loop to iterate after first check
sirreal Nov 6, 2024
851df38
Fix fragment parser bug that should reset breadcrumbs before seeking
sirreal Nov 6, 2024
9af204c
Assert setting bookmark two
sirreal Nov 6, 2024
660dc85
Remove condition that may not be necessary
sirreal Nov 6, 2024
9a0c017
Set parsing namespace correctly when seeking
sirreal Nov 6, 2024
d181938
Fix fragment state reset for different context nodes
sirreal Nov 6, 2024
d5bf14c
Add and improve comments
sirreal Nov 6, 2024
388bf19
Add and improve HTML processor bookmark tests
sirreal Nov 6, 2024
f7af6e3
Add test seeking from SVG namespace
sirreal Nov 6, 2024
d5a7d5c
Rework with tests for fragment and full processors
sirreal Nov 6, 2024
b95e402
Restore private stack properties to private
sirreal Nov 6, 2024
90eb6e2
Use a temporary bookmark to avoid modifying tag processor internal state
sirreal Nov 6, 2024
75ab9c2
Break at root-node when popping elements on seek
sirreal Nov 6, 2024
05ca2a4
Remove context-node cases from open elements
sirreal Nov 6, 2024
1cde425
Merge branch 'trunk' into html-api/fix-63390-seek
sirreal Nov 6, 2024
c48be70
Add test for root-node regression
sirreal Nov 11, 2024
710c5f7
Fix root-node pop regression in fragment parser seek
sirreal Nov 11, 2024
0efe691
Merge branch 'trunk' into html-api/fix-63390-seek
sirreal Nov 11, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 52 additions & 20 deletions src/wp-includes/html-api/class-wp-html-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -5324,8 +5324,7 @@ public function seek( $bookmark_name ): bool {
*/
if ( 'backward' === $direction ) {
/*
* Instead of clearing the parser state and starting fresh, calling the stack methods
* maintains the proper flags in the parser.
* When moving backward, stateful stacks should be cleared.
*/
foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
if ( 'context-node' === $item->bookmark_name ) {
sirreal marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -5343,32 +5342,65 @@ public function seek( $bookmark_name ): bool {
$this->state->active_formatting_elements->remove_node( $item );
}

parent::seek( 'context-node' );
$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
$this->state->frameset_ok = true;
$this->element_queue = array();
$this->current_element = null;
/*
* **After** clearing stacks, more processor state can be reset.
* This must be done after clearing the stack because those stacks generate events that
* would appear on a subsequent call to `next_token()`.
*/
$this->state->frameset_ok = true;
$this->state->stack_of_template_insertion_modes = array();
$this->state->head_element = null;
$this->state->form_element = null;
$this->state->current_token = null;
Comment on lines +5348 to +5352
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a reset() method to WP_HTML_Processor_State?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that but ultimately decided against it. This isn't really a full reset, and the reset behaves a bit differently between the fragment/full processors.

To simplify this, I'd probably create a new state instance and copy over the important properties. That may be worth exploring but that felt like a larger refactor.

$this->current_element = null;
$this->element_queue = array();

if ( isset( $this->context_node ) ) {
$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
/*
* The absence of a context node indicates a full parse.
* The presence of a context node indicates a fragment parser.
*/
if ( null === $this->context_node ) {
$this->change_parsing_namespace( 'html' );
$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_INITIAL;
$this->breadcrumbs = array();

$this->bookmarks['initial'] = new WP_HTML_Span( 0, 0 );
parent::seek( 'initial' );
unset( $this->bookmarks['initial'] );
} else {
$this->breadcrumbs = array();
}
}
$this->change_parsing_namespace(
$this->context_node->integration_node_type
? 'html'
: $this->context_node->namespace
);

if ( 'TEMPLATE' === $this->context_node->node_name ) {
$this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
}

// When moving forwards, reparse the document until reaching the same location as the original bookmark.
if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
return true;
$this->reset_insertion_mode_appropriately();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is inspired by #7348 where a fragment parser is created on arbitrary context nodes and must be initialized properly.

This is very similar, the state must be reset to the appropriate initial state for the context.

This follows the fragment parsing algorithm: https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm

$this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
parent::seek( $this->context_node->bookmark_name );
}
}

while ( $this->next_token() ) {
/*
* Here, the processor moves forward through the document until it matches the bookmark.
* do-while is used here because the processor is expected to already be stopped on
* a token than may match the bookmarked location.
*/
do {
/*
* The processor will stop on virtual tokens, but bookmarks may not be set on them.
* They should not be matched when seeking a bookmark, skip them.
*/
if ( $this->is_virtual() ) {
continue;
}
if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
while ( isset( $this->current_element ) && WP_HTML_Stack_Event::POP === $this->current_element->operation ) {
$this->current_element = array_shift( $this->element_queue );
sirreal marked this conversation as resolved.
Show resolved Hide resolved
}
return true;
}
}
} while ( $this->next_token() );

return false;
}
Expand Down
143 changes: 143 additions & 0 deletions tests/phpunit/tests/html-api/wpHtmlProcessor-bookmark.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
<?php
/**
* Unit tests covering WP_HTML_Processor bookmark functionality.
*
* @package WordPress
* @subpackage HTML-API
*/

/**
* @group html-api
*
* @coversDefaultClass WP_HTML_Processor
*/
class Tests_HtmlApi_WpHtmlProcessor_Bookmark extends WP_UnitTestCase {
/**
* @dataProvider data_processor_constructors
*
* @ticket 62290
*/
public function test_processor_seek_same_location( callable $factory ) {
$processor = $factory( '<div><span>' );
$this->assertTrue( $processor->next_tag( 'DIV' ) );
$this->assertTrue( $processor->set_bookmark( 'mark' ), 'Failed to set bookmark.' );
$this->assertTrue( $processor->has_bookmark( 'mark' ), 'Failed has_bookmark check.' );

// Confirm the bookmark works and processing continues normally.
$this->assertTrue( $processor->seek( 'mark' ), 'Failed to seek to bookmark.' );
$this->assertSame( 'DIV', $processor->get_tag() );
$this->assertSame( array( 'HTML', 'BODY', 'DIV' ), $processor->get_breadcrumbs() );
$this->assertTrue( $processor->next_tag() );
$this->assertSame( 'SPAN', $processor->get_tag() );
$this->assertSame( array( 'HTML', 'BODY', 'DIV', 'SPAN' ), $processor->get_breadcrumbs() );
}

/**
* @dataProvider data_processor_constructors
*
* @ticket 62290
*/
public function test_processor_seek_backward( callable $factory ) {
$processor = $factory( '<div><span>' );
$this->assertTrue( $processor->next_tag( 'DIV' ) );
$this->assertTrue( $processor->set_bookmark( 'mark' ), 'Failed to set bookmark.' );
$this->assertTrue( $processor->has_bookmark( 'mark' ), 'Failed has_bookmark check.' );

// Move past the bookmark so it must scan backwards.
$this->assertTrue( $processor->next_tag( 'SPAN' ) );

// Confirm the bookmark works.
$this->assertTrue( $processor->seek( 'mark' ), 'Failed to seek to bookmark.' );
$this->assertSame( 'DIV', $processor->get_tag() );
}

/**
* @dataProvider data_processor_constructors
*
* @ticket 62290
*/
public function test_processor_seek_forward( callable $factory ) {
$processor = $factory( '<div one></div><span two></span><a three>' );
$this->assertTrue( $processor->next_tag( 'DIV' ) );
$this->assertTrue( $processor->set_bookmark( 'one' ), 'Failed to set bookmark "one".' );
$this->assertTrue( $processor->has_bookmark( 'one' ), 'Failed "one" has_bookmark check.' );

// Move past the bookmark so it must scan backwards.
$this->assertTrue( $processor->next_tag( 'SPAN' ) );
$this->assertTrue( $processor->get_attribute( 'two' ) );
$this->assertTrue( $processor->set_bookmark( 'two' ), 'Failed to set bookmark "two".' );
$this->assertTrue( $processor->has_bookmark( 'two' ), 'Failed "two" has_bookmark check.' );

// Seek back.
$this->assertTrue( $processor->seek( 'one' ), 'Failed to seek to bookmark "one".' );
$this->assertSame( 'DIV', $processor->get_tag() );

// Seek forward and continue processing.
$this->assertTrue( $processor->seek( 'two' ), 'Failed to seek to bookmark "two".' );
$this->assertSame( 'SPAN', $processor->get_tag() );
$this->assertTrue( $processor->get_attribute( 'two' ) );

$this->assertTrue( $processor->next_tag() );
$this->assertSame( 'A', $processor->get_tag() );
$this->assertTrue( $processor->get_attribute( 'three' ) );
}

/**
* Ensure the parsing namespace is handled when seeking from foreign content.
*
* @dataProvider data_processor_constructors
*
* @ticket 62290
*/
public function test_seek_back_from_foreign_content( callable $factory ) {
$processor = $factory( '<custom-element /><svg><rect />' );
$this->assertTrue( $processor->next_tag( 'CUSTOM-ELEMENT' ) );
$this->assertTrue( $processor->set_bookmark( 'mark' ), 'Failed to set bookmark "mark".' );
$this->assertTrue( $processor->has_bookmark( 'mark' ), 'Failed "mark" has_bookmark check.' );

/*
* <custom-element /> has self-closing flag, but HTML elements (that are not void elements) cannot self-close,
* they must be closed by some means, usually a closing tag.
*
* If the div were interpreted as foreign content, it would self-close.
*/
$this->assertTrue( $processor->has_self_closing_flag() );
$this->assertTrue( $processor->expects_closer(), 'Incorrectly interpreted HTML custom-element with self-closing flag as self-closing element.' );

// Proceed into foreign content.
$this->assertTrue( $processor->next_tag( 'RECT' ) );
$this->assertSame( 'svg', $processor->get_namespace() );
$this->assertTrue( $processor->has_self_closing_flag() );
$this->assertFalse( $processor->expects_closer() );
$this->assertSame( array( 'HTML', 'BODY', 'CUSTOM-ELEMENT', 'SVG', 'RECT' ), $processor->get_breadcrumbs() );

// Seek back.
$this->assertTrue( $processor->seek( 'mark' ), 'Failed to seek to bookmark "mark".' );
$this->assertSame( 'CUSTOM-ELEMENT', $processor->get_tag() );
// If the parsing namespace were not correct here (html),
// then the self-closing flag would be misinterpreted.
$this->assertTrue( $processor->has_self_closing_flag() );
$this->assertTrue( $processor->expects_closer(), 'Incorrectly interpreted HTML custom-element with self-closing flag as self-closing element.' );

// Proceed into foreign content again.
$this->assertTrue( $processor->next_tag( 'RECT' ) );
$this->assertSame( 'svg', $processor->get_namespace() );
$this->assertTrue( $processor->has_self_closing_flag() );
$this->assertFalse( $processor->expects_closer() );

// The RECT should still descend from the CUSTOM-ELEMENT despite its self-closing flag.
$this->assertSame( array( 'HTML', 'BODY', 'CUSTOM-ELEMENT', 'SVG', 'RECT' ), $processor->get_breadcrumbs() );
}

/**
* Data provider.
*
* @return array
*/
public static function data_processor_constructors(): array {
return array(
'Full parser' => array( array( WP_HTML_Processor::class, 'create_full_parser' ) ),
'Fragment parser' => array( array( WP_HTML_Processor::class, 'create_fragment' ) ),
);
}
}
2 changes: 1 addition & 1 deletion tests/phpunit/tests/html-api/wpHtmlProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ public function test_clear_to_navigate_after_seeking() {

// Create a bookmark inside of that stack.
if ( null !== $processor->get_attribute( 'two' ) ) {
$processor->set_bookmark( 'two' );
$this->assertTrue( $processor->set_bookmark( 'two' ) );
break;
}
}
Expand Down
Loading