NGSTACK-843 page indexing implementation from fina + FieldMapper impl… #10

Open
wants to merge 48 commits into
base: master
48 commits
152f1c7
NGSTACK-843 page indexing implementation from fina + FieldMapper impl…
Apr 15, 2024
33a0839
NGSTACK-843 improved configuration builder and added tests for page i…
Apr 19, 2024
214f60e
NGSTACK-843 fix service definition
Apr 22, 2024
65e3818
NGSTACK-843 add router stub for tests
Apr 22, 2024
6e0bcf9
NGSTACK-843 move RouterStub to tests/lib
Apr 22, 2024
ac162c6
NGSTACK-843 fix refactor mistake wrong file loaded
Apr 22, 2024
b8b5f64
NGSTACK-843 add missing parameters for tests
Apr 22, 2024
a6f1d46
NGSTACK-843 fix typo
Apr 22, 2024
ccfacdb
NGSTACK-843 use Symfony HttpClient instead of curl
Apr 22, 2024
5632762
NGSTACK-843 throw RuntimeException if RouterStub methods are used
Apr 22, 2024
497a329
NGSTACK-843 renaming files and seperate document factory and page ind…
Apr 25, 2024
3a5d0ef
NGSTACK-843 use page_indexing.enabled instead of use_page_indexing pa…
Apr 25, 2024
6f3976b
NGSTACK-843 configuration builder modifications
Apr 25, 2024
6f13134
NGSTACK-843 fix field mapper docs
Apr 25, 2024
e7589b8
NGSTACK-843 add renamed compiler passes
Apr 25, 2024
28c2082
NGSTACK-843 command improvements
Apr 25, 2024
e497dda
NGSTACK-843 PageTextExtractor fixes
Apr 25, 2024
fe380a7
NGSTACK-843 remove unused use statements
Apr 25, 2024
4c9a90d
NGSTACK-843 add missing use statement
Apr 25, 2024
f9ce3aa
NGSTACK-843 introducing abstract PageTextIndexer
Apr 25, 2024
6dc18e4
NGSTACK-843 add new line at the end of file
Apr 25, 2024
bca6ae4
NGSTACK-843 configuration modifications
May 9, 2024
8849dbf
NGSTACK-843 fix test configuration
May 9, 2024
890d577
NGSTACK-843 move getSiteConfigForContent method to new service SiteAc…
May 9, 2024
c9cb281
NGSTACK-843 solr FieldMapper fix
May 9, 2024
aa257c2
NGSTACK-843 remove not used service and use statement
May 9, 2024
9e2818f
NGSTACK-843 fix configuration info messages
May 9, 2024
d7b6022
NGSTACK-843 composer.json remove config block
May 9, 2024
8b29ba1
NGSTACK-843
May 9, 2024
4f2964b
NGSTACK-843 remove unnecessary code
May 9, 2024
dd4a7eb
NGSTACK-843 set content-ids as optional
May 9, 2024
453cc61
NGSTACK-843 move command to bundle
May 9, 2024
2db38c7
NGSTACK-843 indexContent method change return type to void
May 9, 2024
f5e28b1
NGSTACK-843 fix command service definition
May 9, 2024
c072b6a
NGSTACK-843 output info about site that is processed
May 9, 2024
cc55f5a
NGSTACK-843 renamed PageTextExtractor to NativePageTextExtractor
May 9, 2024
189be79
NGSTACK-843 moved siteConfig resolver line below cache checks
May 9, 2024
3d15cd0
NGSTACK-843 rename SiteAccessConfigResolver to SiteConfigResolver
May 9, 2024
451b901
NGSTACK-843 remove not used variable from method signature
May 9, 2024
4fd0001
NGSTACK-843 remove unnecessary code
May 9, 2024
e4db8b1
NGSTACK-843
May 10, 2024
db86bac
NGSTACK-843 use symfony style in command
May 10, 2024
f7258c9
NGSTACK-843 remove unused lines
May 10, 2024
e5248c7
NGSTACK-843 php-cs-fixer apply rules from site api and run on edited …
May 10, 2024
94529a4
NGSTACK-843 add documentation for page indexing
Aug 7, 2024
3e8626b
NGSTACK-843 fix
Aug 7, 2024
fb6f609
NGSTACK-843 fix documentation formating
Sep 10, 2024
665b705
NGSTACK-843 remove document factory from the documentation (seperated…
Sep 10, 2024
10 changes: 6 additions & 4 deletions .php-cs-fixer.php
@@ -19,16 +19,17 @@
'method_chaining_indentation' => false,
'multiline_whitespace_before_semicolons' => false,
'native_function_invocation' => ['include' => ['@all']],
-'no_superfluous_phpdoc_tags' => false,
+'no_superfluous_phpdoc_tags' => true,
'no_unset_on_property' => false,
'ordered_imports' => ['imports_order' => ['class', 'function', 'const'], 'sort_algorithm' => 'alpha'],
'php_unit_internal_class' => false,
'php_unit_test_case_static_method_calls' => ['call_type' => 'self'],
'php_unit_test_class_requires_covers' => false,
'phpdoc_align' => false,
'phpdoc_types_order' => ['null_adjustment' => 'always_last', 'sort_algorithm' => 'none'],
'phpdoc_no_alias_tag' => ['replacements' => ['type' => 'var', 'link' => 'see']],
'single_line_comment_style' => false,
-'trailing_comma_in_multiline' => ['elements' => ['arrays', 'arguments']],
+'trailing_comma_in_multiline' => ['elements' => ['arrays', 'arguments', 'match', 'parameters']],
'yoda_style' => false,
'php_unit_strict' => false,
'php_unit_test_annotation' => false,
@@ -50,7 +51,8 @@
'static_lambda' => true,
'ternary_to_null_coalescing' => true,
'use_arrow_functions' => true,
-])
+'no_alias_language_construct_call' => true,
+])
->setRiskyAllowed(true)
->setFinder($finder)
-;
+;
174 changes: 174 additions & 0 deletions bundle/Command/IndexPageContentCommand.php
@@ -0,0 +1,174 @@
<?php

declare(strict_types=1);

namespace Netgen\Bundle\IbexaSearchExtraBundle\Command;

use Ibexa\Contracts\Core\Persistence\Handler as PersistenceHandler;
use Ibexa\Contracts\Core\Repository\ContentService;
use Ibexa\Contracts\Core\Repository\Exceptions\InvalidArgumentException;
use Ibexa\Contracts\Core\Repository\Exceptions\NotFoundException;
use Ibexa\Contracts\Core\Repository\Exceptions\UnauthorizedException;
use Ibexa\Contracts\Core\Repository\Values\Content\Content;
use Ibexa\Contracts\Core\Repository\Values\Content\ContentList;
use Ibexa\Contracts\Core\Repository\Values\Content\Query;
use Ibexa\Contracts\Core\Repository\Values\Filter\Filter;
use Ibexa\Contracts\Core\Search\Handler as SearchHandler;
use Netgen\IbexaSearchExtra\Exception\IndexPageUnavailableException;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Helper\ProgressBar;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

use function count;
use function explode;

class IndexPageContentCommand extends Command
{
protected static $defaultName = 'netgen-search-extra:index-page-content';
private SymfonyStyle $style;

/**
* @param array<string, mixed> $sitesConfig
*/
public function __construct(
private readonly ContentService $contentService,
private readonly SearchHandler $searchHandler,
private readonly PersistenceHandler $persistenceHandler,
private readonly array $sitesConfig,
) {
parent::__construct($this::$defaultName);
}

protected function configure(): void
{
$this
->setDescription('Index content related through layouts')
->addOption(
'content-ids',
null,
InputOption::VALUE_OPTIONAL,
'Comma-separated list of content IDs to index.',
);
}

protected function initialize(InputInterface $input, OutputInterface $output): void
{
$this->style = new SymfonyStyle($input, $output);
}

/**
* @throws NotFoundException
* @throws InvalidArgumentException
* @throws UnauthorizedException
*/
protected function execute(InputInterface $input, OutputInterface $output): int
{
foreach ($this->sitesConfig as $site => $siteConfig) {
$this->style->info('Indexing for site ' . $site);
$this->indexContent($output, $input, $siteConfig);
}

return Command::SUCCESS;
}

private function indexContent(OutputInterface $output, InputInterface $input, array $siteConfig): void
{
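$contentIdsOption = $input->getOption('content-ids');
// The option is optional: when it is omitted or empty, do not filter by content ID.
// Passing null to explode() would raise a TypeError under strict_types.
$contentIds = $contentIdsOption === null || $contentIdsOption === '' ? [] : explode(',', $contentIdsOption);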

$allowedContentTypes = $siteConfig['allowed_content_types'];
$offset = 0;
$limit = 50;
$totalCount = $this->getTotalCount($allowedContentTypes, $contentIds);
$progressBar = new ProgressBar($output, $totalCount);

if ($totalCount <= 0) {
$this->style->info('No content found to index, exiting.');

return;
}

$this->style->info('Found ' . $totalCount . ' content objects...');

$progressBar->start($totalCount);

while ($offset < $totalCount) {
$chunk = $this->getChunk($limit, $offset, $allowedContentTypes, $contentIds);

$this->processChunk($chunk, $progressBar);

$offset += $limit;
}

$progressBar->finish();

$output->writeln('');
$this->style->info('Finished.');
}

/**
* @throws InvalidArgumentException
*/
private function getTotalCount(array $allowedContentTypes, array $contentIds): int
{
$filter = $this->getFilter($allowedContentTypes, $contentIds);

$filter
->withLimit(0)
->withOffset(0);

return $this->contentService->find($filter)->getTotalCount() ?? 0;
}

/**
* @throws InvalidArgumentException
*/
private function getChunk(int $limit, int $offset, array $allowedContentTypes, array $contentIds): ContentList
{
$filter = $this->getFilter($allowedContentTypes, $contentIds);
$filter
->withLimit($limit)
->withOffset($offset)
;

return $this->contentService->find($filter);
}

private function getFilter(array $allowedContentTypes, array $contentIds = []): Filter
{
$filter = new Filter();
$filter->withCriterion(new Query\Criterion\ContentTypeIdentifier($allowedContentTypes));

if (count($contentIds) > 0) {
$filter->andWithCriterion(new Query\Criterion\ContentId($contentIds));
}

return $filter;
}

private function processChunk(ContentList $contentList, ProgressBar $progressBar): void
{
foreach ($contentList->getIterator() as $content) {
try {
$this->indexContentWithLocations($content);
$progressBar->advance();
} catch (IndexPageUnavailableException $exception) {
$this->style->error($exception->getMessage());
}
}
}

private function indexContentWithLocations(Content $content): void
{
$this->searchHandler->indexContent(
$this->persistenceHandler->contentHandler()->load($content->id, $content->versionInfo->versionNo),
);

$locations = $this->persistenceHandler->locationHandler()->loadLocationsByContent($content->id);
foreach ($locations as $location) {
$this->searchHandler->indexLocation($location);
}
}
}
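Review note on the `content-ids` option: it is declared with `InputOption::VALUE_OPTIONAL` and no default, so `getOption('content-ids')` can return `null` when the option is omitted. Below is a minimal sketch of null-safe parsing. The helper name `parseContentIds` is hypothetical and not part of this PR, and casting the IDs to integers is an assumption about what the `ContentId` criterion should receive.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the PR): turns the raw option value
 * into a list of integer content IDs.
 *
 * @return int[]
 */
function parseContentIds(?string $option): array
{
    // Option omitted or passed without a value: no ID filter at all.
    if ($option === null || trim($option) === '') {
        return [];
    }

    $ids = [];
    foreach (explode(',', $option) as $part) {
        $part = trim($part);

        // Skip empty segments (e.g. "1,,2") and anything that is not a plain number.
        if ($part !== '' && ctype_digit($part)) {
            $ids[] = (int) $part;
        }
    }

    return $ids;
}
```

With a helper like this, `getFilter()` only adds the `ContentId` criterion when the returned list is non-empty, which matches the existing `count($contentIds) > 0` check.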
94 changes: 94 additions & 0 deletions bundle/DependencyInjection/Configuration.php
@@ -8,6 +8,9 @@
use Symfony\Component\Config\Definition\Builder\TreeBuilder;
use Symfony\Component\Config\Definition\ConfigurationInterface;

use function array_keys;
use function is_string;

class Configuration implements ConfigurationInterface
{
protected string $rootNodeName;
@@ -25,6 +28,7 @@ public function getConfigTreeBuilder(): TreeBuilder
$this->addIndexableFieldTypeSection($rootNode);
$this->addSearchResultExtractorSection($rootNode);
$this->addAsynchronousIndexingSection($rootNode);
$this->addPageIndexingSection($rootNode);

return $treeBuilder;
}
@@ -73,4 +77,94 @@ private function addAsynchronousIndexingSection(ArrayNodeDefinition $nodeDefinit
->end()
->end();
}

private function addPageIndexingSection(ArrayNodeDefinition $nodeDefinition): void
{
$keyValidator = static function ($v) {
foreach (array_keys($v) as $key) {
if (!is_string($key)) {
return true;
}
}

return false;
};
$nodeDefinition
->children()
->arrayNode('page_indexing')
->addDefaultsIfNotSet()
->info('Page indexing configuration')
->children()
->booleanNode('enabled')
->info('Use page indexing')
->defaultFalse()
->end()
->arrayNode('sites')
->useAttributeAsKey('name')
->normalizeKeys(false)
->validate()
->ifTrue($keyValidator)
->thenInvalid('Site name must be of string type')
->end()
->arrayPrototype()
->children()
->integerNode('tree_root_location_id')
->info('Site root Location ID')
->beforeNormalization()->always(static fn ($v) => is_string($v) ? (int) $v : $v)->end()
->end()
->arrayNode('languages_siteaccess_map')
->info('Language code mapped to page siteaccess')
->useAttributeAsKey('name')
->normalizeKeys(false)
->validate()
->ifTrue($keyValidator)
->thenInvalid('Language code must be of string type.')
->end()
->scalarPrototype()
->validate()
->ifTrue(static fn ($v) => !is_string($v))
->thenInvalid('Siteaccess name must be of string type.')
->end()
->end()
->end()
->arrayNode('fields')
->info('Mapping of indexed field names to an array of HTML tag selectors')
->validate()
->ifTrue($keyValidator)
->thenInvalid('Indexed field name must be of string type')
->end()
->arrayPrototype()
->useAttributeAsKey('name')
->normalizeKeys(false)
->scalarPrototype()
->validate()
->ifTrue(static fn ($v) => !is_string($v))
->thenInvalid('HTML selector must be of string type.')
->end()
->end()
->end()
->end()
->arrayNode('allowed_content_types')
->info('Content types to index')
->useAttributeAsKey('name')
->normalizeKeys(false)
->scalarPrototype()
->validate()
->ifTrue(static fn ($v) => !is_string($v))
->thenInvalid('Content type identifier must be of string type.')
->end()
->end()
->end()
->scalarNode('host')
->info('Host to index page from, defined in .env files')
->validate()
->ifTrue(static fn ($v) => !is_string($v))
->thenInvalid('Host must be of string type.')
->end()
->end()
->end()
->end()
->end()
->end();
}
}
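For reviewers, here is a sketch of the configuration shape that `addPageIndexingSection()` appears to accept, written as a plain PHP array. Only the key names come from the tree definition above. Every value (site name, location ID, language code, siteaccess, field names, selectors, content types, host) is a made-up placeholder, and how the array reaches the bundle (root node name, YAML vs. PHP config) is not shown in this diff.

```php
<?php

// Hypothetical values throughout; only the key names mirror the config tree.
$config = [
    'page_indexing' => [
        'enabled' => true, // defaults to false when not set
        'sites' => [
            // Site name: must be a string key
            'example_site' => [
                'tree_root_location_id' => 2, // numeric strings are cast to int
                'languages_siteaccess_map' => [
                    // language code => siteaccess name
                    'eng-GB' => 'eng',
                ],
                'fields' => [
                    // indexed field name => list of HTML tag selectors
                    'page_text_level1' => ['h1', 'h2'],
                    'page_text_level2' => ['p', 'li'],
                ],
                'allowed_content_types' => ['article', 'landing_page'],
                'host' => 'https://example.com',
            ],
        ],
    ],
];
```

Note that PHP converts numeric-looking array keys such as `'123'` to integers, so a site or language key of that form would be rejected by `$keyValidator`.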