Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[search] Search in downloader by country names. #14081

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mpimenov
Copy link
Contributor

@mpimenov mpimenov commented Dec 23, 2020

https://jira.mail.ru/browse/MAPSME-11158

When a request to search in downloader arrives, we used to
only find features on the world map that match the request
and return the mwms that contain these features.

This commit mixes in the results of search directly in
the country tree (countries.txt), or, to be more precise, by
the translations of the names of the countries there (countries_names.txt).

This is not the most efficient implementation but hopefully
it isolated enough to make improvements easy and it was also
useful as an exploration where our current search APIs are lacking, for example

  • The unnecessary std::string<->UniString conversions.
  • Indexes such as MemSearchIndex pretending to be generic while in fact being
    tailored to a particular use-case.
  • The difficulty of mixing search results from different sources.

map/search_api.cpp Outdated Show resolved Hide resolved
search/CMakeLists.txt Outdated Show resolved Hide resolved
@mpimenov mpimenov force-pushed the palestine branch 4 times, most recently from ea6f4b6 to c35c388 Compare January 24, 2021 11:26
@mpimenov mpimenov force-pushed the palestine branch 2 times, most recently from 70f4a7d to e8e650c Compare March 12, 2021 10:05
@mpimenov mpimenov marked this pull request as ready for review March 12, 2021 10:09
When a request to search in downloader arrives, we used to
only find features on the world map that match the request
and return the mwms that contain these features.

This commit mixes in the results of search directly in
the country tree (countries.txt), or, to be more precise, by
the translations of the names of the countries there (countries_names.txt).

This is not the most efficient implementation but hopefully
it isolated enough to make improvements easy and it was also
useful as an exploration where our current search APIs are lacking, for example

* The unnecessary std::string<->UniString conversions.
* Indexes such as MemSearchIndex pretending to be generic while in fact being
  tailored to a particular use-case.
* The difficulty of mixing search results from different sources.
@@ -89,6 +92,10 @@ class Result
// For Type::SuggestFromFeature.
Result(Result const & res, std::string const & suggest);

// For Type::DownloaderEntry.
Result(storage::CountryId const & countryId, std::string const & matchedName,
bool /* to distinguish from Type::PureSuggest */);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

как вариант, конструкторы могут принимать Type чтобы не угадывать по списку параметров что мы создаем

if (pos == string::npos)
continue;
// Ignore the language code: the language sets differ for StringUtf8Multilang
// and for the translations used by this class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

после добавления в StringUtf8Multilang норвежского и индонезийского языки на которые переведны страны стали подмножеством языков в StringUtf8Multilang, но я не против игнорить, по крайней мере пока нет жалоб что работает не так ха-ха))

Retrieve<strings::LevenshteinDFA>(token, insertId);
}

// todo(@m) Do not bother with tf/idf for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

можно без tf/idf но хотя бы кол-во заматченых токенов учитывать

например:

cat data/countries_names.txt |grep Северная
  ru = Северная территория
  ru = Северная Ютландия
  ru = Северная Финляндия
  ru = Северная Голландия — Алкмар
  ru = Северная Голландия — Амстердам
  ru = Северная Голландия — Занстад
  ru = Северная Каролина
  ru = Северная Дакота
  ru = Северная Осетия
  ru = Северная Швеция
  ru = Северная Ирландия

если я правильно поняла, сейчас можно набрать в поиске "северная осетия", а в результатах ее не увидеть

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

но независимо от реализации я за то чтобы это было не в этом реквесте

storage::DownloaderSearchResult downloaderResult(countryId,
result.GetString() /* m_matchedName */);
if (uniqueResults.find(downloaderResult) == uniqueResults.end())
if (result.GetResultType() == search::Result::Type::LatLon)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else вместо result.GetResultType() == search::Result::Type::LatLon ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

или наоборот result.GetResultType() == search::Result::Type::LatLon и остальное в else?

@pataquets
Copy link

@mpimenov see #14157

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants