Make sure pdf/doc files etc are indexed correctly via the JSON v2 connector #628

Closed
code-geek opened this issue Feb 5, 2024 · 1 comment · Fixed by #630
code-geek commented Feb 5, 2024

Description

I got it working by mapping fileext to this value pattern: ParseFromLastSep($.url_list.url, "."), defaulting to htm when empty

this is going to break for any URL whose last path segment has no file extension, though

because for a URL like https://power.larc.nasa.gov/static/publications/, the result of that value pattern will be gov/static/publications/
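
To make the failure concrete, here is a minimal Python sketch of what the value pattern appears to do. `parse_from_last_sep` is a hypothetical stand-in, not the connector's implementation: it assumes ParseFromLastSep returns everything after the last occurrence of the separator, which is consistent with the result quoted above. The `example.org` URL is illustrative only.

```python
def parse_from_last_sep(value: str, sep: str) -> str:
    """Hypothetical stand-in for ParseFromLastSep: the text after the last `sep`.

    Assumption: an empty string is returned when `sep` does not occur at all.
    """
    _, found, tail = value.rpartition(sep)
    return tail if found else ""


# A URL that ends in a real file name yields the extension.
print(parse_from_last_sep("https://example.org/docs/report.pdf", "."))

# A directory-style URL does not: its last "." sits inside the host name.
print(parse_from_last_sep("https://power.larc.nasa.gov/static/publications/", "."))
```

Because the result is non-empty in the second case, the htm default never kicks in and fileext ends up holding part of the URL.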

one option is to parse from the last / first, then from the last . in what remains, again defaulting to htm when empty, but I'll leave the details to you
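
A rough Python sketch of that two-step parse, offered as one possible shape rather than the connector's behavior. `file_ext_from_url` is a hypothetical helper; stripping the query string via `urlsplit` and lowercasing the result are extra assumptions that go beyond the suggestion above.

```python
from urllib.parse import urlsplit


def file_ext_from_url(url: str, default: str = "htm") -> str:
    """Return the extension of the URL's last path segment, or `default`."""
    # Assumption: only the path matters, so scheme, host, and query are dropped.
    path = urlsplit(url).path
    # Step 1: parse from the last "/".
    last_segment = path.rsplit("/", 1)[-1]
    # Step 2: parse from the last "." within that segment.
    _, dot, ext = last_segment.rpartition(".")
    # Fall back to the default when there is no dot or nothing after it.
    return ext.lower() if dot and ext else default
```

With this shape, `https://power.larc.nasa.gov/static/publications/` has an empty last segment and falls back to htm, while a URL ending in a file name such as `report.pdf` gives pdf.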

another detail of note

I don't remember what it looked like yesterday, but I see today the $.url_list.title value from the API is coming back with the fileext already included if the file is a pdf

that's fine, but I recommend making that a conscious decision and adhering to it as convention

because I was initially using this pattern for the filename: Concat($.url_list.title, fileext), which breaks if the title already has the extension in it

but works if it doesn't, in the case of real HTML webpages, for example
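
Whichever convention is chosen, the filename mapping can be made safe for both cases. The sketch below is a hypothetical Python helper, not a connector function: it assumes fileext carries no leading dot and appends the extension only when the title does not already end with it. The sample titles in the usage note are made up.

```python
def build_filename(title: str, fileext: str) -> str:
    """Append `.fileext` to `title` unless the title already ends with it."""
    # Assumption: fileext is stored without a dot (e.g. "pdf"); tolerate one anyway.
    suffix = "." + fileext.lstrip(".")
    # Compare case-insensitively so "Guide.PDF" is not turned into "Guide.PDF.pdf".
    if title.lower().endswith(suffix.lower()):
        return title
    return title + suffix
```

For example, `build_filename("User Guide.pdf", "pdf")` leaves the title unchanged, while `build_filename("Publications", "htm")` returns `Publications.htm`.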

another recommendation regarding mappings the JSON connector does not do automatically for you

the URL you go to when you click on a search result is whatever is mapped to URL1

so right now, if you click a search result for the JSON test collection, you go nowhere

you just have to add a mapping for URL1 to $.url_list.url, the same way you would for id

in summary, do the following:
map fileext to the file extension at the source (pdf, htm, etc.)
map filename to a filename that matches the extension (usually title+fileext)
remove the file extension filters you have in the XML right now
and I recommend this additional step:
map URL1 so you can link out to the original source page
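
Taken together, the mapping steps above might look like the following Python sketch. It is illustrative only: `map_record` is a hypothetical function, and the record shape (a `url_list` object with `url` and `title` keys) is assumed from the `$.url_list.url` and `$.url_list.title` paths mentioned earlier.

```python
import posixpath
from urllib.parse import urlsplit


def map_record(record: dict) -> dict:
    """Derive id, URL1, fileext, and filename from one source record."""
    entry = record["url_list"]  # assumed shape: {"url": ..., "title": ...}
    url, title = entry["url"], entry["title"]
    # fileext: extension of the URL path's last segment, defaulting to htm.
    ext = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower() or "htm"
    # filename: title plus extension, unless the title already carries it.
    filename = title if title.lower().endswith("." + ext) else f"{title}.{ext}"
    # URL1 mirrors id so that search results link out to the original page.
    return {"id": url, "URL1": url, "fileext": ext, "filename": filename}
```

Removing the file extension filters from the XML is a configuration change and is not represented here.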

also keep in mind you may need to recycle the app pools on IIS in order to flush cached previews when testing this

code-geek (Contributor, Author) commented

Reported as done. @CarsonDavis can you confirm?

@code-geek code-geek reopened this Feb 22, 2024
@CarsonDavis CarsonDavis transferred this issue from another repository Feb 22, 2024
@CarsonDavis CarsonDavis linked a pull request Feb 22, 2024 that will close this issue
3 participants