Make sure pdf/doc files etc are indexed correctly via the JSON v2 connector #628

code-geek · 2024-02-05T19:26:03Z

Description

I got it working by mapping fileext to this value pattern: ParseFromLastSep($.url_list.url, "."), defaulting to htm when the result is empty.

This is going to break in most cases, though,

because for a URL like https://power.larc.nasa.gov/static/publications/, that value pattern returns gov/static/publications/.

One option is to parse first from the last /, then from the last ., again defaulting to htm, but I'll leave the details to you.
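The two-step parse suggested above can be sketched in Python. This is only an illustration of the logic, not connector configuration: the function name `file_extension` is hypothetical, and it goes one step further than the suggestion by isolating the URL path first, so that a bare host such as `https://example.org` is not misread as having the extension `org`.

```python
from urllib.parse import urlsplit


def file_extension(url: str, default: str = "htm") -> str:
    """Return the extension of the last path segment, or `default` if there is none.

    Hypothetical helper: it mirrors "parse from the last /, then from the last ."
    but is not part of the JSON connector itself.
    """
    path = urlsplit(url).path               # drop scheme, host, query, fragment
    last_segment = path.rsplit("/", 1)[-1]  # step 1: everything after the last "/"
    if "." not in last_segment:
        return default                      # directory-style URL or extensionless page
    ext = last_segment.rsplit(".", 1)[-1]   # step 2: everything after the last "."
    return ext.lower() or default           # a trailing "." also falls back to default


print(file_extension("https://power.larc.nasa.gov/static/publications/"))
print(file_extension("https://example.org/docs/report.PDF"))
```

With the URL from the example above, the last path segment is empty, so the sketch falls back to htm instead of returning gov/static/publications/.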

Another detail of note:

I don't remember what it looked like yesterday, but today I see that the $.url_list.title value from the API already includes the file extension when the file is a PDF.

That's fine, but I recommend making that a conscious decision and sticking to it as a convention,

because I was initially using this pattern for the filename: Concat($.url_list.title, fileext), which breaks if the title already has the extension in it

but works if it doesn't, as with real HTML webpages, for example.
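One way to make the filename robust to either convention is to append the extension only when the title does not already end with it. The sketch below shows that check in Python; `build_filename` is a hypothetical name, and it assumes fileext is stored without a leading dot (pdf, htm), which the thread does not state.

```python
def build_filename(title: str, fileext: str) -> str:
    """Append `fileext` to `title` unless the title already ends with it.

    Hypothetical helper; assumes `fileext` looks like "pdf" or "htm".
    A leading dot, if present, is tolerated.
    """
    suffix = "." + fileext.lstrip(".").lower()
    if title.lower().endswith(suffix):   # case-insensitive: "Report.PDF" counts
        return title                     # extension already present, keep as-is
    return title + suffix


print(build_filename("Annual Report.pdf", "pdf"))
print(build_filename("POWER Home", "htm"))
```

A title that already carries the extension is returned unchanged, while a plain HTML page title gets .htm appended, so the filename always matches fileext.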

Another recommendation, regarding mappings the JSON connector does not do for you automatically:

The URL you go to when you click a search result is whatever is mapped to URL1,

so right now, if you click a search result for the JSON test collection, you go nowhere.

You just have to add a mapping for URL1, like you would for id: $.url_list.url

In summary, do the following:
map fileext to the extension of the file at the source (pdf, htm, etc.)
map filename to a filename that matches the extension (usually title + fileext)
remove the file extension filters you have in the XML right now
And I recommend this additional step:
map URL1 so you can link out to the original source page

Also keep in mind that you may need to recycle the app pools on IIS to flush cached previews when testing this.

code-geek · 2024-02-22T20:28:11Z

Reported as done. @CarsonDavis can you confirm?

code-geek assigned bishwaspraveen Feb 5, 2024

code-geek assigned CarsonDavis and unassigned bishwaspraveen Feb 19, 2024

code-geek closed this as completed Feb 22, 2024

code-geek reopened this Feb 22, 2024

CarsonDavis transferred this issue from another repository Feb 22, 2024

CarsonDavis linked a pull request Feb 22, 2024 that will close this issue

add document_type and file_extension to the json template #630

Merged

bishwaspraveen closed this as completed in #630 Feb 22, 2024
