Cursor disconnect RemoteSolrException Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search
#490 · Open · WolfgangFahl opened this issue on Jul 8, 2020 · 1 comment
Paging through the Crossref REST API with a cursor, I run into an issue similar to #427. The API responds with:
{
"status": "error",
"message-type": "exception",
"message-version": "1.0.0",
"message": {
"name": "class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException",
"description": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",
"message": "Error from server at http:\/\/mds3:8984\/solr\/crmds1: Unable to parse 'cursorMark' after totem: value must either be '*' or the 'nextCursorMark' returned by a previous search: AoJ4 NDNi\/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xNzc1OC9laXJhaTU=",
jq . *.err | grep "search:" | cut -f7 -d:
gives me:
value must either be '*' or the 'nextCursorMark' returned by a previous search
AoJ7o 7Hk/ECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzI1MzA1NDQ=",
value must either be '*' or the 'nextCursorMark' returned by a previous search
AoJ3pL 1svECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4xMTQ1LzExMzg5NTM=",
value must either be '*' or the 'nextCursorMark' returned by a previous search
AoJ teyWtfECPwJodHRwOi8vZHguZG9pLm9yZy8xMC4zMTE1LzEyMjU3MzM=",
value must either be '*' or the 'nextCursorMark' returned by a previous search
AoJx6NyU0 8CPwhodHRwOi8vZHguZG9pLm9yZy8xMC4xMDYxLzk3ODA3ODQ0ODEwMTE="
So I suspect the space in the token is the issue.
Please document what kind of encoding you expect or, better, fix the upstream library to use tokens that need no encoding (no spaces). Improving the error message and pointing to the FAQ would also be helpful.
To close this issue, please let me know whether my space assumption is right and whether replacing the space with "+" will fix the problem.
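A possible workaround, sketched under the assumption that the space comes from an unencoded "+" in the Base64 cursor (in a URL query string a literal "+" is decoded as a space): percent-encode the cursor before appending it to the URL. The `urlencode` helper and the sample cursor below are illustrative only and are not part of the Crossref API or of the original script; whether this is the actual cause has not been confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode every character that is not
# an RFC 3986 "unreserved" character (ASCII input assumed).
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"   # "'$c" yields the character code
         out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Illustrative cursor (not a real Crossref token) containing + / =
cursor='AoJ4+NDNi/EC='
echo "https://api.crossref.org/types/proceedings/works?rows=1000&cursor=$(urlencode "$cursor")"
```

With this in place, `downloadWithCursor` would pass `$(urlencode "$l_cursor")` instead of the raw `$l_cursor` when building the URL.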
#427 already points to an issue with cursors. This is the script that produces the errors above:
`
# download from Crossref RESTful API via cursor
downloadWithCursor() {
local l_rows="$1"
local l_index="$2"
local l_cursor="$3"
target=$sampledir/crossref-$l_index.json
src="https://api.crossref.org/types/proceedings/works?select=event,title,DOI&rows=$l_rows&cursor=$l_cursor"
download $src $target
}
# get Crossref data
# see also https://github.com/TIBHannover/confIDent-dataScraping
getCrossRef() {
rows=1000
index=1
totalRows=0
# force while entry
total=$rows
downloadWithCursor $rows $index "*"
while [ $totalRows -lt $total ]
do
target=$sampledir/crossref-$index.json
status=$(jq '.status' $target | tr -d '"')
total=$(jq '.message["total-results"]' $target)
# get and remove quotes from cursor
cursor=$(jq '.message["next-cursor"]' $target | tr -d '"')
startindex=$(jq '.message.query["start-index"]' $target)
perpage=$(jq '.message["items-per-page"]' $target)
index=$((index+1))
if [ "$status" == "ok" ]
then
totalRows=$((totalRows+rows))
else
# force while exit
totalRows=1
total=0
# remove invalid
mv $target $target.err
fi
echo "status: $status index: $index $totalRows of $total startindex: $startindex perpage=$perpage cursor:$cursor"
if [ $totalRows -lt $total ]
then
# wait a bit
sleep 2
downloadWithCursor $rows $index "$cursor"
fi
done
cat $sampledir/crossref-*.json | jq '.message.items[].title' | cut -f2 -d'[' | cut -f2 -d'"' | grep -v "]" | tr -s '\n' > $sampledir/proceedings-crossref.txt
}
`