SearchX is a scalable collaborative search system being developed by Lambda Lab of TU Delft. It is based on Pienapple Search and is further developed to facilitate collaborative search and sensemaking. SearchX includes features that enable crowdsourced user studies on collaborative search, and is easily extensible for new research.
The backend is responsible for forwarding search requests to the search provider and managing the application's data. It is built on NodeJS and exposes its endpoints through Express (API) and socket.io (WebSockets). Use it together with the SearchX Front End to get a web-based collaborative search interface.
These instructions are for Ubuntu Linux. The steps can be adapted for all major platforms.
- Install NodeJS (at least version 8.0)
sudo apt install npm
// Check if node is installed
which node
- Install MongoDB
Execute the four steps of the MongoDB installation instructions.
// Check if MongoDB is running
mongo
// You should see the mongo client connect to the MongoDB server and show its version number.
// Exit the client using:
> exit
- Install Redis
sudo apt install redis-server
// Start Redis
redis-server
// Check if Redis is running
redis-cli
> PING
// Should return PONG
> QUIT
- Set up the server
// Clone the repository
git clone https://github.com/felipemoraes/searchx-backend.git
// Change directory to repository
cd searchx-backend
// Install dependencies
npm install
// Copy example configuration
cp .env.example .env
- Choose a search provider
You can choose one of the three search providers for which SearchX includes provider modules: Elasticsearch, Indri, and Bing. The Elasticsearch provider is the easiest to set up with your own dataset, while the Indri provider supports more advanced features such as relevance feedback. The Bing provider is suitable for web search, but requires a (paid) Bing API key. See the installation notes for each provider below on how to configure and use them. If you wish to use another search provider, please see the custom search providers section below.
- Run the server
// Start the development server
npm run start
// If you get any errors connecting to MongoDB or Redis, they may be running on a different
// port; instructions for changing the port are in the configuration section below.
// Check if API is running (curl or through browser)
curl http://localhost:4443
You can install the supported search providers as follows. See the configuration section for how to configure which search provider is used by default.
Elasticsearch: execute the Elasticsearch installation instructions.
Indri:
- Execute the node-indri installation instructions.
- Copy the built node-indri module from build/Release/node-indri inside your node-indri folder to lib/node-indri inside your searchx-backend folder (you need to create the lib and node-indri folders first).
Bing: SearchX requires a Bing API key to use the Bing web search provider. Once you have a Bing API key, you can paste it into your .env file under the key BING_ACCESS_KEY. Be careful not to check the key into version control, since this may lead to abuse if the key leaks.
// Search
[address]/v1/search/[vertical]/?query=[query]&page=[pageNumber]&provider=[provider]
- address: address set in the configuration file (testUrl:PORT)
- vertical: search vertical to use, as specified by the search provider, e.g. (web, images, videos, news) for Bing
- userId: the identifier for the user that is issuing this API call
- sessionId: the identifier for the session that this API call belongs to
- query: query string
- page: page number
- providerName [optional]: the search provider to use (elasticsearch, indri, bing), defaults to DEFAULT_SEARCH_PROVIDER if unset
- relevanceFeedback [optional, false by default]: whether to use relevance feedback (false, individual, shared)
- distributionOfLabour [optional, false by default]: whether to use distribution of labour (false, unbookmarkedSoft, unbookmarkedOnly)
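As a sketch of how a client might call the search endpoint, the JavaScript below builds a request URL from the template above. The helper name buildSearchUrl and the sample values are hypothetical, and passing userId and sessionId as query-string parameters is an assumption, since the URL template does not show them. Parameters left undefined are omitted so the server can apply its own defaults.

```javascript
// Node's url module also works on NodeJS 8, where URL is not yet a global
const { URL } = require('url');

// Hypothetical helper: builds [address]/v1/search/[vertical]/?param=value&...
function buildSearchUrl(address, vertical, params) {
  const url = new URL(`/v1/search/${encodeURIComponent(vertical)}/`, address);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset optional parameters so the server falls back to its defaults
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Example with hypothetical identifiers
const searchUrl = buildSearchUrl('http://localhost:4443', 'web', {
  query: 'collaborative search',
  page: 1,
  userId: 'user-1',
  sessionId: 'session-1',
});
console.log(searchUrl);
// http://localhost:4443/v1/search/web/?query=collaborative+search&page=1&userId=user-1&sessionId=session-1
```

URLSearchParams takes care of encoding, so query strings with spaces or special characters do not need to be escaped by hand.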
The main production configuration keys can be set in the .env file; example values can be found in .env.example. These keys are:
- NODE_ENV: the node environment (production or development)
- PORT: the port the server will run on
- DB: the database URL
- REDIS: the Redis server URL
- DEFAULT_SEARCH_PROVIDER: the search provider that is used by default if the provider URL parameter of the API is not set
- BING_ACCESS_KEY (optional): the API access key used when the Bing search provider is selected
- ELASTICSEARCH_URI (optional): the Elasticsearch URL
- SUGGESTIONS_TYPE (optional, default=none): choose from bing, indri, or none. Indri makes a suffix-prefix lookup.
Further development configuration can be found inside app/config/config.js:
module.exports = {
outDir: './out',
testDb: 'mongodb://localhost/searchx-test',
testUrl: 'http://localhost',
cacheFreshness: 3600,
scrapFreshness: 60 * 60 * 24
};
The tests require that the Elasticsearch search provider is installed.
// Load the test dataset into elasticsearch
./node_modules/elasticdump/bin/elasticdump --input=test/data/test_index_mapping.json --output=http://localhost:9200/cranfield --type=mapping
./node_modules/elasticdump/bin/elasticdump --input=test/data/test_index.json --output=http://localhost:9200/cranfield --type=data
// Run tests
npm test
SearchX can be extended to define tasks, and to support new providers for search results.
Tasks define extra functionality that can be used in the frontend for user studies, for example placing users in groups according to predefined criteria, giving them search instructions, and asking them questions on what they found. Two example tasks have been added in app/services/session/tasks/:
- exampleGroupAsync.js is a basic example of a task that can be performed by a group. When a new user requests this task, they join a group and can start solving a search puzzle. When more new users request the task, they join the same group (until it is full) and can collaborate on solving the puzzle. This example is asynchronous, since users do not need to search at the same time. Note that the frontend part of this task contains more components that form the complete task (e.g. submitting the answer to the puzzle), but the backend task specification is not concerned with them, because they are handled by the standard logging functionality of the backend. See the frontend documentation for the complete task description.
- exampleGroupSync.js is a more elaborate example that shows how tasks can be used for synchronous collaboration. After a user has completed a pre-test and needs to be assigned to a group, the frontend calls the pushSyncsubmit socket (see app/api/controllers/socket/session.js for the entry point), which causes the handleSyncSubmit function in the example to be called. Users are assigned to groups in a similar fashion to the async example, but the groups are stored in a database to ensure each topic is assigned to a group once. Also, a user is not assigned a task until the group has reached the required number of members, so users have to wait until the group is filled, which makes the task synchronous. When the group is assigned a task, the socket is used to notify all other group members, allowing them to start the task. The notification is handled automatically by SearchX's session management, so the task code only needs to mark the group as modified.
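The grouping behaviour described for the synchronous example can be sketched in memory as follows. This is a hypothetical illustration: createSyncGrouper and its onGroupReady callback are not SearchX functions, the groups live in memory rather than in a database, and the callback stands in for the socket notification that SearchX's session management performs.

```javascript
// Hypothetical sketch, not SearchX code: users who submit are added to the
// currently waiting group; once it has groupSize members, the group is ready.
function createSyncGrouper(groupSize, onGroupReady) {
  let waiting = { id: 1, members: [] };
  let nextId = 2;
  return function submit(userId) {
    waiting.members.push(userId);
    if (waiting.members.length < groupSize) {
      // Group not full yet: the user has to wait for more members
      return { groupId: waiting.id, ready: false };
    }
    const fullGroup = waiting;
    waiting = { id: nextId++, members: [] }; // later users start a new group
    onGroupReady(fullGroup); // stands in for notifying every member over the socket
    return { groupId: fullGroup.id, ready: true };
  };
}

// Example: groups of two users
const notified = [];
const submitPreTest = createSyncGrouper(2, (group) => notified.push(group.members));
console.log(submitPreTest('alice')); // { groupId: 1, ready: false }
console.log(submitPreTest('bob')); // { groupId: 1, ready: true }
console.log(notified); // [ [ 'alice', 'bob' ] ]
```

The asynchronous example differs mainly in that a user receives the task immediately instead of waiting for the group to fill.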
You can modify these tasks as follows:
- Increasing group size
  - For the async example, the MAX_MEMBERS constant in exampleGroupAsync.js can be changed.
  - For the sync example, the group size is defined by the frontend.
- Adding new puzzles or topics
  The puzzles for the asynchronous example are defined in app/services/session/tasks/data/topics.json. Learning topics for the synchronous example are defined inside app/services/session/tasks/data/topics.json. To add a new topic, add a new entry to these JSON files.
To define a new task in the backend, add a new service inside app/services/session/tasks/ and then change app/services/session/index.js to serve the task description from the new service.
Three search provider services are included: Elasticsearch, Indri, and Bing. These services can be found in app/services/search/providers/ and can serve as examples of how to implement new search providers. New search providers can be implemented by adding a service to the same folder and adding it to the provider mapping in app/services/search/provider.js. The set of possible verticals and the number of results per page can be defined as desired by the provider implementation. The provider service must implement the fetch(query, vertical, pageNumber, resultsPerPage, relevanceFeedbackDocuments) function, which must return a promise that resolves to an object containing the results if the search results are retrieved successfully. Providers that do not support resultsPerPage or relevanceFeedbackDocuments can ignore them; see the Bing provider for an example of handling this case by throwing errors for unsupported values.
The object containing the results needs to have the following fields:
{ matches: <number of matches>,
results: [
<result>,
...
]}
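As an illustration of the fetch contract, here is a hypothetical in-memory provider that searches a tiny hard-coded document list and returns web results in the shape above. The document list, the term-matching logic, the default page size, 1-based page numbers, and the object name exampleProvider are assumptions for this sketch; how a real provider is exported and registered should follow the existing modules in app/services/search/providers/ and the mapping in app/services/search/provider.js.

```javascript
// Hypothetical in-memory provider illustrating the fetch() contract; not part of SearchX.
const DOCUMENTS = [
  { name: 'Collaborative search', url: 'https://example.org/collab', snippet: 'Searching together with others.' },
  { name: 'Sensemaking', url: 'https://example.org/sense', snippet: 'Making sense of search results.' },
  { name: 'Search as learning', url: 'https://example.org/learning', snippet: 'Learning while searching together.' },
];

const DEFAULT_RESULTS_PER_PAGE = 10; // assumed page size when resultsPerPage is not given

const exampleProvider = {
  // Returns a promise resolving to { matches, results }
  fetch(query, vertical, pageNumber, resultsPerPage, relevanceFeedbackDocuments) {
    // Reject unsupported values instead of silently ignoring them
    if (vertical !== 'web') {
      return Promise.reject(new Error(`Unsupported vertical: ${vertical}`));
    }
    if (relevanceFeedbackDocuments && relevanceFeedbackDocuments.length > 0) {
      return Promise.reject(new Error('Relevance feedback is not supported'));
    }
    const perPage = resultsPerPage || DEFAULT_RESULTS_PER_PAGE;
    // Naive matching: a document matches if it contains any query term
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const matching = DOCUMENTS.filter((doc) => {
      const text = `${doc.name} ${doc.snippet}`.toLowerCase();
      return terms.some((term) => text.includes(term));
    });
    // Page numbers are assumed to start at 1
    const start = (pageNumber - 1) * perPage;
    const results = matching.slice(start, start + perPage).map((doc) => ({
      name: doc.name,
      url: doc.url,
      displayUrl: doc.url.replace(/^https?:\/\//, ''),
      snippet: doc.snippet,
    }));
    return Promise.resolve({ matches: matching.length, results });
  },
};

exampleProvider.fetch('sensemaking', 'web', 1).then((response) => console.log(response.matches)); // 1
```

Note that matches counts all matching documents, while results only holds the requested page.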
The data structure of the <result> depends on the result type, which is defined by the component that will be used to display the result in the frontend. See the searchx-frontend documentation for an explanation of how to add custom result types.
The included result types are (fields preceded by <OPTIONAL> are optional):
Web:
{
name: <name of the result>,
url: <full url>,
displayUrl: <url formatted for display>,
snippet: <part of text to display on search engine results page>
}
Images:
{
name: <name of the image>,
url: <full url>,
thumbnailUrl: <url of the thumbnail to display for this image>
}
Videos:
{
name: <name of the video>,
thumbnailUrl: <url of the thumbnail to display for this result>,
publisher: [
{name: <name of the first publisher of this video>}
...
],
viewCount: <number of times this video has been viewed (integer)>,
<OPTIONAL> creator: {name: <name of the creator of this video>},
}
News:
{
name: <name of the news article>,
url: <full url>,
datePublished: <date the article has been published (in format compatible with Date() constructor)>,
description: <description of the article to display on search engine results page>,
provider: [
{name: <name of the first news provider that published this story>}
...
],
<OPTIONAL> image: {thumbnail: {contentUrl: <url of the thumbnail to display for this result>}}
}
Text:
{
id: <unique identifier of document>,
name: <name of the document>,
date: <date the document was published (in format compatible with Date() constructor)>,
source: <name of the publisher of this document>,
snippet: <part of text to display on search engine results page>,
text: <full document text>
}
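When writing a new provider, it can help to check its output against the field lists above. The sketch below encodes the required (non-<OPTIONAL>) fields of each included result type; the type keys web, images, videos, news and text and the helper names are assumptions chosen for this example, and it only checks that fields are present, not their types or formats.

```javascript
// Required fields per result type, taken from the lists above.
// The type keys are labels chosen for this sketch, not SearchX identifiers.
const REQUIRED_FIELDS = {
  web: ['name', 'url', 'displayUrl', 'snippet'],
  images: ['name', 'url', 'thumbnailUrl'],
  videos: ['name', 'thumbnailUrl', 'publisher', 'viewCount'],
  news: ['name', 'url', 'datePublished', 'description', 'provider'],
  text: ['id', 'name', 'date', 'source', 'snippet', 'text'],
};

// Returns the required fields that are absent from a single result
function missingFields(type, result) {
  const required = REQUIRED_FIELDS[type];
  if (!required) {
    throw new Error(`Unknown result type: ${type}`);
  }
  return required.filter((field) => result[field] === undefined);
}

// Checks a whole provider response of the form { matches, results: [...] }
function checkResponse(type, response) {
  if (typeof response.matches !== 'number' || !Array.isArray(response.results)) {
    return ['response must contain a numeric matches field and a results array'];
  }
  const problems = [];
  response.results.forEach((result, index) => {
    const missing = missingFields(type, result);
    if (missing.length > 0) {
      problems.push(`result ${index} is missing: ${missing.join(', ')}`);
    }
  });
  return problems;
}

console.log(checkResponse('images', { matches: 1, results: [{ name: 'Delft', url: 'https://example.org/delft.jpg' }] }));
// [ 'result 0 is missing: thumbnailUrl' ]
```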
If you use SearchX to produce results for your scientific publication, please refer to our SIGIR 2018 paper.
@inproceedings{putra2018searchx,
title={SearchX: Empowering Collaborative Search Research.},
author={Putra, Sindunuraga Rikarno and Moraes, Felipe and Hauff, Claudia},
booktitle={SIGIR},
pages={1265--1268},
year={2018}
}
@article{moraes2019impact,
title={On the impact of group size on collaborative search effectiveness},
author={Moraes, Felipe and Grashoff, Kilian and Hauff, Claudia},
journal={Information Retrieval Journal},
pages={1--23},
year={2019},
publisher={Springer}
}
@inproceedings{moraes2019node,
title={node-indri: moving the Indri toolkit to the modern Web stack},
author={Moraes, Felipe and Hauff, Claudia},
booktitle={ECIR},
pages={241--245},
year={2019}
}
@inproceedings{moraes2018contrasting,
title={Contrasting Search as a Learning Activity with Instructor-designed Learning},
author={Moraes, Felipe and Putra, Sindunuraga Rikarno and Hauff, Claudia},
booktitle={CIKM},
pages={167--176},
year={2018}
}
@inproceedings{putra2018development,
title={On the Development of a Collaborative Search System},
author={Putra, Sindunuraga Rikarno and Grashoff, Kilian and Moraes, Felipe and Hauff, Claudia},
booktitle={DESIRES},
pages={76--82},
year={2018}
}
MIT License