MappIt is an application that lets users discover new places to visit and share their adventures and experiences, helping creators promote their content within the community.
MappIt is based on the client-server paradigm. We designed and implemented the server side in detail.
MappIt was deployed on a cluster of three servers, each in charge of a different part of the service.
In particular, we had:
- server A: ran the Java backend of the service and was part of the MongoDB cluster
- server B: ran the Neo4j server and was part of the MongoDB cluster
- server C: ran the periodic data population scripts and was part of the MongoDB cluster
The main entities of MappIt are:
- user
- place
- post
- activity
There are different kinds of relations between these entities. Some attributes of the entities are stored only in the document database (MongoDB), while other information is stored only in the graph database (Neo4j).
The following schema shows the entities and relations declared in the graph database:
In this section we analyze the queries that require special handling because they involve both databases.
We designed data flow schemas for both the successful and the failed outcome of each operation, always aiming to keep the data in a consistent state.
In addition to performing an automatic attempt at consistency recovery, we decided to log errors to an errors.txt file, so that administrators can manually check and enforce consistency and restore the nominal state.
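As an illustration of this idea, here is a minimal Java sketch of a write that spans both databases. It is not the project's actual code: the class `CrossDbWriter`, the `Action` interface and the order of the writes (document database first, graph database second) are assumptions made for the example. Only the automatic recovery attempt and the error log file come from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

/** Hypothetical helper: runs one logical operation against both databases. */
final class CrossDbWriter {
    /** Hypothetical callback wrapping a single database call. */
    @FunctionalInterface
    interface Action {
        void run() throws Exception;
    }

    private final Path errorLog; // e.g. Path.of("errors.txt")

    CrossDbWriter(Path errorLog) {
        this.errorLog = errorLog;
    }

    /** Returns true only if both writes succeeded. */
    boolean write(String operation, Action documentWrite, Action documentUndo, Action graphWrite) {
        try {
            documentWrite.run();
        } catch (Exception e) {
            log(operation + ": document write failed: " + e.getMessage());
            return false; // nothing was written, so there is nothing to undo
        }
        try {
            graphWrite.run();
            return true;
        } catch (Exception graphFailure) {
            log(operation + ": graph write failed: " + graphFailure.getMessage());
            try {
                documentUndo.run(); // automatic recovery attempt
            } catch (Exception undoFailure) {
                log(operation + ": rollback failed, manual check needed: " + undoFailure.getMessage());
            }
            return false;
        }
    }

    /** Appends one timestamped line to the error log. */
    private void log(String message) {
        String line = Instant.now() + " " + message + System.lineSeparator();
        try {
            Files.writeString(errorLog, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            System.err.println("Could not write to " + errorLog + ": " + message);
        }
    }
}
```

In this sketch the document write is undone when the graph write fails, and a failed rollback is the case that leaves the two databases out of sync and needs an administrator.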
The following are some of the queries we ran over the databases to get interesting overviews of the data and extract information.
Description:
this query selects the most appreciated posts, in terms of likes received, within a period between two dates, optionally filtered by activity.
Mandatory parameters: fromDate, toDate
Optional parameters: activity name and maximum number of posts to return
Java method: PostService.getPopularPosts
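// Quoted names ("activityName", "fromDate", ...) are placeholders for the parameters
db.post.aggregate([
  // Optional stage: keep only posts of the requested activity
  {$match: {activity: {$in: ["activityName"]}}},
  // Keep only posts published in [fromDate, toDate)
  {$match: {postDate: {$gte: "fromDate", $lt: "toDate"}}},
  // Most liked posts first
  {$sort: {likes: -1}},
  {$project: {
    _id: 0,
    title: 1,
    authorUsername: 1,
    placeName: 1,
    desc: 1,
    postDate: 1
  }},
  {$limit: "howManyResults"}
])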
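To make the semantics of the pipeline explicit, the following Java sketch applies the same steps (filter by activity, filter by date range, sort by likes, limit) to an in-memory list. It is an illustration only and not the body of PostService.getPopularPosts. The `Post` record, the class name, and the modelling of `activity` as a single string are assumptions.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class PopularPostsSketch {
    /** Hypothetical in-memory view of a post document (only the fields the query uses). */
    record Post(String title, String activity, LocalDate postDate, int likes) {}

    /**
     * Mirrors the pipeline stages on a plain list.
     * activityName may be null (no activity filter); limit <= 0 means "no limit".
     */
    static List<Post> select(List<Post> posts, LocalDate fromDate, LocalDate toDate,
                             String activityName, int limit) {
        return posts.stream()
                // $match on activity (skipped when the optional parameter is absent)
                .filter(p -> activityName == null || activityName.equals(p.activity()))
                // $match on postDate: fromDate <= postDate < toDate
                .filter(p -> !p.postDate().isBefore(fromDate) && p.postDate().isBefore(toDate))
                // $sort: {likes: -1}
                .sorted(Comparator.comparingInt(Post::likes).reversed())
                // $limit
                .limit(limit > 0 ? limit : Long.MAX_VALUE)
                .collect(Collectors.toList());
    }
}
```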
Domain query:
Which are the most visited places among those visited by the users that a specified user follows?
Graph-centric query:
Considering U as all the User vertices that have an incoming edge “FOLLOWS” from a specific User vertex, select Place vertices that have an incoming edge “VISITED” from U vertices. Then count the incoming “VISITED” edges for each of those places.
Equivalent query in Cypher:
MATCH (u:User{username:$username})-[f:FOLLOWS]->(followings:User)-[v:VISITED]->(p:Place)
WITH p.id AS id, p.name AS place, count(v) AS visitTimes
ORDER BY visitTimes DESC
LIMIT $howManyResults
RETURN id, place, visitTimes
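The traversal above can be read as two hops followed by a group-and-count. The Java sketch below reproduces that logic on plain maps, purely to illustrate what the query computes. The class name and the two map parameters are hypothetical stand-ins for the FOLLOWS and VISITED edges, with one list entry per VISITED edge.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MostVisitedSketch {
    /**
     * follows: username -> usernames that user follows (FOLLOWS edges).
     * visited: username -> place ids, one entry per VISITED edge.
     * Returns (placeId, visitTimes) pairs, most visited first.
     */
    static List<Map.Entry<String, Long>> mostVisited(Map<String, Set<String>> follows,
                                                     Map<String, List<String>> visited,
                                                     String username, int howManyResults) {
        // Hop 1: followed users. Hop 2: the places they visited. Then count per place.
        Map<String, Long> visitTimes = follows.getOrDefault(username, Set.of()).stream()
                .flatMap(followed -> visited.getOrDefault(followed, List.of()).stream())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // ORDER BY visitTimes DESC LIMIT howManyResults
        return visitTimes.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(howManyResults)
                .collect(Collectors.toList());
    }
}
```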
Domain query:
Suggest new posts to check out, located in the same places as the posts a user has liked, ordered by number of likes received.
Graph-centric query:
Considering P as the Post vertices with an incoming edge “LIKES” from a specific User vertex, select PL as the Place vertices that have an incoming edge “LOCATION” from P. Then, considering the Post vertices not in P that have an outgoing edge “LOCATION” towards PL, count their incoming “LIKES” edges and sort the Posts by this value.
Equivalent query in Cypher:
MATCH (u:User{username:$username})-[:LIKES]->(p:Post)-[:LOCATION]->(pl:Place)
WITH DISTINCT pl AS places, COLLECT(p) AS likedPosts
MATCH (:User)-[l:LIKES]->(sp:Post WHERE NOT(sp IN likedPosts))-[:LOCATION]->(places)
WITH DISTINCT sp AS suggestedPosts, COUNT(l) AS likeReceived
ORDER BY likeReceived DESC
RETURN suggestedPosts.id, likeReceived, suggestedPosts.title
Since MappIt relies on two kinds of databases, the data population procedures must also handle the storage of information in both systems.
The creation of a new place, for example, consists not only of inserting a document in MongoDB but also of creating a node in Neo4j, while the generation of social relations mainly consists of accessing the Neo4j entities. However, there are some redundancies, such as the total likes counter, that span both databases.
Those redundancies were designed to improve the execution time of frequent database operations, but they can introduce inconsistencies in the data.
To repair any inconsistencies that may be present, we implemented the redundancies updater procedure, which updates the redundant counters that we inserted in the documents of some entities in MongoDB, such as the field “likes” in the Post documents, the field “followers” in the User documents, or the fields “favourites” and “totalLikes” in the Place documents.
These redundancies need to be updated only when a user, all the posts of a user, or a single post is deleted from the application. In these cases consistency is delegated to this procedure, to avoid putting too much load on the server while the entities are being deleted.
This procedure is scheduled periodically.
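A minimal Java sketch of such a periodic updater follows, limited to the “likes” counter and assuming the graph database is the source of truth for it. The class name, the `Supplier` standing in for a Neo4j count query and the `Map` standing in for the Post documents are all hypothetical. The real procedure may be organized differently.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class RedundancyUpdaterSketch {
    private final Supplier<Map<String, Integer>> likesFromGraph; // stand-in for a Neo4j count query
    private final Map<String, Integer> likesInDocuments;         // stand-in for the Post documents

    RedundancyUpdaterSketch(Supplier<Map<String, Integer>> likesFromGraph,
                            Map<String, Integer> likesInDocuments) {
        this.likesFromGraph = likesFromGraph;
        this.likesInDocuments = likesInDocuments;
    }

    /** One pass: overwrites each redundant counter and returns how many were stale. */
    int runOnce() {
        int corrected = 0;
        for (Map.Entry<String, Integer> e : likesFromGraph.get().entrySet()) {
            Integer previous = likesInDocuments.put(e.getKey(), e.getValue());
            if (!e.getValue().equals(previous)) {
                corrected++;
            }
        }
        return corrected;
    }

    /** Runs the pass periodically; the caller shuts the scheduler down when done. */
    ScheduledExecutorService schedule(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report and continue.
                System.err.println("Redundancy update failed: " + e.getMessage());
            }
        }, period, period, unit);
        return scheduler;
    }
}
```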
We analyzed the aforementioned aspects in greater depth and breadth, along with others, such as:
- Database query analysis by means of the Operation Frequency Table
- Indexes on certain collections and document attributes to improve the performance of certain frequent queries
- Redundant fields in documents to improve query performance, as measured by executionStats
- Database sharding: we proposed a database sharding based on the country code of places and users in order to provide higher service availability
- Java application package organization
- Java application database connection handling
- Service endpoints
- Application use cases
- Functional requirements
- Non-Functional requirements
- Data population service
Have a look at the full project documentation at this link