This repository has been archived by the owner on Apr 28, 2021. It is now read-only.

Commit

Added fragment indexing (merged feature branch)
mlbrgl committed Sep 7, 2017
1 parent 07f2a6a commit 14f0a7f
Showing 6 changed files with 638 additions and 72 deletions.
62 changes: 55 additions & 7 deletions README.md
@@ -8,6 +8,39 @@ This app enables [Ghost](https://ghost.org) sites owners to index their content

When you work on a story, and publish it, the content of that story is sent to Algolia's indexing engine. Any change you make to that story or its state afterwards (updating content, deleting the story or unpublishing it) is automatically synchronised with your index.

## Fragment indexing

Fragment indexing refers to breaking up an HTML document into smaller blocks (or fragments) before sending them to the indexing engine. Those fragments are generally composed of a heading (h1, h2, ...) and some text. You may read about the rationale behind fragment indexing on the [KirbyAlgolia project page](https://github.com/mlbrgl/kirby-algolia#kirby--algolia-integration).

Here is how the fragmenting engine handles the different types of fragments, in terms of when the indexing events are fired:

```
line
line
--> INDEXING (headless fragment)
# heading
line
--> INDEXING
## heading
--> INDEXING (content-less heading)
### subheading
line
line
--> INDEXING
# unlikely heading
--> INDEXING by code convenience but very little value
```
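
The rules illustrated above can be sketched in a few lines of JavaScript. This is a simplified illustration, not the app's parser: it works on raw Markdown lines rather than on an AST, and `splitIntoFragments` and `flush` are hypothetical names. It assumes, as the diagram suggests, that a fragment is closed whenever a new heading starts and once more at the end of the document.

```javascript
// Simplified stand-in for the fragmenting engine: works on raw Markdown
// lines instead of an AST. All names here are hypothetical.
function splitIntoFragments(markdown) {
  const fragments = [];
  // The headless intro, if any, gets importance 0.
  let current = { importance: 0 };

  // Fragments with neither heading nor content are skipped.
  const flush = () => {
    if (current.heading !== undefined || current.content !== undefined) {
      fragments.push(current);
    }
  };

  markdown.split('\n').forEach((line) => {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a new heading closes the previous fragment
      current = { heading: match[2], importance: match[1].length };
    } else if (line.trim() !== '') {
      current.content = (current.content || '') + line + '\n';
    }
  });

  flush(); // the last fragment is only closed at the end of the document
  return fragments;
}

const sample = [
  'line', 'line',
  '# heading', 'line',
  '## heading',
  '### subheading', 'line', 'line',
  '# unlikely heading',
].join('\n');

const fragments = splitIntoFragments(sample);
console.log(fragments.length); // 5, one per INDEXING marker in the diagram
```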

## Structure of a fragment

- `objectID`: automatically generated by Algolia (e.g. 565098020)
- `post_uuid`: automatically generated by Ghost (e.g. 8693c79d-7880-4e17-903d-7afd448e3517)
- `heading`: the heading of the fragment being indexed (e.g. My first paragraph)
- `id`: the ID of the fragment being indexed (e.g. my-new-blog-post#card-markdown--My-first-paragraph--1)
- `importance`: an integer representing how deep in the article structure a fragment is located (e.g. 1). The deeper the fragment, the less relevant it is.
- `post_title`: the title of the post being indexed (e.g. My new blog post)
- `content`: the content of the fragment being indexed (e.g. The content of the first paragraph)
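
As an illustration, a record with the attributes listed above might look like the object below. The values are the examples from the list. `naiveSlug` is a hypothetical stand-in for a slug library (it only replaces whitespace with hyphens), and `buildFragmentId` is a hypothetical helper that assumes the `id` is made of the post slug, the slugified heading and a running heading count.

```javascript
// Hypothetical stand-in for a slug library: it only swaps whitespace for
// hyphens, which is enough to reproduce the example id.
const naiveSlug = (text) => text.trim().split(/\s+/).join('-');

// Hypothetical helper: post slug + slugified heading + running heading count.
const buildFragmentId = (postSlug, heading, headingCount) =>
  postSlug + '#card-markdown--' + naiveSlug(heading) + '--' + headingCount;

const fragment = {
  objectID: '565098020', // generated by Algolia, shown here for completeness
  post_uuid: '8693c79d-7880-4e17-903d-7afd448e3517',
  post_title: 'My new blog post',
  heading: 'My first paragraph',
  id: buildFragmentId('my-new-blog-post', 'My first paragraph', 1),
  importance: 1,
  content: 'The content of the first paragraph',
};

console.log(fragment.id);
// my-new-blog-post#card-markdown--My-first-paragraph--1
```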

# What it does not do

This app only deals with the indexing side of things. Adding the actual search widget is not part of the scope at this point. A good option to look into is [InstantSearch.js](https://community.algolia.com/instantsearch.js/v2/).
@@ -24,25 +57,40 @@ This app only deals with the indexing side of things. Adding the actual search w

2. Install dependencies by running `yarn` (recommended) or `npm install` in the `ghost-algolia` folder.

3. Locate your ghost config file (config.production.json if ghost is running in production mode) and append the algolia object to it:
3. Configure Algolia's index

Create a new API key on Algolia's dashboard. You want to make sure that the generated key has the following authorizations on your index:
- Search (search)
- Add records (addObject)
- Delete records (deleteObject)

Next, add the following attributes as searchable attributes, in the ranking tab under the "Basic settings" section:
- `post_title`
- `heading`
- `content`
- `post_uuid`

Ignore any warnings about the attributes not being found in a sample of your records, as you should not have any records at that stage yet.

Finally, add `importance` as a custom ranking attribute in the ranking tab under the "Ranking Formula & Custom Ranking" section. This will allow the tie-break algorithm to give preference to higher fragments in the document structure. In other words, h1 tags will rank higher than h2 tags if they otherwise have the same textual score.

4. Locate your ghost config file (config.production.json if ghost is running in production mode) and append the algolia object to it:

```json
"algolia": {
"active": true,
"applicationID": "[YOUR_ALGOLIA_APP_ID]",
"apiKey": "[YOUR_ALGOLIA_API_KEY]",
"index": "[YOUR_ALGOLIA_INDEX]"
}
```

*NB: you want to make sure that the generated apiKey has write access to your index. Please refer to Algolia's documentation for more information.*

4. Apply the `ghost_algolia_register_events.patch` patch found in the app download by running the following command from the ghost root:
5. Apply the `ghost_algolia_register_events.patch` patch found in the app download by running the following command from the ghost root:

```shell
patch -p1 < ./content/apps/ghost-algolia/ghost_algolia_register_events.patch
```

5. Restart ghost
6. Restart ghost
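
For reference, the index settings from the "Configure Algolia's index" step can also be written as a settings object instead of being entered in the dashboard. This is a sketch under assumptions: `buildIndexSettings` is a hypothetical helper, and the `asc(importance)` direction is inferred from the statement that h1 fragments should rank above h2 fragments (a lower `importance` value wins the tie-break).

```javascript
// Hypothetical helper returning the settings described in the README.
// `searchableAttributes` and `customRanking` are Algolia index settings.
function buildIndexSettings() {
  return {
    // Order matters: attributes listed first weigh more in the ranking.
    searchableAttributes: ['post_title', 'heading', 'content', 'post_uuid'],
    // Assumption: ascending, so that shallower fragments (h1) win ties.
    customRanking: ['asc(importance)'],
  };
}

// With an initialised algoliasearch index, the settings could be applied with:
// index.setSettings(buildIndexSettings());
console.log(buildIndexSettings());
```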


# Usage
@@ -59,6 +107,6 @@ Tested against Ghost 1.x.x releases.

# Roadmap

- Switching to [fragment indexing](https://github.com/mlbrgl/kirby-algolia#principle).
- ~~Switching to [fragment indexing](https://github.com/mlbrgl/kirby-algolia#principle).~~
- Bulk indexing existing articles.
- Upgrade to App API when available, to remove core hacking and simplify the installation process.
104 changes: 40 additions & 64 deletions index.js
@@ -17,82 +17,58 @@
//
// module.exports = GhostAlgolia;

var algoliasearch = require('algoliasearch'),
h2p = require('html2plaintext'),
config = require('../../../current/core/server/config');
const converter = require('../../../current/core/server/utils/markdown-converter'),
indexFactory = require('./lib/indexFactory'),
parserFactory = require('./lib/parserFactory');

var GhostAlgolia = Object.create(null),
algolia = config.get('algolia');
const GhostAlgolia = {};


/*
* Register (post) events to react to admin panel actions
*/
GhostAlgolia.registerEvents = function registerEvents(events) {

if(algolia.applicationID && algolia.apiKey && algolia.index) {
// React to post being published (from unpublished)
events.on('post.published', function(post) {
let index = indexFactory();
if(index.connect() && parserFactory().parse(post, index)) {
index.add(post)
.then(() => { console.log('GhostAlgolia: post "' + post.attributes.title + '" has been added to the index.'); })
.catch((err) => console.log(err));
};
});

var client = algoliasearch(algolia.applicationID, algolia.apiKey),
index = client.initIndex(algolia.index);

// React to post being edited in a published state
events.on('post.published.edited', function(post) {
index.addObject(buildPostObject(post), function(err, content) {
if(!err) {
console.log('GhostAlgolia: post "' + post.attributes.title + '" has been updated in the index.');
}
});
});

// React to post being published (from unpublished)
events.on('post.published', function(post) {
index.addObject(buildPostObject(post), function(err, content) {
if(!err) {
console.log('GhostAlgolia: post "' + post.attributes.title + '" has been added to the index.');
}
});
});

// React to post being unpublished (from published)
events.on('post.unpublished', function(post) {
index.deleteObject(post.attributes.uuid, function(err) {
if (!err) {
console.log('GhostAlgolia: post "' + post.attributes.title + '" has been removed from the index.');
}
});
});

// React to post being deleted
events.on('post.deleted', function(post) {
// No need to try and remove a draft from the index as drafts are not sent
// for indexing in the first place.
if(post.attributes.status !== 'draft') {
index.deleteObject(post.attributes.uuid, function(err) {
if (!err) {
console.log('GhostAlgolia: post "' + post.attributes.title + '" has been removed from the index.');
}
});
// React to post being edited in a published state
events.on('post.published.edited', function(post) {
let index = indexFactory();
if(index.connect()) {
let promisePublishedEdited;
if(parserFactory().parse(post, index)) {
promisePublishedEdited = index.delete(post)
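// Return the promise so that the "updated" log waits for the add to finish
// and a failed add reaches the catch handler below.
.then(() => index.add(post));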
} else {
promisePublishedEdited = index.delete(post);
}
});
} else {
console.log('Check Algolia configuration options.')
}
}

/*
* Parse the post object and only expose certain attributes to Algolia.
*/
function buildPostObject(post) {
var postToIndex = Object.create(null);
promisePublishedEdited
.then(() => { console.log('GhostAlgolia: post "' + post.attributes.title + '" has been updated in the index.'); })
.catch((err) => console.log(err));
};
});

postToIndex.objectID = post.attributes.uuid;
postToIndex.title = post.attributes.title;
postToIndex.slug = post.attributes.slug;
// @TODO parse plaintext attribute in order to index paragraphs
// (vs cramming the entire content into one record)
postToIndex.text = h2p(post.attributes.html);
// React to post being unpublished (from published)
// Also handles deletion of published posts as the unpublished event is emitted
// before the deleted event which becomes redundant. Deletion of unpublished posts
// is of no concern as they never made it to the index.
events.on('post.unpublished', function(post) {
let index = indexFactory();
if(index.connect()) {
index.delete(post)
.then(() => { console.log('GhostAlgolia: post "' + post.attributes.title + '" has been removed from the index.'); })
.catch((err) => console.log(err));
};
});

return postToIndex;
}

module.exports = GhostAlgolia;
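
The update flow for an edited post (delete the old fragments, then add the new ones) depends on promise chaining. The sketch below shows one way to chain the two steps so that the final handler runs only after both have completed. `makeStubIndex` and `reindex` are hypothetical names, and the stub only records calls instead of talking to Algolia.

```javascript
// Stub index recording the order of operations. It stands in for the object
// returned by indexFactory() and is purely hypothetical.
const makeStubIndex = (log) => ({
  delete: (post) => Promise.resolve().then(() => { log.push('delete ' + post.uuid); }),
  add: (post) => Promise.resolve().then(() => { log.push('add ' + post.uuid); }),
});

// Re-index an edited post: drop its old fragments first, then add the new ones.
// Returning index.add(post) from the callback keeps it inside the chain, so the
// caller is notified only once both steps have finished (or one has failed).
function reindex(index, post, hasFragments) {
  const deleted = index.delete(post);
  return hasFragments ? deleted.then(() => index.add(post)) : deleted;
}

const log = [];
const done = reindex(makeStubIndex(log), { uuid: 'abc' }, true)
  .then(() => { log.push('updated'); return log; });

done.then((entries) => console.log(entries.join(', ')));
// delete abc, add abc, updated
```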
43 changes: 43 additions & 0 deletions lib/indexFactory.js
@@ -0,0 +1,43 @@
const algoliaSearch = require('algoliasearch'),
config = require('../../../../current/core/server/config'),
algoliaSettings = config.get('algolia');


const indexFactory = () => {
let _fragments = [];
let index;

return {
connect: () => {
if(algoliaSettings && algoliaSettings.active === true) {
if(algoliaSettings.applicationID && algoliaSettings.apiKey && algoliaSettings.index) {
let client = algoliaSearch(algoliaSettings.applicationID, algoliaSettings.apiKey);
index = client.initIndex(algoliaSettings.index);
return true;
} else {
// TODO better error output on frontend
console.log('Please check your Algolia configuration for a missing option: applicationID, apiKey, index.');
return false;
}
} else {
console.log('Algolia indexing deactivated.');
// Return an explicit boolean, consistent with the other branches.
return false;
}
},
addFragment: (fragment) => {
if(fragment.content !== undefined || fragment.heading !== undefined) {
_fragments.push(fragment);
}
},
hasFragments: () => {
return _fragments.length > 0;
},
add: (post) => {
return index.addObjects(_fragments);
},
delete: (post) => {
return index.deleteByQuery(post.attributes.uuid, {restrictSearchableAttributes: 'post_uuid'});
},
}
}

module.exports = indexFactory;
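
To exercise the fragment collection logic without an Algolia account, an in-memory stand-in with the same `addFragment` / `hasFragments` contract can be useful. The sketch below is an assumption-laden test double, not part of the app: `memoryIndexFactory` and `records` are hypothetical names.

```javascript
// In-memory stand-in for the object returned by indexFactory(): same
// addFragment/hasFragments contract, but nothing is sent to Algolia.
// `records()` is a hypothetical helper added for inspection.
const memoryIndexFactory = () => {
  const fragments = [];
  return {
    addFragment: (fragment) => {
      // Skip fragments that have neither content nor heading.
      if (fragment.content !== undefined || fragment.heading !== undefined) {
        fragments.push(fragment);
      }
    },
    hasFragments: () => fragments.length > 0,
    records: () => fragments.slice(), // copy, so callers cannot mutate state
  };
};

const index = memoryIndexFactory();
index.addFragment({});                         // ignored: empty fragment
index.addFragment({ heading: 'Intro' });       // kept: content-less heading
index.addFragment({ content: 'Some text\n' }); // kept: headless fragment
console.log(index.hasFragments(), index.records().length); // true 2
```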
56 changes: 56 additions & 0 deletions lib/parserFactory.js
@@ -0,0 +1,56 @@
const parse = require('markdown-to-ast').parse,
removeMd = require('remove-markdown'),
slug = require('slug');

const parserFactory = () => {
return {

// Returns true if any fragments have been successfully parsed
parse: (post, index) => {
let fragment = {},
headingCount = 0;

const markdown = JSON.parse(post.attributes.mobiledoc).cards[0][1].markdown,
astChildren = parse(markdown).children;

if(astChildren.length !== 0) {
// Set first hypothetical headless fragment attributes.
if(astChildren[0].type !== 'Header') {
fragment.id = post.attributes.slug;
fragment.importance = 0; // we give a higher importance to the intro
// aka the first headless fragment.
fragment.post_uuid = post.attributes.uuid;
fragment.post_title = post.attributes.title;
fragment.post_published_at = post.attributes.published_at;
}

astChildren.forEach(function(element){
if(element.type === 'Header') {
// Send previous fragment
index.addFragment(fragment);

fragment = {};
headingCount ++;
fragment.heading = element.children[0].value;
fragment.id = post.attributes.slug + '#card-markdown--' + slug(fragment.heading) + '--' + headingCount;
fragment.importance = element.depth;
fragment.post_uuid = post.attributes.uuid;
fragment.post_title = post.attributes.title;
fragment.post_published_at = post.attributes.published_at;
} else {
if(fragment.content === undefined) fragment.content = '';
fragment.content += removeMd(element.raw) + '\n';
}
});

// Saving the last fragment (as saving only happens as a new heading
// is found). This also takes care of heading-less articles.
index.addFragment(fragment);
}

return index.hasFragments();
}
}
}

module.exports = parserFactory;
4 changes: 3 additions & 1 deletion package.json
@@ -4,6 +4,8 @@
"dependencies": {
"algoliasearch": "^3.24.0",
"ghost-app": "0.0.2",
"html2plaintext": "^1.1.1"
"markdown-to-ast": "^4.0.0",
"remove-markdown": "^0.2.2",
"slug": "^0.9.1"
}
}
