This repository has been archived by the owner on Jul 10, 2019. It is now read-only.

v0.1.0 Logging bot #1

Merged
merged 4 commits into from
Dec 25, 2013

Conversation

shrikrishnaholla
Member

This pull request adds the following features:

  1. Base architecture for the project. Find more details in the wiki page on architecture.
  2. Adds a few adapters and libraries to be used in the future (e.g., Elasticsearch and turk.js).
  3. Adds a logger bot that stores every message in files and retrieves the last n messages when queried with the syntax !logs <number> or !log <number>.
  4. Rotates logs on a regular schedule; i.e., by default, each month's logs go to a separate file. Implemented using a scheduling library.
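As a rough illustration of the per-month rotation, deriving a month-stamped filename might look like this sketch (the helper name and filename format are assumptions for illustration, not the project's actual code):

```javascript
// Sketch: derive a per-month log filename so that each month's messages
// land in a separate file. Helper name and format are illustrative.
function monthlyLogName(channel, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  return channel + '-' + year + '-' + month + '.log';
}

console.log(monthlyLogName('pes-os', new Date(Date.UTC(2013, 11, 25))));
// -> "pes-os-2013-12.log"
```

A scheduler would then only need to recompute the name at each tick; writes automatically start landing in a fresh file when the month rolls over.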

@sandeepraju
Member

@shrikrishnaholla Let's write test cases. Most of us underestimate their importance. Writing test cases early in the project keeps bugs to a minimum as we move along and helps prevent regressions.

@phalgun
Member

phalgun commented Dec 23, 2013

I agree with Sandeep.

I gave it a quick run through and it looks good.
Usually, logs are archived as a tarball after a certain period; maybe your scheduler could archive them to save space. The trade-off is that fetching the last 'n' logs would then have to decompress the tarball when needed.

And, ship it!

@sathyamvellal
Member

@phalgun When @shrikrishnaholla and I discussed this a few days back, we thought of using a cloud storage service with an API (for example, Dropbox) for the old logs. The idea was to keep the most recent logs (say, the last 7 days) in memory/db. Everything is logged to a file, but only the last 7 days' logs are kept fresh. Very old logs would be pushed to cloud storage. Each log file has a set of metadata (keywords, mined discussion topics, etc.), and that metadata is also stored in a db entry for every log file.
So if a user wants a very old log file, we use the API to fetch the exact file and serve it to them.

To summarize, the storage goes something like this -

  1. Have the most recent in memory/db (recent = past 7 days)
  2. Have some more in files.
  3. Push very old files to a cloud storage.

@shrikrishnaholla hoping that this hasn't changed.
@phalgun and @sandeepraju how does this sound?

@shrikrishnaholla
Member Author

@sathyamvellal search is not implemented in 0.1. But yes, the plan is to store the mined metadata of the complete history in Elasticsearch indices (it is recommended to keep the number of indexed documents in Elasticsearch as small as possible) and, as @phalgun said, perhaps archive the distant history while keeping recent history in files. Cloud storage may not be advisable because the network latency of downloading the stored documents would be very expensive.
@sandeepraju I completely agree. I'm going through Mocha, which is supposed to be the best testing framework for Node.
Another issue: currently, every message generates a separate write operation to a file. Even at a low messages-per-minute rate, this is slow. Is a buffering mechanism preferable (buffer incoming messages when busy and write to file when idle)? This would make searching more complicated but vastly improve performance. Any suggestions for implementing this?

@phalgun
Member

phalgun commented Dec 24, 2013

I wouldn't suggest cloud storage either except for periodic backups of archived logs.

My 2 cents - since #pes-os gets very little traffic unless there are meetings scheduled, a complicated storage structure might be overkill. Memory management can be worked on when there is a need to do so.

As for the other issue, I think using a buffer is a good idea, and its implementation would be simple too. To avoid search inconsistencies, maybe we could flush the buffer into the DB/file/ES before starting the search operation. But that depends on how frequently we get search or list commands.

@akshayms
Member

@shrikrishnaholla I don't think a buffer is required here. Since this is Node, make sure all the requests are written to disk by a non-blocking write operation and have an async callback that can continue the work from there. The application will not block incoming requests even if a previous write is incomplete; it will automatically queue it up.

But if we really want to use a buffer, how about providing an option for the user to set the buffer size in a later release? By default we can start with no buffer and immediate writes to db/disk, and the user should have an option to set the buffer size, like "-b 20M", when starting the application. The problem with this is that if some exception crashes the application, or if the server it is hosted on goes down, you lose everything in the buffer unless you have some redundancy in place.


default:
break;
};
message = '[' + (new Date()).toISOString() + ']:\t' + from + ':\t' + message + '\n';
Member

This should be made more generic so that the format can be picked up from the config file as the user wishes. We cannot assume here that everyone wants logs in this format.
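A config-driven formatter along those lines might look like this sketch (the config key and %-tokens are hypothetical, not the project's actual config schema):

```javascript
// Sketch: build the log line from a user-editable config instead of a
// hardcoded format. The logFormat key and %-tokens are hypothetical.
const config = {
  logFormat: '[%t]:\t%f:\t%m',
};

function formatLine(from, message, now) {
  return config.logFormat
    .replace('%t', now.toISOString()) // timestamp
    .replace('%f', from)              // sender nick
    .replace('%m', message) + '\n';   // message body
}
```

With the default template above, the output matches the hardcoded format in the diff, while users can swap in their own template via the config file.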

Member Author

Fixed in 3deec14

@sandeepraju
Member

@shrikrishnaholla @akshayms @phalgun @sathyamvellal a couple of things based on this pull request:

  1. There is a clear need to abstract out the config part of the entire application. When someone wants to set this up, they should ideally be able to edit one config file and be done. (inline comments)
  2. Don't overthink log storage. Right now we can follow the default process of writing to a file. Archival and search should be a separate module, and only that module should worry about tar-ing and uploading to whatever fancy storage services we need. For now, let's write to a file in a non-blocking fashion and move on (imho). All the Elasticsearch stuff should be independent and pluggable if required. Let's not do any in-memory or db stuff right now; let's get this up in iterations.
  3. +1 on Mocha. I haven't used any testing framework in JS until now, but I've heard http://pivotal.github.io/jasmine/ is pretty good as well. Evaluate both and pick the best. 😄

@sandeepraju
Member

And to get some ideas on log archival and quick searching, we can draw some inspiration from how phenny handles it.

@akshayms
Member

@sandeepraju +1 The default process of writing into a file with non-blocking calls should suffice.

And let's think of a design where we can support plugins for Optimus at a later stage: users should be able to customize and add special plugins to the codebase. For example, if someone wants to check the latest 5 commits on pesos, that should be written as a plugin to Optimus rather than a core feature.

@shrikrishnaholla
Member Author

@akshayms @sandeepraju The file writes are already implemented in async mode. I suggested buffers because, while testing, I found that if a lot of comments are coming in and someone requests the logs in between, there is a huge delay. Each message generates a file write, so with many messages coming in the operations are queued, and the log command generates a read operation that joins this queue; it has to wait until all previous operations are complete. One way to avoid the file-write delay was to use streams. I tried two approaches:

  1. Create a readable and a writable stream and pipe between them. Dropped it because there was no way to get exactly the last n lines.
  2. Create a stream to write into the file and use tail -n to read the last n lines. Found that the write stream buffers writes until the stream is closed.

So I switched back to the good old write-read implementation.

@akshayms
Member

Can't we maintain the last 200 messages in memory? That shouldn't take too much, and with the right data structure, printing the last n lines will be trivial.
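A fixed-size ring buffer would be one such data structure; a minimal sketch (the class name and capacity are illustrative, not part of the codebase):

```javascript
// Sketch: keep the most recent N messages in memory so that "last n"
// queries never touch disk. Class name and capacity are illustrative.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(item) {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift(); // drop oldest
  }
  last(n) {
    return this.items.slice(-n);
  }
}
```

With a capacity of 200, any `!logs n` request with n ≤ 200 is served straight from memory; only deeper history would need a file read.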

@shrikrishnaholla
Member Author

Yep. Only, instead of a fixed number like 200, we can keep messages in an in-memory buffer when there is a lot of activity (say, during meetings) and write them in bulk when idle. It's okay if someone asks for logs when the channel is idle and the bot has to fetch them from file; a single read/write operation doesn't take much time. The problem occurs when many operations are queued up. As for detecting whether the channel is busy or idle, I have no idea how to implement that (yet), but it seems like the best way to tackle this problem.

1. Adds a configuration file that replaces previously hardcoded values with configuration values that are picked up by relevant modules (ref: pesos#1 (comment) , pesos#1 (comment), pesos#1 (comment) )
2. Separates scheduler from the main util module and adds one more (mkfilenamer)
3. Interaction between view and controller is made event driven. Previously was callback driven.
4. Elasticsearch is commented out because it is not being used in v0.1.0 anyway
@shrikrishnaholla
Member Author

Can we merge this pull request? Tests #2, Tail #3, and Paste #4 are filed as issues and can be handled separately.

shrikrishnaholla added a commit that referenced this pull request Dec 25, 2013
@shrikrishnaholla shrikrishnaholla merged commit 90e2314 into pesos:dev Dec 25, 2013
5 participants