This repository has been archived by the owner on Jul 10, 2019. It is now read-only.

v0.1.0 Logging bot #1

Merged
merged 4 commits into from
Dec 25, 2013

Conversation

shrikrishnaholla
Member

This pull request adds the following features:

  1. Base architecture for the project. Find more details in the wiki page on architecture.
  2. Adds a few adapters and libraries to be used in the future (e.g., Elasticsearch and turk.js).
  3. Adds a logger bot that stores every message in files and retrieves the last n messages when queried with the syntax !logs <number> or !log <number>.
  4. Rotates logs on a regular schedule; i.e., by default, each month's logs go to a separate file. Implemented using a scheduling library.
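As a rough illustration of the per-month rotation, deriving a month-stamped filename might look like this sketch (the helper name and filename format are assumptions for illustration, not the project's actual code):

```javascript
// Sketch: derive a per-month log filename so that each month's messages
// land in a separate file. Helper name and format are illustrative.
function monthlyLogName(channel, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  return channel + '-' + year + '-' + month + '.log';
}

console.log(monthlyLogName('pes-os', new Date(Date.UTC(2013, 11, 25))));
// -> "pes-os-2013-12.log"
```

A scheduler would then only need to recompute the name at each tick; writes automatically start landing in a fresh file when the month rolls over.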

@sandeepraju
Member

@shrikrishnaholla Let's write test cases. Most of us underestimate their importance. Writing test cases early in the project keeps bugs to a minimum as we move along and helps prevent regressions.

@phalgun
Member

phalgun commented Dec 23, 2013

I agree with Sandeep.

I gave it a quick run through and it looks good.
Usually, logs are archived as a tarball after a certain period; maybe your scheduler could archive them to save space. The trade-off is that fetching the last 'n' logs would then have to decompress the tarball when needed.

And, ship it!

@sathyamvellal
Member

@phalgun When @shrikrishnaholla and I discussed this a few days back, we thought of using a cloud storage service with an API (for example, Dropbox) for the old logs. The idea was to keep the most recent logs (say, the last 7 days) in memory/db. Everything is logged to a file, but only the last 7 days' logs are kept fresh. Very old logs would be pushed to cloud storage. Each log file has a set of metadata (keywords, mined discussion topics, etc.), and that metadata is also stored in a db entry for every log file.
So if a user wants a very old log file, we use the API to fetch the exact file and serve it to them.

To summarize, the storage goes something like this -

  1. Have the most recent in memory/db (recent = past 7 days)
  2. Have some more in files.
  3. Push very old files to a cloud storage.

@shrikrishnaholla hoping that this hasn't changed.
@phalgun and @sandeepraju how does this sound?

@shrikrishnaholla
Member Author

@sathyamvellal search is not implemented in 0.1. But yes, the plan is to store the mined metadata of the complete history in Elasticsearch indices (it is recommended to keep the number of indexed documents in Elasticsearch as small as possible) and, as @phalgun said, perhaps archive the distant history while keeping recent history in files. Cloud storage may not be advisable because the network latency of downloading the stored documents would be very expensive.
@sandeepraju I completely agree. I'm going through Mocha, which is supposed to be the best testing framework for Node.
Another issue: currently, every message generates a separate write operation to a file. Even at a low messages-per-minute rate, this is slow. Is a buffering mechanism preferable (buffer incoming messages when busy and write to file when idle)? This would make searching more complicated but vastly improve performance. Any suggestions for implementing this?

@phalgun
Member

phalgun commented Dec 24, 2013

I wouldn't suggest cloud storage either except for periodic backups of archived logs.

My 2 cents - since #pes-os gets very little traffic unless there are meetings scheduled, a complicated storage structure might be overkill. Memory management can be worked on when there is a need to do so.

As for the other issue, I think using a buffer is a good idea, and its implementation would be simple too. To avoid search inconsistencies, maybe we could flush the buffer into the DB/file/ES before starting the search operation. But that depends on how frequently we get search or list commands.

@akshayms
Member

@shrikrishnaholla I don't think a buffer is required here. Since this is Node, make sure all the requests are written to disk by a non-blocking write operation and have an async callback that can continue the work from there. The application will not block incoming requests even if a previous write is incomplete; it will automatically queue it up.

But if we really want to use a buffer, how about providing an option for the user to set the buffer size in a later release? By default we can start with no buffer and immediate writes to db/disk, and the user should have an option to set the buffer size, like "-b 20M", when starting the application. The problem with this is that if some exception crashes the application, or if the server it is hosted on goes down, you lose everything in the buffer unless you have some redundancy in place.


default:
break;
};
message = '[' + (new Date()).toISOString() + ']:\t' + from + ':\t' + message + '\n';
Member

This should be made more generic so that the format can be picked up from the config file as the user wishes. We cannot assume here that everyone wants logs in this format.
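A config-driven formatter along those lines might look like this sketch (the config key and %-tokens are hypothetical, not the project's actual config schema):

```javascript
// Sketch: build the log line from a user-editable config instead of a
// hardcoded format. The logFormat key and %-tokens are hypothetical.
const config = {
  logFormat: '[%t]:\t%f:\t%m',
};

function formatLine(from, message, now) {
  return config.logFormat
    .replace('%t', now.toISOString()) // timestamp
    .replace('%f', from)              // sender nick
    .replace('%m', message) + '\n';   // message body
}
```

With the default template above, the output matches the hardcoded format in the diff, while users can swap in their own template via the config file.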

Member Author

Fixed in 3deec14

@sandeepraju
Member

@shrikrishnaholla @akshayms @phalgun @sathyamvellal a couple of things based on this pull request:

  1. There is a clear need to abstract out the config part of the entire application. When someone wants to set this up, they should ideally be able to edit one config file and be done. (inline comments)
  2. Don't overthink log storage. Right now we can follow the default process of writing to a file. Archival and search should be a separate module, and only that module should worry about tar-ing and uploading to whatever fancy storage services we need. For now, let's write to a file in a non-blocking fashion and move on (imho). All the Elasticsearch stuff should be independent and pluggable if required. Let's not do any in-memory or db stuff right now; let's get this up in iterations.
  3. +1 on Mocha. I haven't used any testing framework in JS until now, but I've heard http://pivotal.github.io/jasmine/ is pretty good as well. Evaluate both and pick the best. 😄

@sandeepraju
Member

And to get some ideas on log archival and quick searching, we can draw some inspiration from how phenny handles it.

@akshayms
Member

@sandeepraju +1 The default process of writing into a file with non-blocking calls should suffice.

And let's think of a design where we can support plugins for Optimus at a later stage: users should be able to customize and add special plugins to the codebase. For example, if someone wants to check the latest 5 commits on pesos, that should be written as a plugin to Optimus rather than a core feature.

@shrikrishnaholla
Member Author

@akshayms @sandeepraju The file writes are already implemented in async mode. I suggested buffers because, while testing, I found that if a lot of comments are coming in and someone requests the logs in between, there is a huge delay. Each message generates a file write, so with many messages coming in the operations are queued, and the log command generates a read operation that joins this queue; it has to wait until all previous operations are complete. One way to avoid the file-write delay was to use streams. I tried two approaches:

  1. Create a readable and a writable stream and pipe between them. Dropped it because there was no way to get exactly the last n lines.
  2. Create a stream to write into the file and use tail -n to read the last n lines. Found that the write stream buffers writes until the stream is closed.

So I switched back to the good old write-read implementation.

@akshayms
Member

Can't we maintain the last 200 messages in memory? That shouldn't take too much, and with the right data structure, printing the last n lines will be trivial.
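A fixed-size ring buffer would be one such data structure; a minimal sketch (the class name and capacity are illustrative, not part of the codebase):

```javascript
// Sketch: keep the most recent N messages in memory so that "last n"
// queries never touch disk. Class name and capacity are illustrative.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  push(item) {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift(); // drop oldest
  }
  last(n) {
    return this.items.slice(-n);
  }
}
```

With a capacity of 200, any `!logs n` request with n ≤ 200 is served straight from memory; only deeper history would need a file read.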

@shrikrishnaholla
Member Author

Yep. Only, instead of a fixed number like 200, we can keep messages in an in-memory buffer when there is a lot of activity (say, during meetings) and write them in bulk when idle. It's okay if someone asks for logs when the channel is idle and the bot has to fetch them from file; a single read/write operation doesn't take much time. The problem occurs when many operations are queued up. As for detecting whether the channel is busy or idle, I have no idea how to implement that (yet), but it seems like the best way to tackle this problem.

1. Adds a configuration file that replaces previously hardcoded values with configuration values that are picked up by relevant modules (ref: pesos#1 (comment) , pesos#1 (comment), pesos#1 (comment) )
2. Separates scheduler from the main util module and adds one more (mkfilenamer)
3. Interaction between view and controller is made event driven. Previously was callback driven.
4. Elasticsearch is commented out because it is not being used in v0.1.0 anyway
@shrikrishnaholla
Member Author

Can we merge this pull request? Tests #2, Tail #3, and Paste #4 are filed as issues and can be handled separately.

shrikrishnaholla added a commit that referenced this pull request Dec 25, 2013
@shrikrishnaholla shrikrishnaholla merged commit 90e2314 into pesos:dev Dec 25, 2013
5 participants