Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-31774 Implement API logging, hiding, deleting and querying global messages #18650

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

shamser
Copy link
Contributor

@shamser shamser commented May 14, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31774

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@shamser shamser changed the title HPCC-31774 Implement API add and query sys errors from dali HPCC-31774 Implement API logging, hiding, deleting and querying global messages May 14, 2024
@shamser shamser marked this pull request as ready for review May 14, 2024 16:00
@shamser shamser requested a review from jakesmith May 14, 2024 16:01
@AttilaVamos
Copy link
Contributor

@shamser unit test crashed with segfault. Core trace:

Core was generated by `/opt/HPCCSystems/bin/unittests -e CSysInfoLoggerTester'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f371d40d72b in deleteOlderThanLogSysInfoMsg(bool, bool, unsigned int, unsigned int, unsigned int) () from /opt/HPCCSystems/lib/libsysinfologger.so
[Current thread is 1 (Thread 0x7f372b69c300 (LWP 19183))]

Thread 3 (Thread 0x7f371b0fb700 (LWP 19186)):
#0  0x00007f372d4790e3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f37302f1e32 in CSocketEpollThread::run() () from /opt/HPCCSystems/lib/libjlib.so
#2  0x00007f373031db71 in Thread::begin() () from /opt/HPCCSystems/lib/libjlib.so
#3  0x00007f373031cbe9 in Thread::_threadmain(void*) () from /opt/HPCCSystems/lib/libjlib.so
#4  0x00007f372e179ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f372d478b0d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f371c0fd700 (LWP 19184)):
#0  0x00007f372e17fb3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f372e17fbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f372e17fc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007f37302dfc65 in Semaphore::wait(unsigned int) () from /opt/HPCCSystems/lib/libjlib.so
#4  0x00007f3732b3d720 in CMPNotifyClosedThread::run() () from /opt/HPCCSystems/lib/libmp.so
#5  0x00007f373031db71 in Thread::begin() () from /opt/HPCCSystems/lib/libjlib.so
#6  0x00007f373031cbe9 in Thread::_threadmain(void*) () from /opt/HPCCSystems/lib/libjlib.so
#7  0x00007f372e179ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f372d478b0d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f372b69c300 (LWP 19183)):
#0  0x00007f371d40d72b in deleteOlderThanLogSysInfoMsg(bool, bool, unsigned int, unsigned int, unsigned int) () from /opt/HPCCSystems/lib/libsysinfologger.so
#1  0x00007f371d411f76 in CSysInfoLoggerTester::testSysInfoLogger() () from /opt/HPCCSystems/lib/libsysinfologger.so
#2  0x00007f3735c4e102 in CppUnit::TestCaseMethodFunctor::operator()() const () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#3  0x00007f3735c446a0 in CppUnit::DefaultProtector::protect(CppUnit::Functor const&, CppUnit::ProtectorContext const&) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#4  0x00007f3735c4b2c8 in CppUnit::ProtectorChain::protect(CppUnit::Functor const&, CppUnit::ProtectorContext const&) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#5  0x00007f3735c560d2 in CppUnit::TestResult::protect(CppUnit::Functor const&, CppUnit::Test*, std::string const&) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#6  0x00007f3735c4ddf3 in CppUnit::TestCase::run(CppUnit::TestResult*) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#7  0x00007f3735c4e3c3 in CppUnit::TestComposite::doRunChildTests(CppUnit::TestResult*) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#8  0x00007f3735c4e449 in CppUnit::TestComposite::run(CppUnit::TestResult*) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#9  0x00007f3735c55743 in CppUnit::TestResult::runTest(CppUnit::Test*) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#10 0x00007f3735c5806f in CppUnit::TestRunner::run(CppUnit::TestResult&, std::string const&) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#11 0x00007f3735c59fa2 in CppUnit::TextTestRunner::run(std::string, bool, bool, bool) () from /opt/HPCCSystems/lib/libcppunit-1.15.so.1
#12 0x0000000000447a1d in ?? ()
#13 0x00007f372d39c555 in __libc_start_main () from /lib64/libc.so.6
#14 0x000000000044bb22 in ?? ()

@shamser shamser marked this pull request as draft May 15, 2024 13:01
@shamser shamser marked this pull request as ready for review May 15, 2024 14:39
@shamser shamser marked this pull request as draft May 16, 2024 09:03
@shamser shamser marked this pull request as ready for review May 16, 2024 09:25
@shamser shamser marked this pull request as draft May 17, 2024 11:20
@shamser shamser force-pushed the issue31774 branch 5 times, most recently from 991cd08 to fe55f89 Compare May 21, 2024 11:46
@shamser shamser marked this pull request as ready for review May 21, 2024 11:48
@shamser shamser force-pushed the issue31774 branch 4 times, most recently from 63b8534 to cd8ecf6 Compare May 22, 2024 12:32
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good overall. I've added some initial comments. Let's discuss on Tuesday.

common/sysinfologger/sysinfologger.cpp Outdated Show resolved Hide resolved
common/sysinfologger/sysinfologger.cpp Outdated Show resolved Hide resolved
#ifdef _USE_CPPUNIT
#include "unittests.hpp"

#define SOURCE_CPPUNIT "cppunit"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking in terms of jlog fields, is source akin to component ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source – the component that generated the exception

I see from re-reading the JIRA it is - I think we should consolidate on calling this "component"
[ with a value of "sysinfologger-unittest" here perhaps ]

std::atomic_bool initialized {false};
CriticalSection crit;

void daliClientInit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume these (this and daliClientEnd()) are the same or very similar as the ones in dalitests.cpp..
It may be worth commoning them up in unittests.*

dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
@shamser shamser force-pushed the issue31774 branch 4 times, most recently from 4e4c400 to e360243 Compare May 29, 2024 15:58
@shamser shamser requested a review from jakesmith May 29, 2024 15:58
@shamser
Copy link
Contributor Author

shamser commented May 29, 2024

@jakesmith Please note comment regarding using message node name in the form: msg## and unit tests accessing dali.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good in general. Please see comments. Most significant is the discussion re. avoiding SDS inefficiencies per new message added. 2 connects should dispel most concerns.

@@ -3034,4 +3035,233 @@ class CFileNameNormalizeUnitTest : public CppUnit::TestFixture, CDfsLogicalFileN
CPPUNIT_TEST_SUITE_REGISTRATION( CFileNameNormalizeUnitTest );
CPPUNIT_TEST_SUITE_NAMED_REGISTRATION( CFileNameNormalizeUnitTest, "CFileNameNormalizeUnitTest" );

#define SOURCE_COMPONENT "sysinfologger-unittest"
#define BOOL_STR(b) (b?"true":"false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

therre's already an inline function in jilb - boolToStr

dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dayPT->addProp(ATTR_VERSION, SYS_INFO_VERSION);
}
dayPT->setPropInt(ATTR_LASTID, ++id);
dayPT->addPropTree(MSG_NODE, CSysInfoLoggerMsg::createMsgPTree(id, cat, code, source, msg, ts, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to relate more directly to a CSysInfoLoggerMsg, it feels like this should be something like:

CSysInfoLoggerMsg msg;
msg.set(id, cat, code, source, msg, ts, false));
dayPT->addPropTree(MSG_NODE, msg.getTree());

virtual const char * queryMsg() const = 0;
};

interface ISysInfoLoggerMsgFilter : public IInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be worth considering splitting this into IConstSystInfoLoggerMsgFilter (with the reader only methods), and a derived ISysInfoLoggerMsgFilter, that include the setters.

e.g. then CSysInfoLoggerMsgIterator can use the IConstSystInfoLoggerMsgFilter version.

extractDate(queryTimeStamp(), year, month, day);
VStringBuffer xpath("m%04u%02u/d%02u", year, month, day);
IPropertyTree * parent = root->queryPropTree(xpath.str());
parent->removeTree(msgPtree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root->removeProp(xpath);

is more succinct, and guards against the possility it doesn't exist, which shouldn't ever happen, but may want to log a warning if returns false.

dali/base/sysinfologger.cpp Show resolved Hide resolved
dali/base/sysinfologger.cpp Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
@@ -186,6 +186,7 @@ int main(int argc, const char *argv[])
excludeNames.append("*stress*");
excludeNames.append("*timing*");
excludeNames.append("*slow*");
excludeNames.append("CSysInfoLoggerTester"); // disabled by default as dali not available when executed by smoketest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding "Dali" to test name, then adding dali here (need to check other unittests follow same naming convention).
Then all dali tests can be excluded from github action unittests, but OBT can run with -a and run them.

StringBuffer xpath;
unsigned year, month, day;
extractDate(ts, year, month, day);
xpath.appendf("m%04u%02u/d%02u[" ATTR_VERSION "='%s']", year, month, day, SYS_INFO_VERSION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have still have a nagging concern that this is going to cause some SDS pain/inefficiency.
It will mean that each new message, will cause every /SysLogs/mYYYYMM to be downloaded, and every dDD unter the month.

Now, if there aren't a huge amount, or not that frequently added, then it's not too bad, but it would still be best to avoid.

instead could create a 2nd connection that creates the final node:

    xpath.appendf("%s/m%04u%02u/%s/d%02u/%s", SYS_INFO_ROOT, year, month, SYS_INFO_VERSION, day, MSG_NODE);
    // NB: this will create intermediate nodes if missing
    Owned<IRemoteConnection> conn = querySDS().connect(SYS_INFO_ROOT, myProcessSession(), RTM_CREATE_ADD, SDS_LOCK_TIMEOUT);
    assertex(conn);
    IPropertryTree *dayPT = conn->queryRoot();
    // could do this way, but prob. better to have a CSysInfoLoggerMsg::setPTree(IPropertyTree *target, ....)
    Owned<IPropertyTree> msgTree = CSysInfoLoggerMsg::createMsgPTree(id, cat, code, source, msg, ts, false);
    dayPT->setPropTree(nullptr, msgTree); // replaces
    dayPT->setPropInt(ATTR_LASTID, ++id);

This would avoid any unnecessary layers being downloaded.
NB: it means that the day SYS_INFO_VERSION version, would have to be either it's own node, or could be part of the day node name itself.

Less concerned in the read case, because I'm assuming that the normal case is to walk over multiple messages anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeking clarification: does querySDS().connect or conn->queryRoot(), download that entire root path? Or is the path only downloaded when conn->queryRoot()->queryPropTree() is called?

The @lastid used is stored with the day node. Would the querying of the @lastid attribute day pt download the entire day PT?

@jakesmith

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeking clarification: does querySDS().connect or conn->queryRoot(), download that entire root path?

no, there's a lazy fetch mechanism at each level.
Neither the connection nor queryPropTree will cause layers to be fetch at that point, but any interaction with children of a ptree node, will cause all the children of that node to be downloaded, including if a new node is added to a node.

The @lastid used is stored with the day node. Would the querying of the @lastid attribute day pt download the entire day PT?

Yes, tt would download all child nodes (and attributes).

One issue with the suggested code above is that you won't have access to the /@lastid
Since it has a write lock to SYS_INFO_ROOT, it could get/update a global lastId there (you have plenty of bits available in the logid size, for a 2^32 to be combined in makeMessageId).

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - please see comments

{
char c = *str++;
if(!isdigit(c))
throwError1(JLIBERR_BadlyFormedDateTime, str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: now this has moved and in name is a function simply for reading digits from a string, the error message doesn't make sense, should be something more generic.

while(numDigits--)
{
char c = *str++;
if(!isdigit(c))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/formatting/style: I know because moved, but worth adding a space after "if" and after "while"

}
};

class CSysInfoLoggerMsgFilter : public CInterfaceOf<ISysInfoLoggerMsgFilter>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless beforeDispose needed, better if a CSimpleInterfaceOf<>

return new CSysInfoLoggerMsgFilter(msgId);
}

class CSysInfoLoggerMsgIterator : public CInterfaceOf<ISysInfoLoggerMsgIterator>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless beforeDispose needed, better if a CSimpleInterfaceOf<>

}
CSysInfoLoggerMsg & queryInfoLoggerMsg()
{
return infoMsg.set(&(msgIter->get()), root.get(), updateable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: we don't normally use Owned/Linked .get() unless, normally rely on operator.
it's confusing here particularly, because the get on the iterator does what you'd expect (links) vs the get on root doesn't (correct but confusing)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, see prev. comments : 0886cdc#r1613656104

linking the msgIter item isn't helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see the previous comments for some reason.

I've removed the get for root. However, as I need a IPropertyTree * from the msgIter, I still need IPropertyTreeIterator::get(). Within CSysInfoLoggerMsg, I need to use an Owned object, as it creates one locally (when CSysInfoLoggerMsg::set isn't called).

unsigned id = root->getPropInt64(ATTR_LASTID, 1);
if (id==INT64_MAX)
id=0;
root->setPropInt64(ATTR_LASTID, id+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setPropInt() as should be a 32bit id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"last id" is a bit confusing..
1st entry to this code will get "last", which will be 0, default 1, use that for new message, and set last to 2

I think "next id" would make more sense:

    unsigned id = root->getPropInt(ATTR_NEXTID, 1);
    root->setPropInt(ATTR_NEXTID, id+1); // NB: it's okay if id wraps 


CSysInfoLoggerMsg sysInfoMsg(id, cat, code, source, msg, ts, false);
msgPT->setPropTree(nullptr, sysInfoMsg.getTree());
msgPT->setProp(".", msg); // previous setPropTree doesn't set the node value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eek that's a bug! we can leave it like this, but can you open and reference a new JIRA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteXpathList.push_back(xpath.str());
}
}
Owned<IRemoteConnection> conn = querySDS().connect(SYS_INFO_ROOT, myProcessSession(), RTM_LOCK_WRITE, SDS_LOCK_TIMEOUT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read connection created/used by the iterator above has now been relinquished, which means potentially, some of the matches that were found are no longer there. It also means that it has to refetch SDS layers, so not that efficient.
It would be better to share the same connection. Still perform the collection of matches 1st, followed by the delete loop.

unsigned count = 0;
for (auto xpath: deleteXpathList)
{
if (root->removeProp(xpath.c_str()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a guard is okay, but if using a single (write) connection this should never be false.

throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: month and year must be provided when filtering by day");
if (!_year && _month)
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: year must be provided when filtering by month");
haveDateRange = matchEndYear||matchStartYear||matchStartMonth|matchEndMonth||matchStartDay||matchEndDay;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/formatting: indentation wrong.

@jakesmith
Copy link
Member

@shamser - NB: this is lower priority than the temp stat. PR's (especially finishing off the peak tracking one(s))

@shamser shamser changed the base branch from candidate-9.6.x to candidate-9.8.x September 13, 2024 10:54
@shamser shamser requested a review from jakesmith September 13, 2024 10:57
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good - a few comments.

One thing that's not implemented here, but can be addressed in the next update I think is:
[comment from JIRA]

To ally concerns about Dali bloat (memory) if there end up being a lot of entries. It would be sensible if each DD supported being represented as a serialized blob (plain-text fields can be supported too). Supporting binary blobs will allow Dali to externalize these on disk avoiding them permanently residing in memory.

dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
public:
CSysInfoLoggerMsgIterator(IConstSysInfoLoggerMsgFilter * _filter, bool _updateable=false, IRemoteConnection *_conn=nullptr) : filter(_filter), updateable(_updateable), conn(_conn)
{
if (_conn==nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/style: normally if (!conn) (and is that on line 450)

dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
return new CSysInfoLoggerMsgFilter(msgId);
}

class CSysInfoLoggerMsgIterator : public CSimpleInterfaceOf<ISysInfoLoggerMsgIterator>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for now, but I wonder if this should not hold a [locked] connection whilst the caller is iterating, but instead iterate at create time and stash the results (and complain if an excessive amount perhaps), and release the connection.
If not, the danger is that a slow client that is using one of these iterators, is blocking all write updates.
It should be okay, as long as the client uses of these never leak or keep these iterators for too long..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment about this a few lines down.

dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
dali/base/sysinfologger.cpp Outdated Show resolved Hide resolved
@shamser shamser requested a review from jakesmith September 16, 2024 14:21
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - I've opened the sub-tasks, but looks v. close.

@@ -443,9 +433,9 @@ class CSysInfoLoggerMsgIterator : public CSimpleInterfaceOf<ISysInfoLoggerMsgIte
public:
CSysInfoLoggerMsgIterator(IConstSysInfoLoggerMsgFilter * _filter, bool _updateable=false, IRemoteConnection *_conn=nullptr) : filter(_filter), updateable(_updateable), conn(_conn)
{
if (_conn==nullptr)
if (conn==nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/style: normally if (!conn) (and is that on line 450)

trivial: but the previous comment was really saying, why not !conn (vs conn==nullptr), which is more how it is on line 440.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to change, but maybe missed this comment?

dali/base/sysinfologger.cpp Show resolved Hide resolved
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - see final minor comments

throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: month and year must be provided when filtering by day");
if (!_year && _month)
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: year must be provided when filtering by month");
haveDateRange = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: superfluous if always false - already initialized as member on line 169

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setDateRange can set haveDataRange to true, but it is unconditionally set to false here. Also this ctor does some validation that setDateRange does not.

Can you combine this validation code along with the conditional haveDataRange setting into setDateRange, and then call setDateRange from this ctor?

dali/base/sysinfologger.cpp Show resolved Hide resolved
@shamser shamser requested a review from jakesmith September 24, 2024 15:41
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good. 3 trivial comments (re. makeStringException vs makeStringExceptionV). Please address and go ahead and squash.

matchStartYear(_year), matchStartMonth(_month), matchStartDay(_day)
{
if (hiddenOnly && visibleOnly)
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: cannot filter by both hiddenOnly and visibleOnly");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: should be using makeStringException()

{
if ( (startDay && (!startMonth||!startYear)) ||
(endDay && (!endMonth||!endYear)) )
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: month and year must be provided when filtering by day");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: should be using makeStringException()

(endDay && (!endMonth||!endYear)) )
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: month and year must be provided when filtering by day");
if ((!startYear && startMonth) || (!endYear && endMonth))
throw makeStringExceptionV(-1, "ISysInfoLoggerMsgFilter: year must be provided when filtering by month");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: should be using makeStringException()

@shamser shamser force-pushed the issue31774 branch 2 times, most recently from 2185c5c to 7e0a205 Compare September 26, 2024 13:59
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser generally looks good. I have added a few comments, but I suspect only 1 needs fixing before this is merged. A few questions/comments for @jakesmith for future consideration around efficiency.

It would have been good to have a comment in the jira to describe where to find the design (I added one). Some of the questions about efficiency are only relevant if there are going to be large numbers of entries - it isn't clear what the expectations are at the moment.

void testSysInfoLogger()
{
// cleanup - remove messages that may have been left over from previous run
deleteOlderThanLogSysInfoMsg(false, false, 2001, 03, 00);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause any problems if this unit test is run on a live system? I don't see a filter by SOURCE_COMPONENT...

unsigned ret = 0;
while (numDigits--)
{
char c = *str++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to increment after the test - otherwise your error message will not include the invalid character..
(I see it is in moved code, so not a new issue.)

interface ISysInfoLoggerMsg
{
virtual bool queryIsHidden() const = 0;
virtual unsigned __int64 queryTimeStamp() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Better to use timestamp_type for all timestamps.

}
CSysInfoLoggerMsgFilter(bool _visibleOnly, bool _hiddenOnly, unsigned _year, unsigned _month, unsigned _day) :
visibleOnly(_visibleOnly), hiddenOnly(_hiddenOnly),
matchEndYear(_year), matchEndMonth(_month), matchEndDay(_day),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: Assignments are unnecessary since setDateRange() is called.


if (isEmptyString(source))
source = "unknown";
Owned<IRemoteConnection> conn = querySDS().connect(SYS_INFO_ROOT, myProcessSession(), RTM_LOCK_WRITE|RTM_CREATE_QUERY, SDS_LOCK_TIMEOUT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith
Would it be worth having a flag RMT_CREATE_PATH which could be set on the subsequent connection to ensure the enclosing scope exists - it would often save an extra dali connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back - jake says it already does!

unsigned year, month, day;
extractDate(ts, year, month, day);
xpath.appendf("%s/m%04u%02u/d%02u/%s", SYS_INFO_ROOT, year, month, day, MSG_NODE);
Owned<IRemoteConnection> connMsgRoot = querySDS().connect(xpath.str(), myProcessSession(), RTM_CREATE_ADD, SDS_LOCK_TIMEOUT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith: It would be worthwhile to have a version of connect that atomically adds a pre-created property tree to an xpath. It would avoid data coming back from the client and holding a lock for long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned count = 0;
for (auto & xpath: deleteXpathList)
{
if (root->removeProp(xpath.c_str()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to generate warnings from static checkers - that the condition has no effect.
better would be
[[maybe_unused] bool success = root->removeProp(xpath.c_str());
or check and log if it is not as expected.

Ah the code is actually wrong - there is an extra semicolon on the end of the line.

msgMonth = readDigits(p, 2, false);
}
if (msgYear == 0 || msgMonth == 0)
throw makeStringExceptionV(-1, "child of " SYS_INFO_ROOT " is invalid: %s", monthPT.queryName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this log at the end of the function? Otherwise an invalid node will prevent items from being deleted.


std::vector<std::string> deleteXpathList;
IPropertyTree * root = conn->queryRoot();
Owned<IPropertyTreeIterator> monthIter = root->getElements("*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith If you requested a sorted iterator would that be relatively efficient? If would mean you could break out the loop once you hit a date that was newer than your threshold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would be worthwhile.

bool fullDayMatch=false;
bool hardMatchYear = matchStartYear && (matchStartYear==matchEndYear);
bool hardMatchMonth = matchStartMonth && (matchStartMonth==matchEndMonth);
if (hardMatchYear && hardMatchMonth)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith is there any scope for m%04u* if the month is not known?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would make sense, it would still fetch all at the client level (via lazy fetch), but at the iterator wouldn't see any non-matching years, so would improve the efficiency of this code.

But also, it would probably be worth using iptiter_remote (or probably iptiter_remotegetbranch) to getElements. That will cause the filtering to be done at the server side, which means, if there's many parent nodes that don't have any results below them, it will avoid the client pulling them down as it iterates.

@ghalliday
Copy link
Member

Again, it is probably best to deal with some of the improvements as separate PRs so that something is merged and available.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser please squash

…l messages

This provides an API for loggging, hiding, deleting messages and query using Dali
as the message store.

The API has not been integrated into any existing software, so existing
software is not affected.

Unittests have been implemented for API, named DaliSysInfoLoggerTester.

Signed-off-by: Shamser Ahmed <[email protected]>

Changes following review

Signed-off-by: Shamser Ahmed <[email protected]>
@ghalliday ghalliday merged commit 8b07c8a into hpcc-systems:candidate-9.8.x Oct 10, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants