Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-32281 Grafana/Loki logaccess 2nd phase improvements #18900

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Jul 20, 2024

  • Ensures proper log output format
  • Utilizes context log fields available
  • Updates grafana logaccess sample config
  • Replaces unnecessary funtion w/ inline logic
  • Removes invalid wildcard filter restriction
  • Trims log data at server due to added json newlines
  • Reenables several stream columns

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32281

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR
Assigning user: [email protected]

@rpastrana rpastrana force-pushed the HPCC-32281-NEW-Loki-context-columns branch 2 times, most recently from 8363a0e to 4590538 Compare July 22, 2024 19:44
fieldValues->setProp(m_messageColumn.name, stream->queryProp(m_messageColumn.name));
break;
}
case RETURNCOLS_MODE_custom: //not supported yet
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fail and let user know

@@ -605,14 +647,14 @@ bool GrafanaLogAccessCurlClient::fetchLog(LogQueryResultDetails & resultDetails,
//from https://grafana.com/docs/loki/latest/query/log_queries/
//Adding | json to your pipeline will extract all json properties as labels if the log line is a valid json document. Nested properties are flattened into label keys using the _ separator.
logLineParser.set(" | json log"); //this parses the log entry and extracts the log field into a label
logLineParser.append(" | line_format \"{{.log}}\""); //Formats output line to only contain log label
logLineParser.append(" | line_format \"{{.log | trim}}\""); //Formats output line to only contain log label
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the line_format output doesn't clean up all json new lines

@rpastrana
Copy link
Member Author

@jeclrsg this change implements the log output format I provided the other day. Let me know if there are any changes needed so I can move forward w/ this PR. Thanks.

//This drops the stream, and various insert timestamps

//we're always going to get a stream container, and a the log line...
//the stream container contains unnecessary, and redundant lines
//there's documentation of a 'drop' command whch doesn't work in practice
//online recomendation is to clear those stream entries...
logLineParser.append(" | label_format log=\"\", filename=\"\", namespace=\"\", node_name=\"\", job=\"\"");// app=\"\", component=\"\", container=\"\", instance=\"\");
logLineParser.append(" | label_format log=\"\", filename=\"\"");//, namespace=\"\", job=\"\""// app=\"\", component=\"\", container=\"\", instance=\"\");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of the "stream" values are useful, don't clear them

@@ -73,22 +73,24 @@ class GrafanaLogAccessCurlClient : public CInterfaceOf<IRemoteLogAccess>
LogField m_instanceColumn = LogField("instance", true);
LogField m_podColumn = LogField("pod", true);
LogField m_containerColumn = LogField("container", true);
LogField m_messageColumn = LogField("MSG");
LogField m_messageColumn = LogField("log");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll stuff the entire result log line in the message column.

LogField m_logProcIDColumn = LogField("PID");
LogField m_logThreadIDColumn = LogField("TID");
LogField m_logDateTimstampColumn = LogField("tsNs");
//LogField m_logTimestampColumn = LogField("TIME");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need these until we extract all columns

Copy link
Contributor

@jeclrsg jeclrsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that column mapping in the yaml looks correct to me based on the conversation we had last week.

@rpastrana
Copy link
Member Author

Yeah, that column mapping in the yaml looks correct to me based on the conversation we had last week.

Thanks @jeclrsg can you also review the newly proposed response format? Keep in mind the timestamp we receive from Grafana is an epoch value which can be converted into human readable timestamp:

<GetLogsResponse>
<LogLines>
{
      "lines": [
            {
                  "fields": [
                        {"tsNs": "1721343553880265366"},
                        {"pod": "sasha-thor-qmon-57cc47dc78-tcks5"},
                        {"log": "00000038 AUD INF 2024-07-18 22:59:13.872 1 22 UNK \",ThorQueueMonitor,thor,0,0,1,1,49,---,---\" sasha-thor-qmon-57cc47dc78-tcks5"}
                  ]
            },
            {
                  "fields": [
                        {"tsNs": "1721343561154324789"},
                        {"pod": "eclwatch-775b856c5b-qzwgl"},
                        {"log": "00000141 USR PRO 2024-07-18 22:59:21.027 7 23 UNK \"CInfoCacheReaderThread K8s Resource Reader: InfoCache collected (0 seconds).\" eclwatch-775b856c5b-qzwgl"}
                  ]
            }
      ]
}
</LogLines>
<LogLineCount>10</LogLineCount>
<TotalLogLinesAvailable>0</TotalLogLinesAvailable>
</GetLogsResponse>

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues spotted - see comments.

Owned<IPropertyIterator> fieldsIter = resultLine->getIterator();
ForEach(*fieldsIter)
{
const char * prop = resultLine->queryProp(fieldsIter->getPropKey());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char * prop = fieldsIter>queryPropValue() is more efficient/simpler.

same comments for the other formats below.

encodeXML(prop, returnbuf);
returnbuf.appendf("</%s>", fieldsIter->getPropKey());
}
returnbuf.appendf("</lin>");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "line" - would generate invalid xml

firstField = false;

StringBuffer prop;
resultLine->getProp(fieldsIter->getPropKey(), prop);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also should use queryPropValue()

fieldValues->setProp(m_nodeColumn.name, stream->queryProp(m_nodeColumn.name));
fieldValues->setProp(m_containerColumn.name, stream->queryProp(m_containerColumn.name));
fieldValues->setProp(m_instanceColumn.name, stream->queryProp(m_instanceColumn.name));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment //fallthrough whenever a case falls through to the next case.

I initially read the code incorrectly because it wasn't there, and coverity will also complain.

case RETURNCOLS_MODE_default:
{
fieldValues->setProp(m_podColumn.name, stream->queryProp(m_podColumn.name));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//fallthrough

logLineIter.setown(result.getElements("values"));
}

processValues(returnbuf, logLineIter, result.queryPropTree("stream"), format, retcolmode, isFirstLine);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will crash if logLineIter is null - either it should not be called, or processValues() needs protecting.

@@ -832,6 +873,24 @@ GrafanaLogAccessCurlClient::GrafanaLogAccessCurlClient(IPropertyTree & logAccess
if (logMap.hasProp(logMapSearchColAtt))
m_podColumn.name = logMap.queryProp(logMapSearchColAtt);
}
else if (streq(logMapType, "message"))
{
if (logMap.hasProp(logMapIndexPatternAtt))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks like it is duplicated multiple times. A candidate for cleaning up in a later PR.
(Not being commoned up makes it look as though each of the different branches are doing something different..)

@jeclrsg
Copy link
Contributor

jeclrsg commented Jul 26, 2024

@rpastrana No, I don't see any problems in that response formatting either. It might be nice to have the timestamps be consistent coming out of all the engines, either dates or as ints. But I don't think there'd be any problem with adding some code on the client side to check the engine and be prepared to convert to a date.

@rpastrana
Copy link
Member Author

Thanks

@rpastrana No, I don't see any problems in that response formatting either. It might be nice to have the timestamps be consistent coming out of all the engines, either dates or as ints. But I don't think there'd be any problem with adding some code on the client side to check the engine and be prepared to convert to a date.

Thanks for the feedback @jeclrsg.
I'll add a field mapping type of "epoch" and declare the tsNs field as such in the column map. This would let the UI know to map the value to a human readable TS

            <Column>
                  <Name>tsNs</Name>
                  <LogType>timestamp</LogType>
                  <ColumnMode>ALL</ColumnMode>
                  <ColumnType>epoch</ColumnType><!--is there a better type name-->
            </Column>

@rpastrana rpastrana requested a review from ghalliday July 30, 2024 17:48
@rpastrana rpastrana force-pushed the HPCC-32281-NEW-Loki-context-columns branch from b18c15a to 9dfd13a Compare July 30, 2024 18:13
fieldValues->setProp(m_nodeColumn.name, stream->queryProp(m_nodeColumn.name));
fieldValues->setProp(m_containerColumn.name, stream->queryProp(m_containerColumn.name));
fieldValues->setProp(m_instanceColumn.name, stream->queryProp(m_instanceColumn.name));
[[fallthrough]];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C17 feature, not sure if there's any concern using c17 specific constructs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I hadn't picked up on that feature. We require c++17, so no problem using the feature - it has the advantage it will be checked by the compiler.

if (result.hasProp("values"))
{
logLineIter.setown(result.getElements("values")); // if no values elements found, will get NullPTreeIterator
if (!logLineIter)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this will always be non-null. You have to check the result of logLineIter->first() to know if there are no elements found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, the goal was to protect processValues from null logLineIter, removing in favor of L271

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana looks good. Agree that one line is redundant. Please squash.

if (result.hasProp("values"))
{
logLineIter.setown(result.getElements("values")); // if no values elements found, will get NullPTreeIterator
if (!logLineIter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this will always be non-null. You have to check the result of logLineIter->first() to know if there are no elements found.

@rpastrana rpastrana force-pushed the HPCC-32281-NEW-Loki-context-columns branch from e75057f to e0c6ada Compare August 2, 2024 19:08
@rpastrana
Copy link
Member Author

@ghalliday squashed

@rpastrana
Copy link
Member Author

@jeclrsg what needs to be done on the UI side to support these changes? The "epoch" column type is the only one I can think of.

@ghalliday
Copy link
Member

@rpastrana looks good. Only question is should this target 9.6.x?

- Ensures proper log output format
- Utilizes context log fields available
- Updates grafana logaccess sample config
- Replaces unnecessary funtion w/ inline logic
- Removes invalid wildcard filter restriction
- Trims log data at server due to added json newlines
- Reenables several stream columns
- Adds epoch log field type
- Ensures logfield type default is set to string

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-32281-NEW-Loki-context-columns branch from e0c6ada to a0283eb Compare August 5, 2024 13:30
@rpastrana rpastrana changed the base branch from candidate-9.8.x to candidate-9.6.x August 5, 2024 14:03
@rpastrana
Copy link
Member Author

@ghalliday rebased to 9.6.x but had trouble building locally, awaiting build results on gihubactions...

@ghalliday ghalliday closed this Aug 8, 2024
@ghalliday ghalliday reopened this Aug 8, 2024
Copy link

github-actions bot commented Aug 8, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32281

Jirabot Action Result:

@ghalliday ghalliday merged commit 2755a2c into hpcc-systems:candidate-9.6.x Aug 8, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants