🚧 Initial attempt to support distributed traces for the cloud lifecycle #138

Draft: wants to merge 36 commits into base branch main.

Changes from 10 commits are shown below.

Commits (36):
cdcb114 Initial attempt to support distributed traces for the cloud lifecycle (v1v, Jun 24, 2021)
38c6cbc Noop (v1v, Jun 24, 2021)
79e16b3 Fix cloud onFailure with https://github.com/jenkinsci/jenkins/pull/4922 (v1v, Jun 24, 2021)
5ec10da Add specific plugin details and create started span (v1v, Jun 24, 2021)
301f2ae Revert "Add specific plugin details and create started span" (v1v, Jul 4, 2021)
df3556e Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisioner#lam… (v1v, Jul 4, 2021)
baa08c8 Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisioner#lam… (v1v, Jul 4, 2021)
091504e Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisioner#lam… (v1v, Jul 4, 2021)
3fee581 Cloud Root naming strategy (v1v, Jul 4, 2021)
59c294f Collect cloud plugin details (v1v, Jul 4, 2021)
be3c4b2 Add Google Cloud attributes (v1v, Jul 4, 2021)
4a22a50 Add InstanceConfiguration google configuration attributes (v1v, Jul 4, 2021)
e33cfc2 Add cloud name attribute (v1v, Jul 4, 2021)
57ef907 Refactor and use transform functions (v1v, Jul 4, 2021)
c89c6a5 Revert "Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisi… (v1v, Jul 4, 2021)
bc77989 Revert "Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisi… (v1v, Jul 4, 2021)
4a72a04 Revert "Bump dependency to fix the 'WARNING hudson.slaves.NodeProvisi… (v1v, Jul 4, 2021)
ba72571 Merge remote-tracking branch 'upstream/master' into feature/traces-fo… (v1v, Jul 4, 2021)
3de8781 Fix dependencies (v1v, Jul 4, 2021)
6d8bd76 Merge remote-tracking branch 'upstream/master' into feature/traces-fo… (v1v, Jul 8, 2021)
d28294a Fix javadoc (v1v, Jul 8, 2021)
57ff1a6 Use object instead a cast (v1v, Jul 12, 2021)
4f3e52b Add k8s span attributes for the cloud transactions (v1v, Jul 12, 2021)
d06274d Add more logs and _onRollback logic (v1v, Jul 12, 2021)
bc2297a Support onStarted with a list of plannedNodes and add more debug (v1v, Jul 12, 2021)
81104e5 Enrich containers/pods with a new Handler (v1v, Jul 12, 2021)
b2bfecb Ensure the root transaction last for the whole lifecycle (v1v, Jul 12, 2021)
c525190 Revert "Ensure the root transaction last for the whole lifecycle" (v1v, Jul 12, 2021)
5ba4c0c merge log traces (v1v, Jul 12, 2021)
9733536 Add attributes in the docs (v1v, Jul 12, 2021)
07c7c75 cosmetic log change (v1v, Jul 13, 2021)
873dd50 Add cloud name in the transactions if possible (v1v, Jul 13, 2021)
56efa12 Cosmetic log (v1v, Jul 13, 2021)
6409b2a Cloud attributes should not use the label but the node (v1v, Jul 13, 2021)
a6dda42 Add started span (v1v, Jul 13, 2021)
1e885f5 Revert "Add started span" (v1v, Jul 13, 2021)
pom.xml (6 changes: 3 additions, 3 deletions)
@@ -19,7 +19,7 @@
<properties>
<revision>0.16</revision>
<changelist>-SNAPSHOT</changelist>
-<jenkins.version>2.235.5</jenkins.version>
+<jenkins.version>2.277.1</jenkins.version>
Member Author

I'm trying to work around this issue in my local environment, where I get stack traces such as:

WARNING hudson.slaves.NodeProvisioner#lambda: Unexpected exception encountered while provisioning agent

Maybe my machine is slow enough to trigger that particular issue.

<java.level>8</java.level>
<gitHubRepo>jenkinsci/${project.artifactId}-plugin</gitHubRepo>
<opentelemetry.version>1.2.0</opentelemetry.version>
@@ -32,8 +32,8 @@
<dependencies>
<dependency>
<groupId>io.jenkins.tools.bom</groupId>
-<artifactId>bom-2.235.x</artifactId>
-<version>29</version>
+<artifactId>bom-2.263.x</artifactId>
+<version>887.vae9c8ac09ff7</version>
<scope>import</scope>
<type>pom</type>
</dependency>
@@ -9,10 +9,12 @@
import com.google.common.base.Strings;
import hudson.Extension;
import hudson.PluginWrapper;
+import hudson.model.Descriptor;
import hudson.util.FormValidation;
import io.jenkins.plugins.opentelemetry.authentication.NoAuthentication;
-import io.jenkins.plugins.opentelemetry.backend.ObservabilityBackend;
import io.jenkins.plugins.opentelemetry.authentication.OtlpAuthentication;
+import io.jenkins.plugins.opentelemetry.backend.ObservabilityBackend;
+import io.jenkins.plugins.opentelemetry.computer.CloudSpanNamingStrategy;
import io.jenkins.plugins.opentelemetry.job.SpanNamingStrategy;
import io.jenkins.plugins.opentelemetry.semconv.JenkinsOtelSemanticAttributes;
import jenkins.model.GlobalConfiguration;
@@ -21,7 +23,6 @@
import org.jenkinsci.Symbol;
import org.jenkinsci.plugins.workflow.cps.nodes.StepAtomNode;
import org.jenkinsci.plugins.workflow.cps.nodes.StepStartNode;
-import org.jenkinsci.plugins.workflow.steps.StepDescriptor;
import org.kohsuke.stapler.DataBoundConstructor;
import org.kohsuke.stapler.DataBoundSetter;
import org.kohsuke.stapler.QueryParameter;
@@ -66,6 +67,8 @@ public class JenkinsOpenTelemetryPluginConfiguration extends GlobalConfiguration

private transient SpanNamingStrategy spanNamingStrategy;

+private transient CloudSpanNamingStrategy cloudSpanNamingStrategy;
+
private transient ConcurrentMap<String, StepPlugin> loadedStepsPlugins = new ConcurrentHashMap<>();

private String serviceName;
@@ -219,6 +222,15 @@ public SpanNamingStrategy getSpanNamingStrategy() {
return spanNamingStrategy;
}

+@Inject
+public void setCloudSpanNamingStrategy(CloudSpanNamingStrategy cloudSpanNamingStrategy) {
+this.cloudSpanNamingStrategy = cloudSpanNamingStrategy;
+}
+
+public CloudSpanNamingStrategy getCloudSpanNamingStrategy() {
+return cloudSpanNamingStrategy;
+}
+
@Nonnull
public ConcurrentMap<String, StepPlugin> getLoadedStepsPlugins() {
return loadedStepsPlugins;
@@ -239,7 +251,7 @@ public StepPlugin findStepPluginOrDefault(@Nonnull String stepName, @Nonnull Ste
}

@Nonnull
-public StepPlugin findStepPluginOrDefault(@Nonnull String stepName, @Nullable StepDescriptor descriptor) {
+public StepPlugin findStepPluginOrDefault(@Nonnull String stepName, @Nullable Descriptor descriptor) {
StepPlugin data = loadedStepsPlugins.get(stepName);
if (data!=null) {
LOGGER.log(Level.FINEST, " found the plugin for the step '" + stepName + "' - " + data);
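The signature change from `StepDescriptor` to `Descriptor` lets callers pass descriptors that are not pipeline steps; elsewhere in this PR the cloud listener calls `findStepPluginOrDefault("cloud", cloud.getDescriptor())`. The rest of the method is not shown, but if it caches by step name, as the visible `loadedStepsPlugins.get(stepName)` lookup suggests, every cloud type would share the single `"cloud"` entry and report the plugin of whichever cloud was resolved first. A minimal sketch of a cache that also keys on the descriptor, assuming a descriptor id string (for example from `Descriptor#getId()`) is available; `PluginLookupCache` and its resolver function are hypothetical stand-ins for the `StepPlugin` resolution, which the diff does not show:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache: V stands in for StepPlugin, and the resolver stands in for the
// descriptor-to-plugin lookup that the diff does not show.
final class PluginLookupCache<V> {
    private final ConcurrentMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> resolver;

    PluginLookupCache(Function<String, V> resolver) {
        this.resolver = resolver;
    }

    // Key on the step name AND the descriptor id, so different clouds looked up under
    // the same step name ("cloud") get separate entries. ConcurrentHashMap#computeIfAbsent
    // applies the resolver atomically, at most once per key for non-null results.
    V find(String stepName, String descriptorId) {
        String key = stepName + "#" + descriptorId;
        return cache.computeIfAbsent(key, k -> resolver.apply(descriptorId));
    }
}
```

With this keying, `find("cloud", "kubernetes")` and `find("cloud", "gce")` resolve and cache two separate entries, while repeated lookups for the same cloud hit the cache.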
@@ -0,0 +1,44 @@
/*
* Copyright The Original Author or Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.jenkins.plugins.opentelemetry.computer;

import com.google.common.annotations.VisibleForTesting;
import hudson.Extension;
import hudson.slaves.NodeProvisioner;
import jenkins.YesNoMaybe;
import org.jenkinsci.Symbol;
import org.kohsuke.stapler.DataBoundConstructor;

import javax.annotation.Nonnull;

/**
* Use same root span name for all pull cloud labels
*/
@Extension(dynamicLoadable = YesNoMaybe.YES)
@Symbol("cloudSpanNamingStrategy")
public class CloudSpanNamingStrategy {
Member Author

From: [screenshot]

To: [screenshot]

@DataBoundConstructor
public CloudSpanNamingStrategy() {
}

@Nonnull
public String getRootSpanName(@Nonnull NodeProvisioner.PlannedNode plannedNode) {
return getNodeRootSpanName(plannedNode.displayName);
}

@VisibleForTesting
@Nonnull
protected String getNodeRootSpanName(@Nonnull String displayName) {
// format: <namePrefix>-<id>
// e.g. "obs11-ubuntu-18-linux-beyyg2"
// remove last -<.*>
if (displayName.contains("-")) {
return displayName.substring(0, displayName.lastIndexOf("-")) + "-{id}";
}
return displayName;
}
}
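`getNodeRootSpanName` drops the random suffix that cloud plugins append to agent names, so all agents provisioned from the same template share one root span name. Below is a minimal, dependency-free sketch of the same rule, assuming display names follow the `<namePrefix>-<id>` format from the code comment; `CloudSpanNames` is a hypothetical helper, not a class from the plugin.

```java
// Hypothetical standalone version of CloudSpanNamingStrategy#getNodeRootSpanName,
// without the Jenkins/Stapler annotations, so the rule can be exercised directly.
final class CloudSpanNames {
    private CloudSpanNames() {
    }

    /**
     * Replaces the last "-<suffix>" segment of an agent display name with "-{id}".
     * Names without a dash are returned unchanged.
     */
    static String rootSpanName(String displayName) {
        int lastDash = displayName.lastIndexOf('-');
        if (lastDash < 0) {
            return displayName;
        }
        return displayName.substring(0, lastDash) + "-{id}";
    }
}
```

With this rule, `"obs11-ubuntu-18-linux-beyyg2"` maps to `"obs11-ubuntu-18-linux-{id}"`. The rule cannot tell a random suffix from a meaningful last segment: a name such as `"ubuntu-18"` that carries no id becomes `"ubuntu-{id}"`.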
@@ -5,67 +5,147 @@

package io.jenkins.plugins.opentelemetry.computer;

import com.google.errorprone.annotations.MustBeClosed;
import com.google.inject.Inject;
import edu.umd.cs.findbugs.annotations.NonNull;
import hudson.Extension;
import hudson.model.Label;
import hudson.model.Node;
import hudson.slaves.CloudProvisioningListener;
import hudson.slaves.Cloud;
import hudson.slaves.NodeProvisioner;
import io.jenkins.plugins.opentelemetry.OpenTelemetrySdkProvider;
import io.jenkins.plugins.opentelemetry.JenkinsOpenTelemetryPluginConfiguration;
import io.jenkins.plugins.opentelemetry.OtelUtils;
import io.jenkins.plugins.opentelemetry.computer.opentelemetry.OtelContextAwareAbstractCloudProvisioningListener;
import io.jenkins.plugins.opentelemetry.computer.opentelemetry.context.PlannedNodeContextKey;
import io.jenkins.plugins.opentelemetry.semconv.JenkinsOtelSemanticAttributes;
import io.jenkins.plugins.opentelemetry.semconv.JenkinsSemanticMetrics;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanBuilder;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;

import javax.annotation.Nonnull;
import javax.annotation.PostConstruct;
import javax.inject.Inject;
import java.util.Collection;
import java.util.logging.Level;
import java.util.logging.Logger;

import static com.google.common.base.Verify.verifyNotNull;

@Extension
public class MonitoringCloudListener extends CloudProvisioningListener {
public class MonitoringCloudListener extends OtelContextAwareAbstractCloudProvisioningListener {
private final static Logger LOGGER = Logger.getLogger(MonitoringCloudListener.class.getName());

protected Meter meter;

private LongCounter failureCloudCounter;
private LongCounter totalCloudCount;

private CloudSpanNamingStrategy cloudSpanNamingStrategy;

@PostConstruct
public void postConstruct() {
failureCloudCounter = meter.longCounterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_FAILURE)
failureCloudCounter = getMeter().longCounterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_FAILURE)
.setDescription("Number of failed cloud agents when provisioning")
.setUnit("1")
.build();
totalCloudCount = meter.longCounterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_COMPLETED)
totalCloudCount = getMeter().longCounterBuilder(JenkinsSemanticMetrics.JENKINS_CLOUD_AGENTS_COMPLETED)
.setDescription("Number of provisioned cloud agents")
.setUnit("1")
.build();
}

@Override
public void onFailure(NodeProvisioner.PlannedNode plannedNode, Throwable t) {
public void _onStarted(Cloud cloud, Label label, Collection<NodeProvisioner.PlannedNode> plannedNodes) {
LOGGER.log(Level.FINE, () -> "_onStarted(" + label + ")");
if (plannedNodes.size() != 1) {
return;
}
NodeProvisioner.PlannedNode plannedNode = plannedNodes.iterator().next();

String rootSpanName = this.cloudSpanNamingStrategy.getRootSpanName(plannedNode);
JenkinsOpenTelemetryPluginConfiguration.StepPlugin stepPlugin = JenkinsOpenTelemetryPluginConfiguration.get().findStepPluginOrDefault("cloud", cloud.getDescriptor());

SpanBuilder rootSpanBuilder = getTracer().spanBuilder(rootSpanName).setSpanKind(SpanKind.SERVER);

// TODO move this to a pluggable span enrichment API with implementations for different observability backends
// Regarding the value `unknown`, see https://github.com/jenkinsci/opentelemetry-plugin/issues/51
rootSpanBuilder
.setAttribute(JenkinsOtelSemanticAttributes.ELASTIC_TRANSACTION_TYPE, "unknown")
.setAttribute(JenkinsOtelSemanticAttributes.CI_CLOUD_NAME, plannedNode.displayName)
.setAttribute(JenkinsOtelSemanticAttributes.CI_CLOUD_LABEL, label.getExpression())
.setAttribute(JenkinsOtelSemanticAttributes.JENKINS_STEP_PLUGIN_NAME, stepPlugin.getName())
.setAttribute(JenkinsOtelSemanticAttributes.JENKINS_STEP_PLUGIN_VERSION, stepPlugin.getVersion());

// ENRICH attributes with every Cloud specifics

// START ROOT SPAN
Span rootSpan = rootSpanBuilder.startSpan();

this.getTraceService().putSpan(plannedNode, rootSpan);
rootSpan.makeCurrent();
LOGGER.log(Level.FINE, () -> plannedNode.displayName + " - begin root " + OtelUtils.toDebugString(rootSpan));
}

@Override
public void _onCommit(@NonNull NodeProvisioner.PlannedNode plannedNode, @NonNull Node node) {
LOGGER.log(Level.FINE, () -> "_onCommit(" + node + ")");
try (Scope parentScope = endCloudPhaseSpan(plannedNode)) {
Span runSpan = getTracer().spanBuilder(JenkinsOtelSemanticAttributes.CLOUD_SPAN_PHASE_COMMIT_NAME).setParent(Context.current()).startSpan();
LOGGER.log(Level.FINE, () -> plannedNode.displayName + " - begin " + OtelUtils.toDebugString(runSpan));
runSpan.makeCurrent();
this.getTraceService().putSpan(plannedNode, runSpan);
}
}

@Override
public void _onFailure(NodeProvisioner.PlannedNode plannedNode, Throwable t) {
LOGGER.log(Level.FINE, () -> "_onFailure(" + plannedNode + ")");
failureCloudCounter.add(1);
LOGGER.log(Level.FINE, () -> "onFailure(" + plannedNode + ")");
try (Scope parentScope = endCloudPhaseSpan(plannedNode)) {
Span span = getTracer().spanBuilder(JenkinsOtelSemanticAttributes.CLOUD_SPAN_PHASE_FAILURE_NAME).setParent(Context.current()).startSpan();
span.recordException(t);
span.setStatus(StatusCode.ERROR, t.getMessage());
span.end();
LOGGER.log(Level.FINE, () -> plannedNode.displayName + " - begin " + OtelUtils.toDebugString(span));
}
}

@Override
public void onRollback(@NonNull NodeProvisioner.PlannedNode plannedNode, @NonNull Node node,
@NonNull Throwable t) {
public void _onRollback(@NonNull NodeProvisioner.PlannedNode plannedNode, @NonNull Node node,
@NonNull Throwable t){
LOGGER.log(Level.FINE, () -> "_onRollback(" + plannedNode + ")");
failureCloudCounter.add(1);
LOGGER.log(Level.FINE, () -> "onRollback(" + plannedNode + ")");
}

@Override
public void onComplete(NodeProvisioner.PlannedNode plannedNode, Node node) {
public void _onComplete(NodeProvisioner.PlannedNode plannedNode, Node node) {
LOGGER.log(Level.FINE, () -> "_onComplete(" + plannedNode + ")");
totalCloudCount.add(1);
LOGGER.log(Level.FINE, () -> "onComplete(" + plannedNode + ")");
try (Scope parentScope = endCloudPhaseSpan(plannedNode)) {
Span span = getTracer().spanBuilder(JenkinsOtelSemanticAttributes.CLOUD_SPAN_PHASE_COMPLETE_NAME).setParent(Context.current()).startSpan();
span.setStatus(StatusCode.OK);
span.end();
LOGGER.log(Level.FINE, () -> plannedNode.displayName + " - begin " + OtelUtils.toDebugString(span));
}
}

@MustBeClosed
@Nonnull
protected Scope endCloudPhaseSpan(@NonNull NodeProvisioner.PlannedNode plannedNode) {
Span cloudPhaseSpan = verifyNotNull(Span.current(), "No cloudPhaseSpan found in context");
cloudPhaseSpan.end();
LOGGER.log(Level.FINE, () -> plannedNode.displayName + " - end " + OtelUtils.toDebugString(cloudPhaseSpan));

//this.getTraceService().removeJobPhaseSpan(run, pipelinePhaseSpan);
Span newCurrentSpan = this.getTraceService().getSpan(plannedNode);
Scope newScope = newCurrentSpan.makeCurrent();
Context.current().with(PlannedNodeContextKey.KEY, plannedNode);
return newScope;
}

/**
* Jenkins doesn't support {@link com.google.inject.Provides} so we manually wire dependencies :-(
*/
@Inject
public void setMeter(@Nonnull OpenTelemetrySdkProvider openTelemetrySdkProvider) {
this.meter = openTelemetrySdkProvider.getMeter();
public void setCloudSpanNamingStrategy(CloudSpanNamingStrategy cloudSpanNamingStrategy) {
this.cloudSpanNamingStrategy = cloudSpanNamingStrategy;
}
}
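In the listener above, `_onStarted` opens one root span per planned node and registers it with the trace service; `_onCommit`, `_onComplete` and `_onFailure` each call `endCloudPhaseSpan`, which ends whatever span is current in the context and makes the span registered for the planned node current again, before starting the next phase span. The plain-Java sketch below models that root-plus-phase bookkeeping so the intended ordering can be checked without Jenkins or OpenTelemetry. `SimpleSpan`, `PhaseTracker` and the phase names are hypothetical stand-ins, not the OpenTelemetry API. Keeping the running phase span in a per-node map, instead of reading it back from the current context, is a choice of the sketch: in the OpenTelemetry Java API, `Span.current()` returns an invalid no-op span rather than `null` when no span is current, so the `verifyNotNull` check in `endCloudPhaseSpan` cannot detect a missing span.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tracing span: a name, an optional parent and an "ended" flag.
final class SimpleSpan {
    final String name;
    final SimpleSpan parent;
    boolean ended;

    SimpleSpan(String name, SimpleSpan parent) {
        this.name = name;
        this.parent = parent;
    }
}

// Hypothetical tracker for the "root span + chained phase spans" pattern.
// N is the planned-node type; entries are keyed by object identity (an assumption of the sketch).
final class PhaseTracker<N> {
    private final Map<N, SimpleSpan> rootSpans = new IdentityHashMap<>();
    private final Map<N, SimpleSpan> currentPhases = new IdentityHashMap<>();
    // Names of ended spans, in the order they were ended.
    final List<String> endedSpans = new ArrayList<>();

    // Counterpart of _onStarted: open the root span for one planned node.
    void started(N node, String rootSpanName) {
        rootSpans.put(node, new SimpleSpan(rootSpanName, null));
    }

    // Counterpart of endCloudPhaseSpan plus starting the next phase: end the running
    // phase span (if any) and start a new phase span as a child of the root span.
    SimpleSpan nextPhase(N node, String phaseName) {
        SimpleSpan root = rootSpans.get(node);
        if (root == null) {
            throw new IllegalStateException("No root span for planned node " + node);
        }
        SimpleSpan previous = currentPhases.remove(node);
        if (previous != null) {
            end(previous);
        }
        SimpleSpan phase = new SimpleSpan(phaseName, root);
        currentPhases.put(node, phase);
        return phase;
    }

    // Terminal phase (complete or failure): end the terminal phase span, then the root span.
    // Ending the root span here is a choice of this sketch, not something the diff shows.
    void finish(N node, String terminalPhaseName) {
        SimpleSpan terminal = nextPhase(node, terminalPhaseName);
        currentPhases.remove(node);
        end(terminal);
        end(rootSpans.remove(node));
    }

    private void end(SimpleSpan span) {
        span.ended = true;
        endedSpans.add(span.name);
    }
}
```

For a node that is started, committed and completed, the spans end in the order commit, complete, root. Two related details of the real API: `Span#makeCurrent()` returns a `Scope` that is meant to be closed, and the calls in `_onStarted` and `_onCommit` discard it; and `Context#with(...)` returns a new context rather than mutating the current one, so the discarded result in `endCloudPhaseSpan` has no effect.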