feat: create opentelemetry sdk package #5388
export type TracingOptions = {
  /** This is a temporary property to signal preloading is enabled, can be replaced with `enabled` once we retire build's internal sdk setup */
  preloadingEnabled: boolean
This should allow us to roll out this new module without conflicting with the ongoing tracing setup in @netlify/build. We can retire it after the rollout.
/** Sets attributes to be propagated across child spans under the current active context
 * TODO this method will be removed from this package once we move it to a dedicated one to be shared between build,
 * this setup and any other node module which might use our open telemetry setup
 * */
export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
  const currentBaggage = propagation.getBaggage(context.active())
  // Create a baggage if there's none
  let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
  Object.entries(attributes).forEach(([key, value]) => {
    baggage = baggage.setEntry(key, { value })
  })

  return propagation.setBaggage(context.active(), baggage)
}
This is a copy/paste from:
build/packages/build/src/tracing/main.ts
Lines 81 to 91 in 97fd98c
/** Sets attributes to be propagated across child spans under the current active context */
export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
  const currentBaggage = propagation.getBaggage(context.active())
  // Create a baggage if there's none
  let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
  Object.entries(attributes).forEach(([key, value]) => {
    baggage = baggage.setEntry(key, { value })
  })
  return propagation.setBaggage(context.active(), baggage)
}
We currently need it to set up the context passed over to this process (baggage and trace information) so we can stitch traces and attributes together. Once we move the remaining logic in build's tracing directory to its own utility library, we can remove this copy and use that library directly.
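For readers unfamiliar with the pattern in `setMultiSpanAttributes`: the `baggage = baggage.setEntry(...)` reassignment matters because the OpenTelemetry `Baggage` object is immutable and `setEntry` returns a new instance. The sketch below illustrates that with a dependency-free stand-in. `ImmutableBaggage` and `withAttributes` are hypothetical names used only for this example; they are not part of this PR or of `@opentelemetry/api`.

```typescript
// Hypothetical stand-in for the OpenTelemetry Baggage interface, for illustration only.
type BaggageEntry = { value: string }

class ImmutableBaggage {
  private readonly entries: ReadonlyMap<string, BaggageEntry>

  constructor(entries: ReadonlyMap<string, BaggageEntry> = new Map()) {
    this.entries = entries
  }

  // Returns a new instance; the receiver is left untouched.
  setEntry(key: string, entry: BaggageEntry): ImmutableBaggage {
    const next = new Map(this.entries)
    next.set(key, entry)
    return new ImmutableBaggage(next)
  }

  getEntry(key: string): BaggageEntry | undefined {
    return this.entries.get(key)
  }
}

// Same accumulation pattern as setMultiSpanAttributes, written as a reduce
// so that no reassigned `let` binding is needed.
const withAttributes = (base: ImmutableBaggage, attributes: Record<string, string>): ImmutableBaggage =>
  Object.entries(attributes).reduce((bag, [key, value]) => bag.setEntry(key, { value }), base)

const empty = new ImmutableBaggage()
const filled = withAttributes(empty, { 'build.id': '123', 'site.id': 'abc' })
```

Dropping the return value of `setEntry` would silently discard the attribute, which is why the original keeps reassigning `baggage` inside the `forEach`.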
/** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
export const startTracing = async function (options: TracingOptions, packageJson: PackageJson) {
  if (!options.preloadingEnabled) return
  if (sdk) return

  sdk = new HoneycombSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_VERSION]: packageJson.version,
    }),
    serviceName: packageJson.name,
    protocol: 'grpc',
    apiKey: options.apiKey,
    endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
    sampleRate: options.sampleRate,
    // Turn off auto resource detection so that we fully control the attributes we export
    autoDetectResources: false,
  })

  // Set the diagnostics logger to our system logger. We also need to suppress the override msg
  // in case there's a default console logger already registered (it would log a msg to it)
  diag.setLogger(getOtelLogger(options.debug, options.systemLogFile), {
    logLevel: options.debug ? DiagLogLevel.DEBUG : DiagLogLevel.INFO,
    suppressOverrideMessage: true,
  })

  sdk.start()

  // Loads the contents of the passed baggageFilePath into the baggage
  const baggageAttributes = await loadBaggageFromFile(options.baggageFilePath)
  const baggageCtx = setMultiSpanAttributes(baggageAttributes)

  const traceFlags = options.traceFlags !== undefined ? options.traceFlags : TraceFlags.NONE
  // Sets the current trace ID and span ID based on the options received
  // this is used as a way to propagate trace context from other processes such as Buildbot
  if (options.traceId !== undefined && options.parentSpanId !== undefined) {
    return trace.setSpanContext(baggageCtx, {
      traceId: options.traceId,
      spanId: options.parentSpanId,
      traceFlags: traceFlags,
      isRemote: true,
    })
  }

  return context.active()
}

/** Stops the tracing service if there's one running. This will flush any ongoing events */
export const stopTracing = async function () {
  if (!sdk) return
  try {
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
  } catch (e) {
    diag.error(e)
  }
}
This whole start/stop logic is a move of:
build/packages/build/src/tracing/main.ts
Lines 32 to 79 in 97fd98c
/** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
export const startTracing = function (options: TracingOptions, logger: (...args: any[]) => void) {
  if (!options.enabled) return
  if (sdk) return

  sdk = new HoneycombSDK({
    serviceName: ROOT_PACKAGE_JSON.name,
    protocol: 'grpc',
    apiKey: options.apiKey,
    endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
    sampleRate: options.sampleRate,
    // Turn off auto resource detection so that we fully control the attributes we export
    autoDetectResources: false,
  })

  // Set the diagnostics logger to our system logger. We also need to suppress the override msg
  // in case there's a default console logger already registered (it would log a msg to it)
  diag.setLogger(getOtelLogger(logger), { logLevel: DiagLogLevel.INFO, suppressOverrideMessage: true })

  sdk.start()

  // Loads the contents of the passed baggageFilePath into the baggage
  const baggageCtx = loadBaggageFromFile(options.baggageFilePath)

  // Sets the current trace ID and span ID based on the options received
  // this is used as a way to propagate trace context from Buildbot
  const ctx = trace.setSpanContext(baggageCtx, {
    traceId: options.traceId,
    spanId: options.parentSpanId,
    traceFlags: options.traceFlags,
    isRemote: true,
  })

  return ctx
}

/** Stops the tracing service if there's one running. This will flush any ongoing events */
export const stopTracing = async function () {
  if (!sdk) return
  try {
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
  } catch (e) {
    diag.error(e)
  }
}
Once we fully roll this out in prod, we can remove build's SDK setup.
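One behavioural difference from the build version is that the new `startTracing` only attaches a remote parent when both `traceId` and `parentSpanId` are present, and defaults `traceFlags` to `TraceFlags.NONE`. A dependency-free sketch of that decision is below. `resolveParentSpanContext`, `RemoteSpanContext`, `ParentOptions` and `isValidId` are hypothetical names for illustration, and the ID format check (W3C Trace Context: 32 and 16 lowercase hex characters, not all zeros) is an extra assumption that the PR itself does not perform.

```typescript
// Local stand-ins so the sketch has no dependencies; the real types live in @opentelemetry/api.
type RemoteSpanContext = {
  traceId: string
  spanId: string
  traceFlags: number
  isRemote: true
}

type ParentOptions = {
  traceId?: string
  parentSpanId?: string
  traceFlags?: number
}

const TRACE_FLAGS_NONE = 0 // mirrors TraceFlags.NONE

// Assumption, not in the PR: validate IDs against the W3C Trace Context format.
const isValidId = (id: string, length: number): boolean =>
  new RegExp(`^[0-9a-f]{${length}}$`).test(id) && !/^0+$/.test(id)

const resolveParentSpanContext = (options: ParentOptions): RemoteSpanContext | undefined => {
  const { traceId, parentSpanId, traceFlags = TRACE_FLAGS_NONE } = options
  // Without both IDs there is no parent to attach to; the caller falls back to the active context
  if (traceId === undefined || parentSpanId === undefined) return undefined
  if (!isValidId(traceId, 32) || !isValidId(parentSpanId, 16)) return undefined
  return { traceId, spanId: parentSpanId, traceFlags, isRemote: true }
}
```

A caller would pass the result to `trace.setSpanContext` when it is defined and return `context.active()` otherwise, matching the two return paths in the new `startTracing`.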
/** Loads the baggage attributes from a baggage file which follows the W3C Baggage specification */
export const loadBaggageFromFile = async function (baggageFilePath?: string) {
  if (baggageFilePath === undefined || baggageFilePath.length === 0) {
    diag.warn('No baggage file path provided, no context loaded')
    return {}
  }
  let baggageString: string
  try {
    baggageString = await readFile(baggageFilePath, 'utf-8')
  } catch (error) {
    diag.error(error)
    return {}
  }
  return parseKeyPairsIntoRecord(baggageString)
}
Same as above, moved from build
build/packages/build/src/tracing/main.ts
Lines 116 to 131 in 97fd98c
//** Loads the baggage attributes from a baggabe file which follows W3C Baggage specification */
export const loadBaggageFromFile = function (baggageFilePath: string) {
  if (baggageFilePath.length === 0) {
    diag.warn('Empty baggage file path provided, no context loaded')
    return context.active()
  }
  let baggageString: string
  try {
    baggageString = readFileSync(baggageFilePath, 'utf-8')
  } catch (error) {
    diag.error(error)
    return context.active()
  }
  const parsedBaggage = parseKeyPairsIntoRecord(baggageString)
  return setMultiSpanAttributes(parsedBaggage)
}
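Both versions rely on `parseKeyPairsIntoRecord`, which is not shown in this PR. As a rough idea of what turning a W3C Baggage string into a record involves, here is a minimal sketch. `parseBaggageString` is a hypothetical stand-in, not the actual helper, and it assumes the file holds a single comma-separated baggage string with percent-encoded values.

```typescript
// Hypothetical stand-in for parseKeyPairsIntoRecord, which is not shown in this PR.
const parseBaggageString = (raw: string): Record<string, string> => {
  const record: Record<string, string> = {}
  for (const member of raw.split(',')) {
    // Drop optional properties (";prop=value") that may follow the value
    const pair = member.split(';', 1)[0] ?? ''
    const separator = pair.indexOf('=')
    if (separator <= 0) continue // skip empty or malformed members
    const key = pair.slice(0, separator).trim()
    const rawValue = pair.slice(separator + 1).trim()
    if (key.length === 0) continue
    try {
      record[key] = decodeURIComponent(rawValue)
    } catch {
      // Keep the raw value if it is not valid percent-encoding
      record[key] = rawValue
    }
  }
  return record
}

const parsed = parseBaggageString('build_id=123, site_name=my%20site;ttl=1,,broken')
```

The resulting record has the same `{ [key: string]: string }` shape that `setMultiSpanAttributes` accepts.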
/**
 * Given a path to a node executable (potentially a symlink) get the module packageJson
 */
export const findExecutablePackageJSON = async function (path: string): Promise<PackageJson> {
  let pathToSearch: string
  try {
    // resolve symlinks
    pathToSearch = await realpath(path)
  } catch {
    // bail early if we can't resolve the path
    return {}
  }

  try {
    const result = await readPackageUp({ cwd: pathToSearch, normalize: false })
    if (result === undefined) return {}
    const { packageJson } = result
    return packageJson
  } catch {
    // packageJson read failed, we ignore the error and return an empty obj
    return {}
  }
}
This is a utility method that reads the target script's package.json so we can extract the package name and version for telemetry purposes.
/**
 * Sets global context to be used when initialising our root span
 * TODO this will move to a shared package (opentelemetry-utils) to scope the usage of this global property there
 */
export const setGlobalContext = function (ctx: Context) {
  global['NETLIFY_GLOBAL_CONTEXT'] = ctx
}

/**
 * Gets the global context to be used when initialising our root span
 * TODO this will move to a shared package (opentelemetry-utils) to scope the usage of this global property there
 */
export const getGlobalContext = function (): Context {
  if (global['NETLIFY_GLOBAL_CONTEXT'] === undefined) {
    return context.active()
  }
  return global['NETLIFY_GLOBAL_CONTEXT']
}
These methods will also be moved to the opentelemetry utility package, as a way to scope the use of the global as much as we can (all interactions should go through these methods).
Once we do the initial rollout of this module, we'll work on moving the remaining pieces of the methods in build (and these global context interaction methods) to an opentelemetry-util module, which only depends on @opentelemetry/api and which both modules (build and this one) can depend on.
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
This way we won't set sdk to undefined if the shutdown fails. Is that intended?
I think it's the least problematic scenario here: we don't end up losing the reference to an sdk that we were unable to shut down correctly, which could also mean we were unable to flush any ongoing events... It is a tricky situation, but I also think it's an edge case that we're not that likely to fall into 🤔
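To make the trade-off concrete, here is a small dependency-free sketch of the "keep the reference on failure" behaviour, which leaves room for a later retry. `ShutdownCapable` and `createTracingHandle` are hypothetical names for this example only; the PR itself keeps a module-level `sdk` variable and does not return a success flag.

```typescript
// Hypothetical minimal SDK shape; the PR only relies on the sdk exposing shutdown().
interface ShutdownCapable {
  shutdown(): Promise<void>
}

const createTracingHandle = (logError: (error: unknown) => void) => {
  let sdk: ShutdownCapable | undefined

  return {
    start(instance: ShutdownCapable): void {
      // No-op if an sdk is already registered, like startTracing
      if (sdk === undefined) sdk = instance
    },
    async stop(): Promise<boolean> {
      if (sdk === undefined) return true
      try {
        await sdk.shutdown()
        // Only drop the reference once the flush succeeded
        sdk = undefined
        return true
      } catch (error) {
        // Keep the reference so a later stop() can retry the flush
        logError(error)
        return false
      }
    },
    isRunning: (): boolean => sdk !== undefined,
  }
}
```

The flip side, as raised in the question above, is that a failed shutdown also keeps `start` a no-op until a later `stop` succeeds.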
Co-authored-by: Eduardo Bouças <[email protected]>
🎉 Thanks for submitting a pull request! 🎉
Summary
Part of COM-110 - https://linear.app/netlify/issue/COM-110/make-opentelemetry-an-optional-dependency-in-build - which covers this in more detail in terms of the overall rollout plan.
This new module will allow us to hook up our node executions in our prod CD system without having to depend directly on a lot of the opentelemetry packages (which are unnecessary for local dev purposes). It will also allow us to reuse this whole initialisation logic anywhere we might be interested in adding support for our opentelemetry setup.
Most of the logic added here is being copied over from:
Once we fully roll out this lib we can look into deleting the methods there ☝️
For us to review and ship your PR efficiently, please perform the following steps:
we can discuss the changes and get feedback from everyone that should be involved. If you're fixing a typo or
something that's on fire 🔥 (e.g. incident related), you can skip this step.
your code follows our style guide and passes our tests.
A picture of a cute animal (not mandatory, but encouraged)