feat: create opentelemetry sdk package #5388

Merged
merged 18 commits into from
Nov 21, 2023

Contributor

@JGAntunes JGAntunes commented Nov 14, 2023

🎉 Thanks for submitting a pull request! 🎉

Summary

Part of COM-110 - https://linear.app/netlify/issue/COM-110/make-opentelemetry-an-optional-dependency-in-build - which covers this in more detail in terms of the overall rollout plan.

This new module will let us hook up our Node executions in our prod CD system without depending directly on many of the opentelemetry packages (which are unnecessary for local development). It will also let us reuse this initialisation logic anywhere we want to add support for our opentelemetry setup.

Most of the logic added here is being copied over from:


For us to review and ship your PR efficiently, please perform the following steps:

  • Open a bug/issue before writing your code 🧑‍💻. This ensures
    we can discuss the changes and get feedback from everyone that should be involved. If you're fixing a typo or
    something that's on fire 🔥 (e.g. incident related), you can skip this step.
  • Read the contribution guidelines 📖. This ensures
    your code follows our style guide and passes our tests.
  • Update or add tests (if any source code was changed or added) 🧪
  • Update or add documentation (if features were changed or added) 📝
  • Make sure the status checks below are successful ✅

A picture of a cute animal (not mandatory, but encouraged)

@JGAntunes JGAntunes self-assigned this Nov 14, 2023
Contributor

This pull request adds or modifies JavaScript (.js, .cjs, .mjs) files.
Consider converting them to TypeScript.


export type TracingOptions = {
  /** This is a temporary property to signal preloading is enabled, can be replaced with `enabled` once we retire build's internal sdk setup */
  preloadingEnabled: boolean
Contributor Author

@JGAntunes JGAntunes Nov 17, 2023

This should allow us to roll out this new module without conflicting with the ongoing tracing setup in @netlify/build. We can retire it after the rollout.

Comment on lines +91 to +104
/**
 * Sets attributes to be propagated across child spans under the current active context
 * TODO this method will be removed from this package once we move it to a dedicated one to be shared between build,
 * this setup and any other node module which might use our open telemetry setup
 */
export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
  const currentBaggage = propagation.getBaggage(context.active())
  // Create a baggage if there's none
  let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
  Object.entries(attributes).forEach(([key, value]) => {
    baggage = baggage.setEntry(key, { value })
  })

  return propagation.setBaggage(context.active(), baggage)
}
Contributor Author

This is a copy/paste from:

  • /** Sets attributes to be propagated across child spans under the current active context */
    export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
      const currentBaggage = propagation.getBaggage(context.active())
      // Create a baggage if there's none
      let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
      Object.entries(attributes).forEach(([key, value]) => {
        baggage = baggage.setEntry(key, { value })
      })
      return propagation.setBaggage(context.active(), baggage)
    }

We currently need this to set up the context passed over to this process (baggage and trace information) so that tracing and attribute stitching work. Once we move the remaining logic in build's tracing directory to its own utility library, we can remove this copy and use that library directly.
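
For readers unfamiliar with why the loop reassigns `baggage` on every iteration: baggage objects are immutable, so `setEntry` returns a new instance rather than mutating the existing one. The following is a minimal sketch of that pattern only. `SketchBaggage` and `mergeAttributes` are hypothetical stand-ins written for illustration; they are not part of `@opentelemetry/api` or of this PR.

```typescript
// Hypothetical immutable stand-in for a baggage object: setEntry returns a new
// instance instead of mutating, which is why the PR reassigns `baggage` in its loop.
class SketchBaggage {
  private readonly entries: ReadonlyMap<string, string>

  constructor(entries: ReadonlyMap<string, string> = new Map()) {
    this.entries = entries
  }

  setEntry(key: string, value: string): SketchBaggage {
    const next = new Map(this.entries)
    next.set(key, value)
    return new SketchBaggage(next)
  }

  getEntry(key: string): string | undefined {
    return this.entries.get(key)
  }
}

// Hypothetical helper mirroring the shape of setMultiSpanAttributes:
// start from the existing baggage if there is one, otherwise from an empty one.
const mergeAttributes = function (
  current: SketchBaggage | undefined,
  attributes: Record<string, string>,
): SketchBaggage {
  return Object.entries(attributes).reduce(
    (baggage, [key, value]) => baggage.setEntry(key, value),
    current ?? new SketchBaggage(),
  )
}
```

Because every step returns a fresh object, the baggage that was active before the call is left untouched, and only the returned value carries the new attributes.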

Comment on lines 32 to 89
/** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
export const startTracing = async function (options: TracingOptions, packageJson: PackageJson) {
  if (!options.preloadingEnabled) return
  if (sdk) return

  sdk = new HoneycombSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_VERSION]: packageJson.version,
    }),
    serviceName: packageJson.name,
    protocol: 'grpc',
    apiKey: options.apiKey,
    endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
    sampleRate: options.sampleRate,
    // Turn off auto resource detection so that we fully control the attributes we export
    autoDetectResources: false,
  })

  // Set the diagnostics logger to our system logger. We also need to suppress the override msg
  // in case there's a default console logger already registered (it would log a msg to it)
  diag.setLogger(getOtelLogger(options.debug, options.systemLogFile), {
    logLevel: options.debug ? DiagLogLevel.DEBUG : DiagLogLevel.INFO,
    suppressOverrideMessage: true,
  })

  sdk.start()

  // Loads the contents of the passed baggageFilePath into the baggage
  const baggageAttributes = await loadBaggageFromFile(options.baggageFilePath)
  const baggageCtx = setMultiSpanAttributes(baggageAttributes)

  const traceFlags = options.traceFlags !== undefined ? options.traceFlags : TraceFlags.NONE
  // Sets the current trace ID and span ID based on the options received
  // this is used as a way to propagate trace context from other processes such as Buildbot
  if (options.traceId !== undefined && options.parentSpanId !== undefined) {
    return trace.setSpanContext(baggageCtx, {
      traceId: options.traceId,
      spanId: options.parentSpanId,
      traceFlags: traceFlags,
      isRemote: true,
    })
  }

  return context.active()
}

/** Stops the tracing service if there's one running. This will flush any ongoing events */
export const stopTracing = async function () {
  if (!sdk) return
  try {
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
  } catch (e) {
    diag.error(e)
  }
}
Contributor Author

This whole start/stop logic is a move of:

  • /** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
    export const startTracing = function (options: TracingOptions, logger: (...args: any[]) => void) {
      if (!options.enabled) return
      if (sdk) return
      sdk = new HoneycombSDK({
        serviceName: ROOT_PACKAGE_JSON.name,
        protocol: 'grpc',
        apiKey: options.apiKey,
        endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
        sampleRate: options.sampleRate,
        // Turn off auto resource detection so that we fully control the attributes we export
        autoDetectResources: false,
      })
      // Set the diagnostics logger to our system logger. We also need to suppress the override msg
      // in case there's a default console logger already registered (it would log a msg to it)
      diag.setLogger(getOtelLogger(logger), { logLevel: DiagLogLevel.INFO, suppressOverrideMessage: true })
      sdk.start()
      // Loads the contents of the passed baggageFilePath into the baggage
      const baggageCtx = loadBaggageFromFile(options.baggageFilePath)
      // Sets the current trace ID and span ID based on the options received
      // this is used as a way to propagate trace context from Buildbot
      const ctx = trace.setSpanContext(baggageCtx, {
        traceId: options.traceId,
        spanId: options.parentSpanId,
        traceFlags: options.traceFlags,
        isRemote: true,
      })
      return ctx
    }
    /** Stops the tracing service if there's one running. This will flush any ongoing events */
    export const stopTracing = async function () {
      if (!sdk) return
      try {
        // The shutdown method might return an error if we fail to flush the traces
        // We handle it and use our diagnostics logger
        await sdk.shutdown()
        sdk = undefined
      } catch (e) {
        diag.error(e)
      }
    }

Once this is fully rolled out in prod we can remove build's sdk setup.
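
The main behavioural difference from build's version is the guard around the remote parent: the new code only links to a parent span when both `traceId` and `parentSpanId` are present, and defaults the trace flags otherwise. A minimal, dependency-free sketch of that decision follows. `RemoteSpanContext`, `TRACE_FLAGS_NONE`, `isValidId` and `buildRemoteSpanContext` are hypothetical names used for illustration, and the ID format check (32 and 16 lowercase hex characters, not all zeros, per the W3C Trace Context format) is an extra assumption that the PR code does not perform.

```typescript
// Hypothetical stand-in for the span context shape; not imported from @opentelemetry/api.
type RemoteSpanContext = {
  traceId: string
  spanId: string
  traceFlags: number
  isRemote: true
}

// In W3C Trace Context, flags value 0 means "not sampled" and 1 means "sampled".
const TRACE_FLAGS_NONE = 0

// Assumption of this sketch: IDs are lowercase hex of a fixed length and not all zeros.
const isValidId = (id: string, length: number): boolean =>
  new RegExp(`^[0-9a-f]{${length}}$`).test(id) && !/^0+$/.test(id)

// Returns undefined when no usable parent was handed over, so the caller can
// fall back to the active context instead of linking to a bogus parent.
const buildRemoteSpanContext = function (options: {
  traceId?: string
  parentSpanId?: string
  traceFlags?: number
}): RemoteSpanContext | undefined {
  const { traceId, parentSpanId, traceFlags } = options
  if (traceId === undefined || parentSpanId === undefined) return undefined
  if (!isValidId(traceId, 32) || !isValidId(parentSpanId, 16)) return undefined
  return {
    traceId,
    spanId: parentSpanId,
    traceFlags: traceFlags ?? TRACE_FLAGS_NONE,
    isRemote: true,
  }
}
```

In the PR itself, the equivalent of a defined result is passed to `trace.setSpanContext`, and the equivalent of `undefined` leads to returning `context.active()`.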

Comment on lines 65 to 79
/** Loads the baggage attributes from a baggage file which follows the W3C Baggage specification */
export const loadBaggageFromFile = async function (baggageFilePath?: string) {
  if (baggageFilePath === undefined || baggageFilePath.length === 0) {
    diag.warn('No baggage file path provided, no context loaded')
    return {}
  }
  let baggageString: string
  try {
    baggageString = await readFile(baggageFilePath, 'utf-8')
  } catch (error) {
    diag.error(error)
    return {}
  }
  return parseKeyPairsIntoRecord(baggageString)
}
Contributor Author

Same as above, moved from build:

  • /** Loads the baggage attributes from a baggage file which follows the W3C Baggage specification */
    export const loadBaggageFromFile = function (baggageFilePath: string) {
      if (baggageFilePath.length === 0) {
        diag.warn('Empty baggage file path provided, no context loaded')
        return context.active()
      }
      let baggageString: string
      try {
        baggageString = readFileSync(baggageFilePath, 'utf-8')
      } catch (error) {
        diag.error(error)
        return context.active()
      }
      const parsedBaggage = parseKeyPairsIntoRecord(baggageString)
      return setMultiSpanAttributes(parsedBaggage)
    }
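
To make the file format concrete, here is a minimal sketch of turning a W3C Baggage string into a plain record. `parseBaggageString` is a hypothetical, simplified helper written for illustration; it is not the `parseKeyPairsIntoRecord` function the PR calls, and it deliberately ignores member properties and size limits.

```typescript
// Hypothetical, simplified parser for a W3C Baggage string such as
// "deploy_id=abc123,site_id=xyz;ttl=30". It only illustrates the file
// format being loaded and is not the library helper used in the PR.
const parseBaggageString = function (baggageString: string): Record<string, string> {
  const attributes: Record<string, string> = {}
  for (const member of baggageString.split(',')) {
    // Anything after the first ";" is member metadata; this sketch drops it
    const propertiesIndex = member.indexOf(';')
    const keyValue = propertiesIndex === -1 ? member : member.slice(0, propertiesIndex)

    const separatorIndex = keyValue.indexOf('=')
    if (separatorIndex <= 0) continue // skip empty or malformed members

    const key = keyValue.slice(0, separatorIndex).trim()
    const rawValue = keyValue.slice(separatorIndex + 1).trim()
    if (key.length === 0) continue

    try {
      // Values are percent-encoded in the W3C format
      attributes[key] = decodeURIComponent(rawValue)
    } catch {
      // Keep the raw value if it is not valid percent-encoding
      attributes[key] = rawValue
    }
  }
  return attributes
}
```

The record produced this way has the same shape that `setMultiSpanAttributes` accepts, which is why the new `loadBaggageFromFile` can return plain attributes and leave the context handling to the caller.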

Comment on lines +81 to +103
/**
 * Given a path to a node executable (potentially a symlink) get the module packageJson
 */
export const findExecutablePackageJSON = async function (path: string): Promise<PackageJson> {
  let pathToSearch: string
  try {
    // resolve symlinks
    pathToSearch = await realpath(path)
  } catch {
    // bail early if we can't resolve the path
    return {}
  }

  try {
    const result = await readPackageUp({ cwd: pathToSearch, normalize: false })
    if (result === undefined) return {}
    const { packageJson } = result
    return packageJson
  } catch {
    // packageJson read failed, we ignore the error and return an empty obj
    return {}
  }
}
Contributor Author

This utility method gathers the target script's package.json so that we can extract the package name and version for telemetry purposes.
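
The lookup this utility performs can be sketched with Node's standard library alone: resolve the symlink, then walk up the parent directories until a package.json is found. The helper below is a hypothetical illustration, not the PR's implementation (which delegates the upward search to `readPackageUp`). `PackageJsonLike` and `findNearestPackageJson` are made-up names, and the sketch assumes the path points at a file rather than a directory.

```typescript
import * as fs from 'node:fs/promises'
import * as path from 'node:path'

// Loosely typed on purpose; the PR uses a richer PackageJson type.
type PackageJsonLike = { name?: string; version?: string }

// Hypothetical helper: resolves symlinks, then walks up parent directories
// until a readable package.json is found. Returns {} when nothing is found.
const findNearestPackageJson = async function (executablePath: string): Promise<PackageJsonLike> {
  let currentDir: string
  try {
    // A bin entry is often a symlink into the package that owns the script
    currentDir = path.dirname(await fs.realpath(executablePath))
  } catch {
    // Bail early if the path cannot be resolved
    return {}
  }

  for (;;) {
    try {
      const contents = await fs.readFile(path.join(currentDir, 'package.json'), 'utf-8')
      return JSON.parse(contents) as PackageJsonLike
    } catch {
      // No readable, valid package.json here: try the parent directory
    }
    const parentDir = path.dirname(currentDir)
    if (parentDir === currentDir) return {} // reached the filesystem root
    currentDir = parentDir
  }
}
```

Resolving the symlink first matters because the directory that contains the link usually belongs to a different package (or to none) than the script it points at.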

Comment on lines +105 to +122
/**
 * Sets global context to be used when initialising our root span
 * TODO this will move to a shared package (opentelemetry-utils) to scope the usage of this global property there
 */
export const setGlobalContext = function (ctx: Context) {
  global['NETLIFY_GLOBAL_CONTEXT'] = ctx
}

/**
 * Gets the global context to be used when initialising our root span
 * TODO this will move to a shared package (opentelemetry-utils) to scope the usage of this global property there
 */
export const getGlobalContext = function (): Context {
  if (global['NETLIFY_GLOBAL_CONTEXT'] === undefined) {
    return context.active()
  }
  return global['NETLIFY_GLOBAL_CONTEXT']
}
Contributor Author

These methods will also be moved to the opentelemetry utility package, as a way to scope the use of the global as much as we can (all interactions should go through these methods).

Once we do the initial rollout of this module, we'll work on moving the remaining pieces of the methods in build (and these global context interaction methods) to an opentelemetry-util module, which depends only on @opentelemetry/api and which both modules (build and this one) can depend on.

@JGAntunes JGAntunes marked this pull request as ready for review November 17, 2023 20:42
@JGAntunes JGAntunes requested review from a team as code owners November 17, 2023 20:42
eduardoboucas previously approved these changes Nov 20, 2023
// The shutdown method might return an error if we fail to flush the traces
// We handle it and use our diagnostics logger
await sdk.shutdown()
sdk = undefined
Member

This way we won't set sdk to undefined if the shutdown fails. Is that intended?

Contributor Author

I think it's the least problematic scenario here: we don't end up losing the reference to an sdk that we were unable to shut down correctly, which could mean we were unable to flush any ongoing events... It is a tricky situation, but I also think it's an edge case that we're not that likely to fall into 🤔
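
The trade-off discussed in this thread can be shown with a stand-in SDK. In the sketch below, `ShutdownCapable`, `isTracingActive` and the boolean return value are hypothetical additions made for illustration (the PR's `stopTracing` returns nothing and logs through `diag`). The point is only that a failed shutdown keeps the reference, so a later call can attempt the flush again.

```typescript
// Hypothetical stand-in for the SDK: only the shutdown() contract matters here.
type ShutdownCapable = { shutdown: () => Promise<void> }

let sdk: ShutdownCapable | undefined

// Helper for this sketch only: reports whether a reference is still held
const isTracingActive = (): boolean => sdk !== undefined

// Returns true when there is nothing left to shut down, false when the
// shutdown failed and the reference was kept.
const stopTracing = async function (): Promise<boolean> {
  if (!sdk) return true
  try {
    await sdk.shutdown()
    // Only reached when the flush succeeded
    sdk = undefined
    return true
  } catch (error) {
    // The reference is kept, so a later call can try to flush again
    console.error(error)
    return false
  }
}
```

Clearing the reference in a `finally` block instead would make `stopTracing` idempotent after a failure, at the cost of dropping the only handle to an SDK that may still hold unflushed events.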

packages/opentelemetry-sdk-setup/src/util.ts Outdated (resolved)
packages/opentelemetry-sdk-setup/src/util.ts Outdated (resolved)
@JGAntunes JGAntunes changed the title from "chore: create opentelemetry sdk package" to "feat: create opentelemetry sdk package" Nov 20, 2023
@kodiakhq kodiakhq bot merged commit 9c8c452 into main Nov 21, 2023
34 checks passed
@kodiakhq kodiakhq bot deleted the chore/create-opentelemetry-sdk-package branch November 21, 2023 10:29
@JGAntunes JGAntunes mentioned this pull request Nov 21, 2023
5 tasks