feat: create opentelemetry sdk package #5388

Merged
merged 18 commits into from
Nov 21, 2023
Changes from 15 commits
1,236 changes: 1,234 additions & 2 deletions package-lock.json

Large diffs are not rendered by default.

70 changes: 70 additions & 0 deletions packages/opentelemetry-sdk-setup/README.md
@@ -0,0 +1,70 @@
# OpenTelemetry SDK Setup

This package extracts the logic necessary to initialise the OpenTelemetry JS SDK using our tracing exporter. This not
only allows us to reuse the initialisation logic across different node process executions but also means **our modules
don't need to depend on any @opentelemetry module other than the @opentelemetry/api**.

## How to use it?

This module is designed to be preloaded via [--import](https://nodejs.org/docs/latest-v18.x/api/cli.html#--importmodule)
on any node execution. For example:

```
$> node --import=./lib/bin.js ../build/lib/core/bin.js --debug --tracing.enabled=false --tracing.httpProtocol=https --tracing.host=api.honeycomb.io --tracing.port=443 --tracing.debug=true --tracing.preloadingEnabled=true .
```

In the script we're instrumenting, we can rely on `@opentelemetry/api` alone to create spans and interact with the SDK:

```ts
import { trace } from '@opentelemetry/api'
const tracer = trace.getTracer('secrets-scanning')

const myInstrumentedFunction = async function () {
  await tracer.startActiveSpan(
    'scanning-files',
    { attributes: { myAttribute: 'foobar' } },
    async (span) => {
      doSomeWork()
      span.end()
    },
  )
}

```

## Sharing and receiving context from outside of the process

Our SDK initialisation is prepared to receive [trace](https://opentelemetry.io/docs/concepts/signals/traces/) and
[baggage](https://opentelemetry.io/docs/concepts/signals/baggage/) context from outside of the process. This allows us,
for example, to hook this node process execution to an ongoing trace, or to share the baggage attributes for that
execution with the spans created in this process. The list of tracing options shows the options available to the
executable and what they mean.

Unfortunately, to our knowledge, the current `@opentelemetry` setup does not allow us to define an initial global
context that the root span can inherit from. As a consequence we had to get creative in order to pass the ingested
attributes to our main script execution, so that the root span can get the newly ingested attributes. We're relying on a
global property which can be accessed via `@netlify/opentelemetry-utils`. If your process receives any outside
attributes you can do the following:

```
$> node --import=./lib/bin.js my-instrumented-script --tracing.httpProtocol=https --tracing.host=api.honeycomb.io --tracing.port=443 --tracing.debug=true --tracing.preloadingEnabled=true --tracing.baggageFilePath='./my-baggage-filepath' --tracing.traceId=<my-trace-id> --tracing.parentSpanId=<the-span-id-of-the-parent>
```

And in the instrumented script:

```ts
import { trace } from '@opentelemetry/api'
import { getGlobalContext } from '@netlify/opentelemetry-utils'
const tracer = trace.getTracer('secrets-scanning')

const myInstrumentedFunction = async function () {
  await tracer.startActiveSpan(
    'scanning-files',
    { attributes: { myAttribute: 'foobar' } },
    getGlobalContext(),
    async (span) => {
      doSomeWork()
      span.end()
    },
  )
}

```
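
As a rough illustration of the "global property" idea described above, the following is a minimal sketch and not the actual `@netlify/opentelemetry-utils` implementation, which is not shown here. The names `setSharedContext`, `getSharedContext` and the symbol description are hypothetical. It assumes the preloaded module and the instrumented script run in the same process and share `globalThis`.

```typescript
// Hypothetical sketch; the real helpers in `@netlify/opentelemetry-utils` may differ.

// `Symbol.for` registers the key in the global symbol registry, so the preloaded
// module and the instrumented script resolve the same slot even if each of them
// loads its own copy of this file.
const SHARED_CONTEXT_KEY = Symbol.for('example.otel.shared-context')

// View `globalThis` as a symbol-keyed store.
const globalStore = globalThis as unknown as Record<symbol, unknown>

/** Called once by the preloaded setup module after the SDK has started. */
function setSharedContext<T>(ctx: T): void {
  globalStore[SHARED_CONTEXT_KEY] = ctx
}

/** Called by the instrumented script; `undefined` means nothing was preloaded. */
function getSharedContext<T>(): T | undefined {
  return globalStore[SHARED_CONTEXT_KEY] as T | undefined
}
```

With a helper shaped like this, a caller would need to fall back to the active context when nothing was stored, for instance when the script runs without `--import`.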
5 changes: 5 additions & 0 deletions packages/opentelemetry-sdk-setup/bin.js
@@ -0,0 +1,5 @@
#!/usr/bin/env node

// This is a workaround for npm issue: https://github.com/npm/cli/issues/2632

import './lib/bin.js'
50 changes: 50 additions & 0 deletions packages/opentelemetry-sdk-setup/package.json
@@ -0,0 +1,50 @@
{
  "name": "@netlify/opentelemetry-sdk-setup",
  "version": "1.0.0",
  "description": "Opentelemetry SDK setup script",
  "type": "module",
  "bin": {
    "otel-sdk-setup": "./bin.js"
  },
  "files": [
    "bin.js",
    "lib/**/*"
  ],
  "scripts": {
    "build": "tsc",
    "build:logos": "vite build",
    "test": "vitest run",
    "test:dev": "vitest --ui",
    "test:ci": "vitest run --reporter=default"
  },
  "keywords": [],
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "https://github.com/netlify/build.git",
    "directory": "packages/opentelemetry-sdk-setup"
  },
  "bugs": {
    "url": "https://github.com/netlify/build/issues"
  },
  "author": "Netlify Inc.",
  "dependencies": {
    "@honeycombio/opentelemetry-node": "^0.6.0",
    "@opentelemetry/api": "~1.6.0",
    "@opentelemetry/core": "^1.17.1",
    "@opentelemetry/resources": "^1.18.1",
    "@opentelemetry/semantic-conventions": "^1.18.1",
    "yargs-parser": "^21.1.1"
  },
  "devDependencies": {
    "@types/node": "^14.18.53",
    "@vitest/coverage-c8": "^0.30.1",
    "@vitest/ui": "^0.30.1",
    "typescript": "^5.0.0",
    "vite": "^4.0.4",
    "vitest": "^0.30.1"
  },
  "engines": {
    "node": ">=18.0.0"
  }
}
58 changes: 58 additions & 0 deletions packages/opentelemetry-sdk-setup/src/bin.ts
@@ -0,0 +1,58 @@
import process from 'node:process'

import { diag } from '@opentelemetry/api'
import argsParser from 'yargs-parser'

import { startTracing, stopTracing, TracingOptions } from './sdk-setup.js'
import { findExecutablePackageJSON, setGlobalContext } from './util.js'

const DEFAULT_OTEL_TRACING_PORT = 4317
const DEFAULT_OTEL_ENDPOINT_PROTOCOL = 'http'

const defaultOptions: TracingOptions = {
  preloadingEnabled: false,
  httpProtocol: DEFAULT_OTEL_ENDPOINT_PROTOCOL,
  host: 'localhost',
  port: DEFAULT_OTEL_TRACING_PORT,
  sampleRate: 1,
  baggageFilePath: '',
  apiKey: '-',
  parentSpanId: '',
  traceId: '',
  debug: false,
}

const args = argsParser(process.argv) as unknown as {
  /** _ holds args0 and args1 respectively, args1 will include the executable we're trying to run */
  _: [string, string]
  tracing: TracingOptions
}

// Apply the defaults making sure we're not tripped by falsy values
const options = Object.entries(defaultOptions)
  .map(([key, defaultValue]) => {
    if (args.tracing !== undefined && args.tracing[key] !== undefined) {
      return { [key]: args.tracing[key] }
    }
    return { [key]: defaultValue }
  })
  .reduce((acc, prop) => ({ ...acc, ...prop }), {}) as TracingOptions

const executablePath = args._[1]

try {
  const pkg = await findExecutablePackageJSON(executablePath)
  const rootCtx = await startTracing(options, pkg)
  if (rootCtx !== undefined) {
    diag.debug('Setting global root context imported from baggage file')
    setGlobalContext(rootCtx)
  } else {
    diag.debug('Root context undefined, skip setting global root context')
  }
} catch {
  // don't blow up the execution in case something fails
}

//TODO handle `stopTracing` via `process` event emitter for all the other cases such as
//SIGINT and SIGTERM signals and potential uncaught exceptions
process.on('beforeExit', async () => await stopTracing())
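
The defaults handling in `bin.ts` above treats only `undefined` as "not provided", so that explicit `false`, `0` or `''` values from the command line survive. A minimal standalone sketch of that idea follows. The function name `applyDefaults` and the trimmed-down `ExampleOptions` type are hypothetical and are not part of this PR.

```typescript
// Hypothetical, reduced option shape used only for this illustration.
type ExampleOptions = {
  preloadingEnabled: boolean
  host: string
  port: number
  sampleRate: number
  debug: boolean
}

/**
 * Returns a copy of `defaults` where every key that has a defined value in
 * `overrides` is replaced. Keys unknown to `defaults` are ignored.
 */
function applyDefaults(defaults: ExampleOptions, overrides: Partial<ExampleOptions> = {}): ExampleOptions {
  const merged: Record<string, unknown> = { ...defaults }
  const provided = overrides as Record<string, unknown>
  for (const key of Object.keys(defaults)) {
    const value = provided[key]
    // Only `undefined` means "not provided": `false`, `0` and `''` are legitimate overrides,
    // which is why a `value || default` shortcut would be wrong here.
    if (value !== undefined) merged[key] = value
  }
  return merged as ExampleOptions
}
```

A plain `{ ...defaults, ...overrides }` spread was not used in this sketch because an override explicitly set to `undefined` would then replace the default, and unknown keys would leak into the result.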
104 changes: 104 additions & 0 deletions packages/opentelemetry-sdk-setup/src/sdk-setup.ts
@@ -0,0 +1,104 @@
import { HoneycombSDK } from '@honeycombio/opentelemetry-node'
import { trace, diag, context, propagation, DiagLogLevel, TraceFlags } from '@opentelemetry/api'
import { Resource } from '@opentelemetry/resources'
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
import type { PackageJson } from 'read-pkg-up'

import { getOtelLogger, loadBaggageFromFile } from './util.js'

export type TracingOptions = {
  /** This is a temporary property to signal preloading is enabled, can be replaced with `enabled` once we retire build's internal sdk setup */
  preloadingEnabled: boolean
Contributor Author

@JGAntunes JGAntunes Nov 17, 2023

This should allow us to roll out this new module without conflicting with the ongoing tracing setup in @netlify/build. We can retire it after the rollout.

  httpProtocol: string
  host: string
  port: number
  /** API Key used for a dedicated trace provider */
  apiKey: string
  /** Sample rate being used for this trace, this allows for consistent probability sampling */
  sampleRate: number
  /** Properties of the root span and trace id used to stitch context */
  traceId?: string
  traceFlags?: number
  parentSpanId?: string
  baggageFilePath?: string
  /** Debug mode enabled - logs to stdout */
  debug: boolean
  /** System log file descriptor */
  systemLogFile?: number
}

let sdk: HoneycombSDK | undefined

/** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
export const startTracing = async function (options: TracingOptions, packageJson: PackageJson) {
  if (!options.preloadingEnabled) return
  if (sdk) return

  sdk = new HoneycombSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_VERSION]: packageJson.version,
    }),
    serviceName: packageJson.name,
    protocol: 'grpc',
    apiKey: options.apiKey,
    endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
    sampleRate: options.sampleRate,
    // Turn off auto resource detection so that we fully control the attributes we export
    autoDetectResources: false,
  })

  // Set the diagnostics logger to our system logger. We also need to suppress the override msg
  // in case there's a default console logger already registered (it would log a msg to it)
  diag.setLogger(getOtelLogger(options.debug, options.systemLogFile), {
    logLevel: options.debug ? DiagLogLevel.DEBUG : DiagLogLevel.INFO,
    suppressOverrideMessage: true,
  })

  sdk.start()

  // Loads the contents of the passed baggageFilePath into the baggage
  const baggageAttributes = await loadBaggageFromFile(options.baggageFilePath)
  const baggageCtx = setMultiSpanAttributes(baggageAttributes)

  const traceFlags = options.traceFlags !== undefined ? options.traceFlags : TraceFlags.NONE
  // Sets the current trace ID and span ID based on the options received
  // this is used as a way to propagate trace context from other processes such as Buildbot
  if (options.traceId !== undefined && options.parentSpanId !== undefined) {
    return trace.setSpanContext(baggageCtx, {
      traceId: options.traceId,
      spanId: options.parentSpanId,
      traceFlags: traceFlags,
      isRemote: true,
    })
  }

  return context.active()
}

/** Stops the tracing service if there's one running. This will flush any ongoing events */
export const stopTracing = async function () {
  if (!sdk) return
  try {
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
Member

This way we won't set sdk to undefined if the shutdown fails. Is that intended?

Contributor Author

I think it's the least problematic scenario here: we don't end up losing the reference to an sdk that we were unable to shut down correctly, which could also mean we were unable to flush any ongoing events... It is a tricky situation, but I also think it's an edge case that we're not that likely to fall into 🤔
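
The behaviour discussed in this thread can be sketched in isolation. This is a minimal illustration, not the code in this PR: `createLifecycle` and `Shutdownable` are hypothetical names, and the fake SDK stands in for `HoneycombSDK`. It shows that the reference is cleared only when `shutdown()` resolves, so a later `stop()` call can retry the flush.

```typescript
// Hypothetical stand-in for the SDK: anything that can be shut down asynchronously.
interface Shutdownable {
  shutdown(): Promise<void>
}

function createLifecycle(logError: (err: unknown) => void) {
  let sdk: Shutdownable | undefined

  return {
    /** No-op if an instance is already registered. */
    start(instance: Shutdownable): void {
      if (!sdk) sdk = instance
    },
    /** Clears the reference only after a successful shutdown. */
    async stop(): Promise<void> {
      if (!sdk) return
      try {
        await sdk.shutdown()
        // Only reached when `shutdown()` resolved, i.e. the flush succeeded
        sdk = undefined
      } catch (err) {
        // The reference is kept on purpose so that a later `stop()` can retry
        logError(err)
      }
    },
    isRunning: (): boolean => sdk !== undefined,
  }
}
```

The cost of this choice is that a permanently failing `shutdown()` keeps the instance registered, so a subsequent `start()` stays a no-op.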

  } catch (e) {
    diag.error(e)
  }
}
Contributor Author

This whole start/stop logic is a move of:

  • /** Starts the tracing SDK, if there's already a tracing service this will be a no-op */
    export const startTracing = function (options: TracingOptions, logger: (...args: any[]) => void) {
    if (!options.enabled) return
    if (sdk) return
    sdk = new HoneycombSDK({
    serviceName: ROOT_PACKAGE_JSON.name,
    protocol: 'grpc',
    apiKey: options.apiKey,
    endpoint: `${options.httpProtocol}://${options.host}:${options.port}`,
    sampleRate: options.sampleRate,
    // Turn off auto resource detection so that we fully control the attributes we export
    autoDetectResources: false,
    })
    // Set the diagnostics logger to our system logger. We also need to suppress the override msg
    // in case there's a default console logger already registered (it would log a msg to it)
    diag.setLogger(getOtelLogger(logger), { logLevel: DiagLogLevel.INFO, suppressOverrideMessage: true })
    sdk.start()
    // Loads the contents of the passed baggageFilePath into the baggage
    const baggageCtx = loadBaggageFromFile(options.baggageFilePath)
    // Sets the current trace ID and span ID based on the options received
    // this is used as a way to propagate trace context from Buildbot
    const ctx = trace.setSpanContext(baggageCtx, {
    traceId: options.traceId,
    spanId: options.parentSpanId,
    traceFlags: options.traceFlags,
    isRemote: true,
    })
    return ctx
    }
    /** Stops the tracing service if there's one running. This will flush any ongoing events */
    export const stopTracing = async function () {
    if (!sdk) return
    try {
    // The shutdown method might return an error if we fail to flush the traces
    // We handle it and use our diagnostics logger
    await sdk.shutdown()
    sdk = undefined
    } catch (e) {
    diag.error(e)
    }
    }

Once we fully roll this out in prod we can remove build's sdk setup.
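
One detail of the trace stitching step may be worth illustrating. In the new `bin.ts`, `traceId` and `parentSpanId` default to `''`, so a `!== undefined` check alone lets empty IDs through to `trace.setSpanContext`. The sketch below is a dependency-free assumption of how the IDs could be validated first, and is not code from this PR. `buildRemoteSpanContext` and `RemoteSpanContext` are hypothetical names. The format rules follow the W3C Trace Context specification: a trace ID is 32 lowercase hex characters, a parent span ID is 16, and all-zero values are invalid.

```typescript
// Plain object mirroring the fields passed to `trace.setSpanContext` in the snippet above.
type RemoteSpanContext = {
  traceId: string
  spanId: string
  traceFlags: number
  isRemote: true
}

// 32 / 16 lowercase hex characters, rejecting the all-zero IDs
const TRACE_ID_PATTERN = /^(?!0{32})[0-9a-f]{32}$/
const SPAN_ID_PATTERN = /^(?!0{16})[0-9a-f]{16}$/

/**
 * Returns a remote span context when both IDs are well formed, `undefined` otherwise.
 * `traceFlags` defaults to 0, i.e. "not sampled".
 */
function buildRemoteSpanContext(
  traceId?: string,
  parentSpanId?: string,
  traceFlags = 0,
): RemoteSpanContext | undefined {
  if (traceId === undefined || parentSpanId === undefined) return undefined
  if (!TRACE_ID_PATTERN.test(traceId) || !SPAN_ID_PATTERN.test(parentSpanId)) return undefined
  return { traceId, spanId: parentSpanId, traceFlags, isRemote: true }
}
```

A caller could then set the span context only when this returns a value, and otherwise fall back to the baggage context on its own.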


/** Sets attributes to be propagated across child spans under the current active context
 * TODO this method will be removed from this package once we move it to a dedicated one to be shared between build,
 * this setup and any other node module which might use our open telemetry setup
 * */
export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
  const currentBaggage = propagation.getBaggage(context.active())
  // Create a baggage if there's none
  let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
  Object.entries(attributes).forEach(([key, value]) => {
    baggage = baggage.setEntry(key, { value })
  })

  return propagation.setBaggage(context.active(), baggage)
}
Comment on lines +91 to +104
Contributor Author

This is a copy/paste from:

  • /** Sets attributes to be propagated across child spans under the current active context */
    export const setMultiSpanAttributes = function (attributes: { [key: string]: string }) {
    const currentBaggage = propagation.getBaggage(context.active())
    // Create a baggage if there's none
    let baggage = currentBaggage === undefined ? propagation.createBaggage() : currentBaggage
    Object.entries(attributes).forEach(([key, value]) => {
    baggage = baggage.setEntry(key, { value })
    })
    return propagation.setBaggage(context.active(), baggage)
    }

We currently need this in order to set up the context passed over to this process (baggage and trace information) and to do the tracing and attribute stitching. Once we move the remainder of the logic in build's tracing directory to its own utility library, we can remove this and use that directly.
