Sampling context improvements #3847

199 changes: 103 additions & 96 deletions MIGRATION_GUIDE.md
@@ -20,102 +20,109 @@ Looking to upgrade from Sentry SDK 2.x to 3.x? Here's a comprehensive list of wh
- Redis integration: In Redis pipeline spans there is no `span["data"]["redis.commands"]` that contains a dict `{"count": 3, "first_ten": ["cmd1", "cmd2", ...]}` but instead `span["data"]["redis.commands.count"]` (containing `3`) and `span["data"]["redis.commands.first_ten"]` (containing `["cmd1", "cmd2", ...]`).
- clickhouse-driver integration: The query is now available under the `db.query.text` span attribute (only if `send_default_pii` is `True`).
- `sentry_sdk.init` now returns `None` instead of a context manager.
- The `sampling_context` argument of `traces_sampler` now additionally contains all span attributes known at span start.
- If you're using the Celery integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `celery_job` dictionary anymore. Instead, the individual keys are now available as:

| Dictionary keys | Sampling context key |
| ---------------------- | -------------------- |
| `celery_job["args"]` | `celery.job.args` |
| `celery_job["kwargs"]` | `celery.job.kwargs` |
| `celery_job["task"]` | `celery.job.task` |

Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.

- If you're using the AIOHTTP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ---------------- | ------------------------------- |
| `path` | `url.path` |
| `query_string` | `url.query` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `scheme` | `url.scheme` |
| full URL | `url.full` |

- If you're using the Tornado integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ---------------- | --------------------------------------------------- |
| `path` | `url.path` |
| `query` | `url.query` |
| `protocol` | `url.scheme` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `version` | `network.protocol.name`, `network.protocol.version` |
| full URL | `url.full` |

- If you're using the generic WSGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:

| Env property | Sampling context key(s) |
| ----------------- | ------------------------------------------------- |
| `PATH_INFO` | `url.path` |
| `QUERY_STRING` | `url.query` |
| `REQUEST_METHOD` | `http.request.method` |
| `SERVER_NAME` | `server.address` |
| `SERVER_PORT` | `server.port` |
| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
| `wsgi.url_scheme` | `url.scheme` |
| full URL | `url.full` |

- If you're using the generic ASGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:

| Scope property | Sampling context key(s) |
| -------------- | ------------------------------- |
| `type` | `network.protocol.name` |
| `scheme` | `url.scheme` |
| `path` | `url.path` |
| `query` | `url.query` |
| `http_version` | `network.protocol.version` |
| `method` | `http.request.method` |
| `server` | `server.address`, `server.port` |
| `client` | `client.address`, `client.port` |
| full URL | `url.full` |

- If you're using the RQ integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:

| RQ property | Sampling context key(s) |
| --------------- | ---------------------------- |
| `rq_job.args` | `rq.job.args` |
| `rq_job.kwargs` | `rq.job.kwargs` |
| `rq_job.func` | `rq.job.func` |
| `queue.name` | `messaging.destination.name` |
| `rq_job.id` | `messaging.message.id` |

Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.

- If you're using the AWS Lambda integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:

| AWS property | Sampling context key(s) |
| ------------------------------------------- | ----------------------- |
| `aws_event["httpMethod"]` | `http.request.method` |
| `aws_event["queryStringParameters"]` | `url.query` |
| `aws_event["path"]` | `url.path` |
| full URL | `url.full` |
| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
| `aws_event["headers"]["Host"]` | `server.address` |
| `aws_context["function_name"]` | `faas.name` |

- If you're using the GCP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:

| Old sampling context key | New sampling context key |
| --------------------------------- | -------------------------- |
| `gcp_env["function_name"]` | `faas.name` |
| `gcp_env["function_region"]` | `faas.region` |
| `gcp_env["function_project"]` | `gcp.function.project` |
| `gcp_env["function_identity"]` | `gcp.function.identity` |
| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
| `gcp_event.method` | `http.request.method` |
| `gcp_event.query_string` | `url.query` |
- The `sampling_context` argument of `traces_sampler` and `profiles_sampler` now additionally contains all span attributes known at span start.
- The integration-specific content of the `sampling_context` argument of `traces_sampler` and `profiles_sampler` now looks different.
- The Celery integration doesn't add the `celery_job` dictionary anymore. Instead, the individual keys are now available as:

| Dictionary keys | Sampling context key | Example |
| ---------------------- | --------------------------- | ------------------------------ |
| `celery_job["args"]` | `celery.job.args.{index}` | `celery.job.args.0` |
| `celery_job["kwargs"]` | `celery.job.kwargs.{kwarg}` | `celery.job.kwargs.kwarg_name` |
| `celery_job["task"]` | `celery.job.task` | |

Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.

- The AIOHTTP integration doesn't add the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ----------------- | ------------------------------- |
| `path` | `url.path` |
| `query_string` | `url.query` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `scheme` | `url.scheme` |
| full URL | `url.full` |
| `request.headers` | `http.request.header.{header}` |

- The Tornado integration doesn't add the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ----------------- | --------------------------------------------------- |
| `path` | `url.path` |
| `query` | `url.query` |
| `protocol` | `url.scheme` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `version` | `network.protocol.name`, `network.protocol.version` |
| full URL | `url.full` |
| `request.headers` | `http.request.header.{header}` |

- The WSGI integration doesn't add the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:

| Env property | Sampling context key(s) |
| ----------------- | ------------------------------------------------- |
| `PATH_INFO` | `url.path` |
| `QUERY_STRING` | `url.query` |
| `REQUEST_METHOD` | `http.request.method` |
| `SERVER_NAME` | `server.address` |
| `SERVER_PORT` | `server.port` |
| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
| `wsgi.url_scheme` | `url.scheme` |
| full URL | `url.full` |
| `HTTP_*` | `http.request.header.{header}` |

- The ASGI integration doesn't add the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:

| Scope property | Sampling context key(s) |
| -------------- | ------------------------------- |
| `type` | `network.protocol.name` |
| `scheme` | `url.scheme` |
| `path` | `url.path` |
| `query` | `url.query` |
| `http_version` | `network.protocol.version` |
| `method` | `http.request.method` |
| `server` | `server.address`, `server.port` |
| `client` | `client.address`, `client.port` |
| full URL | `url.full` |
| `headers` | `http.request.header.{header}` |

- The RQ integration doesn't add the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:

| RQ property | Sampling context key | Example |
| --------------- | ---------------------------- | ---------------------- |
| `rq_job.args` | `rq.job.args.{index}` | `rq.job.args.0` |
| `rq_job.kwargs` | `rq.job.kwargs.{kwarg}` | `rq.job.kwargs.my_kwarg` |
| `rq_job.func` | `rq.job.func` | |
| `queue.name` | `messaging.destination.name` | |
| `rq_job.id` | `messaging.message.id` | |

Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.

- The AWS Lambda integration doesn't add the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:

| AWS property | Sampling context key(s) |
| ------------------------------------------- | ------------------------------- |
| `aws_event["httpMethod"]` | `http.request.method` |
| `aws_event["queryStringParameters"]` | `url.query` |
| `aws_event["path"]` | `url.path` |
| full URL | `url.full` |
| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
| `aws_event["headers"]["Host"]` | `server.address` |
| `aws_context["function_name"]` | `faas.name` |
| `aws_event["headers"]` | `http.request.header.{header}` |

- The GCP integration doesn't add the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:

| Old sampling context key | New sampling context key |
| --------------------------------- | ------------------------------ |
| `gcp_env["function_name"]` | `faas.name` |
| `gcp_env["function_region"]` | `faas.region` |
| `gcp_env["function_project"]` | `gcp.function.project` |
| `gcp_env["function_identity"]` | `gcp.function.identity` |
| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
| `gcp_event.method` | `http.request.method` |
| `gcp_event.query_string` | `url.query` |
| `gcp_event.headers` | `http.request.header.{header}` |
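
A minimal sketch of a `traces_sampler` that reads the flattened keys from the tables above instead of the removed integration objects. The path `/health`, the header name `x-debug`, the task name `tasks.send_email`, and all sample rates are hypothetical values chosen for illustration. Every key is read with `.get()` because the attributes are only present if available.

```python
def traces_sampler(sampling_context):
    """Pick a sample rate from flattened sampling context keys."""
    # Formerly read from wsgi_environ["PATH_INFO"], asgi_scope["path"], etc.
    if sampling_context.get("url.path") == "/health":
        return 0.0

    # Header keys use the lowercased header name (hypothetical header).
    if sampling_context.get("http.request.header.x-debug") == "1":
        return 1.0

    # Formerly celery_job["task"] (hypothetical task name).
    if sampling_context.get("celery.job.task") == "tasks.send_email":
        return 0.5

    # Hypothetical default rate for everything else.
    return 0.1
```

A function like this would be passed to the SDK as `sentry_sdk.init(traces_sampler=traces_sampler)`.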


### Removed
16 changes: 15 additions & 1 deletion sentry_sdk/integrations/_wsgi_common.py
@@ -3,7 +3,7 @@

 import sentry_sdk
 from sentry_sdk.scope import should_send_default_pii
-from sentry_sdk.utils import AnnotatedValue, logger
+from sentry_sdk.utils import AnnotatedValue, logger, SENSITIVE_DATA_SUBSTITUTE

 try:
     from django.http.request import RawPostDataException
@@ -221,6 +221,20 @@ def _filter_headers(headers):
     }


+def _request_headers_to_span_attributes(headers):
+    # type: (dict[str, str]) -> dict[str, str]
+    attributes = {}
+
+    headers = _filter_headers(headers)
+
+    for header, value in headers.items():
+        if isinstance(value, AnnotatedValue):
+            value = SENSITIVE_DATA_SUBSTITUTE
+        attributes[f"http.request.header.{header.lower()}"] = value
+
+    return attributes
+
+
 def _in_http_status_code_range(code, code_ranges):
     # type: (object, list[HttpStatusCodeRange]) -> bool
     for target in code_ranges:
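
The header conversion added in this hunk can be tried outside the SDK with a small stand-in. This is a sketch, not the SDK's implementation: `SENSITIVE_HEADERS` and `SUBSTITUTE` are hypothetical replacements for what `_filter_headers` and `SENSITIVE_DATA_SUBSTITUTE` do in the real code, and the function name is made up for this example.

```python
# Hypothetical stand-ins; the SDK uses _filter_headers and
# SENSITIVE_DATA_SUBSTITUTE instead of these two names.
SENSITIVE_HEADERS = {"authorization", "cookie"}
SUBSTITUTE = "[Filtered]"


def headers_to_span_attributes(headers):
    """Flatten request headers into http.request.header.* attributes."""
    attributes = {}
    for header, value in headers.items():
        key = header.lower()
        # Replace sensitive values instead of exposing them to the sampler.
        if key in SENSITIVE_HEADERS:
            value = SUBSTITUTE
        attributes[f"http.request.header.{key}"] = value
    return attributes


example = headers_to_span_attributes(
    {"Content-Type": "text/plain", "Authorization": "Bearer abc"}
)
```

Lowercasing the header name gives samplers a single predictable key per header regardless of how the client capitalized it.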
7 changes: 4 additions & 3 deletions sentry_sdk/integrations/aiohttp.py
@@ -13,6 +13,7 @@
 from sentry_sdk.sessions import track_session
 from sentry_sdk.integrations._wsgi_common import (
     _filter_headers,
+    _request_headers_to_span_attributes,
     request_body_within_bounds,
 )
 from sentry_sdk.tracing import (
@@ -389,11 +390,11 @@ def _prepopulate_attributes(request):
         except ValueError:
             attributes["server.address"] = request.host

-    try:
+    with capture_internal_exceptions():
         url = f"{request.scheme}://{request.host}{request.path}"  # noqa: E231
         if request.query_string:
             attributes["url.full"] = f"{url}?{request.query_string}"
-    except Exception:
-        pass

+    attributes.update(_request_headers_to_span_attributes(dict(request.headers)))
+
     return attributes
11 changes: 7 additions & 4 deletions sentry_sdk/integrations/asgi.py
@@ -21,6 +21,7 @@
 )
 from sentry_sdk.integrations._wsgi_common import (
     DEFAULT_HTTP_METHODS_TO_CAPTURE,
+    _request_headers_to_span_attributes,
 )
 from sentry_sdk.sessions import track_session
 from sentry_sdk.tracing import (
@@ -32,6 +33,7 @@
 )
 from sentry_sdk.utils import (
     ContextVar,
+    capture_internal_exceptions,
     event_from_exception,
     HAS_REAL_CONTEXTVARS,
     CONTEXTVARS_ERROR_MESSAGE,
@@ -348,19 +350,20 @@ def _prepopulate_attributes(scope):
             try:
                 host, port = scope[attr]
                 attributes[f"{attr}.address"] = host
-                attributes[f"{attr}.port"] = port
+                if port is not None:
+                    attributes[f"{attr}.port"] = port
             except Exception:
                 pass

-    try:
+    with capture_internal_exceptions():
         full_url = _get_url(scope)
         query = _get_query(scope)
         if query:
             attributes["url.query"] = query
             full_url = f"{full_url}?{query}"

         attributes["url.full"] = full_url
-    except Exception:
-        pass

+    attributes.update(_request_headers_to_span_attributes(_get_headers(scope)))
+
     return attributes
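
The `port is not None` check above matters because an ASGI `server` entry can carry `None` as its port, for example when the server listens on a Unix socket. A minimal sketch of that part of the logic, using a hypothetical helper name and plain dictionaries in place of a real ASGI scope:

```python
def address_attributes(scope):
    """Build client/server address attributes from an ASGI-like scope."""
    attributes = {}
    for attr in ("client", "server"):
        pair = scope.get(attr)
        if not pair:
            continue
        host, port = pair
        attributes[f"{attr}.address"] = host
        # Skip the port entirely rather than emitting a None attribute.
        if port is not None:
            attributes[f"{attr}.port"] = port
    return attributes


# Hypothetical scope: a Unix socket server and a TCP client.
example = address_attributes(
    {"server": ("/tmp/app.sock", None), "client": ("127.0.0.1", 50000)}
)
```

Leaving the key out keeps the attribute values to plain strings and numbers, which is what span attributes are expected to hold.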
15 changes: 12 additions & 3 deletions sentry_sdk/integrations/aws_lambda.py
@@ -20,7 +20,10 @@
     reraise,
 )
 from sentry_sdk.integrations import Integration
-from sentry_sdk.integrations._wsgi_common import _filter_headers
+from sentry_sdk.integrations._wsgi_common import (
+    _filter_headers,
+    _request_headers_to_span_attributes,
+)

 from typing import TYPE_CHECKING

@@ -162,7 +165,7 @@ def sentry_handler(aws_event, aws_context, *args, **kwargs):
                     name=aws_context.function_name,
                     source=TRANSACTION_SOURCE_COMPONENT,
                     origin=AwsLambdaIntegration.origin,
-                    attributes=_prepopulate_attributes(aws_event, aws_context),
+                    attributes=_prepopulate_attributes(request_data, aws_context),
                 ):
                     try:
                         return handler(aws_event, aws_context, *args, **kwargs)
@@ -468,6 +471,7 @@ def _event_from_error_json(error_json):


 def _prepopulate_attributes(aws_event, aws_context):
+    # type: (Any, Any) -> dict[str, Any]
     attributes = {
         "cloud.provider": "aws",
     }
@@ -486,10 +490,15 @@ def _prepopulate_attributes(aws_event, aws_context):
             url += f"?{aws_event['queryStringParameters']}"
         attributes["url.full"] = url

-    headers = aws_event.get("headers") or {}
+    headers = {}
+    if aws_event.get("headers") and isinstance(aws_event["headers"], dict):
+        headers = aws_event["headers"]
+
     if headers.get("X-Forwarded-Proto"):
         attributes["network.protocol.name"] = headers["X-Forwarded-Proto"]
     if headers.get("Host"):
         attributes["server.address"] = headers["Host"]

+    attributes.update(_request_headers_to_span_attributes(headers))
+
     return attributes
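
The change from `aws_event.get("headers") or {}` to an explicit `isinstance` check guards against events whose `headers` value is truthy but not a dict, in which case a later `headers.get(...)` would fail. A small sketch of the same guard, with hypothetical helper names and hand-written events:

```python
def headers_from_event(event):
    """Return the event's headers only if they are a non-empty dict."""
    headers = event.get("headers")
    if headers and isinstance(headers, dict):
        return headers
    return {}


def header_attributes(event):
    """Map selected headers to span attributes, as in the hunk above."""
    headers = headers_from_event(event)
    attributes = {}
    if headers.get("X-Forwarded-Proto"):
        attributes["network.protocol.name"] = headers["X-Forwarded-Proto"]
    if headers.get("Host"):
        attributes["server.address"] = headers["Host"]
    return attributes


# Hypothetical events: one malformed, one well-formed.
malformed = header_attributes({"headers": "not-a-dict"})
well_formed = header_attributes(
    {"headers": {"Host": "example.com", "X-Forwarded-Proto": "https"}}
)
```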
13 changes: 10 additions & 3 deletions sentry_sdk/integrations/celery/__init__.py
@@ -20,7 +20,6 @@
     ensure_integration_enabled,
     event_from_exception,
     reraise,
-    _serialize_span_attribute,
 )

 from typing import TYPE_CHECKING
@@ -514,9 +513,17 @@ def sentry_publish(self, *args, **kwargs):


 def _prepopulate_attributes(task, args, kwargs):
+    # type: (Any, *Any, **Any) -> dict[str, str]
     attributes = {
         "celery.job.task": task.name,
-        "celery.job.args": _serialize_span_attribute(args),
-        "celery.job.kwargs": _serialize_span_attribute(kwargs),
     }
+
+    for i, arg in enumerate(args):
+        with capture_internal_exceptions():
+            attributes[f"celery.job.args.{i}"] = str(arg)
+
+    for kwarg, value in kwargs.items():
+        with capture_internal_exceptions():
+            attributes[f"celery.job.kwargs.{kwarg}"] = str(value)
+
     return attributes
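
The flattening in this hunk replaces one serialized `celery.job.args` value with one attribute per positional index and one per keyword name. A standalone sketch of the same idea, with a hypothetical function name and prefix parameter and without the SDK's `capture_internal_exceptions` guard:

```python
def flatten_job_arguments(prefix, args, kwargs):
    """Turn positional and keyword arguments into flat string attributes."""
    attributes = {}
    for i, arg in enumerate(args):
        # Positional arguments are keyed by their index.
        attributes[f"{prefix}.args.{i}"] = str(arg)
    for name, value in kwargs.items():
        # Keyword arguments are keyed by their name.
        attributes[f"{prefix}.kwargs.{name}"] = str(value)
    return attributes


# Hypothetical job arguments.
example = flatten_job_arguments("celery.job", (1, "a"), {"retries": 3})
```

Because every value goes through `str()`, a sampler comparing against `celery.job.args.0` has to compare with the string form, for example `"1"` rather than `1`.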