-
Notifications
You must be signed in to change notification settings - Fork 16
Email Connector: Send Email with Erasure Instructions [#1158] #1246
Email Connector: Send Email with Erasure Instructions [#1158] #1246
Conversation
…est execution. - Add a migration to create a new audit log type. Create an audit log for the email send. - Throw an exception for email-based connectors and catch to override the default execution log. - Add a draft of an email template - Connect sending a "test email" with dummy data. A fidesops admin could configure to check their email config was working.
…ions from which we can resume privacy request execution without having to run from the beginning. - Add more options to CurrentStep Enum - Cache the checkpoint if an email send fails, so we can retry from the same step.
…e no updates to be applied to any of the collections on the dataset.
…ailed privacy request can be resumed outside of the traversal.
Hi @ethyca/docs-authors, can you lend your expertise in helping me with the email copy here? In short, this is an Email Connector that sends an email to a third party telling them how to erase data that we can't erase automatically. There may be multiple collections that they should delete data from, and multiple fields/values they should query on their collection to find the records that should be masked, with multiple fields that need to be deleted. It is possible that they need to look up data on one of their collections to find data in another one of their collections. It is possible that some of the collections have no data to be erased but they are included because they need to locate a record on that collection in order to find data on another collection. However, the scenario where there are multiple collections is probably less likely. We imagine the fidesops user won't have good knowledge of the third party's data structure, and will create one collection with the fields they should mask. In my email example below, the customer has two collections, |
@ethyca/docs-authors I've also added a first draft of docs for the email connector here - |
…onnector_email_send # Conflicts: # CHANGELOG.md
|
||
class CollectionActionRequired(BaseSchema): | ||
"""Describes actions needed on a given collection. | ||
class CheckpointActionRequired(BaseSchema): | ||
"""Describes actions needed on a particular checkpoint. | ||
|
||
Examples are a paused collection that needs manual input, a failed collection that | ||
needs to be restarted, or a collection where instructions need to be emailed to a third | ||
party to complete the request. | ||
""" | ||
|
||
step: CurrentStep | ||
collection: CollectionAddress | ||
collection: Optional[CollectionAddress] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just using this existing structure that we already have to cache details about a request that paused or failed in the traversal to extend it to describe when a request fails outside of the traversal. I'm removing the mandate that a collection exists, because this is only relevant inside the traversal.
I've renamed several related pieces.
if not paused_collection: | ||
raise HTTPException( | ||
status_code=HTTP_422_UNPROCESSABLE_ENTITY, | ||
detail="Cannot save manual data on paused collection. No paused collection saved'.", | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just adding a new check here now that I've generally removed the requirement that a collection is saved on a CheckpointActionRequired
but this is one place where it's needed.
(To be clear, this shouldn't ever be hit though)
def can_run_checkpoint( | ||
request_checkpoint: CurrentStep, from_checkpoint: Optional[CurrentStep] = None | ||
) -> bool: | ||
"""Determine whether we should run a specific checkpoint in privacy request execution | ||
|
||
If there's no from_checkpoint specified we should always run the current checkpoint. | ||
""" | ||
if not from_checkpoint: | ||
return True | ||
return EXECUTION_CHECKPOINTS.index( | ||
request_checkpoint | ||
) >= EXECUTION_CHECKPOINTS.index(from_checkpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already had this ability in privacy request execution to resume from a specific point in the cause of a pause or failure, but this formalizes it a bit, and makes it easier to extend now that we are starting to add multiple locations from which a privacy request can be resumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this abstraction. Makes it easier to understand as we scale the number of execution checkpoints 💯
config = EmailSchema(**self.configuration.secrets or {}) | ||
logger.info("Starting test connection to %s", self.configuration.key) | ||
|
||
db = Session.object_session(self.configuration) | ||
|
||
try: | ||
dispatch_email( | ||
db=db, | ||
action_type=EmailActionType.EMAIL_ERASURE_REQUEST_FULFILLMENT, | ||
to_email=config.test_email, | ||
email_body_params={ | ||
"test_collection": CheckpointActionRequired( | ||
step=CurrentStep.erasure, | ||
collection=CollectionAddress("test_dataset", "test_collection"), | ||
action_needed=[ | ||
ManualAction( | ||
locators={"id": ["example_id"]}, | ||
get=None, | ||
update={ | ||
"test_field": "null_rewrite", | ||
}, | ||
) | ||
], | ||
) | ||
}, | ||
) | ||
except EmailDispatchException as exc: | ||
logger.info("Email connector test failed with exception %s", Pii(exc)) | ||
return ConnectionTestStatus.failed | ||
return ConnectionTestStatus.succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a test_email is configured (you'd set up a test_email that you had access to), you can use this to send a test email to yourself. The data inside is just dummy data. It really just confirms your email config is working.
# Send erasure requests via email to third parties where applicable | ||
if can_run_checkpoint( | ||
request_checkpoint=CurrentStep.erasure_email_post_send, | ||
from_checkpoint=resume_step, | ||
): | ||
try: | ||
email_connector_erasure_send( | ||
db=session, privacy_request=privacy_request | ||
) | ||
except EmailDispatchException as exc: | ||
privacy_request.cache_failed_checkpoint_details( | ||
step=CurrentStep.erasure_email_post_send, collection=None | ||
) | ||
privacy_request.error_processing(db=session) | ||
await fideslog_graph_failure( | ||
failed_graph_analytics_event(privacy_request, exc) | ||
) | ||
# If dev mode, log traceback | ||
_log_exception(exc, config.dev_mode) | ||
return | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the primary change of this PR: if applicable, we combine cached data from all the collections on an email-based dataset and send a single email for each dataset. There's no batching currently. If there is no data to mask / the connection was read-only, etc. no email is sent.
except PrivacyRequestErasureEmailSendRequired as exc: | ||
self.log_end(action_type, ex=None, success_override_msg=exc) | ||
self.resources.cache_erasure( | ||
f"{self.traversal_node.address.value}", 0 | ||
) # Cache that the erasure was performed in case we need to restart | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following a pattern we use elsewhere - I raise an Exception in the EmailConnector.mask_data
to update the execution log as it is a concern of the GraphTask and not the individual EmailConnectors. In this case, I want to add a more specific message to the erasure ExecutionLog as the action is not technically completed until we send the email at the end.
@@ -102,6 +111,7 @@ def run_webhooks_and_report_status( | |||
webhook.key, | |||
) | |||
privacy_request.error_processing(db) | |||
privacy_request.cache_failed_checkpoint_details(current_step) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using our new request execution checkpoints this lets us resume from the pre or post execution webhooks step later if they fail. We used to be able to resume from pause but had no handling to resume from failure.
@pattisdr I ran through and combined the two email guides and updated them to match the formatting - let me know if this works for you? |
- Subject Identity Verification - for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide. | ||
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [Email Connectors](email_connectors.md) for more information. | ||
- Subject Identity Verification - sends a verification code to the user's email address prior to for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide. | ||
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [creating Email Connectors](#create-an-email-connector) for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conceptualshark I don't think this goes anywhere
|
||
Supported modes of use: | ||
|
||
- Subject Identity Verification - for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide. | ||
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [Email Connectors](email_connectors.md) for more information. | ||
- Subject Identity Verification - sends a verification code to the user's email address prior to for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conceptualshark Sentence doesn't seem complete... "prior to executing a privacy request" or similar?
| Field | Description | | ||
|----|----| | ||
| `key` | A unique key used to manage your email connector. This is auto-generated from `name` if left blank. Accepted values are alphanumeric, `_`, and `.`. | | ||
| `name` | A unique user-friendly name for your email connector. | | ||
| `connection_type` | Must be `email` to create a new email connector. | | ||
| `access` | Email connectors must be given `write` access in order to send an email. | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice formatting here will try to remember this for future docs drafts @conceptualshark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching these - ran through a bit fast 😨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @pattisdr ! I've manually tested that:
- The test email was successfully sent upon adding secrets to the email connector:
Nothing blocking, just minor syntax issue in docs.
Note for future - this is another feature that relies on an email config being set up, and it's currently possible for email config not found exceptions to happen at run time during a privacy request. To prevent this, we can have Admin UI check for a valid email config at any point in the setup flow
db = Session.object_session(self.configuration) | ||
|
||
try: | ||
dispatch_email( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just keeping an eye on #1173 for possible conflicts depending on which is merged first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks for the callout
@@ -380,14 +389,14 @@ def cache_email_connector_template_contents( | |||
|
|||
def get_email_connector_template_contents_by_dataset( | |||
self, step: CurrentStep, dataset: str | |||
) -> Dict[str, Optional[CollectionActionRequired]]: | |||
) -> Dict[str, Optional[CheckpointActionRequired]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create a new class for Dict[str, Optional[CheckpointActionRequired]]
? I'd prefer to be explicit on what the str
represents here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, added a new type EmailRequestFulfillmentBodyParams
that is created from template data we pull out of the cache by dataset
def can_run_checkpoint( | ||
request_checkpoint: CurrentStep, from_checkpoint: Optional[CurrentStep] = None | ||
) -> bool: | ||
"""Determine whether we should run a specific checkpoint in privacy request execution | ||
|
||
If there's no from_checkpoint specified we should always run the current checkpoint. | ||
""" | ||
if not from_checkpoint: | ||
return True | ||
return EXECUTION_CHECKPOINTS.index( | ||
request_checkpoint | ||
) >= EXECUTION_CHECKPOINTS.index(from_checkpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this abstraction. Makes it easier to understand as we scale the number of execution checkpoints 💯
```json title="<code>PUT api/v1/connection/{email_connection_config_key}/secret</code>" | ||
{ | ||
"test_email": "[email protected]", | ||
"to_email": "[email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing quote "
…he cached email details are extracted by dataset.
…onnector_email_send # Conflicts: # tests/ops/integration_tests/test_integration_email.py
Comments addressed; back to you @eastandwestwind! |
thanks for addressing the comments, looks great @pattisdr ! |
@pattisdr I think the basic collection/field sections are probably fine as-is - could we add something more comprehensive to explain its purpose, and could we add in the org name to the template?
And then the list you already have generating, I think. |
great improvements, thank you @conceptualshark, I'll add this in a follow-up ticket |
* Send an email for each email-based dataset at the end of privacy request execution. - Add a migration to create a new audit log type. Create an audit log for the email send. - Throw an exception for email-based connectors and catch to override the default execution log. - Add a draft of an email template - Connect sending a "test email" with dummy data. A fidesops admin could configure to check their email config was working. * Add more "checkpoints" to privacy request execution - these are locations from which we can resume privacy request execution without having to run from the beginning. - Add more options to CurrentStep Enum - Cache the checkpoint if an email send fails, so we can retry from the same step. * Don't send an email if the connection config is read only or there are no updates to be applied to any of the collections on the dataset. * Don't assume there's a collection when building "resume" details. A failed privacy request can be resumed outside of the traversal. * Add a first draft of docs for setting up an email connector. * Moves the email connector send method to the email connector file. * Update mock location. * Bump downrev. * update email connector guides * correct link, broken sentence * Create a new EmailRequestFulfillmentBodyParams type to be used once the cached email details are extracted by dataset. * Fix missed test. Co-authored-by: Cole <[email protected]>
❗ Contains migration
❗
Don't merge until #1240 is resolvedPurpose
Adds the final PR for the EmailConnector. An EmailConnector sends an email at the end of privacy request execution to third parties with instructions on how to mask data that we can't access automatically. Also introduces the concept of "checkpoints" for privacy request execution to give us more locations from which to resume a paused/failed privacy request.
This allows us to retry just the email send and everything else downstream if the email send fails, and makes it easier to add other checkpoints in the future.
Changes
email_connector_erasure_send
after the access and erasure steps of privacy request execution. It collects data that was cached by the traversal for any collections on email-related datasets, and fires off a single email for that dataset.EmailConnector.mask_data
raise a specific Exception just to override the ExecutionLog message for the collection, to indicate that the email was prepared. The email itself is not sent until later. ExecutionLog creation is the responsibility of the GraphTask, not an individual email connector. We use similar exceptions to "pause" a privacy request.test_email
on the EmailConnector secrets, so the fidesops admin can send a test email to themselves with test data in it, to confirm their email config is setup properly.Checklist
CHANGELOG.md
fileCHANGELOG.md
file is being appended toUnreleased
section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.Run Unsafe PR Checks
label has been applied, and checks have passed, if this PR touches any external servicesTicket
Fixes #1158