[Office 365] - Improve ECS utilization #4319

Closed
defendable-forfot opened this issue Sep 26, 2022 · 10 comments

@defendable-forfot

defendable-forfot commented Sep 26, 2022

We are ingesting O365 data into our Elasticsearch for search, detection in Elastic Security and visualization through Kibana. However, we have noticed a few areas for improvement within the module. What is most interesting with this module is how data is ingested. The most interesting data related to the events seems to be placed entirely within the o365.audit.Data field. This makes search and extraction of data from the log source difficult. Ideally the parsing should be done directly in the Filebeat module. We believe there is data within the field that can be used to populate other, more relevant, ECS fields.

Note: we are running filebeat version 8.1.3, but have noticed that none of the newer releases solves our issues.

  • o365.audit.UserKey
    • ECS fields: user.domain | user.name | user.id | user.email | related.user
    • Suggestion:
      • In cases where the kibana.alert.rule.name is “O365 Exchange Suspicious Mailbox Right Delegation”, the user.name field is parsed incorrectly. The data looks like “domain\user”. The domain should instead be placed in user.domain, and user.name should contain only the actual username. Additionally, user.email has the same issue: the domain is included before the actual email address. The domain should not be included in the user.email field.
      • In cases where the UserKey field is “SecurityComplianceAlerts” it is not an actual user, but the user.id field is also set to “SecurityComplianceAlerts”. The actual username related to the event can in these cases be extracted from the o365.audit.Data field.
  • o365.audit.UserId
    • ECS fields: user.domain | user.name | user.id | user.email | related.user
    • Suggestion: For record types that are not related to Microsoft Exchange, Azure and SecurityComplianceCenterCommand the UserId field is not an actual user, but a variation of SecurityCompliance*. Data related to the actual user can in these cases be extracted from the o365.audit.Data field.
  • o365.audit.Parameters.User
    • ECS fields: related.user
    • Suggestion: In cases where o365.audit.Workload is “Exchange” the o365.audit.Parameters.User field contains data related to the user on which the action is being performed. Based on this it is relevant to extract the information and include it in the related.user field.
  • o365.audit.Data.f3u
    • ECS fields: user.domain | user.name | user.id | user.email | related.user
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter” and the o365.audit.RecordType is not “24”, the o365.audit.Data.f3u field is set to “SecurityComplianceEvent” and the field does not contain actual user data. This should not populate the ECS fields mentioned above.
  • o365.audit.Data.suid
    • ECS fields: user.domain | user.name | user.id | user.email | related.user | client.user
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter”, the o365.audit.RecordType is not “24” and o365.audit.Data.suid is set to either “SecurityComplianceInsights” or “SecurityComplianceEvent”, the field does not contain an actual user value. The client.user field should only be populated if the o365.audit.ClientIP field is also populated. This would prevent the field from being populated when it is not related to an actual user.
  • o365.audit.Data.isda
    • ECS fields: related.user
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter” and the o365.audit.RecordType is not “24”, the o365.audit.Data.isda field contains an array of user objects from o365. These should be used to populate the related.user field.
  • o365.audit.Data.tsd
    • ECS fields: related.user
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter” and the o365.audit.RecordType is not “24”, the o365.audit.Data.tsd field contains data related to the sender of an email. This should be used to populate the related.user field. However, sometimes this field can also be malformed and only contain “<>” or a partially parsed utf8 string. Exceptions should be made in these cases.
  • o365.audit.Data.trc
    • ECS fields: user.domain | user.name | user.id | user.email | related.user | client.user
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter” and the o365.audit.RecordType is not “24”, the o365.audit.Data.trc field is a recipient of an email. This should be parsed and used to populate the mentioned ECS fields. The client.user field should only be populated if the o365.audit.ClientIP field is also populated.
  • o365.audit.Data.zu
    • ECS fields: url.domain | url.extension | url.path | url.original | url.scheme | url.subdomain
    • Suggestion: In all cases where the o365.audit.Data.zu is populated it contains a URL. This data should be parsed and appropriately used to populate the ECS fields mentioned above.
  • o365.audit.Data.reid
    • ECS fields: url.domain | url.extension | url.path | url.original | url.scheme | url.subdomain
    • Suggestion: In cases where o365.audit.Workload is “SecurityComplianceCenter”, the o365.audit.RecordType is not “24” and the o365.audit.Data.zu field is not populated, the o365.audit.Data.reid field may contain URL-related data concatenated with o365.audit.Data.rid data. If the event is related to a file, the field will not contain a URL; if the event is related to a URL, it will. In the latter case the URL should be parsed and used to populate the ECS fields mentioned above.
  • o365.audit.Data.alk
    • ECS fields: event.url
    • Suggestion: In cases where the o365.audit.Data.alk field is populated, it will contain the URL leading to the actual event. This should populate the event.url field.
  • o365.audit.Parameters.DomainName
    • ECS fields: url.domain | url.extension | url.path | url.original | url.scheme | url.subdomain
    • Suggestion: In cases where the o365.audit.Parameters.DomainName is populated, it will contain a domain. This data should be used to populate the ECS fields mentioned above.
  • o365.audit.Name
    • ECS fields: message | rule.name
    • Suggestion: The message field is currently populated with the contents “New Alert” in events where the o365.audit.Name field exists, while rule.name is populated with the o365.audit.Name data. We would also like to see the message field populated with the corresponding data.
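
As a rough illustration of the URL-related suggestions in the list above (for example o365.audit.Data.zu), the sketch below splits a URL string into the ECS url.* fields named there. It is Python rather than pipeline configuration, the function name url_to_ecs and the sample URL are made up for the example, and url.subdomain is computed naively by assuming the registered domain is the last two labels, which is wrong for suffixes such as co.uk.

```python
import posixpath
from urllib.parse import urlsplit


def url_to_ecs(original: str) -> dict:
    """Split a URL into ECS url.* fields (illustrative sketch only)."""
    parts = urlsplit(original)
    host = parts.hostname or ""
    fields = {"url.original": original}
    if parts.scheme:
        fields["url.scheme"] = parts.scheme
    if host:
        fields["url.domain"] = host
    if parts.path:
        fields["url.path"] = parts.path
        # ECS stores the extension without the leading dot.
        extension = posixpath.splitext(parts.path)[1].lstrip(".")
        if extension:
            fields["url.extension"] = extension
    # Naive assumption: the registered domain is the last two labels.
    labels = host.split(".")
    if len(labels) > 2:
        fields["url.subdomain"] = ".".join(labels[:-2])
    return fields
```

Calling url_to_ecs("https://files.example.com/docs/report.pdf") would yield the scheme, domain, path, extension and subdomain as separate keys. A real implementation would more likely rely on a public-suffix-aware parser for the subdomain.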

Additionally, we believe the ECS specification should be improved with the introduction of a new field within the Related fields section. Certain third-party data sources, the O365 module included, send events where multiple URLs are present. An optimal solution would be to add this data to a related.domain or related.url field, none of which currently exist.

This is a copy of https://discuss.elastic.co/t/office-365-filebeat-module-improve-ecs-utilization/315126, as I was recommended to post this as a GitHub issue instead.

@botelastic

botelastic bot commented Sep 26, 2022

This issue doesn't have a Team:<team> label.

@jamiehynds jamiehynds transferred this issue from elastic/beats Sep 27, 2022
@WildDogOne
Contributor

WildDogOne commented Oct 24, 2022

I so much agree with you on this!
I can open a pull request and try to work on this issue if nobody else has time

However it would help a lot if you could add an example event for each of the problems.
Of course I can dig through my own O365, but it's not proving easy ;)
but for example, the first issue you mention, does not affect me, because my userids come in the [email protected] format

@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@jamiehynds jamiehynds added the Integration:o365 Microsoft Office 365 label Feb 9, 2023
@khalavak

I second this fully. The 365 audit data to ECS field extractions should really be improved as currently it is very hard to work with and customisations have to be made in order for the Elastic Alerts and data to be usable by Security Analysts working in the SIEM.

@jamiehynds jamiehynds changed the title [Filebeat] [Office 365 module] - Improve ECS utilization [Office 365] - Improve ECS utilization Jul 26, 2023
@jamiehynds

Another issue to focus on as we work through O365 improvements:

#5013

@jamiehynds

jamiehynds commented Oct 3, 2023

Additional feedback:

Using the standard o365 integration audit logs, the field o365.audit.Data contains JSON data that is pertinent to the event. The issue is that this field is mapped as a keyword and is not processed further. This field needs to be flattened, and the members of the JSON object should also be ingested as individual fields. This will allow for the better alert analysis required by human analysts.

Suggested mappings:

IP
o365.audit.Data.sip - ip

Date
o365.audit.Data.ts - date
o365.audit.Data.te - date
o365.audit.Data.at - date
o365.audit.Data.ttdt - date
o365.audit.Data.md - date

Keyword
o365.audit.Data.tid - keyword
o365.audit.Data.lon - keyword
o365.audit.Data.op - keyword
o365.audit.Data.an - keyword
o365.audit.Data.ad - keyword
o365.audit.Data.sev - keyword
o365.audit.Data.rid - keyword
o365.audit.Data.reid - keyword
o365.audit.Data.cid - keyword
o365.audit.Data.tht - keyword
o365.audit.Data.etype - keyword
o365.audit.Data.eid - keyword
o365.audit.Data.f3u - keyword
o365.audit.Data.als - keyword
o365.audit.Data.wl - keyword
o365.audit.Data.ut - keyword
o365.audit.Data.suid - keyword
o365.audit.Data.ail - keyword
o365.audit.Data.von - keyword
o365.audit.Data.sitmi - keyword
o365.audit.Data.dpn - keyword
o365.audit.Data.trc - keyword
o365.audit.Data.aii - keyword
o365.audit.Data.tsd - keyword
o365.audit.Data.ms - keyword
o365.audit.Data.dm - keyword
o365.audit.Data.ttr - keyword
o365.audit.Data.tpt - keyword
o365.audit.Data.tpid - keyword
o365.audit.Data.thn - keyword
o365.audit.Data.imsgid - keyword
o365.audit.Data.fvs - keyword
o365.audit.Data.zu - keyword
o365.audit.Data.pud - keyword
o365.audit.Data.sict - keyword
o365.audit.Data.plk - keyword
o365.audit.Data.mat - keyword
o365.audit.Data.alk - keyword
o365.audit.Data.zmfn - keyword
o365.audit.Data.zmfh - keyword
o365.audit.Data.zfn - keyword
o365.audit.Data.zfh - keyword
o365.audit.Data.sid - keyword
o365.audit.Data.etps - keyword
o365.audit.Data.upfv - keyword
o365.audit.Data.upfc - keyword
o365.audit.Data.ot - keyword
o365.audit.Data.od - keyword

Keyword - this had no analytical value in my instances, but could be helpful for other customers
o365.audit.Data.tdc - keyword
o365.audit.Data.af - keyword
o365.audit.Data.ssic - keyword
o365.audit.Data.cpid - keyword
o365.audit.Data.srt - keyword
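
To make the suggested mappings concrete, here is a minimal Python sketch of the idea: decode the Data string as JSON and normalise each member according to the type suggested above. The function name parse_data_blob and the sample values are hypothetical, and the sketch assumes date values arrive as ISO 8601 strings, which the thread does not confirm. Values that fail to parse are kept as plain strings.

```python
import ipaddress
import json
from datetime import datetime

# Field names taken from the suggested mappings above.
IP_FIELDS = {"sip"}
DATE_FIELDS = {"ts", "te", "at", "ttdt", "md"}


def parse_data_blob(raw: str) -> dict:
    """Flatten the JSON string in o365.audit.Data into typed fields."""
    blob = json.loads(raw)
    out = {}
    for key, value in blob.items():
        name = f"o365.audit.Data.{key}"
        try:
            if key in IP_FIELDS:
                out[name] = str(ipaddress.ip_address(value))
            elif key in DATE_FIELDS:
                # Assumption: dates are ISO 8601 strings.
                out[name] = datetime.fromisoformat(value).isoformat()
            else:
                out[name] = str(value)  # everything else is a keyword
        except (ValueError, TypeError):
            # Keep unparseable values as keywords instead of dropping them.
            out[name] = str(value)
    return out
```

Falling back to a keyword on a parse failure is a design choice for the sketch: it avoids losing data when an undocumented field does not match the assumed type.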

@chrisberkhout chrisberkhout self-assigned this Oct 4, 2023
@chrisberkhout
Contributor

chrisberkhout commented Nov 8, 2023

@defendable-forfot, @WildDogOne & @khalavak,

If you can provide example data for the Data.*, Parameters.User or Parameters.DomainName fields, that would be very helpful.

For example, the data I've seen shows Data.f3u and Data.suid fields having values like [email protected], rather than SecurityComplianceEvent or SecurityComplianceInsights as mentioned in the issue description.

I haven't been able to find documentation of the various Data.* fields, except for the Office 365 Management Activity API schema documentation describing Data as being one of:

  1. The detailed data blob of the alert or alert entity. (here)
  2. Data string which contains more details about investigation entities, and information about alerts related to the investigation. Entities are available in a separate node within the data blob. (here)

If documentation of these individual alert or investigation fields does exist, any tips would be much appreciated.

@chrisberkhout
Contributor

chrisberkhout commented Dec 12, 2023

Below I have attempted to restate and respond to each of @defendable-forfot's suggestions.

Many of the suggestions relate to undocumented fields or values that may vary between environments and for which sample data is not currently available.

The relevant upstream documentation is Office 365 Management Activity API schema. For the o365.audit.Data field we only have a small amount of example data, which is listed under "Known example values for the Data parameter" in #8571.

In this round of improvements I intend to:

  • Index known fields.
  • Improve ECS utilization where the meaning of a value can be verified using documentation, sample data or the structure of the value itself.

The original suggestions refer to Filebeat's Office 365 module, but I will attempt to apply them to the preferred, Agent-based Microsoft 365 Elastic Integration. Wherever the suggestions don't seem to apply, the change to the Agent-based implementation may explain the mismatch.

Responses to each suggestion are inline in > bold.

Miscellaneous suggestions

A suggestion about o365.audit.Data fields

The most interesting data related to the events seem to be all placed within the o365.audit.Data field. This makes search and extraction of data from the log source difficult. Ideally the parsing should be done directly in the Filebeat module.

> There is a PR to parse and index this data in the Microsoft 365 integration, here: #8571

A suggestion about new related.* ECS fields

ECS could be improved by adding related.domain or related.url fields, to be used by data sources, including the o365 module, that send events with multiple URLs.

> The closest existing field is related.hosts, which is for "All hostnames or other host identifiers seen on your event. Example identifiers include FQDNs, domain names, workstation names, or aliases."

> I've added an ECS issue, Add related.url field, to discuss this proposal further.

A suggestion about o365.audit.Name

When o365.audit.Name exists, its value populates rule.name.

In such cases the message field could also take that value, instead of New alert.

> The ECS message field value description says "For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event.".

> Currently, message is set to the value of the incoming field Comments for SecurityComplianceAlerts events (an example value is "New alert"), or the incoming field ExchangeMetadata.Subject for ComplianceDLPExchange events (the value being an email subject line).

> The Comments and Name values could be concatenated into message for a richer description, but this cosmetic improvement would come at the cost of having the Comments value unavailable in its unmodified form. I think it's best not to change this for now.

User data suggestions

Unless otherwise indicated, these suggestions relate to the population of the ECS fields user.domain, user.email, user.id, user.name, and related.user.

Parsing for user.name and user.email values

In some cases, including some involving Exchange, user.name and user.email values have a domain prefix (domainname\) which should be removed and used to populate user.domain.

> Note: this suggestion was given in connection with the o365.audit.UserKey field and the O365 Exchange Suspicious Mailbox Right Delegation detection rule.

> The current logic for populating user.email, user.name, and user.domain will map an incoming value of user@inetdomain.com to "user.email": "user@inetdomain.com", "user.name": "user", and "user.domain": "inetdomain.com".

> Although user.domain is an appropriate field for storing both Windows networking domains and Internet domains, before attempting to extract Windows networking domains from user.name and user.email values I would like to 1) have example data (none of our current examples have the Windows networking domain prefix), and 2) be able to clearly distinguish between a Windows networking domain prefix separated by a backslash and other uses of a backslash (valid email addresses may contain backslashes in the user name).
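
The ambiguity described here can be sketched in Python. This is not the integration's pipeline logic, only an illustration: the function name split_identity and the CONTOSO and jdoe values are hypothetical. The sketch treats a backslash as a Windows domain separator only when the value contains no "@", and deliberately returns nothing when both appear, since that is the case that cannot be decided without example data.

```python
def split_identity(value: str) -> dict:
    """Map an identity string to ECS user.* fields (illustrative sketch)."""
    has_backslash = "\\" in value
    has_at = "@" in value
    if has_backslash and has_at:
        # Ambiguous: either a domain prefix on an email address, or a
        # backslash inside the email user name. Leave it to the caller.
        return {}
    if has_backslash:
        domain, _, name = value.partition("\\")
        if domain and name:
            return {"user.domain": domain, "user.name": name}
        return {}
    if value.count("@") == 1:
        name, _, domain = value.partition("@")
        if name and domain:
            return {
                "user.email": value,
                "user.name": name,
                "user.domain": domain,
            }
    return {}
```

With this sketch, split_identity("user@inetdomain.com") matches the mapping described above, while a value such as "CONTOSO\\jdoe" (as a Python string literal) would be split into a domain and a name.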

o365.audit.UserId and o365.audit.UserKey non-user value

Where UserId or UserKey matches /^SecurityCompliance.*/, that value should not be set in user.id.

The actual user data may be available in o365.audit.Data.

> Note: The UserId point was noted as being the case for "record types that are not related to Microsoft Exchange, Azure and SecurityComplianceCenterCommand". There is a large number of such record types.

> Currently, there is no reference to UserKey in the pipeline configuration. Its incoming value is retained as o365.audit.UserKey. The UserId field is renamed to user.id.

> The Management Activity API schema: Common schema documentation describes UserId as "The UPN (User Principal Name) of the user who performed the action (specified in the Operation property) that resulted in the record being logged; for example, my_name@my_domain_name. Note that records for activity performed by system accounts (such as SHAREPOINT\system or NT AUTHORITY\SYSTEM) are also included."

> Although values such as SecurityComplianceAlerts seem to refer to a service or function rather than a user or even a system account, I think the choice of this value for UserId in upstream API logic should not be overridden in the pipeline logic.
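
For integration users who nonetheless want to apply the original suggestion in their own downstream processing, the check could look roughly like the following Python sketch. The pattern comes from the suggestion above; the function name drop_non_user_id and the flat dotted-key document shape are assumptions made for the example.

```python
import re

# Pattern from the suggestion: /^SecurityCompliance.*/
NON_USER_ID = re.compile(r"^SecurityCompliance.*")


def drop_non_user_id(doc: dict) -> dict:
    """Return a copy of doc without user.id when it is a service name."""
    cleaned = dict(doc)
    user_id = cleaned.get("user.id")
    if isinstance(user_id, str) and NON_USER_ID.match(user_id):
        del cleaned["user.id"]
    return cleaned
```

The original value is still available in o365.audit.UserKey or the raw event, so removing it from user.id in a copy does not lose information.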

o365.audit.Parameters.User has user data

A value in o365.audit.Parameters.User can be put in related.user.

In cases where o365.audit.Workload="Exchange" that value will relate to the user on which the action is being performed.

> Available example data includes values for this field such as:

EURPR01A002.prod.outlook.com/Microsoft Exchange Hosted Organizations/testsiem.onmicrosoft.com/Discovery Management

> I will open a PR for this change. PR: #8803

o365.audit.Data.* user data

The following fields are suggested to contain user data, in particular when o365.audit.Workload="SecurityComplianceCenter" and o365.audit.RecordType!="24":

  • o365.audit.Data.f3u
    Unless its value is SecurityComplianceEvent.
  • o365.audit.Data.suid
    Unless its value is SecurityComplianceInsights or SecurityComplianceEvent. Also, client.user should be populated only if o365.audit.ClientIP is also populated.
  • o365.audit.Data.isda
    As an array of user objects that should populate related.user.
  • o365.audit.Data.tsd
    Unless its value is <> or a "partially parsed utf8 string". Represents the sender of an email and should populate related.user.
  • o365.audit.Data.trc
    Represents the recipient of an email. Also, client.user should be populated only if o365.audit.ClientIP is also populated.

> Note: The RecordType=24 corresponds to member name "Discover", described as "Events for eDiscovery activities performed by running content searches and managing eDiscovery cases in the Security & Compliance Center."

> The Data.isda field is not in the list of known fields used for #8571, but that PR will make its value available in o365.audit.Data.flattened.isda. Before indexing that field directly under o365.audit.Data, it would be good to receive confirmation of its use, and example data.

> Although the presence of an incoming ClientIP suggests there is an initiator of a network connection related to this event, the client.user field set seems redundant when not used to distinguish between an initiator (client) and a responder (server).

> Available example data shows f3u, suid, tsd and trc as having values that match the format of an email address. The user.email and user.id fields could potentially be populated with these values, but given their undocumented and uncertain meaning, I think a better choice is to add values that appear to be email addresses into related.user to aid discovery and allow integration users to do any further interpretation of these values themselves.

> I will open a PR to add f3u, suid, tsd and trc values to related.user when they are in email address format. PR: #8803
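
The approach described here, adding only email-shaped values to related.user, can be illustrated with a short Python sketch. It is not the code from the PR: the function name related_users, the deliberately loose EMAIL_LIKE pattern and the sample addresses are all assumptions for the example.

```python
import re

# Loose check: something@something.tld with no spaces or angle brackets.
EMAIL_LIKE = re.compile(r"^[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+$")
CANDIDATE_KEYS = ("f3u", "suid", "tsd", "trc")


def related_users(data: dict, existing=()) -> list:
    """Append email-shaped Data.* values to a related.user list."""
    related = list(existing)
    for key in CANDIDATE_KEYS:
        value = data.get(key)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if (
                isinstance(item, str)
                and EMAIL_LIKE.match(item)
                and item not in related
            ):
                related.append(item)
    return related
```

Values such as SecurityComplianceInsights or <> do not match the pattern and are skipped, which covers the exceptions listed in the original suggestions without interpreting what the fields mean.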

URL data suggestions

Unless otherwise indicated, these suggestions relate to the population of the ECS fields url.domain, url.extension, url.original, url.path, url.scheme, and url.subdomain.

o365.audit.Parameters.DomainName has domain data

When present, use it to populate the relevant ECS fields.

> The Parameters field contains the "name and value for all parameters that were used with the cmdlet". For the Exchange Admin schema this is a cmdlet that is identified in the Operations property. For the Security and Compliance Center schema it is noted that this will not include PII.

> There is no example data available for this field. It's unclear whether a DomainName value would refer to the domain of a URL and be suitable for url.domain, or to a Windows Networking domain which would not. I would want to confirm the meaning of this field and have example data before populating ECS fields with its value.

o365.audit.Data.* URL data

The following fields are suggested to contain URL data:

  • o365.audit.Data.zu
    Whenever it's populated.
  • o365.audit.Data.reid
    Concatenated with Data.rid data, in cases where o365.audit.Data.zu is not populated and the event relates to a URL not to a file, and in particular, when o365.audit.Workload="SecurityComplianceCenter" and o365.audit.RecordType!="24".
  • o365.audit.Data.alk
    When populated it contains a URL for the actual event, which should be used to populate the event.url field.

> In example data, we have reid values of "cannot be shared" (from a public blog post, likely not the value delivered by the API) and "23a5e271-e297-4f35-ff57-08d7b17f5bf2" (from test data). If reid can contain a concatenation of different types of data, it may be difficult to dependably extract a URL from it. For zu and alk we have no example data.

> A URL value from an undocumented field may be easier to use than other values because a URL has a specific, strictly defined format. However, before attempting to extract URLs from zu, alk or other fields I would want to have some example data that confirms their presence.

@jamiehynds

Hey @chrisberkhout - do you think we can close this issue on the back of the v2.1.0 update to O365, or are there still some outstanding items to address?

@chrisberkhout
Contributor

I think this is done for now. We can revisit it in the future if we get more feedback and data.
The changes made were:


6 participants