Adding base functionality for source confidence scoring #439

Open
wants to merge 15 commits into
base: main

@redsand redsand commented Nov 5, 2020

Adds base functionality for a new scoring technique. It is implemented as a MISP module; I am hoping to see it brought into the app itself.

@lgtm-com

lgtm-com bot commented Nov 5, 2020

This pull request introduces 24 alerts when merging e1e7d49 into 900fe56 - view on LGTM.com

new alerts:

  • 17 for Unused import
  • 4 for Except block handles 'BaseException'
  • 2 for Unused local variable
  • 1 for Module is imported more than once

@lgtm-com

lgtm-com bot commented Nov 5, 2020

This pull request introduces 13 alerts when merging 7e77058 into 900fe56 - view on LGTM.com

new alerts:

  • 7 for Unused import
  • 4 for Except block handles 'BaseException'
  • 2 for Unused local variable

@lgtm-com

lgtm-com bot commented Nov 11, 2020

This pull request introduces 13 alerts when merging 79acdec into ab23547 - view on LGTM.com

new alerts:

  • 7 for Unused import
  • 4 for Except block handles 'BaseException'
  • 2 for Unused local variable

@mokaddem
Contributor

Hello @redsand!
Thanks a lot for your pull request. Please find my comments below:

  1. I am curious why this script does not rely on MISP's built-in decaying model.
    • Is something missing, or not working the way you'd like, in MISP's default implementation?
  2. I see some debugging leftovers (comments and commented-out code) that should be cleaned up.
  3. Querying MISP again for every value is extremely costly and not feasible in a production system:
    • results = misp.search(value=input_attribute['value'])
    • You could query MISP from a separate script, but having this step in the pipeline for every value is too costly
      • Each query involves the handshake + authentication + full database search + returning the value + processing the output
  4. I would have expected this module to handle only the confidence part and rely on MISP's built-in decaying implementation. So, only this part:
    • final_score = ( total_score / confidence) * 100.0 # make it a pct
    • Where the complete steps would be:
      1. The user issues a restSearch with decaying enabled, filtering out expired data
      2. MISP computes the decaying score
      3. MISP provides the score and additional data to the MISP module
      4. The MISP module returns either a weight or the modified score
      5. MISP filters the results based on the MISP module's feedback and returns the data to the user
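
A minimal sketch of steps 4 and 5, to make the proposed split concrete. Only the percentage formula comes from the PR; everything else is an assumption for illustration: the function names, the dictionary keys (`decay_score`, `total_score`, `confidence`), and the choice to combine the two values by multiplying the decaying score by the confidence expressed as a 0..1 weight. It does not use any MISP or misp-modules API.

```python
from typing import Dict, List


def confidence_percentage(total_score: float, confidence: float) -> float:
    """The formula quoted from the PR: total_score as a percentage of confidence."""
    if confidence <= 0:
        raise ValueError("confidence must be positive")
    return (total_score / confidence) * 100.0


def weight_decay_score(decay_score: float, weight: float) -> float:
    """Hypothetical step 4: scale the decaying score by a 0..1 weight."""
    return decay_score * weight


def filter_by_score(attributes: List[Dict], threshold: float) -> List[Dict]:
    """Hypothetical step 5: keep attributes whose modified score meets the threshold."""
    kept = []
    for attr in attributes:
        # Assumed keys; MISP would supply the decaying score in step 3.
        weight = confidence_percentage(attr["total_score"], attr["confidence"]) / 100.0
        modified = weight_decay_score(attr["decay_score"], weight)
        if modified >= threshold:
            kept.append({**attr, "modified_score": modified})
    return kept
```

With this shape the module never queries MISP itself: it receives the decaying score, applies the confidence weighting, and hands back a number that MISP can filter on.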

Let us know what you think!

@redsand
Author

redsand commented May 20, 2021

  1. This implementation was chosen because the MISP team recommended it during our conference call last year. I am not familiar with the broader MISP project's codebase, so I followed that suggestion.

  2. I can certainly remove any debugging output, oopsie!

  3. Querying back for all of the attribute's data is required to calculate the score (total_score) properly, since it is a representation of the attribute and its properties for each source provider. I have addressed the cost by precalculating all attributes and updating their scores periodically. More specifically, in our internal implementation, all items are scored and exported as CSV files for real-time processing by our MDR platform.

  4. This is meant to help your team better understand how the paper is written and identify the best way to apply this feature at the production level. I noticed several workflows (as you have) that do not complement the method described in the research paper. For us, we are able to use the source confidence tables, along with processing the data on export, to calculate all values at that time, and we then simply perform this export every X hours or days.
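
A rough sketch of the precalculate-and-export approach described in item 3, assuming the scores have already been computed elsewhere. The function name, the column names (`value`, `type`, `score`), and the two-decimal formatting are hypothetical and not taken from the PR; only the standard-library `csv` module is used.

```python
import csv
from typing import Iterable, Mapping, TextIO


def export_scores(scored: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write precalculated scores as CSV rows and return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["value", "type", "score"])  # assumed column layout
    count = 0
    for item in scored:
        writer.writerow([item["value"], item["type"], f"{float(item['score']):.2f}"])
        count += 1
    return count
```

A scheduled job (cron or similar) could call this every X hours or days with a file opened via `open(path, "w", newline="")`, so downstream consumers read the CSV instead of querying MISP per value.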
