Scripts for Mining Dataset of RefMiner #99

Open
wants to merge 18 commits into
base: master

lyriccoder
Member

@lyriccoder lyriccoder commented Nov 23, 2020

  1. A script that runs RefMiner 2.0 on each folder in parallel
  2. Code similarity check (with tests)

The similarity check follows an approach similar to the one described in:
https://link.springer.com/article/10.1007/s11219-019-09442-9

The similarity between two code smells is based on their text, thanks to this SequenceMatcher, which relies on Ratcliff and Obershelp’s algorithm, published in the late 1980s under the name “gestalt pattern matching.” The main idea of the algorithm is to find the longest contiguous matching subsequence between two compared sequences. We consider two smells as the same if they are from the same smell type (among the 12 studied code smells), and if their similarity degree is greater than 0.7. If one smell of C1 gets a similarity degree greater than 0.7 with two smells of C2, we match it with the one with the highest similarity value.

On top of that, I added a Hamming distance check. If the Ratcliff and Obershelp similarity is greater than 0.7, we then check whether the Hamming distance is larger than 0.4.
Finally, the number of matched strings divided by the total number of strings must be greater than 0.7.
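
To make the thresholds above concrete, here is a minimal sketch of the check, not the code from this PR. It makes several assumptions: "strings" are taken to be lines of the two code fragments, the Hamming value is treated as a normalised similarity (the share of equal positions over the longer length), and the helper names `hamming_similarity`, `lines_match` and `fragments_similar` are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Sequence

# Thresholds taken from the description above.
RATCLIFF_THRESHOLD = 0.7
HAMMING_THRESHOLD = 0.4
MATCHED_SHARE_THRESHOLD = 0.7


def ratcliff_obershelp(a: str, b: str) -> float:
    """Similarity in [0, 1] as computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def hamming_similarity(a: str, b: str) -> float:
    """Assumed normalisation: equal positions divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(x == y for x, y in zip(a, b))
    return same / longest


def lines_match(a: str, b: str) -> bool:
    # The Hamming check only runs when the Ratcliff/Obershelp check passes.
    return (ratcliff_obershelp(a, b) > RATCLIFF_THRESHOLD
            and hamming_similarity(a, b) > HAMMING_THRESHOLD)


def fragments_similar(lines_a: Sequence[str], lines_b: Sequence[str]) -> bool:
    """True if the share of lines in lines_a with a match in lines_b is > 0.7."""
    if not lines_a:
        return False
    matched = sum(any(lines_match(a, b) for b in lines_b) for a in lines_a)
    return matched / len(lines_a) > MATCHED_SHARE_THRESHOLD
```

Under this reading, the position-wise Hamming check rejects pairs that SequenceMatcher rates highly but that are shifted by a character (for example by one extra leading space), so whitespace would probably need to be normalised before comparing.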

  3. Combining the JSON files for all repos into one dataset and filtering EM methods (with tests)
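
The combining step could look roughly like the sketch below; it is an illustration under stated assumptions, not the script from this PR. It assumes one JSON file per repo with a top-level `commits` list whose entries carry `sha1` and a `refactorings` list with a `type` field, and it assumes that "EM" stands for the "Extract Method" refactoring type. The function name `combine_refactorings` and the output record layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def combine_refactorings(json_dir: Path,
                         wanted_type: str = "Extract Method") -> List[Dict[str, Any]]:
    """Merge per-repo JSON files into one list, keeping one refactoring type."""
    dataset: List[Dict[str, Any]] = []
    # Sort the files so the resulting dataset order is reproducible.
    for path in sorted(json_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumed layout: {"commits": [{"sha1": ..., "refactorings": [...]}]}
        for commit in data.get("commits", []):
            for ref in commit.get("refactorings", []):
                if ref.get("type") == wanted_type:
                    dataset.append({
                        "source_file": path.name,
                        "sha1": commit.get("sha1"),
                        "refactoring": ref,
                    })
    return dataset
```

Keeping the source file name and commit hash in every record makes it possible to trace a dataset entry back to its repo after the merge.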

@acheshkov
Member

@lyriccoder please give the PR a proper name and add a description.

@acheshkov
Member

@lyriccoder which algorithm have you chosen to measure code similarity?

@lyriccoder lyriccoder changed the title from "Add refMiner execution" to "Scripts for Mining Dataset of RefMiner" Nov 26, 2020
@lyriccoder lyriccoder marked this pull request as ready for review November 27, 2020 13:03
@lyriccoder lyriccoder marked this pull request as draft November 30, 2020 16:11