extract_references_from_file returns inconsistent data #104

Open
Hu1buerger opened this issue May 10, 2023 · 0 comments

Given this document: Kotti et al. - 2023 - Machine Learning for Software Engineering A Terti.pdf

Expectation

I would expect:

  1. only one reference to be found per line number;
  2. even if several reference objects are returned for the same line, any two of them, r1 and r2 with the same lineno, to satisfy r1 == r2, and in particular r1['title'] == r2['title'].
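
The second expectation can be sketched as a small standalone check. This is only an illustration: `find_conflicts` is a hypothetical helper, and the sample dictionaries are made-up data that assume each reference carries `linemarker` and `title` as lists, as the test in this issue implies for `linemarker`.

```python
from collections import defaultdict


def find_conflicts(refs):
    """Return {lineno: [titles...]} for every line marker whose references disagree on 'title'."""
    by_line = defaultdict(list)
    for ref in refs:
        # Assumption: 'linemarker' is a list whose first entry identifies the line
        by_line[ref["linemarker"][0]].append(ref)

    conflicts = {}
    for lineno, group in by_line.items():
        titles = [ref.get("title") for ref in group]
        # A conflict exists when any title differs from the first one in the group
        if any(title != titles[0] for title in titles[1:]):
            conflicts[lineno] = titles
    return conflicts


# Hypothetical data: line "1" has contradictory titles, line "2" has a harmless duplicate
sample = [
    {"linemarker": ["1"], "title": ["Title A"]},
    {"linemarker": ["1"], "title": ["Title B"]},
    {"linemarker": ["2"], "title": ["Title C"]},
    {"linemarker": ["2"], "title": ["Title C"]},
]
print(find_conflicts(sample))
```

An empty result would mean every line satisfies the expectation; any entry points at a line whose references contradict each other.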

Actual

The references found contain multiple contradictory results for the same line.

For example, see the attached screenshot (Screenshot 2023-05-10 at 15 06 09).

Replicate me

Install pytest-subtests, then run the following test against the document attached above.

# Requires the subtests fixture from pytest-subtests
# Assumption: the function under test is the one exported by refextract
from refextract import extract_references_from_file


def test_reference_consistency(path, subtests):
    """
    Ensure that no line of the file yields inconsistent duplicate references.

    For any two references r1 and r2 with the same line marker, r1 == r2 must hold.
    """
    refs = extract_references_from_file(path)

    # Group the references by line number
    lines = {}
    for ref in refs:
        lineno = ref['linemarker'][0]
        lines.setdefault(lineno, []).append(ref)

    # Check for inconsistent duplicate references on each line
    for lineno, line_refs in lines.items():
        if len(line_refs) == 1:
            continue

        with subtests.test('line', lineno=lineno, refs=line_refs):
            # Compare each adjacent pair of references found for this line
            for i in range(1, len(line_refs)):
                ref1 = line_refs[i - 1]
                ref2 = line_refs[i]

                assert ref1 == ref2, f"Found inconsistent references: {ref1} and {ref2}"

Hu1buerger changed the title from "extract_references_from_file contains duplicates with conflicting data." to "extract_references_from_file returns inconsistent data" on May 10, 2023