Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix symbol visibility issues, add test for it #1359

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Jun 26, 2023

Closes #1181.

Catches the actual symbol visibility issue.

Replaces #1135.

@hebasto
Copy link
Member Author

hebasto commented Jun 27, 2023

Rebased on top of the merged #1356.

@hebasto
Copy link
Member Author

hebasto commented Jul 13, 2023

Rebased f816729 -> d7b1cea (pr1359.02 -> pr1359.03) due to the conflict with #1313.

@hebasto hebasto marked this pull request as draft July 13, 2023 17:36
@hebasto hebasto marked this pull request as ready for review July 13, 2023 17:47
@hebasto hebasto marked this pull request as draft August 31, 2023 14:35
@hebasto hebasto marked this pull request as ready for review February 12, 2024 18:31
@hebasto
Copy link
Member Author

hebasto commented Feb 12, 2024

@maflcko

Is it possible for a Cirrus persistent worker to do not override environment variables set with the ENV command in the Dockerfile?

@maflcko
Copy link
Contributor

maflcko commented Feb 13, 2024

Maybe --env-file drops ENV?

@hebasto
Copy link
Member Author

hebasto commented Apr 23, 2024

Rebased to resolve conflicts.

@hebasto
Copy link
Member Author

hebasto commented Aug 25, 2024

#1595 (comment):

This reminds me that we should move forward with #1359.

Agreed.

Rebased and ready to move forward :)

@hebasto hebasto marked this pull request as draft October 16, 2024 13:07
@hebasto hebasto force-pushed the 230626-visibility branch 2 times, most recently from f2a6941 to 4f64b9f Compare October 16, 2024 14:01
@hebasto hebasto marked this pull request as ready for review October 16, 2024 14:34
@hebasto
Copy link
Member Author

hebasto commented Oct 16, 2024

Cool that this caught a real (new) symbol leak.

To clarify, the fix is in the first commit. The excerpt from the CI log with an error message from the previous iteration of this branch looks as follows:

+ python3 ./tools/symbol-check.py .libs/libsecp256k1.so
.libs/libsecp256k1.so: export of symbol secp256k1_musig_nonce_gen_internal not expected
.libs/libsecp256k1.so: export of symbol secp256k1_musig_nonce_gen_internal not expected
.libs/libsecp256k1.so: failed EXPORTED_SYMBOLS
Error: Process completed with exit code 1.

@fanquake suggested that the test could potentially be simplified for c-i.

Thanks! The tools/symbol-check.py script has been simplified.

@real-or-random real-or-random added this to the 0.6.0 milestone Oct 21, 2024
real-or-random added a commit that referenced this pull request Oct 21, 2024
447334c include: Avoid visibility("default") on Windows (Tim Ruffing)

Pull request description:

  Fixes #1421. See code comments for rationale.

  Related meta-bug: #1181.  This reminds me that we should move forward with #1359.

ACKs for top commit:
  fanquake:
    ACK 447334c
  hebasto:
    ACK 447334c, tested on Ubuntu 24.04 using the following commands:
  theuni:
    ACK 447334c

Tree-SHA512: aaa47d88fd1b1f85c3e879a2b288c0eb3beebad0cc89e85f05d0b631f83e58d5a324fb441911970865eaa292f6820d03a1b516d6e8de37a87510e2082acc6e28
Copy link
Contributor

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 44a9392. No opinion on the tests/ci.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
tools/symbol-check.py Outdated Show resolved Hide resolved
ci/ci.sh Outdated Show resolved Hide resolved
tools/symbol-check.py Outdated Show resolved Hide resolved
src/precomputed_ecmult.h Outdated Show resolved Hide resolved
tools/symbol-check.py Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is small enough to keep it in util.h. Or is there a reason not to keep it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is to facilitate the further attempts to make all headers self-contained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm just missing it. Can you explain how this will help with #1039?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to avoid bringing the high-level ../include/secp256k1.h header into lower-level code via util.h. However, at this point, util.h is already being included indirectly through other headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion here, I defer to @real-or-random for preference.

@hebasto
Copy link
Member Author

hebasto commented Oct 24, 2024

@real-or-random's comments have been addressed.


def check_MACHO_exported_functions(library, expected_exports) -> bool:
ok: bool = True
macho_lib: lief.MACHO.Binary = library.concrete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the calls to .concrete for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif exe_format == lief.Binary.FORMATS.MACHO:
success = check_MACHO_exported_functions(library, grep_exported_symbols())
else:
print(f'{filename}: unknown executable format')
Copy link
Member

@fanquake fanquake Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code can be reached, because if it is an unknown format, .parse() will have errored (printing Unknown format), and then lief.Binary.FORMATS = library.format will fail:

./tools/symbol-check.py not_a_file
Unknown format
Traceback (most recent call last):
  File "./tools/symbol-check.py", line 81, in <module>
    exe_format: lief.Binary.FORMATS = library.format
                                      ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'format'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Reworked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code can be reached,

This will be reachable once LIEF adds support for a new binary format, so I think it was useful to have it.

@hebasto
Copy link
Member Author

hebasto commented Oct 24, 2024

@fanquake's feedback has been addressed.

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script can still be simplified. Let me suggest a refactored version later.


def check_PE_exported_functions(library, expected_exports) -> bool:
ok: bool = True
for function in library.exported_functions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs, I believe that .exported_functions is from the abstract Binary class, and [entry.name for entry in library.get_export()] may give the better result for PE because it includes non-function exports? See https://lief.re/doc/latest/formats/pe/python.html#export-entry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested approach returns the same list of symbols on the master branch. Incorporated as a potentially superior one.


def check_MACHO_exported_functions(library, expected_exports) -> bool:
ok: bool = True
for function in library.exported_functions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs, I believe that .exported_functions is from the abstract Binary class, and .exported_symbols may give a better result for Mach-O because it includes non-function exports?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested approach returns the same list of symbols on the master branch. Incorporated as a potentially superior one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, just checking: You're saying that this change didn't make a difference, right? So it still finds only exported functions and not exported variables?

(And I have the same question for PE.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, just checking: You're saying that this change didn't make a difference, right?

Correct.

So it still finds only exported functions and not exported variables?

With the reverted second commit, both variants catch the same local variables:

./tools/symbol-check.py: In .libs/libsecp256k1.dylib: Unexpected exported symbols: {'secp256k1_pre_g', 'secp256k1_pre_g_128', 'secp256k1_ecmult_gen_prec_table'}

(And I have the same question for PE.)

With the reverted second commit, both variants do not catch local variables.

For more details, please refer to:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that means that the LIEF API is a bit unexpected. Or we don't understand it.

  • For Mach-O, exported_functions is the same as exported_symbols. Both include variables, but I'd expect the former to include only functions.
  • For PE, there doesn't seem to be a way to find exported variables. By the way, importing variables is rare, but the ellswift example and benchmark import the variable secp256k1_ellswift_xdh_hash_function_bip324, so at least we test this in CI. And thus we can be sure that the variables are actually exported in PE, and it's just LIEF that doesn't show them.

Does this match your understanding? Do you think it's worth asking LIEF about the variables?

Relatedly, on platforms where LIEF gives us both variables and functions, would it make sense also check that all expected symbols are actually exported?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, my analysis of variables is wrong. LIEF shows them. I'll revisit this later.

@real-or-random
Copy link
Contributor

real-or-random commented Nov 1, 2024

Here's a refactored version, which is more readable IMHO (haven't tested on Mac or Windows):

#!/usr/bin/env python3
"""Check that a libsecp256k1 shared library exports only expected symbols.

Usage examples:
  - When building with Autotools:
    ./tools/symbol-check.py .libs/libsecp256k1.so
    ./tools/symbol-check.py .libs/libsecp256k1-<V>.dll
    ./tools/symbol-check.py .libs/libsecp256k1.dylib

  - When building with CMake:
    ./tools/symbol-check.py build/lib/libsecp256k1.so
    ./tools/symbol-check.py build/bin/libsecp256k1-<V>.dll
    ./tools/symbol-check.py build/lib/libsecp256k1.dylib"""

import re
import sys
import subprocess

import lief


class UnexpectedExport(RuntimeError):
    pass


def get_exported_exports(library) -> list[str]:
    """Adapter function to get exported symbols based on the library format."""
    if library.format == lief.Binary.FORMATS.ELF:
        return [symbol.name for symbol in library.exported_symbols]
    elif library.format == lief.Binary.FORMATS.PE:
        return [function.name for function in library.exported_functions]
    elif library.format == lief.Binary.FORMATS.MACHO:
        return [function.name[1:] for function in library.exported_functions]
    raise NotImplementedError(f"Unsupported format: {library.format}")


def grep_expected_symbols() -> list[str]:
    """Guess the list of expected exported symbols from the source code."""
    grep_output = subprocess.check_output(
        ["git", "grep", "^SECP256K1_API", "--", "include"], # TODO WHITESPACE
        universal_newlines=True,
        encoding="utf-8"
    )
    lines = grep_output.split("\n")
    pattern = re.compile(r'\bsecp256k1_\w+')
    exported: list[str] = [pattern.findall(line)[-1] for line in lines if line.strip()]
    return exported


def check_symbols(library, expected_exports) -> None:
    """Check that the library exports only the expected symbols."""
    actual_exports = list(get_exported_exports(library))
    unexpected_exports = set(actual_exports) - set(expected_exports)
    if unexpected_exports != set():
        raise UnexpectedExport(f"Unexpected exported symbols: {unexpected_exports}")

def main():
    if len(sys.argv) != 2:
        print(__doc__)
        return 1
    library = lief.parse(sys.argv[1])
    expected_exports = grep_expected_symbols()
    try:
        check_symbols(library, expected_exports)
    except UnexpectedExport as e:
        print(f"{sys.argv[0]}: In {sys.argv[1]}: {e}")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())

@hebasto
Copy link
Member Author

hebasto commented Nov 1, 2024

Here's a refactored version, which is more readable IMHO (haven't tested on Mac or Windows):

Tested on macOS and on Windows. Incorporated. Thank you!

def grep_expected_symbols() -> list[str]:
"""Guess the list of expected exported symbols from the source code."""
grep_output = subprocess.check_output(
["git", "grep", "^\s*SECP256K1_API", "--", "include"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives me error symbol-check.backup.py:40: SyntaxWarning: invalid escape sequence '\s'. I think it should be

r"^\s*SECP256K1_API"

or

"^\\s*SECP256K1_API"

@jonasnick jonasnick mentioned this pull request Nov 4, 2024
3 tasks
@real-or-random real-or-random modified the milestones: 0.6.0, 0.6.1 Nov 4, 2024
hebasto and others added 2 commits November 4, 2024 17:48
This change makes the `-fvisibility=hidden` compiler option unnecessary.
@hebasto
Copy link
Member Author

hebasto commented Nov 4, 2024

Rebased and addressed @jonasnick's #1359 (comment).

Copy link
Contributor

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-review reACK 7ac46cf.

Still no opinion on the .py/ci.

Being able to drop -fvisibility=hidden is a nice improvement (and also a testament to the code structure!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symbol visibility
6 participants