Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shrink measured size #15

Merged
merged 22 commits into from
Aug 12, 2024
Merged

Shrink measured size #15

merged 22 commits into from
Aug 12, 2024

Conversation

krystian-hebel
Copy link
Member

@krystian-hebel krystian-hebel commented Feb 22, 2024

This set of changes makes the measured size of SKL smaller by making all the non-const data built at runtime and moving sections around. Size changes, calculated as offset to bootloader_data symbol:

  • for 64b build, change from 0xe058 to 0xd4d8, new measured size: 0x2f4a
  • for 32b build, change from 0x705a to 0x7480, new measured size: 0x30de

Increase in size for 32b is caused mostly due to measured size being just above multiply of 4K in size, and next variable is 4K-aligned.

Not tested on hardware, I'm planning to do so after switching from current tags format to SLRT.

At least GNU ld 2.30 and earlier fail to discard the generic part of
.got.plt when no actual entries were allocated. As this section can't
be discarded, linker script checks if section's size is what we expect
it to be (either empty or 3 * sizeof(long)) and fails otherwise.
.got.plt is removed by objcopy when final binary is built.

Signed-off-by: Krystian Hebel <[email protected]>
add %edx, %ebx
add $MAX_STACK_SIZE, %ebx
cmp %ebx, %esp
jge .Lenough_stack
Copy link
Collaborator

@andyhhp andyhhp Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you can actually use the fact that the SKL size is 64k to your advantage here:

movzwl booloader_data+2(%ebp), %eax
add    $MAX_STACK_SIZE, %ax
jno    .Lenough_stack

ought to be equivalent, I think ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

booloader_data+2(%ebp) has just the size from booloader_data to the end of that data, not from start of SLB, so booloader_data offset would still have to be added to that.

Using halfword trick is nice, but it won't matter for SLRT because it's size field is u32.

Copy link
Member Author

@krystian-hebel krystian-hebel Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified in 65e9d93 in next PR

link.lds Outdated

/*
* This section is expected to be empty. To keep amount of data sent to the
* TPM by SINIT instruction low, all data is either read-only (.rodata), or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/SINIT/SKINIT/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krystian-hebel krystian-hebel force-pushed the shrink_measured_size branch 2 times, most recently from 602004f to 1019f11 Compare February 27, 2024 19:12
iommu.c Outdated Show resolved Hide resolved
link.lds Outdated Show resolved Hide resolved
event_log.c Show resolved Hide resolved
tpmlib/crb.c Show resolved Hide resolved
Big, sparsely filled structures don't have to be in measured part of
SLB. This change moves them to .bss where they are filled at runtime,
after .bss is cleared.

Note that pagetables are not moved yet, it will be done in following
commit.

Signed-off-by: Krystian Hebel <[email protected]>
Pagetables take 7*4K = 28K of space. Building them at runtime means that
they won't have to be measured, as long as the code creating them is
measured and proper memory protections are in place.

Signed-off-by: Krystian Hebel <[email protected]>
To keep amount of data sent to the TPM by SKINIT instruction low, .data
section is made empty by moving its contents which is either turned
read-only (.rodata), or uninitialized and built at run time (.bss).
Linker script asserts that there is no leftover .data.

Signed-off-by: Krystian Hebel <[email protected]>
Stack area doesn't need to be measured by SKINIT. Initial %esp always
points to the end of SLB, there is no reason to move it from there.
Minimal stack size is checked by the code as the amount of available
space depends on size of SLRT passed by the bootloader.

Signed-off-by: Krystian Hebel <[email protected]>
@krystian-hebel
Copy link
Member Author

Loosely related to the commit that moves non-RO data to .bss: https://nvd.nist.gov/vuln/detail/CVE-2017-16837. To be fair, similar vulnerability doesn't exist in the current code as all data was measured, including .bss, stack etc.

With reordered sections and most of the data structures built at runtime,
size of actual code that has to be measured by TPM can be reduced. As
that size field used to specify offset to bootloader data, a new field
was added to the header for that purpose.

Signed-off-by: Krystian Hebel <[email protected]>
Far return is used, so segment selector and relocated instruction
pointer are build on the stack, instead of modifying the code.

Signed-off-by: Krystian Hebel <[email protected]>
pm_kernel_entry and zero_page were renamed to dlme_entry and dlme_arg,
respectively. Their previous names were valid only for Linux.

Signed-off-by: Krystian Hebel <[email protected]>
Linux, Multiboot2 and simple payload don't have overlapping uses for
values passed through registers or stack. All of those can be set at
the same time.

Signed-off-by: Krystian Hebel <[email protected]>
SKL hashes are no longer passed from the bootloader. Measured code and
read-only data are not changed, and all other data is created at runtime
so SKL is able to measure itself, which it now does. Those hashes are
only used for event log entries, PCR was extended as a result of SKINIT
instruction on Dynamic Launch Event.

Signed-off-by: Krystian Hebel <[email protected]>
This removes different handling between boot protocols. Everything is
reduced to one range of consecutive memory for DLME, as well as pointer
to the entry point, which is measured to PCR 17 as an offset from DLME
base to avoid attacks based on jumping to different parts of DLME.

Signed-off-by: Krystian Hebel <[email protected]>
MULTIBOOT2_BOOTLOADER_MAGIC has been moved to defs.h as it is still
used in head.S.

Signed-off-by: Krystian Hebel <[email protected]>
It used to be defined in boot.h, even though it isn't expected to be
ever needed anywhere but in code for handling event log internally.

Signed-off-by: Krystian Hebel <[email protected]>
Test for INVALIDATE_ALL completion is now done inside the initialization
function, instead of main.c. Function names were changed to better
describe what the function does. Comment describing window in anti-DMA
security was updated with new finds.

Some bugs were fixed:
- EventLogInt wasn't properly cleared (it is write-1-to-clear bit)
- CmdBufEn was set on transition to the kernel, which didn't clear it
  before setting up head/tail pointers, which in turn lead to hang
  (it is listed as undefined behavior in IOMMU specification)
- Command buffer tail/head pointer registers were incorrectly masked

Signed-off-by: Krystian Hebel <[email protected]>
Release versions of SKL hanged in tis_send(), while versions with serial
output worked, most likely thanks to increased delay between DRTM
sequence and next TPM command. Apparently, some TPMs don't properly
handle setting STS.commandReady when all of the following conditions are
met:

- TPM has recently finished DRTM sequence (internal work may still be
  happening at that point, there is no way to be sure),
- TPM just transitioned to Idle state (e.g. by changing locality),
- STS.commandReady is written periodically before TPM reports it is in
  Ready state.

When all of the above applies, STS.commandReady is always read as 0,
as if the TPM restarted transition from Idle to Ready each time it is
asked to do so. To work around this, set this bit once and keep checking
in a loop until TIMEOUT_B (2 seconds). Well behaving TPM must be able to
enter Ready state before that time, if it doesn't, error is returned.

Signed-off-by: Krystian Hebel <[email protected]>
This enables SHA tests to check if different compilers produce the same
results.

Definitions in string.h were moved outside of __STDC_HOSTED__ to avoid
compiler warnings (promoted to errors because of -Werror) due to
incompatible implicit declaration of built-in functions.

Signed-off-by: Krystian Hebel <[email protected]>
-fstrict-aliasing is enabled by default when using optimization levels
higher than 1, including -Os. With that option, compiler may assume
that object of one type never resides at the same address as object of
a different type. Both sha1_final() and sha256_final() used to write
message length by casting a pointer to buffer into a pointer to u64,
while surrounding code operated on the buffer directly.

The problem manifests in GCC 11 and later versions.

The commit fixes this issue and another UB caused by unaligned access
at the beginning of transformation step by using memcpy() in both cases.

Signed-off-by: Krystian Hebel <[email protected]>
Signed-off-by: Sergii Dmytruk <[email protected]>
@macpijan
Copy link
Member

macpijan commented Apr 17, 2024

Integrated both #17 and #19 here for AEM milestone release.

The review can still happen in individual (even closed) MRs if that makes it easier to review, and we will happily consider any comments.

…ssors"

This reverts commit 335c103.

Without -march=btver2 SKL can be used on AMD 15h or earlier and now that
SKL more than fits into 64KiB limit, no need for such options to save a
bit of space.

Signed-off-by: Sergii Dmytruk <[email protected]>
DEV is specific to a northbridge, so there are multiple of them on
systems with several CPU sockets. 0x18 is the device number of the
first one with the rest using successive numbers (0x19, 0x1A, etc.).

Signed-off-by: Sergii Dmytruk <[email protected]>
@rossphilipson
Copy link
Collaborator

Just FYI, I have modified the SLRT to make everything 8b aligned in the SLRT in Linux. The PR to do this was pulled into the linux-sl-6.7 working branch. I am also adding a TXT heap address to the Intel info table. This should go up as a PR soon.

@krystian-hebel
Copy link
Member Author

Just FYI, I have modified the SLRT to make everything 8b aligned in the SLRT in Linux. The PR to do this was pulled into the linux-sl-6.7 working branch. I am also adding a TXT heap address to the Intel info table. This should go up as a PR soon.

Should be addressed by #21

This results in 64-bit alignment of 64-bit fields.

Signed-off-by: Sergii Dmytruk <[email protected]>
Copy link
Collaborator

@rossphilipson rossphilipson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to move the work from shrink_measured_size into master. We used the shrink_measured_size branch when working on the latest rebase of the AMD work on the internal PSP work we did. There are changes on top of these that will need to be reconciled once we can merge the PSP/DRTM bits for AMD. But for now, I think this LGTM:

Reviewed-by: Ross Philipson [email protected]

@rossphilipson rossphilipson merged commit 1cc0a75 into master Aug 12, 2024
56 of 57 checks passed
@SergiiDmytruk SergiiDmytruk deleted the shrink_measured_size branch August 12, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants