
add_magisk is not compatible with Android 10 #4

Closed
ubergeek77 opened this issue Dec 20, 2019 · 38 comments

Comments

@ubergeek77

ubergeek77 commented Dec 20, 2019

To preface, this issue will be quite long, but I wanted to give as much detail as possible in hopes of making it easier to identify a solution. A TL;DR/summary will be provided at the bottom.

When building an image for my Pixel 3 XL (crosshatch), I ran into some issues with the add_magisk function that caused the build to fail. I don't know if these issues were crosshatch-specific, or if they happened due to code changes between Android 9 and Android 10.

Additionally, despite getting a successful build with the "fixes" below, I unfortunately cannot flash the resulting image to my Pixel 3 XL. Regardless, in the hope that someone who knows what's going on can tell me what I did wrong and how to fix it, here are the issues I ran into and how I fixed them:

BOOT/RAMDISK/init is a dead symlink in crosshatch builds

add_magisk will fail if the init binary in target_files_intermediates is a symbolic link to /system/bin/init, as it was for me.

Please consider the following lines in add_magisk:

chaosp/build.sh, line 1110 in commit 9133970:

cp -n $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/{init,.backup/init}

chaosp/build.sh, line 1113 in commit 9133970:

cp magisk-latest/arm/magiskinit64 $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/init

They expect the init binary to be readable from:

$BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/init

However, in my build, this init "file" is actually a dead symbolic link:

init -> /system/bin/init

Attempting to read this file will result in the usual "file not found" error, because the file doesn't actually exist on the build system at that path - it's just a dead link.

The actual init binary is located in a sub-directory, adjacent to the init symlink, that represents the /system/bin file tree:

$BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/system/bin/init

The "quick fix" is to change the two lines linked above to the following:

Line 1110:
cp -n $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/system/bin/{init,../../.backup/init}

Line 1113:
cp magisk-latest/arm/magiskinit64 $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK/system/bin/init

This way, add_magisk can put the magiskinit binary in (hopefully) the correct place.
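A more layout-tolerant way to find the real binary, as a minimal sketch: it assumes RAMDISK/init is either a regular file or a symlink (absolute or relative) whose target lives inside the same RAMDISK tree. `resolve_ramdisk_init` and the `RAMDISK` variable in the usage comment are hypothetical names, not part of build.sh. The point it illustrates is that an absolute target like /system/bin/init has to be resolved against the RAMDISK directory rather than the build host's root, which is why reading the link directly fails.

```shell
#!/usr/bin/env bash
# resolve_ramdisk_init RAMDISK_DIR
# Prints the path of the real init binary inside an extracted ramdisk tree.
# Handles init being a regular file or a symlink such as init -> /system/bin/init.
resolve_ramdisk_init() {
  local ramdisk="$1"
  local init="$ramdisk/init"
  if [ -L "$init" ]; then
    local target
    target=$(readlink "$init")
    case "$target" in
      /*) init="$ramdisk$target" ;;   # absolute target: resolve inside the ramdisk, not on the host
      *)  init="$ramdisk/$target" ;;  # relative target: resolve next to the link itself
    esac
  fi
  [ -f "$init" ] || return 1          # nothing usable found
  printf '%s\n' "$init"
}

# Hypothetical use inside add_magisk (RAMDISK = .../BOOT/RAMDISK):
#   real_init=$(resolve_ramdisk_init "$RAMDISK") || exit 1
#   mkdir -p "$RAMDISK/.backup" && cp -n "$real_init" "$RAMDISK/.backup/init"
#   cp magisk-latest/arm/magiskinit64 "$real_init"
```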


crosshatch kernel images are already separated from the dtb

This line attempts to separate the kernel and dtb images before decompressing the kernel:

chaosp/build.sh, line 1130 in commit 9133970:

python3 ./extract-dtb.py $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/kernel

However, add_magisk will simply fail if the kernel image is already separate from the dtb.

I'm not sure why the format for this is so different from other Pixel devices, but on my crosshatch build, this is what the $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT directory looks like before any modifications:

drwxr-xr-x  3 ubuntu ubuntu     4096 Dec 19 21:03 .
drwxr-xr-x 13 ubuntu ubuntu     4096 Dec 19 21:04 ..
drwxr-xr-x 25 ubuntu ubuntu     4096 Dec 19 21:08 RAMDISK
-rw-r--r--  1 ubuntu ubuntu       11 Dec 19 21:03 base
-rw-r--r--  1 ubuntu ubuntu      316 Dec 19 21:03 cmdline
-rw-r--r--  1 ubuntu ubuntu   862396 Dec 19 21:03 dtb
-rw-r--r--  1 ubuntu ubuntu 19331007 Dec 19 21:03 kernel
-rw-r--r--  1 ubuntu ubuntu        5 Dec 19 21:03 pagesize

As you can see, dtb is already separated from the kernel image.

If you let extract-dtb.py continue and run against this particular kernel file, it will return "No appended dtbs found."

Because of this, the next line in add_magisk will fail, since extract-dtb.py didn't actually extract anything, so a file named 00_kernel does not exist:

chaosp/build.sh, line 1133 in commit 9133970:

lz4 -d dtb/00_kernel dtb/uncompressed_kernel

The solution is to skip trying to extract and re-concatenate the dtb, and just operate directly off of this already-separated kernel file in the BOOT directory.

Here are my replacement lines for this fix that fit between lines 1125 and 1157:

  # Uncompress the kernel
  lz4 -d $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/kernel ./uncompressed_kernel

  # Hexpatch the kernel
  chmod +x ./magisk-latest/x86/magiskboot
  ./magisk-latest/x86/magiskboot hexpatch ./uncompressed_kernel 736B69705F696E697472616D667300 77616E745F696E697472616D667300

  # Recompress kernel
  lz4 -f -9 ./uncompressed_kernel $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/kernel

  # Remove target files zip
  rm -f $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER.zip

  # Rezip target files
  cd $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER
  zip --symlinks -r ../aosp_$DEVICE-target_files-$BUILD_NUMBER.zip *
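For reference, the two hex strings in the hexpatch line are just NUL-terminated ASCII: the patch renames the kernel's "skip_initramfs" string to "want_initramfs". As I understand Magisk's approach for legacy system-as-root devices, this makes the kernel ignore a skip_initramfs flag from the bootloader so the ramdisk (and magiskinit) is used. The helper below is a hypothetical sketch that reproduces the hex arguments from the strings; it only needs od and tr.

```shell
#!/usr/bin/env bash
# str_to_hexpatch STRING
# Encodes STRING as uppercase hex plus a trailing NUL byte ("00"),
# the form used for the magiskboot hexpatch arguments above.
str_to_hexpatch() {
  printf '%s' "$1" | od -An -tx1 -v | tr -d ' \n' | tr 'a-f' 'A-F'
  printf '00\n'
}

# Both strings are 14 characters, so the patch overwrites bytes in place.
str_to_hexpatch skip_initramfs   # 736B69705F696E697472616D667300
str_to_hexpatch want_initramfs   # 77616E745F696E697472616D667300
```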

If this is a crosshatch-specific issue, and not caused by the jump from Android 9 to Android 10, then this change will almost certainly break builds for non-crosshatch devices. So, while this works for me, this code snippet is just an example, and a more robust, catch-all solution is likely needed.
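One shape a catch-all could take is deciding at build time whether the kernel actually has a DTB appended, instead of hard-coding either path. A minimal sketch, assuming an appended DTB can be recognized by the standard flattened device tree header magic 0xd00dfeed (bytes d0 0d fe ed), which as far as I can tell is also what extract-dtb.py splits on. `has_appended_dtb` and `BOOT_DIR` are hypothetical names. This is a heuristic: those four bytes could in principle occur by chance inside compressed kernel data.

```shell
#!/usr/bin/env bash
# has_appended_dtb FILE
# Succeeds if FILE contains the FDT magic bytes d0 0d fe ed anywhere.
# LC_ALL=C makes grep treat the file as raw bytes; -a forces text mode on binaries.
has_appended_dtb() {
  LC_ALL=C grep -qaF $'\xd0\x0d\xfe\xed' "$1"
}

# Hypothetical use in add_magisk, with BOOT_DIR pointing at .../BOOT:
#   if has_appended_dtb "$BOOT_DIR/kernel"; then
#     : # keep the extract-dtb.py / dtb/00_kernel path
#   else
#     : # patch "$BOOT_DIR/kernel" directly, as in the replacement lines
#   fi
```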


With these changes, I was able to successfully "build" an image for my Pixel 3 XL. Unfortunately, this is clearly not the correct solution, as flashing the resulting image to my device results in a failed flash. The resulting image is broken and does not function in any way.

A typical flash proceeds like this:

...
	archive does not contain 'dt.img'
	archive does not contain 'recovery.img'
	extracting vbmeta.img (0 MB) to disk... took 0.000s
	archive does not contain 'vbmeta.sig'
	Sending 'vbmeta_b' (4 KB)                          OKAY [  0.120s]
	Writing 'vbmeta_b'                                 OKAY [  0.064s]
	archive does not contain 'vbmeta_system.img'
	archive does not contain 'vendor_boot.img'
	extracting super_empty.img (0 MB) to disk... took 0.000s
	Rebooting into fastboot                            OKAY [  0.060s]
	< waiting for any device >
	Sending 'system_b' (4 KB)                          OKAY [  0.001s]
	Updating super partition                           OKAY [  0.028s]
	Resizing 'product_b'                               OKAY [  0.005s]
	Resizing 'system_b'                                OKAY [  0.005s]
	Resizing 'vendor_b'                                OKAY [  0.005s]
...

The flash of a Magisk-patched image does this:

...
	archive does not contain 'dt.img'
	archive does not contain 'recovery.img'
	extracting vbmeta.img (0 MB) to disk... took 0.000s
	archive does not contain 'vbmeta.sig'
	Sending 'vbmeta_b' (4 KB)                          OKAY [  0.120s]
	Writing 'vbmeta_b'                                 OKAY [  0.063s]
	archive does not contain 'vbmeta_system.img'
	archive does not contain 'vendor_boot.img'
	extracting super_empty.img (0 MB) to disk... took 0.000s
	Rebooting into fastboot                            OKAY [  0.060s]
	< waiting for any device >
	fastboot: error: Failed to boot into userspace fastboot; one or more components might be unbootable.

Basically, when the flash reaches the step where it boots into userspace fastboot to flash the rest of the image, that boot fails, so the rest of the flash fails too.

I should point out that the mkbootfs patch was a part of this build, as well.

If I remove all of these patches and build a "vanilla" RattlesnakeOS image with no Magisk or any other patches, it flashes, installs, and runs just fine. The only explanation I can see is that something went wrong when adding Magisk.

Does anyone have any ideas as to how I can get this working on my Pixel 3 XL?


TL;DR/summary

  • The add_magisk function from this repository has some key issues that cause builds to fail for crosshatch
  • I attempted to address these issues myself, and I was able to get the build to succeed
  • The image produced by this build and my add_magisk fixes does not flash properly, and fails to produce a bootable userspace fastboot image
    • My fixes clearly broke something
  • Unmodified, Magisk-less, "vanilla" RattlesnakeOS builds will build and install fine on my device
@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

Can you tell me how I should fix this? I'm not sure what you mean by "The unit is in a different folder."

I was just going off of what was already in add_magisk, and a lot of things still look similar in the AOSP file tree, but if you know how I can fix this, please, do tell me!

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

It looks like you have two scripts in two different branches.

In your master branch, you have add_magisk completely commented out.

But in your 10.0-testing branch, at least in your latest commit, you've only commented out the cp -n line on line 1129 in the same file, and left everything else intact. If I diff it to CaseyBakey's version, this is the only change.

But you also leave the dtb extraction script to do its thing, and that won't work for me. On crosshatch, as I've discovered and mentioned above, the kernel file does not have a bundled dtb inside of it. This means that, for the rest of the script to continue, 00_kernel has to exist, but it can't, since the dtb extraction script can't create it.

Builds for taimen probably produce a kernel file that has an appended dtb, which means this issue is probably exclusive to crosshatch.

There are two parts to this problem - the init binary, and patching the kernel. The first problem is easy enough, as we've seen, but I suspect this second problem is what's causing my image to be unable to boot.

I'll take a look at your commit history to see what I find, but if you have any ideas of what commit I should look at, let me know.

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

But if none of that works, we have a couple of options: a) give me the build error, or b) I can build that fucker in less than 6 hours, I just don't have a way to test it.

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

That's the first of the two problems. If you modify the function to find the correct init binary in the ./system/bin subdirectory and run again, you'll get this:

#### build completed successfully (02:25 (mm:ss)) ####

Archive: magisk-latest.zip
signed by SignApk
inflating: magisk-latest/META-INF/com/google/android/update-binary
inflating: magisk-latest/META-INF/com/google/android/updater-script
inflating: magisk-latest/arm/magiskboot
inflating: magisk-latest/arm/magiskinit
inflating: magisk-latest/arm/magiskinit64
inflating: magisk-latest/chromeos/futility
inflating: magisk-latest/chromeos/kernel.keyblock
inflating: magisk-latest/chromeos/kernel_data_key.vbprivk
inflating: magisk-latest/common/addon.d.sh
inflating: magisk-latest/common/boot_patch.sh
inflating: magisk-latest/common/magisk.apk
inflating: magisk-latest/common/util_functions.sh
inflating: magisk-latest/x86/magiskboot
inflating: magisk-latest/x86/magiskinit
inflating: magisk-latest/x86/magiskinit64
inflating: magisk-latest/META-INF/MANIFEST.MF
inflating: magisk-latest/META-INF/CERT.SF
inflating: magisk-latest/META-INF/CERT.RSA
Cloning into 'extract-dtb'...
No appended dtbs found
Unable to access file for processing: dtb/00_kernel
/home/ubuntu/magisk-workdir
open: extract-dtb/dtb/uncompressed_kernel failed with 2: No such file or directory

The build fails here.

@shell832

Basically, the link needs to be rebuilt for use with Magisk. My bad, I forgot what it is, but I believe there is a command in one of the py scripts that can't handle the init file in its current state, and the easiest solution for me was to go directly to the file. That's why, at least in my terminal, it shows up red.

@ubergeek77
Copy link
Author

Just so we're clear, the init binary is not the main problem here. That's just a file that can be easily replaced and is easily dealt with.

The kernel patching and subsequent repacking is what's causing problems here. See my comment above.

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

Well, I don't have it uploaded anywhere, but if you look at the first post of this issue, I list exactly the changes I made to CaseyBakey's original script.

You really should go through my first post, but in short, my add_magisk function is literally just CaseyBakey's function with a few changes. All of those changes are listed in the first post here and you can just copy-paste it.

I didn't know the original function didn't do Android 10 "at all," so that might explain why I'm having so much trouble. I didn't do any porting, I just took what CaseyBakey had and made a few changes. I don't have anything else more complicated than that.

I should also mention I'm doing all of this on "vanilla" RattlesnakeOS-stack on AWS. So I'm using Dan-V's original script, with add_magisk just added in, à la https://github.com/corrmaan/rattlesnakeos-patches

So, Dan-V handles most of the Android 10 specific stuff, and I'm only trying to add Magisk and nothing more.

The add_magisk function is compatible with RattlesnakeOS-stack proper, as a few others have reported success with what I'm trying to do, but this was before Android 10, and I'm not even sure anyone else on crosshatch has done this, so it could either be an Android 10 issue, a crosshatch issue, or both.

@shell832

OK, from what I see you have two problems, and I think you are way off:

Unable to access file for processing: dtb/00_kernel
/home/ubuntu/magisk-workdir
open: extract-dtb/dtb/uncompressed_kernel failed with 2: No such file or directory

This means the files don't exist, like they aren't there. It's not the script, I'm pretty sure of that. Have you looked in the relevant folders?

That's what that error says; maybe you are not git cloning the repo for dtb. Also, when was the last time you ran a fresh build? If you build too many times it fucks shit up (well, unsuccessful builds do; successful ones don't mess with the build structure).

Mainly, to me it is saying it cannot find extract-dtb/dtb/uncompressed_kernel <~~~ this file, and it needs it to extract the dtb.
Check your git clone command in the script first.

@ubergeek77
Author

ubergeek77 commented Dec 20, 2019

As I've explained before, 00_kernel does not exist in the AOSP source tree. It's a product of running extract-dtb.py.

00_kernel would be produced by this script, but as you can see in the log output I have posted, my device's kernel image is not concatenated with dtb by default. They are already separate, so there is nothing for extract-dtb.py to extract. Trying to do so will show you the error message: No appended dtbs found. Again, you can see this in the log I just posted.

Thus, for crosshatch, 00_kernel cannot be created. As a consequence, the subsequent commands that would have created extract-dtb/dtb/uncompressed_kernel cannot do so, since 00_kernel never existed, and cannot exist for crosshatch.

As you can see in my original post here, I've gone to great lengths to work around this and allow the script to patch the already existing, un-dtb-concatenated kernel file, and while this allows the build to succeed, the device won't boot.

That is what my problem is.

@shell832

shell832 commented Dec 20, 2019 via email

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

Yes, that's what I'm trying to do.

However, given the nature of how AOSP builds work, and based on feedback from others, I think that this is also broken in CaseyBakey's chaosp, and that a fix is needed for this to work both here and on RattlesnakeOS-stack.

I'm 99.9% sure that running this on RattlesnakeOS-stack has nothing to do with my problems. I could run it locally with chaosp and it would still probably fail for the same reason.

@shell832

shell832 commented Dec 20, 2019 via email

@ubergeek77
Author

The extract-dtb.py comes straight from CaseyBakey's script. It's cloned right there in the script, and run directly like that.

I'm not cloning any other dtb files or anything like that, I'm just using what already exists as a result of the AOSP build. From the top post in this issue:


this is what the $BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT directory looks like before any modifications:

drwxr-xr-x  3 ubuntu ubuntu     4096 Dec 19 21:03 .
drwxr-xr-x 13 ubuntu ubuntu     4096 Dec 19 21:04 ..
drwxr-xr-x 25 ubuntu ubuntu     4096 Dec 19 21:08 RAMDISK
-rw-r--r--  1 ubuntu ubuntu       11 Dec 19 21:03 base
-rw-r--r--  1 ubuntu ubuntu      316 Dec 19 21:03 cmdline
-rw-r--r--  1 ubuntu ubuntu   862396 Dec 19 21:03 dtb
-rw-r--r--  1 ubuntu ubuntu 19331007 Dec 19 21:03 kernel
-rw-r--r--  1 ubuntu ubuntu        5 Dec 19 21:03 pagesize

As you can see, dtb is already separated from the kernel image.

If you let extract-dtb.py continue and run against this particular kernel file, it will return "No appended dtbs found."

Because of this, the next line in add_magisk will fail, since extract-dtb.py didn't actually extract anything, so a file named 00_kernel does not exist


I'm not sure how much more clear I can be about why those files don't exist - I've explained it many times. Every file in that directory listing above is already there by the time add_magisk is executed.

As for the missing files in my error logs, the missing files would be created by add_magisk when it runs extract-dtb.py, but they aren't created because the crosshatch kernel format doesn't concatenate dtb to the end of the main kernel.

In short, it's not that I'm missing files or need to download something else, it's that add_magisk assumes the Android 10 crosshatch kernel is in a certain format, when in reality it's not.

@ubergeek77
Author

@CaseyBakey Do you have any ideas about how I can troubleshoot this? I've been banging my head against this for the past two weeks and I've made zero progress. I'm not sure what I'm doing wrong.

All Google searches about baking Magisk into AOSP lead back to you, be it here or on Reddit - no one has really done this before, so there aren't many troubleshooting resources available for this.

@CaseyBakey
Owner

CaseyBakey commented Jan 4, 2020

Hello there,

Excuse me for the late answer, but GitHub notifications go to a "trash" mail account that I don't check often...

I didn't read the whole thing but:

  1. A lot of changes happened with Android 10. This is why CHAOSP doesn't work on Android 10 yet (at least the CHAOSP-only parts like opengapps and Magisk). Take a look here to understand what's going on with init/ramdisk stuff: https://source.android.com/devices/bootloader/system-as-root#ramdisk
    ==> this breaks the Magisk part

  2. android-prepare-vendor, the script responsible for recovering the binary blobs missing from AOSP out of official OTA update zips, doesn't work yet on Android 10 for the "full" configuration, which is needed to implement opengapps: API 29 support (Android 10.0) anestisb/android-prepare-vendor#169
    ==> this breaks the opengapps part

  3. Opengapps doesn't support Android 10 yet

Even if you don't see commits there, I'm working on it when I got time.
Things seem to start moving on the android-prepare-vendor project anyway, so keep on Android 9 for now (like me).

@CaseyBakey
Owner

@CaseyBakey Do you have any ideas about how I can troubleshoot this? I've been banging my head against this for the past two weeks and I've made zero progress. I'm not sure what I'm doing wrong.

All Google searches about baking Magisk into AOSP lead back to you, be it here or on Reddit - no one has really done this before, so there aren't many troubleshooting resources available for this.

So if you say that there is no DTB concatenated with the kernel, why are you still trying to use extract-dtb.py? Just uncompress the kernel (if still lz4 compressed), patch it with magiskboot, and recompress it. And, for sure, don't concatenate the kernel and the DTB together.

That's it.

So comment/delete the corresponding lines to achieve this.
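The "(if still lz4 compressed)" part can be checked rather than assumed by reading the 4-byte magic at the start of the file. A minimal sketch; `lz4_format` is a hypothetical helper. The magic values come from the LZ4 format (frame 0x184D2204, legacy 0x184C2102, both stored little-endian). Running it on BOOT/kernel before patching and on the recompressed output afterwards would also show whether the `lz4 -f -9` step changed the container format, since the lz4 CLI writes frame format unless `-l` is given. Whether the bootloader cares about that difference is not something I have verified.

```shell
#!/usr/bin/env bash
# lz4_format FILE
# Prints "frame" or "legacy" based on the first 4 bytes; otherwise prints "none" and fails.
lz4_format() {
  local magic
  magic=$(od -An -tx1 -N4 -v "$1" | tr -d ' \n')
  case "$magic" in
    04224d18) echo frame ;;    # LZ4 frame format, magic 0x184D2204
    02214c18) echo legacy ;;   # legacy format (lz4 -l), magic 0x184C2102
    *) echo none; return 1 ;;  # not lz4: maybe already uncompressed
  esac
}
```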

@ubergeek77
Author

ubergeek77 commented Jan 4, 2020

Thanks for the reply. I don't think you have the full picture though. You can ignore every comment here besides my first one - the discussion with the other user here was basically pointless. Those may have misled you into thinking I was using that extract-dtb python script, when in reality I was not. The other commenter here apparently could not comprehend what I was trying to say, so I had to repeat over and over, in increasing detail, why I was not using that script.

In my first comment, I go over what I've tried, what works, what doesn't work, and how things fail when things go wrong. I did in fact skip the parts of your script that work with the DTB, and I opted to just patch the kernel directly. Still, when I flash the factory image, boot.img fails to boot, so it's unable to boot into userspace fastboot to complete the flash, and so the flash fails.

I'm really not sure what else to try.

@ubergeek77 ubergeek77 changed the title add_magisk is not compatible with crosshatch builds add_magisk is not compatible with Android 10 Jan 5, 2020
@ubergeek77
Author

ubergeek77 commented Jan 5, 2020

I've changed the title here, as I've gotten new information and an explanation from TopJohnWu himself about why this happens. I'd rather not ping him by linking directly to the issue, but the discussion between him and me is on GitHub at topjohnwu/Magisk; it's issue number 2214.

Based on his explanation and my tests, I've learned that this flash/boot failure is not because Magisk was added to boot.img incorrectly. Rather, it has something to do with the boot control HAL.

The reason a Magisk-patched factory image isn't installing correctly is because the part of the flash that fails needs to explicitly boot boot.img to get into userspace fastboot to finish flashing the system. Despite boot.img being perfectly bootable, the bootloader will refuse to boot it. By just having Magisk inside of the boot.img, the boot control HAL doesn't consider the boot.img to be "valid," even if the bootloader is locked, and even if the boot.img is signed. Not only does this prevent the second half of a factory image flash from proceeding (like I documented in this issue), it also prevents a normal boot from occurring (like I documented in the Magisk issue).

To make this even more confusing and frustrating, if you were to flash a factory image with a patched boot.img, fail to boot, and then flash a stock Google image, even that will fail to flash correctly, because the boot slot it tries to flash over either still isn't verified, didn't boot correctly at least once, or the max number of boot attempts has been reached. I didn't brick in this case, because I was somehow able to boot into recovery mode, which is considered a "successful boot" for boot.img, which then let me go ahead and flash the stock Google image again.
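The per-slot state described here (marked successful or not, retry count exhausted, unbootable) is bookkeeping the bootloader keeps, and it can be inspected from fastboot before deciding what to reflash. A minimal sketch, assuming the bootloader reports slot-related variables through `fastboot getvar all` (exact variable names vary between bootloaders). `show_slot_state` and the `FASTBOOT` override, which only exists so the helper can be exercised without a device, are hypothetical.

```shell
#!/usr/bin/env bash
# show_slot_state
# Dumps every bootloader variable whose name mentions "slot"
# (e.g. current slot, retry counts, successful/unbootable flags).
show_slot_state() {
  local fb="${FASTBOOT:-fastboot}"
  # fastboot prints getvar output on stderr, so merge it into stdout before filtering
  "$fb" getvar all 2>&1 | grep -iE 'slot' | sort
}
```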

I really do encourage you to read the full issue 2214 over on the Magisk repository, because it provides some insight into how Magisk has to be added into ROMs like CHAOSP or RattlesnakeOS. If you want to patch this up to Android 10, it's worth the read.

Based on what I've observed and what TopJohnWu has told me in that issue, it's looking like:

  • You cannot embed Magisk into a factory image, period. Full-stop. Can't do it. There doesn't seem to be any getting around this.
  • You definitely need to "start" from an unpatched, no-Magisk image.
  • Assuming flashing OTA images doesn't carry the same boot control HAL limitations that a full factory image flash comes with, you might be able to:
    • Assign your custom AVB signing key via fastboot flash avb_custom_key
    • Install your unpatched, no-Magisk factory image, signed with that key
    • Lock the bootloader (will do a data wipe)
    • Flash a signed OTA image containing a signed boot.img with Magisk inside of it, since the bootloader is now locked, and this is the only way to deliver the Magisk boot.img
    • If you're lucky, the bootloader will actually boot with Magisk. Otherwise, if this doesn't work, I'm not sure what else could be done here.
    • Assuming that worked, since you can't allow Magisk to patch the boot.img for you since your bootloader is locked and the boot.img is signed, future updates would have to have Magisk contained in the OTA, and you would have to manually run bootctl mark-boot-successful for that inactive boot slot before rebooting, otherwise the boot would probably fail
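The first part of that (untested, speculative) procedure can be written down as a dry-run-able sketch. `avbtool extract_public_key`, `fastboot flash avb_custom_key`, `fastboot flashing lock` and `adb sideload` are real commands; the `run` and `magisk_ota_plan` helpers, the `DRY_RUN` variable and the file names are hypothetical. The factory-image flash and the recovery navigation are device-specific and interactive, so they are left as comments.

```shell
#!/usr/bin/env bash
# run CMD...: with DRY_RUN set, print the command instead of executing it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# magisk_ota_plan KEY_PEM OTA_ZIP
magisk_ota_plan() {
  local key="$1" ota_zip="$2"
  # 1. Derive the public-key metadata and enroll it (bootloader still unlocked).
  run avbtool extract_public_key --key "$key" --output avb_pkmd.bin
  run fastboot flash avb_custom_key avb_pkmd.bin
  # 2. Flash the unpatched, no-Magisk factory image signed with the same key (not shown).
  # 3. Relock the bootloader; this wipes userdata.
  run fastboot flashing lock
  # 4. Boot Android once, then reboot to recovery and choose "Apply update from ADB".
  # 5. Deliver the signed OTA whose boot.img contains Magisk.
  run adb sideload "$ota_zip"
}
```

With `DRY_RUN=1 magisk_ota_plan avb.pem ota.zip` it only prints the four commands, which makes it easy to review the order before touching a device.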

You would have to be extremely careful with this, both when testing/developing ROM updates and when just using the device. If you ever put yourself in a situation where the bootloader is locked but both boot slots are unverified and therefore won't boot, even if the boot.img files themselves are perfectly bootable, you're screwed. You'd be bricked at that point. You wouldn't be able to boot into recovery mode or rescue mode, because for some asinine reason, Google has decided that both of those should be handled by the same boot.img. Your only saving grace in this situation is if you had enabled "OEM unlocking" while booted into Android, meaning you can run fastboot flashing unlock and try again. You'd lose your data at that point, but at least you wouldn't be bricked.

This is really disheartening to say the least. Seeing how easy things were on Android 9, I really didn't think this would be an entire ordeal. Even if this is figured out, and Magisk can be used with a locked bootloader, due to the boot control HAL nonsense, doing so is a lot more dangerous than it was on Android 9.

This sucks :/

@ubergeek77
Author

ubergeek77 commented Jan 9, 2020

@CaseyBakey Good news!

I tested the process I outlined above, and I can confirm that you can add Magisk directly to an Android 10 build, but it cannot be added to the initial factory image.

If Magisk is properly added to the build, and a signed vbmeta.img is created against a modified boot.img, Magisk will be accepted so long as it is applied as an OTA update. In other words, you must flash an unpatched factory image first, then create an identical build to the one you just flashed, with Magisk included before vbmeta.img is generated, and install it as an OTA update.

RattlesnakeOS' own automatic background OTA updater can be used to install Magisk like this. Further, updates to Magisk can be applied via the OTA updater without needing to uninstall Magisk first, as you would in a normal OTA update. In absence of that, you can also install it via adb sideload from recovery.

I've done this test with an unlocked bootloader, as well as with a locked bootloader with a custom AVB key. Not only can Magisk be added to Android 10, it can be added with a locked bootloader.

I'm not 100% sure (according to my debug logs at least), but I believe AVB-related verification files are generated during "build_aosp," meaning this all happens before "add_magisk" would be normally called. Because of this, I've opted to remove "add_magisk" and instead patch build/make/core/Makefile before the build starts, so that Magisk is included in boot.img right before it is packed and signed. This way, I don't have to worry about including it afterwards and potentially running into signing mismatch, and I can be sure that the resulting boot.img will be accepted by a locked bootloader with a custom key.

If you like, I can submit a PR with the necessary changes to include Magisk in an Android 10 build. However, I haven't tested or modified the rest of your script, so I wouldn't be able to tell you what else needs to be changed for Android 10 compatibility.

@fltcaptriker

@ubergeek77 I would be interested in seeing what you did to add magisk to rattlesnakeos as I am trying to do that now. Do you have the patch posted anywhere?

@ubergeek77
Author

@fltcaptriker Please take a look here: https://github.com/ubergeek77/rattlesnakeos_scripts

You're welcome to take a look as well, CaseyBakey; perhaps my method could be incorporated into CHAOSP.

@fltcaptriker - since the two of us are beginning to talk more and more about RattlesnakeOS-specific stuff, let's continue our discussion on my issue tracker over there.

@CaseyBakey
Owner

> @CaseyBakey Good news!
>
> I tested the process I outlined above, and I can confirm that you can add Magisk directly to an Android 10 build, but it cannot be added to the initial factory image.
>
> If Magisk is properly added to the build, and a signed vbmeta.img is created against the modified boot.img, Magisk will be accepted as long as it is applied as an OTA update. In other words, you must flash an unpatched factory image first, then create an identical build to the one you just flashed, with Magisk included before vbmeta.img is generated, and install it as an OTA update.
>
> RattlesnakeOS's own automatic background OTA updater can be used to install Magisk this way. Further, updates to Magisk can be applied via the OTA updater without needing to uninstall Magisk first, as you would have to before a normal OTA update. Without the updater, you can also install it via adb sideload from recovery.
>
> I've done this test with an unlocked bootloader, as well as with a locked bootloader and a custom AVB key. Not only can Magisk be added to Android 10, it can be added with a locked bootloader.
>
> I'm not 100% sure (at least according to my debug logs), but I believe AVB-related verification files are generated during "build_aosp", meaning this all happens before "add_magisk" would normally be called. Because of this, I've opted to remove "add_magisk" and instead patch build/make/core/Makefile before the build starts, so that Magisk is included in boot.img right before it is packed and signed. This way, I don't have to worry about adding it afterwards and potentially running into a signing mismatch, and I can be sure that the resulting boot.img will be accepted by a locked bootloader with a custom key.
>
> If you like, I can submit a PR with the necessary changes to include Magisk in an Android 10 build. However, I haven't tested or modified the rest of your script, so I can't tell you what else needs to be changed for Android 10 compatibility.
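The two-stage install described above can be sketched as follows. This is a minimal sketch under several assumptions. The file names are hypothetical: the unpatched build's image zip is flashed with `fastboot update`, and the Magisk build's OTA zip is sideloaded from recovery. A factory package's flash-all.sh typically also flashes bootloader and radio images first, and that step is omitted here. RattlesnakeOS's OTA updater can replace the sideload step. Commands are only printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch of the two-stage install: unpatched build first, Magisk build as an OTA.
# File names are hypothetical placeholders; set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

two_stage_install() {
  local image_zip="$1"    # image zip of the unpatched build (no Magisk)
  local magisk_ota="$2"   # OTA zip of the identical build with Magisk added before vbmeta

  # Stage 1: wipe userdata and flash the unpatched build, then boot it once normally.
  run fastboot -w update "$image_zip"

  # Stage 2: reboot into recovery, choose "Apply update from ADB",
  # then sideload the Magisk OTA (no wipe happens here).
  run adb sideload "$magisk_ota"
}

two_stage_install image-unpatched.zip ota-magisk.zip
```

In this design, the data wipe only ever happens while the boot image is unpatched. That is the constraint the rest of this thread keeps returning to.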

I haven't had time to check on this yet, but "add_magisk" has always been called before the signing stuff takes place.

It happens after "build_aosp" and before "release": after compilation and before signing, which makes sense.

I don't know yet if Android 10 changed something about this, but I don't think so.

So, yes, compile, patch (add Magisk), and then sign is the right way to do it.

When you say: "...but it cannot be added to the initial factory image."

I don't know if you're talking about factory images from Google. If so, it makes sense, since you didn't "modify" the vbmeta partition to account for the mods you made to the boot image when patching it.
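For your own build, re-signing a patched boot.img and regenerating vbmeta.img can be sketched with AOSP's `avbtool` host tool. The sketch uses its `add_hash_footer` and `make_vbmeta_image` subcommands. It is illustrative only, and it rests on several assumptions:

- The key path and partition size are hypothetical placeholders.
- The key is assumed to be RSA-4096.
- Only boot's descriptor is included, whereas a real device vbmeta also covers other partitions.

In a normal AOSP build, the build and signing scripts perform these steps themselves. Commands are printed unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: re-sign a Magisk-patched boot.img and rebuild vbmeta.img so that
# vbmeta's hash descriptor matches the patched image.
# Key path, partition size and algorithm are assumptions (placeholders).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

resign_boot_and_vbmeta() {
  local boot_img="$1" key="$2" boot_partition_size="$3"

  # Add the AVB hash footer (hash + signature) to the patched boot image.
  run avbtool add_hash_footer --image "$boot_img" --partition_name boot \
    --partition_size "$boot_partition_size" \
    --key "$key" --algorithm SHA256_RSA4096

  # Rebuild vbmeta.img from boot's descriptor. A real device vbmeta would also
  # include descriptors for other partitions (system, vendor, dtbo, ...).
  run avbtool make_vbmeta_image --output vbmeta.img \
    --key "$key" --algorithm SHA256_RSA4096 \
    --include_descriptors_from_image "$boot_img"
}

# 67108864 is a placeholder; use the device's actual boot partition size.
resign_boot_and_vbmeta boot.img avb_custom_key.pem 67108864
```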

I never tried to mod stock images from Google. And I won't. They started to be different again (with Android 10 maybe), splitting "ota" and "images" zips, as you can see in the android-prepare-vendor script.

But adding Magisk to your own build shouldn't be a problem, as you saw when you succeeded.

You have the source, you have the keys, no problem in sight ;-)

I'll take a look at your patch when I have time.

It seems we're "just" waiting for "full" configuration support on android-prepare-vendor, for Android 10 and the different devices.

And the OpenGapps for Android 10 :p

@ubergeek77
Copy link
Author

ubergeek77 commented Jan 20, 2020

> When you say: "...but it cannot be added to the initial factory image."
>
> I don't know if you're talking about factory images from Google. If so, it makes sense, since you didn't "modify" the vbmeta partition to account for the mods you made to the boot image when patching it.
>
> I never tried to mod stock images from Google. And I won't. They started to be different again (with Android 10 maybe), splitting "ota" and "images" zips, as you can see in the android-prepare-vendor script.

I don't mean Google's images, actually. I mean a fully custom image that gets built by the AOSP build process, where the image type is a factory image. For example, the AOSP build process produces a factory image and an OTA image. The factory image is the one with issues here.

At least on Android 10, if you perform a data wipe, the bootloader will reject the patched boot.img, even if it is properly signed and checksum-verifiable against a proper vbmeta image. This means that, on a factory image install, unless you remove the -w flag from the install, you will fail to boot. I suppose you could get away with installing another patched factory image without the -w flag, but at that point, it's no different than installing an OTA update, since your image would have to be very similar to the previous image, or the user data partition would cause issues. I've gone over the bootloader specifics above, and you can always go to TopJohnWu's comment on the Magisk issue I mentioned above for a more technical explanation. But the short of it is that, even if the checksums are all sane, the bootloader doesn't like something about how Magisk behaves during a data wipe.
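The `-w` point above can be sketched as a small flashing helper. This is a hypothetical sketch: the function name and the "yes"/"no" flag are illustrative, and commands are printed unless `DRY_RUN=0`. It drops the userdata wipe whenever the image carries a Magisk-patched boot.img, since a wipe with a patched boot image is the combination that fails to boot.

```bash
#!/usr/bin/env bash
# Sketch: decide whether to wipe userdata (-w) when flashing an image zip.
# A data wipe combined with a Magisk-patched boot.img fails to boot,
# so the wipe is skipped for patched images. Names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

flash_image_zip() {
  local image_zip="$1" magisk_patched="$2"   # magisk_patched: "yes" or "no"
  if [ "$magisk_patched" = "yes" ]; then
    # Keep userdata: only sensible on top of a very similar, already-booted build.
    run fastboot update "$image_zip"
  else
    run fastboot -w update "$image_zip"
  fi
}

flash_image_zip image-patched.zip yes
```

As noted above, skipping the wipe only works when the new image is close enough to the installed one that the existing userdata stays compatible. At that point it behaves much like an OTA update.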

So, generally speaking, if you try to install a fully-custom, non-Google, Magisk-patched factory image, you will fail to boot.

Go ahead and try flashing a patched factory image yourself. You'll see what I mean.

As for the stuff in the build process, yes, I did see that add_magisk is called before release. However, looking at a debug build log, it looks like some vbmeta stuff is happening at the very end of build_aosp, and at minimum, a verification hash is appended to the end of boot.img. Some of this stuff might be overwritten by release, but there is at least stuff happening before it. Because of that, I've opted to do this stuff higher up the build process, since even if it's not necessary, I find it to be a lot cleaner.
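The observation that a verification hash is appended to the end of boot.img can be checked directly. In the AVB format, the footer occupies the last 64 bytes of the image and begins with the 4-byte magic `AVBf`. Below is a minimal sketch; the function name is hypothetical. With the AOSP host tools available, `avbtool info_image --image boot.img` prints the full footer and descriptors instead.

```bash
#!/usr/bin/env bash
# Sketch: report whether an image ends with an AVB footer.
# The footer is the last 64 bytes of the image and starts with the magic "AVBf".
# Returns 0 if present, 1 if absent, 2 if the file does not exist.
has_avb_footer() {
  local img="$1"
  [ -f "$img" ] || return 2
  [ "$(tail -c 64 "$img" | head -c 4)" = "AVBf" ]
}

if has_avb_footer boot.img; then
  echo "boot.img: AVB footer present"
else
  echo "boot.img: no AVB footer (or file missing)"
fi
```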

I'm sure you're already aware of this, but the main RattlesnakeOS project has already added support for Android 10, so I think android-prepare-vendor is already updated. But I could be wrong if you're waiting for something more specific.

@CaseyBakey
Copy link
Owner

Hi there,

Just letting you know that I now have time to "work" on CHAOSP again, for Android 10.

I did encounter the error with add_magisk, and I'm trying to work around it my way ;)
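The add_magisk error from the top of this issue comes from `cp` on BOOT/RAMDISK/init. On crosshatch's Android 10 build, that file is a symlink to /system/bin/init, which does not exist on the build host. Below is a minimal pre-flight check that could run before patching. The function name is hypothetical, and the path is the one used in chaosp/build.sh.

```bash
#!/usr/bin/env bash
# Sketch: check BOOT/RAMDISK/init before add_magisk copies it.
# Returns 0 only if init is a regular file that can be backed up and replaced.
check_ramdisk_init() {
  local ramdisk="$1"          # .../target_files_intermediates/.../BOOT/RAMDISK
  local init="$ramdisk/init"
  if [ -L "$init" ]; then
    # A link to /system/bin/init is "dead" on the build host, so cp fails on it.
    if [ -e "$init" ]; then
      echo "symlink -> $(readlink "$init")"
    else
      echo "dead symlink -> $(readlink "$init")"
    fi
    return 1
  elif [ -f "$init" ]; then
    echo "regular file"
    return 0
  else
    echo "missing"
    return 1
  fi
}

ramdisk="$BUILD_DIR/out/target/product/$DEVICE/obj/PACKAGING/target_files_intermediates/aosp_$DEVICE-target_files-$BUILD_NUMBER/BOOT/RAMDISK"
check_ramdisk_init "$ramdisk" || echo "add_magisk: BOOT/RAMDISK/init is not a regular file, not patching" >&2
```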

I did check your commit and it also makes sense.

Right now, when flashing a factory image, the device tries to reboot into userspace fastboot (I guess) and then boot-loops. It looks like the issue you had a while back.

It makes no sense to me that we first need to flash a "clean" factory then sideload a Magisk-ed update. It should be possible to cook Magisk directly in the factory image.

I'll look into your claim that some vbmeta/signing work may happen before the release function.

And yes, I know RattlesnakeOS has supported Android 10 for many months, but that's not enough. I need android-prepare-vendor to support the "full" configuration (not only the "naked" one) for API 29 (Android 10) devices. That's required to be able to add OpenGapps to the build. The only person who has managed to cook a "full" configuration is @typeproto187 (https://github.com/typeproto187/android-prepare-vendor/blob/10.0/taimen/config.json), and I would like to know how he did it, because it needs to be done for the other devices too.

In the meantime, OpenGapps now seems to support Android 10 as well.

So, to have the same features on Android 10, CHAOSP is "just" missing the "full" configurations.

Regards.

@ubergeek77
Copy link
Author

ubergeek77 commented Feb 7, 2020

Great to hear! I'm glad you gave my repo a look.

As for the factory image thing, as unfortunate as it is, TopJohnWu was pretty clear in his explanation that there's no getting around it, since this behavior is caused by the closed-source boot control HAL. I encourage you to see what he had to say about it if you're interested.

It's also not exactly caused by the factory image itself - the factory reset is what causes that behavior. The fact that a normal factory image flash itself typically involves a factory reset makes it seem as though the factory image is the issue. The userspace fastboot part of Android 10's install process might also have something to do with it, since it wipes data first, then "boots" the system into userspace fastboot to complete the install. That is where the install typically fails in this sort of situation.

While I haven't tried this deliberately, in theory, if you were to install Magisk to both boot slots on a stock image and then perform a factory reset and/or install a stock Google image, you would very likely see this behavior as well, all because the factory reset was performed while Magisk was present.

With that being said, you could absolutely incorporate Magisk into the initial factory image - you would just need to avoid performing a factory reset until your first legitimate successful boot, or until Magisk's "OTA update" feature forcibly sets the boot slot state to "successful."

Personally, I don't mind the extra installation step. In the long term, this juggling of the factory image and OTA update has to be done literally once. After that, it's easy enough to make sure the OTA update is handled carefully, and everything is fine after that.

Good luck! And welcome back!

@CaseyBakey
Copy link
Owner

@ubergeek77 @typeproto187
This issue is getting too long and not targeted enough.
Please follow up at #5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants