Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DyldCache: add the ability to iterate mappings and relocations #738

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

scollinson
Copy link

Add mapping and relocation iterators to the DyldCache class that handle modern dyld shared cache formats.

src/pod.rs Outdated
if (ptr as usize) % mem::align_of::<T>() != 0 {
return Err(());
}
// if (ptr as usize) % 8 != 0 {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to read certain structures from my local dyld shared cache because of this alignment requirement. My understanding is that the alignment should be that of the primitive types within the structure, but it appears even that isnt correct as only an alignment of 2 seemed to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which structures and fields in particular? If they are specific to dyld shared cache, then those fields should use types such as U32Bytes instead of U32. If it's Mach-O structures for example, then it may be that the dyld shared cache isn't aligned them the same way as normal files, and the unaligned feature of this crate is required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting it when reading the slide info:

                let slide_info_file_offset = self.slide_info_file_offset.get(endian);
                let version = data
                    .read_at::<U32<E>>(slide_info_file_offset)
                    .read_error("Invalid slide info file offset size or alignment")?
                    .get(endian);
                match version {
                    5 => {
                        let slide = data
                            .read_at::<macho::DyldCacheSlideInfo5<E>>(slide_info_file_offset)
                            .read_error("Invalid dyld cache slide info size or alignment")?;
                        let page_starts_offset = slide_info_file_offset
                            .checked_add(mem::size_of::<macho::DyldCacheSlideInfo5<E>>() as u64)
                            .unwrap();
                        let page_starts = data
                            .read_slice_at::<U16<E>>(
                                page_starts_offset,
                                slide.page_starts_count.get(endian) as usize,
                            )
                            .read_error("Invalid page starts size or alignment")?;
                        Ok(Some(DyldCacheSlideInfoSlice::V5(slide, page_starts)))
                    }
                    _ => todo!("handle other dyld_cache_slide_info versions"),
                }

which has this format:

pub struct DyldCacheSlideInfo5<E: Endian> {
    pub version: U32<E>,   // currently 5
    pub page_size: U32<E>, // currently 4096 (may also be 16384)
    pub page_starts_count: U32<E>,
    reserved1: u32,
    pub value_add: U64<E>,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the slide_info_file_offset you're seeing only has 32-bit alignment then value_add needs to change to U64Bytes. And if it only has 8-bit alignment then you need to change the U32 as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was actually just the alignment of the reserved1 field that was the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you're already using the unaligned feature then (which only affects U32 and U64, not u32), so the other fields technically should be changed too so that it works without that feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I am, unless it's the default? I have just added the dependency with cargo add object and changed the path to the local dir I have object checked out to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's the default.

Copy link
Contributor

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to look in detail at these changes later, but there's a couple of things I would like changed which will probably affect some of the rest.

Cargo.toml Outdated Show resolved Hide resolved
src/macho.rs Outdated Show resolved Hide resolved
@philipc
Copy link
Contributor

philipc commented Oct 22, 2024

What testing have you done for this? Are you able to show me some code that demonstrates how all of this is used?

@scollinson
Copy link
Author

scollinson commented Oct 23, 2024

What testing have you done for this? Are you able to show me some code that demonstrates how all of this is used?

For testing I have been using a very simple script that runs the functions I am after:

use anyhow::{Context, Result};
use memmap2::Mmap;
use object::macho;
use object::read::macho::DyldCache;
use object::Endianness;
use std::fs;
use std::mem::forget;
use std::path::{Path, PathBuf};

fn map(path: PathBuf) -> Result<&'static [u8]> {
    let file = fs::File::open(path).context("failed to open cache path")?;
    let file = unsafe { Mmap::map(&file) }.context("failed to map cache file")?;
    let data = &*file as *const [u8];
    forget(file);
    let data = unsafe { data.as_ref() }.context("cache map is null")?;
    Ok(data)
}

fn main() -> Result<()> {
    let cache_root = Path::new("/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld");

    let cache = map(cache_root.join("dyld_shared_cache_arm64e"))?;
    let subcaches = &[map(cache_root.join("dyld_shared_cache_arm64e.01"))?];

    let cache = DyldCache::<Endianness>::parse(cache, subcaches)?;

    for reloc in cache.relocations() {
        if let Some(ref auth) = reloc.auth {
            match (auth.key, auth.diversity, auth.addr_div) {
                (macho::PtrauthKey::IA, 0u16, false) => {
                    dbg!(reloc);
                }
                _ => {}
            }
        }
    }

    for mapping in cache.mappings() {
        dbg!(mapping);
    }

    Ok(())
}

I'm not sure how to do any testing with CI as the files are prohibitively large to put in the test binaries repo. I've done a lot of manual comparison with the output of ipsw dyld slide /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e --json --auth | jq '.[] | select(.pointer.authenticated == true and .pointer.key == "IA" and .pointer.addr_div == false and .pointer.diversity == 0)'.

@philipc
Copy link
Contributor

philipc commented Oct 24, 2024

I'm having trouble understanding the use of DyldCache::mappings and DyldCache::relocations. Why do you want to add these? For example, I don't see how the relocation returned by this iterator could be used, because its offset is relative to a mapping, but the iterator discards all association with that mapping.

@scollinson
Copy link
Author

Oh yes I can see why that's a little awkward. I should add the address to the DyldRelocation. My goal is to use the mapping (address, protection, data) and relocations (taints, signatures) in a symbolic execution project.

@philipc
Copy link
Contributor

philipc commented Oct 24, 2024

I've looked a bit more and DyldCache::mappings is probably okay. I was thinking we exposed the subcaches but it seems we don't. That's something we could consider though (add DyldCache::subcaches instead of DyldCache::mappings).

I still don't see the reason for DyldCache::relocations though. Why can't the user call DyldCache::mappings and then call DyldCacheMapping::relocations for each mapping?

@scollinson
Copy link
Author

I still don't see the reason for DyldCache::relocations though. Why can't the user call DyldCache::mappings and then call DyldCacheMapping::relocations for each mapping?

They can do both, and DyldCache::relocations is convenient for when you don't want to know anything about the mappings, though it is a little useless in that case without the address, as you've pointed out. I'll add this now.

@philipc
Copy link
Contributor

philipc commented Oct 24, 2024

DyldCache::relocations is convenient for when you don't want to know anything about the mappings

Can we instead design it so that it's easy for the user to do something like cache.mappings().flat_map(DyldCacheMapping::relocations)? Turning nested iterators into a boxed chain of iterators is the wrong way to do this.

@scollinson
Copy link
Author

Sorry, rust is quite new for me and I'm not sure what is wrong with that, or what the best approach is. The requirement to box the iterators came because the mappings optionally contain slide information (based on the version of the mapping info, or the mapping itself), not because the iterators are nested?

@scollinson
Copy link
Author

Removing the reference from self for DyldCacheMapping::relocations seems to make your example with flatten work. So I presume the idea from there is to remove DyldCache::relocations?

DyldCache::mappings would still return a boxed iterator, is that an issue also?

@philipc
Copy link
Contributor

philipc commented Oct 26, 2024

It's doing a bunch of memory allocations for something that shouldn't need any memory allocations at all. So while it works, it seems to me that it could be designed better.

So DyldCache::mappings could be changed to this:

    pub fn mappings<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = DyldCacheMapping<'data, E, R>> + 'cache {
        self.mappings.iter().chain(self.subcaches.iter().flat_map(|subcache| subcache.mappings.iter()))
    }

I did try doing the same thing for DyldCache::relocations, but we need to name the 'data lifetime somehow and I'm not sure how. I can look into it more later.

    pub fn relocations<'cache>(&'cache self) -> impl Iterator<Item = DyldRelocation> + 'cache {
        self.mappings().flat_map(|mapping| mapping.relocations())
    }

Or we could leave out DyldCache::relocations, but I would prefer to understand it better first.

I'm sure we could replace the impl Iterator with our own type, but that's a bit more verbose.

@scollinson
Copy link
Author

Have pushed a change so that the following now works:

cache
        .mappings()
        .map(DyldCacheMapping::relocations)
        .flatten()

But I am not sure how to reimplement relocations for the same reason as you said.

@philipc
Copy link
Contributor

philipc commented Oct 27, 2024

The 'data error is a known limitation of rust that will be fixed in the 2024 edition. Using a workaround, we could do this:

pub trait Captures<'a> { }
impl<'a, T: ?Sized> Captures<'a> for T { }

impl<'data, E, R> DyldCache<'data, E, R>
where
    E: Endian,
    R: ReadRef<'data>,
{
    /// Return all the relocations in this cache.
    pub fn relocations<'cache>(
        &'cache self,
    ) -> impl Iterator<Item = DyldRelocation> + Captures<'cache> + Captures<'data> {
        self.mappings().flat_map(DyldCacheMapping::relocations)
    }
}

Defining our own iterator instead of using flat_map would also workaround it, but we'd have to define an iterator for the mappings too.

I'd prefer to just leave this out instead of doing workarounds. Is that okay? I don't think that writing the flat_map yourself is too hard.

crates/examples/src/readobj/macho.rs Outdated Show resolved Hide resolved
@@ -282,6 +283,33 @@ pub const VM_PROT_WRITE: u32 = 0x02;
/// execute permission
pub const VM_PROT_EXECUTE: u32 = 0x04;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PtrauthKey {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't part of the file format that I can see from the usage in this PR. I think it should be moved into read::macho, and probably given a better name. I assume this is specific to arm, so it would be good to have that in the name.

}

/// Pointer auth data
pub struct Ptrauth {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for PtrauthKey.

/// UNUSED: moved to imagesOffset to prevent older dsc_extarctors from crashing
pub images_offset_old: U32<E>,
/// UNUSED: moved to imagesCount to prevent older dsc_extarctors from crashing
pub images_count_old: U32<E>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change, and furthermore the old names are now the names of fields in a different position, so this could silently break users. Which of the new fields that you have added do you actually need?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how apple rolls. I've updated this structure to match the latest dyld source. Whenever interacting with apple structures like this I would not rely on structure locations to stay the same, but write wrappers around where they are used that handles the change in their usage in dyld. I figure that's roughly why the functions in the read module exist?

src/macho.rs Outdated Show resolved Hide resolved
let file_offset: u64 = info.file_offset.get(*endian) + mapping_offset + offset;
let pointer: macho::DyldCacheSlidePointer5 = data
.read_at::<U64<E>>(file_offset)
.unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use unwrap. This crate must never panic for bad input data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the use of unwrap, but noticed there is one stray in DyldCache::parse that might have been missed previously.

src/read/macho/dyld_cache.rs Show resolved Hide resolved
let sc_header = macho::DyldCacheHeader::<E>::parse(data)?;
if &sc_header.uuid != uuid {
let header = macho::DyldCacheHeader::<E>::parse(data)?;
if &header.uuid != uuid {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seems neater when the variable is only used once. Can revert if you'd like?

src/read/macho/dyld_cache.rs Outdated Show resolved Hide resolved
src/read/macho/dyld_cache.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I appreciate the contribution. There's breaking changes here so I plan to do a release before merging, and I will try to do some follow up work at that time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants