Support Host I/O operations #66

Merged
merged 13 commits into from
Aug 20, 2021

bet4it
Contributor

@bet4it bet4it commented Jul 30, 2021

Description

Support Host I/O operations

Refer: #32

API Stability

  • This PR does not require a breaking API change

Checklist

  • Implementation
    • cargo build compiles without errors or warnings
    • cargo clippy runs without errors or warnings
    • cargo fmt was run
    • All tests pass
  • Documentation
    • rustdoc + appropriate inline code comments
    • Updated CHANGELOG.md
    • (if appropriate) Added feature to "Debugging Features" in README.md
  • If implementing a new protocol extension IDET
    • Included a basic sample implementation in examples/armv4t
    • Included output of running examples/armv4t with RUST_LOG=trace + any relevant GDB output under the "Validation" section below
    • Confirmed that IDET can be optimized away (using ./scripts/test_dead_code_elim.sh and/or ./example_no_std/check_size.sh)
    • OR Implementation requires adding non-optional binary bloat (please elaborate under "Description")
  • If upstreaming an Arch implementation
    • I have tested this code in my project, and to the best of my knowledge, it is working as intended.

Validation

GDB output
(gdb) remote get /tmp/remote.txt /tmp/local.txt
Successfully fetched file "/tmp/remote.txt".
(gdb) remote put /tmp/local.txt /tmp/remote.txt
Successfully sent file "/tmp/local.txt".
(gdb) remote delete /tmp/remote.txt
Successfully deleted file "/tmp/remote.txt"
armv4t output
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/examples/armv4t`
loading section ".text" into memory from [0x55550000..0x55550078]
Setting PC to 0x55550000
Waiting for a GDB connection on "127.0.0.1:9001"...
Debugger connected from 127.0.0.1:36992
 TRACE gdbstub::protocol::recv_packet > <-- +
 TRACE gdbstub::protocol::recv_packet > <-- $qSupported:multiprocess+;swbreak+;hwbreak+;qRelocInsn+;fork-events+;vfork-events+;exec-events+;vContSupported+;QThreadEvents+;no-resumed+;xmlRegisters=i386#6a
 TRACE gdbstub::protocol::response_writer > --> $PacketSize=1000;vContSupported+;multiprocess+;QStartNoAckMode+;ReverseContinue+;ReverseStep+;QDisableRandomization+;QEnvironmentHexEncoded+;QEnvironmentUnset+;QEnvironmentReset+;QStartupWithShell+;QSetWorkingDir+;swbreak+;hwbreak+;QCatchSyscalls+;qXfer:features:read+;qXfer:memory-map:read+#cc
 TRACE gdbstub::protocol::recv_packet     > <-- +
 TRACE gdbstub::protocol::recv_packet     > <-- $vMustReplyEmpty#3a
 INFO  gdbstub::gdbstub_impl              > Unknown command: vMustReplyEmpty
 TRACE gdbstub::protocol::response_writer > --> $#00
 TRACE gdbstub::protocol::recv_packet     > <-- +
 TRACE gdbstub::protocol::recv_packet     > <-- $QStartNoAckMode#b0
 TRACE gdbstub::protocol::response_writer > --> $OK#9a
 TRACE gdbstub::protocol::recv_packet     > <-- +
 TRACE gdbstub::protocol::recv_packet     > <-- $Hgp0.0#ad
 TRACE gdbstub::protocol::response_writer > --> $OK#9a
 TRACE gdbstub::protocol::recv_packet     > <-- $qXfer:features:read:target.xml:0,ffb#79
 TRACE gdbstub::protocol::response_writer > --> $m<?xml version="1.0"?>
<!DOCTYPE target SYSTEM "gdb-target.dtd">
<target version="1.0">
    <architecture>armv4t</architecture>
    <feature name="org.gnu.gdb.arm.core">
        <vector id="padding" type="uint32" count="25"/>

        <reg name="r0" bitsize="32" type="uint32"/>
        <reg name="r1" bitsize="32" type="uint32"/>
        <reg name="r2" bitsize="32" type="uint32"/>
        <reg name="r3" bitsize="32" type="uint32"/>
        <reg name="r4" bitsize="32" type="uint32"/>
        <reg name="r5" bitsize="32" type="uint32"/>
        <reg name="r6" bitsize="32" type="uint32"/>
        <reg name="r7" bitsize="32" type="uint32"/>
        <reg name="r8" bitsize="32" type="uint32"/>
        <reg name="r9" bitsize="32" type="uint32"/>
        <reg name="r10" bitsize="32" type="uint32"/>
        <reg name="r11" bitsize="32" type="uint32"/>
        <reg name="r12" bitsize="32" type="uint32"/>
        <reg name="sp" bitsize="32" type="data_ptr"/>
        <reg name="lr" bitsize="32"/>
        <reg name="pc" bitsize="32" type="code_ptr"/>

        <!--
            For some reason, my version of `gdb-multiarch` doesn't seem to
            respect "regnum", and will not parse this custom target.xml unless I
            manually include the padding bytes in the target description.

            On the bright side, AFAIK, there aren't all that many architectures
            that use padding bytes. Heck, the only reason armv4t uses padding is
            for historical reasons (see comment below).

            Odds are if you're defining your own custom arch, you won't run into
            this issue, since you can just lay out all the registers in the
            correct order.
        -->
        <reg name="padding" type="padding" bitsize="32"/>

        <!-- The CPSR is register 25, rather than register 16, because
        the FPA registers historically were placed between the PC
        and the CPSR in the "g" packet. -->
        <reg name="cpsr" bitsize="32" regnum="25"/>
    </feature>
    <feature name="custom-armv4t-extension">
        <!--
            maps to a simple scratch register within the emulator. the GDB
            client can read the register using `p }custom` and set it using
            `set }custom=1337`
        -->
        <reg name="custom" bitsize="32" type="uint32"/>

        <!--
            pseudo-register that return the current time when read.

            notably, i've set up the target to NOT send this register as part of
            the regular register list, which means that GDB will fetch/update
            this register via the 'p' and 'P' packets respectively
        -->
        <reg name="time" bitsize="32" type="uint32"/>
    </feature>
</target>#0f
 TRACE gdbstub::protocol::recv_packet     > <-- $qXfer:features:read:target.xml:aa4,ffb#3f
 TRACE gdbstub::protocol::response_writer > --> $l#6c
 TRACE gdbstub::protocol::recv_packet     > <-- $qTStatus#49
 INFO  gdbstub::gdbstub_impl              > Unknown command: qTStatus
 TRACE gdbstub::protocol::response_writer > --> $#00
 TRACE gdbstub::protocol::recv_packet     > <-- $?#3f
 TRACE gdbstub::protocol::response_writer > --> $S05#b8
 TRACE gdbstub::protocol::recv_packet     > <-- $qfThreadInfo#bb
 TRACE gdbstub::protocol::response_writer > --> $mp01.01#cd
 TRACE gdbstub::protocol::recv_packet     > <-- $qsThreadInfo#c8
 TRACE gdbstub::protocol::response_writer > --> $l#6c
 TRACE gdbstub::protocol::recv_packet     > <-- $qAttached:1#fa
GDB queried if it was attached to a process with PID 1
 TRACE gdbstub::protocol::response_writer > --> $1#31
 TRACE gdbstub::protocol::recv_packet     > <-- $Hc-1#09
 TRACE gdbstub::protocol::response_writer > --> $OK#9a
 TRACE gdbstub::protocol::recv_packet     > <-- $qC#b4
 INFO  gdbstub::gdbstub_impl              > Unknown command: qC
 TRACE gdbstub::protocol::response_writer > --> $#00
 TRACE gdbstub::protocol::recv_packet     > <-- $g#67
 TRACE gdbstub::protocol::response_writer > --> $00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000107856341200005555xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000000078563412#0a
 TRACE gdbstub::protocol::recv_packet     > <-- $qfThreadInfo#bb
 TRACE gdbstub::protocol::response_writer > --> $mp01.01#cd
 TRACE gdbstub::protocol::recv_packet     > <-- $qsThreadInfo#c8
 TRACE gdbstub::protocol::response_writer > --> $l#6c
 TRACE gdbstub::protocol::recv_packet     > <-- $qXfer:memory-map:read::0,ffb#18
 TRACE gdbstub::protocol::response_writer > --> $m<?xml version="1.0"?>
<!DOCTYPE memory-map
    PUBLIC "+//IDN gnu.org//DTD GDB Memory Map V1.0//EN"
            "http://sourceware.org/gdb/gdb-memory-map.dtd">
<memory-map>
    <memory type="ram" start="0x0" length="0x100000000"/>
</memory-map>#76
 TRACE gdbstub::protocol::recv_packet     > <-- $qXfer:memory-map:read::f4,ffb#82
 TRACE gdbstub::protocol::response_writer > --> $l#6c
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,4#35
 TRACE gdbstub::protocol::response_writer > --> $00000000#7e
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,4#35
 TRACE gdbstub::protocol::response_writer > --> $00000000#7e
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,2#5f
 TRACE gdbstub::protocol::response_writer > --> $04b0#f6
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffe,2#35
 TRACE gdbstub::protocol::response_writer > --> $0000#7a
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,2#33
 TRACE gdbstub::protocol::response_writer > --> $0000#7a
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,2#5f
 TRACE gdbstub::protocol::response_writer > --> $04b0#f6
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffe,2#35
 TRACE gdbstub::protocol::response_writer > --> $0000#7a
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,2#33
 TRACE gdbstub::protocol::response_writer > --> $0000#7a
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,4#35
 TRACE gdbstub::protocol::response_writer > --> $00000000#7e
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m5554fffc,4#35
 TRACE gdbstub::protocol::response_writer > --> $00000000#7e
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m55550000,4#61
 TRACE gdbstub::protocol::response_writer > --> $04b02de5#26
 TRACE gdbstub::protocol::recv_packet     > <-- $m0,4#fd
 TRACE gdbstub::protocol::response_writer > --> $00000000#7e
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:setfs:0#bf
 TRACE gdbstub::protocol::response_writer > --> $F0#76
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:open:2f746d702f72656d6f74652e747874,0,0#2c
 TRACE gdbstub::protocol::response_writer > --> $F00#a6
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:pread:0,1000,0#ef
 TRACE gdbstub::protocol::response_writer > --> $F06;origin#6f
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:pread:0,1000,6#f5
 TRACE gdbstub::protocol::response_writer > --> $F00;#e1
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:close:0#b0
 TRACE gdbstub::protocol::response_writer > --> $F0#76
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:open:2f746d702f72656d6f74652e747874,601,1c0#27
 TRACE gdbstub::protocol::response_writer > --> $F00#a6
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:pwrite:0,0,changed#87
 TRACE gdbstub::protocol::response_writer > --> $F07#ad
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:close:0#b0
 TRACE gdbstub::protocol::response_writer > --> $F0#76
 TRACE gdbstub::protocol::recv_packet     > <-- $vFile:unlink:2f746d702f72656d6f74652e747874#53
 TRACE gdbstub::protocol::response_writer > --> $F0#76
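To make the vFile exchange in the log above easier to follow, here is a minimal Rust sketch that decodes the `vFile:open` packets GDB sent. It assumes the packet layout from the GDB manual (`vFile:open: filename, flags, mode`, with a hex-encoded filename and hex integers). The `O_*` constants are GDB's File-I/O protocol values, not necessarily the host's libc values. `parse_vfile_open` and `decode_hex` are hypothetical helpers, not gdbstub's actual parser.

```rust
// GDB File-I/O open flags, as documented in the GDB manual ("Open Flags").
// These are protocol values; the host's libc may use different numbers.
pub const O_RDONLY: u32 = 0x0;
pub const O_WRONLY: u32 = 0x1;
pub const O_RDWR: u32 = 0x2;
pub const O_APPEND: u32 = 0x8;
pub const O_CREAT: u32 = 0x200;
pub const O_TRUNC: u32 = 0x400;
pub const O_EXCL: u32 = 0x800;

/// Decode a string of hex digit pairs (e.g. "2f746d70") into raw bytes.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(s.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Parse the body of a `vFile:open:` packet into (filename, flags, mode).
/// Hypothetical helper for illustration; not gdbstub's actual parser.
pub fn parse_vfile_open(body: &str) -> Option<(Vec<u8>, u32, u32)> {
    let mut fields = body.split(',');
    let filename = decode_hex(fields.next()?)?;
    let flags = u32::from_str_radix(fields.next()?, 16).ok()?;
    let mode = u32::from_str_radix(fields.next()?, 16).ok()?;
    if fields.next().is_some() {
        return None; // unexpected extra field
    }
    Some((filename, flags, mode))
}
```

Applied to the second open in the log (`2f746d702f72656d6f74652e747874,601,1c0`), this yields the filename `/tmp/remote.txt`, flags `O_WRONLY | O_CREAT | O_TRUNC` (0x601), and mode `0o700` (0x1c0), which is what `remote put` asked for. The first open (`...,0,0`) is a plain `O_RDONLY` open.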

@bet4it
Contributor Author

bet4it commented Jul 30, 2021

Some questions:

  1. Should functions like open, close, and setfs return i64, or is it better to use Option<u64> or something similar?

  2. I want to use gdbstub::common::Pid as the argument of get_exec_file, but how can decode_hex be used on NonZeroUsize?

@daniel5151
Owner

Ah, what a pleasant surprise! Host I/O is a feature I've been meaning to implement for a while now - thanks for taking a crack at it!

Before I dive into the PR (either later today, or at some point this weekend), the first thing I'll have to ask is that you split this PR into two separate ones - one for exec-file, and another for Host I/O. I suspect the former will be much easier to merge than the latter, as host I/O seems like something that'll require some more iteration.

@bet4it
Contributor Author

bet4it commented Jul 30, 2021

In fact, I implemented Host I/O first and then exec-file. Host I/O can be implemented separately, but exec-file can't: if we implement exec-file, gdb will automatically request the file via Host I/O.

@daniel5151
Owner

daniel5151 commented Jul 30, 2021

That's fine. Even if the GDB client is doing something weird and trying to use unsupported Host I/O packets to fetch the reported exec-file, the worst that would happen is that gdbstub responds with some "unknown command" packets, and the GDB client aborts its read request.

We should still be able to implement exec-file separately, even if it'd only be "useful" in conjunction with Host I/O support.

@bet4it
Contributor Author

bet4it commented Jul 30, 2021

Do I need to send another PR, or just split this commit into two?

@daniel5151
Owner

Thanks.

I'd prefer it if you sent a separate PR.
I'd recommend retrofitting this PR to focus on Host I/O only, and then opening a new PR that only implements exec-file.

@bet4it bet4it changed the title from "Support exec-file and Host I/O" to "Support Host I/O operations" on Aug 2, 2021
@daniel5151 daniel5151 self-requested a review August 2, 2021 16:04
Owner

@daniel5151 daniel5151 left a comment

Aside from the inline comments, I also had a general question:

Could you also implement the few remaining Host I/O operations, for completeness' sake? Namely: pwrite, fstat, unlink, and readlink. They shouldn't require much more effort, and I'm not sure I'd merge the PR without them.

examples/armv4t/gdb/host_io.rs (outdated)
src/protocol/commands.rs (outdated)
src/protocol/commands.rs (outdated)
src/protocol/commands/_vFile_open.rs (outdated)
src/protocol/commands/_vFile_open.rs (outdated)
src/protocol/commands/_vFile_pread.rs (outdated)
src/protocol/commands/_vFile_pread.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs
@bet4it
Contributor Author

bet4it commented Aug 4, 2021

Aside from the inline comments, I also had a general question:

Could you also implement the few remaining Host I/O operations, for completeness' sake? Namely: pwrite, fstat, unlink, and readlink. They shouldn't require much more effort, and I'm not sure I'd merge the PR without them.

Yes, I can implement them, but I don't know how to test them. I've never encountered a situation in which gdb requests such functions.
The most common usage of Host I/O is requesting remote files to get symbols, which won't change files on the remote, so functions like pwrite and unlink will never be called.
And if I implement them, users who want to use Host I/O must implement all of them even if they don't need them, filling in empty stubs. I hope there is a way to make these functions optional when users implement Host I/O.

@daniel5151
Owner

Yes, I can implement them, but I don't know how to test them. I've never encountered a situation in which gdb requests such functions.

I noodled around a bit in gdb while connected to a local gdbserver instance running a hello-world binary, and I think I managed to hit all the vFile packets in some capacity, just by typing out various info proc and remote commands.

I'm not entirely sure why the client sends fstat or readlink, but I can reliably trigger unlink and pwrite by using the GDB remote get/put/delete commands to read/write arbitrary remote files.

And if I implement them, users who want to use Host I/O must implement all of them even if they don't need them, filling in empty stubs. I hope there is a way to make these functions optional when users implement Host I/O.

This is a good point, and notably, it also applies to the PR in its current form as well. i.e: there's no reason any particular subset of vFile operations must be implemented together, as they are all building blocks from which more advanced functionality is derived (e.g: fetching the process's mappings). From the specs' perspective, the target is free to implement as many or as few of the operations as it likes, and the GDB client just has to work around a target's provided feature set.

So, with that in mind, I see two different approaches on how to tackle this:

  1. make each vFile operation its own sub-IDET, similar to those in ExtendedMode or Breakpoints. This would be quite verbose, but it would accurately reflect the optional nature of each operation, and result in optimal codegen (i.e: the final binary wouldn't include any extra packet parsing code for unimplemented vFile packets)
  2. "cheat", and just have a HostIoError::Unimplemented variant which would be used to provide a default impl for each handler method. This would be significantly less verbose than having separate IDETs for each vFile operation, but it would mean that anyone using the vFile IDET would also pull in packet parsing code for packets they aren't necessarily using.

While it's certainly more verbose, I think we should go with the first option. It'd be consistent with other existing IDETs in gdbstub, and it does have strictly better codegen + API safety properties.
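A rough sketch of the first approach, with each vFile operation exposed as its own optional sub-extension. The trait and function names (`HostIo`, `HostIoUnlink`, `HostIoPread`, `enable_*`, `handle_*`) are hypothetical and heavily simplified relative to gdbstub's real IDET machinery. The point is only that when a target doesn't opt into an operation, the stub sends an empty reply, which the GDB client treats as "packet not supported" (the same `$#00` seen for unknown commands in the log).

```rust
/// Simplified error type: a GDB File-I/O errno value.
#[derive(Debug, PartialEq)]
pub enum HostIoError {
    Errno(u32),
}

/// One trait per vFile operation (hypothetical, simplified signatures).
pub trait HostIoUnlink {
    fn unlink(&mut self, filename: &[u8]) -> Result<(), HostIoError>;
}

pub trait HostIoPread {
    fn pread(&mut self, fd: u32, count: usize, offset: u64) -> Result<Vec<u8>, HostIoError>;
}

/// Parent extension: every operation defaults to "not implemented".
pub trait HostIo {
    fn enable_unlink(&mut self) -> Option<&mut dyn HostIoUnlink> {
        None
    }
    fn enable_pread(&mut self) -> Option<&mut dyn HostIoPread> {
        None
    }
}

/// Stub-side handling of `vFile:unlink`. An empty reply tells GDB the
/// packet is unsupported; otherwise reply "F0" or "F-1,<errno in hex>".
pub fn handle_unlink<T: HostIo>(target: &mut T, filename: &[u8]) -> String {
    match target.enable_unlink() {
        None => String::new(),
        Some(ops) => match ops.unlink(filename) {
            Ok(()) => "F0".to_string(),
            Err(HostIoError::Errno(e)) => format!("F-1,{:x}", e),
        },
    }
}

/// Same idea for `vFile:pread` (binary escaping of the data is omitted here).
pub fn handle_pread<T: HostIo>(target: &mut T, fd: u32, count: usize, offset: u64) -> String {
    match target.enable_pread() {
        None => String::new(),
        Some(ops) => match ops.pread(fd, count, offset) {
            Ok(data) => format!("F{:x};{}", data.len(), String::from_utf8_lossy(&data)),
            Err(HostIoError::Errno(e)) => format!("F-1,{:x}", e),
        },
    }
}

/// A target that only supports unlink, over an in-memory list of file names.
pub struct MyTarget {
    pub files: Vec<String>,
}

impl HostIoUnlink for MyTarget {
    fn unlink(&mut self, filename: &[u8]) -> Result<(), HostIoError> {
        const ENOENT: u32 = 2; // GDB File-I/O errno value
        let name = std::str::from_utf8(filename).map_err(|_| HostIoError::Errno(ENOENT))?;
        let idx = self
            .files
            .iter()
            .position(|f| f == name)
            .ok_or(HostIoError::Errno(ENOENT))?;
        self.files.remove(idx);
        Ok(())
    }
}

impl HostIo for MyTarget {
    // Opt into unlink only; pread keeps the default `None`.
    fn enable_unlink(&mut self) -> Option<&mut dyn HostIoUnlink> {
        Some(self)
    }
}
```

Since the default `None` is fixed at compile time for a given target type, the compiler has a chance to discard the handler paths for operations the target never enables, which is the codegen property mentioned above.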

src/common.rs (outdated)
src/common.rs (outdated)
src/protocol/commands/_vFile_open.rs (outdated)
src/protocol/console_output.rs (outdated)
src/gdbstub_impl/ext/host_io.rs
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
Owner

@daniel5151 daniel5151 left a comment

Thank you for implementing the missing vFile operations, and for integrating my feedback so effectively! I suspect we are only a few more iterations away from having this PR ready to merge :)

A couple of general comments about the armv4t example:

  • please include at least stub implementations for each of the vFile methods, just so it's easy to validate that they are in fact being hit. This could be as simple as logging the provided parameters via eprintln!, and then returning Ok.
    • this eprintln! logging should be added to the existing open, pwrite, and close methods as well.
  • Feel free to leave in the pseudo-"proc fs" example, but please add a comment explaining why it's here (i.e: to support info proc mappings)
  • Please add a "scratch" memory-backed file that can be used to validate open/write/unlink. i.e: tweak Emu to have a pub(crate) scratch: Option<Vec<u8>> member, have it default-initialized to some sample string (e.g: Some(b"sample scratch vFile".to_vec())), and then write up the pread/pwrite/unlink handlers appropriately.
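A minimal sketch of the requested scratch file, assuming `Emu` is reduced to just the `scratch` field from the comment above. `scratch_pread`, `scratch_pwrite`, and `scratch_unlink` are hypothetical helpers that the real pread/pwrite/unlink handlers could delegate to, and errors are plain GDB errno values (`2` = ENOENT) rather than gdbstub's error types. The `eprintln!` calls follow the logging suggestion above.

```rust
/// GDB File-I/O errno value for "no such file or directory".
const ENOENT: u32 = 2;

/// Trimmed-down stand-in for the example's `Emu`: only the scratch file.
pub struct Emu {
    pub(crate) scratch: Option<Vec<u8>>,
}

impl Default for Emu {
    fn default() -> Self {
        Emu {
            scratch: Some(b"sample scratch vFile".to_vec()),
        }
    }
}

impl Emu {
    /// Return up to `count` bytes starting at `offset` (empty past EOF).
    pub fn scratch_pread(&self, count: usize, offset: usize) -> Result<Vec<u8>, u32> {
        eprintln!("scratch pread: count={} offset={}", count, offset);
        let data = self.scratch.as_ref().ok_or(ENOENT)?;
        let start = offset.min(data.len());
        let end = start.saturating_add(count).min(data.len());
        Ok(data[start..end].to_vec())
    }

    /// Write `buf` at `offset`, zero-filling any gap past the current end.
    pub fn scratch_pwrite(&mut self, offset: usize, buf: &[u8]) -> Result<usize, u32> {
        eprintln!("scratch pwrite: offset={} len={}", offset, buf.len());
        let data = self.scratch.as_mut().ok_or(ENOENT)?;
        let end = offset + buf.len();
        if data.len() < end {
            data.resize(end, 0);
        }
        data[offset..end].copy_from_slice(buf);
        Ok(buf.len())
    }

    /// "Delete" the file; later reads, writes and unlinks fail with ENOENT.
    pub fn scratch_unlink(&mut self) -> Result<(), u32> {
        eprintln!("scratch unlink");
        self.scratch.take().map(|_| ()).ok_or(ENOENT)
    }
}
```

Using `Option<Vec<u8>>` lets `unlink` be modeled as `take()`, so a deleted scratch file naturally reports ENOENT on every later access.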

src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/mod.rs
@daniel5151 daniel5151 self-requested a review August 6, 2021 16:10
src/gdbstub_impl/ext/host_io.rs (outdated)
src/protocol/commands/_vFile_setfs.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/gdbstub_impl/ext/host_io.rs (outdated)
src/gdbstub_impl/ext/host_io.rs (outdated)
@daniel5151
Owner

Now that we're getting close to the finish line, it's also a good time to update the # Validation section of the PR with the latest changes. Please make sure the logs clearly show each of the new handlers getting hit, and returning a reasonable response.

@bet4it
Contributor Author

bet4it commented Aug 12, 2021

I want to implement real filesystem access in example, which may help people who want to use this feature.

There are two ways to do it:

  1. Borrow code from https://github.com/luser/rust-gdb-remote-protocol/blob/master/src/libc_fs.rs, which would introduce a libc dependency.
  2. Write it with std::fs::File. But then I couldn't cover all possible combinations of HostIoOpenFlags and HostIoOpenMode, and it's troublesome to create a HostIoStat structure.

What do you think about it?

Owner

@daniel5151 daniel5151 left a comment

To expand on my comment regarding the API docs: one of my goals with gdbstub is to present an API + docs that don't require users to go down the rabbit-hole of reading the actual GDB RSP. i.e: if you've never used the GDB RSP before, you should be able to get up and running with gdbstub without intimate knowledge of the underlying protocol.

This is primarily accomplished via two complementary avenues:

  1. wrangling the spartan GDB RSP C-style APIs (i.e: raw numbers + invariants enforced by "comments") into rich Rust APIs (structs + enums that have invariants enforced by types)
  2. extensively documenting the APIs, such that all invariants that aren't implicitly enforced by the Rust typesystem are explicitly documented (e.g: "return number of bytes read")

Whenever possible, it's better to encode invariants in the type system, which we've managed to do pretty well with this latest API iteration. i.e: the return type of HostIoResult<(), Self> replaces the need to explicitly document "this method must return zero on success, or -1 + errno on error" - the Rust type system makes it impossible to return some kind of invalid value from the method.

Hopefully this provides some insight and guidance into why we've been iterating on the API so heavily. With each iteration, we've gradually shifted from the initial C-style API in your original PR, into a rich Rust-style API.

It's really a microcosm for writing low-level Rust code in general: why even write things in Rust if you're just going to end up writing code that looks like C 😄
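To illustrate the point about `HostIoResult<(), Self>`: in a sketch like the one below, a target can only return `Ok(())` or an error variant, and the C-style "zero, or -1 plus errno" encoding lives in one place on the stub side. The types are hypothetical, trimmed-down stand-ins (not generic over the target the way gdbstub's are), and the reply format follows the GDB manual's `F` reply (`F<retcode>` or `F-1,<errno>`, both in hex).

```rust
/// A few GDB File-I/O errno values (GDB manual, "Errno Values").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostIoErrno {
    ENOENT = 2,
    EBADF = 9,
    EACCES = 13,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostIoError {
    Errno(HostIoErrno),
}

pub type HostIoResult<T> = Result<T, HostIoError>;

/// Encode a close() result into a vFile reply body. The target has no way
/// to hand back an out-of-contract value such as 7 or -3.
pub fn encode_close_reply(res: HostIoResult<()>) -> String {
    match res {
        Ok(()) => "F0".to_string(),
        Err(HostIoError::Errno(e)) => format!("F-1,{:x}", e as u32),
    }
}

/// Same idea for a call that returns a byte count, e.g. pwrite.
pub fn encode_count_reply(res: HostIoResult<usize>) -> String {
    match res {
        Ok(n) => format!("F{:x}", n),
        Err(HostIoError::Errno(e)) => format!("F-1,{:x}", e as u32),
    }
}
```

For example, `encode_count_reply(Ok(7))` yields `F7` for a 7-byte write, and an `EACCES` failure yields `F-1,d`.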

src/gdbstub_impl/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/gdbstub_impl/ext/host_io.rs (outdated)
Cargo.toml
@daniel5151
Owner

I want to implement real filesystem access in example, which may help people who want to use this feature.

I think this would be a great idea!

One thing I should mention right off the bat though: you'd probably need to include some kind of shim for /proc/ access when integrating it into the armv4t example, since if you naively plumb everything through as-is, the GDB client will try and read data for pid 1, which is most certainly not the code running inside the emulator.

But with that out of the way: yes, I think it'd be nice to have some kind of in-tree "helper" implementation for implementing vFile (that would most-likely have to be feature-gated behind std, or even a dedicated host_io_helper feature)

e.g: something like

use core::marker::PhantomData;

pub struct HostIoHelper<T: Target> {
    _target: PhantomData<T>,
    // ...
}

impl<T: Target> HostIoHelper<T> {
    pub fn new() -> Self {
        todo!()
    }

    // count/offset are u32 so that the HostIoPread impl below can forward its arguments as-is
    pub fn pread<'a>(
        &mut self,
        fd: u32,
        count: u32,
        offset: u32,
        output: HostIoOutput<'a>,
    ) -> HostIoResult<HostIoToken<'a>, T> {
        // ... impl
        todo!()
    }

    // etc... for all other methods
}

// and then in a target's vFile impl

impl target::ext::host_io::HostIoPread for MyTarget {
    fn pread<'a>(
        &mut self,
        fd: u32,
        count: u32,
        offset: u32,
        output: HostIoOutput<'a>,
    ) -> HostIoResult<HostIoToken<'a>, Self> {
        // assuming they've already instantiated HostIoHelper earlier
        self.host_io_helper.pread(fd, count, offset, output)
    }
}

Now, as for whether to use libc directly or std::fs... I think I'd be alright with using libc directly here. In this case, we'd definitely want this helper to live under a separate host_io_helper feature, which would also toggle the libc dependency on/off.

Some final considerations:

  • If you're going with the libc approach, it might make sense to maintain a HashSet of allocated open fd's as part of the HostIoHelper struct, and make sure that all file handles are closed if the struct is dropped (i.e: even if the GDB connection doesn't close gracefully, we don't want to leak resources)
  • This work does not need to be part of this PR, and I don't mind leaving this to a followup effort (especially since it'll likely require a few more rounds of iteration, as we nail down the specifics of the API + use of libc FFI / unsafe)
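For the fd-tracking suggestion, a `std`-only sketch may get the "close everything on drop" behavior for free. If each table entry owns a `std::fs::File`, dropping the table drops every `File`, and `File`'s `Drop` impl closes the OS handle. `FdTable` and its methods are hypothetical names. A libc-based helper would instead need an explicit `Drop` impl that calls `close` on each tracked fd.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};

fn bad_fd() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, "unknown host I/O fd")
}

/// Hypothetical fd table for a Host I/O helper (not gdbstub's API).
#[derive(Default)]
pub struct FdTable {
    files: HashMap<u32, File>,
    next_fd: u32,
}

impl FdTable {
    /// Register an already-opened file and return the fd to report to GDB.
    pub fn insert(&mut self, file: File) -> u32 {
        let fd = self.next_fd;
        self.next_fd += 1;
        self.files.insert(fd, file);
        fd
    }

    /// Read up to `count` bytes starting at `offset`.
    pub fn pread(&mut self, fd: u32, count: usize, offset: u64) -> io::Result<Vec<u8>> {
        let file = self.files.get_mut(&fd).ok_or_else(bad_fd)?;
        file.seek(SeekFrom::Start(offset))?;
        let mut buf = Vec::with_capacity(count);
        file.take(count as u64).read_to_end(&mut buf)?;
        Ok(buf)
    }

    /// Write all of `data` at `offset`, returning the number of bytes written.
    pub fn pwrite(&mut self, fd: u32, offset: u64, data: &[u8]) -> io::Result<usize> {
        let file = self.files.get_mut(&fd).ok_or_else(bad_fd)?;
        file.seek(SeekFrom::Start(offset))?;
        file.write_all(data)?;
        Ok(data.len())
    }

    /// Removing the entry drops the `File`, which closes the OS handle.
    pub fn close(&mut self, fd: u32) -> io::Result<()> {
        self.files.remove(&fd).map(drop).ok_or_else(bad_fd)
    }
}
// No explicit Drop impl is needed: dropping `FdTable` drops the HashMap,
// which drops (and thereby closes) every File still in it.
```

Handing out fds from a counter rather than the host's raw descriptors also keeps the numbers GDB sees independent of the host OS.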

@bet4it
Contributor Author

bet4it commented Aug 13, 2021

I did some searching in gdb's source code.

hostio_open is only invoked in three places:
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gdb/remote.c#L12364-L12366
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gdb/remote.c#L12532-L12535
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gdb/remote.c#L12618-L12620

fileio_readlink is only used to get /proc/pid/cwd and /proc/pid/exe
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gdb/linux-tdep.c#L814
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gdb/linux-tdep.c#L824

And the call path to fileio_fstat:
bfd_get_size/bfd_get_mtime -> bfd_stat -> gdb_bfd_iovec_fileio_fstat -> target_fileio_fstat -> remote_target::fileio_fstat

So we don't need to cover all possible combinations of HostIoOpenFlags and HostIoOpenMode: HostIoOpenFlags can only be O_RDONLY or O_WRONLY | O_CREAT | O_TRUNC, and we only need to fill in st_size and st_mtime in HostIoStat.
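Given that finding, translating the flags with `std::fs::OpenOptions` only has to handle two cases. A sketch, assuming GDB's protocol flag values (`O_WRONLY` = 0x1, `O_CREAT` = 0x200, `O_TRUNC` = 0x400) and treating any other combination as unsupported (a real handler would likely report `EINVAL`). It also assumes GDB's permission bits (`S_IRUSR` = 0o400 and so on) can be passed straight to `OpenOptionsExt::mode` on Unix. `open_options_for` is a hypothetical name.

```rust
use std::fs::OpenOptions;

// GDB File-I/O protocol flag values (not the host's libc values).
const O_RDONLY: u32 = 0x0;
const O_WRONLY: u32 = 0x1;
const O_CREAT: u32 = 0x200;
const O_TRUNC: u32 = 0x400;
const WRITE_CREATE_TRUNC: u32 = O_WRONLY | O_CREAT | O_TRUNC;

/// Map the two flag combinations GDB was found to send onto OpenOptions.
/// Returns None for anything else.
pub fn open_options_for(flags: u32, mode: u32) -> Option<OpenOptions> {
    let mut opts = OpenOptions::new();
    match flags {
        // e.g. `remote get`, reading symbols
        O_RDONLY => {
            opts.read(true);
        }
        // e.g. `remote put`
        WRITE_CREATE_TRUNC => {
            opts.write(true).create(true).truncate(true);
        }
        _ => return None,
    }
    // GDB's permission bits match the usual POSIX values, so on Unix they can
    // be forwarded (only the low 9 bits; the process umask still applies).
    #[cfg(unix)]
    {
        use std::os::unix::fs::OpenOptionsExt;
        opts.mode(mode & 0o777);
    }
    #[cfg(not(unix))]
    let _ = mode;
    Some(opts)
}
```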

@bet4it
Contributor Author

bet4it commented Aug 15, 2021

I think the implementation now provides a good example for users.

What about /proc/? Users should take care of that themselves. If they don't handle it properly, they shouldn't use commands like info proc mappings.

@daniel5151
Owner

daniel5151 commented Aug 15, 2021

I strongly agree that we should implement the helper in another PR. But this PR still needs some implementation, and I think the current implementation backed by std::fs is acceptable (once I finish handling mode with OpenOptions).

Alright, I relent, we can keep this current implementation (with the one caveat below) 🙂
In a follow-up PR, we'll extract this implementation out of armv4t, and into a generic HostIoHelper struct.

What about /proc/? Users should take care of that themselves. If they don't handle it properly, they shouldn't use commands like info proc mappings.

I still feel strongly that in the example code, we should at least do something like:

// disallow access to /proc/, as this would give bogus readings when running `info proc <...>`
if filename.starts_with(b"/proc") {
    return Err(HostIoError::Errno(HostIoErrno::ENOENT)); // or some other appropriate error code
}

It's not that much extra code, and it's a nice safety-guard against folks shooting themselves in the foot and not handling /proc/ access properly.

@daniel5151 daniel5151 marked this pull request as ready for review August 15, 2021 17:24
@daniel5151 daniel5151 self-requested a review August 15, 2021 17:24
Owner

@daniel5151 daniel5151 left a comment

Alright, just gave one more holistic pass over the PR, and left some comments. Mostly nit / doc related stuff.

I'm super excited that we're only one or two iterations away from merging this bad boy!

examples/armv4t/gdb/host_io.rs (outdated)
examples/armv4t/gdb/host_io.rs (outdated)
Comment on lines 55 to 58
let path = match std::str::from_utf8(filename) {
    Ok(v) => v,
    Err(_) => return Err(HostIoError::Errno(HostIoErrno::ENOENT)),
};
Owner

technically, we should probably use std::ffi::OsStr here, and then have some platform-specific cfg blocks to decide how to construct the OsStr (e.g: from_bytes on Unix, something else on Windows, etc...)

If you want to try and do things the Right Way in this PR, feel free to take a crack at it. Otherwise, this approach is fine for now, and we can punt the nitty-gritty details of how to properly handle this to the follow-up PR.

Contributor Author

Doesn't Rust have a universal way to handle filenames?

Owner

Unlike other languages, which try to pave over platform differences when possible (Go is one example that comes to mind), Rust takes the approach of "abstract when possible, but use platform-specific behavior when a reasonable abstraction is impossible".

Essentially, Rust gives you the flexibility to be as "correct" as you'd like, without implicitly locking you in to some kind of built-in abstraction.

The "easy" approach that many applications use is to convert paths into str, and then wrap those in a std::path::Path. I'd wager that this works fine 95% of the time, which is why I don't mind taking this approach here.

It's more that some file paths aren't valid strs on Unix/Windows, and to be 100% correct, there should be platform-specific logic to convert raw &[u8] buffers into platform-specific std::ffi::OsStr values (which can then be wrapped in std::path::Path).

...but that's hard, and the extra 5% might not be worth it. I just thought I'd point it out, more as an opportunity for "learning" than as something we'd actually want to dig into / properly handle.
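For reference, the platform-specific conversion described above can be fairly small. A sketch, assuming a Unix host for the byte-exact path (`std::os::unix::ffi::OsStrExt::from_bytes`) and falling back to the UTF-8 approach elsewhere. `path_from_gdb_bytes` and `is_proc_path` are hypothetical helpers. Using `Path::starts_with` for the `/proc` guard discussed earlier compares whole path components, so a path like `/procfoo` is not caught by accident.

```rust
use std::path::Path;

/// On Unix, paths are arbitrary byte strings, so no validation is needed.
#[cfg(unix)]
pub fn path_from_gdb_bytes(filename: &[u8]) -> Option<&Path> {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    Some(Path::new(OsStr::from_bytes(filename)))
}

/// Elsewhere (e.g. Windows), fall back to the "95%" approach: require UTF-8.
#[cfg(not(unix))]
pub fn path_from_gdb_bytes(filename: &[u8]) -> Option<&Path> {
    std::str::from_utf8(filename).ok().map(Path::new)
}

/// Component-wise check: matches "/proc" and "/proc/1/maps", not "/procfoo".
pub fn is_proc_path(path: &Path) -> bool {
    path.starts_with("/proc")
}
```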

examples/armv4t/gdb/host_io.rs (outdated)
src/protocol/commands/_vFile_setfs.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
src/target/ext/host_io.rs (outdated)
Contributor Author

bet4it commented Aug 16, 2021

About the stat struct:

How GDB creates it on Windows:
https://github.com/bminor/binutils-gdb/blob/gdb-10.2-release/gnulib/import/stat.c#L312-L320

And on this page, https://sourceware.org/gdb/onlinedocs/gdb/struct-stat.html:

...
st_ino
    No valid meaning for the target. Transmitted unchanged.
...
st_uid
st_gid
st_rdev
    No valid meaning for the target. Transmitted unchanged.
...

Does this mean these fields are not used?

Contributor Author

bet4it commented Aug 16, 2021

About the transformation from std::io::Error to HostIoError:

There are more ErrorKind variants in Rust nightly now. Should we use them now or wait until they are in stable?

Refer: rust-lang/rust#86442

Owner

daniel5151 commented Aug 16, 2021

About the stat struct:

How GDB creates it on Windows:

Sure, if we're going to try and pave over platform-specific stuff, might as well do what GDB does. As part of the implementation, make sure to leave a comment "citing your sources" for why certain fields are being filled with dummy-data.

Does this mean these fields are not used?

I'm not 100% sure, but I think the wording here is written from the perspective of the other kind of I/O packets defined by the GDB RSP - i.e: target-initiated file I/O. If you read it from that perspective, it makes a bit more sense, as the target will be reading data about the host's files, and things like "inode number" or "host-side uid/gid" won't have much meaning to the target.

If you want to dig into the reference GDB client / gdbserver implementation to see what it does when implementing the vFile:stat command, feel free to do so, as that might give us some insight into how we might implement these fields on non-POSIX platforms.
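A rough sketch of the "do what GDB does" approach, assuming the wire layout from the GDB manual's File-I/O struct stat description (big-endian; `unsigned int`, `mode_t` and `time_t` are 32-bit, `unsigned long` is 64-bit, 64 bytes total). `GdbStat` and `from_metadata` are hypothetical names, not gdbstub's API. The zeroed fields are the ones the manual marks as "No valid meaning for the target", and the inline comments are the kind of source-citing note asked for above.

```rust
use std::fs::Metadata;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical mirror of the File-I/O `struct stat` wire format.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct GdbStat {
    st_dev: u32,
    st_ino: u32,
    st_mode: u32,
    st_nlink: u32,
    st_uid: u32,
    st_gid: u32,
    st_rdev: u32,
    st_size: u64,
    st_blksize: u64,
    st_blocks: u64,
    st_atime: u32,
    st_mtime: u32,
    st_ctime: u32,
}

impl GdbStat {
    /// Serialize every field big-endian, in declaration order (64 bytes).
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(64);
        for v in [self.st_dev, self.st_ino, self.st_mode, self.st_nlink,
                  self.st_uid, self.st_gid, self.st_rdev] {
            out.extend_from_slice(&v.to_be_bytes());
        }
        for v in [self.st_size, self.st_blksize, self.st_blocks] {
            out.extend_from_slice(&v.to_be_bytes());
        }
        for v in [self.st_atime, self.st_mtime, self.st_ctime] {
            out.extend_from_slice(&v.to_be_bytes());
        }
        out
    }
}

/// Portable approximation built only from std::fs::Metadata.
fn from_metadata(md: &Metadata) -> GdbStat {
    // S_IFDIR / S_IFREG values from the File-I/O mode_t list.
    let file_type = if md.is_dir() { 0o040000 } else if md.is_file() { 0o100000 } else { 0 };
    // std only exposes a read-only flag portably, so permission bits are a guess.
    let perms = if md.permissions().readonly() { 0o444 } else { 0o644 };
    // Seconds since the epoch, or 0 if the platform can't report the time.
    let secs = |t: std::io::Result<SystemTime>| {
        t.ok()
            .and_then(|t| t.duration_since(UNIX_EPOCH).ok())
            .map_or(0, |d| d.as_secs() as u32)
    };
    GdbStat {
        st_dev: 0, // per the GDB docs, 0 = "a file" (1 would mean the console)
        // st_ino, st_uid, st_gid, st_rdev: "No valid meaning for the target.
        // Transmitted unchanged." So they are left as 0 via Default.
        st_mode: file_type | perms,
        st_nlink: 1,
        st_size: md.len(),
        st_atime: secs(md.accessed()),
        st_mtime: secs(md.modified()),
        // std has no portable ctime; reuse mtime as a stand-in.
        st_ctime: secs(md.modified()),
        ..Default::default()
    }
}
```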

About the transformation from std::io::Error to HostIoError:

There are more ErrorKind variants in Rust nightly now. Should we use them now or wait until they are in stable?

The ErrorKind enum is marked non-exhaustive, which means that it's expected that new reasons will be added over time. Given that gdbstub is a stable-oriented library, I'd stick to the current stable mapping, and once this PR is merged in, we could open a tracking issue that tracks the upstream issue, implementing those additional mappings once the new variants become available on stable.
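A sketch of the "stable-only" mapping, assuming a hypothetical `GdbErrno` enum holding a few of the errno values listed in the GDB manual's File-I/O "Errno Values" section (this is not gdbstub's `HostIoError`). Because `ErrorKind` is `#[non_exhaustive]`, the wildcard arm is mandatory, so any variant that reaches stable later simply falls through to `EUNKNOWN` until the tracking issue adds an explicit arm.

```rust
use std::io;

/// Hypothetical subset of GDB File-I/O errno values (not gdbstub's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum GdbErrno {
    ENOENT = 2,
    EINTR = 4,
    EACCES = 13,
    EEXIST = 17,
    EINVAL = 22,
    EUNKNOWN = 9999,
}

/// Map an io::Error using only ErrorKind variants available on stable.
fn to_gdb_errno(err: &io::Error) -> GdbErrno {
    match err.kind() {
        io::ErrorKind::NotFound => GdbErrno::ENOENT,
        // EACCES rather than EPERM is a judgment call.
        io::ErrorKind::PermissionDenied => GdbErrno::EACCES,
        io::ErrorKind::AlreadyExists => GdbErrno::EEXIST,
        io::ErrorKind::Interrupted => GdbErrno::EINTR,
        io::ErrorKind::InvalidInput => GdbErrno::EINVAL,
        // Required by #[non_exhaustive]; also catches future variants.
        _ => GdbErrno::EUNKNOWN,
    }
}
```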

Contributor Author

bet4it commented Aug 17, 2021

make sure to leave a comment "citing your sources" for why certain fields are being filled with dummy-data

Which source? The link I provided doesn't seem very persuasive 🧐

Contributor Author

bet4it commented Aug 17, 2021

Just found that we can't use pwrite with binary data, because of this check:

// validate that the body is valid ASCII
if !body.is_ascii() {
    return Err(PacketParseError::NotAscii);
}

And when RUST_LOG=trace is enabled, pread with binary data will cause a panic:
core::str::from_utf8(&self.msg).unwrap(), // buffers are always ascii

And pwrite with binary data gets logged as <invalid packet>.

Owner

make sure to leave a comment "citing your sources" for why certain fields are being filled with dummy-data

Which source? The link I provided doesn't seem very persuasive 🧐

Oh, I don't have anything specific in mind. What I meant to say is that you should make sure to leave an inline comment that explains why the specific data is being stubbed out, potentially citing sources if need be.

Just found that we can't use pwrite with binary data

Oh shit. Would you look at that. It seems that past-me assumed that all GDB RSP packets would be 7-bit clean, when the reality is that some of the later packets (such as vFile) are able to transmit raw binary data over the wire as well...

It's a good thing you caught this now, because this is a serious bug that we need to address!


To add to those examples you gave, we'd also have to consider:

info!("Unknown command: {}", core::str::from_utf8(cmd).unwrap());

I did a quick audit of the gdbstub codebase, and it seems like it should be safe to remove these ASCII checks + swap out instances of log!("foo {}", core::str::from_utf8(cmd).unwrap()) with log!("foo {:?}", core::str::from_utf8(cmd)); as a quick-and-dirty workaround.

The exception to that would be:

#[cfg(feature = "std")]
trace!(
    "--> ${}#{:02x?}",
    core::str::from_utf8(&self.msg).unwrap(), // buffers are always ascii
    checksum
);

In this case, I'd actually swap it out for:

#[cfg(feature = "std")]
trace!(
    "--> ${}#{:02x?}",
    String::from_utf8_lossy(&self.msg),
    checksum
);

i.e. because this is already gated behind the std feature, we can use String::from_utf8_lossy to keep the output more human-readable when possible (rather than relying on the {:?} output).
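To see how the two workarounds behave, here's a small sketch on a hypothetical pwrite-style packet containing a raw 0xff byte. `log_debug` and `log_lossy` are made-up names that just build the strings the corresponding log macros would print.

```rust
/// What `info!("Unknown command: {:?}", core::str::from_utf8(cmd))` would print:
/// `Ok("...")` for valid UTF-8, or the Utf8Error details instead of a panic.
fn log_debug(cmd: &[u8]) -> String {
    format!("Unknown command: {:?}", core::str::from_utf8(cmd))
}

/// What the std-only trace line would print: invalid bytes become U+FFFD.
fn log_lossy(msg: &[u8], checksum: u8) -> String {
    format!("--> ${}#{:02x?}", String::from_utf8_lossy(msg), checksum)
}
```

Neither version can panic on binary data; the lossy one keeps mostly-ASCII packets readable, at the cost of hiding which byte values were invalid.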

Long term, I'll spend a bit of time playing around with a good way to debug-print mostly ASCII strings in gdbstub, but for now, this should unblock us in this PR.
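One possible direction, sketched under the assumption that a core-only `Display` adapter would be acceptable: `AsciiEscaped` (a hypothetical name) prints printable ASCII as-is and escapes everything else via `core::ascii::escape_default`, so it works without `std` and never panics on binary payloads.

```rust
use core::fmt;

/// Hypothetical Display adapter for "mostly ASCII" packet buffers.
struct AsciiEscaped<'a>(&'a [u8]);

impl fmt::Display for AsciiEscaped<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for &b in self.0 {
            // escape_default leaves printable ASCII alone, escapes \t \r \n \\ ' "
            // with a backslash, and writes other bytes as \xNN (lowercase hex).
            write!(f, "{}", core::ascii::escape_default(b))?;
        }
        Ok(())
    }
}
```

Newer Rust releases (1.60+) also provide `<[u8]>::escape_ascii`, which does essentially the same thing.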


Also, quite importantly, now that you've pointed out how to properly handle binary data, you'll also need to fix the following code in vFile:pwrite:

// ...
let offset = decode_hex_buf(body.next()?).ok()?;
let data = body.next()?; // <-- incorrect
Some(vFilePwrite{fd, offset, data})
// ...

Binary data sent by the client may include escaped bytes. Any instance of b'#' | b'$' | b'}' | b'*' in the data stream is actually represented as two bytes: b'}' followed by 0x20 ^ <original byte>.

This means that you'll have to write a decode_bin_buf helper method to complement the existing decode_hex_buf method, which will unpack this escaped binary representation in place.
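A minimal, safe sketch of such a helper (hypothetical code, not necessarily what ends up in gdbstub), plus a matching encoder for the opposite direction so the escaping rule can be round-tripped. It assumes the escape set listed above and reports a dangling trailing `}` as `None`.

```rust
/// Undo the RSP binary escaping in place: `}` followed by `b` decodes to `b ^ 0x20`.
/// Returns the decoded prefix of `buf`, or None on a dangling trailing `}`.
fn decode_bin_buf(buf: &mut [u8]) -> Option<&mut [u8]> {
    let mut i = 0; // read cursor
    let mut j = 0; // write cursor; never passes i, so unread bytes are never clobbered
    while i < buf.len() {
        let b = if buf[i] == b'}' {
            i += 1;
            *buf.get(i)? ^ 0x20
        } else {
            buf[i]
        };
        buf[j] = b;
        i += 1;
        j += 1;
    }
    Some(&mut buf[..j])
}

/// Escape `#`, `$`, `}` and `*` for sending, e.g. in a vFile:pread reply.
fn encode_bin(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len());
    for &b in data {
        if matches!(b, b'#' | b'$' | b'}' | b'*') {
            out.push(b'}');
            out.push(b ^ 0x20);
        } else {
            out.push(b);
        }
    }
    out
}
```

Decoding in place works because every escape sequence is two bytes long and decodes to one, so the output can never overtake the input.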


It seems that we've stumbled into a small unforeseen pit of complexity, but I'm glad we spotted this before the PR was merged.

Contributor Author

bet4it commented Aug 18, 2021

What I meant to say is that you should make sure to leave an inline comment that explains why the specific data is being stubbed out, potentially citing sources if need be.

Because it's just an example, users can fill it in themselves if they really need it...🙃

daniel5151 self-requested a review August 19, 2021 15:50
Owner

daniel5151 left a comment

Alright, did one last holistic overview of the PR, and left what I think will be the very last batch of nits.

if cfg!(feature = "paranoid_unsafe") {
    Ok(&mut buf[..j])
} else {
    unsafe { Ok(buf.get_unchecked_mut(..j)) }
}
Owner

ahh, my bad: debug_assert!(false) is obviously incorrect, but we should add a debug_assert!(j <= buf.len()) before calling get_unchecked_mut here.
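The suggested pattern, sketched as a standalone `unsafe fn` (`prefix_mut` is a hypothetical name, and `paranoid_unsafe` is the feature from the snippet above): the `debug_assert!` checks the caller's contract in debug builds, while release builds keep the unchecked fast path.

```rust
/// Return `&mut buf[..j]`, skipping the bounds check unless the
/// `paranoid_unsafe` feature is enabled.
///
/// # Safety
/// The caller must guarantee that `j <= buf.len()`.
unsafe fn prefix_mut(buf: &mut [u8], j: usize) -> &mut [u8] {
    // Catches a broken contract in debug builds before it can become UB.
    debug_assert!(j <= buf.len());
    if cfg!(feature = "paranoid_unsafe") {
        // Checked slicing: panics instead of invoking UB.
        &mut buf[..j]
    } else {
        // SAFETY: the caller guarantees j <= buf.len().
        unsafe { buf.get_unchecked_mut(..j) }
    }
}
```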

Contributor Author

Why is there no debug_assert in decode_hex_buf?

Owner

no reason aside from the fact that I forgot to add them. I should probably insert them at some point, just in case...

daniel5151 self-requested a review August 20, 2021 17:36
Owner

daniel5151 left a comment

It's been a looooong journey, but I'm happy to say that I think we've finally made it 🎉

Thanks again for putting up with my endless battery of nits and comments. I sincerely hope you've come away from this PR with a better idea of gdbstub's philosophy on code quality and correctness. This is a project that really tries to embody the Rust ethos of "Fast, Reliable, Productive. Pick Three", but pulling that off does require a bit of a shift in mindset if you're coming from a more traditional C/C++ background.

And with that, let's merge this bad boy 🚀

daniel5151 merged commit 9227dfd into daniel5151:dev/0.6 Aug 20, 2021
bet4it deleted the file branch August 21, 2021 02:02
daniel5151 mentioned this pull request Aug 24, 2021
13 tasks
daniel5151 mentioned this pull request Jun 3, 2022
11 tasks