fix: use less fragile method for rodata segment init
Prior to this commit, we expected the rodata segments to be encoded in a
specific order and placed on the advice stack before anything else. With
this commit, we now simply expect the advice map to contain each segment
keyed by its commitment hash, which we then move to the advice stack on
demand and immediately pipe to memory.

This means the order of the segments no longer matters, and the advice
stack is not sensitive to codegen changes or other influences which
might perturb the advice stack or otherwise disrupt our assumptions. It
also sets the stage for us to be able to initialize rodata after a
context switch, as at that point the advice stack will be in an unknown
condition, and using the advice map gives us certainty that we can
arrange to have exactly what we need on the advice stack, when we need
it.
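
The flow described above can be sketched with a toy model. This is only an illustration under stated assumptions: `AdviceProvider`, `push_map_value`, and `pipe_to_memory` are hypothetical names, not the VM's actual API, field elements are modeled as plain `u64` values, and the ordering convention (the first element of a map entry is the first one popped) is an assumption of the model.

```rust
use std::collections::HashMap;

/// A commitment digest: four field elements, modeled here as plain `u64`s.
type Digest = [u64; 4];

/// Toy stand-in for an advice provider (hypothetical, not the real VM type).
#[derive(Default)]
struct AdviceProvider {
    /// Values are popped from the end of this vector.
    stack: Vec<u64>,
    /// Entries keyed by commitment digest.
    map: HashMap<Digest, Vec<u64>>,
}

impl AdviceProvider {
    /// Copy the values stored under `key` onto the advice stack, so that the
    /// first element of the entry is the next one popped.
    fn push_map_value(&mut self, key: &Digest) -> Result<(), String> {
        let values = self
            .map
            .get(key)
            .ok_or_else(|| format!("no advice map entry for {key:?}"))?;
        self.stack.extend(values.iter().rev().copied());
        Ok(())
    }

    /// Pop `n` elements from the advice stack and append them to `memory`.
    fn pipe_to_memory(&mut self, n: usize, memory: &mut Vec<u64>) -> Result<(), String> {
        for _ in 0..n {
            let value = self.stack.pop().ok_or("advice stack is empty")?;
            memory.push(value);
        }
        Ok(())
    }
}
```

Because each segment is looked up by its digest, segments can be requested in any order, and whatever already sits on the advice stack is left untouched once the requested elements have been piped to memory.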

Additionally, I've updated the `midenc debug` input config file, as well
as the usage documentation to reflect this.

The last related change will be to emit the rodata segments to disk in a
convenient form, so that when the compiler emits the program, it also
emits the segments alongside it, making it easy to run the debugger
against that program (or to run it via the VM directly).
bitwalker committed Aug 16, 2024
1 parent 7dad30e commit 5e56ab7
Showing 5 changed files with 216 additions and 61 deletions.
23 changes: 14 additions & 9 deletions codegen/masm/src/masm/program.rs
@@ -173,14 +173,13 @@ impl Program {
         // Emit data segment initialization code
         //
         // NOTE: This depends on the program being executed with the data for all data
-        // segments having been pushed on the advice stack in the same order as visited
-        // here, with the same encoding. The program will fail to execute if it is not
-        // set up correctly.
+        // segments having been placed in the advice map with the same commitment and
+        // encoding used here. The program will fail to execute if this is not set up
+        // correctly.
         //
-        // TODO(pauls): To facilitate automation of this, we should emit a file to disk
-        // that includes the raw encoding of the data we expect to be placed on the advice
-        // stack, in a manner which allows us to simply read that file as an array of felt
-        // and use that directly via `AdviceInputs`
+        // TODO(pauls): To facilitate automation of this, we should emit an inputs file to
+        // disk that maps each segment to a commitment and its data encoded as binary. This
+        // can then be loaded into the advice provider during VM init.
         let pipe_preimage_to_memory = "std::mem::pipe_preimage_to_memory".parse().unwrap();
         for segment in self.library.segments.iter() {
             // Don't bother emitting anything for zeroed segments
@@ -229,8 +228,14 @@ impl Program {
             let digest = Rpo256::hash_elements(&elements);
             let span = SourceSpan::default();

-            // COM
-            block.push(Op::Pushw(digest.into()), span);
+            log::debug!(
+                "computed commitment for data segment at offset {offset} ({size} bytes, \
+                 {num_elements} elements): '{digest}'"
+            );
+
+            // Move rodata from advice map to advice stack
+            block.push(Op::Pushw(digest.into()), span); // COM
+            block.push(Op::AdvInjectPushMapVal, span);
             // write_ptr
             block.push(Op::PushU32(base.waddr), span);
             // num_words
30 changes: 25 additions & 5 deletions hir/src/constants.rs
@@ -79,6 +79,12 @@ impl ConstantData {
         Some(u32::from_le_bytes(unsafe { bytes.read() }))
     }
 }
+impl From<ConstantData> for Vec<u8> {
+    #[inline(always)]
+    fn from(data: ConstantData) -> Self {
+        data.0
+    }
+}
 impl FromIterator<u8> for ConstantData {
     fn from_iter<T: IntoIterator<Item = u8>>(iter: T) -> Self {
         Self(iter.into_iter().collect())
@@ -126,26 +132,40 @@ impl fmt::LowerHex for ConstantData {
 impl FromStr for ConstantData {
     type Err = ();

+    #[inline]
     fn from_str(s: &str) -> Result<Self, Self::Err> {
+        Self::from_str_be(s).map_err(|_| ())
+    }
+}
+impl ConstantData {
+    pub fn from_str_be(s: &str) -> Result<Self, &'static str> {
+        const NOT_EVEN: &str = "invalid hex-encoded data: expected an even number of hex digits";
+        const NOT_HEX: &str = "invalid hex-encoded data: contains invalid hex digits";
+
         let s = s.strip_prefix("0x").unwrap_or(s);
         let len = s.len();
         if len % 2 != 0 {
-            return Err(());
+            return Err(NOT_EVEN);
         }
         // Parse big-endian
         let pairs = len / 2;
         let mut data = Vec::with_capacity(pairs);
         let mut chars = s.chars();
         while let Some(a) = chars.next() {
-            let a = a.to_digit(16).ok_or(())?;
-            let b = chars.next().unwrap().to_digit(16).ok_or(())?;
+            let a = a.to_digit(16).ok_or(NOT_HEX)?;
+            let b = chars.next().unwrap().to_digit(16).ok_or(NOT_HEX)?;
             data.push(((a << 4) + b) as u8);
         }

-        // Make little-endian
-        data.reverse();
         Ok(Self(data))
     }
+
+    pub fn from_str_le(s: &str) -> Result<Self, &'static str> {
+        let mut data = Self::from_str_be(s)?;
+        // Make little-endian
+        data.0.reverse();
+        Ok(data)
+    }
 }

 /// This maintains the storage for constants used within a function
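
To make the difference between the two parsing modes concrete, here is a minimal standalone sketch that mirrors the logic of `from_str_be` and `from_str_le` on plain byte vectors. The function names `bytes_from_hex_be` and `bytes_from_hex_le` are hypothetical and exist only for this illustration; they are not part of `midenc_hir`.

```rust
/// Parse an optionally `0x`-prefixed hex string into bytes, in the order written.
fn bytes_from_hex_be(s: &str) -> Result<Vec<u8>, &'static str> {
    let s = s.strip_prefix("0x").unwrap_or(s);
    if s.len() % 2 != 0 {
        return Err("expected an even number of hex digits");
    }
    // Convert every character to its 4-bit value, failing on the first non-hex digit.
    let digits: Vec<u32> = s
        .chars()
        .map(|c| c.to_digit(16).ok_or("contains invalid hex digits"))
        .collect::<Result<_, _>>()?;
    // Combine each pair of digits into one byte, high nibble first.
    Ok(digits.chunks(2).map(|pair| ((pair[0] << 4) | pair[1]) as u8).collect())
}

/// Parse the same input, then reverse the bytes so the last written byte comes first.
fn bytes_from_hex_le(s: &str) -> Result<Vec<u8>, &'static str> {
    let mut bytes = bytes_from_hex_be(s)?;
    bytes.reverse();
    Ok(bytes)
}
```

With this sketch, `0xdeadbeef` parses to `[0xde, 0xad, 0xbe, 0xef]` in the big-endian mode and to `[0xef, 0xbe, 0xad, 0xde]` in the little-endian mode.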
101 changes: 99 additions & 2 deletions midenc-debug/README.md
@@ -5,8 +5,105 @@ interoperate with `midenc`.

# Usage

The easiest way to use the debugger, is via `midenc run`, and giving it a path to a
program compiled by `midenc compile`.
The easiest way to use the debugger, is via `midenc debug`, and giving it a path to a
program compiled by `midenc compile`. See [Program Inputs](#program-inputs) for information
on how to provide inputs to the program you wish to debug. Run `midenc help debug` for more
detailed usage documentation.

The debugger may also be used as a library, but that is left as an exercise for the reader for now.

## Example

```shell
# Compile a program to MAST from a rustc-generated Wasm module
midenc compile foo.wasm -o foo.masl

# Load that program into the debugger and start executing it
midenc debug foo.masl
```

## Program Inputs

To pass arguments to the program on the operand stack, or via the advice provider, you have two
options, depending on the needs of the program:

1. Pass arguments to `midenc debug` in the same order you wish them to appear on the stack. That
is, the first argument you specify will be on top of the stack, and so on.
2. Specify a configuration file from which to load inputs for the program, via the `--inputs` option.

### Via Command Line

To specify the contents of the operand stack, you can do so following the raw arguments separator `--`.
Each operand must be a valid field element value, in either decimal or hexadecimal format. For example:

```shell
midenc debug foo.masl -- 1 2 0xdeadbeef
```

If you pass arguments via the command line in conjunction with `--inputs`, then the command line arguments
will be used instead of the contents of the `inputs.stack` option (if set). This lets you specify a baseline
set of inputs, and then try out different arguments using the command line.
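
As a rough illustration of the rules above, the following sketch parses operands given in decimal or hexadecimal form and applies the stated precedence between command-line arguments and `inputs.stack`. Both function names are hypothetical, operands are modeled as `u64`, and the check that a value is a valid field element is deliberately omitted.

```rust
use std::num::ParseIntError;

/// Parse one operand in decimal or `0x`-prefixed hexadecimal form.
/// NOTE: no field-element range check is performed in this sketch.
fn parse_operand(s: &str) -> Result<u64, ParseIntError> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => s.parse::<u64>(),
    }
}

/// Choose the operand stack contents: command-line arguments, when present,
/// replace the `inputs.stack` value from the configuration file.
fn resolve_operand_stack(cli_args: Vec<u64>, config_stack: Vec<u64>) -> Vec<u64> {
    if cli_args.is_empty() {
        config_stack
    } else {
        cli_args
    }
}
```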

### Via Inputs Config

While simply passing operands to the `midenc debug` command is useful, it only allows you to specify
inputs to be passed via operand stack. To provide inputs via the advice provider, you will need to use
the `--inputs` option. The configuration file expected by `--inputs` also lets you tweak the execution
options for the VM, such as the maximum and expected cycle counts.

An example configuration file looks like so:

```toml
# This section is used for execution options
[options]
max_cycles = 5000
expected_cycles = 4000

# This section is the root table for all inputs
[inputs]
# Specify elements to place on the operand stack, leftmost element will be on top of the stack
stack = [1, 2, 0xdeadbeef]

# The `inputs.rodata` section is a list of rodata segments that should be placed
# in the advice map before the program is executed. Programs compiled by midenc
# will have a prologue generated in their entrypoint that writes this data to linear
# memory, by moving it from the advice map to the advice stack (using the commitment
# digest), and then invoking `std::mem::pipe_preimage_to_memory`.
#
# The raw binary data is chunked up into 4 byte chunks, and then converted to field
# elements by first treating each chunk as a big-endian u32 value, and then creating
# the field element from that value. The data will arrive on the advice stack in an
# order that ensures it is written to linear memory in the same order as it appears
# in the raw binary data.
#
# You can specify one or more of these segments
[[inputs.rodata]]
digest = '0xb9691da1d9b4b364aca0a0990e9f04c446a2faa622c8dd0d8831527dbec61393'
# Specify a path to the binary data for this segment
path = 'foo.bin'
# Or, alternatively, specify the binary data in hexadecimal form directly
# data = '0x...'

# This section contains input options for the advice provider
[inputs.advice]
# Specify elements to place on the advice stack, leftmost element will be on top
stack = [1, 2, 3, 4]

# The `inputs.advice.map` section is a list of advice map entries that should be
# placed in the advice map before the program is executed. Entries with duplicate
# keys are handled on a last-write-wins basis.
[[inputs.advice.map]]
# The key for this entry in the advice map
digest = '0x3cff5b58a573dc9d25fd3c57130cc57e5b1b381dc58b5ae3594b390c59835e63'
# The values to be stored under this key
values = [1, 2, 3, 4]

[[inputs.advice.map]]
digest = '0x20234ee941e53a15886e733cc8e041198c6e90d2a16ea18ce1030e8c3596dd38'
values = [5, 6, 7, 8]
```
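
The byte-to-element encoding described in the `inputs.rodata` comments can be sketched as follows. This is a simplified model rather than the debugger's actual `decode_rodata`: field elements are represented as `u64`, the function name `bytes_to_elements` is hypothetical, and zero-padding a short final chunk on the right is an assumption.

```rust
/// Convert raw segment bytes to field-element values, modeled as `u64`.
fn bytes_to_elements(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks(4)
        .map(|chunk| {
            // ASSUMPTION: a short final chunk is zero-padded on the right.
            let mut word = [0u8; 4];
            word[..chunk.len()].copy_from_slice(chunk);
            // Each 4-byte chunk is read as a big-endian u32, then widened.
            u64::from(u32::from_be_bytes(word))
        })
        .collect()
}
```

For example, the bytes of `hello world` yield three elements, the first being the big-endian `u32` formed from `hell`.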

# Debugger Usage

Once started, you will be dropped into the main debugger UI, stopped at the first cycle of
the program. The UI is organized into pages and panes, with the main/home page being the
93 changes: 56 additions & 37 deletions midenc-debug/src/config.rs
@@ -51,8 +51,16 @@ impl DebuggerConfig {
         mut file: DebuggerConfigFile,
         cwd: Option<PathBuf>,
     ) -> Result<Self, String> {
-        let rodata = match file.inputs.rodata.take() {
-            Some(path) => {
+        let inputs = StackInputs::new(file.inputs.stack.into_iter().map(|felt| felt.0).collect())
+            .map_err(|err| format!("invalid value for 'stack': {err}"))?;
+        let mut advice_inputs = AdviceInputs::default()
+            .with_stack(file.inputs.advice.stack.into_iter().rev().map(|felt| felt.0))
+            .with_map(file.inputs.advice.map.into_iter().map(|entry| {
+                (entry.digest.0, entry.values.into_iter().map(|felt| felt.0).collect::<Vec<_>>())
+            }));
+
+        for segment in file.inputs.rodata {
+            let data = if let Some(path) = segment.path {
                 let path = if let Some(cwd) = cwd.as_ref() {
                     if path.is_relative() {
                         cwd.join(path)
@@ -62,22 +70,11 @@ impl DebuggerConfig {
                 } else {
                     path
                 };
-                Some(decode_rodata_from_path(&path)?)
-            }
-            None => None,
-        };
-        let inputs = StackInputs::new(file.inputs.stack.into_iter().map(|felt| felt.0).collect())
-            .map_err(|err| format!("invalid value for 'stack': {err}"))?;
-        let mut advice_inputs = AdviceInputs::default()
-            .with_stack(file.inputs.advice.stack.into_iter().rev().map(|felt| felt.0))
-            .with_map(file.inputs.advice.map.into_iter().map(|entry| {
-                (entry.digest.0, entry.values.into_iter().map(|felt| felt.0).collect::<Vec<_>>())
-            }));
-        if let Some(mut rodata) = rodata {
-            // The data needs to be reversed so that the first bytes of data are what appear
-            // on the operand stack first.
-            rodata.reverse();
-            advice_inputs.extend_stack(rodata);
+                decode_rodata_from_path(&path)?
+            } else {
+                decode_rodata(&segment.data)?
+            };
+            advice_inputs.extend_map([(segment.digest.0, data)]);
         }

         Ok(Self {
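
The path handling in the loop above reduces to one rule: a relative segment path is joined onto the working directory when one is supplied, and is otherwise used as given. A minimal standalone sketch of that rule, using the hypothetical helper name `resolve_segment_path`:

```rust
use std::path::{Path, PathBuf};

/// Resolve a segment's `path` against an optional working directory.
/// Absolute paths, and any path when no `cwd` is given, are returned unchanged.
fn resolve_segment_path(cwd: Option<&Path>, path: PathBuf) -> PathBuf {
    match cwd {
        Some(cwd) if path.is_relative() => cwd.join(path),
        _ => path,
    }
}
```

This is what lets the test below pass `CARGO_MANIFEST_DIR` as the working directory while keeping a relative `path` in the TOML input.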
@@ -121,17 +118,26 @@ struct DebuggerConfigFile {
 #[derive(Debug, Clone, Default, Deserialize)]
 #[serde(default)]
 struct Inputs {
-    /// A path to the file containing the rodata segments dumped by the compiler
-    ///
-    /// The decoded data will be placed at the top of the advice stack so that it
-    /// is immediately available for the program to consume.
-    rodata: Option<PathBuf>,
+    /// The rodata segments to place in the advice map
+    rodata: Vec<DataSegment>,
     /// The contents of the operand stack, top is leftmost
     stack: Vec<crate::Felt>,
     /// The inputs to the advice provider
     advice: Advice,
 }

+#[derive(Debug, Clone, Deserialize)]
+struct DataSegment {
+    /// The commitment digest for this segment
+    digest: Digest,
+    /// A path to the file containing the raw binary data for this segment
+    #[serde(default)]
+    path: Option<PathBuf>,
+    /// The raw data for this segment (mutually exclusive with `path`)
+    #[serde(default, deserialize_with = "deserialize_rodata_bytes")]
+    data: Vec<u8>,
+}
+
 #[derive(Debug, Clone, Default, Deserialize)]
 #[serde(default)]
 struct Advice {
@@ -206,6 +212,19 @@ impl<'de> Deserialize<'de> for Digest {
     }
 }

+fn deserialize_rodata_bytes<'de, D>(deserializer: D) -> Result<Vec<u8>, D::Error>
+where
+    D: serde::Deserializer<'de>,
+{
+    use midenc_hir::ConstantData;
+
+    String::deserialize(deserializer).and_then(|hex| {
+        ConstantData::from_str_be(hex.as_str())
+            .map_err(|err| serde::de::Error::custom(format!("invalid rodata: {err}")))
+            .map(Vec::<u8>::from)
+    })
+}
+
 fn deserialize_execution_options<'de, D>(deserializer: D) -> Result<ExecutionOptions, D::Error>
 where
     D: serde::Deserializer<'de>,
@@ -293,7 +312,7 @@ mod tests {
     fn debugger_config_with_advice() {
         let text = toml::to_string_pretty(&toml! {
             [inputs]
-            stack = [1, 2, 3]
+            stack = [1, 2, 0x3]

             [inputs.advice]
             stack = [1, 2, 3, 4]
@@ -310,7 +329,7 @@ mod tests {
             "0x3cff5b58a573dc9d25fd3c57130cc57e5b1b381dc58b5ae3594b390c59835e63",
         )
         .unwrap();
-        let file = DebuggerConfig::parse_str(&text).unwrap();
+        let file = DebuggerConfig::parse_str(&text).unwrap_or_else(|err| panic!("{err}"));
         assert_eq!(file.inputs.values(), &[RawFelt::new(3), RawFelt::new(2), RawFelt::new(1)]);
         assert_eq!(
             file.advice_inputs.stack(),
@@ -329,12 +348,16 @@ mod tests {
     #[test]
     fn debugger_config_with_rodata() {
         const RODATA_SAMPLE: &[u8] = "hello world\0data\0strings\nü".as_bytes();
+
         let rodata_cwd = std::path::Path::new(env!("CARGO_MANIFEST_DIR"));
         let rodata_path = rodata_cwd.join("testdata").join("rodata-sample.bin");
         let text = toml::to_string_pretty(&toml! {
             [inputs]
             stack = [1, 2, 3]
-            rodata = "testdata/rodata-sample.bin"
+
+            [[inputs.rodata]]
+            digest = "0x2786346021744030bf0b9eb930712993609fb0425f7bda70e38ffc23c2f11df2"
+            path = "testdata/rodata-sample.bin"

             [inputs.advice]
             stack = [1, 2, 3, 4]
@@ -345,24 +368,20 @@ mod tests {
         .unwrap();

         let mut expected = decode_rodata(RODATA_SAMPLE).unwrap();
+        let digest = miden_processor::crypto::Rpo256::hash_elements(&expected);
         assert_eq!(expected[0], RawFelt::new(u32::from_be_bytes([b'h', b'e', b'l', b'l']) as u64));
-        // The elements are reversed when placed on the advice stack so that they are read in byte
-        // order
-        expected.reverse();

         // Bypass parse_str so that we can specify the working directory context
         let file =
             toml::from_str::<DebuggerConfigFile>(&text).unwrap_or_else(|err| panic!("{err}"));
         let file = DebuggerConfig::from_inputs_file(file, Some(rodata_cwd.to_path_buf())).unwrap();

         assert_eq!(file.inputs.values(), &[RawFelt::new(3), RawFelt::new(2), RawFelt::new(1)]);
-        assert_eq!(file.advice_inputs.stack().len(), 4 + expected.len());
-        assert!(file.advice_inputs.stack().starts_with(&[
-            RawFelt::new(4),
-            RawFelt::new(3),
-            RawFelt::new(2),
-            RawFelt::new(1)
-        ]));
-        assert!(file.advice_inputs.stack().ends_with(expected.as_slice()));
+        assert_eq!(file.advice_inputs.stack().len(), 4);
+        assert_eq!(
+            file.advice_inputs.stack(),
+            &[RawFelt::new(4), RawFelt::new(3), RawFelt::new(2), RawFelt::new(1)]
+        );
+        assert_eq!(file.advice_inputs.mapped_values(&digest), Some(expected.as_slice()));
     }
 }