MassaLabs: Implement an Intermediate Representation to improve the compilation process #359
base: next
Conversation
- Translation from AST has been largely commented out, to be reworked
- Updated inlining test template
- Remove leading and trailing newlines; display calls as SSA; remove variable names in function return signatures; handle Operation {Sub, Mul}; display constants as Type(value)
- Remap operation parameters to the new body's NodeIndex
Sorry it took me so long! I wanted to make sure I spent some time thinking about this and evaluating the design.
I've left some comments on some details that I would like clarification on, or that feel a bit awkward (or at least unusual), as well as one which goes into much more detail on an alternative representation for operations/values/etc. that I think might provide a much nicer substrate to build on. I would be interested to get your take on that, in terms of whether it would be easier for you to work with, or whether there are other constraints I am not recalling that make it less ideal.
In any case, I think what is implemented here can work, though there are a few things I'm uncertain about, as mentioned in some of my comments. I'm going through the inlining pass in more detail right now, but wanted to get my initial feedback to you first.
Feel free to ping me with any questions, or if you just want to bounce an idea off me; happy to discuss this further!
Operation::Matrix(_vec) => todo!(),
        }
    }*/

/// Insert the operation and return its node index. If an identical node already exists, return
> /// Insert the operation and return its node index. If an identical node already exists, return

How does this uniquing interact with the use list? You probably want to scope the uniquing to a specific region (e.g. a function body), or the use list will not be useful until after everything is fully inlined. Otherwise you end up with things like a variable being used by a completely different function in the same program because the two happen to have the same data, and rewrites within a function body would then be unsafe, or incomplete: for example, you might fail to identify an expression tree as unused because the root expression was combined with another one in a different region where it is used, leaving dead code in the original function that isn't recognized as such.
I might be missing some other reasons why this isn't an issue in practice, as I'm still going through my review, but wanted to ask about that.
Scoping uniqueness per region seems like a good idea. For now, the unique property was not upheld during inlining but was expected afterwards, due to the way we use PlaceHolder nodes, which means we have to do a CSE pass. We think we can do without those node types after scope regionalisation is implemented; it also seems to be the best option in our opinion.
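To make the per-region scoping concrete, here is a minimal sketch of hash-consing keyed by a region identifier, so identical operations in different function bodies map to distinct nodes. All names here (`Graph`, `OpData`, `RegionId`) are hypothetical stand-ins, not the PR's actual types:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the PR's NodeIndex and operation data.
type RegionId = u32;
type NodeIndex = usize;

#[derive(Clone, PartialEq, Eq, Hash)]
enum OpData {
    Const(u64),
    Add(NodeIndex, NodeIndex),
}

#[derive(Default)]
struct Graph {
    nodes: Vec<(RegionId, OpData)>,
    // Uniquing table scoped per region: identical data in different
    // regions deliberately maps to different node indices.
    interned: HashMap<(RegionId, OpData), NodeIndex>,
}

impl Graph {
    /// Insert the operation and return its node index; if an identical
    /// node already exists *in the same region*, return that instead.
    fn insert_op(&mut self, region: RegionId, data: OpData) -> NodeIndex {
        if let Some(&idx) = self.interned.get(&(region, data.clone())) {
            return idx;
        }
        let idx = self.nodes.len();
        self.nodes.push((region, data.clone()));
        self.interned.insert((region, data), idx);
        idx
    }
}

fn main() {
    let mut g = Graph::default();
    let a = g.insert_op(0, OpData::Const(1));
    let b = g.insert_op(0, OpData::Const(1)); // deduplicated within region 0
    let c = g.insert_op(1, OpData::Const(1)); // distinct node in region 1
    assert_eq!(a, b);
    assert_ne!(a, c);
    println!("region 0 node: {a}, region 1 node: {c}");
}
```

With region-scoped keys, a use list attached to a node can only ever refer to uses within that node's own region, so rewrites stay safe before inlining completes.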
    }
}

pub fn pretty(graph: &MirGraph, roots: &[NodeIndex]) -> String {
If you add a dependency on the `miden-formatting` crate, it contains a generic pretty printer based on the "Prettier" algorithm by Philip Wadler. You just have to implement the `PrettyPrint` trait, which defines how to render the item in abstract terms, and then the pretty printer takes care of making it look good. You can then do things like have the `Display` impl delegate to the `PrettyPrint` impl and get it for "free".
We use it in Miden VM as well as the compiler, so it'd be good to use it here as well.
Noted, we'll convert our custom implementation to use miden-formatting once the other points discussed here are resolved/implemented.
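The "Display delegates to PrettyPrint" pattern mentioned above can be sketched as follows. Note this is a simplified illustration with a hypothetical one-method trait, not miden-formatting's actual API:

```rust
use std::fmt;

// A minimal stand-in for a pretty-printing trait; the real
// miden-formatting API differs, this only illustrates the pattern
// of having Display delegate to the pretty printer.
trait PrettyPrint {
    fn render(&self) -> String;
}

struct AddExpr {
    lhs: String,
    rhs: String,
}

impl PrettyPrint for AddExpr {
    fn render(&self) -> String {
        format!("add {}, {}", self.lhs, self.rhs)
    }
}

// Display just delegates to the PrettyPrint implementation, so
// `to_string()` and `{}` formatting come along for "free".
impl fmt::Display for AddExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.render())
    }
}

fn main() {
    let e = AddExpr { lhs: "v1".into(), rhs: "v2".into() };
    assert_eq!(e.to_string(), "add v1, v2");
    println!("{e}");
}
```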
/// Begin primitive operations

/// Evaluates to a [TypedValue]
Value(SpannedMirValue),
It feels a bit out of place to make "values" a type of operation - operations should produce and consume values, but not represent values directly IMO.
The exception to this would be the notion of a "constant" operation, which can contain the constant operand, and effectively materializes an SSA value for that constant. This lets analysis and transformation be uniform over operations.
The representation of values can still distinguish between values with different semantics though, e.g. the different type of bindings. This corresponds to how you typically unify operation results and block arguments in a typical compiler IR.
Agreed on this; we go into more detail in this reply: #359 (comment)
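The "constant as an operation" exception described above can be sketched like this. The types and names here are hypothetical, only meant to show how a constant op materializes an SSA value so that analyses stay uniform over operations:

```rust
// Hypothetical stand-in for an SSA value identifier.
type ValueId = usize;

enum Op {
    // A constant operation carries its constant operand and
    // materializes an SSA value for it, instead of constants
    // being a kind of value themselves.
    Const { result: ValueId, value: u64 },
    // Ordinary operations consume values and produce one.
    Add { result: ValueId, lhs: ValueId, rhs: ValueId },
}

// A uniform analysis: every operation exposes its result the same way,
// whether it is a constant or a computation.
fn result_of(op: &Op) -> ValueId {
    match op {
        Op::Const { result, .. } => *result,
        Op::Add { result, .. } => *result,
    }
}

fn main() {
    let ops = vec![
        Op::Const { result: 0, value: 42 },
        Op::Const { result: 1, value: 7 },
        Op::Add { result: 2, lhs: 0, rhs: 1 },
    ];
    let results: Vec<ValueId> = ops.iter().map(result_of).collect();
    assert_eq!(results, vec![0, 1, 2]);
}
```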
/// Call (func, arguments)
Call(NodeIndex, Vec<NodeIndex>),
/// Fold an Iterator according to a given FoldOperator and a given initial value
Fold(NodeIndex, FoldOperator, NodeIndex),
I would make all of these variants proper structs, so as to give the fields actual names; otherwise it makes downstream code much more awkward and difficult to read. Tuple variants with a single field are a bit more debatable, but it is usually the case with IR entities that each variant has custom behaviors and such, and implementing that on top of a big enum like this is awkward, so I would define a new struct for all of these variants (similar to what you've done with `SpannedMirValue` and `SpannedVariable`).
That's a good point and shouldn't be too hard to implement. We'll incorporate it into our design.
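As a concrete illustration of the suggested refactor, each variant can wrap a named struct instead of a bare tuple. The field names below are hypothetical (and `Fold` is simplified, omitting the `FoldOperator`), but the shape matches what the comment describes:

```rust
// Hypothetical stand-in for the PR's NodeIndex.
type NodeIndex = usize;

// Each variant gets its own struct, so the fields have names and
// variant-specific behavior has a natural place to live.
struct Call {
    function: NodeIndex,
    arguments: Vec<NodeIndex>,
}

impl Call {
    fn arity(&self) -> usize {
        self.arguments.len()
    }
}

// Simplified: the real Fold also carries a FoldOperator.
struct Fold {
    iterator: NodeIndex,
    initial_value: NodeIndex,
}

enum Operation {
    Call(Call),
    Fold(Fold),
}

fn main() {
    let op = Operation::Call(Call { function: 3, arguments: vec![1, 2] });
    // Downstream code reads much better with named fields than with
    // positional tuple access like `call.1.len()`.
    if let Operation::Call(call) = &op {
        assert_eq!(call.function, 3);
        assert_eq!(call.arity(), 2);
    }
}
```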
/// A reference to a specific variable in a function
/// Variable(MirType, argument position)
Variable(SpannedVariable),
My impression was that we would remove any notion of "variables" in the translation from the AST to the IR, so the IR is always in SSA form. What's the distinction between `Value` and `Variable` here?
Variable is improperly named here; it is meant to represent function parameters, not variables. We'll rename it to Parameter to avoid further confusion.
So I think my actual question is: does `Variable` (or `Parameter` if you rename it) represent a named binding, or an SSA value?
My concern is with named bindings remaining in the IR in any form. In other words, we shouldn't need to concern ourselves with names and their associated lexical scoping rules, particularly in the presence of uniquing. I would expect the IR to be composed of, essentially, blocks (with optional parameter lists) and operations (with operands and results), where block parameters, operands, and results are all SSA values. The dataflow of the program is then implicit in the edges of the value dependency graph.
As an aside, it might be a bit of a misnomer to call them SSA values, as AirScript doesn't really need to enforce the dominance property of SSA-form programs (i.e. all uses of a value are dominated by the definition of that value). The only property we would need to enforce is that the value dependency graph doesn't have cycles. In effect, by construction, the program will be in SSA form, even though we don't really have a control flow graph per se, more of a loose scheduling order.
In any case, renaming from `Variable` to something else is probably still a good idea, since it implies the notion of a mutable binding, but I wanted to try and understand the distinction between these and `Value`. If I understand correctly, you are using `Variable` and `Value` to distinguish between what I call `BlockArgument` and `OpResult` in my other comment, is that correct?
> If I understand correctly, you are using Variable and Value to distinguish between what I call BlockArgument and OpResult in my other comment, is that correct?

Exactly, for the Variable; there is no concept of a variable assignment, hence my comment that it is improperly named.
We currently use Value only for leaf nodes (SpannedMirValue), and ways to combine them as parents referencing other nodes (Add, Sub, Mul, Call, Fold, For, Definition, etc.)
> As an aside, it might be a bit of a misnomer to call them SSA values, as AirScript doesn't really need to enforce the dominance property of SSA-form programs

Indeed, the SSA form is already implicit and can be extrapolated from the dependency graph. Although values are currently ordered as laid out in blocks, we could reorder them based on a post-order traversal of the dependency graph.
NB: After reviewing your other comments on the advantages of storing SSA values in the nodes to help with debugging, we'll follow your advice and add a way to track the SSA numbers directly in the graph.
> The only property we would need to enforce is that the value dependency graph doesn't have cycles.

For now we assume that there are no cycles in the graph, but that might be worth enforcing explicitly.
> My concern is that named bindings remain in the IR in any form.

Modulo the pending fix to inlining soundness, the inlining pass does get rid of all Parameters, Calls, and Definitions (except the root Definitions).
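Enforcing the acyclicity property explicitly could be as simple as a DFS over the value dependency graph. The adjacency representation below is a hypothetical stand-in for the MIR graph (nodes listing the indices they depend on), just to illustrate the check:

```rust
// Hypothetical stand-in for the PR's NodeIndex.
type NodeIndex = usize;

/// Detect cycles in a value dependency graph using a three-color DFS:
/// White = unvisited, Gray = on the current DFS path, Black = done.
/// A dependency edge into a Gray node is a back edge, i.e. a cycle.
fn has_cycle(deps: &[Vec<NodeIndex>]) -> bool {
    #[derive(Clone, Copy, PartialEq)]
    enum Color { White, Gray, Black }

    fn visit(n: NodeIndex, deps: &[Vec<NodeIndex>], color: &mut [Color]) -> bool {
        color[n] = Color::Gray;
        for &m in &deps[n] {
            match color[m] {
                Color::Gray => return true, // back edge: cycle found
                Color::White => {
                    if visit(m, deps, color) {
                        return true;
                    }
                }
                Color::Black => {}
            }
        }
        color[n] = Color::Black;
        false
    }

    let mut color = vec![Color::White; deps.len()];
    (0..deps.len()).any(|n| color[n] == Color::White && visit(n, deps, &mut color))
}

fn main() {
    // Node 0 depends on 1 and 2; node 2 depends on 1: acyclic.
    assert!(!has_cycle(&[vec![1, 2], vec![], vec![1]]));
    // 0 -> 1 -> 0: cyclic.
    assert!(has_cycle(&[vec![1], vec![0]]));
}
```

Running this as a debug assertion after graph construction would turn a silent invariant into a checked one.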
Variable(SpannedVariable),
/// A function definition (Vec_params, optional return_variable, body)
/// Definition(Vec<Variable>, Variable, body)
Definition(Vec<NodeIndex>, Option<NodeIndex>, Vec<NodeIndex>),
I think we'll need to distinguish between function definitions and evaluators, as their semantics are different.
That's a good point, will do.
#[derive(Debug, Eq, PartialEq, Copy, Clone)]
pub enum Value {
#[derive(Debug, Eq, PartialEq, Clone)]
pub enum MirValue {
I would distinguish between "values", "operands" and "results" (as separate structs), and make `MirValue` a struct with fields, one of which is a `ValueKind` that more or less contains what is in this enum currently (setting aside other feedback below).
To better elaborate the structure I'm talking about, I've sketched out below how all of the pieces fit together, in theory.
/// An intrusive linked-list for tracking uses of a value as an operand
pub type UseList = intrusive_collections::linked_list::LinkedList<OperandAdapter>;
intrusive_collections::intrusive_adapter!(pub OperandAdapter = Rc<Operand>: Operand { link: intrusive_collections::LinkedListLink });
/// A [Context] is used for global state associated with the current compilation context/session.
///
/// For example, uniqued storage could be handled here, IR entities could even be allocated from it,
/// but that's not anything we need to try and do.
#[derive(Default)]
pub struct Context {
next_value_id: Cell<u32>,
}
impl Context {
/// Allocates a new unique value identifier from the current context
pub fn make_value_id(&self) -> ValueId {
let id = ValueId(self.next_value_id.get());
self.next_value_id.set(id.as_u32() + 1);
id
}
}
/// An operand is an argument to an operation, and semantically represents a use of a [Value] by an [Operation].
#[derive(Spanned)]
pub struct Operand {
/// The link in the use list of the value this operand references
link: intrusive_collections::LinkedListLink,
/// The source location associated with the _use_ (not the def) of the value
#[span]
span: SourceSpan,
/// A reference to the (type-erased) operation to which this operand is attached (owned by)
owner: Rc<dyn Op>,
/// A reference to the value of this operand.
value: RefCell<Rc<dyn Value>>,
/// The index of this operand in its owner's operand vector
index: Cell<usize>,
}
impl Operand {
pub fn is_linked(&self) -> bool {
self.link.is_linked()
}
pub fn owner(&self) -> Rc<dyn Op> {
self.owner.clone()
}
pub fn value_id(&self) -> ValueId {
self.value.borrow().id()
}
pub fn value(&self) -> Rc<dyn Value> {
self.value.borrow().clone()
}
pub fn index(&self) -> usize {
self.index.get()
}
}
impl Eq for Operand {}
impl PartialEq for Operand {
fn eq(&self, other: &Self) -> bool {
// Equality is tied to identity for operands
core::ptr::addr_eq(self, other)
}
}
/// A unique integer identifier for an SSA value in the program.
///
/// These are allocated by [Context], and only by [Context].
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ValueId(u32);
impl ValueId {
pub const fn as_u32(&self) -> u32 {
self.0
}
}
impl fmt::Display for ValueId {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "v{}", &self.0)
}
}
/// The [Value] trait represents attributes and behaviors common to all
/// kinds of values in the IR.
///
/// In addition to their role in the IR, values are the means by which the
/// use-def graph is queried and maintained.
///
/// This trait is object safe, to allow for `dyn Value` use.
pub trait Value: Spanned {
/// This allows us to downcast to the concrete type, while allowing
/// unsized implementations of this trait, i.e. `Rc<dyn Value>` also
/// implements `Value`, where if `Any` was a supertrait, then trait
/// objects wrapped in a `Rc` or other smart pointer would not be
/// usable in generic functions where `Value` is a bound.
fn as_any(&self) -> &dyn core::any::Any;
fn as_any_mut(&mut self) -> &mut dyn core::any::Any;
/// Get the SSA value identifier for this value
fn id(&self) -> ValueId;
/// Get the type of this value.
///
/// Here, I'm assuming that something like `Type::Unknown` represents
/// an unknown type, rather than using `Option`, but that would work fine
/// too.
fn ty(&self) -> Type;
/// Access the use list of this value
fn uses(&self) -> Ref<'_, UseList>;
fn uses_mut(&self) -> RefMut<'_, UseList>;
/// Check if this value is used
fn is_used(&self) -> bool {
!self.uses().is_empty()
}
/// Add a new user of this value
fn add_user(&self, user: Rc<Operand>) {
self.uses_mut().push_back(user);
}
/// Remove a use from this value safely (i.e. without assuming
/// that it is actually in the list). Requires scanning the list.
fn remove_user(&self, user: Rc<Operand>) {
assert!(user.is_linked());
let mut users = self.uses_mut();
let mut cursor = users.front_mut();
while let Some(current) = cursor.get() {
if current == &*user { break; }
cursor.move_next();
}
cursor.remove();
}
/// Remove a use from this value in constant time, assuming
/// you can guarantee that the user is a member of this value's
/// use list (typically the case).
unsafe fn remove_user_unchecked(&self, user: Rc<Operand>) {
assert!(user.is_linked());
let mut users = self.uses_mut();
users.cursor_mut_from_ptr(&*user).remove();
}
}
/// Since we will often be working with `Rc<dyn Value>`, we can implement
/// functions on `dyn Value` to expose useful functionality that we otherwise
/// can't implement via `Value` without destroying its object safety.
impl dyn Value {
/// Check if the concrete type of this value is `T`
pub fn is<T: Value>(&self) -> bool {
self.as_any().is::<T>()
}
/// Downcast this value to its concrete type `T`, or return `None` if
/// the value is of a different type.
pub fn downcast_ref<T: Value>(&self) -> Option<&T> {
self.as_any().downcast_ref::<T>()
}
pub fn downcast_mut<T: Value>(&mut self) -> Option<&mut T> {
self.as_any_mut().downcast_mut::<T>()
}
}
/// An [OpResult] represents an operation-produced value.
///
/// It implements the [Value] trait.
#[derive(Spanned)]
pub struct OpResult {
/// The operation which produced this result (i.e. owned by)
owner: Rc<dyn Op>,
/// The unique SSA value identifier for this result
id: ValueId,
/// The source location associated with the variable defined by this result
#[span]
span: SourceSpan,
/// The type of this result, if known
ty: RefCell<Type>,
/// The index of this result in its owning operation's result list
index: usize,
/// The use list of this result
users: RefCell<UseList>,
}
impl OpResult {
pub fn new(ty: Type, span: SourceSpan, index: usize, owner: Rc<dyn Op>, context: &Context) -> Rc<Self> {
Rc::new(Self {
owner,
id: context.make_value_id(),
span,
ty: RefCell::new(ty),
index,
users: Default::default(),
})
}
}
impl Value for OpResult {
fn as_any(&self) -> &dyn core::any::Any {
self
}
fn as_any_mut(&mut self) -> &mut dyn core::any::Any {
self
}
fn id(&self) -> ValueId { self.id }
fn ty(&self) -> Type { self.ty.borrow().clone() }
fn uses(&self) -> Ref<'_, UseList> { self.users.borrow() }
fn uses_mut(&self) -> RefMut<'_, UseList> { self.users.borrow_mut() }
}
/// A [BlockArgument] represents a value introduced via a block parameter list
#[derive(Spanned)]
pub struct BlockArgument {
owner: Rc<Block>,
id: ValueId,
span: SourceSpan,
ty: RefCell<Type>,
index: usize,
users: RefCell<UseList>,
}
impl BlockArgument {
pub fn new(ty: Type, span: SourceSpan, index: usize, owner: Rc<Block>, context: &Context) -> Rc<Self> {
Rc::new(Self {
owner,
id: context.make_value_id(),
span,
ty: RefCell::new(ty),
index,
users: Default::default(),
})
}
}
impl Value for BlockArgument {
fn as_any(&self) -> &dyn core::any::Any {
self
}
fn as_any_mut(&mut self) -> &mut dyn core::any::Any {
self
}
fn id(&self) -> ValueId { self.id }
fn ty(&self) -> Type { self.ty.borrow().clone() }
fn uses(&self) -> Ref<'_, UseList> { self.users.borrow() }
fn uses_mut(&self) -> RefMut<'_, UseList> { self.users.borrow_mut() }
}
/// A list of [Block] in a [Region]
pub type BlockList = intrusive_collections::linked_list::LinkedList<BlockAdapter>;
intrusive_collections::intrusive_adapter!(pub BlockAdapter = Rc<Block>: Block { link: intrusive_collections::LinkedListLink });
pub struct Region {
parent: Rc<dyn Op>,
body: RefCell<BlockList>,
}
impl Region {
pub fn new(parent: Rc<dyn Op>) -> Self {
Self {
parent,
body: Default::default(),
}
}
pub fn parent(&self) -> Rc<dyn Op> {
self.parent.clone()
}
pub fn is_empty(&self) -> bool {
self.body.borrow().is_empty()
}
pub fn entry_block(&self) -> Option<Rc<Block>> {
self.body.borrow().front().clone_pointer()
}
pub fn body(&self) -> Ref<'_, BlockList> {
self.body.borrow()
}
pub fn body_mut(&self) -> RefMut<'_, BlockList> {
self.body.borrow_mut()
}
pub fn insert_at_end(&self, block: Rc<Block>) {
assert!(!block.is_linked());
self.body_mut().push_back(block);
}
}
/// A list of [BlockItem] (or ops) in a [Block]
pub type BlockBody = intrusive_collections::linked_list::LinkedList<BlockItemAdapter>;
intrusive_collections::intrusive_adapter!(pub BlockItemAdapter = Box<BlockItem>: BlockItem { link: intrusive_collections::LinkedListLink });
/// Represents type-erased operations in the body of a [Block]
pub struct BlockItem {
pub op: Rc<dyn Op>,
}
pub struct Block {
link: LinkedListLink,
parent: RefCell<Rc<Region>>,
args: RefCell<SmallVec<[Rc<BlockArgument>; 4]>>,
body: RefCell<BlockBody>,
}
impl Block {
pub fn new(parent: Rc<Region>) -> Self {
Self {
link: Default::default(),
parent: RefCell::new(parent),
args: Default::default(),
body: Default::default(),
}
}
pub fn parent(&self) -> Rc<Region> {
self.parent.borrow().clone()
}
pub fn args(&self) -> Ref<'_, [Rc<BlockArgument>]> {
Ref::map(self.args.borrow(), |args| args.as_slice())
}
pub fn args_mut(&self) -> RefMut<'_, SmallVec<[Rc<BlockArgument>; 4]>> {
self.args.borrow_mut()
}
pub fn body(&self) -> Ref<'_, BlockBody> {
self.body.borrow()
}
pub fn body_mut(&self) -> RefMut<'_, BlockBody> {
self.body.borrow_mut()
}
pub fn insert_at_start(self: Rc<Self>, op: Rc<dyn Op>) {
assert!(!op.is_linked(), "cannot insert an already-linked operation into a new block");
op.as_operation_mut().parent = Some(Rc::clone(&self));
self.body.borrow_mut().push_front(Box::new(BlockItem { op }));
}
pub fn insert_at_end(self: Rc<Self>, op: Rc<dyn Op>) {
assert!(!op.is_linked(), "cannot insert an already-linked operation into a new block");
op.as_operation_mut().parent = Some(Rc::clone(&self));
self.body.borrow_mut().push_back(Box::new(BlockItem { op }));
}
}
/// This type is used to provide the underlying storage and common
/// behavior to all operations. This makes implementing individual
/// operations much more straightforward, as they can largely delegate
/// to this type for most things.
#[derive(Default)]
pub struct Operation {
/// The intrusive link for this operation in its parent block.
link: LinkedListLink,
/// The parent block of this operation, or `None` if it is top-level
parent: Option<Rc<Block>>,
/// The source location where this operation was derived
span: SourceSpan,
/// The set of operands for this operation
operands: SmallVec<[Rc<Operand>; 2]>,
/// The set of results of this operation
results: SmallVec<[Rc<OpResult>; 1]>,
/// The regions nested under this operation.
regions: SmallVec<[Rc<Region>; 1]>,
}
impl Operation {
pub fn is_top_level(&self) -> bool {
self.parent.is_none()
}
pub fn set_parent(&mut self, parent: Rc<Block>) {
self.parent = Some(parent);
}
/// Replace all usages of `value` with `replacement` and update the respective use lists.
pub fn replace_all_uses_of(&mut self, value: ValueId, replacement: Rc<dyn Value>) {
for operand in self.operands.iter() {
if operand.value_id() == value {
let prev_value = core::mem::replace(&mut *operand.value.borrow_mut(), replacement.clone());
prev_value.remove_user(Rc::clone(operand));
replacement.add_user(Rc::clone(operand));
}
}
}
}
/// The [Op] trait represents the attributes and behavior common to all ops.
///
/// Most functions are implemented automatically by delegating to the underlying
/// [Operation] storage.
///
/// This trait is object safe, to allow `dyn Op` use.
pub trait Op {
/// For casting
fn as_any(&self) -> &dyn core::any::Any;
fn as_any_mut(&mut self) -> &mut dyn core::any::Any;
/// Provide access to the raw [Operation]
fn as_operation(&self) -> Ref<'_, Operation>;
fn as_operation_mut(&self) -> RefMut<'_, Operation>;
/// This would in theory be used for printing operations generically
fn opcode(&self) -> &'static str { core::any::type_name::<Self>() }
/// The following are some examples of functionality provided by this trait, but
/// there is quite a bit more that you'd want in order to support a variety of
/// different transformations.
fn num_operands(&self) -> usize { self.as_operation().operands.len() }
fn get_operand(&self, index: usize) -> Rc<Operand> {
self.as_operation().operands[index].clone()
}
fn set_operand(&self, index: usize, value: Rc<dyn Value>) {
let operand = self.get_operand(index);
let replaced = core::mem::replace(&mut *operand.value.borrow_mut(), value.clone());
replaced.remove_user(Rc::clone(&operand));
value.add_user(operand);
}
/// Replace all uses of `value` with `replacement` in the operands of this operation.
///
/// This will also handle removing uses of the old value, and adding uses of the replacement.
fn replace_uses_of(&self, value: ValueId, replacement: Rc<dyn Value>) {
self.as_operation_mut().replace_all_uses_of(value, replacement);
}
fn num_results(&self) -> usize {
self.as_operation().results.len()
}
fn get_result(&self, index: usize) -> Rc<OpResult> {
self.as_operation().results[index].clone()
}
fn has_regions(&self) -> bool {
!self.as_operation().regions.is_empty()
}
fn get_region(&self, index: usize) -> Rc<Region> {
self.as_operation().regions[index].clone()
}
}
/// We implement some functionality on `dyn Op` so that we can keep `Op` object safe, but still
/// provide some of the richer mutation/transformation actions we want to support.
impl dyn Op {
/// Add `value` as a new operand of this operation, adding the resulting operand
/// as a user of that value.
pub fn append_operand(self: Rc<Self>, value: Rc<dyn Value>, span: SourceSpan) -> Rc<Operand> {
let owner = Rc::clone(&self);
let mut operation = self.as_operation_mut();
let index = operation.operands.len();
let operand = Rc::new(Operand {
link: Default::default(),
span,
index,
value: RefCell::new(value.clone()),
owner
});
operation.operands.push(Rc::clone(&operand));
value.add_user(operand.clone());
operand
}
}
/// An example definition of a primitive concrete operation
#[derive(Spanned)]
pub struct Add(RefCell<Operation>);
impl Add {
pub fn create(lhs: Rc<dyn Value>, rhs: Rc<dyn Value>, span: SourceSpan, context: &Context) -> Rc<Self> {
let op = Rc::new(Self(RefCell::new(Operation {
span,
..Default::default()
})));
let ty = lhs.ty();
let owner = Rc::clone(&op) as Rc<dyn Op>;
Rc::clone(&owner).append_operand(lhs, span);
Rc::clone(&owner).append_operand(rhs, span);
let mut operation = op.0.borrow_mut();
let index = operation.results.len();
operation.results.push(OpResult::new(
ty,
span,
index,
owner,
context,
));
op
}
}
impl Op for Add {
fn as_any(&self) -> &dyn core::any::Any { self }
fn as_any_mut(&mut self) -> &mut dyn core::any::Any { self }
fn as_operation(&self) -> Ref<'_, Operation> { self.0.borrow() }
fn as_operation_mut(&self) -> RefMut<'_, Operation> { self.0.borrow_mut() }
}
/// An example definition of a structured concrete operation
///
/// This represents a function definition, it does not have any operands, nor any results,
/// it is also semantically isolated from above, i.e. it cannot reference values in its body,
/// that are not defined within its body.
///
/// It contains a single region, representing the function body.
#[derive(Spanned)]
pub struct Function {
op: RefCell<Operation>,
name: QualifiedIdentifier,
params: SmallVec<[Type; 4]>,
result: Type,
}
impl Function {
pub fn create(name: QualifiedIdentifier, params: impl IntoIterator<Item = Type>, result: Type, span: SourceSpan, context: &Context) -> Rc<Self> {
let op = Rc::new(Self {
op: RefCell::new(Operation {
span,
..Default::default()
}),
name,
params: params.into_iter().collect(),
result,
});
// Initialize function body
let region = Rc::new(Region::new(Rc::clone(&op) as Rc<dyn Op>));
op.op.borrow_mut().regions.push(Rc::clone(&region));
// Initialize entry block
let entry = Rc::new(Block::new(Rc::clone(&region)));
let entry_args = op.params.iter().enumerate().map(|(index, ty)| {
BlockArgument::new(ty.clone(), span, index, Rc::clone(&entry), context)
});
entry.args_mut().extend(entry_args);
region.insert_at_end(entry);
op
}
pub fn name(&self) -> &QualifiedIdentifier {
&self.name
}
pub fn body(&self) -> Rc<Region> {
self.op.borrow().regions[0].clone()
}
}
impl Op for Function {
fn as_any(&self) -> &dyn core::any::Any { self }
fn as_any_mut(&mut self) -> &mut dyn core::any::Any { self }
fn as_operation(&self) -> Ref<'_, Operation> { self.op.borrow() }
fn as_operation_mut(&self) -> RefMut<'_, Operation> { self.op.borrow_mut() }
}
Our current design for the IR is closer to a sea-of-nodes than the more common CFG representation. The main difference from a classic sea-of-nodes implementation is that we do preserve ordering in the function/block bodies. This also means that our SSA values can be extrapolated deterministically when pretty printing rather than being stored in the nodes directly. We currently do this by traversing block bodies in order, but it could also be done with a post-order traversal of the dependency graph.
We have a few questions/remarks about your proposed structure:
- What would be the advantage of using an intrusive linked list here?
- Switching from sea-of-nodes to CFG might delay work on passes and require a rewrite of the AST->MIR lowering. The two models are both valid for an IR, with varying advantages and drawbacks.
- We agree that the node types should be more properly separated, we'll tend towards your proposed design on this aspect over time, incorporating elements as we need them for this implementation.
- Keeping the regionalised owner attributes would help simplify passes greatly. We'll incorporate that into our design.
Overall, we think we can benefit from the proposed changes without necessarily changing the core design of the sea-of-nodes system we already have. This also minimizes time spent redoing existing work.
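The deterministic extrapolation of SSA numbers at pretty-print time can be sketched as below. The `Node` enum and adjacency layout are hypothetical simplifications of the MIR graph; the point is that the numbering is assigned during the in-order walk of a block body rather than stored in the nodes:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: SSA numbers are not stored in the nodes;
// they are assigned deterministically while walking a block body in order.
type NodeIndex = usize;

enum Node {
    Const(u64),
    Add(NodeIndex, NodeIndex),
}

fn pretty(nodes: &[Node], body: &[NodeIndex]) -> String {
    let mut numbers: HashMap<NodeIndex, usize> = HashMap::new();
    let mut out = String::new();
    for &idx in body {
        // The next SSA number is simply how many values we've named so far.
        let n = numbers.len();
        numbers.insert(idx, n);
        let line = match &nodes[idx] {
            Node::Const(c) => format!("v{n} = const {c}\n"),
            Node::Add(a, b) => format!("v{n} = add v{}, v{}\n", numbers[a], numbers[b]),
        };
        out.push_str(&line);
    }
    out
}

fn main() {
    let nodes = vec![Node::Const(1), Node::Const(2), Node::Add(0, 1)];
    let text = pretty(&nodes, &[0, 1, 2]);
    assert_eq!(text, "v0 = const 1\nv1 = const 2\nv2 = add v0, v1\n");
    print!("{text}");
}
```

Because the walk order is fixed, the same graph always prints with the same numbering, which is what makes the printed IR diffable across runs even without stored `ValueId`s.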
> Our current design for the IR is closer to a sea-of-nodes than the more common CFG representation.
I should note that this is true of the representation described above as well - more precisely, it is a simplified representation of a Regionalized Value-State Dependence Graph (RVSDG), where the connections between nodes are entirely dictated by the value dependency graph. The exception to this, if you can call it that, are "SSA regions", i.e. regions with multiple blocks, where values are expected to adhere to the SSA dominance property - but you can also have regions where all operations go in a single block with no implicit scheduling order, beyond that dictated by the value dependency graph. It is basically up to the semantics of each operation to dictate what kind of regions it has. MLIR, which uses a similar representation, names these two kinds of regions "SSACFG" regions and "graph" regions.
> This also means that our SSA values can be extrapolated deterministically when pretty printing rather than being stored in the nodes directly.

The use of `ValueId` above is really one of debugging convenience; you can absolutely skip it and assign integral identifiers during pretty printing.
The tradeoff is that if you are debugging an issue with the compiler, dumping a vector of values doesn't give you a way to correlate those values with the pretty-printed output (and from that, with the source code); all you get are opaque values (and some associated information like types, but that is commonly not enough to know precisely which value you are looking at, or how it was derived). It is convenient to be able to correlate the output of something like `dbg!(value)` with the pretty-printed output and source code of the program/function/region, so that you can better understand what is happening. In any case, you could certainly omit it; I just find in practice that it is worth assigning a debugging-friendly canonical identifier for uses like that (one can also do this only under `#[cfg(debug_assertions)]`; there are plenty of ways to approach things of this nature).
What would be the advantage of using an intrusive linked list here?
The biggest benefits are:
- You don't need to allocate any memory for containers that a node might be a member of, e.g., the list of operations in a block, a list of users of some symbol, all can be maintained by links between nodes directly.
- It lets you manipulate nodes in the graph independently of their parents.
- It permits a node to be present in multiple lists at once. For example, if you wanted to track symbol usage (where operations can be symbols), then an operation might have an intrusive link both for its parent block and the use list of the symbol it uses (if it uses one).
- Many common transformations/rewrites rely on: inserting nodes at the front or back of some list of nodes, inserting a node relative to some other node; moving nodes around relative to their current position, or to the position of another node; appending a list of nodes to another list, splicing a list of nodes into the middle of another list; splitting a list of nodes at arbitrary positions; and "stealing" a list of nodes from its current owner, leaving an empty list in its place. The intrusive linked-list is very efficient at all of these operations, in particular it is a constant-time operation to construct a cursor to a node in its containing list, so that you can then perform some operation on the list relative to that cursor (e.g. removing the node, splitting the list at that node, splicing another list at that node, inserting a new node). Vectors are great for cache locality, iteration, indexed access, and pushing to the end (assuming it doesn't need to reallocate the backing storage in the process) - but they are much less efficient at all of the other operations I mentioned above.
Basically, it is an excellent data structure for a compiler IR which relies on an implicit graph, rather than an explicit one. The primary downside of linked lists is the potential for poor cache locality - but that is mitigated significantly by allocating nodes from an arena. I didn't go that far with my example code, but in practice that is usually what I do. Most traversals of an IR are jumping between different node types (regions -> blocks -> operations -> values -> operations -> blocks -> values -> ...), so cache locality is inherently an issue even when using vectors for node storage. Vectors are of course the primary alternative, using them as an arena, and using handles (indexes in some vector of a specific node type) for node-to-node references - implementing things like lists of regions, blocks, operations, etc., as vectors of handles. It's not a bad alternative by any means, perhaps even better in some regards, but it does make some things more awkward in Rust, particularly mutation.
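For illustration, here is a minimal arena-backed sketch of the idea (the Arena, OpList, and Op names are made up for this example, not the actual implementation): the prev/next links live inside the nodes themselves, so splicing one op list into another after a cursor is a constant-time handful of link writes, independent of list length:

```rust
// Hypothetical sketch: an arena-backed intrusive-style list of "ops".
// Node links live inside the nodes (prev/next indices into the arena),
// so list membership costs no extra allocation and splicing is O(1).

#[derive(Default, Clone, Copy)]
struct Links {
    prev: Option<usize>,
    next: Option<usize>,
}

struct Op {
    name: &'static str,
    links: Links,
}

#[derive(Default)]
struct OpList {
    head: Option<usize>,
    tail: Option<usize>,
}

struct Arena {
    ops: Vec<Op>,
}

impl Arena {
    fn push_back(&mut self, list: &mut OpList, name: &'static str) -> usize {
        let idx = self.ops.len();
        self.ops.push(Op { name, links: Links { prev: list.tail, next: None } });
        match list.tail {
            Some(t) => self.ops[t].links.next = Some(idx),
            None => list.head = Some(idx),
        }
        list.tail = Some(idx);
        idx
    }

    /// Splice `other` into `list` immediately after `cursor`, in O(1):
    /// only a handful of link writes, no per-node work.
    fn splice_after(&mut self, list: &mut OpList, cursor: usize, other: OpList) {
        let (Some(oh), Some(ot)) = (other.head, other.tail) else { return };
        let after = self.ops[cursor].links.next;
        self.ops[cursor].links.next = Some(oh);
        self.ops[oh].links.prev = Some(cursor);
        self.ops[ot].links.next = after;
        match after {
            Some(a) => self.ops[a].links.prev = Some(ot),
            None => list.tail = Some(ot),
        }
    }

    fn names(&self, list: &OpList) -> Vec<&'static str> {
        let mut out = Vec::new();
        let mut cur = list.head;
        while let Some(i) = cur {
            out.push(self.ops[i].name);
            cur = self.ops[i].links.next;
        }
        out
    }
}

fn main() {
    let mut arena = Arena { ops: Vec::new() };
    let mut body = OpList::default();
    let call = arena.push_back(&mut body, "call");
    arena.push_back(&mut body, "ret");

    // A cloned callee body to inline at the call site.
    let mut callee = OpList::default();
    arena.push_back(&mut callee, "add");
    arena.push_back(&mut callee, "mul");

    arena.splice_after(&mut body, call, callee);
    assert_eq!(arena.names(&body), ["call", "add", "mul", "ret"]);
}
```

A real intrusive list uses pointers and unsafe code (or a crate like intrusive-collections) rather than arena indices, but the cost model is the same.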
Switching from sea-of-nodes to a CFG might delay work on passes and would require rewriting the AST -> MIR lowering. Both models are valid for an IR, with varying advantages and drawbacks.
As an aside: I want to push back on characterizing this as a CFG representation - it is an RVSDG, which is a form of dataflow graph. To the extent that there is any control flow represented, it is by convention only (i.e. the fact that operations are ordered in a block is incidental). What actually dictates whether a given region must also represent a CFG, is whether or not you implement analyses/transformations over it that depend upon the SSA dominance property. Since AirScript doesn't have any real notion of control flow, only data flow, I think it is fair to say that we don't have any particular reason to require this of our IR, regardless of representation.
In any case, I certainly agree that switching representations is probably not worth it at this point, since I don't see any reason to assume that the current approach is not going to work out. My primary purpose in bringing it up and elaborating on it is two-fold:
- The current sea-of-nodes representation is fine, aside from nits like splitting up the representation of values and operations, but I have two concerns (not unresolvable ones, but want to note them), and I wanted to frame it in comparison to the alternative approach that I probably would have taken if I was the one doing the work:
- Uniquing nodes, and the interaction with program transformations. It may be that this isn't actually an issue in practice, but we'll want to keep an eye on this, and test thoroughly all of the potential edge cases.
- The ability for less senior engineers to reason about this representation and work on the AirScript compiler. More thorough documentation once things are firmed up will be essential, particularly any assumptions made that have implications for correctness of the translation from the AST. For example, at what points are bindings resolved to SSA values, and does that occur before/after uniquing (presumably all bindings are resolved before uniquing, as let-shadowing is permitted in AirScript, but all of those kinds of details should be outlined).
- I had outlined the RVSDG design in our previous discussions elsewhere (in Intermediate representation #358 IIRC), but not elaborated on what that would actually look like in practice. I wanted to do so here in order to compare/contrast, and provide you with a concrete idea of what I'm talking about if I refer to things not present in this specific IR.
To be 100% clear, it was more to provide a background for discussion about this IR, rather than to dictate a new design.
We agree that the node types should be more properly separated, we'll tend towards your proposed design on this aspect over time, incorporating elements as we need them for this implementation.
👍
Keeping the regionalised owner attributes would help simplify passes greatly. We'll incorporate that into our design.
I'm assuming it will be necessary in order to do per-region uniquing. However, after mulling it over further, perhaps it is not necessary to do per-region uniquing? I would like to better understand the implications of global uniquing, and I think that requires a broader set of tests and implemented functionality to see how it all interacts. It may very well be that there are no adverse effects because of how simple AirScript is in terms of primitive operations.
Overall we think we can benefit from the proposed changes without necessarily changing the core design of the sea-of-nodes system we already have. This also minimizes time spent on redoing existing work.
Sure, I think that's fine. At the very least, you have an example of an alternative approach if you find you need to elaborate on the current representation to address any issues you run into. Again, the idea was more to compare/contrast and provide an explanation of what I'm talking about, rather than dictate how you implement the IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should note that this is true of the representation described above as well - more precisely, it is a simplified representation of a Regionalized Value-State Dependence Graph (RVSDG), where the connections between nodes are entirely dictated by the value dependency graph
I see - I think we misinterpreted your proposed implementation here; you can safely discard any mention of a CFG in our previous comment.
After thinking more about it, your design does seem to present another advantage in the case of dead code elimination. Since nodes are referenced by pointer (or Rc) and not by their NodeIndex in the Vec of all nodes, this would enable automatic cleanup of unreferenced nodes without worrying about having to shift subsequent NodeIndices in order to compact the graph. Using reference counting makes this a lot easier to manage than doing it manually.
The use of ValueId above is really one of debugging convenience, you can absolutely skip it, and assign integral identifiers during pretty printing.
Noted, this is a very good point I hadn't thought about; we'll add tracking of SSA values to the nodes.
The biggest benefits are: ...
I see what you mean, thanks for the detailed explanation. We probably won't implement this immediately; it would be worth measuring real performance before optimizing this part. We'll keep that structure in mind as a potential future optimization.
As an aside: I want to push back on characterizing this as a CFG representation - it is an RVSDG, which is a form of dataflow graph
Noted, that is mostly due to our misunderstanding of your proposed design. After thinking about it longer, I am under the impression that the two designs are functionally equivalent.
To be 100% clear, it was more to provide a background for discussion about this IR, rather than to dictate a new design.
No problem! And thank you for this detailed feedback, it clears a few points I hadn't understood properly.
I'm assuming it will be necessary in order to do per-region uniquing. However, after mulling it over further, perhaps it is not necessary to do per-region uniquing? I would like to better understand the implications of global uniquing, and I think that requires a broader set of tests and implemented functionality to see how it all interacts. It may very well be that there are no adverse effects because of how simple AirScript is in terms of primitive operations.
That was our impression as well; it seemed a bit irrelevant after the inlining/unrolling passes, since they remove all function "scopes" and keep only the root block. Note that we haven't fully thought through the implications for the blocks of conditionals and loops.
Our current understanding is that keeping it regionalized makes sense if we want to have pre-inlining/unrolling optimizations, since after inlining/unrolling, those would all be sharing the same scope anyway.
We'll implement regionalisation for now, but might try to remove that requirement in the future if at all possible.
At the very least, you have an example of an alternative approach if you find you need to elaborate on the current representation to address any issues you run into. Again, the idea was more to compare/contrast and provide an explanation of what I'm talking about, rather than dictate how you implement the IR.
Thanks again for this!
use crate::MirGraph;

pub struct ConstantPropagation<'a> {
This pass becomes trivial with the design I outlined above, once the generic pre- and post-order visitors for Operation have been implemented. Basically:
1. Perform a post-order visit of the Function, pushing all of the operations which have at least one use of a constant op result on a stack. This could also be an opportunity to unique constants, and separately enqueue constant ops that should be erased, while also recording a mapping from the values produced by those ops to the canonical value for that constant.
2. Process the next item on the work stack, and determine if that operation can be constant folded based on its constant operands. If so, fold the operation. If folded in place, add all users of the op's results to the work stack, and go to the next op on the stack. If the op can be folded away completely and replaced with a constant value, go to step 3. If the operation could not be folded, start processing the next item.
3. Materialize a new constant op representing the folded constant, and insert it before the op we want to replace.
4. Replace the results of the original op with those of the constant op (updating all the users at the same time). Add each unique using operation to the work stack, if not already present on the stack.
5. Erase the original op. Go to step 2.
Because the data flow graph is implicit in the edges between operations/operands/results, doing the above is quite efficient, since we only need to do a single traversal of the graph, and we only revisit operations when one of their uses has been constant folded (so as to see if the change makes it foldable).
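To make the shape of that worklist concrete, here is a toy, self-contained sketch (not the AirScript IR - the Op enum and operand indices are stand-ins for real operations and value uses):

```rust
// Hypothetical sketch of worklist-driven constant folding on a toy op
// list, where operands are indices of earlier ops.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Const(u64),
    Add(usize, usize), // operand indices into the op list
}

/// Fold ops whose operands are all constants, re-enqueueing the users of
/// each newly-folded op so that a single traversal plus targeted
/// revisits suffices.
fn fold_constants(ops: &mut Vec<Op>) {
    // Seed the work stack with every op (a real pass would seed only ops
    // that use a constant result, found during a post-order visit).
    let mut stack: Vec<usize> = (0..ops.len()).collect();
    while let Some(i) = stack.pop() {
        if let Op::Add(a, b) = ops[i] {
            if let (Op::Const(x), Op::Const(y)) = (ops[a], ops[b]) {
                ops[i] = Op::Const(x + y);
                // Revisit users of op i: folding it may make them foldable.
                for (j, op) in ops.iter().enumerate() {
                    if let Op::Add(a, b) = op {
                        if *a == i || *b == i {
                            stack.push(j);
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    // v0 = 1; v1 = 2; v2 = v0 + v1; v3 = v2 + v0
    let mut ops = vec![Op::Const(1), Op::Const(2), Op::Add(0, 1), Op::Add(2, 0)];
    fold_constants(&mut ops);
    assert_eq!(ops[2], Op::Const(3));
    assert_eq!(ops[3], Op::Const(4));
}
```

With explicit use lists (as in the intrusive design), the "revisit users" step is a direct lookup instead of the linear scan used here.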
We agree.
// diagnostics: &'a DiagnosticsHandler,
//}

pub struct Inlining {}
For this pass, you can again use a post-order visit of the graph, inlining the regions of structured ops (e.g. comprehensions, functions) into the containing region, by replacing all uses of entry block arguments with the operands used at the call site.
The intrusive linked lists containing operations in a Block can be spliced/split and stolen relative to a cursor, so it is quite efficient to do the actual mechanical inlining.
It might also not be all that important to do inlining with the representation I outlined above, since I suspect it is relatively straightforward to lower to AIR without it; but some optimizations will be region- or block-local, so that's the only tradeoff.
The intrusive linked lists containing operations in a Block can be spliced/split and stolen relative to a cursor, so it is quite efficient to do the actual mechanical inlining.
Wouldn't reusing existing nodes by splicing them during inlining cause conflicts in case a function is called twice (or more) with different arguments?
The current implementation duplicates those nodes for this reason. The main problem in doing that is maintaining the uniquing property globally, which should be fixed once a regionalized uniquing strategy is implemented.
Wouldn't reusing existing nodes by splicing them during inlining cause conflicts in case a function is called twice (or more) with different arguments?
Sorry, I didn't mean stealing then splicing in this context, but rather that splicing the cloned list of ops into the middle of the op list you want to inline into, is very efficient due to the intrusive lists, only requiring writes to two pointers.
The exception would be things like a comprehension that turns out to only have a single iteration, you wouldn't need to clone in that case, but could just steal the list from the comprehension body.
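As a toy illustration of the "clone the callee, rewriting uses of entry block arguments to the call-site operands" step (Op, the numeric value ids, and clone_for_callsite are hypothetical stand-ins, not the actual MIR types):

```rust
// Hypothetical sketch: clone a callee body for one call site, remapping
// uses of the callee's entry-block parameters to the call's operands.

use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Op {
    Add(u32, u32),
    Mul(u32, u32),
}

/// Clone `body`, rewriting each use of a parameter value id in `params`
/// to the corresponding argument value id in `args`.
fn clone_for_callsite(body: &[Op], params: &[u32], args: &[u32]) -> Vec<Op> {
    let map: HashMap<u32, u32> =
        params.iter().copied().zip(args.iter().copied()).collect();
    let remap = |v: u32| *map.get(&v).unwrap_or(&v);
    body.iter()
        .map(|op| match *op {
            Op::Add(a, b) => Op::Add(remap(a), remap(b)),
            Op::Mul(a, b) => Op::Mul(remap(a), remap(b)),
        })
        .collect()
}

fn main() {
    // fn f(p0, p1) { add p0, p1; mul p0, p1 }  called as f(v7, v9)
    let body = [Op::Add(0, 1), Op::Mul(0, 1)];
    let inlined = clone_for_callsite(&body, &[0, 1], &[7, 9]);
    assert_eq!(inlined, vec![Op::Add(7, 9), Op::Mul(7, 9)]);
}
```

Each call site gets its own clone, which is why calling the same function twice with different arguments does not conflict; only the final splice of the cloned list into the caller is the O(1) part.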
random_values,
trace_columns,
bindings: Default::default(),
};

// Insert placeholder nodes for future Operation::Definition (needed for function bodies to call other functions)
Could you elaborate a bit on what the placeholder op is and why it is needed? Again, I suspect the representation I outlined above makes this unnecessary, but would like to make sure I understand the problem you ran into.
The PlaceHolder Operation variant is currently used to obtain the NodeIndex before the node is fully defined, to help with incremental graph edits. The main caveat is that we might have to rework the uniquing strategy.
It should be possible to do without this node type with a slight rework of how we insert nodes.
Ah, makes sense. The approach I've taken with this in the past, and which I've seen in other similar data structures (e.g. cranelift_entity::PrimaryMap<K, V>), is to have an underlying vector of Option<V>, or require V to be default-constructible, so that:

1. You can allocate indices before storing anything at that slot, while ensuring that all indices are valid for the underlying vector: given an index, you can simply do items.resize_with(index + 1, || None) or items.resize_with(index + 1, Default::default) to ensure that the index is valid and returns something sensible. I think the Option<V> approach works best, since you can actually determine if a slot is occupied or not, but both options let you allocate indices before you construct the actual V.
2. You can reuse slots, assuming you use the Option<V> representation, by maintaining the next available index (or a free list of them, depending on how you "release" nodes). You then check if you have a slot you can reuse when a new index is requested.
3. The primary API can be basically fn push(&mut self, node: V) -> K, i.e. you store the node and get back its handle, but if you need to store the key in the node, you need to use two operations: fn alloc_key(&mut self) -> K and unsafe fn insert(&mut self, index: K, node: V). The latter is marked unsafe since the index must be valid and allocated by alloc_key, though it isn't strictly speaking an unsafe operation in terms of memory safety.

In any case, using that approach is definitely preferable to needing a special placeholder variant in your node type that you then need to handle everywhere even though you know it will never appear in practice.
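A minimal sketch of that pattern (illustrative only; unlike the description above, insert here is a safe fn that simply asserts the slot was reserved, and NodeArena is a made-up name):

```rust
// Hypothetical sketch of the index-reservation pattern: an arena whose
// slots are Option<V>, letting you allocate a key before the node
// exists (avoiding a PlaceHolder node variant entirely).

#[derive(Default)]
struct NodeArena<V> {
    items: Vec<Option<V>>,
}

impl<V> NodeArena<V> {
    /// Store a node and get back its handle.
    fn push(&mut self, node: V) -> usize {
        self.items.push(Some(node));
        self.items.len() - 1
    }

    /// Reserve a key before the node is constructed; the slot reads as
    /// unoccupied until `insert` fills it.
    fn alloc_key(&mut self) -> usize {
        self.items.push(None);
        self.items.len() - 1
    }

    /// Fill a previously reserved slot (the key must come from `alloc_key`).
    fn insert(&mut self, key: usize, node: V) {
        assert!(self.items[key].is_none(), "slot already occupied");
        self.items[key] = Some(node);
    }

    fn get(&self, key: usize) -> Option<&V> {
        self.items.get(key).and_then(|slot| slot.as_ref())
    }
}

fn main() {
    let mut arena: NodeArena<&str> = NodeArena::default();
    // Reserve an index for a definition whose body needs to reference it.
    let def = arena.alloc_key();
    let call = arena.push("call @f"); // the body op can mention `def` already
    arena.insert(def, "def @f");
    assert_eq!(arena.get(def), Some(&"def @f"));
    assert_eq!(arena.get(call), Some(&"call @f"));
}
```

A free list of released keys can be layered on top of alloc_key without changing the public API.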
}
}

fn inline_call(
It isn't clear to me whether this transformation is sound:

1. inline_all is iterating over what amounts to a snapshot of the body of the outer definition (if I understand correctly).
2. But here in inline_call, the actual body of the definition is being mutated/replaced (to insert new items at the index at which the call occurred in the outer def). But after splicing ops into the body of the outer def, all subsequent calls to inline_call are using stale indices.

Am I misunderstanding what's actually happening in inline_all/inline_call?
We agree with your analysis and are aware of the current implementation's drawbacks. The next version will use a post-order visit coupled with a work stack where we add newly generated Call nodes while inlining.
}
}

fn inline_all(ir: &mut MirGraph) {
One of the things I was hoping we could achieve with the new IR was avoiding unbounded recursion to visit the graph and perform analysis/transformation. Writing it in recursive style for the initial implementation is fine, but we should try to refactor it into iterative style so that the pass can handle arbitrarily deep graphs without panicking.
The naming might be causing confusion here; there is no unbounded recursion, just a for loop calling inline_all -> inline_call -> inline_body -> inline_op. I'll try to simplify this once the proposed Node design is implemented with back-references via Op.owner, and will rename the functions to differentiate more easily between inline_all and inline_call.
Ok, I was under the impression that inline_call transitively visits all inlineable items reachable from the root node being visited, but if I understand your description correctly, it is only visiting the immediate child of the Call node, i.e. it isn't also inlining calls in the callee, just the body of the callee without further inlining. Is that correct?

If so, then yeah, we're good - I think I assumed the inliner was recursing into children, which is what prompted my comment.
i.e. it isn't also inlining calls in the callee, just the body of the callee without further inlining. Is that correct?
That is correct, with this slight difference:
The current implementation - which has soundness issues related to inlined calls not being rewritten in the stale iterator - does visit all bodies via a loop in inline_all.

The fixed implementation won't have those issues and will instead only visit the "main" (i.e. entry point) bodies, adding both found and newly inserted Call nodes to the work stack, and visiting only nodes added to the work stack.
use air_pass::Pass;

#[test]
fn test_inlining() {
We should test a slightly more complex module, containing multiple calls in the same block, and calls to functions which themselves contain calls, preferably with at least one of the non-leaf functions containing a comprehension that calls a function in its body. This will catch most of the edge cases that come up when inlining non-trivial programs. It is particularly important to test the interaction of comprehensions containing function calls.
In particular, my question about soundness of how calls are inlined would probably be answered by whether or not such a test produces the expected expansion of the input program.
Will add more complex tests to assert soundness of this pass.
All of the transformation tests are operating on the AST using the old passes AFAICT, so those tests will need to be ported to the IR to validate that the IR upholds the expected behavior. Once the lowering to AIR is implemented, we can also evaluate how the resulting AIR looks. I'd say porting over the tests for inlining and some of the other transformations (once implemented in the IR), are the most essential, but will be a bit tedious, since those tests are against the AST, and the IR differs in various ways. Those tests largely aimed to ensure that:
That should be fine to split up into call inlining and loop unrolling, but you'll probably still want to visit the comprehension bodies during the function call inlining pass, to ensure that there are no calls remaining in the program to inline after it completes. Then, when unrolling comprehensions, no further work needs to be done. The tradeoff there is that you won't have the opportunity to do constant propagation across comprehension boundaries until the comprehensions are unrolled, which means you'll have more work to do when unrolling comprehensions. Probably the result would be that you need to run both inlining passes first, and only then do constant propagation and dead code elimination. With compilers, transformation ordering is a whole science/art; luckily we're working with a pretty restrictive ISA, but we'll need to be careful to ensure that transformations either don't make too many assumptions about when they are run, or that we run them in the exact order needed.
Agreed with your analysis. Pass ordering can probably be reworked after the fact; we'll keep that in mind while implementing the various passes.
@bitwalker, first of all, thank you for this detailed answer, it will surely help a lot!
You bet! Sorry if presenting a sketch of the RVSDG design gave you a scare - didn't want to stress you out worrying about needing to refactor things, hopefully it just provided a useful reference for past and future conversations.
Yes, I believe all of my questions are addressed, except maybe a few small fresh ones I've left as comments. Ping me once a few of the more complex tests are implemented, and I'll re-review ASAP. Thanks for all the hard work!
No problem! It will serve as a reference for further discussions, and as a target to tend towards as we refactor our implementation based on what we need.
I have left comments addressing these. For now I don't think the remaining points require immediate attention, we already have plenty to do based on your feedback. I'll be sure to ping you when a review is needed. Thanks again for all your valuable feedback!
The goal of this PR is to introduce a Middle Intermediate Representation (MIR), to avoid performing optimizations on the AST directly, while keeping enough information to handle type checking and optimization in each pass.
See the initial issue and design discussion for additional context:
Putting it as draft now that we have:
Parse to AST > Constant propagation > Lowering to MIR
All of these stages pass, but the resulting MIR is not checked; we may need to improve testing on this side. Additionally, we have a partial pretty printer for the MIR to help us debug in the future (to ensure the graph constructed is what we expect after each pass).
@bitwalker, don't hesitate to comment on things that should be handled differently.
For now, we haven't fully settled on the various nodes of the MIR graph, as we add/change things depending on the needs of our implementation. We will also add checks to ensure the proper diagnostics are raised (potentially after each pass, as discussed previously), but we will probably do this at a later stage.