Commit 28d6c02

cse machine specs added (#1822)
1 parent 45ec35b commit 28d6c02

1 file changed

+370
-0
lines changed

docs/specs/source_3_cse.tex

Lines changed: 370 additions & 0 deletions
@@ -0,0 +1,370 @@
1+
\input source_header.tex
2+
3+
\begin{document}
4+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5+
\docheader{2025}{Source}{\S 3 CSE Machine}{Martin Henz}
6+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7+
8+
\section{Purpose of CSE machine}
9+
10+
The CSE machine can run programs of the language Source \S 3.
11+
The terms \emph{statement}, \emph{expression},
12+
\emph{name}, etc.\ refer to the
13+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}.
14+
15+
\section{Values and environments}
16+
17+
The CSE machine handles the following kinds of values:
18+
\begin{itemize}
19+
\item number,
20+
\item Boolean value (\lstinline{true} and \lstinline{false}),
21+
\item string,
22+
\item $\texttt{null}$,
23+
\item $\texttt{undefined}$,
24+
\item array (including arrays of length 2, which we call \emph{pairs}), and
25+
\item closure, consisting of a list of parameters, a body statement, and an \emph{environment};
26+
a closure is either \emph{simple} or \emph{complex}.
27+
\end{itemize}
28+
Numbers, Boolean values, strings, and arrays are specified in the
29+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}.
30+
Environments are stacks (lists) of frames, and frames specify the bindings of some names.
31+
If a frame has a binding for a name, the name is either \emph{unassigned} or
32+
bound to a value.
33+
The \emph{global environment} has a single frame, the \emph{global frame}, which has
34+
bindings for all predeclared names of Source \S 3.
35+
Non-primitive predeclared functions are bound to closures whose environment is the global environment,
36+
and whose parameters and bodies are given in the
37+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}.
38+
Primitive functions are explained below.
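
The description of environments above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the machine's actual implementation: frames are modelled as dictionaries, an environment as a list of frames with the first frame at index 0, and the sentinel `UNASSIGNED`, the helpers `extend` and `lookup`, and the sample global frame are all hypothetical names chosen for this example.

```python
# Hypothetical sentinel marking a declared name that has no value yet.
UNASSIGNED = object()


def extend(environment, names):
    """Return a new environment whose first frame declares names as unassigned."""
    new_frame = {name: UNASSIGNED for name in names}
    # The existing frames are shared, not copied.
    return [new_frame] + environment


def lookup(name, environment):
    """Search the frames first to last and return the value bound to name."""
    for frame in environment:
        if name in frame:
            value = frame[name]
            if value is UNASSIGNED:
                raise RuntimeError(f"name {name!r} is unassigned")
            return value
    # Assumption: unreachable for programs that passed the pre-run checks.
    raise RuntimeError(f"name {name!r} is not declared")


# Illustrative global frame with a single predeclared name (an assumption).
global_environment = [{"math_PI": 3.141592653589793}]
block_environment = extend(global_environment, ["x"])
block_environment[0]["x"] = 1  # the effect an assignment to x would have
```

Because `extend` shares the existing frames instead of copying them, a later assignment to a name in an outer frame stays visible from every environment built on top of that frame.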
39+
40+
\section{Value producing statements}
41+
42+
In Source \S 3, constant, variable, and function declarations, \lstinline{break}
43+
and \lstinline{continue} are non-value-producing.
44+
Expression statements, conditional statements, \lstinline{return} statements,
45+
and loops are value-producing.
46+
A statement sequence ($\textit{statement}\ldots$) is value-producing if any of its
47+
component statements is value-producing. A block is value-producing if its body
48+
is value-producing.
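
The classification above can be sketched as a small recursive function. This is only an illustration: the tuple encoding of statements and the tag names (`"const"`, `"expression_statement"`, `"sequence"`, and so on) are assumptions made for this example and are not part of the specification.

```python
# Hypothetical tags for the statement kinds named in the text.
NON_VALUE_PRODUCING = {"const", "let", "function", "break", "continue"}
VALUE_PRODUCING = {"expression_statement", "if", "return", "while", "for"}


def is_value_producing(statement):
    """Classify a tuple-encoded statement; the tag is stored at index 0."""
    tag = statement[0]
    if tag in NON_VALUE_PRODUCING:
        return False
    if tag in VALUE_PRODUCING:
        return True
    if tag == "sequence":
        # A sequence is value-producing if any component statement is.
        return any(is_value_producing(s) for s in statement[1])
    if tag == "block":
        # A block is value-producing if its body is.
        return is_value_producing(statement[1])
    raise ValueError(f"unknown statement tag: {tag!r}")
```

Under this encoding an empty sequence is not value-producing, which matches the case in which the machine ends with an empty stash.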
49+
50+
\section{Running the machine}
51+
52+
Before running a program, it is checked for syntactic consistency: all names need
53+
to be declared in the program or predeclared in the global environment, return
54+
statements can only occur in function bodies, \lstinline{continue;} and \lstinline{break;}
55+
can only occur in loops and not within function bodies that are enclosed by the loop, assignment to a name declared with \lstinline{const} is not allowed, and function parameters cannot
56+
be redeclared directly in the body block of the function.
57+
58+
The CSE machine has three components:
59+
\begin{description}
60+
\item[C (control):] a stack (list) of program components (expressions and statements) and \emph{instructions}
61+
(see Section~\ref{transitions} for details on instructions),
62+
\item[S (stash):] a stack (list) of values, and
63+
\item[E (environments):] a set of environments,
64+
one of which is designated as the \emph{current environment}.
65+
\end{description}
66+
The CSE machine runs a given program $P$ by placing it in the control, wrapped
67+
in a block \verb#{# $P$ \verb#}#. The stash is initially empty and the environments only
68+
contain the global environment.
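
The initial configuration described above might be set up as in the following Python sketch. The class `Machine`, its field names, and the tuple encoding `("block", program)` are hypothetical choices for illustration; the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    control: list       # C: the first element is at index 0
    stash: list         # S: the first element is at index 0
    environments: list  # E: all environments created so far
    current: int        # index of the current environment within E


def initial_machine(program, global_frame):
    """Build the starting configuration for a tuple-encoded program."""
    global_environment = [global_frame]  # a single frame, the global frame
    return Machine(
        control=[("block", program)],    # the program P wrapped in a block
        stash=[],                        # the stash starts out empty
        environments=[global_environment],
        current=0,
    )
```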
69+
70+
\section{Machine transitions}
71+
\label{transitions}
72+
The CSE machine keeps transforming $C$, $S$, and $E$, based on
73+
the first element of $C$.
74+
That element is popped, i.e., it is not
75+
included in the new $C$ after the transformation.
76+
The following rules describe the additional
77+
changes in $C$, $S$, and $E$, based on the first element of~$C$.
78+
79+
\subsection*{Statements}
80+
81+
\begin{description}
82+
83+
\item[\textit{statement}$\ldots$:] The component statements of the sequence
84+
are pushed on $C$; each
85+
value-producing statement is followed by a \texttt{pop} instruction if it is not
86+
the last value-producing statement of the sequence. (The exact position of this
87+
\texttt{pop} instruction may vary.)
88+
89+
\item[\texttt{const}/\texttt{let}\ $\textit{name}$ \ \texttt{=} \ \textit{expression}:]
90+
\textit{expression} is pushed on $C$, followed by an instruction \texttt{asgn} \textit{name},
91+
followed by a \texttt{pop} instruction.
92+
93+
\item[\texttt{function}\ $\ldots$:]
94+
The corresponding constant declaration is pushed on $C$.
95+
96+
\item[\texttt{return}\ $\textit{expression}$ \texttt{;}:]
97+
$\textit{expression}$ is pushed on $C$, followed by a \texttt{return} instruction.
98+
99+
\item[\texttt{if (}\ $\textit{expression}$ \texttt{)}\ $\textit{block}_1$\
100+
\texttt{else}\ $\textit{block}_2$:]
101+
$\textit{expression}$ is pushed on $C$, followed by a \texttt{branch} instruction
102+
that has
103+
$\textit{block}_1$ as its consequent branch and
104+
$\textit{block}_2$ as its alternative branch.
105+
106+
\item[\texttt{while (}\ $\textit{expression}$ \texttt{)}\ $\textit{block}$:]
107+
The name $\texttt{undefined}$ is pushed on $C$, followed by
108+
$\textit{expression}$,
109+
followed by a \texttt{while} instruction
110+
that has
111+
$\textit{block}$ as its body and $\textit{expression}$ as its predicate.
112+
If $\textit{block}$ contains a \texttt{break}
113+
statement that is not included in a nested block, the \texttt{while} instruction
114+
is followed by a \texttt{brk mark} instruction.
115+
116+
\item[$\texttt{for (}\texttt{let}\ \textit{name} \ \texttt{=} \ \textit{expression}_1;\ \textit{expression}_2;\ \textit{expression}_3\texttt{)}\ \textit{block}$:]
117+
The corresponding for loop without
118+
loop control variable is pushed on $C$ as specified in
119+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}.
120+
121+
\item[$\texttt{for (}\ \textit{expression}_1\texttt{;}\ \textit{expression}_2\texttt{;}\ \textit{expression}_3\texttt{)}\ \textit{block}$:]
122+
The name $\texttt{undefined}$ is pushed on $C$,
123+
followed by $\textit{expression}_1$,
124+
followed by a \texttt{pop} instruction,
125+
followed by a \texttt{for} instruction
126+
that has
127+
$\textit{block}$ as its body,
128+
$\textit{expression}_2$ as its predicate,
129+
and $\textit{expression}_3$ as its increment expression.
130+
If $\textit{block}$ contains a \lstinline{break;}
131+
statement that is not included in a nested block, the \texttt{for} instruction
132+
is followed by a \texttt{brk mark} instruction.
133+
134+
\item[\texttt{break;}:]
135+
A $\texttt{break}$ instruction is pushed on $C$.
136+
137+
\item[\texttt{continue;}:]
138+
A $\texttt{continue}$ instruction is pushed on $C$.
139+
140+
\item[\texttt{\{}\ $\textit{statement}\ldots$ \texttt{\}}:]
141+
The statement sequence $\textit{statement}\ldots$ is pushed on $C$.
142+
If the current environment is needed after the block, the
143+
statement sequence is followed by an \texttt{env} instruction that
144+
refers to the current environment.
145+
If the statement sequence contains declarations outside of
146+
any block, a new environment is added to $E$ that extends the current environment
147+
with a frame in which all declared names are unassigned. This new environment
148+
is now considered the current environment.
149+
150+
\end{description}
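
As an illustration of the rule for statement sequences, the following Python sketch places a \texttt{pop} instruction after every value-producing statement except the last one. The function name, the tuple encoding, and the simplified predicate are assumptions made for this example. The rule itself notes that the exact position of the \texttt{pop} instruction may vary, so this shows one possible placement only.

```python
def push_sequence(control, statements, is_value_producing):
    """Return the new control after pushing a statement sequence onto it."""
    producing = [i for i, s in enumerate(statements) if is_value_producing(s)]
    last_producing = producing[-1] if producing else None
    items = []
    for i, statement in enumerate(statements):
        items.append(statement)
        # Discard the value of every value-producing statement but the last.
        if i in producing and i != last_producing:
            items.append(("pop",))
    return items + control  # the first element of the control is at index 0


def simple_predicate(statement):
    """Simplified stand-in: only expression statements produce values."""
    return statement[0] == "expression_statement"


sequence = [
    ("expression_statement", 1),
    ("const", "x", 2),
    ("expression_statement", 3),
]
new_control = push_sequence([("env",)], sequence, simple_predicate)
```

In this sketch the value of the first expression statement is discarded by the \texttt{pop} that follows it, while the value of the last one stays on the stash as the value of the sequence.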
151+
152+
\subsection*{Expressions}
153+
154+
155+
\begin{description}
156+
\item[Primitive expressions:]
157+
Primitive expressions (numbers, strings, \lstinline{true}, \lstinline{false},
158+
\lstinline{null}) are pushed on $S$.
159+
160+
\item[$\textit{name}$:]
161+
$\textit{name}$ is looked up in the current environment, frame-by-frame starting with
162+
the first frame, until a frame is found that has a binding for
163+
$\textit{name}$. If
164+
$\textit{name}$ is unassigned in the frame, the program execution is terminated
165+
and an error is displayed. If
166+
$\textit{name}$ is bound to a value, that value is pushed on $S$.
167+
168+
\item[$\textit{expression}_1\ \textit{binary-operator}\ \textit{expression}_2$:]
169+
$\textit{expression}_1$ is pushed on $C$, followed by
170+
$\textit{expression}_2$, followed by a
171+
$\textit{binary-operator}$ instruction for
172+
$\textit{binary-operator}$.
173+
174+
\item[$\textit{unary-operator}\ \textit{expression}$:]
175+
$\textit{expression}$ is pushed on $C$, followed by a
176+
$\textit{unary-operator}$ instruction for
177+
$\textit{unary-operator}$.
178+
179+
\item[$\textit{expression}_1\ \textit{binary-logical}\ \textit{expression}_2$:]
180+
The corresponding conditional expression
181+
is pushed on $C$ as specified in
182+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}.
183+
184+
\item[$\textit{expression}\ \texttt{(}\ \textit{expression}_1,\ldots,\textit{expression}_n\
185+
\texttt{)}$:]
186+
$\textit{expression}$ is pushed on $C$, followed by
187+
$\textit{expression}_1$, followed by $\ldots$, followed by
188+
$\textit{expression}_n$, followed by
189+
the instruction \texttt{call}\ $n$.
190+
191+
\item[$\textit{names} \texttt{ => } \textit{expression}$:]
192+
A simple closure is pushed on $S$ that has
193+
$\textit{names}$ as parameters,
194+
$\textit{expression}$ as body,
195+
the length of \textit{names} as arity,
196+
and the current environment as environment.
197+
198+
\item[$\textit{names} \texttt{ => \{ return}\ \textit{expression}\texttt{; \}}$:]
199+
A simple closure is pushed on $S$ that has
200+
$\textit{names}$ as parameters,
201+
$\textit{expression}$ as body, and the current environment as environment.
202+
203+
\item[$\textit{names} \texttt{ => } \textit{block}$:]
204+
A complex closure is pushed on $S$ that has
205+
$\textit{names}$ as parameters,
206+
$\textit{block}$ as body, and the current environment as environment.
207+
208+
\item[$\textit{name} \texttt{ = } \textit{expression}$:]
209+
$\textit{expression}$ is pushed on $C$, followed by
210+
an \texttt{asgn}\ $\textit{name}$ instruction.
211+
212+
\item[$\textit{expression}_1\ \texttt{[}\ \textit{expression}_2\ \texttt{]}\ \texttt{=}\ \textit{expression}_3$:]
213+
$\textit{expression}_1$ is pushed on $C$, followed by
214+
$\textit{expression}_2$, followed by
215+
$\textit{expression}_3$, followed by
216+
an \texttt{arr asgn} instruction.
217+
218+
\item[$\textit{expression}_1\ \texttt{?}\ \textit{expression}_2\ \texttt{:}\
219+
\textit{expression}_3$:]
220+
$\textit{expression}_1$ is pushed on $C$, followed by
221+
a \texttt{branch} instruction that has $\textit{expression}_2$ as
222+
consequent branch and $\textit{expression}_3$ as alternative branch.
223+
224+
\item[$\textit{expression}_1\ \texttt{[}\ \textit{expression}_2\ \texttt{]}$:]
225+
$\textit{expression}_1$ is pushed on $C$, followed by
226+
$\textit{expression}_2$, followed by an
227+
\texttt{arr acc} instruction.
228+
229+
\item[$\texttt{[}\textit{expression}_1\texttt{,}\ \ldots\ \texttt{,} \textit{expression}_n \texttt{]}$:]
230+
$\textit{expression}_1$ is pushed on $C$, followed by
231+
$\textit{expression}_2$, and so on until
232+
$\textit{expression}_n$, followed by an \texttt{arr lit} $n$ instruction.
233+
234+
\end{description}
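
To show how the rules for numbers and binary operator combinations interact with the corresponding \textit{binary-operator} instruction from the Instructions subsection, here is a minimal Python sketch restricted to arithmetic. The tuple encoding, the tags `"binop"` and `"binop_instr"`, and the function `evaluate` are hypothetical. The operand type checks required by the specification are omitted for brevity.

```python
import operator

# Only four arithmetic operators are covered in this sketch.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expression):
    """Run a tiny control/stash loop on a tuple-encoded arithmetic expression."""
    control = [expression]  # the first element is at index 0
    stash = []              # the first element is at index 0
    while control:
        item = control.pop(0)
        if isinstance(item, (int, float)):
            stash.insert(0, item)  # primitive expression: push on S
        elif item[0] == "binop":
            _, symbol, left, right = item
            # Left operand first, then right operand, then the instruction.
            control[:0] = [left, right, ("binop_instr", symbol)]
        elif item[0] == "binop_instr":
            first = stash.pop(0)
            second = stash.pop(0)
            # The operator is applied to the second and first value, in this order.
            stash.insert(0, OPERATORS[item[1]](second, first))
        else:
            raise ValueError(f"unknown control item: {item!r}")
    return stash[0] if stash else None


result = evaluate(("binop", "+", 1, ("binop", "*", 2, 3)))  # encodes 1 + 2 * 3
```

The right operand ends up as the first value on the stash, so the instruction has to apply the operator to the second and first values in that order for subtraction and division to come out correctly.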
235+
236+
\subsection*{Instructions}
237+
238+
\begin{description}
239+
240+
\item[$\texttt{pop}$:]
241+
The first value on $S$ is popped.
242+
243+
\item[$\textit{binary-operator}$:]
244+
The first two values on $S$ are replaced by the result of
245+
applying the operator to the second and first as operands, in this order.
246+
If the operands do not comply with the
247+
types specified in Section~3 of
248+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3},
249+
the program execution is terminated and an error is displayed.
250+
251+
\item[$\textit{unary-operator}$:]
252+
The first value on $S$ is replaced by the result of
253+
applying the operator to it.
254+
If the operand does not comply with the
255+
types specified in Section~3 of
256+
\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3},
257+
the program execution is terminated and an error is displayed.
258+
259+
\item[$\texttt{asgn}\ \textit{name}$:]
260+
$\textit{name}$ is looked up in the current environment, frame-by-frame starting with the
261+
first frame,
262+
until a frame is found that has a binding for
263+
$\textit{name}$.
264+
% If $\textit{name}$ is unassigned in the frame, the program execution is terminated
265+
% and an error is displayed. If
266+
This frame is changed such that $\textit{name}$ is bound to the first value on $S$.
267+
268+
%\item[$\texttt{init}\ \textit{name}$:]
269+
%$\textit{name}$ is bound to the first value on $S$ in the first frame of
270+
%the current environment.
271+
272+
\item[$\texttt{return}$:]
273+
The control items are popped one-by-one, starting with the first, until
274+
a \texttt{mark} instruction is reached, which is also popped.
275+
276+
\item[$\texttt{branch}$:]
277+
The first value $b$ is popped from $S$. If $b$ is true, the branch instruction's
278+
consequent is pushed on $C$; if $b$ is false,
279+
the branch instruction's
280+
alternative is pushed on $C$, and otherwise the program
281+
execution is terminated with an error.
282+
283+
\item[$\texttt{while}$:]
284+
The first value $b$ is popped from $S$. If $b$ is true,
285+
the next value is popped from $S$, and the body of the
286+
while instruction is pushed on $C$, followed by the predicate
287+
of the while instruction, followed by the while instruction itself.
288+
If $b$ is false, no action is taken.
289+
Otherwise the program
290+
execution is terminated with an error.
291+
292+
\item[$\texttt{for}$:]
293+
The first value $b$ is popped from $S$. If $b$ is true,
294+
the next value is popped from $S$, and the body of the
295+
for instruction is pushed on $C$,
296+
followed by the increment expression
297+
of the for instruction,
298+
followed by the predicate
299+
of the for instruction, followed by the for instruction itself.
300+
If $b$ is false, no action is taken.
301+
Otherwise the program
302+
execution is terminated with an error.
303+
304+
\item[$\texttt{break}$:]
305+
The control items are popped one-by-one from $C$, starting with the first, until
306+
a \texttt{brk mark} instruction is reached, which is also popped.
307+
308+
\item[$\texttt{continue}$:]
309+
The control items are popped one-by-one from $C$, starting with the first, until
310+
a \texttt{while} instruction is reached, which is kept on $C$.
311+
312+
\item[$\texttt{env}$:]
313+
Execution continues with the environment of the \texttt{env} instruction as
314+
the current environment.
315+
316+
\item[$\texttt{call}\ n$:]
317+
The $(n + 1)$st element on $S$ (counting from 1) needs to be a closure
318+
or primitive function with arity $n$, otherwise the program
319+
execution is terminated with an error.
320+
321+
If the $(n + 1)$st element on $S$ is a primitive function,
322+
the first $n + 1$ values on $S$ are replaced by the result of applying the primitive
323+
function to the first $n$ elements on $S$ in the reverse of the order in which they appear.
324+
325+
If the $(n + 1)$st element on $S$ is a closure,
326+
the body of the closure is pushed on $C$.
327+
This is followed by a \texttt{mark} instruction if the closure is complex.
328+
If the current environment is needed after the call instruction, this
329+
is followed by an $\texttt{env}$ instruction that refers to the current environment.
330+
If $n \neq 0$, a new environment is added to $E$
331+
that extends the environment of the closure with a frame in which the
332+
parameters of the closure are bound to the first $n$ elements on $S$
333+
in the reverse of the order in which they appear.
334+
This new environment is now considered the current environment.
335+
The first $n + 1$ values are popped from $S$.
336+
337+
\item[\texttt{arr lit} $n$:]
338+
An array value with $n$ elements is constructed, whose first $n$ array entries
339+
are the first $n$ elements on $S$
340+
(counting from 1) in the reverse of the order in which they appear: the array entry
341+
at index $n - 1$ is the first value on $S$ and the array entry
342+
at index $0$ is the $n$th value on $S$. The first $n$ values on $S$ are replaced
343+
by the array value.
344+
345+
\item[$\texttt{arr acc}$:]
346+
The second value on $S$ (starting counting from 1) needs to be an array value, and the first
347+
value on $S$ needs to be an index---a non-negative integer from 0 to $2^{32} - 2$
348+
(4,294,967,294)---otherwise the execution of the program is terminated with an error.
349+
The first two values on $S$ are replaced by
350+
the array value at the given index or
351+
the value \texttt{undefined}
352+
if the array does not have a value at the given index.
353+
354+
\item[$\texttt{arr asgn}$:]
355+
The third value on $S$ (starting counting from 1) needs to be an array value, and the second
356+
value on $S$ needs to be an index---a non-negative integer from 0 to $2^{32} - 2$
357+
(4,294,967,294)---otherwise the execution of the program is terminated with an error.
358+
The array value at the given index is replaced by the first value on $S$, or added
359+
to the array if the array did not have a value at the given index.
360+
The second and third values are removed from $S$, but the first value is kept.
361+
362+
\end{description}
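
The stash manipulation performed by the \texttt{arr lit} and \texttt{arr acc} instructions can be sketched in Python as follows. The sketch rests on several assumptions: the stash is a list whose first element is at index 0, array values are Python lists, `None` stands in for \texttt{undefined}, and array indices start at 0, as in the \texttt{arr lit} rule. The function names are hypothetical.

```python
MAX_INDEX = 2**32 - 2  # largest admissible array index (4,294,967,294)


def arr_lit(stash, n):
    """Replace the first n values on the stash by an array holding them."""
    elements = stash[:n]
    elements.reverse()        # the first value on S becomes the entry at index n - 1
    stash[:n] = [elements]


def arr_acc(stash):
    """Replace the index (first value) and the array (second value) by the entry."""
    index, array = stash[0], stash[1]
    if not isinstance(array, list):
        raise RuntimeError("array access on a value that is not an array")
    is_integer = isinstance(index, int) and not isinstance(index, bool)
    if not (is_integer and 0 <= index <= MAX_INDEX):
        raise RuntimeError("array index out of the admissible range")
    # None stands in for undefined when the array has no entry at the index.
    value = array[index] if index < len(array) else None
    stash[:2] = [value]
```

For example, on a stash whose first three values are 3, 2, and 1, `arr_lit(stash, 3)` leaves the array `[1, 2, 3]` as the first value, and pushing the index 0 followed by `arr_acc(stash)` replaces index and array by the entry 1.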
363+
364+
\subsection*{Result}
365+
366+
When $C$ is empty,
367+
the first value of $S$ is the result of the program, or if $S$ is empty,
368+
the value \lstinline{undefined} is the result of the program.
369+
370+
\end{document}
