| 1 | +\input source_header.tex |
| 2 | + |
| 3 | +\begin{document} |
| 4 | + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 5 | + \docheader{2025}{Source}{\S 3 CSE Machine}{Martin Henz} |
| 6 | + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 7 | + |
| 8 | +\section{Purpose of CSE machine} |
| 9 | + |
| 10 | +The CSE machine runs programs written in the language Source \S 3. |
| 11 | +The terms \emph{statement}, \emph{expression}, |
| 12 | +\emph{name}, etc.\ refer to the |
| 13 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}. |
| 14 | + |
| 15 | +\section{Values and environments} |
| 16 | + |
| 17 | +The CSE machine handles the following kinds of values: |
| 18 | +\begin{itemize} |
| 19 | +\item number, |
| 20 | +\item Boolean value (\lstinline{true} and \lstinline{false}), |
| 21 | +\item string, |
| 22 | +\item $\texttt{null}$, |
| 23 | +\item $\texttt{undefined}$, |
| 24 | +\item array (including arrays of length 2, which we call \emph{pairs}), and |
| 25 | +\item closure, consisting of a list of parameters, a body statement, and an \emph{environment}; |
| 26 | +a closure is either \emph{simple} or \emph{complex}. |
| 27 | +\end{itemize} |
| 28 | +Numbers, Boolean values, strings, and arrays are specified in the |
| 29 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}. |
| 30 | +Environments are stacks (lists) of frames, and frames specify the bindings of some names. |
| 31 | +If a frame has a binding for a name, the name is either \emph{unassigned} or |
| 32 | +bound to a value. |
| 33 | +The \emph{global environment} has a single frame, the \emph{global frame}, which has |
| 34 | +bindings for all predeclared names of Source \S 3. |
| 35 | +Non-primitive predeclared functions are bound to closures whose environment is the global environment, |
| 36 | +and whose parameters and bodies are given in the |
| 37 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}. |
| 38 | +Primitive functions are explained below. |
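The description of environments above can be sketched in code. The following is a minimal illustration, not the machine's actual implementation: the names \texttt{UNASSIGNED}, \texttt{extend}, \texttt{lookup} and \texttt{assign} are hypothetical, and the global frame here holds a single example binding rather than all predeclared names. The error on reading an unassigned name follows the lookup rule for names given in the section on machine transitions.

```javascript
// Hypothetical representation: an environment is a linked list of frames,
// each frame a Map from names to values (or to the UNASSIGNED marker).
const UNASSIGNED = Symbol("unassigned");

// Extend env with a new frame in which all given names are unassigned.
function extend(names, env) {
  const frame = new Map(names.map((name) => [name, UNASSIGNED]));
  return { frame, parent: env };
}

// Look up a name frame-by-frame, starting with the first frame.
function lookup(name, env) {
  for (let e = env; e !== null; e = e.parent) {
    if (e.frame.has(name)) {
      const value = e.frame.get(name);
      if (value === UNASSIGNED) {
        throw new Error(`name ${name} is unassigned`);
      }
      return value;
    }
  }
  throw new Error(`name ${name} is not declared`);
}

// Bind a name to a value in the first frame that has a binding for it.
function assign(name, value, env) {
  for (let e = env; e !== null; e = e.parent) {
    if (e.frame.has(name)) {
      e.frame.set(name, value);
      return;
    }
  }
  throw new Error(`name ${name} is not declared`);
}

// Example: a global frame with one illustrative binding, extended by a
// block frame that declares x.
const globalEnv = { frame: new Map([["math_PI", Math.PI]]), parent: null };
const blockEnv = extend(["x"], globalEnv);
assign("x", 42, blockEnv);
console.log(lookup("x", blockEnv)); // prints 42
```

Representing "unassigned" by a dedicated marker keeps it distinct from the value \texttt{undefined}, which is an ordinary value of the machine.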
| 39 | + |
| 40 | +\section{Value-producing statements} |
| 41 | + |
| 42 | +In Source \S 3, constant, variable, and function declarations, \lstinline{break} |
| 43 | +and \lstinline{continue} are non-value-producing. |
| 44 | +Expression statements, conditional statements, \lstinline{return} statements, |
| 45 | +and loops are value-producing. |
| 46 | +A statement sequence ($\textit{statement}\ldots$) is value-producing if any of its |
| 47 | +component statements is value-producing. A block is value-producing if its body |
| 48 | +is value-producing. |
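The classification above is a simple recursive predicate. The sketch below assumes a hypothetical syntax-tree encoding in which every statement is an object with a \texttt{type} field; the type names are illustrative and not taken from the specification.

```javascript
// Hypothetical node shapes:
//   { type: "sequence", body: [statement, ...] }
//   { type: "block", body: statement }
//   { type: "const" | "let" | "function" | "break" | "continue" }
//   { type: "expression" | "if" | "return" | "while" | "for" }
function isValueProducing(node) {
  switch (node.type) {
    case "const":
    case "let":
    case "function":
    case "break":
    case "continue":
      return false;
    case "expression":
    case "if":
    case "return":
    case "while":
    case "for":
      return true;
    case "sequence":
      // value-producing if any component statement is value-producing
      return node.body.some(isValueProducing);
    case "block":
      // value-producing if its body is value-producing
      return isValueProducing(node.body);
    default:
      throw new Error(`unknown statement type: ${node.type}`);
  }
}

// Example: { const x = 1; x; } is value-producing, { const x = 1; } is not.
const withExpression = {
  type: "block",
  body: { type: "sequence", body: [{ type: "const" }, { type: "expression" }] },
};
const declarationOnly = {
  type: "block",
  body: { type: "sequence", body: [{ type: "const" }] },
};
console.log(isValueProducing(withExpression), isValueProducing(declarationOnly));
// prints true false
```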
| 49 | + |
| 50 | +\section{Running the machine} |
| 51 | + |
| 52 | +Before running a program, it is checked for syntactic consistency: all names need |
| 53 | +to be declared in the program or predeclared in the global environment, return |
| 54 | +statements can only occur in function bodies, \lstinline{continue;} and \lstinline{break;} |
| 55 | +can only occur inside a loop body, but not inside a function body enclosed by that loop, assignment to a name declared with \lstinline{const} is not allowed, and function parameters cannot |
| 56 | +be redeclared directly in the body block of the function. |
| 57 | + |
| 58 | +The CSE machine has three components: |
| 59 | +\begin{description} |
| 60 | +\item[C (control):] a stack (list) of program components (expressions and statements) and \emph{instructions} |
| 61 | +(see Section~\ref{transitions} for details on instructions), |
| 62 | +\item[S (stash):] a stack (list) of values, and |
| 63 | +\item[E (environments):] a set of environments, |
| 64 | +one of which is designated as the \emph{current environment}. |
| 65 | +\end{description} |
| 66 | +The CSE machine runs a given program $P$ by placing it in the control, wrapped |
| 67 | +in a block \verb#{# $P$ \verb#}#. The stash is initially empty, and the set of environments |
| 68 | +initially contains only the global environment. |
| 69 | + |
| 70 | +\section{Machine transitions} |
| 71 | +\label{transitions} |
| 72 | +The CSE machine keeps transforming $C$, $S$, and $E$, based on |
| 73 | +the first element of $C$. |
| 74 | +That element is popped, i.e. it is not |
| 75 | +included in the new $C$ after the transformation. |
| 76 | +The following rules describe the additional |
| 77 | +changes in $C$, $S$, and $E$, based on the first element of~$C$. |
| 78 | + |
| 79 | +\subsection*{Statements} |
| 80 | + |
| 81 | +\begin{description} |
| 82 | + |
| 83 | +\item[\textit{statement}$\ldots$:] The component statements of the sequence |
| 84 | +are pushed on $C$; each |
| 85 | +value-producing statement is followed by a \texttt{pop} instruction if it is not |
| 86 | +the last value-producing statement of the sequence. (The exact position of this |
| 87 | +\texttt{pop} instruction may vary.) |
| 88 | + |
| 89 | +\item[\texttt{const}/\texttt{let}\ $\textit{name}$ \ \texttt{=} \ \textit{expression}:] |
| 90 | +\textit{expression} is pushed on $C$, followed by an instruction \texttt{asgn} \textit{name}, |
| 91 | +followed by a \texttt{pop} instruction. |
| 92 | + |
| 93 | +\item[\texttt{function}...:] |
| 94 | +The corresponding constant declaration is pushed on $C$. |
| 95 | + |
| 96 | +\item[\texttt{return}\ $\textit{expression}$ \texttt{;}:] |
| 97 | +$\textit{expression}$ is pushed on $C$, followed by a \texttt{return} instruction. |
| 98 | + |
| 99 | +\item[\texttt{if (}\ $\textit{expression}$ \texttt{)}\ $\textit{block}_1$\ |
| 100 | +\texttt{else}\ $\textit{block}_2$:] |
| 101 | +$\textit{expression}$ is pushed on $C$, followed by a \texttt{branch} instruction |
| 102 | +that has |
| 103 | +$\textit{block}_1$ as its consequent branch and |
| 104 | +$\textit{block}_2$ as its alternative branch. |
| 105 | + |
| 106 | +\item[\texttt{while (}\ $\textit{expression}$ \texttt{)}\ $\textit{block}$:] |
| 107 | +The name $\texttt{undefined}$ is pushed on $C$, followed by |
| 108 | +$\textit{expression}$, |
| 109 | +followed by a \texttt{while} instruction |
| 110 | +that has |
| 111 | +$\textit{block}$ as its body and $\textit{expression}$ as its predicate. |
| 112 | +If $\textit{block}$ contains a \texttt{break} |
| 113 | +statement that is not included in a nested block, the \texttt{while} instruction |
| 114 | +is followed by a \texttt{brk mark} instruction. |
| 115 | + |
| 116 | +\item[$\texttt{for (}\texttt{let}\ \textit{name} \ \texttt{=} \ \textit{expression}_1;\ \textit{expression}_2;\ \textit{expression}_3\texttt{)}\ \textit{block}$:] |
| 117 | +The corresponding for loop without |
| 118 | +loop control variable is pushed on $C$ as specified in |
| 119 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}. |
| 120 | + |
| 121 | +\item[$\texttt{for (}\ \textit{expression}_1\texttt{;}\ \textit{expression}_2\texttt{;}\ \textit{expression}_3\texttt{)}\ \textit{block}$:] |
| 122 | +The name $\texttt{undefined}$ is pushed on $C$, |
| 123 | +followed by $\textit{expression}_1$, |
| 124 | +followed by a \texttt{pop} instruction, |
| 125 | +followed by a \texttt{for} instruction |
| 126 | +that has |
| 127 | +$\textit{block}$ as its body, |
| 128 | +$\textit{expression}_2$ as its predicate, |
| 129 | +and $\textit{expression}_3$ as its increment expression. |
| 130 | +If $\textit{block}$ contains a \lstinline{break;} |
| 131 | +statement that is not included in a nested block, the \texttt{for} instruction |
| 132 | +is followed by a \texttt{brk mark} instruction. |
| 133 | + |
| 134 | +\item[\texttt{break;}:] |
| 135 | +A $\texttt{break}$ instruction is pushed on $C$. |
| 136 | + |
| 137 | +\item[\texttt{continue;}:] |
| 138 | +A $\texttt{continue}$ instruction is pushed on $C$. |
| 139 | + |
| 140 | +\item[\texttt{\{}\ $\textit{statement}\ldots$ \texttt{\}}:] |
| 141 | +The statement sequence $\textit{statement}\ldots$ is pushed on $C$. |
| 142 | +If the current environment is needed after the block, the |
| 143 | +statement sequence is followed by an \texttt{env} instruction that |
| 144 | +refers to the current environment. |
| 145 | +If the statement sequence contains declarations outside of |
| 146 | +any block, a new environment is added to $E$ that extends the current environment |
| 147 | +with a frame in which all declared names are unassigned. This new environment |
| 148 | +is now considered the current environment. |
| 149 | + |
| 150 | +\end{description} |
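The placement of \texttt{pop} instructions in the rule for statement sequences can be illustrated as follows. This is a sketch under assumptions: the function \texttt{expandSequence}, the \texttt{tag} field and the predicate argument are hypothetical, and the document itself notes that the exact position of the \texttt{pop} instruction may vary. The returned list is in the order in which the items will be processed.

```javascript
// Given the component statements of a sequence and a predicate that says
// whether a statement is value-producing, return the control items for
// the sequence: every value-producing statement except the last
// value-producing one is followed by a pop instruction.
function expandSequence(statements, isValueProducing) {
  const flags = statements.map(isValueProducing);
  const lastIndex = flags.lastIndexOf(true);
  const items = [];
  statements.forEach((statement, i) => {
    items.push(statement);
    if (flags[i] && i !== lastIndex) {
      items.push({ tag: "pop" });
    }
  });
  return items;
}

// Example for the sequence  1; const x = 2; x;
// with a toy predicate that treats only expression statements as
// value-producing.
const demo = expandSequence(
  [{ tag: "expr" }, { tag: "const" }, { tag: "expr" }],
  (statement) => statement.tag === "expr"
);
const demoTags = demo.map((item) => item.tag);
console.log(demoTags.join(" ")); // prints expr pop const expr
```

Only the value of the last value-producing statement stays on the stash, which is what makes it the result of the sequence.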
| 151 | + |
| 152 | +\subsection*{Expressions} |
| 153 | + |
| 154 | + |
| 155 | +\begin{description} |
| 156 | +\item[Primitive expressions:] |
| 157 | +Primitive expressions (numbers, strings, \lstinline{true}, \lstinline{false}, |
| 158 | +\lstinline{null}) are pushed on $S$. |
| 159 | + |
| 160 | +\item[$\textit{name}$:] |
| 161 | +$\textit{name}$ is looked up in the current environment, frame-by-frame starting with |
| 162 | +the first frame, until a frame is found that has a binding for |
| 163 | +$\textit{name}$. If |
| 164 | +$\textit{name}$ is unassigned in the frame, the program execution is terminated |
| 165 | +and an error is displayed. If |
| 166 | +$\textit{name}$ is bound to a value, that value is pushed on $S$. |
| 167 | + |
| 168 | +\item[$\textit{expression}_1\ \textit{binary-operator}\ \textit{expression}_2$:] |
| 169 | +$\textit{expression}_1$ is pushed on $C$, followed by |
| 170 | +$\textit{expression}_2$, followed by a |
| 171 | +$\textit{binary-operator}$ instruction for |
| 172 | +$\textit{binary-operator}$. |
| 173 | + |
| 174 | +\item[$\textit{unary-operator}\ \textit{expression}$:] |
| 175 | +$\textit{expression}$ is pushed on $C$, followed by a |
| 176 | +$\textit{unary-operator}$ instruction for |
| 177 | +$\textit{unary-operator}$. |
| 178 | + |
| 179 | +\item[$\textit{expression}_1\ \textit{binary-logical}\ \textit{expression}_2$:] |
| 180 | +The corresponding conditional expression |
| 181 | +is pushed on $C$ as specified in |
| 182 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}. |
| 183 | + |
| 184 | +\item[$\textit{expression}\ \texttt{(}\ \textit{expression}_1,\ldots,\textit{expression}_n\ |
| 185 | +\texttt{)}$:] |
| 186 | +$\textit{expression}$ is pushed on $C$, followed by |
| 187 | +$\textit{expression}_1$, followed by \ldots, followed by |
| 188 | +$\textit{expression}_n$, followed by |
| 189 | +the instruction \texttt{call}\ $n$. |
| 190 | + |
| 191 | +\item[$\textit{names} \texttt{ => } \textit{expression}$:] |
| 192 | +A simple closure is pushed on $S$ that has |
| 193 | +$\textit{names}$ as parameters, |
| 194 | +$\textit{expression}$ as body, |
| 195 | +the length of \textit{names} as arity, |
| 196 | +and the current environment as environment. |
| 197 | + |
| 198 | +\item[$\textit{names} \texttt{ => \{ return}\ \textit{expression}\texttt{; \}}$:] |
| 199 | +A simple closure is pushed on $S$ that has |
| 200 | +$\textit{names}$ as parameters, |
| 201 | +$\textit{expression}$ as body, and the current environment as environment. |
| 202 | + |
| 203 | +\item[$\textit{names} \texttt{ => } \textit{block}$:] |
| 204 | +A complex closure is pushed on $S$ that has |
| 205 | +$\textit{names}$ as parameters, |
| 206 | +$\textit{block}$ as body, and the current environment as environment. |
| 207 | + |
| 208 | +\item[$\textit{name} \texttt{ = } \textit{expression}$:] |
| 209 | +$\textit{expression}$ is pushed on $C$, followed by |
| 210 | +an \texttt{asgn}\ $\textit{name}$ instruction. |
| 211 | + |
| 212 | +\item[$\textit{expression}_1\ \texttt{[}\ \textit{expression}_2\ \texttt{]}\ \texttt{=}\ \textit{expression}_3$:] |
| 213 | +$\textit{expression}_1$ is pushed on $C$, followed by |
| 214 | +$\textit{expression}_2$, followed by |
| 215 | +$\textit{expression}_3$, followed by |
| 216 | +an \texttt{arr asgn} instruction. |
| 217 | + |
| 218 | +\item[$\textit{expression}_1\ \texttt{?}\ \textit{expression}_2\ \texttt{:}\ |
| 219 | +\textit{expression}_3$:] |
| 220 | +$\textit{expression}_1$ is pushed on $C$, followed by |
| 221 | +a \texttt{branch} instruction that has $\textit{expression}_2$ as |
| 222 | +consequent branch and $\textit{expression}_3$ as alternative branch. |
| 223 | + |
| 224 | +\item[$\textit{expression}_1\ \texttt{[}\ \textit{expression}_2\ \texttt{]}$:] |
| 225 | +$\textit{expression}_1$ is pushed on $C$, followed by |
| 226 | +$\textit{expression}_2$, followed by an |
| 227 | +\texttt{arr acc} instruction. |
| 228 | + |
| 229 | +\item[$\texttt{[}\textit{expression}_1\texttt{,}\ \ldots\ \texttt{,} \textit{expression}_n \texttt{]}$:] |
| 230 | +$\textit{expression}_1$ is pushed on $C$, followed by |
| 231 | +$\textit{expression}_2$, and so on until |
| 232 | +$\textit{expression}_n$, followed by an \texttt{arr lit} $n$ instruction. |
| 233 | + |
| 234 | +\end{description} |
| 235 | + |
| 236 | +\subsection*{Instructions} |
| 237 | + |
| 238 | +\begin{description} |
| 239 | + |
| 240 | +\item[$\texttt{pop}$:] |
| 241 | +The first value on $S$ is popped. |
| 242 | + |
| 243 | +\item[$\textit{binary-operator}$:] |
| 244 | +The first two values on $S$ are replaced by the result of |
| 245 | +applying the operator to the second and first as operands, in this order. |
| 246 | +If the operands do not comply with the |
| 247 | +types specified in Section~3 of |
| 248 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}, |
| 249 | +the program execution is terminated and an error is displayed. |
| 250 | + |
| 251 | +\item[$\textit{unary-operator}$:] |
| 252 | +The first value on $S$ is replaced by the result of |
| 253 | +applying the operator to it. |
| 254 | +If the operand does not comply with the |
| 255 | +types specified in Section~3 of |
| 256 | +\href{https://docs.sourceacademy.org/source_3.pdf}{\color{blue}Specification of Source \S 3}, |
| 257 | +the program execution is terminated and an error is displayed. |
| 258 | + |
| 259 | +\item[$\texttt{asgn}\ \textit{name}$:] |
| 260 | +$\textit{name}$ is looked up in the current environment, frame-by-frame starting with the |
| 261 | +first frame, |
| 262 | +until a frame is found that has a binding for |
| 263 | +$\textit{name}$. |
| 264 | +% If $\textit{name}$ is unassigned in the frame, the program execution is terminated |
| 265 | +% and an error is displayed. If |
| 266 | +This frame is changed such that $\textit{name}$ is bound to the first value on $S$. |
| 267 | + |
| 268 | +%\item[$\texttt{init}\ \textit{name}$:] |
| 269 | +%$\textit{name}$ is bound to the first value on $S$ in the first frame of |
| 270 | +%the current environment. |
| 271 | + |
| 272 | +\item[$\texttt{return}$:] |
| 273 | +The control items are popped one-by-one, starting with the first, until |
| 274 | +a \texttt{mark} instruction is reached, which is also popped. |
| 275 | + |
| 276 | +\item[$\texttt{branch}$:] |
| 277 | +The first value $b$ is popped from $S$. If $b$ is true, the branch instruction's |
| 278 | +consequent is pushed on $C$; if $b$ is false, |
| 279 | +the branch instruction's |
| 280 | +alternative is pushed on $C$, and otherwise the program |
| 281 | +execution is terminated with an error. |
| 282 | + |
| 283 | +\item[$\texttt{while}$:] |
| 284 | +The first value $b$ is popped from $S$. If $b$ is true, |
| 285 | +the next value is popped from $S$, and the body of the |
| 286 | +while instruction is pushed on $C$, followed by the predicate |
| 287 | +of the while instruction, followed by the while instruction itself. |
| 288 | +If $b$ is false, no action is taken. |
| 289 | +Otherwise the program |
| 290 | +execution is terminated with an error. |
| 291 | + |
| 292 | +\item[$\texttt{for}$:] |
| 293 | +The first value $b$ is popped from $S$. If $b$ is true, |
| 294 | +the next value is popped from $S$, and the body of the |
| 295 | +for instruction is pushed on $C$, |
| 296 | +followed by the increment expression |
| 297 | +of the for instruction, |
| 298 | +followed by the predicate |
| 299 | +of the for instruction, followed by the for instruction itself. |
| 300 | +If $b$ is false, no action is taken. |
| 301 | +Otherwise the program |
| 302 | +execution is terminated with an error. |
| 303 | + |
| 304 | +\item[$\texttt{break}$:] |
| 305 | +The control items are popped one-by-one from $C$, starting with the first, until |
| 306 | +a \texttt{brk mark} instruction is reached, which is also popped. |
| 307 | + |
| 308 | +\item[$\texttt{continue}$:] |
| 309 | +The control items are popped one-by-one from $C$, starting with the first, until |
| 310 | +a \texttt{while} instruction is reached, which is kept on $C$. |
| 311 | + |
| 312 | +\item[$\texttt{env}$:] |
| 313 | +Execution continues with the environment of the \texttt{env} instruction as |
| 314 | +the current environment. |
| 315 | + |
| 316 | +\item[$\texttt{call}\ n$:] |
| 317 | +The $n + 1$st element on $S$ (starting counting with 1) needs to be a closure |
| 318 | +or primitive function with arity $n$, otherwise the program |
| 319 | +execution is terminated with an error. |
| 320 | + |
| 321 | +If the $n + 1$st element on $S$ is a primitive function, |
| 322 | +the first $n + 1$ values on $S$ are replaced by the result of applying the primitive |
| 323 | +function to the first $n$ elements on $S$, taken in the reverse of the order in which they appear on $S$. |
| 324 | + |
| 325 | +If the $n + 1$st element on $S$ is a closure, |
| 326 | +the body of the closure is pushed on $C$. |
| 327 | +This is followed by a \texttt{mark} instruction if the closure is complex. |
| 328 | +If the current environment is needed after the call instruction, this |
| 329 | +is followed by an $\texttt{env}$ instruction that refers to the current environment. |
| 330 | +If $n \neq 0$, a new environment is added to $E$ |
| 331 | +that extends the environment of the closure with a frame in which the |
| 332 | +parameters of the closure are bound to the first $n$ elements on $S$ |
| 333 | +taken in the reverse of the order in which they appear on $S$. |
| 334 | +This new environment is now considered the current environment. |
| 335 | +The first $n + 1$ values are popped from $S$. |
| 336 | + |
| 337 | +\item[\texttt{arr lit} $n$:] |
| 338 | +An array value with $n$ elements is constructed, whose first $n$ array entries |
| 339 | +are the first $n$ elements on $S$ |
| 340 | +(starting counting with 1), taken in the reverse of the order in which they appear on $S$: The array entry |
| 341 | +at index $n - 1$ is the first value on $S$ and the array entry |
| 342 | +at index $0$ is the $n$th value on $S$. The first $n$ values on $S$ are replaced |
| 343 | +by the array value. |
| 344 | + |
| 345 | +\item[$\texttt{arr acc}$:] |
| 346 | +The second value on $S$ (starting counting from 1) needs to be an array value, and the first |
| 347 | +value on $S$ needs to be an index---a non-negative integer from 0 to $2^{32} - 2$ |
| 348 | +(4,294,967,294)---otherwise the execution of the program is terminated with an error. |
| 349 | +The first two values on $S$ are replaced by |
| 350 | +the array value at the given index or |
| 351 | +the value \texttt{undefined} |
| 352 | +if the array does not have a value at the given index. |
| 353 | + |
| 354 | +\item[$\texttt{arr asgn}$:] |
| 355 | +The third value on $S$ (starting counting from 1) needs to be an array value, and the second |
| 356 | +value on $S$ needs to be an index---a non-negative integer from 0 to $2^{32} - 2$ |
| 357 | +(4,294,967,294)---otherwise the execution of the program is terminated with an error. |
| 358 | +The array value at the given index is replaced by the first value on $S$, or added |
| 359 | +to the array if the array did not have a value at the given index. |
| 360 | +The second and third values are removed from $S$, but the first value is kept. |
| 361 | + |
| 362 | +\end{description} |
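The three array instructions only manipulate the stash, so they are easy to sketch in isolation. The code below is an illustration under assumptions: the function names are hypothetical, the stash is a JavaScript array whose last element is the "first value on $S$", and the valid index range is taken to start at 0, in line with the \texttt{arr lit} rule, which places an entry at index 0.

```javascript
// Assumed index range: integers from 0 to 2^32 - 2.
function isIndex(i) {
  return Number.isInteger(i) && i >= 0 && i <= 2 ** 32 - 2;
}

// arr lit n: replace the first n values on the stash by one array value.
// The value on top of the stash becomes the entry at index n - 1.
function arrLit(stash, n) {
  const entries = stash.splice(stash.length - n, n);
  stash.push(entries);
}

// arr acc: replace array and index by the array entry at that index
// (undefined if the array has no value there).
function arrAcc(stash) {
  const index = stash.pop();
  const array = stash.pop();
  if (!Array.isArray(array) || !isIndex(index)) {
    throw new Error("arr acc: expected an array and a valid index");
  }
  stash.push(array[index]);
}

// arr asgn: store the first value at the given index; the array and the
// index are removed from the stash, the assigned value is kept.
function arrAsgn(stash) {
  const value = stash.pop();
  const index = stash.pop();
  const array = stash.pop();
  if (!Array.isArray(array) || !isIndex(index)) {
    throw new Error("arr asgn: expected an array and a valid index");
  }
  array[index] = value;
  stash.push(value);
}

// Example: the stash after evaluating the elements of [10, 20, 30].
const stash = [10, 20, 30];
arrLit(stash, 3); // stash now holds a single array with entries 10, 20, 30
stash.push(1);    // the index expression evaluated to 1
arrAcc(stash);
console.log(stash[0]); // prints 20
```

Because values are pushed in evaluation order, the top-of-stash value is the last array element; this is what the rule means by taking the stash values in reverse order.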
| 363 | + |
| 364 | +\subsection*{Result} |
| 365 | + |
| 366 | +When $C$ is empty, |
| 367 | +the first value of $S$ is the result of the program, or if $S$ is empty, |
| 368 | +the value \lstinline{undefined} is the result of the program. |
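To show how the transition rules and the result rule fit together, here is a minimal sketch of the machine restricted to primitive expressions, binary operator combinations and conditional expressions. It is an illustration only: the \texttt{tag} encoding, the helper constructors and the instruction names \texttt{binop\_i} and \texttt{branch\_i} are assumptions, environments are left out entirely, and the operand type checks required by the binary operator rule are omitted.

```javascript
// Control C and stash S are arrays whose LAST element is the "first"
// element in the document's terminology.
const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "<": (a, b) => a < b,
};

// Hypothetical constructors for program components.
const lit = (value) => ({ tag: "lit", value });
const binop = (op, left, right) => ({ tag: "binop", op, left, right });
const cond = (pred, cons, alt) => ({ tag: "cond", pred, cons, alt });

// Push items so that the first listed item ends up on top of C.
function pushAll(control, items) {
  for (let i = items.length - 1; i >= 0; i--) {
    control.push(items[i]);
  }
}

function run(program) {
  const C = [program];
  const S = [];
  while (C.length > 0) {
    const item = C.pop(); // the first element of C is always popped
    switch (item.tag) {
      case "lit":
        S.push(item.value);
        break;
      case "binop":
        pushAll(C, [item.left, item.right, { tag: "binop_i", op: item.op }]);
        break;
      case "cond":
        pushAll(C, [item.pred, { tag: "branch_i", cons: item.cons, alt: item.alt }]);
        break;
      case "binop_i": {
        // the second value on S is the left operand, the first the right
        const right = S.pop();
        const left = S.pop();
        S.push(OPS[item.op](left, right)); // operand type checks omitted
        break;
      }
      case "branch_i": {
        const b = S.pop();
        if (b === true) C.push(item.cons);
        else if (b === false) C.push(item.alt);
        else throw new Error("branch: predicate is not a Boolean value");
        break;
      }
      default:
        throw new Error(`unknown control item: ${item.tag}`);
    }
  }
  // Result rule: first value of S, or undefined if S is empty.
  return S.length > 0 ? S[S.length - 1] : undefined;
}

// Example: 1 + 2 * 3 < 10 ? 7 : 8
const example = cond(
  binop("<", binop("+", lit(1), binop("*", lit(2), lit(3))), lit(10)),
  lit(7),
  lit(8)
);
console.log(run(example)); // prints 7
```

Using plain arrays with the top at the end keeps push and pop cheap; a full machine would add the environment component and the remaining statement and instruction rules.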
| 369 | + |
| 370 | +\end{document} |