-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Determine Array_Index or Function_Call in AST #246
Comments
Hey, that is nice to see that you're trying to do this. But please beware that this is harder than you might think, due to MATLAB's... lets say casual relationship with common sense when it comes to language design. I cannot do this until I get semantic analysis working, because I need to know what functions are. The only way you can tell apart a function call from an array access is analysis there is nothing in the syntax of the language that deals with this. This I would consider a serious mistake from a language design point of view, but whatever. It's even worse. Consider this:
Why is this hard? Consider this calling context:
So yeah. Here this could be either an array access or a function pointer call; and this is sadly valid MATLAB. I have not yet decided how to deal with this. Do I treat any function as essentially generic, and give you multiple ASTs for each unique way of calling it? Or do I just make it illegal inside the MISS_HIT subset of MATLAB? |
You could do something a bit different, that might work without a working sem. You could create a run-time environment and use that. And one of the things you could do is make everything a class, and then if you have a function-instance you do call the Not sure if I am explaining it in a way that makes sense; to paraphrase generate code that is way more inefficient but deals with these problems dynamically. Similar to how python has dynamic typing; you could build a class hierarchy that supports everything you need and then generate code using it. |
Thanks for the heads up. I understand it is difficult to do the semantic analysis because of the complexity of Matlab language. So for now I came up with another way by providing an inherited class for numpy array to implement the Anyway, I will watch this todo item and hopefully we can do it in the clean way. |
Yes, that is what I had in mind. Since it's Python anyway, I think the performance impact of this is negligible. Some additional - extremely superficial - comments when looking at your code so far:
Have a look also at the make target m_ast_picture.pdf, it will give some clarity on the structure of the AST. |
Sure I can do as you suggested later as for now it is just a prototype. The hint on types picture is really useful! |
Also, a few more things to help you out... There is a tree-traversal construct that you might be able to use: miss_hit/miss_hit_core/m_ast.py Line 208 in 0737748
You can subclass this and overwrite the visit and visit_end. Visit is called when a node is visited for the first time, and visit_end is called after all the children have been visited. Both have 3 parameters:
If you want to know what kind of parents a node can have, you can check the set_parent method on the node; which will contain an assert. For example: miss_hit/miss_hit_core/m_ast.py Lines 696 to 698 in 0737748
This is set_parent of the Class_Definition node, and so what is enforced here is that a class definition can only have one possible parent node, a Class_File node. All nodes have a parent, except the compilation unit node. Finally, for code generation you likely have more than one file. I saw you extended mh_debug_parser; but this is a really bad starting point because it only ever processes one file. It's for me to test the parser. You really want to create a proper back-end, which gives you proper command-line handling, access to the config mechanism, etc. For this a good example is MH Lint, as it's the smallest. First you subclass the result class. This can be trivial: Lines 154 to 158 in 0737748
Then you subclass command_line.MISS_HIT_Back_End, as seen here: Line 160 in 0737748
There are three functions you may want to overwrite:
HTH |
Yes I indeed implemented the Translating A=A+3+4i;
disp(' a b " ');
disp('x');
disp(A(:, 4, 2:end));
clear A;
[out(4), ~] = func_example(~A(1, end), A(end-1)', A(3:(end-1)).');
function [out, out2] = func_example(A, B, C)
out = [];
out = [1;2;3];
out = [-.1 -0.1 .1; 1, func_example(A, B, C)+out+3, 3];
out2 = [out; out] / B;
out2 = out{A, B};
if (C == 1) && (B > 3) || (A==4)
out2 = out2 | 1;
elseif C == 2
else
for i = 1:A
continue
out2 = addone(out2(1:5:2)./A, end) + out(i)\out(:);
break
end
return
end
end
function out = addone(A)
out=A+1;
end into def func_example(A, B, C):
out = mnp.array([])
out = mnp.stack((1, 2, 3))
out = mnp.stack(([-.1, -0.1, .1], [1, (func_example(A, B, C) + out) + 3, 3]))
out2 = mnp.mrdivide(mnp.stack((out, out)), B)
out2 = out[A, B]
if ((C == 1) and (B > 3)) or (A == 4):
out2 = out2 | 1
elif C == 2:
pass
else:
for i in range(1, A):
continue
out2 = addone[out2(range(1, 2, 5)) / A, end] + (mnp.mldivide(out(i), out[:]))
break
return
return out, out2
def addone(A):
out = A + 1
return out
A = (A + 3) + 4j
disp(' a b " ')
disp('x')
disp(A[:, 4, range(2, end)])
clear('A')
out[4], _ = func_example(~(A[1, end]), (A[end - 1]).T, (A[range(3, end - 1)]).T) The results seems promising so I am now working on a new package to provide a wrapper for I indeed encounter problem when it comes to multiple Matlab files, especially the |
Another idea in my mind is to insert the original Matlab script into the generated python code as comment in corresponding position. I did notice the |
Let me answer the second comment first, as the first one will take time. Good news, there is, but it's the other way around. Kinda. miss_hit/miss_hit_core/m_ast.py Lines 174 to 176 in 0737748
To go the other way is not possible right now, but you could go through the token stream. Normally the token stream is discarded when the AST is made, but there is a way to preserve this (mh_style uses it to pretty print). You can use the same setup MH Style uses: miss_hit/miss_hit_core/mh_style.py Lines 1018 to 1022 in 0737748
miss_hit/miss_hit_core/mh_style.py Lines 1070 to 1080 in 0737748
Most AST nodes do remember the token that is important to them, e.g. t_op for the binary operator points to the token that is the miss_hit/miss_hit_core/m_ast.py Lines 2824 to 2840 in 0737748
In general if have some conventions (but you can see them enforced via asserts):
So, long story short. Right now this is kinda hard and messy; but you could create a MATLAB pretty printer for the expressions you need and use that. But, if you go through the effort of preserving the token stream then the good news is that you can pull out comments as well. And that might be super nice for tracability. |
I am not sure I understand everything in this question. Naked_Expression_Statement can occur at any point where you get a statement, script or no. For example:
The first and last line in the function is a naked_expression_statement. Secondly, script files should translate nearly into python files, so no issue there. Finally scripts using eval() are EVIL. You might want to just not do that. As far as I know, the MATLAB coder, Simulink coder, and Embedded Coder all do not support eval. I suggest you do the same :) |
Thanks for the detailed explanation. I will try to follow the way you described but that may come lower priority for now.
For the second question, let me explain with an example. Say we have two disp('this is a.m');
disp('there might be complex operation in a.m');
j = 0;
for i = 1:x
j = j+i;
end disp('this is b.m');
x = 5;
disp('the following is similar to Linux sourcing script')
a;
disp(['j:', num2str(j)]); When excuting BTW, I have separate the translating logic into a separate backend |
Great, it will make a lot of things much easier :)
I understand now, OK. So what I would suggest is that you will need to start keeping track of a little symbol table for yourself. The way I intend to deal with this eventually is that:
Again, I have plans to do this, but I just haven't had the energy to start with this mammoth task. You could first get a list of files that will be processed; and then when you deal with things like I would try to translate as a function and pass the variables as parameters; or otherwise have a python module generated called global_workspace that uses these variables there. This is going to be ugly. You could perhaps start by not supporting that construct, unless in the MATLAB code your targetting this is extremely common. |
I may be out of my depth here, but I notice that one or two problems you refer to above appear to have been solved by Victor Leikehman in Smop. (https://github.com/victorlei/smop) It might be worth looking at his code to see how he handled comments for instance. |
You may find the references in https://github.com/mbdevpl/transpyle/issues/7 helpful.
Python 3.6 with whole-line comments outside expressions is fully supported. Presence of end-of-line comments or comments in expressions might result in errors.
Currently, the Fortran unparser uses special attribute fortran_metadata attached to selected Python AST nodes, and therefore unparsing raw Python AST created directly from ordinary Python file might not work as expected. The above behaviour will change in the future. Bysiek's main interest is in HPC and he is using Fortran with Typing to achieve speed and Python to provide maintainability. |
Hi @RobBW , thanks for the info about Transpyle. I am aware of smop but somehow it failed to produce the correct result when my first trying. So I tend to produce my own version of Matlab to python project. And I think smop do not have some key features of Matlab like general I am working on a fully Matlab compatible package and will open source it after considering key components done. The key idea is to translate Matlab code as-is and run without heavy modifications. |
@RobBW - random aside - a quick superficial read of the lexer in smop indicates to me that language support is probably quite partial. Specifically I have doubts about the correctness of matrix lexing, and I also think command-form is not dealt with either. Also, classdef is not supported either: https://github.com/victorlei/smop/blob/bdad96b715d1dd75ce8ab4724f76b9b1bb1f61cd/smop/lexer.py#L139 There is tons of MATLAB parsers on github, I reviewed about 10 of them before I decided that I need to write my own; because none of them are complete. And once you get to something complete, the complications pile on and on :) As to comments and parsing; they way they have done it is through "comment statements". See a discussion here: #239 This is indeed one of the way I was thinking of dealing with comments, but it's not done yet. It adds further complications to the language definition, a complication I already encountered with the pragmas. |
Hi @florianschanda, I completed the first working prototype on translating Matlab to python. The project contains two main parts: one language translator mh_python under forking of miss_hit with the same AGPL license; a separate runtime engine mat2py under MIT license to provide Matlab accent in Python. There's still a lot of things to do but I am considering the collaboration patterns now. For now my fork is identical to miss_hit except two new scripts. I plan to deploy a web services for the new script |
@chaoqing sorry for the long radio silence I had some real life stuff to deal with. I am pleasantly surprised how far you managed to take this concept, without semantic analysis. I'll get a lot easier. The licensing seems fine. AGPL for the translator, whatever else you want for the run-time support. I am open to collaboration, but if we want to merge mh_python then I will need to do a very detailed review. I have fairly strict ideas as to when something is ready for release (a key one being it operates under any reasonable circumstance). I also need to think about copyright. I will either ask for an assignment (similar to what the FSF/gcc does) or I will create a 3rdparty directory. I honestly haven't thought this far ahead. |
Hi @florianschanda, yes I know merging my fork back to BTW, for anyone interested on the emulator, it now supports basic Matlab math operation and plot command. If a |
First thanks for this awesome codebase which save me a lot of efforts when I am trying to create a tool converting Matlab to Python3/Numpy code within this in-progress fork.
What kind of feature is this?
I noticed there is an ToDo item here. Determine the exact type of Reference to be Array_Index or Function_Call is a critical step when converting the Matlab to Numpy-like code.
Describe the solution you'd like
Just want to check the the progress in this item. I may implement my own version, most probably ugly, if no update from the upstream soon.
Thanks again for this fantastic repo! I really enjoy reading the well organized methods and code. And of cause I will consider do a pull request after my fork is ready.
The text was updated successfully, but these errors were encountered: