Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ARC auto-sync architecture #2570

Open
wants to merge 28 commits into
base: next
Choose a base branch
from

Conversation

R33v0LT
Copy link
Contributor

@R33v0LT R33v0LT commented Dec 5, 2024

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added the necessary ARC files for the disassembler to work properly
  • I've added ARC to python bindings
  • I've added ARC to auto-sync scripts
  • Dev fuzzing is done without errors
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This PR adds support for the ARC architecture in capstone. It is worth noting that this architecture is experimental in lvm, that is, full support for all instructions is not fully implemented. Also, MC tests in llvm (like llvm_root/llvm/test/MC/Disassembler/ARC/ldst.txt ) have a format different from other architectures, which is why tests auto-generated by MCUpdater.py does not add instruction bytes to yaml. I think this format will change when ARC is fully supported
To successfully pass the autosync test, a merge of the following PR is required: capstone-engine/llvm-capstone#69

Test plan

There are no differences in the way capstone is built with the ARC architecture
MC tests were generated by MCUpdater. Only instruction bytes are manually added. All other tests are added manually
At the moment, all tests are passing successfully, there are CI/CD results to confirm

Closing issues

None

@R33v0LT R33v0LT marked this pull request as draft December 5, 2024 14:47
@R33v0LT
Copy link
Contributor Author

R33v0LT commented Dec 11, 2024

@Rot127, fuzzing, as I expected, shows that those instructions that are not yet supported for ARC in llvm lead to incorrect decoding and program crash. llvm-mc from the official github also fails with an error. Should we do something now or ignore it until llvm updates?

@Rot127
Copy link
Collaborator

Rot127 commented Dec 11, 2024

You can fix it in our fork of LLVM. And open an issue in LLVM with a reference to the fix commit. They can cherry-pick it. If you have the time, you can also open a PR with the fix yourself.

@R33v0LT
Copy link
Contributor Author

R33v0LT commented Dec 11, 2024

The problem is solved now. The DecodeGPR32RegisterClass return value was not checked (in llvm disassembler also)
I must reran fuzzing again to check if all is good now. If so, I'll marked this PR as ready for review

@R33v0LT R33v0LT marked this pull request as ready for review December 13, 2024 09:44
arch/ARC/ARCDisassembler.c Outdated Show resolved Hide resolved
@Rot127
Copy link
Collaborator

Rot127 commented Dec 17, 2024

Do you consider this done? Then I would review it this week.

@R33v0LT R33v0LT closed this Dec 17, 2024
@R33v0LT R33v0LT reopened this Dec 17, 2024
@R33v0LT
Copy link
Contributor Author

R33v0LT commented Dec 17, 2024

Do you consider this done? Then I would review it this week.

Yes. I think it is done now

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continue later (at arc.h). But so far looks very good!

Result = decodeInstruction(DecoderTable64, Instr,
Insn64, Address, NULL);
if (MCDisassembler_Success == Result) {
;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you an issue please that we should also covert the LLVM_DEBUG stuff?
It seems to become more and more common to use.
No need to fix it here, but it nice to track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM_DEBUG is already present in the patches

arch/ARC/ARCGenCSFeatureName.inc Outdated Show resolved Hide resolved
MI->MRI = MRI;

ARC_LLVM_printInst(MI, MI->address, "", O);
// ARC_add_cs_groups(MI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ARC_add_cs_groups(MI);


void ARC_init_cs_detail(MCInst *MI)
{
if (detail_is_set(MI)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invert condition and return early please.

static inline void add_cs_detail(MCInst *MI,
int /* arc_op_group */ op_group, ...)
{
if (!MI->flat_insn->detail)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!MI->flat_insn->detail)
if (!details_is_set(MI))

cstool/cstool.c Outdated Show resolved Hide resolved
ARC_INS_ST_DI_AW = 195
ARC_INS_ST_DI = 196
ARC_INS_ST = 197
ARC_INS_SUB1_ = 198
Copy link
Collaborator

@Rot127 Rot127 Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARC_INS_SUB1_ = 198
ARC_INS_SUB1_ = 198

These instruction ids ending in _ look fishy. I suspect they end in a . or with a {. But they for sure end with a non alphabetical character. Can you check if this is expected? For some archs a . actually belongs to the mnemonic (PPC). For some it doesn't (ARM).

If the mnemonic should not end in a special character, you can check the getNormalizedMnemonic function in the PrinterCapstone. You can remove special chars there.

Copy link
Contributor Author

@R33v0LT R33v0LT Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works like that because of instructions like sub1.$cc. I can replace last dot if the mnemonic cointains it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please remove it.

_fields_ = (
('type', ctypes.c_uint),
('value', ARCOpValue),
('access', ctypes.c_uint8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('access', ctypes.c_uint8)
('access', ctypes.c_int)

It was an enum in arc.h. Please either make it an uint8 or change it here to an int


class ARCOp(ctypes.Structure):
_fields_ = (
('type', ctypes.c_uint),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('type', ctypes.c_uint),
('type', ctypes.c_int),

enums are ints if I am not mistaken.

_fields_ = (
('op_count', ctypes.c_uint8),
('operands', ARCOp * 8),
('update_flags', ctypes.c_bool),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('update_flags', ctypes.c_bool),

Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a very straight forward review! Excellent job, well done!

Please add ARC in cs_v6_release_guide.md und New features.
Simple two liner is enough. Just please mention about the memory operands.

Otherwise, please address the other small requests.

Regarding the not yet implemented memory operand type: Do you want to add them? It complicates the details a bit. But on the other hand there seems only the MemOperandRI type. So not too much work I think? Or was there another problem?

Nonetheless, we can merge it like this. But we should have them before v6 Gold.
cc @XVilka @kabeor

@@ -36,8 +38,10 @@
"LoongArch": "LoongArch",
"SystemZ": "SystemZ",
"Mips": "Mips",
"ARC": "ARC",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ARC": "ARC",

Comment on lines 442 to 444
# from autosync.cpptranslator.patches.Helper import get_text
# if patch.get_main_capture_name() == "template_def":
# print(get_text(self.src, q[0].start_byte, q[0].end_byte).decode())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# from autosync.cpptranslator.patches.Helper import get_text
# if patch.get_main_capture_name() == "template_def":
# print(get_text(self.src, q[0].start_byte, q[0].end_byte).decode())

Comment on lines 436 to 437
# if "template_declaration" in pattern:
# print(self.tree.root_node)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# if "template_declaration" in pattern:
# print(self.tree.root_node)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for this new Patch in test_patches.py.

Comment on lines +50 to +54
if len(param_list_caller.named_children) == 1:
# If function just return fieldFromInstruction(...)
inst_param: Node = param_list_caller.named_children[0]
else:
inst_param = param_list_caller.named_children[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a test for this edge case.

@@ -0,0 +1,60 @@
// Copyright © 2024 Rot127 <[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your copyright

@@ -0,0 +1,99 @@
// Copyright © 2024 Rot127 <[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also your copyright please

@@ -134,6 +135,8 @@ def test_file(fname):
("CS_ARCH_ALPHA", "CS_MODE_BIG_ENDIAN"): 56,
("CS_ARCH_HPPA", "CS_MODE_HPPA_11+CS_MODE_BIG_ENDIAN"): 57,
("CS_ARCH_HPPA", "CS_MODE_HPPA_20+CS_MODE_BIG_ENDIAN"): 58,
("CS_ARCH_ARC", "CS_MODE_LITTLE_ENDIAN"): 59,
("CS_ARCH_ARC", "CS_MODE_BIG_ENDIAN"): 60,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("CS_ARCH_ARC", "CS_MODE_BIG_ENDIAN"): 60,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these tests cover all the possible OP_GROUPS? If no, please add one test for each op group.
Otherwise, all good.

Also, please add tests which checks the implicit reads, writes and groups.

@R33v0LT R33v0LT requested a review from Rot127 December 25, 2024 12:55
Copy link
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Almost done. Just those few nitpicks.

Also, please mark the suggestions as resolved when done.

switch (op_group) {
default:
printf("ERROR: Operand group %d not handled!\n", op_group);
assert(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(0);
CS_ASSERT_RET(0);

ARC_INS_ADD_,
ARC_INS_ADD_F,
ARC_INS_ADD,
ARC_INS_AND_,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't forget about those ones.

class BadConditionCode(Patch):
"""
Patch return BadConditionCode
to assert(0 && "Unknown condition code passed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to assert(0 && "Unknown condition code passed")
to CS_ASSERT(0 && "Unknown condition code passed")

@@ -0,0 +1,32 @@
# Copyright © 2024 Dmitry Sibitsev <[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your email please :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants