Improve readme
x0r committed Apr 7, 2024
1 parent 5f2b1b6 commit 71bcb1e
Showing 1 changed file with 69 additions and 71 deletions.
140 changes: 69 additions & 71 deletions README.md
@@ -12,84 +12,14 @@ It is a [Source-to-source translator](https://en.wikipedia.org/wiki/Source-to-so

(There is no working decompiler for 16-bit DOS code yet, because of the DOS segmentation model, etc.)
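
To illustrate why the segmentation model complicates matters, here is a minimal sketch of real-mode address arithmetic. A 16-bit segment and a 16-bit offset combine into one linear address, so many different segment:offset pairs name the same byte. The function names are illustrative and not part of masm2c, and the 20-bit wraparound assumes 8086-style addressing.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (8086 assumption)."""
    return ((segment << 4) + offset) & 0xFFFFF


def same_byte(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when two (segment, offset) pairs alias the same linear address."""
    return linear_address(*a) == linear_address(*b)


if __name__ == "__main__":
    # 01A2:0010 and 01A3:0000 are two names for one memory location.
    print(hex(linear_address(0x01A2, 0x0010)))
    print(same_byte((0x01A2, 0x0010), (0x01A3, 0x0000)))
```

Because a pointer's meaning depends on the current segment register values, a static tool cannot always tell which data a given instruction touches.
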

Examples:
Example results:

[![Test drive 3](http://img.youtube.com/vi/MzK9RVgeWGM/0.jpg)](http://www.youtube.com/watch?v=MzK9RVgeWGM)

[![Cryo Dune](http://img.youtube.com/vi/f-HArAmtXTc/0.jpg)](http://www.youtube.com/watch?v=f-HArAmtXTc)

**Prerequisites:**
* **Python 3.9 or later:** Ensure you have Python installed on your system.
* **Assembly Source Code:** You'll need the assembly code you want to translate. Masm2c supports MASM 6 syntax and IDA Pro's .lst files. Your code should assemble with uasm (jwasm) or masm6, link with link5 or tlink, and run under DOS.

**Installation:**

**Install Dependencies:** Use pip to install the required libraries:

```
pip install -r requirements.txt
```
This will install lark, jsonpickle, and other necessary packages.

**Usage:**

1. **Basic Translation:** To translate an assembly file (e.g., game.asm) to C, use the masm2c.py script:

```
python masm2c.py game.asm
```
This generates C++ source files (e.g., game.cpp and game.h) and a .seg file containing segment information.

2. **Merging Procedures:**

* Masm2c provides several options for merging procedures, which helps produce working code when there are jumps between procedures.
* Merging is controlled by the -m or --mergeprocs flag:
  * separate (default): Procedures are kept separate, and jumps between them use a global dispatch function.
  * persegment: Procedures within the same segment are merged.
  * single: All procedures are merged into a single function.


3. **Specifying Load Segment:**

* Temporarily emulating a specific code memory location can speed up conversion (e.g., when DOSBox loads an .exe at segment 0x1a2). Use the -lo or --loadsegment flag to specify the segment:

```
python masm2c.py -lo 0x1a2 game.asm
```
* For .com files loaded by DOSBox, use the -AT flag.
4. **Generating Globals Listing:**
* To generate a file with a listing of all global variables, labels, and procedures, use the -FL or --list flag:
```
python masm2c.py -FL game.asm
```
* This creates a file named game.list containing the information.
**Output Files:**
* **C++ Source Files (.cpp and .h):** These files contain the translated C code equivalent to your assembly source.
* **Segment File (.seg):** This file stores information about the segments in your assembly code. It can be used for merging data segments from multiple input files.
**Tips:**
* For better disassembly and translation, consider using tools like libDOSBox to collect runtime information (e.g., segment register values, memory access patterns).
* Masm2c scripts can help convert libDOSBox traces into annotations for disassemblers like IDA Pro.
The translation flow:
[![Diagram](http://www.plantuml.com/plantuml/png/NSwnRiCW40RWdQSuUJTHd3I3XogLkdHgto02SuceWBCuND6txpb97IiR-hyV-8zSJ2vJ36gWE5B2LA3vpFxYamcmFO3r1JHMRC0maC09AwxB7-zly9NfwjwP5KN3iHjMGV3M4LkgAb51i5GAnHwIAVu7OI276unJC0KTk2nPvjLjh3Z_qUowpM7_sANK_ofeN-S5qCDMGo3ZVBgeEP3yjaMeqw3bhEv1cmMNNU8xyM4S5tVYM57avIwFTXlQvUaUzXfoEVbq9ltDb9vwjstNblFCXXcZ3RzmzXLP7J6vAOO_)](http://www.plantuml.com/plantuml/png/NSwnRiCW40RWdQSuUJTHd3I3XogLkdHgto02SuceWBCuND6txpb97IiR-hyV-8zSJ2vJ36gWE5B2LA3vpFxYamcmFO3r1JHMRC0maC09AwxB7-zly9NfwjwP5KN3iHjMGV3M4LkgAb51i5GAnHwIAVu7OI276unJC0KTk2nPvjLjh3Z_qUowpM7_sANK_ofeN-S5qCDMGo3ZVBgeEP3yjaMeqw3bhEv1cmMNNU8xyM4S5tVYM57avIwFTXlQvUaUzXfoEVbq9ltDb9vwjstNblFCXXcZ3RzmzXLP7J6vAOO_)
It can make an assembler listing produced by IDA Pro recompilable by running the
program instrumented on an emulator and comparing each instruction with the one
emulated by [libdosbox](https://github.com/xor2003/libdosbox).
The following assembler example:

```assembler
@@ -137,6 +67,14 @@ struct Memory m = {
{'H','e','l','l','o',' ','W','o','r','l','d','!','\n','\r','$'}, // _msg
...
```
The translation flow:
[![Diagram](http://www.plantuml.com/plantuml/png/NSwnRiCW40RWdQSuUJTHd3I3XogLkdHgto02SuceWBCuND6txpb97IiR-hyV-8zSJ2vJ36gWE5B2LA3vpFxYamcmFO3r1JHMRC0maC09AwxB7-zly9NfwjwP5KN3iHjMGV3M4LkgAb51i5GAnHwIAVu7OI276unJC0KTk2nPvjLjh3Z_qUowpM7_sANK_ofeN-S5qCDMGo3ZVBgeEP3yjaMeqw3bhEv1cmMNNU8xyM4S5tVYM57avIwFTXlQvUaUzXfoEVbq9ltDb9vwjstNblFCXXcZ3RzmzXLP7J6vAOO_)](http://www.plantuml.com/plantuml/png/NSwnRiCW40RWdQSuUJTHd3I3XogLkdHgto02SuceWBCuND6txpb97IiR-hyV-8zSJ2vJ36gWE5B2LA3vpFxYamcmFO3r1JHMRC0maC09AwxB7-zly9NfwjwP5KN3iHjMGV3M4LkgAb51i5GAnHwIAVu7OI276unJC0KTk2nPvjLjh3Z_qUowpM7_sANK_ofeN-S5qCDMGo3ZVBgeEP3yjaMeqw3bhEv1cmMNNU8xyM4S5tVYM57avIwFTXlQvUaUzXfoEVbq9ltDb9vwjstNblFCXXcZ3RzmzXLP7J6vAOO_)
It can make an assembler listing produced by IDA Pro recompilable by running the
program instrumented on an emulator and comparing each instruction with the one emulated by [libdosbox](https://github.com/xor2003/libdosbox).
Features:
- 386 instructions (except FPU) are supported (well tested with QEMU
@@ -152,6 +90,66 @@ Features:
sources can be converted individually and linked together using
a modern linker
**Prerequisites:**
* **Python 3.9 or later:** Ensure you have Python installed on your system.
* **Assembly Source Code:** You'll need the assembly code you want to translate. Masm2c supports MASM 6 syntax and IDA Pro's .lst files. Your code should assemble with uasm (jwasm) or masm6, link with link5 or tlink, and run under DOS.
**Installation:**
**Install Dependencies:** Use pip to install the required libraries:
```
pip install -r requirements.txt
```
This will install lark, jsonpickle, and other necessary packages.
**Usage:**
1. **Basic Translation:** To translate an assembly file (e.g., game.asm) to C, use the masm2c.py script:
```
python masm2c.py game.asm
```
This generates C++ source files (e.g., game.cpp and game.h) and a .seg file containing segment information.
2. **Merging Procedures:**
* Masm2c provides several options for merging procedures, which helps produce working code when there are jumps between procedures.
* Merging is controlled by the -m or --mergeprocs flag:
  * separate (default): Procedures are kept separate, and jumps between them use a global dispatch function.
  * persegment: Procedures within the same segment are merged.
  * single: All procedures are merged into a single function.
3. **Specifying Load Segment:**
* Temporarily emulating a specific code memory location can speed up conversion (e.g., when DOSBox loads an .exe at segment 0x1a2). Use the -lo or --loadsegment flag to specify the segment:
```
python masm2c.py -lo 0x1a2 game.asm
```
* For .com files loaded by DOSBox, use the -AT flag.
4. **Generating Globals Listing:**
* To generate a file with a listing of all global variables, labels, and procedures, use the -FL or --list flag:
```
python masm2c.py -FL game.asm
```
* This creates a file named game.list containing the information.
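
The flags described in the steps above can be combined from a small script. The following is a minimal sketch, not part of masm2c: `build_command` and `MERGE_MODES` are hypothetical names, and it assumes that `-m` takes its value as the next argument in the same way the `-lo 0x1a2` example does.

```python
from typing import Optional

# Merge modes as listed in the usage notes.
MERGE_MODES = ("separate", "persegment", "single")


def build_command(source: str,
                  mergeprocs: Optional[str] = None,
                  loadsegment: Optional[int] = None,
                  listing: bool = False,
                  python: str = "python",
                  script: str = "masm2c.py") -> list[str]:
    """Build an argument list for masm2c.py using only the documented flags.

    Assumption: each flag is followed by its value as a separate argument.
    """
    cmd = [python, script]
    if mergeprocs is not None:
        if mergeprocs not in MERGE_MODES:
            raise ValueError(f"unknown merge mode: {mergeprocs!r}")
        cmd += ["-m", mergeprocs]
    if loadsegment is not None:
        cmd += ["-lo", hex(loadsegment)]  # e.g. 0x1A2 becomes "0x1a2"
    if listing:
        cmd.append("-FL")
    cmd.append(source)  # the input file goes last, as in the examples
    return cmd


if __name__ == "__main__":
    # Only prints the command line; nothing is executed here.
    print(" ".join(build_command("game.asm", mergeprocs="persegment",
                                 loadsegment=0x1A2, listing=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory containing masm2c.py. Rejecting unknown merge modes early keeps a typo from reaching the translator.
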
**Output Files:**
* **C++ Source Files (.cpp and .h):** These files contain the translated C code equivalent to your assembly source.
* **Segment File (.seg):** This file stores information about the segments in your assembly code. It can be used for merging data segments from multiple input files.
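
After a run, a quick check that the expected files exist can catch a failed translation. This is a rough sketch under one assumption: outputs are named after the input file's stem and written next to it, which matches the game.cpp, game.h, and game.list examples but is not otherwise confirmed. The helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(source: str, listing: bool = False) -> list[Path]:
    """Paths masm2c is expected to produce for `source` (naming is an assumption)."""
    src = Path(source)
    suffixes = [".cpp", ".h", ".seg"]
    if listing:
        suffixes.append(".list")  # only produced when -FL / --list is given
    return [src.with_suffix(suffix) for suffix in suffixes]


def missing_outputs(source: str, listing: bool = False) -> list[Path]:
    """Return the expected output files that are not present on disk."""
    return [p for p in expected_outputs(source, listing) if not p.exists()]


if __name__ == "__main__":
    for path in missing_outputs("game.asm", listing=True):
        print(f"missing: {path}")
```
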
**Tips:**
* For better disassembly and translation, consider using tools like libDOSBox to collect runtime information (e.g., segment register values, memory access patterns).
* Masm2c scripts can help convert libDOSBox traces into annotations for disassemblers like IDA Pro.
(3rd-party code used from: ASM2C (x86 instruction emulation),
tasm-recover (from the ScummVM project; highly modified), the QEMU x86
instruction test suite, FreeDOS memory manager.)