This project attempts to extract the source code from so-called Universal Hex files generated by the PXT-based Microsoft MakeCode IDE for micro:bit (web IDE). PXT uses a technique called source embedding to add the code as (possibly compressed) text to the 0x0D ("Custom Data") records of an Intel HEX file.
The code extractor itself is implemented as a Python 3 script.
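To illustrate where the embedded source lives, here is a minimal, standalone sketch that collects the payload of all 0x0D records from the text of an Intel HEX file. It is not the code of `extract.py` (which relies on the intelhex module); the function names are made up for this example, and joining the records in file order is an assumption.

```python
CUSTOM_DATA = 0x0D  # "Custom Data" record type named in the text above


def iter_hex_records(hex_text):
    """Yield (record_type, address, data) for every record line in hex_text."""
    for line_no, line in enumerate(hex_text.splitlines(), start=1):
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip blank lines and anything that is not a record
        raw = bytes.fromhex(line[1:])
        length = raw[0]
        # A record is: length (1), address (2), type (1), data, checksum (1).
        if len(raw) != length + 5:
            raise ValueError(f"line {line_no}: length field does not match record size")
        # The checksum byte makes the sum of all record bytes a multiple of 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"line {line_no}: bad checksum")
        address = int.from_bytes(raw[1:3], "big")
        yield raw[3], address, raw[4:4 + length]


def collect_custom_data(hex_text):
    """Concatenate the payload of all 0x0D records in file order (assumption)."""
    return b"".join(
        data
        for rec_type, _addr, data in iter_hex_records(hex_text)
        if rec_type == CUSTOM_DATA
    )
```

A Universal Hex file may carry the same embedded source more than once; this sketch does not try to detect or remove such duplicates.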
This project does not support extracting Python code from the BBC micro:bit. To do so, use the uBitTool. Once the PXT code extractor is in a sufficiently working state, it may be added to the uBitTool; feel free to create a pull request.
- Clone this git repository
- Make sure the Python module dependencies are met (argparse and lzma are part of the Python 3 standard library, so only intelhex needs to be installed):
pip3 install intelhex
- Run the script from a Python 3 environment (it should run on Windows, Linux and macOS):
usage: extract.py [-h] [file]
extract.py
positional arguments:
file path to bbc micro:bit HEX input file
options:
-h, --help show this help message and exit
Warning: The Python script automatically creates an output folder named after the input file (without its extension).
The following files are created by the tool and contain data from intermediate extraction steps:
- _code_header.json
- _lzma_compressed_text.bin
- _packed_code.txt
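The output folder naming described in the warning above can be sketched roughly as follows. The helper names and the `files` mapping (file name to file content) are hypothetical and only serve the illustration; the actual script may behave differently in detail.

```python
from pathlib import Path


def output_dir_for(input_path):
    """Return the output folder: the input path without its extension."""
    return Path(input_path).with_suffix("")


def write_files(out_dir, files):
    """Write each entry of a name -> text mapping below out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        target = out_dir / name
        # Create parent folders in case a file name contains a sub-path.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write bytes so that line endings are stored exactly as extracted.
        target.write_bytes(content.encode("utf-8"))
```

With this sketch, `output_dir_for("sound-device.hex")` yields a folder named `sound-device` next to the input file.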
Example usage and output files:
$ python extract.py sound-device.hex
Input file w/o extension: sound-device
Output folder: /Users/matthias/local_repos/microbit-pxt-code-extractor/sound-device
-------------------------------------------------------------------------
Embedded source dump:
0000 41 14 0E 2F B8 2F A2 BB 9D 00 40 10 00 00 00 00 |A.././....@.....|
0010 7B 22 63 6F 6D 70 72 65 73 73 69 6F 6E 22 3A 22 |{"compression":"|
0020 4C 5A 4D 41 22 2C 22 68 65 61 64 65 72 53 69 7A |LZMA","headerSiz|
0030 65 22 3A 32 39 34 2C 22 74 65 78 74 53 69 7A 65 |e":294,"textSize|
0040 22 3A 31 37 35 31 30 2C 22 6E 61 6D 65 22 3A 22 |":17510,"name":"|
0050 73 6F 75 6E 64 2D 64 65 76 69 63 65 22 2C 22 65 |sound-device","e|
0060 55 52 4C 22 3A 22 68 74 74 70 73 3A 2F 2F 6D 61 |URL":"https://ma|
0070 6B 65 63 6F 64 65 2E 6D 69 63 72 6F 62 69 74 2E |kecode.microbit.|
0080 6F 72 67 2F 22 2C 22 65 56 45 52 22 3A 22 35 2E |org/","eVER":"5.|
0090 30 2E 31 32 22 2C 22 70 78 74 54 61 72 67 65 74 |0.12","pxtTarget|
00A0 22 3A 22 6D 69 63 72 6F 62 69 74 22 7D 5D 00 00 |":"microbit"}]..|
00B0 80 00 95 45 00 00 00 00 00 00 00 3D 88 89 C6 54 |...E.......=...T|
00C0 36 C3 17 4F E4 F9 EC 0D 07 A9 22 3E D4 1C 7C B5 |6..O......">..|.|
00D0 AF A5 88 58 62 DF 18 4A B0 53 1D A2 B3 BA 13 -- |...Xb..J.S..... |
...
-------------------------------------------------------------------------
JSON header length: 0x9D00 (157)
Text length: 0x40100000 (4160)
Reserved: 0x0000
-------------------------------------------------------------------------
Embedded JSON header (pretty-printed):
{
"compression": "LZMA",
"headerSize": 294,
"textSize": 17510,
"name": "sound-device",
"eURL": "https://makecode.microbit.org/",
"eVER": "5.0.12",
"pxtTarget": "microbit"
}
Header size: 294
Text size: 17510
-------------------------------------------------------------------------
Text meta data:
Length of text before truncation: 4163
Length of text after truncation: 4160
Text is LZMA-compressed.
Writing LZMA compressed output text...
Decompressing LZMA text...
Writing packed code...
-------------------------------------------------------------------------
Code header dump (pretty-printed)
{
"name": "sound-device",
"comment": "",
"status": "unpublished",
"cloudId": "pxt/microbit",
"editor": "blocksprj",
"targetVersions": {
"branch": "v5.0.12",
"tag": "v5.0.12",
"commits": "https://github.com/microsoft/pxt-microbit/commits/97491d6832cccab6b5bdc05b58e4c6b5dcc18cdd",
"target": "5.0.12",
"pxt": "8.0.7"
}
}
Writing code header JSON file...
-------------------------------------------------------------------------
Code payload analysis (pretty-printed)
Length: 17519
Files: ['README.md', 'main.blocks', 'main.ts', 'pxt.json', 'test.ts']
Writing file 'README.md'...
Writing file 'main.blocks'...
Writing file 'main.ts'...
Writing file 'pxt.json'...
Writing file 'test.ts'...
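The hex dump and the length fields printed above suggest a 16-byte binary prefix: eight marker bytes, a 2-byte JSON header length, a 4-byte text length and 2 reserved bytes, all little-endian (`9D 00` is 157, `40 10 00 00` is 4160). The following sketch parses that layout. It is inferred from the sample output only; treating the first eight bytes as a fixed marker is an assumption, and the function names are made up for this example.

```python
import json
import struct

# First eight bytes of the dump above; treating them as a fixed marker is an assumption.
MAGIC = bytes.fromhex("41140E2FB82FA2BB")
PREFIX_SIZE = 16  # 8 marker bytes + 2 + 4 + 2 bytes of length fields


def parse_prefix(blob):
    """Return (json_len, text_len, reserved) decoded as little-endian integers."""
    if blob[:8] != MAGIC:
        raise ValueError("embedded source marker not found")
    return struct.unpack_from("<HIH", blob, 8)


def split_embedded_source(blob):
    """Return (metadata dict, text bytes) from an embedded source blob."""
    json_len, text_len, _reserved = parse_prefix(blob)
    json_end = PREFIX_SIZE + json_len
    meta = json.loads(blob[PREFIX_SIZE:json_end].decode("utf-8"))
    text = blob[json_end:json_end + text_len]  # drops any trailing padding
    return meta, text
```

Slicing the text to the declared length plays the same role as the truncation step in the output above (4163 bytes before, 4160 bytes after).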
And some details for the example:
$ cd sound-device
matthias@maehcbook sound-device % ll
total 128
drwxr-xr-x 10 matthias staff 320 31 Aug 22:22 .
drwxr-xr-x 10 matthias staff 320 31 Aug 22:22 ..
-rw-r--r-- 1 matthias staff 1433 31 Aug 22:22 README.md
-rw-r--r-- 1 matthias staff 314 31 Aug 22:22 _code_header.json
-rw-r--r-- 1 matthias staff 4160 31 Aug 22:22 _lzma_compressed_text.bin
-rw-r--r-- 1 matthias staff 17813 31 Aug 22:22 _packed_code.txt
-rw-r--r-- 1 matthias staff 13402 31 Aug 22:22 main.blocks
-rw-r--r-- 1 matthias staff 1991 31 Aug 22:22 main.ts
-rw-r--r-- 1 matthias staff 537 31 Aug 22:22 pxt.json
-rw-r--r-- 1 matthias staff 129 31 Aug 22:22 test.ts
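The intermediate files in this listing map onto the last extraction steps: `_lzma_compressed_text.bin` is decompressed into `_packed_code.txt`, which is then split into the code header and the individual source files. Below is a hedged sketch of those steps. It assumes the legacy `.lzma` container (`lzma.FORMAT_ALONE` in Python), which matches the `5D 00 00 80 00` bytes following the JSON header in the dump; the size field after them, `95 45 00 ...`, is 17813 and equals the size of `_packed_code.txt`. It further assumes that `headerSize` counts characters of the decoded text and that the remainder is a JSON object mapping file names to contents. The function name is made up for this example.

```python
import json
import lzma


def unpack_code(text_blob, meta):
    """Return (code_header dict, files dict) from the embedded text bytes."""
    if meta.get("compression") == "LZMA":
        # FORMAT_ALONE is the legacy .lzma container (assumption, see above).
        raw = lzma.decompress(text_blob, format=lzma.FORMAT_ALONE)
    else:
        raw = text_blob  # assumed to be stored uncompressed
    packed = raw.decode("utf-8")
    header_size = meta["headerSize"]  # assumed to count characters
    code_header = json.loads(packed[:header_size])
    files = json.loads(packed[header_size:])  # assumed: file name -> content
    return code_header, files
```

The returned `files` mapping would then contain keys such as `main.ts` or `pxt.json`, as in the file list printed by the tool.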
Feel free to make any changes and support this project. ;)