Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ch. 13 - Unnecessary complexity in parsing and serializing witness data in class Tx #288

Open
salmonberry7 opened this issue Nov 18, 2024 · 0 comments

Comments

@salmonberry7
Copy link

salmonberry7 commented Nov 18, 2024

In the Tx class method parse_segwit, following the BIP141 spec, the code :

for tx_in in inputs:
	num_items = read_varint(s)
	items = []
	for _ in range(num_items):
		item_len = read_varint(s)
		if item_len == 0:
			items.append(0)
		else:
			items.append(s.read(item_len))
	tx_in.witness = items

can be simplified to :

for tx_in in inputs:
	# read the witness field for input tx_in
	num_items = read_varint(s)
	items = []
	# each element in 'items' will be a byte sequence, possibly of length zero
	for _ in range(num_items):
		items.append(s.read(read_varint(s)))
	# 'items' itself may have length zero
	tx_in.witness = items

so the witness is just a list of byte sequences. There is no need for some elements in this list to be the integer zero, instead an empty byte sequence (as returned by s.read(0)) can be used.

The Tx class method serialize_segwit, again following BIP141 spec, would then be simplified correspondingly from :

for tx_in in self.tx_ins:
	result += int_to_little_endian(len(tx_in.witness), 1)
	for item in tx_in.witness:
		if type(item) == int:
			result += int_to_little_endian(item, 1)
		else:
			result += encode_varint(len(item)) + item

to :

for tx_in in self.tx_ins:
	result += encode_varint(len(tx_in.witness))
	for item in tx_in.witness:
		result += encode_varint(len(item)) + item

It is not necessary for certain items within the witness to have type integer and others to have type byte sequence and encode_varint should be used throughout rather than int_to_little_endian, in accordance with BIP141 spec.

If the above change was made it would mean that in Script.evaluate when processing the cases P2WPKH and P2WSH, when we extend the cmds list with the witness list (eg. via cmds.extend(witness)) we would add an empty list instead of an integer 0. However the latter two both produce the same effect when executed by Script.evaluate, namely the effect of running OP_0. Currently Script.parse parses an OP_0 by placing an integer 0 in cmds. By changing >= 1 to >= 0 in this line it could be changed to place an empty list instead. In Script.raw_serialize an integer 0 and an empty list both serialize to a single 0 byte. Thus the above code simplifications should not adversely affect the Script class.

Another reason for the above change is that in Script.parse, a Script command is stored in the cmds list as either (1) a list, in the case when a push data type Script operation has been encountered in the byte stream, with the data then stored in the list, or (2) an integer, when any other type of Script operation has been encountered in the byte stream, with the opcode then stored in the integer. But the current code treats OP_0 as type (2), however since OP_0 pushes an empty list onto the stack it is a push data type operation so it would be more consistent to treat it as type (1).

Though it looks like the above change should not present a problem, the entire program would have to be checked to ensure there are no potential undesired side effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant