Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration bug on qnfs database key lookup - ERR: invalid literal for int() with base 16: b'EA59RST48MLqNoh_jNxHysEXFL4PeQlj2mII0TVEdz64' #863

Open
kentbull opened this issue Sep 16, 2024 · 2 comments
Labels
bug Something isn't working triage

Comments

@kentbull
Copy link
Contributor

kentbull commented Sep 16, 2024

Version

1.2.0rc2

Environment

Mac OS, Python 3.12.3, local deployment

Expected behavior

Starting up kli witness demo after performing kli migrate run for each witness should bring up the demo witness set.

Actual behavior

The following error occurs:

$ kli witness demo
Witness wan : BBilc4-L3tFUnfM_wJr4S4OJanAv_VmF_dJNN6vkf2Ha
ERR: invalid literal for int() with base 16: b'EA59RST48MLqNoh_jNxHysEXFL4PeQlj2mII0TVEdz64'

The bug is that the splitOnKey in LMDBer.getTopIoSetItemIter is using splitOnKey which expects the second part of the tuple to be a base16 serial number yet the key that is put in there is a dgKey, or a (pre, said) tuple.

class LMDBer(filing.Filer):
    ...
    def getTopIoSetItemIter(self, db, top=b'', *, sep=b'.'):
        ...
        for iokey, val in self.getTopItemIter(db=db, top=top):
            key, ion = splitOnKey(iokey, sep=sep)
            yield (key, val)

def splitOnKey(key, *, sep=b'.'):
    ...
    if isinstance(key, memoryview):
        key = bytes(key)
    top, on = splitKey(key, sep=sep)
    on = int(on, 16)
    return (top, on)

class Kevery:
    ...
        def processQueryNotFound(self):
        ...

        key = ekey = b''  # both start same. when not same means escrows found
        pre = b''
        sn = 0
        while True:  # break when done
            for (pre, said), edig in self.db.qnfs.getItemIter(keys=key):

Steps to reproduce

  1. Check out keripy repo and change to the keripy directory
  2. git switch v1.1.18
  3. python3 -m pip install -e ./
  4. kli witness demo
  5. `cd scripts/demo && source demo-scripts.sh
  6. Modify the ./scripts/demo/basic/delegate.sh to exit prior to the call on line 12 to kli delegate confirm --name delegator --alias delegator -Y & which will cause a query not found message to remain in that escrow, the hab.db.qnfs database.
  7. Run the script in `./scripts/demo/basic/delegate.sh
  8. git switch v1.2.0rc2
  9. python3 -m pip install -e ./
  10. Upgrade all of the witnesses:
  • One liner to migrate all six demo witnesses:
    • list=(wan wes wil wit wub wyz); printf '%s\n' "${list[@]}" | xargs -I {} kli migrate run --name {}
  1. kli witness demo
    then you will get a value error like:
ValueError: invalid literal for int() with base 16: b'EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt'

This will stop your witnesses from starting.

@kentbull kentbull added bug Something isn't working triage labels Sep 16, 2024
@kentbull
Copy link
Contributor Author

The modified delegate.sh script (exit before delegation confirmation) was key to reproducing this bug. Yet it may not be that important of a bug. It only occurs if you happen to have a message in the hab.db.qnfs database prior to upgrading from 1.1.18 to 1.2.0rc2. This occurs because in 1.1.18 the blank serial number (all zeroes) is not appended to the key used to look up an entry from the .qnfs database - the key looks something like this (note the periods) which fails the type conversion in splitOnKey:

EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt

Whereas in v1.2.0rc2 there is a set of zeros appended on the end that allows the string to pass the type conversion in splitOnKey:

EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.EBMbr7Z-pd4KJwzxuptSmCYqxrBnE2xKVO-MnjYkeUrt.00000000

While this does prevent a witness from starting and thereby could be a major problem for someone there is also the possibility of updating the migration script to add the number suffix on any entry in the .qnfs database.

@kentbull
Copy link
Contributor Author

From our spec call today we discussed that it makes the most sense to not include escrows during an upgrade.

(Phil) Specifically to clear out escrow databases at the beginning of a migration

(Kevin) It needs to be updated, the current escrow "clearing" is insufficent

            hby.db.gpwe.trim()
            hby.db.gdee.trim()
            hby.db.dpwe.trim()
            hby.db.gpse.trim()
            hby.db.epse.trim()
            hby.db.dune.trim()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

1 participant