Skip to content
This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

Use the new dag api and use ipld links for both heads and next pointers #213

Merged
merged 3 commits into from
Jan 14, 2019

Conversation

satazor
Copy link
Contributor

@satazor satazor commented Dec 28, 2018

Description (Required)

Use the new dag api and use ipld links for both heads and next pointers

The object api is deprecated and might be removed in future releases.
Additionally, we may now use https://explore.ipld.io/ to explore the log & entries!

With this change, we are now dealing with CIDs instead of multihashes. Therefore, this PR introduced fromCID and toCID. The fromMultihash and toMultihash became deprecated.
Moreover, all occurrences of hash were replaced to cid in both source code and tests.

Other changes (e.g. bug fixes, UI tweaks, refactors)

Updated IPFS to latest version as the current installed one has breaking changes to the dag api.
This way, we are already using the new dag api.

TODO

  • Add unit test
  • Add documentation

Closes #200, #106

@satazor
Copy link
Contributor Author

satazor commented Dec 28, 2018

Do not merge yet, I would like to make two small changes. Will do them as soon as I arrive home.

@aphelionz
Copy link
Contributor

Just saw this and haven't checked myself but is the performance issue mitigated? If so, what was it?

@shamb0t
Copy link
Contributor

shamb0t commented Dec 28, 2018

@aphelionz the perceived performance hit was due to monkey-patching here (using memstore to avoid hitting disk) and the new code wasn't using memstore, i.e. hitting disk, hence the slowdown

@satazor
Copy link
Contributor Author

satazor commented Dec 28, 2018

I've made the changes. On my end, this is good to go but please test it and inspect both a log and entry hashes in https://explore.ipld.io/ to see how it looks like!

Copy link
Contributor

@aphelionz aphelionz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment to address backwards compatibility, and a quick request to move a function to a utils file.

src/entry.js Outdated Show resolved Hide resolved
src/entry.js Outdated Show resolved Hide resolved
test/utils/mem-store.js Outdated Show resolved Hide resolved
API.md Outdated Show resolved Hide resolved
@aphelionz
Copy link
Contributor

Finally, run make rebuild and push all built files and es5 files when you make the last commit :)

@RichardLitt
Copy link
Contributor

Finally, run make rebuild and push all built files and es5 files when you make the last commit :)

Should we add this to the Contributing file?

@aphelionz
Copy link
Contributor

@RichardLitt Yes, definitely. Thanks for catching that.

src/utils/from-ipld-node.js Outdated Show resolved Hide resolved
@satazor satazor force-pushed the ipld branch 4 times, most recently from 2a9ae2f to a52c141 Compare January 3, 2019 17:49
@satazor
Copy link
Contributor Author

satazor commented Jan 5, 2019

So I took the journey of adding toCID and fromCID as well as replace all hash occurrences with cid. Everything is working as expected and added a few tests to cover the v0 compatibility.

In order to do this properly I had to dig into the code and I have detected the following:

  • A lot of JSDoc declarations were wrong or outdated; I fixed the wrong ones and updated the outdated ones.
  • log.heads return cids, log.tails return entries and log.tailCids (old log.tailHashes) return cids. The most correct would be log.heads and log.tails return cids and log.headEntries and log.tailEntries to return entries.
  • I noticed that the exclude argument used in several places were meant to be an Array<string,Entry> but in fact the code only handles Array<Entry>. Moreover, there's either a bug or a wrong test regarding this: Log.fromEntry passes the exclude as cids to LogIO.fetchAll which in turn calls LogIO.fetchAll. If you console.log the cache after parsing the excluded argument, we get { undefined: <cid> }. I think the code is wrong.. I tried to fix it by just passing the entries but some related tests started to failing.
  • The API.md and README.md are outdated. I haven't updated those because I think we should merge Add new args to examples and docs #177 first.
  • The examples are not working and seem outdated. Again, I haven't updated those because I think we should merge Add new args to examples and docs #177 first.
  • I've renamed Log.fromEntryHash to Log.fromEntryCid. Should I restore Log.fromEntryHash and deprecate it? I've removed it but I'm not sure if it's part of the public API or not. It's mentioned in the README.md but not in the API.md. My opinion is that Log.fromEntryCid makes more sense than Log.fromEntry in terms of public API, which leads to the next point.
  • Currently, the Log constructor and the Log.fromEntry functions are not backwards compatible in case you pass a v0 entry object. We need to transform the entries to their latest version in order to make it so (hence the migration pipeline, but we may hack it for now by just changing hash to cid). Though, I think that the entry objects are an implementation detail that should be hidden from consumers of the Log instance. Therefore, I strongly believe that we should not return entries from within most functions of the log but instead rely on their cids. Anyway, this would require a lot of changes so it's out of question, but let me know what you think about the backwards compatibility here.

Please advise regarding the points I mentioned above.


Please review the PR to see how it looks like. @shamb0t it would be great if you could test it locally, please :D.

@satazor satazor force-pushed the ipld branch 2 times, most recently from 6f9c52c to 0091067 Compare January 5, 2019 04:46
src/entry.js Show resolved Hide resolved
src/entry.js Show resolved Hide resolved
src/log-io.js Show resolved Hide resolved
src/log-io.js Show resolved Hide resolved
@shamb0t
Copy link
Contributor

shamb0t commented Jan 7, 2019

Thanks @satazor will take a look at this today!

@shamb0t
Copy link
Contributor

shamb0t commented Jan 7, 2019

Thanks for this @satazor! I honestly feel while this is the right direction, further changes are required for the hash/cid replacements and is perhaps out of scope for this PR.

  • log.heads return cids, log.tails return entries and log.tailCids (old log.tailHashes) return cids. The most correct would be log.heads and log.tails return cids and log.headEntries and log.tailEntries to return entries.

This sounds reasonable to me.

  • I noticed that the exclude argument used in several places were meant to be an Array<string,Entry> but in fact the code only handles Array<Entry>. Moreover, there's either a bug or a wrong test regarding this: Log.fromEntry passes the exclude as cids to LogIO.fetchAll which in turn calls LogIO.fetchAll. If you console.log the cache after parsing the excluded argument, we get { undefined: <cid> }. I think the code is wrong.. I tried to fix it by just passing the entries but some related tests started to failing.

This indeed looks like an issue with transforming the exclude to exclude<entry.hashes>, cache[e.cid] = e gives { 'undefined': somehash} bc e is a hash. Looks to me like the test is wrong but the test description is strange ("included excluded entries?") I suspect the test should expect to not include that entry, @haadcode can you confirm? Okay the code is indeed wrong, will make a PR to fix

Yes agreed we can leave the documentation until merging that PR 👍

  • I've renamed Log.fromEntryHash to Log.fromEntryCid. Should I restore Log.fromEntryHash and deprecate it? I've removed it but I'm not sure if it's part of the public API or not. It's mentioned in the README.md but not in the API.md. My opinion is that Log.fromEntryCid makes more sense than Log.fromEntry in terms of public API, which leads to the next point.
  • Currently, the Log constructor and the Log.fromEntry functions are not backwards compatible in case you pass a v0 entry object. We need to transform the entries to their latest version in order to make it so (hence the migration pipeline, but we may hack it for now by just changing hash to cid). Though, I think that the entry objects are an implementation detail that should be hidden from consumers of the Log instance. Therefore, I strongly believe that we should not return entries from within most functions of the log but instead rely on their cids. Anyway, this would require a lot of changes so it's out of question, but let me know what you think about the backwards compatibility here.

This does indeed break backwards compatibility and also requires changing occurrences of hash in the rest of the orbit-db codebase. Also Log.fromEntryHash/Cid is used in orbit-db-store and should be returned. I agree that the implementation details should be hidden. I think this should be out of the scope of this PR and imo would be simpler and preferable to either forego consistency bw to/fromMultihash or hack toMultihash to give the correct cid by checking the entry version and using dag-pb or dag-cbor according to that. I do think we should move in that direction but will require more thought and should be a separate PR imo. What do you think?

Edit: Or how about keeping to/fromCID as I do think its preferable but refraining from changing the occurrences of hash for now? I just feel those add many changes in many repos and will be error-prone

src/entry.js Show resolved Hide resolved
src/entry.js Outdated Show resolved Hide resolved
This change is backwards compatible, meaning the old logs can still be read.

Also updated IPFS to latest version as the current installed one has breaking changes to the dag api.
This way, we are already using the new dag api.
@satazor
Copy link
Contributor Author

satazor commented Jan 8, 2019

@shamb0t thanks for the response

Regarding:

log.heads return cids, log.tails return entries and log.tailCids (old log.tailHashes) return cids. The most correct would be log.heads and log.tails return cids and log.headEntries and log.tailEntries to return entries.

This sounds reasonable to me.

Should I create an issue for this? Since this is a breaking change I guess we should take care of this later.

This indeed looks like an issue with transforming the exclude to exclude<entry.hashes>, cache[e.cid] = e gives { 'undefined': somehash} bc e is a hash. Looks to me like the test is wrong but the test description is strange ("included excluded entries?") I suspect the test should expect to not include that entry, @haadcode can you confirm? Okay the code is indeed wrong, will make a PR to fix

I've rebased against master, thanks for looking and fixing this.

This does indeed break backwards compatibility and also requires changing occurrences of hash in the rest of the orbit-db codebase. Also Log.fromEntryHash/Cid is used in orbit-db-store and should be returned. I agree that the implementation details should be hidden. I think this should be out of the scope of this PR and imo would be simpler and preferable to either forego consistency bw to/fromMultihash or hack toMultihash to give the correct cid by checking the entry version and using dag-pb or dag-cbor according to that. I do think we should move in that direction but will require more thought and should be a separate PR imo. What do you think?

To have 100% backwards compatible we can:

  • Use cid in entries and add hash property back but make it non-enumerable by using Object.defineProperty. This way, orbit-db and other consumers of this module would still be working if they access entry.hash.
  • Restore fromEntryHash (but deprecated) because orbit-db is using it
  • Ensure that the "imported" entries get migrated to the latest version, e.g.: when using Log.fromEntry

All these 3 points are very easy to do. Let me know if I'm missing something and if you still prefer to rollback all the hash to cid changes in this PR. @haadcode could you also provide your feedback on this?

@shamb0t
Copy link
Contributor

shamb0t commented Jan 8, 2019

Should I create an issue for this? Since this is a breaking change I guess we should take care of this later.

Agreed we should take care of this later, opened an issue here 👍

I've rebased against master, thanks for looking and fixing this.

Thank you! 👍

To have 100% backwards compatible we can:

  • Use cid in entries and add hash property back but make it non-enumerable by using Object.defineProperty. This way, orbit-db and other consumers of this module would still be working if they access entry.hash.
  • Restore fromEntryHash (but deprecated) because orbit-db is using it
  • Ensure that the "imported" entries get migrated to the latest version, e.g.: when using Log.fromEntry

All these 3 points are very easy to do. Let me know if I'm missing something and if you still prefer to rollback all the hash to cid changes in this PR. @haadcode could you also provide your feedback on this?

These sound good, I would prefer moving forward than rolling back the changes as I think this is the correct direction and is useful. Am I correct in understanding the above would keep consumers of this module from breaking but will not read newly created entries until all hash occurrences are changed to cid. (assuming we ensure that 'imported' entries get migrated, then calling entry.cid in consumers will have the same affect for old entries and support new entries also) 👍

We can probably keep fromEntryCid as long as we still have the hash property in entries, this is also a simple change in orbit-db-store

test/entry.spec.js Outdated Show resolved Hide resolved
test/entry.spec.js Outdated Show resolved Hide resolved
test/entry.spec.js Outdated Show resolved Hide resolved
@haadcode
Copy link
Member

haadcode commented Jan 8, 2019

It all looks good to me, but having said that, I haven't worked on this at all and it's hard to confirm for sure on individual implementation details. From what I can tell, it looks correct.

I would want to request one final test though: we should have an old log (as fixtures prolly) and make sure it can be opened and operated on with the new version (this PR) in all possible ways: fromCID/toCID/fromMultihash/toMultihash/etc. We should make sure that the entries are indeed old version to make sure it can handle them correctly, ie. instead of identity the entries have key and instead of cid they have hash etc. I believe having such test would make everyone more comfortable to land this knowing that we don't break everything for users with old log. And we should have the same test in OrbitDB to test the compatibility (out of scope for this PR, but still). How does that sound @shamb0t and @satazor?

You've done great job @satazor and @shamb0t, thank you for doing all this! ❤️

@satazor
Copy link
Contributor Author

satazor commented Jan 8, 2019

I would want to request one final test though: we should have an old log (as fixtures prolly) and make sure it can be opened and operated on with the new version (this PR) in all possible ways: fromCID/toCID/fromMultihash/toMultihash/etc. We should make sure that the entries are indeed old version to make sure it can handle them correctly, ie. instead of identity the entries have key and instead of cid they have hash etc. I believe having such test would make everyone more comfortable to land this knowing that we don't break everything for users with old log. And we should have the same test in OrbitDB to test the compatibility (out of scope for this PR, but still). How does that sound @shamb0t and @satazor?

I have added a few fixtures, see https://github.com/orbitdb/ipfs-log/tree/040ff46b0f29eb78def3a6ad49e5d329b1162b64/test/fixtures. I will double check if all those scenarios are being tested with these fixtures and add more tests.

Moreover, I will be working on:

  • Add hash property back but make it non-enumerable by using Object.defineProperty. This way, orbit-db and other consumers of this module would still be working if they access entry.hash.
  • Restore fromEntryHash (but deprecated) because orbit-db is using it
  • Ensure that the "imported" entries get migrated to the latest version, e.g.: when using Log.fromEntry

..so that we get 100% backwards compatibility. Once done, it would be great if you guys could help me testing against some projects to ensure everything is working as expected.

@satazor
Copy link
Contributor Author

satazor commented Jan 10, 2019

I've made the changes mentioned above in this last commit. Also I changed fromMultihash to simply call fromCID so that it's simpler to reason about the code and tests.

I haven't added interop tests for Log.fromEntry and Log.fromJSON because I couldn't find any tests for them. Should I leave the PR like this or add a few? If so, where do I add them?

- Restore fromEntryHash
- Make fromMultihash simply call fromCID
@satazor
Copy link
Contributor Author

satazor commented Jan 10, 2019

@shamb0t @aphelionz @haadcode this is ready to be reviewed. Some things to point out:

  • Entry.toMultihash returns the same result than the current code on master, while the new Entry.toCID returns a CID based based on cbor and with IPLD links; the same applies to Log.toMultihash and Log.toCID
  • Entry.fromMultihash and the new Entry.fromCID are able to read old and new entries; the same applies to Log.fromMultihash and Log.fromCID
  • All the entry objects passed in as arguments to functions in the public API are made interoperable. The same applies to entries returned from the public API. This is thanks to the Entry.ensureInterop function. We may later remove this function in favor of the migration pipeline.
  • I haven't added interop tests for Log.fromEntry and Log.fromJSON because I couldn't find any tests for them. Should I leave the PR like this or add a few? If so, where do I add them?

@shamb0t
Copy link
Contributor

shamb0t commented Jan 14, 2019

Thank you @satazor! Everything lgtm 👍 I've tested and it works with old orbit-db addresses. I'm happy to merge this without additional interop tests, I'll add some to the orbit-db tests too.

We can discuss steps re: migration pipeline in #211 looks to me that a lot of the preliminary work is done here which is great!

@shamb0t shamb0t merged commit f45f896 into orbitdb-archive:master Jan 14, 2019
@satazor
Copy link
Contributor Author

satazor commented Jan 14, 2019

@shamb0t thanks for looking & merging this! Regarding the documentation & examples, you will be updating them as part of #177?

@shamb0t
Copy link
Contributor

shamb0t commented Jan 14, 2019

@satazor yes I've added those to the API documentation in #177

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IPLD Support
5 participants