Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't drop fields from documents if they are not mapped in the entity #19

Open
dryajov opened this issue Jul 17, 2017 · 7 comments
Open

Comments

@dryajov
Copy link

dryajov commented Jul 17, 2017

Currently, hydrate will drop fields that are not mapped by an entity. This behavior is dangerous, specially with existing datasets, where data might be accessed across several unrelated application and not all fields might be relevant for a particular application. Dropping these fields is unexpected and can lead to loss of data, as such I strongly encourage this behavior to be reworked to something more predictable, even if it has an impact of performance, which I believe will be minimal in any case.

This is a continuation of the discussion initiated here - #15

@meirgottlieb Could you please provide pointers, I'd like to do a quick assessment of the effort, if its not too involved I might be able to work on it.

@meirgottlieb
Copy link
Contributor

Are you interested in doing this just for unmapped top-level Entity fields or also for fields in an Embedded document where the Embedded document is mapped but it contains fields that are not mapped?

@dryajov
Copy link
Author

dryajov commented Jul 17, 2017

For both, embedded and non embedded documents/top level and nested.

@deanwyns
Copy link

Hi guys, any update on this? It's kind of worrying that we might lose data this way in case we forget to map a field in a microservice, even though that microservice doesn't need that field.

I might be able to work on this in my free time, so if you'd be able to point me in the right direction, that would help me out a lot, @meirgottlieb.

I noticed in the other issue that we'd best go with the $set/$unset approach? Do you still think that's the best idea?

@meirgottlieb
Copy link
Contributor

Hey @deanwyns. No progress on this. If you want to work on it, that would be great.

I think the best approach is to use $set/$unset for updates. That being said, I suggest only doing this for top level entity fields and maybe embedded documents. I would not bother trying to do this for arrays or for documents embedded in arrays. The reason is that there will be situations that you can't handle for array changes. For example, you can't remove an item with $pop and perform a positional update at the same time. If anything changes in an array, I suggest just replacing the entire array.

So even if you implement this change, there are still situations where you are at risk of losing data in fields that are not modeled.

Using $set/$unset will also make the dirty checking slower and more complicated because you can't just stop checking when you find the first difference. (But then you save some time skipping serializing the document for replacement.)

@deanwyns
Copy link

Hi @meirgottlieb

Thanks, I'll see what I can do. I've dug a bit through the code but can't really seem to grasp it all yet. I've had experience with JPA/Hibernate in the past so I think it's great a similar setup was created here.

My initial idea was to proxy the entities using Proxy and mark a field as dirty whenever it has changed. So we'd keep a hashtable of fields that've changed and then update the database accordingly using a $set instruction whenever a dirty check is done. This way I guess there's no need for the one by one "equal" comparisons. Correct me if I'm wrong.

I see some code is using Object.observe (Observer class) but is it actually used? I see a lot of references to it but only in tests is the Observer class ever instantiated.

Do you think the proxy approach is the way to go, or is there a reason the fields are actually compared to the original document (without using proxies)? If proxies aren't a problem, would the best place to keep the "dirty field hashtable" be the ObjectLinks?

If I'm talking gibberish let me know :) the architecture seems very well thought out, but it takes some time to fully understand.

@meirgottlieb
Copy link
Contributor

Hey @deanwyns,

Originally I supported Object.observe for change tracking, ChangeTrackingType.Observe. What it did was observe the entity until something changed. Once something changed it would queue it for dirty checking, and then stop observing it.

When Google announced that they were going to pull support for Object.observe, I removed the ChangeTrackingType so nobody could use it anymore but left in much of the code that supported it figuring I could use Proxy once it was available in Node and perhaps take advantage of some of the code that supported Object.observe. You can see the changes to session.ts that removed support for Object.observe in this commit.

I think using Proxy for change tracking would be great but I don't know of the nuances involved or the overhead that Proxy will introduce. I can imagine that there will be situations where Proxy will cause problems so I don't think it should be the default way of doing this. But I do think it would a nice option.

I think the best way to implement support for $set/$unset is to do it during the dirty checking. That way if other types of change tracking are used, it'll still works fine. This should also be simpler than implementing proxies.

@meirgottlieb
Copy link
Contributor

meirgottlieb commented Aug 17, 2018

Just a note on the commit above - There was a bunch of other stuff in that commit as well. The main change was in _trackChanges:

    private _trackChanges(links: ObjectLinks): void {

        if ((links.flags & ObjectFlags.Dirty) || links.persister.changeTracking == ChangeTrackingType.DeferredImplicit) {
            this._makeDirty(links);
            return;
        }

        if (links.persister.changeTracking != ChangeTrackingType.Observe) {
            return;
        }

        if(links.observer) {
            // object is already being watched
            return;
        }

        links.observer = new Observer(() => {
            // value has changed
            // TODO: should this be added to the task queue in-case modification occurs during another operation?
            this._makeDirty(links);
            links.observer = undefined;
        });
        links.persister.watch(links.object, links.observer);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants