Add tests for virtual population with mongoose adapter #354

Merged
merged 11 commits into from
Jul 12, 2023

Contributor

@Freezystem Freezystem commented Apr 30, 2023

fix #355

This PR allows virtuals to be pre-populated before the entity is converted to an object when using the mongoose adapter.

It compares the service model's virtual fields with the fields requested for population.
If a field appears in both lists, it is pre-populated with _id fields only, so that it can be expanded later in the populateDocs method.
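
The selection described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `pickVirtualsToPopulate` is a hypothetical helper, and the `{ path, select: "_id" }` descriptor shape is an assumption about what would be handed to mongoose's populate.

```javascript
// Hypothetical helper: returns populate descriptors for fields that are both
// virtuals on the model and requested by the caller.
function pickVirtualsToPopulate(virtualNames, requestedFields) {
    const requested = new Set(requestedFields || []);
    return virtualNames
        .filter(name => requested.has(name))
        // Only `_id` is selected here; full expansion is left to a later step.
        .map(name => ({ path: name, select: "_id" }));
}

const toPopulate = pickVirtualsToPopulate(["cars", "realPrice"], ["cars", "author"]);
console.log(toPopulate); // [ { path: 'cars', select: '_id' } ]
```

Fields that are requested but are not virtuals (`author` here) are left out, so they would still go through the regular population step.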

Note:

The pre-population could have been done in a separate function call before the entityToObject methods, but that would have required many more modifications affecting other adapters.

Either of these solutions would have been possible:

  • pre-populate in all actions (findById, findByIds, etc.), but that would have required passing the context to all of those functions and adding a pre-population condition to every one of them, resulting in a risky refactoring.
  • pre-populate in a dedicated method called before entityToObject, but it would only serve mongoose and be useless for other adapters, adding a pointless no-op function to the workflow.

@Freezystem
Contributor Author

@icebob can you take a look at this PR and let me know if everything is ok ? Thanks a lot 🙂

Member

@icebob icebob left a comment

My problem with this PR is that it's a breaking change, because adapter.entityToObject changes from sync to async. This method also exists in all adapters, so you would have to change it to async in all of them as well, which means a major version update for every adapter.

Can you change it so that it doesn't affect this method?

@Freezystem
Contributor Author

I fully understand your concerns.
Actually, I don't think the other adapters need to change.
As you can see in this line, the adapter.entityToObject call is wrapped in a Promise.all, so it can be either sync or async. Also, the second parameter, ctx, is optional and defaults to an empty object.
Moreover, I don't think this helper could have been written synchronously, as it has to use the mongoose populate helper under the hood.
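
To illustrate the Promise.all argument, here is a minimal sketch with two stand-in adapters. The adapters and the `transformAll` helper are hypothetical and only mimic the call pattern described above; they are not the moleculer-db code.

```javascript
// Two stand-in adapters: one converts synchronously, the other asynchronously.
const syncAdapter = { entityToObject: entity => ({ ...entity }) };
const asyncAdapter = { entityToObject: async entity => ({ ...entity, populated: true }) };

// Promise.all accepts plain values and promises alike, so this caller does not
// need to know which kind of adapter it is talking to.
function transformAll(adapter, docs) {
    return Promise.all(docs.map(doc => adapter.entityToObject(doc)));
}

const docs = [{ _id: 1 }, { _id: 2 }];
const results = Promise.all([
    transformAll(syncAdapter, docs),
    transformAll(asyncAdapter, docs)
]);

results.then(([fromSync, fromAsync]) => {
    console.log(fromSync);
    console.log(fromAsync);
});
```

This only covers callers that go through Promise.all. Code that calls entityToObject directly and uses the return value synchronously would receive a promise from the async variant, which is the concern raised in the review.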

Another solution is the second one I mentioned in the PR:

pre-populate in a dedicated method called before entityToObject, but it would only serve mongoose and be useless for other adapters, adding a pointless no-op function to the workflow.
This may be cleaner, but it would mean creating another no-op method for every DB adapter.

The implementation I made is tightly coupled to the mongoose workflow, and I can't think of anything less cumbersome.
I also tried my best to avoid changes to other adapters, so this implementation should work as is, without altering the other adapters or introducing a breaking change.

Feel free to let me know if you have better implementation ideas, and I'll try to modify my PR accordingly.

@Freezystem Freezystem requested a review from icebob May 24, 2023 22:03
@icebob
Member

icebob commented Jun 12, 2023

I don't know why, but the tests are freezing on CI, while locally they run fine.

@Freezystem
Contributor Author

I saw that, and I was also trying to figure out why. I'm gonna investigate as deeply as I can. My first guess would be an issue related to bluebird or promises in general.

@icebob
Member

icebob commented Jun 12, 2023

Thanks, plz share if you find something.

@Freezystem
Contributor Author

OK, I think I'm getting it 🤔
Node 10 and 12 are passing because the mongoose adapter integration tests are excluded.

If I'm correct, no MongoDB server is started in any workflow, because no Docker container is spun up and all mongoose methods are mocked.
I was using a Docker container locally, which is why the tests were passing.

I imagine I have to rewrite all the tests with mocked methods instead, unless you're okay with starting a MongoDB instance by adding a step to the GitHub Actions workflow, like so:

    - name: Start MongoDB
      uses: supercharge/[email protected]
      with:
        mongodb-version: 4.4

source

@Freezystem
Contributor Author

I imagine that I have to rewrite all the tests with mocked methods instead unless you're okay to start a mongodb instance by adding a step in the github actions like so: ...

@icebob any thoughts on this one ?

@icebob
Member

icebob commented Jul 11, 2023

Ohh, you are right! I thought we had integration tests as well (because the new database repo has them), but I've now checked and, indeed, every test is a unit test.

If you have time and can modify it, we should do the following:

  1. Change the test npm script to execute only unit tests.
  2. Add a new test:integration npm script that executes your integration tests. (same as here)
  3. Modify the CI workflow to add a new "Execute integration tests" step that runs the new npm script, starting a MongoDB server before the execution.

@icebob icebob merged commit 6ccc04b into moleculerjs:master Jul 12, 2023
@thib3113
Contributor

thib3113 commented Sep 22, 2023

@Freezystem, how did you implement it in a moleculer service?

I'm interested to know how you can use this while following the "one database per service" pattern [1] and the current moleculer implementation.

You are initializing your service with one model (and only one), right?
So the model can't know the other schemas (and even if it does, the object may be in another database), and so can't populate them? (mongoose can't do it alone; it needs to call the other service)


Actually, I think this PR doesn't work with the "one database per service" pattern.

Also, in your tests, you are tied to the "global mongoose connection" (which can't fit the pattern, because different services need to be able to use different connections) (you are creating models directly from mongoose like here - documentation)

So maybe I'm missing something (it's possible), or this logic needs to be rethought to follow the "one database per service" pattern.


Edit: after more research, I think the populate function should populate only the ID if the virtual is a reference, and the normal populate function will populate the rest by calling the actions. (so, setting entity['<nameOfVirtual>'] = entity['<virtual.localField>'])

Also, I'm not sure I understand why you want to select only _id. You already have the remote _id in the field, right? So you are asking mongodb to load the object and just return _id, from _id?

Footnotes

  1. The moleculer documentation says in its first line that the adapters follow the "one database per service" pattern: https://moleculer.services/docs/0.14/moleculer-db

@Freezystem
Contributor Author

@thib3113 You're right, this fix can't work with the "one connection, one service" pattern (unless you redeclare all the needed models in the created connection).
"One db, one service" won't work either, and it seems to be a somewhat over-complicated and very specific usage.

You'll generally want your related models to be in the same db so that you can populate between them; otherwise mongoose will not be able to make those links.

The fact is, in my case, I was using the same mongoose instance for all my services.
I'm not particularly a fan of it, as I would prefer a different connection for each service, but I was in a rush at the time.

I see two "unsolvable" cases:

  • 1 service with 1 connection (with 1 model), same db: populating virtuals can't work, as the current connection doesn't know about the other related models.
  • 1 service with 1 connection (even with all model definitions in it), different db: populating virtuals can't work, as the current db can't reach the related models' documents even if it knows about them.

So, in order to populate virtuals properly, we need two things:

  • all related models must be defined in the current connection (even if you don't use it to actually CRUD the other models)
  • all collections must be stored in the same db.

There are built-in mechanisms for using multiple DBs with mongoose but, as far as I understand, you can't populate from one DB to another with the same connection. You have to choose which DB to use for the query before making the request.

related docs:

Also, you can't use the internal moleculer-db adapter mechanism to populate the virtuals, because virtuals are an internal mongoose feature, so documents must still be mongoose documents for those virtuals to be retrieved.

If you envision other possibilities, I'd be glad to help with another PR on this issue.

@thib3113
Contributor

Hello @Freezystem.

About one database per service: this was @icebob's choice when creating moleculer-db... supporting it in the other adapters but not in the mongoose one seems strange.

As for the rest, my idea would be to map virtuals manually (thus losing the ability to use justOne / options): if a ref is found, we just map entity['<nameOfVirtual>'] = entity['<virtual.localField>'].
And then use the populate function of moleculer-db.
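
A minimal sketch of that manual mapping, under stated assumptions: the `virtuals` object is a plain stand-in for mongoose virtual definitions (its shape is assumed, not read from mongoose), and `mapVirtualRefs` is a hypothetical helper name.

```javascript
// Plain descriptors standing in for mongoose virtual definitions (assumed shape).
const virtuals = {
    fee: { ref: "Fee", localField: "feeId" },
    label: {} // computed virtual without a reference: left untouched
};

// Hypothetical helper: copy the local id into the virtual's name so that a
// later population step can resolve it through a service action.
function mapVirtualRefs(entity, virtualDefs) {
    const out = { ...entity };
    for (const [name, def] of Object.entries(virtualDefs)) {
        if (def.ref && def.localField) out[name] = entity[def.localField];
    }
    return out;
}

const mapped = mapVirtualRefs({ _id: "a1", feeId: "f9" }, virtuals);
console.log(mapped); // { _id: 'a1', feeId: 'f9', fee: 'f9' }
```

After this step `fee` holds only an id, so resolving it would be left entirely to the moleculer-db populate configuration rather than to mongoose.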

Functionality like justOne / options can be handled by the populate step (by choosing the correct action), which also makes it possible to use other moleculer features (like caching).

Actually, I've started a draft PR, #370, that tries to correct the use of the pattern. I plan to really support it, so it can't use mongoose virtuals, because models can be in different DBs.

I don't use virtuals a lot, so maybe I'm wrong on some points.

@Freezystem
Contributor Author

TL;DR: Virtuals are a mongoose-specific case.
This feature does not affect or concern other adapters.
If you ever want to use mongoose virtuals, your connections MUST share their model definitions, and documents related by ID MUST be stored in the same DB.
In all other cases, you can (and must) use the built-in moleculer-db population mechanism.

moleculer-db allows the user to define one connection per service, not necessarily one DB per service.
It could be one DB per service, but that's a very specific case. And if you're not using moleculer-db, you'll have to do extra work to relate documents that reference each other but are stored in different DBs.

In mongoose terms, this means that if you don't instantiate all your schemas as models in the newly created connection, mongoose will not know what you're talking about, even if you're using the same DB to store your collections.

So yes, thanks to moleculer-db you can relate two different databases to each other, and they don't even need to be of the same type as long as the refs match, which is nice.

But here we are talking about virtuals, which are mongoose-specific.
If you're using mongoose virtuals, you're referencing data that needs to be stored in the very same DB. So there will be no case of different DBs, even if you're storing all of your data in mongo.

Important note on virtuals: they can be more than basic references to a model; they can be computations based on references, or computations without any reference.

E.g., let's imagine that I have two models: Article and Fee.
Article contains a duty-free price and a fee referenced by its ID.
So we create a virtual called realPrice that is computed on the fly when I retrieve my article document, because the fee value can change. I don't want to store a precomputed realPrice in the article, to avoid a big cascading update when the fee changes.
First, the article and the fee must be stored in the same DB.
Second, even if my articles moleculer service and my fees moleculer service don't share the same connection, they need to know each other's mongoose model definitions for the virtual mechanism to work.
Finally, mongoose needs to populate the fee before it can compute and return the realPrice virtual.
In the moleculer-db case, virtuals will always be empty, because article documents are "POJOified" before the other services are called to get the referenced fee.
At that point the document is a POJO, and mongoose can no longer populate those virtuals.
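
The ordering problem can be shown with a small stand-in class. This is not mongoose code: `ArticleDoc` is a hypothetical class whose getter plays the role of the realPrice virtual, and the `fee.amount` field is an assumption made for the example.

```javascript
// Stand-in for a mongoose document with a computed virtual (not real mongoose).
class ArticleDoc {
    constructor(dutyFreePrice, fee) {
        this.dutyFreePrice = dutyFreePrice;
        this.fee = fee; // an id until populated, then a { amount } object
    }
    // The "virtual": only computable once `fee` holds the populated document.
    get realPrice() {
        return this.fee !== null && typeof this.fee === "object"
            ? this.dutyFreePrice + this.fee.amount
            : undefined;
    }
    // Snapshot into a plain object, evaluating the virtual at that moment.
    toObject() {
        return { dutyFreePrice: this.dutyFreePrice, fee: this.fee, realPrice: this.realPrice };
    }
}

// Case 1: converted first, populated afterwards (on the plain object).
const early = new ArticleDoc(100, "fee-id").toObject();
early.fee = { amount: 20 };

// Case 2: populated first, converted afterwards.
const doc = new ArticleDoc(100, "fee-id");
doc.fee = { amount: 20 };
const late = doc.toObject();

console.log(early.realPrice, late.realPrice); // undefined 120
```

In the first case the virtual was evaluated before the fee was available, and filling in `fee` on the plain object afterwards does not recompute it.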

So, it's a mongoose-specific case: as long as you are using virtuals, documents need to be stored in the same DB, and connections (if you're using more than one) must know each other's model definitions.

Be aware that, in the mongoose case, it's the connection that knows about and enforces model definitions, as opposed to SQL DBs, which enforce data structures directly in the DB itself as table definitions.
By design, NoSQL DBs do not hold any definition of the stored documents, so collections may contain any kind of data structure, and "population" between documents is generally resolved using aggregation pipelines.

@Freezystem Freezystem deleted the patch-1 branch September 25, 2023 12:11
@thib3113
Contributor

moleculer-db allows the user to define one connection per service, not necessarily one DB per service.

https://moleculer.services/docs/0.14/moleculer-db => the documentation talks about "one db per service". But yes, it should allow using the same DB for multiple services (this is planned; my goal is to allow the original pattern without blocking the other).

But here we are talking about virtuals, which are mongoose-specific.
If you're using mongoose virtuals, you're referencing data that needs to be stored in the very same DB. So there will be no case of different DBs, even if you're storing all of your data in mongo.

What I understand about virtuals is that they are "virtual properties", populated from some configuration. So we can just map some of them to moleculer-db population (simply replacing mongoose internal calls with moleculer internal calls), with no need to store the objects in the same db. Everything will pass through moleculer, and so you can use all the DBs you want.

About your example: in my opinion, you shouldn't use it like that, but rather the "moleculer" way. If you want to do some calculations, you should call an action that does them and returns the result.


So, OK, I understand that not all virtual features are handled. But in my opinion, this adapter needs to work more the "moleculer" way than the "mongoose" way.

@Freezystem
Contributor Author

The example was made up and, indeed, not a particularly good case.
I use virtuals for some particular cases, but generally I will always use the moleculer-db built-in population when I can, because you can't search on a virtual property.

A virtual is like a subquery in SQL, for when you want to link one or several models to another one. It could be a simple join or a join with computation.

Typically, you have a user who has cars. You don't want to reference all the cars in the user document, as the list would be hard to maintain. But you want to be able to retrieve the cars on demand.
A virtual is good for that case and, as is, I can even pair it with the moleculer-db post-population to ensure the user has the right to read the related documents.
I know that it's more of a SQL way of doing relations and that it's not usually done like that in NoSQL, but it's quite elegant and effective at avoiding repetition and errors due to concurrent writes/reads.
Indeed, if I remove a car from the user without virtuals, I need to remove the car document and then search for and remove this car from all the user documents that list it.
This search for deletion is expensive, and in case of errors, inconsistencies may remain.
Virtuals are thus perfect for that use case: once you delete the car, it will no longer be listed as any user's car.
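
A rough in-memory sketch of that user/cars case. The arrays stand in for two collections, and `carsOf` is a hypothetical function that mimics a virtual resolved from the car's `owner` field; none of this is mongoose code, and the field names are assumptions.

```javascript
// In-memory stand-ins for two collections; each car stores its owner's id.
const users = [{ _id: "u1", name: "Ann" }];
let cars = [
    { _id: "c1", owner: "u1", model: "Mini" },
    { _id: "c2", owner: "u1", model: "Golf" }
];

// Mimics a virtual with localField "_id" and foreignField "owner":
// the list is resolved on demand instead of being stored on the user.
function carsOf(user) {
    return cars.filter(car => car.owner === user._id).map(car => car._id);
}

const before = carsOf(users[0]);
// Deleting the car document is the only write needed.
cars = cars.filter(car => car._id !== "c1");
const after = carsOf(users[0]);

console.log(before, after); // [ 'c1', 'c2' ] [ 'c2' ]
```

Because the user document never stored the list, there is no second write that could fail or race with the deletion.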

@thib3113
Contributor

Yes, OK, I understand.

But, in the end, why not use moleculer-db with population?
For example, you have a user who has cars; you can create an action that "returns the cars of a user", and so loads the cars.

As long as the cars aren't modified, the cache will handle it. If you update the cars of this user, you can update the cache (in my moleculer setup, when I update a cache, I regenerate it at the end of the action), so after I update the cars, the next request will hit the cache directly, without calling mongo (and without knowing which DB handles the cars).

Or am I missing something?

@Freezystem
Contributor Author

Yes, you're right about the cache part.
In my case, I handle relations/population on the mongoose side because my models are shared between many different Node.js applications, and some of them are not running Moleculer.
I needed to centralize the way apps resolve relations/population, and mongoose is great for that.

@thib3113
Contributor

@Freezystem Hmm... yes, OK, I understand better...

On this, my opinion stays the same: "moleculer first", because this app is for moleculer.

But I understand the use case... So, since we will need to create a new major version (to use mongoose 7), what about an option like "replaceVirtualsRefById", defaulting to true, which you can set to false for your use case?

So, if true, a virtual containing a ref will be replaced manually by the id; if false, it will call mongoose populate without modification (like now)?

@Freezystem
Contributor Author

Using a dedicated option is fine as long as I can still use virtuals that way 😇
I was also working on implementing mongoose 7 in the adapter, but I don't have time to finish, so I'm glad you're working on it.
IMO the whole connection workflow should be rethought to use the async/await mechanism of the mongoose library, making the code more robust and readable. It would also avoid firing "false" errors on service restart when the connection is already open.
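
As a rough idea of what an async/await version of a connect step might look like, here is a sketch under stated assumptions. The `connect` function and the `adapter` field names are hypothetical, and `mongooseLike` is injected so the sketch runs without a database; with real mongoose, `connection.readyState`, `connection.asPromise()`, `connect()` and `createConnection()` are assumed to behave as in Mongoose 6 and later.

```javascript
// Hypothetical async connect step; `mongooseLike` stands in for the mongoose module.
async function connect(adapter, mongooseLike) {
    if (adapter.model) {
        const conn = mongooseLike.connection;
        // readyState 1 means "connected": reuse the connection as is.
        if (conn.readyState === 1) return conn;
        // readyState 2 means "connecting": wait for the pending attempt.
        if (conn.readyState === 2) return conn.asPromise();
        await mongooseLike.connect(adapter.uri, adapter.opts);
        return mongooseLike.connection;
    }
    // Schema given: open a dedicated connection and register the model on it.
    const conn = await mongooseLike.createConnection(adapter.uri, adapter.opts).asPromise();
    adapter.model = conn.model(adapter.modelName, adapter.schema);
    return conn;
}

// Minimal fake used only to exercise the control flow above.
const fake = {
    connection: { readyState: 1, asPromise() { return Promise.resolve(this); } },
    connect: async () => fake,
    createConnection: () => ({
        asPromise() { return Promise.resolve(this); },
        model: (name, schema) => ({ name, schema })
    })
};

const reused = connect({ model: {} }, fake);
const dedicated = connect({ modelName: "Car", schema: {}, uri: "mongodb://localhost/test" }, fake);
```

Each branch returns or awaits a single promise, so a caller can wrap one `await connect(...)` in a try/catch instead of chaining handlers per branch.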

@thib3113
Contributor

IMO the whole connection workflow should be rethought to use the async/await mechanism of the mongoose library

Can you elaborate on that? Are you talking about this part?

    if (this.model) {
        /* istanbul ignore next */
        if (mongoose.connection.readyState == 1) {
            this.db = mongoose.connection;
            return Promise.resolve();
        } else if (mongoose.connection.readyState == 2) {
            conn = mongoose.connection.asPromise();
        } else {
            conn = mongoose.connect(this.uri, this.opts);
        }
    } else if (this.schema) {
        conn = new Promise(resolve => {
            const c = mongoose.createConnection(this.uri, this.opts);
            this.model = c.model(this.modelName, this.schema);
            resolve(c);
        });
    }

If so, I will change it, because it doesn't allow the "one database per service" pattern (it uses the global connection and reuses it on the next service start... even if the other service wants to use another connection).

It would also avoid firing "false" errors on service restart when the connection is already open.

Hmm, are you talking about what I fixed in my last PR, #367?

Successfully merging this pull request may close these issues.

Virtuals with reference are not populated when using mongoose adapter
3 participants