Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle cost surface creation from shapefile for project [MRXN23-91] [MRXN23-95] #1502

Merged
merged 6 commits into from
Sep 26, 2023

Conversation

alexeh
Copy link
Collaborator

@alexeh alexeh commented Sep 18, 2023

Substitute this line for a meaningful title for your changes

This PR has become a monster, let's discuss if I should split it.

Basically, we aim to reuse the project creation async workflow for when a project is created using a country or uploading a grid, so that we avoid race conditions, and (hopefully) do not add yet more complexity to the flow.

Overview

When a project is created, with shapefile, no-grid shapefile, or selecting admin-regions, a default cost surface should be created, and should be tied to the project planning unit creation workflow

Corresponding Plaaning Area, Project Planning Units and Cost Surface PU Data should be persisted in GEO

ApiDB.Cost Surface's MIN and MAX values should be updated accordingly once the project creation event is marked as finished

Designs

Testing instructions

Please explain how to test the PR: ID of a dataset, steps to reach the feature, etc.

Feature relevant tickets

Link to the related task manager tickets


Checklist before submitting

  • Meaningful commits and code rebased on develop.
  • If this PR adds feature that should be tested for regressions when
    deploying to staging/production, please add brief testing instructions
    to the deploy checklist (docs/deployment-checklist.md)
  • Update CHANGELOG file

@alexeh alexeh added the WIP Work in progress label Sep 18, 2023
@vercel
Copy link

vercel bot commented Sep 18, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
marxan ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 26, 2023 2:51am

projectId: string,
costSurfaceId: string,
): Promise<Left<typeof jobSubmissionFailed> | Right<true>> {
throw new NotImplementedException();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this will be ever needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also applies to all the SetInitialProjectCostSurfaceStuff

max: 10,
});
const costSurface = await this.costSurfaceRepository.save(instance);
return this.calculateCost.forInitialCostSurface(projectId, costSurface.id);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, not sure given the path we have taken, this will ever be needed.

Comment on lines 61 to 69
await Promise.all(
puIds.map((puid) =>
costSurfaceRepository.update({ puid }, { costSurfaceId }),
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm most probably not understanding the whole process, I'm assuming this is part of creating a new cost surface from a shapefile

  • it looks like the cost_surface_pu_datas are created sometime before this because update is used instead of save (update doesn't create a new instance if it doesn't exist), but the cost_surface_id is NOT NULL on the DB, so to me it seems like this couldn't happen?
  • If this is from shapefile, I'm confused by the selection of pus to be used in the update query for the cost surface id (it selects all the pus from projects_pu for the given project, but not those relevant to the cost surface shapefile itself). This is probable a lack of context on my part.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely spotted. I will try to bullet point the ideas to be very explicit.

  1. We marked cost_surface_put_data.cost_surface_id as NOT NULL (probably a bad decision that we could rollback, because projects_pu.project_id can actually be NULL, and its left NULL until it gets updated for this case. However, for other cases project_id is set to planning_area_id until it gets updated. Yes, I know)

  2. This code is triggered after a shapefile upload with a grid (no grid is another story). So before this code is run we have:

  • planning_area
    -projects_pu
    -cost_surface_pu

Everything we do is to link the entities among both databases, hence the update/

  1. We select the all the PUs (project_pu.id) for the given project, and each of these PU has a cost, associated again to the given project. Since the cost is associated to a given project, and this project has at this point only one (the default) cost surface, all cost related to these PU, MUST be associated to this api.cost_surface

I hope this helps, we can of course sync later on if something is yet not clear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we marked cost_surface_pu_data.cost_surface_id as NOT NULL

hm, yes, this is tricky. In the case of the nullable projects_pu.project_id column this was done because of the flow for setting project grids, which in some cases implies that we don't even have a project by the time we get to "create the grid for the project". There would have been other solutions, but we tried to find a pragmatic balance.

I suggest to do the same here. The general principle should be "don't make something nullable unless this is really unavoidable, realistically". If we can guarantee, without building a whole cathedral of complexity, that by the time we calculate the initial cost for PUs for a project we do have some kind of id (even if it's not yet the id of the "parent" cost surface in apidb - it could be the project_id, or maybe we could "just" explicitly set the id of the default cost surface to the same id as the related project, because in practice there's a 1:1 relationship), then we can keep cost_surface_pu_data.cost_surface_id NOT NULL

Comment on lines 54 to 56
const costSurfacePuRepo = manager.getRepository(
CostSurfacePuDataEntity,
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used for anything?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I was thinking to add maybe a custom repo to encapsulate the query, but I don't want to hold this back any longer.

I removed the repo instance for now

Comment on lines 16 to 17
-- modifying this as I need the change for tests
UNIQUE (cost_surface_id, puid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment - this constraint is correct and should stay (we cannot have duplicate puids for a single cost surface)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this migration file was previously approved (and therefore executed everywhere). I have already spoken about that with @KevSanchez, so that either he will modify the constraint in his PR or we will push the change on a minimal PR and remove this change here.

I had to do temporarily do this so my tests reflect what is going to be reality, since all migrations are run in CI. But IRL my PR wouldn't work, we need a new migration reflecting the change.

Let's clarify this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, as we discusse we'll silently pretend that AddCostSurdaceApiDb1691980238543 never happened in local dev envs and on m23

btw since you're updating this migration, consider to quick-fix the name of the class and the file (Surdace -> Surface) - I only noticed this now (ok to keep it as is, ofc)

@@ -20,7 +20,7 @@ export class CostSurfacePuDataEntity {
(projectsPu: ProjectsPuEntity) => projectsPu.id,
)
@JoinColumn({ name: 'puid' })
projectsPu!: ProjectsPuEntity;
projectsPu?: ProjectsPuEntity;

@Column({ type: 'uuid', name: 'puid' })
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responding to the @todo comment from your much earlier PR: maybe we should rename the property to projectsPuId and the column to projects_pu_id? I would prefer to keep puid in general with a specific meaning, that is, the numeric (positive integer) id we attribute to each PU within a project (basically projects_pu.puid). I know we're not always consistent in this, but I recommend to avoid slipping in one more inconsistency here using puid to reference something that is not the numeric puid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with that, this was done as it is because it was specified in the relevant task for it, (in which I suggested that the naming might be confusing) but my understanding of the domain was poor enough to not to raise my voice at the time.

In order to modify this, we will need another migration, which should be added wherever see best, taking in account: #1502 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies for adding to the confusion 😬

Comment on lines 61 to 69
await Promise.all(
puIds.map((puid) =>
costSurfaceRepository.update({ puid }, { costSurfaceId }),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we marked cost_surface_pu_data.cost_surface_id as NOT NULL

hm, yes, this is tricky. In the case of the nullable projects_pu.project_id column this was done because of the flow for setting project grids, which in some cases implies that we don't even have a project by the time we get to "create the grid for the project". There would have been other solutions, but we tried to find a pragmatic balance.

I suggest to do the same here. The general principle should be "don't make something nullable unless this is really unavoidable, realistically". If we can guarantee, without building a whole cathedral of complexity, that by the time we calculate the initial cost for PUs for a project we do have some kind of id (even if it's not yet the id of the "parent" cost surface in apidb - it could be the project_id, or maybe we could "just" explicitly set the id of the default cost surface to the same id as the related project, because in practice there's a 1:1 relationship), then we can keep cost_surface_pu_data.cost_surface_id NOT NULL

state: ProjectCostSurfaceState,
): Promise<void> {
await this.create({
data: {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could keep the parent class' signature for this method to keep the option to pass the context argument on to the event itself as data. Otherwise, we'd always have an empty data, which is ok for the ProjectCostSurfaceState.ShapefileSubmitted event, but would silently swallow the error caught in the parent catch block. I always favour sending as much diagnostic context as possible to API events especially for Failed events - it's a shame to throw away free information that could help troubleshooting issues in a running system!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as: #1502 (comment)

export class ProjectCostSurfaceApiEvents
extends ApiEventsService
implements ProjectCostSurfaceEventsPort {
private readonly eventsMap: Record<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not particularly thrilled about this mapping - it's ok and no need to change it, but even from a semantic point of view, I am unconvinced by the need to have a concept of "state" (ProjectCostSurfaceState), which in my view is mostly a set of events in disguise of "state", and a further layer of indirection between what happens in the project cost surface handler and the API events. I'd just use the relevant API_EVENT_KINDS directly in the project cost surface handler, and pass them through in the event method in this class. But again, I can also see the rationale for your setup - I think it ends up being slightly wonky because of the suggested naming (the "state" part, for me), but this is quite subjective, I know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with this, I just tried to follow the shame approach that it's done for scenario__costSurface__

await this.projectsCrud.assignCreatorRole(
project.id,
info.authenticatedUser.id,
);

// TODO: How to handle left for Project? How is it handled by the async jobs triggered by actionAfterCreate?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question :/

we can handle left here by catching exceptions, but that's just for the sync part here.

for async jobs, if we want to care (maybe we should), we could add a boolean flag such as projects.ready defaulting to false at db layer, and whenever we get a GET request for a project,

  • if ready = false, check API events for async jobs and
    • if any has failed, clean up associated stuff and then delete the project (and return 404)
    • if any was submitted but didn't finish, return 404
    • if all async jobs have finished, set ready = true and return the project
  • if ready = true, return the project

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I have other pressuring items to work on, and there is still a work to do regarding including everything this PR does to the GC realm, I suggest to handle this in a separate PR

@@ -228,6 +234,9 @@ export class ProjectsCrudService extends AppBaseService<
await this.commandBus.execute(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably we can outright delete this actionAfterUpdate() method :/

@hotzevzl hotzevzl changed the title Feature/api/mrxn23-14 handle cost surface creation from shapefile for project handle cost surface creation from shapefile for project [MRXN23-91] [MRXN23-95] Sep 25, 2023
@alexeh alexeh force-pushed the feature/api/MRXN23-14-query-cost-surface-status branch from d40ac12 to fd481a5 Compare September 25, 2023 23:15
@alexeh alexeh force-pushed the feature/api/MRXN23-14-query-cost-surface-status branch from fd481a5 to 5c59b9f Compare September 25, 2023 23:16
@alexeh alexeh force-pushed the feature/api/MRXN23-14-query-cost-surface-status branch from 4c5c8e7 to cf7344d Compare September 26, 2023 02:49
@alexeh alexeh merged commit d9b02f7 into develop Sep 26, 2023
@alexeh alexeh deleted the feature/api/MRXN23-14-query-cost-surface-status branch September 26, 2023 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants