-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] ocean core next resync #835
[Core] ocean core next resync #835
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI doesn't and left some more comments
port_ocean/ocean.py
Outdated
interval_str = ( | ||
integration.get("spec", {}) | ||
.get("appSpec", {}) | ||
.get("scheduledResyncInterval") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have a code that turns the integration to an instance of integration, that way we don't need to actually use dictionary to get that key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your referring to ocean.config.integration
but it only contains data from the env variables config and integration identifier
port_ocean/ocean.py
Outdated
next_resync_date = now + datetime.timedelta(minutes=float(interval or 0)) | ||
next_resync = next_resync_date.now(datetime.timezone.utc).timestamp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets leave it in seconds and make the UI show it in minutes if we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYM?
I changed it to timestamp
to have better control over how we show it, and changed it to UTC so we won't have any timezone issues
c52cdde
to
22146da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets talk about the implementation
def should_update_resync_state(self) -> bool: | ||
return ocean.config.runtime == RuntimeType.Saas.value | ||
|
||
async def before_resync(self) -> None: | ||
if not self.should_update_resync_state(): | ||
return None | ||
|
||
now = datetime.datetime.now() | ||
try: | ||
integration = await ocean.port_client.get_current_integration() | ||
interval_str = ( | ||
integration.get("spec", {}) | ||
.get("appSpec", {}) | ||
.get("scheduledResyncInterval") | ||
) | ||
interval = convert_time_to_minutes(interval_str) | ||
self.resync_state["next_resync"] = calculate_next_resync(now, interval) | ||
except Exception: | ||
logger.exception("Error occurred while calculating next resync") | ||
return None | ||
|
||
async def after_resync(self) -> None: | ||
if not self.should_update_resync_state(): | ||
return None | ||
|
||
await ocean.port_client.update_resync_state(self.resync_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want this implemented a bit different
With a lifecycle context manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a great idea!
but I think it's out of this PR's scope and it will inflate this PR more than it should, WDYT?
port_ocean/utils/misc.py
Outdated
def convert_time_to_minutes(time_str: str) -> int: | ||
""" | ||
Convert a string representing time to minutes. | ||
:param time_str: a string representing time in the format "1h" or "1m" | ||
""" | ||
if time_str.endswith("h"): | ||
hours = int(time_str[:-1]) | ||
return hours * 60 | ||
elif time_str.endswith("m"): | ||
minutes = int(time_str[:-1]) | ||
return minutes | ||
else: | ||
raise ValueError("Invalid format. Expected a string ending with 'h' or 'm'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Just send the time to port and let it handle it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's inconsistent with the way we send out time in non-saas integrations
port_ocean/ocean.py
Outdated
)( | ||
lambda: threading.Thread( | ||
target=asyncio.run(execute_resync_all()) | ||
).start() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to implement with asyncio task instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But tasks needs to be awaited which is not compatible with the target type Callable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI its incorrect task is a background task and we are using it in multiple places in ocean
for now its not important enough
port_ocean/core/models.py
Outdated
Saas = "Saas" | ||
OnPrem = "OnPrem" | ||
|
||
|
||
Runtime = Literal["OnPrem", "Saas"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point
9092c8b
to
1434dcf
Compare
port_ocean/clients/port/client.py
Outdated
async def update_integration_state(self, state: dict[str, Any]) -> dict[str, Any]: | ||
logger.debug(f"Updating integration state with: {state}") | ||
response = await self.client.patch( | ||
f"{self.api_url}/integration/{self.integration_identifier}", | ||
headers=await self.auth.headers(), | ||
json={"state": state}, | ||
) | ||
handle_status_code(response) | ||
return response.json().get("integration", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add it an endpoint of its own, that way modifying to implementation in the future will be easier.
As well as passing the patch to state = null
won't override the last state ( As it is a computed thing by the integration )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a blocker?
is it mandatory for this PR, because if we do have time to invest in this infra then I'd rather use a whole other infra set to save this data in
running_task: Task[Any] = get_event_loop().create_task( | ||
self.events["on_resync"]({}) # type: ignore | ||
) | ||
signal_handler.register(running_task.cancel) | ||
|
||
await running_task | ||
await self._after_resync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when it fails? will it stay running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it will not stay up and running, I think whatever error that will come up to this it will not exit the app.
But we need to take into account that if this request fails in polling event-listener we might run an un called for resync since the latest updatedAt
will not inline with the current one
83e5342
to
373e9e5
Compare
997cae6
to
6958ad1
Compare
The calculation is running in a non blocking way before the start of the resync to avoid the calculating the resync time
port_ocean/ocean.py
Outdated
# TODO: remove this once we separate the state from the integration | ||
self.last_integration_updated_at: str = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explain why this is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment, but this purely technical I do think the code is self explanatory once you look it up
# we use the last updated time of the integration config as the start time since in saas application the interval is configured by the user from the portal | ||
if not last_updated_saas_integration_config_str: | ||
logger.error("updatedAt not found for integration") | ||
return (None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the effect and whether there are any action items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example, more detailed log for whether this was expected or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
port_ocean/clients/port/client.py
Outdated
|
||
async def update_integration_state(self, state: dict[str, Any]) -> dict[str, Any]: | ||
logger.debug(f"Updating integration state with: {state}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure that should_log
and should_raise
are parameters that can be passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
We don't need this to be changed, these parameters stay constant across event-listener
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is code in the client class, therefor it should be an interface for other uses as well. and therefor each call to the client should decide whether it wants to raise the error or not
port_ocean/clients/port/client.py
Outdated
) | ||
handle_status_code(response, should_raise=False, should_log=True) | ||
if not response.is_error: | ||
logger.info("Integration state updated successfully") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and use the should_log
here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you add the should_raise
and should_log
then decide if to log based on the parameter passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if response.is_success and should_log: | ||
logger.info("Integration state updated successfully") | ||
|
||
return response.json().get("integration", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return response.json().get("integration", {}) | |
return response.json()["integration"] |
async def _before_resync(self) -> None: | ||
""" | ||
Can be used for event listeners that need to perform some action before resync. | ||
""" | ||
await ocean.app.update_state_before_scheduled_sync() | ||
|
||
async def _after_resync(self) -> None: | ||
""" | ||
Can be used for event listeners that need to perform some action after resync. | ||
""" | ||
await ocean.app.update_state_after_scheduled_sync() | ||
|
||
async def _on_resync_failure(self, e: Exception) -> None: | ||
""" | ||
Can be used for event listeners that need to handle resync failures. | ||
""" | ||
await ocean.app.update_state_after_scheduled_sync(IntegrationStateStatus.Failed) | ||
|
||
async def _resync( | ||
self, | ||
resync_args: dict[Any, Any], | ||
) -> None: | ||
""" | ||
Triggers the "on_resync" event. | ||
""" | ||
await self._before_resync() | ||
try: | ||
await self.events["on_resync"](resync_args) | ||
await self._after_resync() | ||
except Exception as e: | ||
await self._on_resync_failure(e) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead use a context manager syntax?
@asynccontextmanager
async resync_life_cycle():
await ocean.app.update_state_before_scheduled_sync()
try:
yield
except Exception as e:
await ocean.app.update_state_after_scheduled_sync(IntegrationStateStatus.Failed)
raise e
else:
await ocean.app.update_state_after_scheduled_sync()
async def _before_resync(self) -> None: | |
""" | |
Can be used for event listeners that need to perform some action before resync. | |
""" | |
await ocean.app.update_state_before_scheduled_sync() | |
async def _after_resync(self) -> None: | |
""" | |
Can be used for event listeners that need to perform some action after resync. | |
""" | |
await ocean.app.update_state_after_scheduled_sync() | |
async def _on_resync_failure(self, e: Exception) -> None: | |
""" | |
Can be used for event listeners that need to handle resync failures. | |
""" | |
await ocean.app.update_state_after_scheduled_sync(IntegrationStateStatus.Failed) | |
async def _resync( | |
self, | |
resync_args: dict[Any, Any], | |
) -> None: | |
""" | |
Triggers the "on_resync" event. | |
""" | |
await self._before_resync() | |
try: | |
await self.events["on_resync"](resync_args) | |
await self._after_resync() | |
except Exception as e: | |
await self._on_resync_failure(e) | |
raise e | |
async def _resync( | |
self, | |
resync_args: dict[Any, Any], | |
) -> None: | |
""" | |
Triggers the "on_resync" event. | |
""" | |
async with resync_life_cycle(): | |
await self.events["on_resync"](resync_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of code here is irrelevant to ocean core and can be calculated on the port side
port_ocean/ocean.py
Outdated
)( | ||
lambda: threading.Thread( | ||
target=asyncio.run(execute_resync_all()) | ||
).start() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI its incorrect task is a background task and we are using it in multiple places in ocean
for now its not important enough
aware_date = datetime.datetime.fromisoformat(time_str) | ||
if time_str.endswith("Z"): | ||
aware_date = datetime.datetime.fromisoformat(time_str.replace("Z", "+00:00")) | ||
return aware_date.astimezone(datetime.timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aware_date = datetime.datetime.fromisoformat(time_str) | |
if time_str.endswith("Z"): | |
aware_date = datetime.datetime.fromisoformat(time_str.replace("Z", "+00:00")) | |
return aware_date.astimezone(datetime.timezone.utc) | |
return datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ").astimezone(datetime.timezone.utc) |
Running = "running" | ||
Failed = "failed" | ||
Completed = "completed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason those are lower case?
port_ocean/ocean.py
Outdated
self.initiated_at = datetime.datetime.now(tz=datetime.timezone.utc) | ||
|
||
async def _setup_scheduled_resync( | ||
# This is used to differ between integration changes that require a full resync and state changes | ||
# So that the polling event-listener can decide whether to perform a full resync or not | ||
# TODO: remove this once we separate the state from the integration | ||
self.last_integration_state_updated_at: str = "" | ||
|
||
def is_saas(self) -> bool: | ||
return self.config.runtime == Runtime.Saas | ||
|
||
def _calculate_next_scheduled_resync( | ||
self, | ||
interval: int | None = None, | ||
custom_start_time: datetime.datetime | None = None, | ||
) -> str | None: | ||
if interval is None: | ||
return None | ||
return get_next_occurrence( | ||
interval * 60, custom_start_time or self.initiated_at | ||
).isoformat() | ||
|
||
async def update_state_before_scheduled_sync( | ||
self, | ||
interval: int | None = None, | ||
custom_start_time: datetime.datetime | None = None, | ||
) -> None: | ||
def execute_resync_all() -> None: | ||
loop = asyncio.new_event_loop() | ||
asyncio.set_event_loop(loop) | ||
_interval = interval or self.config.scheduled_resync_interval | ||
nest_resync = self._calculate_next_scheduled_resync( | ||
_interval, custom_start_time | ||
) | ||
state: dict[str, Any] = { | ||
"status": IntegrationStateStatus.Running.value, | ||
"lastResyncEnd": None, | ||
"lastResyncStart": datetime.datetime.now( | ||
tz=datetime.timezone.utc | ||
).isoformat(), | ||
"nextResync": nest_resync, | ||
"intervalInMinuets": _interval, | ||
} | ||
|
||
integration = await self.port_client.update_integration_state( | ||
state, should_raise=False | ||
) | ||
if integration: | ||
self.last_integration_state_updated_at = integration["state"]["updatedAt"] | ||
|
||
async def update_state_after_scheduled_sync( | ||
self, | ||
status: Literal[ | ||
IntegrationStateStatus.Completed, IntegrationStateStatus.Failed | ||
] = IntegrationStateStatus.Completed, | ||
interval: int | None = None, | ||
custom_start_time: datetime.datetime | None = None, | ||
) -> None: | ||
_interval = interval or self.config.scheduled_resync_interval | ||
nest_resync = self._calculate_next_scheduled_resync( | ||
_interval, custom_start_time | ||
) | ||
state: dict[str, Any] = { | ||
"status": status.value, | ||
"lastResyncEnd": datetime.datetime.now( | ||
tz=datetime.timezone.utc | ||
).isoformat(), | ||
"nextResync": nest_resync, | ||
"intervalInMinuets": _interval, | ||
} | ||
|
||
integration = await self.port_client.update_integration_state( | ||
state, should_raise=False | ||
) | ||
if integration: | ||
self.last_integration_state_updated_at = integration["state"]["updatedAt"] | ||
|
||
async def _setup_scheduled_resync( | ||
self, | ||
) -> None: | ||
async def execute_resync_all() -> None: | ||
await self.update_state_before_scheduled_sync() | ||
logger.info("Starting a new scheduled resync") | ||
loop.run_until_complete(self.integration.sync_raw_all()) | ||
loop.close() | ||
try: | ||
await self.integration.sync_raw_all() | ||
await self.update_state_after_scheduled_sync() | ||
except Exception as e: | ||
await self.update_state_after_scheduled_sync( | ||
IntegrationStateStatus.Failed | ||
) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic does not belong in this class
it is irrelevant for the core
this feature is only a side effect to the resync no the main core feature
All remaining changes can be added after this feature is deployed
port_ocean/clients/port/client.py
Outdated
async def update_integration_state( | ||
self, state: dict[str, Any], should_raise: bool = True, should_log: bool = True | ||
) -> dict[str, Any]: | ||
if should_log: | ||
logger.debug(f"Updating integration state with: {state}") | ||
response = await self.client.patch( | ||
f"{self.api_url}/integration/{self.integration_identifier}/state", | ||
headers=await self.auth.headers(), | ||
json=state, | ||
) | ||
handle_status_code(response, should_raise, should_log) | ||
if response.is_success and should_log: | ||
logger.info("Integration state updated successfully") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def update_integration_state( | |
self, state: dict[str, Any], should_raise: bool = True, should_log: bool = True | |
) -> dict[str, Any]: | |
if should_log: | |
logger.debug(f"Updating integration state with: {state}") | |
response = await self.client.patch( | |
f"{self.api_url}/integration/{self.integration_identifier}/state", | |
headers=await self.auth.headers(), | |
json=state, | |
) | |
handle_status_code(response, should_raise, should_log) | |
if response.is_success and should_log: | |
logger.info("Integration state updated successfully") | |
async def update_integration_resync_state( | |
self, state: dict[str, Any], should_raise: bool = True, should_log: bool = True | |
) -> dict[str, Any]: | |
if should_log: | |
logger.debug(f"Updating integration state with: {state}") | |
response = await self.client.patch( | |
f"{self.api_url}/integration/{self.integration_identifier}/resync-state", | |
headers=await self.auth.headers(), | |
json=state, | |
) | |
handle_status_code(response, should_raise, should_log) | |
if response.is_success and should_log: | |
logger.info("Integration resync state updated successfully") |
Description
What:
Why:
How:
Type of change
Please leave one option from the following and delete the rest: