When a failover occurs, @jd0-sag explained to me (and I coded accordingly) that addNode() calls are made on the new active with the PlatformEntity flag isActive set to false while the tree is replayed, meaning that passive entities are added to the tree. addNode() is then called again, this time with isActive set to true, when the active entities are created by the new active.
This works fine with passthrough: for the same entity we receive two addNode() calls, the first with isActive=false (when replaying the tree) and the second with isActive=true (when the new active entity is created).
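To make the expected sequence concrete, here is a minimal sketch of how a listener could handle the two notifications. It is not the actual implementation: the class `FailoverTopologySketch`, its methods, and the entity key format are hypothetical and are not part of the Terracotta APIs. It also assumes that a client fetch can only be recorded once an active instance has been reported.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Terracotta APIs.
class FailoverTopologySketch {
    // Maps an entity key to whether an active instance has been reported for it.
    private final Map<String, Boolean> entities = new HashMap<>();

    // Mirrors the two addNode() notifications described in the issue:
    // the replay arrives with isActive=false, the creation with isActive=true.
    void addNode(String entityKey, boolean isActive) {
        // Once an entity has been reported active, a later replay does not downgrade it.
        entities.merge(entityKey, isActive, (previous, current) -> previous || current);
    }

    boolean isKnown(String entityKey) {
        return entities.containsKey(entityKey);
    }

    boolean isActive(String entityKey) {
        return entities.getOrDefault(entityKey, false);
    }

    // Assumption: a fetch is only valid against an entity reported as active.
    void recordFetch(String clientId, String entityKey) {
        if (!isActive(entityKey)) {
            throw new IllegalStateException(
                "Client " + clientId + " fetched " + entityKey + " before its active addNode()");
        }
        // A real listener would record the fetch in the topology here.
    }
}
```

With passthrough, both calls arrive and `recordFetch` succeeds. If only the replay call arrives, `recordFetch` throws in this sketch, which is the analogue of the failure described in this issue.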
With Galvan, this does not work.
We receive the replay, but we do not receive any events after the active entities are created.
So the server crashes: we cannot record clients fetching entities that are not in the topology tree, and they are missing because we never received an addNode() call for them.
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [], platform)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], clients)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], entities)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.management.client.ManagementAgentEntityManagementAgent)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.management.client.ManagementAgentEntity', consumerID=2, name='ManagementAgent'})
.monitoring.MonitoringServiceProvider - [2] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/clients)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=4, name='pet-clinic/clients'})
.monitoring.MonitoringServiceProvider - [4] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.tms.client.TmsAgentEntityFailoverIT)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.tms.client.TmsAgentEntity', consumerID=1, name='FailoverIT'})
.monitoring.MonitoringServiceProvider - [1] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/pets)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=3, name='pet-clinic/pets'})
.monitoring.MonitoringServiceProvider - [3] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], state)
.monitoring.TopologyService - [0] serverStateChanged(testServer1, PASSIVE)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], fetched)
// There is no additional addNode() call with isActive=true here, contrary to what Jeff explained to me and to what happens with passthrough
I developed this based on the discussions I had with @jd0-sag and on the understanding that there should be two addNode() calls: one for the replay of the passive entity on the new active (isActive=false) and another once the active entity is created (isActive=true).
So if the behavior has changed, I need to know as soon as possible so that we can change our implementation, and I also need to know how to tell the difference when the active entity fails to be created.
With the behavior I am expecting, addNode() is called for the replay; then, if the active entity fails to be created, I won't receive any addNode() for the active entity, which is expected.
But with the current behavior, if I treat the replay as also being the addition of the new active entity to the tree, the topology will show an active entity that has not yet been created and whose creation could still fail.
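The distinction I am relying on can be sketched as follows. This is only an illustration under the two-call behavior described above: `EntityLifecycleSketch`, the `State` enum, and the method names are hypothetical, not taken from the platform or from our code. An entity stays in the replayed state until an addNode() with isActive=true arrives, so the topology never exposes an active entity that was not actually created.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative only.
class EntityLifecycleSketch {
    enum State { REPLAYED, ACTIVE }

    // Insertion-ordered so that the reported lists are deterministic.
    private final Map<String, State> states = new LinkedHashMap<>();

    void onAddNode(String entityKey, boolean isActive) {
        if (isActive) {
            // Second notification: the active entity was created successfully.
            states.put(entityKey, State.ACTIVE);
        } else {
            // Replay notification: remember the entity without exposing it as active.
            states.putIfAbsent(entityKey, State.REPLAYED);
        }
    }

    // Entities that can safely be shown as active in the topology.
    List<String> activeEntities() {
        return keysIn(State.ACTIVE);
    }

    // Entities that were replayed but whose active creation has not been reported (yet, or ever).
    List<String> pendingEntities() {
        return keysIn(State.REPLAYED);
    }

    private List<String> keysIn(State wanted) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, State> entry : states.entrySet()) {
            if (entry.getValue() == wanted) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

If the replay is the only notification sent, every entity stays in `pendingEntities()` in this sketch, and there is no way to tell an entity that is about to become active from one whose creation failed.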
Here are the complete server logs with all M&M traces
server0.stdout.log.txt
server1.stdout.log.txt