Entity creation event not sent to M&M after a failover #405

mathieucarbou · 2016-12-16T23:51:09Z

@jd0-sag explained to me (and I coded accordingly) that when a failover occurs, addNode() calls are first made on the new active with the PlatformEntity flag isActive set to false (when replaying the tree), meaning that the passive entities are added to the tree. addNode() is then called again, this time with isActive set to true, once the active entities are created by the new active.

This works fine with passthrough: for the same entity we receive two addNode() calls, the first with isActive=false (when replaying the tree) and the second with isActive=true (when the new active entity is created).
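The expected two-call sequence can be sketched as follows. This is only an illustration under stated assumptions: `FailoverTopologySketch`, `EntityInfo`, `isKnown`, `isActive`, and `recordFetch` are hypothetical names invented for this example and are not the real M&M or platform classes; only the idea of an `addNode()` call carrying an `isActive` flag comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the topology bookkeeping (not the real M&M code). */
public class FailoverTopologySketch {

    /** Reduced stand-in for the entity descriptor that appears in the traces. */
    public static final class EntityInfo {
        final String typeName;
        final String name;
        final boolean isActive;

        public EntityInfo(String typeName, String name, boolean isActive) {
            this.typeName = typeName;
            this.name = name;
            this.isActive = isActive;
        }

        public String key() {
            return typeName + ":" + name;
        }
    }

    // Entity key -> whether an addNode() with isActive=true has been seen for it.
    private final Map<String, Boolean> entities = new HashMap<>();

    public void addNode(EntityInfo entity) {
        if (entity.isActive) {
            // Second call: the new active has created the active entity.
            entities.put(entity.key(), Boolean.TRUE);
        } else {
            // First call (tree replay): remember the entity, but never
            // downgrade an entry that is already marked active.
            entities.putIfAbsent(entity.key(), Boolean.FALSE);
        }
    }

    public boolean isKnown(String key) {
        return entities.containsKey(key);
    }

    public boolean isActive(String key) {
        return entities.getOrDefault(key, Boolean.FALSE);
    }

    /** Assumed behavior: a client fetch can only be recorded against an active entity. */
    public void recordFetch(String key) {
        if (!isActive(key)) {
            throw new IllegalStateException("fetch on entity without an active node: " + key);
        }
    }
}
```

In this sketch, an entity that has only been replayed is known but not active, so a fetch recorded before the second addNode() call fails. That is the kind of failure to expect if the isActive=true call never arrives.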

With Galvan, this does not work.

We receive the replay, but we do not receive any events after the active entities are created.

So the server crashes: we cannot record clients fetching entities that are not in the topology tree, because we never received an addNode() call for them.

.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [], platform)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], clients)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], entities)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.management.client.ManagementAgentEntityManagementAgent)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.management.client.ManagementAgentEntity', consumerID=2, name='ManagementAgent'})
.monitoring.MonitoringServiceProvider - [2] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/clients)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=4, name='pet-clinic/clients'})
.monitoring.MonitoringServiceProvider - [4] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.tms.client.TmsAgentEntityFailoverIT)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.tms.client.TmsAgentEntity', consumerID=1, name='FailoverIT'})
.monitoring.MonitoringServiceProvider - [1] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/pets)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=3, name='pet-clinic/pets'})
.monitoring.MonitoringServiceProvider - [3] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], state)
.monitoring.TopologyService - [0] serverStateChanged(testServer1, PASSIVE)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], fetched)
// we do not get the additional addNode() calls with isActive=true, as Jeff explained to me and as happens on passthrough

Here are the complete server logs with all M&M traces:

server0.stdout.log.txt

server1.stdout.log.txt

mathieucarbou · 2016-12-16T23:53:55Z

Note: this is blocking our failover tests.

I developed this based on the discussions I had with @jd0-sag and on the understanding that there should be two addNode() calls: one for the replay of the passive entity in the new active (isActive=false) and another once the active entity is created (isActive=true).

So if the behavior has changed, I need to know ASAP so that I can change our implementation. I also need to know how to tell the difference when the active entity fails to be created.

With the behavior I am expecting, addNode() is called for the replay; then, if the active entity fails to be created, I won't get any addNode() for the active entity, which is expected.

But with the current behavior, if I treat the replay as also being the addition of the new active entity to the tree, the topology will show an active entity that has not yet been created and whose creation could still fail.

Thanks!

CC @anthonydahanne

myronkscott · 2016-12-17T18:49:03Z

I believe what you describe is the desired sequence. I'll look into why the correct sequence is not being reported as soon as possible.

mathieucarbou assigned myronkscott Dec 16, 2016

mathieucarbou added bug question labels Dec 16, 2016

mathieucarbou mentioned this issue Dec 16, 2016

Issue #191 : HA and Failover Galvan tests Terracotta-OSS/terracotta-platform#243

Merged

jd0-sag closed this as completed in eed1037 Dec 19, 2016
