
Entity creation event not sent to M&M after a failover #405

Closed
mathieucarbou opened this issue Dec 16, 2016 · 2 comments

@mathieucarbou
Member

@jd0-sag explained to me (and I coded it accordingly) that when a failover occurs, addNode() calls are made on the new active with the PlatformEntity flag isActive set to false when replaying the tree, meaning that the passive entities are added to the tree. Then addNode() is called again, this time with isActive set to true, when the new active creates the active entities.

This works fine with passthrough. For the same entity we receive two addNode() calls: the first with isActive=false (when replaying the tree), then a second with isActive=true when the new active entity is created.

With Galvan, this does not work.

We receive the replay, but we do not receive any further addNode() calls after the active entities are created.

So the server crashes: we cannot record clients fetching entities that are not in the topology tree, because we never received the addNode() call that adds them as active entities.
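To make the expected sequence and the crash concrete, here is a minimal sketch. It uses a hypothetical, simplified topology model and is not the real Terracotta M&M API. The class `TopologyTracker` and its methods `addNode(consumerId, isActive)` and `recordFetch(clientId, consumerId)` are illustrative names. The sketch assumes that a client fetch can only be recorded against an entity that addNode() has reported as active.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the monitoring topology tree.
// NOT the real Terracotta M&M API; entities are keyed by consumerID.
class TopologyTracker {

    // consumerID -> last isActive flag reported through addNode()
    private final Map<Long, Boolean> entities = new HashMap<>();

    // Called once with isActive=false during the failover replay, and
    // (as on passthrough) again with isActive=true once the new active
    // server has created the entity.
    void addNode(long consumerId, boolean isActive) {
        entities.put(consumerId, isActive);
    }

    boolean isActive(long consumerId) {
        // null (never seen) and false (replay only) both count as not active
        return Boolean.TRUE.equals(entities.get(consumerId));
    }

    // Assumption: a fetch can only be recorded against an active entity.
    // Without the second addNode(isActive=true), this is where it fails.
    void recordFetch(String clientId, long consumerId) {
        if (!isActive(consumerId)) {
            throw new IllegalStateException("client " + clientId
                    + " fetched entity " + consumerId
                    + ", which is not an active entity in the topology");
        }
        // ... record the fetch in the tree ...
    }

    public static void main(String[] args) {
        TopologyTracker tracker = new TopologyTracker();
        tracker.addNode(4L, false);          // replay of the passive entity
        tracker.addNode(4L, true);           // active entity created (passthrough)
        tracker.recordFetch("client-1", 4L); // succeeds
        System.out.println(tracker.isActive(4L));
    }
}
```

With only the replay call (the Galvan behavior described above), `recordFetch` throws, which mirrors the crash.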

.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [], platform)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], clients)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], entities)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.management.client.ManagementAgentEntityManagementAgent)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.management.client.ManagementAgentEntity', consumerID=2, name='ManagementAgent'})
.monitoring.MonitoringServiceProvider - [2] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/clients)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=4, name='pet-clinic/clients'})
.monitoring.MonitoringServiceProvider - [4] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.tms.client.TmsAgentEntityFailoverIT)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.tms.client.TmsAgentEntity', consumerID=1, name='FailoverIT'})
.monitoring.MonitoringServiceProvider - [1] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform, entities], org.terracotta.management.entity.sample.client.CacheEntitypet-clinic/pets)
.monitoring.TopologyService - [0] serverEntityFailover(testServer1, PlatformEntity{isActive=false, typeName='org.terracotta.management.entity.sample.client.CacheEntity', consumerID=3, name='pet-clinic/pets'})
.monitoring.MonitoringServiceProvider - [3] onEntityDestroyed()
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], state)
.monitoring.TopologyService - [0] serverStateChanged(testServer1, PASSIVE)
.monitoring.IStripeMonitoringPlatformListenerAdapter - [0] addNode(testServer1, [platform], fetched)
// we do not get any additional addNode() with isActive=true, as Jeff explained to me and as happens on passthrough

Here are the complete server logs with all M&M traces:

server0.stdout.log.txt

server1.stdout.log.txt

@mathieucarbou
Member Author

mathieucarbou commented Dec 16, 2016

Note: this is blocking our failover tests.

I developed this based on my discussions with @jd0-sag and on the understanding that there should be two addNode() calls: one for the replay of the passive entity on the new active (isActive=false) and another once the active entity is created (isActive=true).

So if there is a change of behavior, I need to know as soon as possible so that I can change our implementation. I also need to know how I can tell the difference when the active entity fails to be created.

With the behavior I am expecting, addNode() is called for the replay. Then, if the active entity fails to be created, I won't receive any addNode() for the active entity, which is expected.

But with the current behavior, if I treat the replay as also being the addition of the new active entity to the tree, the topology will show an active entity that has not been created yet, and whose creation could still fail.
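Under the expected two-call sequence, a listener can keep each entity in a "replayed" state until its isActive=true addNode() arrives. The sketch below is hypothetical and is not the Terracotta API: `FailoverReconciler`, its `State` enum and `pendingActivation()` are illustrative names. It shows how such a listener avoids reporting an entity as active before it exists.

An entity that never leaves the replayed state is either still being created on the new active or failed to be created. This listener alone cannot tell those two cases apart.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the Terracotta API: tracks each entity's state
// across a failover so that a replayed entity is never shown as active
// before the new active server has actually created it.
class FailoverReconciler {

    enum State { REPLAYED, ACTIVE }

    // Insertion-ordered so pending entities are listed in replay order.
    private final Map<Long, State> states = new LinkedHashMap<>();

    void addNode(long consumerId, boolean isActive) {
        states.put(consumerId, isActive ? State.ACTIVE : State.REPLAYED);
    }

    // Entities seen only through the replay (isActive=false): still being
    // created on the new active, or their creation failed.
    List<Long> pendingActivation() {
        return states.entrySet().stream()
                .filter(e -> e.getValue() == State.REPLAYED)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FailoverReconciler r = new FailoverReconciler();
        // Replay of the four entities (consumerIDs 2, 4, 1, 3) from the log
        for (long id : new long[] {2, 4, 1, 3}) {
            r.addNode(id, false);
        }
        r.addNode(2, true); // suppose only entity 2 is then created as active
        System.out.println(r.pendingActivation()); // [4, 1, 3]
    }
}
```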

Thanks!

CC @anthonydahanne

@myronkscott
Member

I believe what you describe is the desired sequence. I'll look into why the correct sequence is not being reported as soon as I can.
