What should I do if my node does not compute an attribute after an upgrade?

Description

In rare cases after a distant migration, some components in charge of computing attributes may fail to initialize.

You can suspect this situation if you experience one of the following:

  • In the Monitoring screen
    • A computing has had no computed intervals for a long time.
    • The number of live tasks keeps increasing without any progress in computed intervals.
  • The log contains stack traces similar to:
Stacktrace
2017-08-03 07:51:15,732 [Domain-calcium-119-asynchronousEventInterceptorHandlers.1558592883293997057] ERROR calcium.asynchronousEventInterceptorHandlers.1558592883293997057 - Interceptor initialization aborted at registrationTime=SnapshotTime{transactionTime=1559509168204957697, validTime=2017-02-16T16:50:39.917Z}
java.lang.NullPointerException: Name is null
at java.lang.Enum.valueOf(Enum.java:236)
at com.systar.carbon.semantic.model.SourceType.valueOf(SourceType.java:6
at com.systar.carbon.semantic.model.IndicatorData.<init>(IndicatorData.java:38)
at sun.reflect.GeneratedConstructorAccessor145.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.systar.cobalt.mapper.impl.snapshot.mapper.QueryBeanMapper.listBeansFromParams(QueryBeanMapper.java:248)
at com.systar.cobalt.mapper.impl.snapshot.mapper.QueryBeanMapper.listBeans(QueryBeanMapper.java:98)
at com.systar.cobalt.mapper.impl.snapshot.mapper.cache.CachedBeanMapper.listBeans(CachedBeanMapper.java:78)
at com.systar.cobalt.mapper.impl.snapshot.CachedBeanRepository.<init>(CachedBeanRepository.java:31)
at com.systar.cobalt.mapper.impl.snapshot.ModelSnapshotImpl.createBeanRepository(ModelSnapshotImpl.java:34)
at com.systar.cobalt.mapper.impl.snapshot.ModelSnapshotImpl.access$000(ModelSnapshotImpl.java:14)
at com.systar.cobalt.mapper.impl.snapshot.ModelSnapshotImpl$1.load(ModelSnapshotImpl.java:28)
at com.systar.cobalt.mapper.impl.snapshot.ModelSnapshotImpl$1.load(ModelSnapshotImpl.java:25)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4881)
at com.google.common.cache.ForwardingLoadingCache.getUnchecked(ForwardingLoadingCache.java:51)
at com.systar.tau.util.cache.CacheUtil$LoadingCacheImpl.getUnchecked(CacheUtil.java:26)
at com.systar.tau.util.collection.FetchMaps$LoadingCacheFetchMap.fetch(FetchMaps.java:203)
at com.systar.cobalt.mapper.impl.snapshot.ModelSnapshotImpl.getRepository(ModelSnapshotImpl.java:45)
at com.systar.krypton.scheduler.impl.trigger.ComputingTriggerInterceptorFactory.completeConf(ComputingTriggerInterceptorFactory.java:278)
at com.systar.krypton.scheduler.impl.trigger.ComputingTriggerInterceptorFactory.create(ComputingTriggerInterceptorFactory.java:209)
at com.systar.calcium.impl.absorption.workflow.asynchronous.AsynchronousEventInterceptorHandler$DeploymentProcessor.createContext(AsynchronousEventInterceptorHandler.java:478)
at com.systar.calcium.impl.absorption.workflow.asynchronous.AsynchronousEventInterceptorHandler$DeploymentProcessor.run(AsynchronousEventInterceptorHandler.java:430)
at java.lang.Thread.run(Thread.java:748)

Glossary

The following defines some useful terms:

  • interceptor – Simplified name of Asynchronous event interceptor. This component is in charge of listening to events and triggering actions.
    Interceptors can have different responsibilities: computing, pre-computing, availabilities, etc.
  • message – Sent to an interceptor. A message contains the events registered by the interceptor (transaction message) or drives the interceptor lifecycle (destroy and update message).
  • pending messages – The messages stacked for the interceptor and which are not processed yet.
  • registration time – The transaction time (TT) or knowledge time when the interceptor was last updated. For more information about knowledge time, see the time concepts section of the Glossary.

Diagnostic and repair commands

The following shell commands help diagnose and resolve this issue.

Look for possible events stacking in the node

Execute the displayPendingMessage command.

g! displayPendingMessage
displayPendingMessage
InterceptorHandlers size=68
interceptor id: 1558592883293997057, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 368793
interceptor id: 1558592883294287873, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 335263
interceptor id: 1558592883294171137, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 167778
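
When the node hosts many interceptors (68 in the sample above), it can help to rank them by backlog. The following is a minimal Python sketch, not a product tool: it assumes you have copied the displayPendingMessage output into a string, and that every line follows the format shown in the sample. The names parse_pending and rank_by_backlog are made up for this illustration.

```python
import re

# One line of displayPendingMessage output, in the format shown in the sample above.
PENDING_RE = re.compile(
    r"interceptor id: (\d+), factory name: '([^']+)', pending messages: (\d+)"
)

def parse_pending(output):
    """Return {interceptor id (str): pending message count (int)}."""
    counts = {}
    for line in output.splitlines():
        match = PENDING_RE.search(line)
        if match:  # lines such as "InterceptorHandlers size=68" are skipped
            interceptor_id, _factory, pending = match.groups()
            counts[interceptor_id] = int(pending)
    return counts

def rank_by_backlog(counts):
    """Interceptors sorted from the largest to the smallest backlog."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

SAMPLE = """\
InterceptorHandlers size=68
interceptor id: 1558592883293997057, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 368793
interceptor id: 1558592883294287873, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 335263
interceptor id: 1558592883294171137, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', pending messages: 167778
"""

if __name__ == "__main__":
    for interceptor_id, pending in rank_by_backlog(parse_pending(SAMPLE)):
        print(interceptor_id, pending)
```

Interceptor ids are kept as strings so that they can be pasted back into shell commands unchanged.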

Analyze the problem

Execute the analyseInterceptorWithPendingMessages command.

g! analyseInterceptorWithPendingMessages
analyseInterceptorWithPendingMessages
interceptor id: 1558592883293997057, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', transaction messages: 368795, state: NOT DEPLOYED, registrationTime: SnapshotTime{transactionTime=1559509168204957697, validTime=2017-02-16T16:50:39.917Z}, first update registrationTime: SnapshotTime{transactionTime=1574710699623448578, validTime=2017-08-03T11:52:30.305Z}
interceptor id: 1558592883294287873, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', transaction messages: 335265, state: NOT DEPLOYED, registrationTime: SnapshotTime{transactionTime=1558592883278217217, validTime=2017-02-06T14:06:39.094Z}, first update registrationTime: SnapshotTime{transactionTime=1574710699623448578, validTime=2017-08-03T11:52:30.305Z}
interceptor id: 1558592883294171137, factory name: 'ComputingTriggerAsynchronousInterceptorFactory', transaction messages: 167779, state: NOT DEPLOYED, registrationTime: SnapshotTime{transactionTime=1558592883278217217, validTime=2017-02-06T14:06:35.937Z}, first update registrationTime: SnapshotTime{transactionTime=1574710699623448578, validTime=2017-08-03T11:52:30.305Z}
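
With long output, the interceptors in the NOT DEPLOYED state can be picked out automatically. The Python sketch below is an illustration only, assuming the line format shown in the sample above; the helper name not_deployed is hypothetical. The second sample line, including its id and its DEPLOYED state name, is invented for contrast, because this article only shows the NOT DEPLOYED state.

```python
import re

# Captures the interceptor id and the state from one analysis line.
ANALYSE_RE = re.compile(r"interceptor id: (\d+), .*?, state: ([^,]+),")

def not_deployed(output):
    """Return the ids of interceptors reported as NOT DEPLOYED."""
    ids = []
    for line in output.splitlines():
        match = ANALYSE_RE.search(line)
        if match and match.group(2).strip() == "NOT DEPLOYED":
            ids.append(match.group(1))
    return ids

SAMPLE = (
    # Abridged copy of the first line of the sample output above.
    "interceptor id: 1558592883293997057, factory name: "
    "'ComputingTriggerAsynchronousInterceptorFactory', transaction messages: 368795, "
    "state: NOT DEPLOYED, registrationTime: "
    "SnapshotTime{transactionTime=1559509168204957697, "
    "validTime=2017-02-16T16:50:39.917Z}\n"
    # Hypothetical line: the id, factory name and DEPLOYED state are made up.
    "interceptor id: 1, factory name: 'ExampleFactory', transaction messages: 0, "
    "state: DEPLOYED, registrationTime: SnapshotTime{transactionTime=2, "
    "validTime=2017-08-03T11:52:30.305Z}\n"
)

if __name__ == "__main__":
    print(not_deployed(SAMPLE))
```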

Fix the non-deployed interceptors

If some interceptors are in the "NOT DEPLOYED" state, execute the fixAllInterceptorNotDeployed command to fix all interceptors at once.

g! fixAllInterceptorNotDeployed
fixAllInterceptorNotDeployed
Interceptor 1558592883293997057 registration time updated to SnapshotTime{transactionTime=1574711228104704002, validTime=2017-08-03T12:01:01.266Z}
Interceptor 1558592883294287873 registration time updated to SnapshotTime{transactionTime=1574711228104704002, validTime=2017-08-03T12:01:01.596Z}
Interceptor 1558592883294171137 registration time updated to SnapshotTime{transactionTime=1574711228104704002, validTime=2017-08-03T12:01:01.667Z}


After executing these commands, run the displayPendingMessage command again. The number of pending messages should decrease for every interceptor.
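
To confirm that the backlog is draining, two displayPendingMessage readings taken a few minutes apart can be compared. The Python sketch below only illustrates that comparison. The "before" counts come from the sample output in this article, while the "after" counts are invented. Treating an interceptor that is missing from the second reading as having zero pending messages is an assumption, not documented behavior.

```python
def backlog_change(before, after):
    """Return {interceptor id: change in pending messages}.

    A negative value means the backlog is draining. An interceptor missing
    from `after` is assumed (not documented) to have no pending messages.
    """
    return {iid: after.get(iid, 0) - count for iid, count in before.items()}

def still_stuck(before, after):
    """Ids whose pending message count did not decrease between readings."""
    return [iid for iid, delta in backlog_change(before, after).items() if delta >= 0]

# Counts from the sample displayPendingMessage output in this article.
BEFORE = {
    "1558592883293997057": 368793,
    "1558592883294287873": 335263,
    "1558592883294171137": 167778,
}
# Invented second reading, for illustration only.
AFTER = {
    "1558592883293997057": 120000,
    "1558592883294287873": 335900,
    "1558592883294171137": 90000,
}

if __name__ == "__main__":
    print(still_stuck(BEFORE, AFTER))
```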

Advanced commands

The following commands change the data read by interceptors. Use them with caution.

Update interceptor registration time

To fix just one interceptor in the "NOT DEPLOYED" state, execute the changeInterceptorRegistrationTime command with the parameters:

  • interceptor id
  • transaction time – Use the first update registrationTime value returned by the analyseInterceptorWithPendingMessages command.

g! changeInterceptorRegistrationTime 1558592883293997057 1574711228104704002
changeInterceptorRegistrationTime 1558592883293997057 1574711228104704002
Interceptor 1558592883293997057 registration time updated to SnapshotTime{transactionTime=1574711228104704002, validTime=2017-08-03T12:04:35.449Z}
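
Copying 19-digit transaction times by hand is error-prone, so the command line can be assembled from an analysis line instead. The Python sketch below is one possible way to do it, assuming the analyseInterceptorWithPendingMessages line format shown earlier; the helper name registration_command is hypothetical. It takes the transaction time from the first update registrationTime field, as the parameter description above states.

```python
import re

# Captures the interceptor id and the transactionTime of "first update registrationTime".
FIRST_UPDATE_RE = re.compile(
    r"interceptor id: (\d+), .*first update registrationTime: "
    r"SnapshotTime\{transactionTime=(\d+),"
)

def registration_command(analyse_line):
    """Build the changeInterceptorRegistrationTime command for one analysis line."""
    match = FIRST_UPDATE_RE.search(analyse_line)
    if match is None:
        raise ValueError("not an analyseInterceptorWithPendingMessages line")
    interceptor_id, first_update_tt = match.groups()
    return f"changeInterceptorRegistrationTime {interceptor_id} {first_update_tt}"

# First line of the sample analysis output in this article.
LINE = (
    "interceptor id: 1558592883293997057, factory name: "
    "'ComputingTriggerAsynchronousInterceptorFactory', transaction messages: 368795, "
    "state: NOT DEPLOYED, registrationTime: "
    "SnapshotTime{transactionTime=1559509168204957697, "
    "validTime=2017-02-16T16:50:39.917Z}, first update registrationTime: "
    "SnapshotTime{transactionTime=1574710699623448578, "
    "validTime=2017-08-03T11:52:30.305Z}"
)

if __name__ == "__main__":
    print(registration_command(LINE))
```

Review the generated line before pasting it into the shell, because the command changes the data read by the interceptor.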

Drop interceptor pending messages

You can also drop the pending messages of an interceptor that are older than a given transaction time. Execute the dropInterceptorPendingMessage command with the following parameters:

  • interceptor id
  • transaction time – All messages with a lower transaction time will be dropped.

g! dropInterceptorPendingMessage 1558592883293997057 1574711228104704002
dropInterceptorPendingMessage 1558592883293997057 1574711228104704002
Interceptor 1558592883293997057 drop messages older than 1574711228104704002
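
The drop rule can be pictured as a simple filter on transaction time. The Python sketch below is a toy model of the rule as stated in the parameter description, not the product's implementation. Reading "lower" as strictly lower, so that a message whose transaction time equals the given value is kept, is an assumption. The list of pending transaction times is invented for illustration.

```python
def drop_older_than(pending_transaction_times, cutoff):
    """Keep transaction times >= cutoff; drop those strictly lower (assumed reading)."""
    return [tt for tt in pending_transaction_times if tt >= cutoff]

# Invented pending messages, identified only by their transaction time.
PENDING = [1574710699623448578, 1574711228104704002, 1574711228104704003]
CUTOFF = 1574711228104704002  # value passed to the command in the sample above

if __name__ == "__main__":
    print(drop_older_than(PENDING, CUTOFF))
```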

Related Links