Since upgrading to NPM 11.5.2 last week, we've had intermittent but serious problems with the Information Service crashing.
I've uploaded diagnostics and opened two cases with support, one of them as a system-down case, but I've had no call-back; when I've phoned, I've either left messages or been diverted to a voicemail box. (Leapfile is uploading at 1.4Mbps, which is a little frustrating on a 1Gbps network connection.)
2015-10-12 13:07:14,833 [4] ERROR SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.LimitationSnapshotService - (null) System.Data.SqlClient.SqlException (0x80131904): The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__0.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.ProjectOp.<GetEnumeratorInternal>d__0.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.PhysicalQueryPlan.<GetEnumerator>d__0.MoveNext()
at SolarWinds.InformationService.Core.InternalQueryResultReader.MoveNext()
at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerMemberFetcher.GetMembers(ICollection`1 definitions, IDictionary`2 members, Int32 containerId, Boolean clearDefinitonsAndRetryOnError)
at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerMemberFetcher.GetMembers(IEnumerable`1 containerIds)
at SolarWinds.Data.Providers.Orion.Containers.DataProvider.MemberCache.GetMembers(IEnumerable`1 containerIds, ContainerCacheContext context)
at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerTable.CreateScanTooples(ICollection`1 containers, IToople tuple, String[] columnProperties, ContainerCacheContext context)
at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerTable.CreateFilteredRelation(IToopleFactory toopleFactory, IStorageEntityReference entityRef, IEnumerable`1 properties, IExpression filter)
at SolarWinds.Data.Providers.Orion.OrionDataProvider.CreateFilteredRelation(IStorageEntityReference entityRef, IEnumerable`1 properties, IExpression filter)
at SolarWinds.Data.Query.PhysicalQueryPlan.FilteredProviderScanOp.CreateRelation(IDataProvider provider, ICollection`1 properties)
at SolarWinds.Data.Query.PhysicalQueryPlan.ProviderScanOp.<GetEnumeratorInternal>d__0.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.ProjectOp.<GetEnumeratorInternal>d__0.MoveNext()
at SolarWinds.Data.Query.PhysicalQueryPlan.PhysicalQueryPlan.<GetEnumerator>d__0.MoveNext()
at SolarWinds.InformationService.Core.InternalQueryResultReader.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at SolarWinds.Data.Providers.Orion.Common.EnumerableExtensions.MemoizedEnumerable`1.<GetEnumerator>d__2f.MoveNext()
at SolarWinds.Data.Providers.Orion.Common.EnumerableExtensions.Enumerated[T](IEnumerable`1 enumerable)
at SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.DAL.LimitationSnapshotDAL.GetAllLimitationEntities(LimitationInfo limitation, String entityType)
at SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.LimitationSnapshotService.GetLimitationItems(IDictionary`2 result, LimitationInfo limitation, String entityType)
ClientConnectionId:0e5cc627-55c6-4e94-925c-ae2a96c4e8da
Error Number:8623,State:1,Class:16 executed with:
limitationId:13, entityType:Orion.Groups
As I was writing this, I noticed that there appears to be a process generating snapshots of limitations. Is this new in 11.5.2?
This error corrupts SWIS: I think it loses the database connection handle, so subsequent queries on that handle fail, which leads to other oddities. Eventually SWISv3 crashes and restarts. This has been going on all day.
It looks to me like the 'group of groups' type of limitation is breaking SWIS when the periodic limitation snapshot recalculation is performed :-(
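To check whether it is always the same limitation that fails, the failing entries could be tallied from the log. This is only a rough sketch in Python, assuming every failing entry ends in the same "Error Number:..., executed with: limitationId:..., entityType:..." form as the trace above. The function name tally_failures and the sample string are made up for illustration and are not part of any SolarWinds tooling.

```python
import re
from collections import Counter

# Matches the tail of a log entry in the form seen in the trace above, e.g.
#   Error Number:8623,State:1,Class:16 executed with:
#   limitationId:13, entityType:Orion.Groups
# (Assumption: every failing entry ends this way.)
PATTERN = re.compile(
    r"Error Number:(?P<number>\d+),State:\d+,Class:\d+\s+executed with:\s*"
    r"limitationId:(?P<limitation>\d+),\s*entityType:(?P<entity>[\w.]+)"
)


def tally_failures(log_text, error_number=8623):
    """Count failures per (limitationId, entityType) for one SQL error number."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        if int(match.group("number")) == error_number:
            key = (int(match.group("limitation")), match.group("entity"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the text of the relevant log file.
    sample = (
        "Error Number:8623,State:1,Class:16 executed with:\n"
        "limitationId:13, entityType:Orion.Groups\n"
    )
    for (limitation_id, entity_type), n in tally_failures(sample).most_common():
        print(f"limitationId={limitation_id} entityType={entity_type} failures={n}")
```

If one limitationId accounts for all of the 8623 errors, that would support the 'group of groups' theory; if they are spread across many limitations, the cause is probably something else.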
Suggestions?