
fix runtime filtering with --threads:on #45

Closed · wants to merge 7 commits

Conversation

@stefantalpalaru (Contributor) commented Mar 18, 2019

  • the topics registry is now a threadvar instead of a global one
  • raw pointers are no longer being used to bypass the GC
  • the test suite is now running all tests with --threads:on
    which means the single test doing that in the past can be safely
    removed

@stefantalpalaru (Contributor Author)

  • fix topicsMatch() logic when no logging scopes are present. New test added.

@zah (Contributor) commented Mar 19, 2019

This is an improvement over the status quo because it removes the gcsafe warnings, but I think we need a global topics registry. It would be unexpected for users if functions like setLogLevel affected only the current thread.

Also, topicStateIMPL should not change. It's done the way it is for optimisation purposes.
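
To illustrate the first point, a minimal sketch of the behaviour in question (not chronicles code; all names are made up): with a threadvar, each thread gets its own zero-initialised copy, so a setter called on the main thread is invisible to workers. Compile with --threads:on.

  var logLevel {.threadvar.}: int

  proc setLogLevel(level: int) =
    logLevel = level

  proc worker() {.thread.} =
    # Prints 0: the worker's threadvar copy was never assigned.
    echo "worker sees level ", logLevel

  when isMainModule:
    setLogLevel(3)
    var t: Thread[void]
    createThread(t, worker)
    joinThread(t)
    echo "main sees level ", logLevel  # prints 3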

@stefantalpalaru (Contributor Author) commented Mar 19, 2019

> I think we need a global topics registry

How would we implement that, now that Araq wants to make {.gcsafe.} blocks run only on the main thread? nim-lang/RFCs#142

> topicStateIMPL should not change. It's done the way it is for optimisation purposes.

Readability and correctness are more important than performance.

Also, can we not ignore and forget this PR for the next 3 months? I need it now, to get runtime log level selection in Nimbus.

- mention that log levels available at runtime in this section are
  limited by a compile time setting in the section above
@stefantalpalaru (Contributor Author)

  • README: mention that log levels available at runtime in this section are limited by a compile time setting in the section above

@zah (Contributor) commented Mar 19, 2019

Can you clarify what other solutions were attempted? In order of preference, I can imagine some other approaches:

  1. Make the topic registry thread-safe with a lock that guards only the topicStatesTable field. In particular, topicStateIMPL and topicsMatch should not obtain this lock. At the moment this may require some gcsafe overrides, but since you are discussing how we'll handle future versions of Nim, I can tell you that accessing a properly guarded sequence won't be considered a gcsafe violation in the future. (See the sketch after this list.)

  2. Provide a minimal patch that solves the problem for Nimbus with a gcsafe override. We are not using multiple threads at the moment and the proper thread-safe implementation can wait.
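
A rough sketch of option 1 — the type and proc names here are assumptions for illustration, not the actual chronicles API. Only the shared table is guarded; code that merely reads a topic's state elsewhere takes no lock.

  import std/[locks, tables]

  type
    LogLevel = enum NONE, DEBUG, INFO, ERROR
    TopicState = enum Normal, Enabled, Disabled
    Topic = object
      state: TopicState
      logLevel: LogLevel

  var
    registryLock: Lock
    topicStatesTable = initTable[string, Topic]()

  initLock(registryLock)

  proc setTopicState(name: string, state: TopicState) =
    # The "dynamic" API: guarded, because it touches the shared table.
    withLock(registryLock):
      topicStatesTable.mgetOrPut(name, Topic()).state = state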

@stefantalpalaru (Contributor Author)

> Provide a minimal patch that solves the problem for Nimbus with a gcsafe override.

There are 2 different problems solved for Nimbus here: runtime log level setting with --threads:on and that runtime log level being applied when no logging scopes are set.

> We are not using multiple threads at the moment and the proper thread-safe implementation can wait.

Nimbus is compiled with --threads:on.

> Make the topic registry thread-safe with a lock that guards only the topicStatesTable field.

I totally would, but gcsafe checks ignore this completely and there's no interest upstream to change this. With {.gcsafe.} blocks being targeted for an even more restrictive change, I'm reluctant to use them at all.

> I can tell you that accessing a properly guarded sequence won't be considered a gcsafe violation in the future

You're an optimist. I don't think Araq is interested in, or understands, the need to implement inter-process communication through shared memory.

@zah (Contributor) commented Mar 19, 2019

> You're an optimist. I don't think Araq is interested in, or understands, the need to implement inter-process communication through shared memory.

Regarding this problem, I happen to know for sure what his future plans are. We have discussed this at length on multiple occasions (the last time was just 2 days ago).

@zah (Contributor) commented Mar 19, 2019

> Nimbus is compiled with --threads:on.

Yes, but what I meant is that proper synchronization is not required yet, because there aren't multiple threads in practice. With gcsafe overrides and accessing shared values through pointers instead of directly, you should be able to make the compiler happy.
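
A minimal sketch of that workaround (the global and the proc are invented for illustration): a {.gcsafe.} override block lets a proc marked gcsafe read a GC-managed global that the compiler would otherwise reject.

  var sharedTopics: seq[string] = @["main"]  # a GC-managed global

  proc topicCount(): int {.gcsafe.} =
    {.gcsafe.}:
      # The override tells the compiler to trust us; actual thread
      # safety becomes the programmer's responsibility.
      result = sharedTopics.len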

@stefantalpalaru (Contributor Author) commented Mar 19, 2019

> accessing shared values through pointers instead of directly

Why would we use raw pointers as our first option in a GC-ed language?

@arnetheduck (Member)

Can't the gcsafe override be put somewhere in chronicles, so we don't have to litter the other code?

An alternative that I've used on occasion is to have a (thread-safe) queue for logging through which everything passes: log lines, settings updates, etc. This tends to be good for latency on the logging side, but requires some maintenance to process the queue (on a thread or in the event loop, for example).
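
A sketch of that queue-based design using Nim's built-in Channel (names invented; this is not how chronicles works today). Log statements only enqueue; a dedicated thread drains the queue and performs the writes.

  var logChan: Channel[string]

  proc logWriter() {.thread.} =
    while true:
      let msg = logChan.recv()
      if msg.len == 0: break  # an empty message doubles as a shutdown signal
      stdout.writeLine(msg)

  when isMainModule:
    logChan.open()
    var writer: Thread[void]
    createThread(writer, logWriter)
    logChan.send("INF started")
    logChan.send("DBG doing work")
    logChan.send("")  # stop the writer
    joinThread(writer)
    logChan.close()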

@stefantalpalaru (Contributor Author)

  • topics registry: globals, locks and gcsafe blocks
  • new FileOutput.open() variant for existing file descriptors
@stefantalpalaru (Contributor Author)

  • thread safety, using a global lock that's acquired once per logging statement
  • new FileOutput.open() variant for existing file descriptors

This last change allows overriding a "textblocks[file]" sink with stdout or stderr:

  if len(conf.debug.logFile) != 0:
    discard defaultChroniclesStream.output.open(conf.debug.logFile, fmAppend)
  else:
    discard defaultChroniclesStream.output.open(stdout)

(A workaround for the lack of dynamic sinks, of course.)

  except KeyError:
    topic = Topic(state: Normal, logLevel: NONE)
    registerTopic(topicName, topic)
    return topic
Contributor:

Sorry, I wanted to spend some time with the code myself to help you finish this, but I'll have to provide only some comments.

Please try to get back to the previous version of topicStateIMPL. This would allow you to greatly reduce the need for locking in the rest of the code. The simple idea of topicStateIMPL is that each topic receives a global variable with a statically known address, and most of the code that needs to access and manipulate the state of a particular topic can work with a simple pointer to this address. Please note that the state variable does not contain garbage-collected types and manipulating the state is just flipping an enum value, which is perfectly thread-safe.

setTopicState("literal_name", ...) can be translated to a write to the known location - no locking required.

The only place where we need locking is the registration of new topics (which happens at the start of the program, before any threads are spawned). The more dynamic API (getTopicState, setTopicState with a non-static string) was added only to simplify the way the chronicle_tail tool works (because the tail tool doesn't know the names of topics in advance). Only this "dynamic" API needs locking.
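
A simplified reconstruction of the design described above — the names only approximate the chronicles internals. Each topic gets a global with a statically known address, and the hot path works through a pointer to it:

  type
    LogLevel = enum NONE, DEBUG, INFO, ERROR
    TopicState = enum Normal, Enabled, Disabled
    TopicSettings = object
      state: TopicState
      logLevel: LogLevel

  # One global per topic, emitted for each topic name at compile time.
  var netTopic = TopicSettings(state: Normal, logLevel: NONE)

  proc topicStateIMPL(s: var TopicSettings): ptr TopicSettings =
    addr s

  let p = topicStateIMPL(netTopic)
  p.state = Enabled  # "flipping an enum value" through a plain pointer
  echo p.state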

Contributor Author:

> Please try to get back to the previous version of topicStateIMPL.

I don't think the old version is correct. All the corresponding entries in the global topic registry were overwritten with Topic(state: Normal, logLevel: NONE) instances, deleting previous non-default values for those topics.

> Please note that the state variable does not contain garbage-collected types and manipulating the state is just flipping an enum value, which is perfectly thread-safe.

You can't just assume that with modern architectures. You either do explicit locking or you use atomic operations: https://github.com/nim-lang/Nim/blob/master/lib/system/atomics.nim

> setTopicState("literal_name", ...) can be translated to a write to the known location - no locking required.

Until it's executed at the same time from different threads. It's better to be defensive here and assume the worst about the users of your API.
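
A sketch of the atomic alternative, shown with the newer std/atomics module for brevity (the link above points at the lower-level intrinsics). The enum flip becomes an explicit atomic store and load; Atomic[int] is used so the lock-free path is guaranteed:

  import std/atomics

  type TopicState = enum Normal, Enabled, Disabled

  var state: Atomic[int]  # holds ord(TopicState)

  proc setState(s: TopicState) =
    state.store(ord(s), moRelaxed)  # relaxed ordering: eventual visibility

  proc getState(): TopicState =
    TopicState(state.load(moRelaxed))

  setState(Enabled)
  echo getState()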

Contributor:

This is a logging library. Extremely precise cache-coherent behavior of setTopicState is not really that important. Eventual consistency is good enough.

Contributor Author:

> Eventual consistency is good enough.

We were not getting it at all.

topicsMatch() received a list of Topic objects created by topicStateIMPL() - all of them having a Normal state and a log level of NONE. Then topicsMatch() checked their states and log levels as if any other values could be in there.

Somewhere along the line, the logic was scrambled.

Contributor Author:

Found the commit that messed up the logic: f929f30#diff-7810f6d32e19cbc385dbe25d827e77f9L133

Contributor:

With the old code, topicsMatch received pointers to global objects that were "created" at the beginning of the program. If you were always seeing NONE and Normal values for their fields, this is another bug that needs to be investigated.

@@ -282,19 +296,12 @@ template log*(stream: type,
template logFn(name, severity) {.dirty.} =
  template `name`*(eventName: static[string],
                   props: varargs[untyped]) {.dirty.} =

    bind logIMPL, bindSym, brForceOpen
    logIMPL(instantiationInfo(), activeChroniclesStream(),
Contributor:

The reason why I have used these slightly repeated bodies is to shave a few nanoseconds from the compile times. I haven't tried to measure the effect.

Contributor Author:

> to shave a few nanoseconds from the compile times

Readability before nano-optimisations ;-)

chronicles.nim (outdated)
@@ -255,6 +262,13 @@ macro logIMPL(lineInfo: static InstInfo,
  code.add newCall("logAllDynamicProperties", Stream, record)
  code.add newCall("flushRecord", record)

  code = quote do:
Contributor:

I would prefer if this is controlled by a compile-time option. Also, we can use one lock per stream I think (or should it be even per-sink/per-output?).

Contributor Author:

> I would prefer if this is controlled by a compile-time option.

Done. All locking is disabled if threading is off.

> Also, we can use one lock per stream I think (or should it be even per-sink/per-output?).

The finer-grained your locking, the more runtime overhead you have, with no actual benefit. Then you have to worry about different threads being able to interleave their setProperty() calls (or any other sub-record component).

This way, with one lock per log call, you're sure your record will reach the output in one piece and the wait should be short enough to not matter.

This is worth revisiting with some actual instrumentation data on a system with 32+ cores, to see what happens when they all wait on the same lock to do their logging. At that point, we're going to ask Jacek about logging buffers and dedicated logging threads.
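
A condensed sketch of that scheme (the template name is invented): one global lock taken once around the whole record, and no locking at all when threading is off.

  when compileOption("threads"):
    import std/locks
    var outputLock: Lock
    initLock(outputLock)

  template lockOutput(body: untyped) =
    when compileOption("threads"):
      withLock(outputLock):
        body
    else:
      body

  lockOutput:
    # Everything inside stays contiguous: another thread cannot
    # interleave its own writes into this record.
    stdout.write("INF tick")
    stdout.write(" count=1\n")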

  registry.topicStatesTable[name] = topic

proc topicsMatch*(logStmtLevel: LogLevel, logStmtTopics: openarray[Topic]): bool =
  lockRegistry:
Contributor:

This shouldn't lock either. All the information needed by the function is passed in as parameters.

Contributor Author:

> All the information needed by the function is passed in as parameters.

It still reads from two global variables that may be changed at the same time by other threads.

@zah (Contributor) Mar 24, 2019:

Hrm, ok, I'll think about this. It's OK to merge with locking for now.

@stefantalpalaru (Contributor Author)

  • don't do any locking if threading is disabled



@@ -255,6 +262,15 @@ macro logIMPL(lineInfo: static InstInfo,
  code.add newCall("logAllDynamicProperties", Stream, record)
  code.add newCall("flushRecord", record)

  when compileOption("threads"):
@zah (Contributor) Mar 24, 2019:

Sorry, I've changed my mind. We don't need locking here at all. Instead, we can unconditionally use the BufferedOutput type when compiling with --threads:on. Then, the setProperty statements will populate a local buffer that will be flushed at the end with a single operation. Only the actual output device needs to be thread-safe, which is already the case for stdout.write.

The above can be achieved by modifying the function below:
https://github.com/status-im/nim-chronicles/blob/master/chronicles/log_output.nim#L155
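
A toy version of that idea — BufferedOutput is a real chronicles type, but this reimplementation is purely illustrative. setProperty appends to a per-record buffer and flushRecord emits it with one write, so no lock is needed around the record itself:

  type BufferedOutput = object
    buffer: string

  proc setProperty(o: var BufferedOutput, key, val: string) =
    o.buffer.add(' ' & key & "=" & val)

  proc flushRecord(o: var BufferedOutput) =
    # A single write keeps the record contiguous, as long as the
    # underlying device handles concurrent writes sensibly.
    stdout.write("INF" & o.buffer & "\n")
    o.buffer.setLen(0)

  var rec: BufferedOutput
  rec.setProperty("msg", "connected")
  rec.setProperty("peer", "1.2.3.4")
  rec.flushRecord()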

zah added a commit that referenced this pull request Mar 24, 2019
@zah (Contributor) commented Mar 24, 2019

Closing in favor of #46

@zah closed this Mar 24, 2019
@zah deleted the runtime branch March 24, 2019 23:23