-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve NewEpochState
logging
#5854
Conversation
af9f3bd
to
746038c
Compare
handler outputFp (AnyNewEpochState sbe nes) = do | ||
handleException . liftIO $ do | ||
appendFile outputFp $ "#### BLOCK ####" <> "\n" | ||
appendFile outputFp $ BSC.unpack (shelleyBasedEraConstraints sbe $ encodePretty nes) <> "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is flushing a concern for our logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually I was using the epoch state logs to investigate what happened after the failure, because the slots are short in the testnet tests. But if we'd like a live view, it would be better to flush, to not get partial JSON.
|
||
-- In order to create a diff of the epoch state logging file contents | ||
-- we must do so when the resources are deallocated (see runInBackground) | ||
-- and therefore when the log file is no longer in use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is meant by "the file is no longer in use". Does it mean no longer being written to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you waiting on 228
to complete? I would use a synchronisation primitive of some kind. My primitive of choice is STM, but I think an MVar would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go for an MVar
.
Edit: This wasn't possible. See my most recent commit.
handler :: FilePath -> AnyNewEpochState -> StateT Int IO LedgerStateCondition | ||
handler outputFp (AnyNewEpochState sbe nes) = do | ||
handleException . liftIO $ do | ||
appendFile outputFp $ "#### BLOCK ####" <> "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this tag a json too? I'm thinking about the use case, where we could filter the json logs through jq
, so a non-json line would break jq afair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but that would also break because of encodePretty
. It would be nice to have an option to get only JSON lines here, so it could be consumed by jq
. Consider my comment as a potential improvement outside of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, jsonl
is better than json
.
-- TODO: I hate this. Is there a better way to do this without | ||
-- reaching for concurrency primitives? | ||
isFileReadableLoop :: FilePath -> IO Bool | ||
isFileReadableLoop fp = do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appendFile
s in 248 and 249 are constantly opening and closing the files so I don't think this will work - I think you might get inbetween accidentally. I second John's advice https://github.com/IntersectMBO/cardano-node/pull/5854/files#r1611865524
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some changes. Let me know what you think.
:: String | ||
-> [(Text, Text)] | ||
epochStateBeforeAfterValues logFileContents = | ||
let allEpochStates = filter (/= "") . Text.splitOn "#### BLOCK ####" $ Text.pack logFileContents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: move the block tag to a variable and allow reusing it. Both usage places are 40 lines apart, so it'd be easy to get overlooked by someone making changes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather do this in a follow up PR. I want to get the concurrency stuff right first.
isFileReadableLoop logFile >>= \case | ||
False -> error "isFileReadableLoop: Impossible" | ||
True -> do | ||
logFileContents <- IO.readFile logFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logFileContents <- IO.readFile logFile | |
logFileContents <- Text.readFile logFile |
You're using Text
later anyway
then "No changes in epoch state" | ||
else ppDiff diffResult | ||
|
||
instance (L.EraTxOut ledgerera, L.EraGov ledgerera) => ToJSON (L.NewEpochState ledgerera) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we later upstream this to ledger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that 👍
-- In order to create a diff of the epoch state logging file contents | ||
-- we must do so when the resources are deallocated (see runInBackground) | ||
-- and therefore when the log file is no longer in use. | ||
void . H.evalM $ allocate (pure ()) $ \_ -> do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also catch AsyncCancelled
in the handler
and act upon it instead. That may be easier than installing cleanups in MonadResource
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I found out that technically allocate
is the wrong thing to use here because we are forking a thread. See my most recent changes + comments.
appendFile outputFp $ "#### BLOCK ####" <> "\n" | ||
appendFile outputFp $ show anyNewEpochState <> "\n" | ||
pure ConditionNotMet | ||
handler :: FilePath -> AnyNewEpochState -> StateT Int IO LedgerStateCondition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the Int
in the state? It appears unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Having epoch states as as json will be definitely more useful.
I think this may be problematic: https://github.com/IntersectMBO/cardano-node/pull/5854/files#r1613083518 . This also may be an easier alternative: https://github.com/IntersectMBO/cardano-node/pull/5854/files#r1613104145
ignoreException :: E.SomeException -> Bool | ||
ignoreException e = | ||
case E.fromException e of | ||
Just (MuxError errType _) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be wrapped in ExceptT
exception in foldEpochState
. Now it seems that exceptions processing is fragmented - some are caught here, some are caught there and wrapped in ExceptT
, which seems unintuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show me what you want here? We need to ignore the MuxBearerClosed
which occurs because the node is foldEpochState
is connected to is shut down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like foldEpochState
to be returning an error in ExceptT
or a synchronous exception - not both of them. We're catching IOExceptions
there already for example.
where | ||
cleanUp :: H.Async a -> IO () | ||
cleanUp a = H.cancel a >> void (H.link a) | ||
runInBackground runOnException act = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function body is too convoluted now.
- The constraint
MonadResource m
is redundant. You're runningResourceT
in line 76, which means that any resource allocations here are freed in 76. - You're actually not allocating any resources with
ResourceT
here. TherunOnException
andact
are inIO
, so you can't useResourceT
from this function. Why it's needed here? It does not help with concurrency. resourceForkWith f act
- this looks like an incorrect use to me.f
is a forking function,act
is an action which that forking function gets as an argument (wrapped inbracket
andResourceT
cleanup stuff) and executes.f
is executed masked, butact
is executed unmasked. So, youract
does nothing, which reducesresourceForkWith
tobracket_
, because you're not usingMonadResource
here either. So why not just usebracket_
andasync
in the first place?- What happens to the executed thread when the parent thread completes? Does it get re-parented to the test suite thread? It seems to me that it will live as long as the node, which results in
foldEpochState
throwing. So if you don't want to kill the thread, you can just fork the logger and forget, relying on that it will quit on broken node connection. linkOnly ignoreException r
- so this will fail the test case on all other exceptions thanMuxBearerClosed
. What scenarios do you have in mind here?E.onException
catches synchronous and asynchronous exceptions here. I think you should catch only synchronous (usingonException
fromControl.Exception.Safe
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is true, removed.
2/3: We can and should use bracket
👍 . Initially I was trying to share the file handle between the threads and so I reached for resourceForkWith
. However we are simply waiting for another thread to be finished with the resource (i.e we are not interleaving).
-
In our use case when the parent thread completes our background process throws an exception because the node it was connected too shuts down. In the context of how we use
runInBackground
we can fork and forget because an exception will be thrown fromfoldEpochState
. -
This is correct. I didn't have any specific scenarios in mind but if an unexpected exception is thrown we shouldn't ignore it.
-
This is also a valid suggestion in our context. The parent thread will get terminated so we always expect our thread to throw a synchronous exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I responded in slack. This currently works, lets get it merged and you can investigate using bracket
in a follow up PR.
runInBackground act = void . H.evalM $ allocate (H.async act) cleanUp | ||
where | ||
cleanUp :: H.Async a -> IO () | ||
cleanUp a = H.cancel a >> void (H.link a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replying to: #5854 (comment)
I think allocate (H.async act)
is correct here. Resources are cleaned in LIFO order, so this should ensure that the thread gets killed before resources it uses are freed.
0d8b075
to
7c1b7e8
Compare
the recommended concurrency primitive the library exposes Update `runInBackground` with the ability to execute an IO action when the background thread is terminated due to an unexpected exception
7c1b7e8
to
37bc79b
Compare
Description
Improvements
startLedgerNewEpochStateLogging
is now pretty JSONNewEpochState
(i.e per block or at the turn of the epoch) to /logs/ledger-new-epoch-state-diffs.logPrior output format of generated logs (calling
show
after each block application/epoch transition):Pretty JSON output improvement:
New epoch state diff output:
Checklist
See Runnings tests for more details
CHANGELOG.md
for affected package.cabal
files are updatedhlint
. See.github/workflows/check-hlint.yml
to get thehlint
versionstylish-haskell
. See.github/workflows/stylish-haskell.yml
to get thestylish-haskell
versionghc-8.10.7
andghc-9.2.7
Note on CI
If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need to get someone with write privileges. Please contact IOG node developers to do this
for you.