Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in a software clock control DUT #654

Open
wants to merge 6 commits into
base: staging
Choose a base branch
from

Conversation

rslawson
Copy link
Collaborator

This PR adds in a new SwCc DUT based on the work done in my two other PRs. As of the current PR it mostly passes CI except for linting, but that's okay since there's still work to be done.

As for what work is still yet to be done:

  • @lmbollen suggested that I add in UGN stability testing to this test as well. It was in some earlier versions of my work, which can be seen in the (awful) history of my rs/measure-sw-cc branch. The only other DUT this appears in is FullMeshSwCc.
  • Some work must be done to figure out why I need the weird hack around masking syncIn to IlaPlotSetup (see line 908 and lines 937-947 as of time of writing) in order to avoid two syncRst asserts, which then causes captureCond to become Calibrate :: CaptureCondition for a second time, which is problematic.
  • Commented-out legacy code needs to be removed.
  • Formatting needs to be fixed (thanks CI).

@rslawson
Copy link
Collaborator Author

I should also probably report that somewhere in the combined total of the work from the three PRs (between this one and the two it depends on) the clock update frequency variability mentioned in issue #587 seems to have disappeared. Looking at the callistoSwIla ILA data, every test reports a minimum update period of 307 cycles/2.456 μs and a maximum update period of 309 cycles/2.472 μs for a variability of 16 ns. This is consistent across all boards within all topologies. The only big unknown left is software reframing timing, but that should be easy enough to get data on with just a single change to the DUT in another branch.

@rslawson
Copy link
Collaborator Author

As per the discussion in the meeting earlier today, the syncIn masking hack will stay in place until the testing framework changes in the pre-process branch are merged.

The issue is, as far as I can tell, that the syncOut signal from FPGA 7 continues to propagate to the other boards until it is reset by the HITL VIO providing it with new test data. The other boards, are still receiving this on syncIn, causing them to begin the test early. This results in the plotting ILA entering the calibrating state twice per test, which is a problem for cc-plot.

The current hack solution in this branch looks for the syncIn signal to be unchanging for 25ms or longer, which is interpreted as FPGA 7 being reset. After this the syncIn signal is unmasked and allowed to propagate through to the test synchronisation circuitry for the duration of the test. However, through proper usage of the pre-test, monitor, and post-test functions in the HITL tests it should be possible to force all of the synchronisers into reset in the post-test phase. This would ensure there would be no sync signal on syncIn until all of the boards have been given test data in the following pre-test phase.

@rslawson
Copy link
Collaborator Author

Adding to work to be done:
Figure out a way to make sure that each node cannot run a diff clock against another node that has not left clock control reset.

@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from 89ab9a9 to 24817b5 Compare October 17, 2024 20:53
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from 10ad426 to d2a0587 Compare October 22, 2024 09:23
@rslawson rslawson changed the base branch from rs/ilaPlot to staging October 22, 2024 09:30
@rslawson rslawson changed the base branch from staging to rs/ilaPlot October 22, 2024 09:32
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 5 times, most recently from 79591c5 to 550add3 Compare October 23, 2024 11:37
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 5 times, most recently from b8eff1d to 10b23cf Compare October 28, 2024 23:57
@rslawson
Copy link
Collaborator Author

Also yes, I promise I'll clean up this commit history before asking for review.

@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 4 times, most recently from 28c07d4 to 444d9aa Compare October 30, 2024 12:06
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 2 times, most recently from 94cc1bd to a039e5a Compare October 30, 2024 18:11
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from 3237e82 to e57b585 Compare November 4, 2024 15:32
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from e57b585 to b559c96 Compare November 6, 2024 15:30
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from b559c96 to 49972a1 Compare November 7, 2024 15:43
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from 49972a1 to 38be85a Compare November 11, 2024 12:35
Base automatically changed from rs/ilaPlot to staging November 15, 2024 08:25
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 2 times, most recently from f1740f4 to ed7574e Compare November 18, 2024 09:56
@rslawson
Copy link
Collaborator Author

The reframe enable config option currently doesn't seem to do anything. Currently debugging that in rs/swCcTopologiesTest-rfen.

@rslawson
Copy link
Collaborator Author

As mentioned, reframing is currently not working and I don't believe there's a quick path to getting it to work. As such, I'll make another PR at a later time to fix that, but the testing framework for enabling it will remain in place for the time being.

@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 2 times, most recently from 3361c2f to c51feff Compare November 21, 2024 11:08
Comment on lines 173 to 176
export RUNREF="${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
cabal run -- bittide-tools:cc-plot ${{ github.run_id }}:hwCcTopologyTest hitl-topology-plots
export TMP="${{ needs.bittide-instances-hardware-in-the-loop-test-matrix.outputs.report_kind }}"
cabal run -- bittide-tools:cc-plot ${{ github.run_id }}:$TMP hitl-topology-plots

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for? If you want to define an environment variable you should at least give it a descriptive name

Comment on lines 185 to 188
mkdir -p plot-sources
mv hitl-topology-plots plot-sources/hwCcTopologyTest
export TMP="${{ needs.bittide-instances-hardware-in-the-loop-test-matrix.outputs.report_kind }}"
mv hitl-topology-plots plot-sources/$TMP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for here

should_generate_report=$(jq 'map(select(.top == "hwCcTopologyTest")) | length > 0' checks.json)
should_generate_report=$(jq 'map(select(.top == "hwCcTopologyTest" or .top == "swCcTopologyTest")) | length > 0' checks.json)
if [ "${should_generate_report}" == 'true' ]; then
tmp=$(jq 'map(select(.top == "hwCcTopologyTest")) | length > 0' checks.json)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Descriptive variable names :)

@@ -0,0 +1,24 @@
-- SPDX-FileCopyrightText: 2022 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2024 :)

@@ -0,0 +1,149 @@
-- SPDX-FileCopyrightText: 2022 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2024 :)

Comment on lines 45 to 57
data SwControlConfig dom mgn fsz where
SwControlConfig ::
( KnownNat mgn
, KnownNat fsz
, 1 <= fsz
, KnownDomain dom
) =>
Signal dom Bool ->
SNat mgn ->
SNat fsz ->
SwControlConfig dom mgn fsz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessarily specialized for software, is it? Could you add some more haddock comments?

Comment on lines 301 to 304
othersNotInCCReset = go <<$>> transceivers.rxDatas
where
go (Just val) = msb val == high
go _ = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
othersNotInCCReset = go <<$>> transceivers.rxDatas
where
go (Just val) = msb val == high
go _ = False
othersNotInCCReset = maybe False (\val -> msb val == high) <$> transceivers.rxDatas

Copy link
Collaborator Author

@rslawson rslawson Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to do <<$>> (since transceivers.rxDatas is Vec n (Signal rx (Maybe (BitVector 64)))), but otherwise yeah I like how you wrote that.

Comment on lines +306 to +305
othersNotInCCResetSync = zipWith go othersNotInCCReset transceivers.rxClocks
where
go sig rxClk = xpmCdcSingle rxClk sysClk sig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write this slightly shorter

Suggested change
othersNotInCCResetSync = zipWith go othersNotInCCReset transceivers.rxClocks
where
go sig rxClk = xpmCdcSingle rxClk sysClk sig
othersNotInCCResetSync = zipWith go transceivers.rxClocks othersNotInCCReset
where
go rxClk = xpmCdcSingle rxClk sysClk

( ilaConfig
$ "trigger_0"
:> "capture_0"
:> "probe_milliseconds"
$ "trigger_fdi_0"
:> "capture_fdi_0"
:> "probe_fdi_milliseconds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why _fdi?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have no idea. I swear I copied that from another test but now I can't find where.

Comment on lines 57 to 88
callistoSwClockControl ::
forall nLinks eBufBits dom margin framesize.
( KnownDomain dom
, KnownNat nLinks
, KnownNat eBufBits
, 1 <= nLinks
, 1 <= eBufBits
, nLinks + eBufBits <= 32
, 1 <= framesize
, 1 <= DomainPeriod dom
) =>
Clock dom ->
Reset dom ->
Enable dom ->
SwControlConfig dom margin framesize ->
Signal dom (BitVector nLinks) ->
Vec nLinks (Signal dom (RelDataCount eBufBits)) ->
Signal dom (CallistoResult nLinks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exported function missing documentation

Copy link
Contributor

@martijnbastiaan martijnbastiaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. I'm really looking forward to ditching all the different instances.

bittide/src/Bittide/ClockControl/Callisto/Types.hs Outdated Show resolved Hide resolved

calibratedClockShift = capturedOnce
where
-- Possibly overkill.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this comment? (In the comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually I think it's worth bringing this up here since I wanted to ask if there was a good way to latch a value exactly once on a given signal. I think what I wrote there might be a bit over-engineered, and I'm also not sure if there's just a utility function somewhere that I overlooked.

Circuit (CSignal dom a) (Vec n (CSignal dom a))
csDupe = Circuit $ \(m, _) -> (pure (), repeat m)

cSigMap ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cSigMap ::
cSignalMap ::

Abbreviations take brain cycles to decode :-)

Comment on lines 15 to 43
import Clash.Explicit.Prelude hiding (PeriodToCycles)

-- import qualified Clash.Explicit.Prelude as E
import Clash.Prelude (withClockResetEnable)

import Language.Haskell.TH (runIO)
import System.FilePath

import Bittide.CircuitUtils
import Bittide.ClockControl (RelDataCount)
import Bittide.ClockControl.Callisto.Types (CallistoResult (..), ReframingState (Done))
import Bittide.ClockControl.DebugRegister (
DebugRegisterCfg (DebugRegisterCfg),
debugRegisterWb,
)
import Bittide.ClockControl.Registers (ClockControlData (..), clockControlWb)
import Bittide.DoubleBufferedRam (ContentType (Blob), InitialContent (Reloadable))
import Bittide.ProcessingElement (PeConfig (..), processingElement)
import Bittide.ProcessingElement.Util (memBlobsFromElf)
import Bittide.SharedTypes (ByteOrder (BigEndian))

import Project.FilePath

import Protocols
import Protocols.Idle

import Clash.Cores.Xilinx.Ila (Depth (..), IlaConfig (..), ila, ilaConfig)
import Data.Maybe (fromMaybe, isJust)
import VexRiscv
Copy link
Contributor

@martijnbastiaan martijnbastiaan Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Qualified deps" => we can do this with the latest fourmolu fourmolu/fourmolu#403. But, these changes haven't made it into a release yet. Something to keep in mind.

Edit: I was replying to Lucas here, but it seemed to be grouped in with my review..

Comment on lines 186 to 187
, reframingEnabled :: Bool
-- ^ Whether or not to run this test with reframing enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an issue we need to figure out how to properly test reframing and link it here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made #693, will link in the comment.

where
go ugn = ugnStable
where
ugn' = bitCoerce <$> ugn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ugn' = bitCoerce <$> ugn
ugn1 = bitCoerce <$> ugn

Use number postfixes please. Ticks don't really scale and are easy to miss.


type WaitCycles dom hold prd = PeriodToCycles dom (prd - hold)

data ToFincFdecState' dom hold min
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear what the ' is supposed to communicate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there because I forgot you can import Foo hiding (bar). Will fix.

| Idle
deriving (Generic, NFDataX, Eq)

speedChangeToFincFdec' ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Typically ' is used for "strict" versions of a function, but that's not the case here.

syncNodePrevEnteredReset =
sticky sysClk testReset syncNodeEnteredReset

syncIn' = mux syncNodePrevEnteredReset syncIn (pure True :: Signal Basic125 Bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syncIn' = mux syncNodePrevEnteredReset syncIn (pure True :: Signal Basic125 Bool)
syncIn1 = mux syncNodePrevEnteredReset syncIn (pure True :: Signal Basic125 Bool)

@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch from c51feff to a9ba56c Compare December 4, 2024 10:14
@rslawson rslawson force-pushed the rs/swCcTopologiesTest branch 2 times, most recently from 8747cd4 to 74540bb Compare December 4, 2024 14:58
Trying to change sync node reset detection to work better for FPGA 7.

Fixes from rebasing on `rs/ilaPlot`.

Forgot to sticky the CC output (again).

Sending data over links to see what comes out on the other side.

Adjusting ready detection/reporting.

Using the MSB of link data to indicate not in CC reset.

Fixes(?) to reset paths, ready reporting, and CC reset detection.

Attempting to cut off testing sooner.

Cutting off earlier was not the answer. Capturing calib vals differently now.

UGN TESTING IS SO BACK
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement fast/efficient execution deadlines in software
3 participants