-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Ethernet core #8
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Jasmijn Bookelmann <[email protected]> Co-authored-by: Cato van Ojen <[email protected]> Co-authored-by: MatthijsMu <[email protected]> Co-authored-by: Jasper Laumen <[email protected]> Co-authored-by: Mart Koster <[email protected]> Co-authored-by: Bryan Rinders <[email protected]> Co-authored-by: Daan Weessies <[email protected]> Co-authored-by: Rowan Goemans <[email protected]>
, (Maybe ArpResponse, (Maybe IPv4Address, Df.Data ArpLite))) | ||
-- User issues a lookup request. We don't have a timeout, because the ARP table should | ||
-- always respond within a reasonable time frame. If not, there is a bug in the ARP table. | ||
arpManagerT AwaitLookup{..} (Just lookupIPv4, arpResponseIn, Ack readyIn, _) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why this _awaitTransmission
is necessary? If you are in the AwaitLookup
state and you receive a Just lookupIPv4
why can't you just forward that directly to the ARP table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do forward the IP address unconditionally? _awaitTransmission
is needed to be able to keep driving the same ARP request to the transmitter if the transmitter asserts backpressure.
It makes no difference for a single entry ARP table which has one clock cycle latency, which is why the old version worked in hardware. But with a multi-entry table you can have a longer latency. And that's where the notorious bug that stopped the multi-entry stack from working came from.
I.e. it sent an ARP request to the transmitter, the transmitter did nothing with it because he wasn't ready, but we already moved to the waiting state.
-- ^ My MAC Address | ||
-> Signal dom (IPv4Address, IPv4Address) | ||
-- ^ My IP address and the subnet | ||
-> Circuit (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite)) (PacketStream dom dataWidth (IPv4Address, UdpHeaderLite)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure anymore of providing the UDP circuit as a parameter. Because it's very limiting in what you can then do with it.
In GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24#155 I have rewritten it so you have more flexibility. I would like your and others opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like
Circuit
(PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthRx 1 ())
(PacketStream dom dataWidth (IPv4Address, UdpHeaderLite), PacketStream domEthTx 1 ())
as a type for the UDP core. It clearly conveys that outgoing packets are either derived from incoming packets (i.e. ARP, UDP echo) or are coming from the outside (i.e. DHCP discovery). And that not all incoming packets are replied to.
The current version seems to assume that all incoming packets need to be replied to. Which is obviously not the case for anything but echos. So, let's change it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GiPHouse/qbaylogic-clash-based-macipudp-stack-spring24@868c010#diff-d4db0745d4f48da89a4b824cae23aa6a8d92447fa14b8f93220f4d6c538c5ce7 I did basically that for someone who wanted to use clash ethernet as a core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. Only thing that I added is that you really need a fifo after the IP depacketizer.
A UDP echo stack can now be expressed as:
udpEcho = circuit $ \phyIn -> do
phyIn' <- unsafeRgmiiRxC ... -< phyIn
udpOut' <- mapMeta (\(ip, hdr) -> (ip, hdr{_udplDstPort = _udplSrcPort hdr, _udplSrcPort = _udplDstPort hdr})) -< udpOut
(udpOut, toTxPhy) <- fullStackC ... < (udpOut', phyIn')
rgmiiTxC ... -< toTxPhy
(KnownNat dataWidth) => | ||
(1 <= dataWidth) => | ||
Circuit (PacketStream dom dataWidth ()) (PacketStream dom dataWidth ()) | ||
preambleStripperC = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure anymore that an incoming ethernet packet will always have the full preamble since it's partially used for clock recovery. @DigitalBrains1 Do you know this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is true, then we may want to investigate stripping the preamble in the Ethernet domain. It should be simple enough to not hurt timings that much, and I suspect even putting registers between the component will be cheaper than doing it for generic dataWidth
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly sure a receiver can discard the preamble; not sure whether in whole or if there is a minimum number of bit times it needs to pass on for valid reception. I've quickly browsed through 802.3 and found the following things:
802.3-2018 section 4.2.5 Preamble generation
In a LAN implementation, most of the Physical Layer components are allowed to provide valid output some number of bit times after being presented valid input signals. Thus it is necessary for a preamble to be sent before the start of data, to allow the PLS circuitry to reach its steady state.
I think this says a receiver is allowed to only start outputting the preamble some number of bit times after the preamble starts, and the preamble is there to absorb the difference.
4.1.2.1.2 Reception without contention
The Physical Layer passes subsequent bits up to the MAC sublayer, where the leading bits are discarded, up to and including the end of the preamble and Start Frame Delimiter.
Not really conclusive, but I feel it leaves the door wide open to say the number of leading bits is inconsequential.
In section 4.2.9 Frame reception, IEEE 802.3 provides pseudo-Pascal programs to show what a working implementation of a CSMA/CD Media Access sublayer could look like. In it's BitReceiver
process, which produces received frames, it strips off stuff by invoking another procedure; the call is documented:
PhysicalSignalDecap; {Skip idle and extension, strip off preamble and sfd}
The procedure itself is only described:
procedure PhysicalSignalDecap;
begin
{Receive one bit at a time from physical medium until a valid sfd is detected, discard bits and return}
end; {PhysicalSignalDecap}
Nowhere does it say anything about a check on the length of the preamble in the receiver.
Without referring to any standard, it also just makes sense. The preamble is, among other things, meant for synchronisation. In the first part of the preamble, you haven't synchronised yet, so you can't output it because it is still gibberish in some sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote two new versions of the preamble stripper that allow for a variable-length preamble in bytes. I believe it is reasonable to assume that the SFD is byte-aligned.
The first version just forwards fragments upon encountering the SFD:
data PreambleStripperState
= ValidateSfd
| Forward
deriving (Generic, NFDataX, Show, ShowX)
preambleStripperC ::
forall dom.
HiddenClockResetEnable dom =>
Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
preambleStripperC = fromSignals (mealyB go ValidateSfd)
where
go ValidateSfd (Just PacketStreamM2S{..}, _) =
(nextSt, (PacketStreamS2M True, Nothing))
where
nextSt
| isNothing _last && head _data == 0xD5 = Forward
| otherwise = ValidateSfd
go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
where
nextSt
| isJust (_last transferIn) && _ready bwdIn = ValidateSfd
| otherwise = Forward
go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))
This version happily accepts packets with a longer preamble than normal just like the procedure described in the comment above. And prays that faulty packets are picked up by the CRC check.
Or we can go with a version that's a little stricter (but also slightly more expensive):
data PreambleStripperState
= ValidateSfd {_counter :: Index 8}
| Forward
| Drop
deriving (Generic, NFDataX, Show, ShowX)
preambleStripperC ::
forall dom.
HiddenClockResetEnable dom =>
Circuit (PacketStream dom 1 ()) (PacketStream dom 1 ())
preambleStripperC = fromSignals (mealyB go (ValidateSfd 0))
where
go ValidateSfd{..} (Just PacketStreamM2S{..}, _) =
(nextSt, (PacketStreamS2M True, Nothing))
where
nextSt
| isNothing _last && head _data == 0xD5 = Forward
| isNothing _last && _counter == maxBound = Drop
| otherwise = ValidateSfd (satSucc SatWrap _counter)
go Forward (Just transferIn, bwdIn) = (nextSt, (bwdIn, Just transferIn))
where
nextSt
| isJust (_last transferIn) && _ready bwdIn = ValidateSfd 0
| otherwise = Forward
go Drop (Just transferIn, _) = (nextSt, (PacketStreamS2M True, Nothing))
where
nextSt
| isJust (_last transferIn) = ValidateSfd 0
| otherwise = Drop
go st (Nothing, _) = (st, (PacketStreamS2M True, Nothing))
Which drops packets if the SFD was not detected in the first 8 bytes.
I personally prefer the latter, what do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel more for the former. I don't feel the extra circuitry adds a useful enough feature.
Also, in the latter, it seems to me you can replace satSucc SatWrap _counter
by just succ _counter
. It always starts at 0 and never progresses beyond maxBound
.
[edit] I would not oppose the latter if that's what you want anyway [/edit]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes? We cannot say that some bits are invalid. This is not possible with AXI Stream either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we might also commit a version that requires byte alignment now, and create an issue noting this requirement and asking anyone who discovers they have reception issues to please inform us with details about their setup.
I'm not convinced this is a good solution, but personally I'm willing to accept this. I can't speak for other people.
[edit]
To clarify, when I posted this reply, GitHub did not show me the previous message, so I didn't ignore it, I was unaware of it.
[/edit]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look what our forefathers did (liteeth). And what they do two things:
- Make preamble enabled or disabled, interestingly as a package deal with CRC https://github.com/enjoy-digital/liteeth/blob/master/liteeth/mac/core.py#L215.
- Check the last n-bits, where n is the datawidth, of the preamble + SFD (https://github.com/enjoy-digital/liteeth/blob/master/liteeth/mac/preamble.py#L77)
Also by googling I did find a phy that could return non-byte aligned shortened preamble + SFD: https://e2e.ti.com/support/interface-group/interface/f/interface-forum/1233993/dp83848c-preamble-sfd-length-is-shortage. Personally I think this should be made a phy specific problem. It's trivial to detect an SFD when you know you get n-bits every cycle. But hard when you have already adapted it to a byte packetstream with completely unknown alignment.
Personally considering the state currently I would keep the SFD check byte aligned for now. We only have an RGMII phy anyway. Make a note that explains the above story and once the next Phy is actually implemented we fix the story for that phy.
Point of note is that I did run the older colorlight board RGMII Phy in 100mbit mode using liteeth in the past without issues. That means the SFD was byte aligned. The 8.2v rev of the PCB has a phy that does not support 100mbit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you sensibly handle a non byte-aligned SFD with a streaming protocol that only allows you to enable full bytes?
I did not know about which place in the stack you were talking about. MII is a nibble-width protocol, and it's somewhat of a common unit in Ethernet. I thought you meant whether the nibbles of the SFD were byte-aligned, i.e., that the first nibble is always at an even number of cycles from the first where RX_DV is asserted (0 cycles, i.e., first nibble is the SFD, or 2, or ...).
It's a pity I misunderstood you. I spent a fair amount of time wading through 802.3 to check your assumption; turns out my assumption was the faulty one.
If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream
. It doesn't seem to make sense to not do this.
Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand Rowan correctly, I agree that it should be byte-aligned before you turn it into a PacketStream. It doesn't seem to make sense to not do this.
Yes this is essentially what I meant. If you wanted a generic component you'd want something like:
sfdDetector :: Signal (Maybe (BitVector n)) -> Signal (Maybe (BitVector n))
Which eats the input until a SFD is detected and then forwards. Similar to the original component except not yet in a byte aligned packet stream. And you also don't need to care about backpressure at this stage.
Doesn't RGMII strictly define byte alignment? I.e., they either say the lower nibble is on the falling clock edge and the upper nibble is on the rising edge, or they say the exact reverse but still fully specify it? I'm not going to dive into another standard right now...
Yes RGMII is a nice case. You get a full byte every cycle. I just checked and it specifies:
Multiplexing of data and control information is done by taking advantage of both edges of the reference clocks and sending the lower 4 bits on the rising edge and the upper 4 bits on the falling edge. Control signals can be multiplexed into a single clock cycle using the same technique.
and
This interface can be used to implement the 10/100 Mbps Ethernet Media Independent Interface (MII) by reducing the clock rate to 25MHz for 100Mbps operation and 2.5MHz for 10Mbps. The TXC will always be generated by the MAC and RXC will be generated by the PHY. During packet reception, the RXC may be stretched on either the positive or negative pulse to accommodate the transition from the free running clock to a data-synchronous clock domain. When the speed of the PHY changes, a similar stretching
of the positive or negative pulses is allowed. No glitching of the clocks are allowed during speed transitions.
This interface will operate at 10 and 100Mbps speeds exactly the same way it does at Gigabit speed with the exception that the data may be duplicated on the falling edge of the appropriate clock. The MAC will hold TX_CTL low until it has ensured that it is operating at the same speed as the PHY.
So if I interpret it correctly even in 10/100 Mbps mode it will always full transmit bytes.
It was only used to be able to test the ARP stack in hardware when we did not have IP yet.
Major documentation was missing for the ICMP module. Especially the fact that the checksum adjustment breaks for very specific (unlikely to happen in practice) input packets.
Mostly includes improvements in packet generation, and reuse of models and generators. Aside from that, most test files have also been formatted with fourmolu.
Now also recognizes packets of which some of the preamble is missing. The SFD is still required to be byte-aligned.
Also removed some superfluous constraints and made the timer of the ARP manager less granular. This caused the manual test to fail, because its domain is too slow. The manual test was not any better than a hardware test anyway, so I removed it. Once support for ReqResp tests arrives from clash-protocols, we can add a proper test.
ARP is all done and open for further review. Only thing missing is aborting upon receiving backpressure from the ARP receiver. The problem with that is that it does not run at full throughput currently, because that saves resources. I could generalize |
f75ceaa
to
477189d
Compare
Co-authored-by: Rowan Goemans <[email protected]> See the discussion in #8 (comment) for more details.
See the discussion in #8 (comment) for more details. Co-authored-by: Rowan Goemans <[email protected]>
Split IPv4 checksum improvements to #14. |
Only the padding inserter and frame check sequence inserter needed adjustments. The tests are adjusted to use the new packet generation method.
PacketStream was recently updated to not require null bytes to be set to 0x00 anymore. The padding inserter and fcs validator require this, so they needed a small change. Many other components are now more efficient because of this relaxation, though. Relevant commit: clash-lang/clash-protocols@2e9557d
There is now a separate type for subnet masks to avoid confusion. Also added several utility functions for address classification.
Instead of multiplexing the input we can leverage clock enables to save resources and improve timing. Also for some reason, internetChecksum did not reuse onesComplementAdd.
See the discussion in #8 (comment) for more details. Co-authored-by: Rowan Goemans <[email protected]>
See the discussion in #8 (comment) for more details. Co-authored-by: Rowan Goemans <[email protected]>
See the discussion in #8 (comment) for more details. Co-authored-by: Rowan Goemans <[email protected]>
This PR contains an Ethernet II + IPv4 + UDP core.
Still TODO:
Mac.hs
andIP.hs
modulesIcmp.hs
andIPPacketizers.hs
partitionS
toClash.Protocols.Df
: Add variants of Df functions that handle function arguments wrapped in signals clash-protocols#107