Fixed and redefined total peering time (%<tt) #1828

Closed
34 commits (the diff below shows changes from 31 commits)
b2193b4
Fix proxy processing time for pinned connections
eduard-bagdasaryan Mar 24, 2024
2313710
Moved startPeerClock()/stopPeerClock() into constructor/destructor
eduard-bagdasaryan Mar 25, 2024
8688bb5
Removed an excessive stopPeerClock() from FwdState::retryOrBail()
eduard-bagdasaryan Mar 25, 2024
9ee4c5a
Drop nil checks for always-present TunnelStateData::request
eduard-bagdasaryan Mar 28, 2024
c834331
Documented the effects of secondary activities and re-forwarding
rousskov Mar 28, 2024
a71a763
Using Stopwatch for totalResponseTime() calculation
eduard-bagdasaryan Apr 4, 2024
182b682
Removed an excessive stopPeerClock()
eduard-bagdasaryan Apr 8, 2024
a2c176f
Adjusted documentation
eduard-bagdasaryan Apr 9, 2024
c2637ca
Fixed unit tests
eduard-bagdasaryan Apr 9, 2024
94bb99b
Overcome the unassigned ALE totalResponseTime()
eduard-bagdasaryan Apr 11, 2024
0a2b9b4
A couple of fixes
eduard-bagdasaryan Apr 14, 2024
9e192dc
Fixed LFT_TOTAL_SERVER_SIDE_RESPONSE_TIME calculation
eduard-bagdasaryan Apr 16, 2024
c8ac2b7
Renamed startPeerClock()/stopPeerClock() and adjusted documentation
eduard-bagdasaryan Apr 16, 2024
5fb6a31
Removed the branch-added tunneled flag
eduard-bagdasaryan Apr 16, 2024
86033b5
Request is always non-nil in TunnelStateData::retryOrBail()
eduard-bagdasaryan Apr 16, 2024
9340dac
Removed a garbage line
eduard-bagdasaryan Apr 16, 2024
8092565
Ensure that TunnelStateData calls hier.stopPeering() exactly once
eduard-bagdasaryan Apr 17, 2024
a693219
Removed the peeringTime_.running() assurance from TunnelStateData
eduard-bagdasaryan Apr 17, 2024
b9347ee
Added an ActivityTimer wrapper for Stopwatch
eduard-bagdasaryan Apr 18, 2024
8c86590
Polished and removed a stale TODO
eduard-bagdasaryan Apr 18, 2024
80eed21
Addressed an ActivityTimer TODO
eduard-bagdasaryan Apr 19, 2024
a9c2c01
Use an overloaded chrono::duration operator %
eduard-bagdasaryan Apr 19, 2024
d4b420d
fixup: Polished ActivityTimer documentation
rousskov May 31, 2024
fe72ba9
fixup: Simplified TunnelStateData::peeringTimer initialization
rousskov May 31, 2024
fa32752
Addressed HierarchyLogEntry::totalPeeringTime lifetime concerns
rousskov May 31, 2024
95c3e8a
fixup: Polished ActivityTimer::paused name/docs, added description
rousskov May 31, 2024
48b31b2
fixup: Simplified ActivityTimer code by making the class less general
rousskov May 31, 2024
f3ee47b
fixup: Fixed %<tt documentation typo
rousskov May 31, 2024
95f59c5
fixup: Polished branch-added comments
rousskov May 31, 2024
8755344
fixup: Restored branch-removed assertions
rousskov May 31, 2024
2ccfd0a
Merged master into branch to get newer/required CI tests
rousskov May 31, 2024
02cf233
fixup: Avoid "peering" in %<tt brief description
rousskov Jun 10, 2024
c7236dc
fixup: Clarify special cases using paragraphs
yadij Jun 10, 2024
4443bdf
fixup: Fix example in re-forwarding case docs
rousskov Jun 10, 2024
33 changes: 26 additions & 7 deletions src/FwdState.cc
@@ -128,7 +128,8 @@ FwdState::FwdState(const Comm::ConnectionPointer &client, StoreEntry * e, HttpRe
waitingForDispatched(false),
destinations(new ResolvedPeers()),
pconnRace(raceImpossible),
storedWholeReply_(nullptr)
storedWholeReply_(nullptr),
peeringTimer(r)
{
debugs(17, 2, "Forwarding client request " << client << ", url=" << e->url());
HTTPMSGLOCK(request);
@@ -186,6 +187,8 @@ FwdState::stopAndDestroy(const char *reason)

cancelStep(reason);

peeringTimer.stop();
Contributor Author:
We would prefer to rely exclusively on RAII (provided by PeeringActivityTimer) instead of sprinkling code with these stop() calls. Unfortunately, FwdState objects are reference-counted, and there is no strong/obvious/maintained guarantee that a FwdState object does not survive long after stopAndDestroy() has been called. Thus, we should call stop() explicitly. RAII will still kick in if we miss a destruction code path, of course.
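The pattern being described, as a self-contained toy (hypothetical names only; this is not Squid code): a reference-counted task keeps a RAII guard as the backstop while also exposing an idempotent stop() for the point where the work is logically finished.

#include <iostream>
#include <memory>

struct Activity {
    void pause() { std::cout << "activity paused\n"; }
};

class ActivityGuard {
public:
    explicit ActivityGuard(Activity &a): activity(a) {}
    ~ActivityGuard() { stop(); } // RAII backstop for missed code paths
    void stop() { // safe to call early and more than once
        if (!stopped) { activity.pause(); stopped = true; }
    }
private:
    Activity &activity;
    bool stopped = false;
};

int main() {
    Activity peering;
    auto task = std::make_shared<ActivityGuard>(peering);
    auto extraRef = task;  // a refcounted task may outlive its "done" point
    task->stop();          // explicit stop at the logical end of work
    return 0;              // pause() is not repeated when the guard is destroyed
}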


PeerSelectionInitiator::subscribed = false; // may already be false
self = nullptr; // we hope refcounting destroys us soon; may already be nil
/* do not place any code here as this object may be gone by now */
@@ -259,8 +262,6 @@ FwdState::completed()

flags.forward_completed = true;

request->hier.stopPeerClock(false);

if (EBIT_TEST(entry->flags, ENTRY_ABORTED)) {
debugs(17, 3, "entry aborted");
return ;
@@ -780,8 +781,6 @@ FwdState::retryOrBail()
// TODO: should we call completed() here and move doneWithRetries there?
doneWithRetries();

request->hier.stopPeerClock(false);

if (self != nullptr && !err && shutting_down && entry->isEmpty()) {
const auto anErr = new ErrorState(ERR_SHUTTING_DOWN, Http::scServiceUnavailable, request, al);
errorAppendEntry(entry, anErr);
@@ -1128,8 +1127,6 @@ FwdState::connectStart()
err = nullptr;
request->clearError();

request->hier.startPeerClock();

const auto callback = asyncCallback(17, 5, FwdState::noteConnection, this);
HttpRequest::Pointer cause = request;
const auto cs = new HappyConnOpener(destinations, callback, cause, start_t, n_tries, al);
@@ -1572,3 +1569,25 @@ ResetMarkingsToServer(HttpRequest * request, Comm::Connection &conn)
Ip::Qos::setSockNfmark(&conn, conn.nfmark);
}

/* PeeringActivityTimer */

// The simple methods below are not inlined to avoid exposing some of the
// current FwdState.h users to a full HttpRequest definition they do not need.

PeeringActivityTimer::PeeringActivityTimer(const HttpRequestPointer &r): request(r)
{
Assure(request);
timer().resume();
}

PeeringActivityTimer::~PeeringActivityTimer()
{
stop();
}

Stopwatch &
PeeringActivityTimer::timer()
{
return request->hier.totalPeeringTime;
}

37 changes: 37 additions & 0 deletions src/FwdState.h
@@ -49,6 +49,40 @@ void ResetMarkingsToServer(HttpRequest *, Comm::Connection &);

class HelperReply;

/// Eliminates excessive Stopwatch pause() calls in a task with multiple code
/// locations that pause a stopwatch. Ideally, there would be just one such
/// location (e.g., a task class destructor), but current code idiosyncrasies
/// necessitate this state. For simplicity sake, this class currently manages a
/// Stopwatch at a hard-coded location: HttpRequest::hier.totalPeeringTime.
class PeeringActivityTimer
Contributor Author:
We have actually started with a much more generic ActivityTimer class (even suitable for src/base/), but ran into concerns that TunnelStateData creates its ActivityTimer data member with an address of clientRequest->request->hier->totalPeeringTime, where the first element of that chain -- clientRequest -- has shorter lifetime than TunnelStateData itself. That initial attempt "worked" because TunnelStateData also keeps a refcounted copy of clientRequest->request, "protecting" the raw pointer to totalPeeringTime stored inside ActivityTimer, but the secret relationship felt too fragile.

PR's PeeringActivityTimer is simple and reliable (no raw pointers). It can be generalized if we need to support other HierarchyLogEntry or even ALE members (making code more complex).
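A toy contrast of the two designs discussed above (hypothetical types for illustration; std::shared_ptr stands in for HttpRequestPointer, and Stopwatch here is a stub rather than Squid's base/Stopwatch.h):

#include <memory>

struct Stopwatch { void resume() {} void pause() {} };
struct Request { Stopwatch totalPeeringTime; };

// Rejected shape: a generic timer storing a raw reference it does not own.
class RawTimer {
public:
    explicit RawTimer(Stopwatch &s): timer(s) { timer.resume(); }
    ~RawTimer() { timer.pause(); }
private:
    Stopwatch &timer; // dangles if the owning Request dies first
};

// Adopted shape: keep the owner alive and reach the Stopwatch through it.
class OwningTimer {
public:
    explicit OwningTimer(const std::shared_ptr<Request> &r): request(r) { request->totalPeeringTime.resume(); }
    ~OwningTimer() { request->totalPeeringTime.pause(); }
private:
    std::shared_ptr<Request> request; // refcounted, so no raw pointer can dangle
};

int main() {
    auto request = std::make_shared<Request>();
    OwningTimer timer(request);
    request.reset(); // OwningTimer still keeps the Request (and its Stopwatch) alive
    return 0;
}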

{
public:
PeeringActivityTimer(const HttpRequestPointer &); ///< resumes timer
~PeeringActivityTimer(); ///< \copydoc stop()

/// pauses timer if stop() has not been called
void stop()
{
if (!stopped) {
timer().pause();
stopped = true;
}
}

private:
/// managed Stopwatch object within HierarchyLogEntry
Stopwatch &timer();

/// the owner of managed HierarchyLogEntry
HttpRequestPointer request;

// We cannot rely on timer().ran(): This class eliminates excessive calls
// within a single task (e.g., an AsyncJob) while the timer (and its ran()
// state) may be shared/affected by multiple concurrent tasks.
/// Whether the task is done participating in the managed activity.
bool stopped = false;
};

class FwdState: public RefCountable, public PeerSelectionInitiator
{
CBDATA_CHILD(FwdState);
@@ -212,6 +246,9 @@ class FwdState: public RefCountable, public PeerSelectionInitiator
/// Whether the entire reply (including any body) was written to Store.
/// The string literal value is only used for debugging.
const char *storedWholeReply_;

/// Measures time spent on selecting and communicating with peers.
PeeringActivityTimer peeringTimer;
};

class acl_tos;
18 changes: 4 additions & 14 deletions src/HierarchyLogEntry.h
@@ -9,6 +9,7 @@
#ifndef SQUID_SRC_HIERARCHYLOGENTRY_H
#define SQUID_SRC_HIERARCHYLOGENTRY_H

#include "base/Stopwatch.h"
#include "comm/Connection.h"
#include "enums.h"
#include "hier_code.h"
@@ -37,22 +38,10 @@ class HierarchyLogEntry
/// Call this after each peer socket write(2), including failed ones.
void notePeerWrite();

/// Start recording total time spent communicating with peers
void startPeerClock();
/**
* Record total time spent communicating with peers
* \param force whether to overwrite old recorded value if any
*/
void stopPeerClock(const bool force);

/// Estimates response generation and sending delay at the last peer.
/// \returns whether the estimate (stored in `responseTime`) is available.
bool peerResponseTime(struct timeval &responseTime);

/// Estimates the total time spent communicating with peers.
/// \returns whether the estimate (stored in `responseTime`) is available.
bool totalResponseTime(struct timeval &responseTime);

public:
hier_code code;
char host[SQUIDHOSTNAMELEN];
@@ -70,13 +59,14 @@ class HierarchyLogEntry
Comm::ConnectionPointer tcpServer; ///< TCP/IP level details of the last peer/server connection
int64_t bodyBytesRead; ///< number of body bytes received from the next hop or -1

/// cumulative time spent (so far) communicating with all peers (see %<tt)
Stopwatch totalPeeringTime;

private:
void clearPeerNotes();

timeval firstConnStart_; ///< first connection use among all peers
struct timeval peer_last_read_; ///< time of the last read from the last peer
struct timeval peer_last_write_; ///< time of the last write to the last peer
struct timeval totalResponseTime_; ///< cumulative for all peers
};

#endif /* SQUID_SRC_HIERARCHYLOGENTRY_H */
31 changes: 27 additions & 4 deletions src/cf.data.pre
@@ -4995,10 +4995,33 @@ DOC_START
[http::]<pt Peer response time in milliseconds. The timer starts
when the last request byte is sent to the next hop
and stops when the last response byte is received.
[http::]<tt Total time in milliseconds. The timer
starts with the first connect request (or write I/O)
sent to the first selected peer. The timer stops
with the last I/O with the last peer.
[http::]<tt Total peering time in milliseconds.
Contributor:
"peering time" is not a defined concept, and is quite obscure - not being exclusive to peers, nor requiring their use at all. AFAIK other documentation calls this "forwarding" not "peering", so:

Suggested change (replace the first line with the second):
[http::]<tt Total peering time in milliseconds.
[http::]<tt Total time in milliseconds spent forwarding to origin servers or peers.

Contributor Author:
Squid documentation and code uses "peer" to mean "origin server or cache_peer". The term "peering" itself is also used. I think it is a better approach than spelling out "origin server or cache_peer" or forgetting one or the other. All of this is inconsistent and poorly defined, of course, but this PR is not making things worse. Arguably, it even improves the situation a bit.

Said that, I have changed that brief <tt description to use the terminology you prefer: "Total time spent forwarding to origin servers or cache_peers (milliseconds)." I hope this addresses your concern.

Contributor:
If that were true, you would not have insisted on me writing "server or peer" in all previous PRs.

Contributor Author:
Alex: I have changed that brief <tt description to use the terminology you prefer: "Total time spent forwarding to origin servers or cache_peers (milliseconds)." I hope this addresses your concern.

Amos: If that were true, you would not have insisted on me writing "server or peer" in all previous PRs.

I do not know what "it" refers to, but can we focus on moving forward this stalled PR rather than on my alleged past actions in other PRs, please? AFAICT, this change request has been addressed two months ago, using phrasing nearly identical to the one you have suggested. Please resolve this specific change request and, if possible, dismiss your negative PR review.


The timer starts when Squid decides to forward the request (to
an origin server or cache_peer) and peer selection begins. The
timer stops when relevant forwarding activities (including any
retries) end.

Between those two timer events, Squid may perform DNS lookups,
query external ACL helpers, adapt responses using pre-cache
RESPMOD services, and participate in other concurrent
secondary activities. Most secondary activities increase
peering time. In some cases, a secondary activity may start
before the timer starts or end after the timer stops, leading
Comment on lines +5006 to +5011
Contributor:
All of this is essentially saying that this timer follows wall-time clock, not a CPU-cycles clock. This list of things that can happen parallel or in sequence with an HTTP response arriving is already quite long and likely to have bitrot quite easily.
Can you come up with a simpler way to convey the concept without listing specifics of other Squid feature implementations?

Contributor Author:
All of this is essentially saying that this timer follows wall-time clock, not a CPU-cycles clock

I strongly disagree. The text gives the reader a good idea of what activities are included in %<tt measurements. Many admins do not realize what Squid does when peering. Many admins do not realize that many of the peering activities may happen in parallel with other peering activities. Most do not realize that a secondary activity such as a DNS lookup may start before and/or end after peering. Specific examples help with all of that (and more).

This list of things that can happen parallel or in sequence with an HTTP response arriving is already quite long and likely to have bitrot quite easily.

The list has 3 specific examples. I disagree that three is "quite long". There is no significant danger of bitrot here because those three examples are very unlikely to go away, and the list is explicitly not exclusive.

Can you come up with a simpler way to convey the concept without listing specifics of other Squid feature implementations?

The current version is the best I can come up with. Its (moderate, IMHO) level of complexity is necessitated by the complexity of the thing being measured; a simpler version is already available as a brief %<tt description. Specific examples are a good thing in this context, IMO.

Contributor:
Nobody likes the text, but it's a text and thus better than nothing.
I'll take a stab at improving it in a separate PR eventually, and we can discuss things there.

to misleading results of simple computations like %<tt - %dt.
Comment on lines +5010 to +5012
Contributor Author:
A secondary activity may start before the transaction peering timer when a transaction is joining an already pending secondary activity. For example, an HTTP request handled by FwdState (%<tt timer) may join an already ongoing DNS lookup for the requested domain name (%dt timer).

Similarly, a secondary activity may continue beyond a peering transaction phase. For example, a DNS A lookup may still be pending while the transaction proceeds with the DNS AAAA lookup result received earlier (due to Happy Eyeballs optimizations).

These natural imperfect/unpredictable concurrent activity/transaction overlaps imply that %<tt - %dt math does not always work as more "primitive" sequential processing models may suggest. In some cases, it may even yield negative results. We cannot really "fix" that, but we can make %<tt much simpler by not trying to exclude the "initial" DNS lookups from it (like the official code does, leading to admin confusion and Squid bugs). This PR adjusts %<tt definition and code accordingly.
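A hypothetical timeline illustrates the point: suppose a DNS lookup starts at t=0 ms and the transaction joins it, peer selection starts at t=50 ms, forwarding ends at t=180 ms (so %<tt is about 130 ms), and the still-pending lookup completes at t=200 ms (so %dt is about 200 ms). Then %<tt - %dt yields roughly -70 ms even though nothing went wrong.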


If this logformat %code is used before its timer starts, the
corresponding measurement has no value (and the %code expands
to a single dash ("-") character). If this code is used while
its timer is running, the time spent so far is used as the
measurement value.
yadij marked this conversation as resolved.

When Squid re-forwards the request (e.g., due to an HTTP 304
response indicating that the cached response is stale), the
Contributor:
Small problem:

  • We document elsewhere (eg FAQ) that Squid does not follow 3xx status from servers, and
  • 304 status indicates that response is fresh

Can you point me at the logic where this example is happening in latest Squid?

Contributor Author:
Reforwarding cases correspond to two processMiss() calls in clientReplyContext::handleIMSReply(). There is also an XXX there that (if addressed) would probably result in another processMiss() call. However, the example text does not match those cases. I fixed that bad example.

timer may restart. In this case, the new measurement is added
to the value accumulated from previous forwarding attempts.
The time interval between forwarding attempts is not added
to the final result.
Contributor Author:
This PR preserves Squid behavior documented in this paragraph. The cumulative nature of %<tt stats across these cache refreshing attempts was not documented.

Contributor:
FWIW; I believe it was clearly implied by the old text stating the value as covering time from start of I/O with peer A to end of I/O with peer B.

I do not think we need to go into the level of internal logic this 3rd paragraph has. All reasons for any re-forwarding, re-tries, and redirection are covered by and/or already alluded to by the new paragraph 2 statement "(including any retries)"

Contributor Author (rousskov, Jun 10, 2024):
FWIW; I believe it was clearly implied by the old text stating the value as covering time from start of I/O with peer A to end of I/O with peer B.

This text is not about switching from peer A to peer B (i.e. retrying peers using one FwdState instance). It is about re-forwarding requests using two FwdState instances. In some of those two FwdStates cases, Squid will use peer A, then peer B, and then peer A again and peer B again. The original description did not document what would happen in those cases. An admin could guess correctly, but they had similar if not greater chances of guessing wrong.

I do not think we need to go into the level of internal logic this 3rd paragraph has. All reasons for any re-forwarding, re-tries, and redirection are covered by and/or already alluded to by the new paragraph 2 statement "(including any retries)"

I think we do need to detail that because paragraph 2 only documents that the timer starts at A and stops at B. That documentation is not wrong (for some definition of a "retry"1), but it does not cover what happens when there are two pairs of A and B events (A1 and B1 plus A2 and B2). Interpreting those two pairs as A1 and B2 would be wrong -- the timer is not running between B1 and A2.

If you would like, we can replace "starts" and "stops" in paragraph 2 with "starts (or resumes)" and "pauses", but I am not sure it will improve things -- I think it is best to keep that initial paragraph simple and document special cases later. And even if we do make those replacements in paragraph 2, I would still keep this paragraph (in some shape or form) to explain what is going on in those special (but not so rare in collapsed forwarding environments) re-forwarding cases. Said that, if you insist, I will remove this paragraph. Your call.

Footnotes

  1. Related Squid terminology is inconsistent, but we tend to call attempts within a single FwdState instance "retries" (e.g., connect_retries and retry_on_error).
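To make the accumulation concrete with hypothetical numbers: if attempt one runs from A1 = 0 ms to B1 = 100 ms and attempt two runs from A2 = 250 ms to B2 = 290 ms, then %<tt reports 100 + 40 = 140 ms. The 150 ms gap between B1 and A2 is not counted, and reading the pairs as A1..B2 (290 ms) would be wrong.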


Squid handling related format codes:

1 change: 0 additions & 1 deletion src/clients/Client.cc
@@ -187,7 +187,6 @@ Client::serverComplete()
}

completed = true;
originalRequest()->hier.stopPeerClock(true);

if (requestBodySource != nullptr)
stopConsumingFrom(requestBodySource);
13 changes: 10 additions & 3 deletions src/format/Format.cc
@@ -635,9 +635,16 @@ Format::Format::assemble(MemBuf &mb, const AccessLogEntry::Pointer &al, int logS
break;

case LFT_TOTAL_SERVER_SIDE_RESPONSE_TIME: {
struct timeval totalResponseTime;
if (al->hier.totalResponseTime(totalResponseTime)) {
outtv = totalResponseTime;
// XXX: al->hier.totalPeeringTime is not updated until prepareLogWithRequestDetails().
// TODO: Avoid the need for updates by keeping totalPeeringTime (or even ALE::hier) in one place.
const auto &timer = (!al->hier.totalPeeringTime.ran() && al->request) ?
al->request->hier.totalPeeringTime : al->hier.totalPeeringTime;
Comment on lines +638 to +641
Contributor Author:
Implementing this TODO in this PR would create a lot of out-of-scope noise. That refactoring is also likely to require making some difficult design decisions and may result in other bug fixes or surprises. We should do it in a dedicated PR.

Contributor:
Do you have some cases where the al->hier is set without al->request being set?

AFAIK we should not have hierarchy contact without a request message, and the ALE should be set up with that. Perhaps it is possible to improve this code by just accessing the al->request->hier record and treating al->hier as deprecated?

Contributor Author:
Do you have some cases where the al->hier is set without al->request being set?

I am not aware of any.

AFAIK we should not have hierarchy contact without a request message, and the ALE should be set up with that. Perhaps it is possible to improve this code by just accessing the al->request->hier record and treating al->hier as deprecated?

I hope there are better ways to address this problem than "treating ALE::hier as deprecated" (which,
AFAICT, means ignoring ALE::hier when logging %<tt).

The corresponding official code uses ALE::hier instead of ALE::request. Doing the opposite without serious analysis/tests/etc. is risky (and this PR does not take that risk). ALE::hier is used in several code contexts, including contexts unrelated to peering time. Ignoring it here may introduce bugs and/or more inconsistencies. Let this problem be handled in a dedicated PR (which will focus on addressing these issues properly rather than taking unnecessary risks).

Contributor:
I don't think this PR is the right place to deduplicate al->request->hier and al->hier.
There is a problem, and it should be tackled somewhere else (hopefully eventually).

if (timer.ran()) {
using namespace std::chrono_literals;
const auto duration = timer.total();
outtv.tv_sec = std::chrono::duration_cast<std::chrono::seconds>(duration).count();
const auto totalUsec = std::chrono::duration_cast<std::chrono::microseconds>(duration);
outtv.tv_usec = (totalUsec % std::chrono::microseconds(1s)).count();
Comment on lines +645 to +647
Contributor Author:
This code converts an std::chrono::duration to struct timeval. Eduard has evaluated runtime efficiency of this code and confirmed that its optimized performance is equivalent to traditional manual division using magic constants. We will submit a dedicated followup PR offering a reusable version of this conversion, replacing this and other then-current code doing this conversion.
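For reference, a standalone sketch of the kind of reusable conversion helper being promised here (an assumed shape for illustration only, not the helper that was eventually merged):

#include <chrono>
#include <cstdio>
#include <sys/time.h>

// Converts a std::chrono duration to struct timeval, splitting whole seconds
// from the sub-second remainder via the overloaded duration operator %.
static timeval
ToTimeval(const std::chrono::nanoseconds duration)
{
    using namespace std::chrono;
    using namespace std::chrono_literals;
    timeval tv;
    tv.tv_sec = duration_cast<seconds>(duration).count();
    tv.tv_usec = duration_cast<microseconds>(duration % 1s).count();
    return tv;
}

int main()
{
    const auto tv = ToTimeval(std::chrono::milliseconds(2500));
    std::printf("%ld.%06ld\n", static_cast<long>(tv.tv_sec), static_cast<long>(tv.tv_usec)); // prints 2.500000
    return 0;
}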

Contributor:
IMO code should not be doing this conversion at all. In here at least it should be displaying the duration object, not a timeval.

Contributor Author:
I agree that no conversion to timeval should be necessary, but improving Format::assemble() to handle std::chrono durations without such conversion should be done in a PR dedicated to that enhancement. This PR does not even introduce an std::chrono conversion to Format::assemble() -- it already exists in official Format::assemble() code.

Contributor:
I'd resist scope creep here.
We are in the process of adding std::chrono and we are not consistent about it at all.
If @rousskov can, as he mentioned, add the facility to support it in a separate PR, I'm happy with that

Contributor Author:
We will submit a dedicated followup PR offering a reusable version of this conversion

Done in #1915.

doMsec = 1;
}
}
39 changes: 0 additions & 39 deletions src/log/access_log.cc
@@ -187,12 +187,6 @@ HierarchyLogEntry::HierarchyLogEntry() :
store_complete_stop.tv_usec =0;

clearPeerNotes();

totalResponseTime_.tv_sec = -1;
totalResponseTime_.tv_usec = 0;

firstConnStart_.tv_sec = 0;
firstConnStart_.tv_usec = 0;
}

void
@@ -241,26 +235,6 @@ HierarchyLogEntry::notePeerWrite()
peer_last_write_ = current_time;
}

void
HierarchyLogEntry::startPeerClock()
{
if (!firstConnStart_.tv_sec)
firstConnStart_ = current_time;
}

void
HierarchyLogEntry::stopPeerClock(const bool force)
{
debugs(46, 5, "First connection started: " << firstConnStart_ <<
", current total response time value: " << (totalResponseTime_.tv_sec * 1000 + totalResponseTime_.tv_usec/1000) <<
(force ? ", force fixing" : ""));
if (!force && totalResponseTime_.tv_sec != -1)
return;

if (firstConnStart_.tv_sec)
tvSub(totalResponseTime_, firstConnStart_, current_time);
}

bool
HierarchyLogEntry::peerResponseTime(struct timeval &responseTime)
{
@@ -288,19 +262,6 @@ HierarchyLogEntry::peerResponseTime(struct timeval &responseTime)
return true;
}

bool
HierarchyLogEntry::totalResponseTime(struct timeval &responseTime)
{
// This should not really happen, but there may be rare code
// paths that lead to FwdState discarded (or transaction logged)
// without (or before) a stopPeerClock() call.
if (firstConnStart_.tv_sec && totalResponseTime_.tv_sec == -1)
stopPeerClock(false);

responseTime = totalResponseTime_;
return responseTime.tv_sec >= 0 && responseTime.tv_usec >= 0;
}

static void
accessLogRegisterWithCacheManager(void)
{
1 change: 0 additions & 1 deletion src/tests/stub_access_log.cc
@@ -17,7 +17,6 @@ HierarchyLogEntry::HierarchyLogEntry() STUB
void HierarchyLogEntry::notePeerRead() STUB
void HierarchyLogEntry::notePeerWrite() STUB
bool HierarchyLogEntry::peerResponseTime(struct timeval &) STUB_RETVAL(false)
bool HierarchyLogEntry::totalResponseTime(struct timeval &) STUB_RETVAL(false)

ping_data::ping_data() :
n_sent(0),
29 changes: 19 additions & 10 deletions src/tunnel.cc
@@ -211,6 +211,9 @@ class TunnelStateData: public PeerSelectionInitiator
/// over the (encrypted, if needed) transport connection to that cache_peer
JobWait<Http::Tunneler> peerWait;

/// Measures time spent on selecting and communicating with peers.
PeeringActivityTimer peeringTimer;

void copyRead(Connection &from, IOCB *completion);

/// continue to set up connection to a peer, going async for SSL peers
Expand Down Expand Up @@ -310,7 +313,7 @@ TunnelStateData::serverClosed()
{
server.noteClosure();

request->hier.stopPeerClock(false);
peeringTimer.stop();

finishWritingAndDelete(client);
}
@@ -386,14 +389,25 @@
delete this;
}

// TODO: Replace with a reusable API guaranteeing non-nil pointer forwarding.
/// safely extracts HttpRequest from a never-nil ClientHttpRequest pointer
static auto &
guaranteedRequest(const ClientHttpRequest * const cr)
Contributor Author:
There are lots of places in Squid code where we store a raw pointer and use it for configuring other members. Some of them assert that the pointer is not nil (and violate data member initialization guidelines to inject those assertions). Some just assume that the pointer is not nil (and occasionally crash).

I think we can do better. STL offers some relevant guidance with throwing (but slower) std::optional::value() and std::vector::at() interfaces, but raw pointers will need a different solution. An investigation of various options and a dedicated PR is warranted here. That PR will, among other things, allow us to remove this guaranteedRequest() helper and simplify/improve constructors like TunnelStateData constructor below (that contains similar assertions).
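For illustration only, one possible shape such an API could take (a hypothetical sketch, not a Squid proposal; assert() stands in for Squid's Assure()):

#include <cassert>
#include <memory>

// Wraps a smart (or raw) pointer and establishes a never-nil invariant at
// construction time, so users can dereference without repeating nil checks.
template <typename Pointer>
class NeverNil
{
public:
    explicit NeverNil(const Pointer &p): p_(p) { assert(p_); }
    auto &operator*() const { return *p_; }
    const Pointer &raw() const { return p_; }
private:
    Pointer p_; // invariant: never nil after construction
};

int main()
{
    const auto owner = std::make_shared<int>(42);
    const NeverNil<std::shared_ptr<int>> guarded(owner);
    return *guarded == 42 ? 0 : 1;
}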

{
Assure(cr);
Assure(cr->request);
return *cr->request;
}

TunnelStateData::TunnelStateData(ClientHttpRequest *clientRequest) :
startTime(squid_curtime),
destinations(new ResolvedPeers()),
destinationsFound(false),
committedToServer(false),
n_tries(0),
banRetries(nullptr),
codeContext(CodeContext::Current())
codeContext(CodeContext::Current()),
peeringTimer(&guaranteedRequest(clientRequest))
Contributor:
We could simplify a little (eliminating referencing/dereferencing) by making guaranteedRequest() return a pointer: its very name states that there is an existing request.

Contributor Author:
Yes, we could, but I did not want a function that guarantees request existence to return a pointer to that existing request: If the object is guaranteed to exist, its accessor should return a reference rather than a pointer. Either way, this code is problematic as detailed in #1828 (comment). I hope it will go away with changes discussed in that change request.

Said that, if others insist, I will refactor the helper function to remove this &.

{
debugs(26, 3, "TunnelStateData constructed this=" << this);
client.readPendingFunc = &tunnelDelayedClientRead;
@@ -477,8 +491,7 @@ TunnelStateData::retryOrBail(const char *context)

/* bail */

if (request)
request->hier.stopPeerClock(false);
peeringTimer.stop();

// TODO: Add sendSavedErrorOr(err_type type, Http::StatusCode, context).
// Then, the remaining method code (below) should become the common part of
@@ -1403,8 +1416,7 @@ TunnelStateData::sendError(ErrorState *finalError, const char *reason)
{
debugs(26, 3, "aborting transaction for " << reason);

if (request)
request->hier.stopPeerClock(false);
peeringTimer.stop();

cancelStep(reason);

@@ -1438,15 +1450,12 @@ TunnelStateData::cancelStep(const char *reason)
void
TunnelStateData::startConnecting()
{
if (request)
request->hier.startPeerClock();

assert(!destinations->empty());
assert(!transporting());

delete savedError; // may still be nil
savedError = nullptr;
request->hier.peer_reply_status = Http::scNone; // TODO: Move to startPeerClock()?
request->hier.peer_reply_status = Http::scNone;

const auto callback = asyncCallback(17, 5, TunnelStateData::noteConnection, this);
const auto cs = new HappyConnOpener(destinations, callback, request, startTime, n_tries, al);