How does the paper's input match the code's state s(t)? #141
The multi-video sim is for agents that can generalize to videos with different numbers of bitrates and different bitrate encoding levels. From what you wrote above, your understanding of both the code and the paper looks correct.
I have upgraded the code to work on … Can you also explain the main components of the objective function?
Thanks again for upgrading the codebase. The training wall time depends heavily on your physical hardware. You can monitor the learning curve and see when the performance on the validation set stabilizes. To decide whether the model has converged, you can use a heuristic such as "relative performance has not improved much over the past xxx iterations". Back then, we just eyeballed it. The main objective is just the policy gradient expression (the expression after the gradient operator): basically log π_t · (R_t − baseline_t) plus an entropy regularizer, summed over the training batch. Hope this helps.
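As a rough illustration of the objective described above, here is a NumPy sketch that only evaluates the scalar inside the gradient operator, log π(a_t|s_t) · (R_t − b_t) + β · H(π(·|s_t)) summed over a batch. It is not the repo's training code: the discount factor `gamma`, the entropy weight `beta` and all function names are hypothetical choices for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one batch."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def entropy(probs):
    """Shannon entropy H(pi(.|s_t)) of each row of action probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)


def actor_objective(probs, actions, returns, baselines, beta=0.01):
    """sum_t [ log pi(a_t|s_t) * (R_t - b_t) + beta * H(pi(.|s_t)) ].

    probs:     (T, A) action probabilities output by the actor
    actions:   (T,) indices of the actions actually taken
    returns:   (T,) returns R_t
    baselines: (T,) critic estimates b_t = V(s_t)
    beta:      entropy weight (hypothetical value)
    """
    probs = np.asarray(probs, dtype=float)
    log_pi = np.log(probs[np.arange(len(actions)), actions])
    # The advantage (R_t - b_t) acts as a constant weight on log pi.
    advantage = np.asarray(returns) - np.asarray(baselines)
    return float(np.sum(log_pi * advantage + beta * entropy(probs)))


# Made-up batch of 3 steps with 2 possible actions.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
actions = np.array([0, 0, 1])
rewards = np.array([1.0, 0.0, 2.0])
baselines = np.array([0.5, 0.5, 0.5])
R = discounted_returns(rewards, gamma=0.9)
print(actor_objective(probs, actions, R, baselines))
```

In training, the gradient of this quantity with respect to the actor's parameters is what gets followed (equivalently, an optimizer minimizes its negative); the critic is typically trained separately to predict R_t so it can serve as the baseline.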
Dear Hongzi,
I was trying to figure out how the RL agent's state s(t) in the code matches the inputs described in the paper:
> Input: After the download of each chunk $t$, Pensieve's learning agent takes state inputs $s_t = (x_t, \tau_t, n_t, b_t, c_t, l_t)$ to its neural networks. $x_t$ is the network throughput measurements for the past $k$ video chunks; $\tau_t$ is the download time of the past $k$ video chunks; $n_t$ is a vector of $m$ available sizes for the next video chunk; $b_t$ is the current buffer level; $c_t$ is the number of chunks remaining in the video; and $l_t$ is the bitrate at which the last chunk was downloaded.

First of all, which code package should we look at, `multi_video_sim` or `sim`? When I look at `sim`, I see in `def agent` that the input state is …

Could you please illustrate the matching, and the actor and critic networks (Figure 5), if possible?
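To make the correspondence concrete, here is a hypothetical NumPy sketch (not the code from `sim`) that packs $s_t$ into a 6 × k matrix, one row per input, and runs it through actor and critic networks shaped roughly like Figure 5: 1D convolutions over the vector inputs $x_t$, $\tau_t$ and $n_t$, small dense layers for the scalars $l_t$, $b_t$ and $c_t$, a merged hidden layer, then a softmax over the m bitrates (actor) or a single linear output (critic). The row order, k = 8, m = 6, the bitrate ladder, the units and normalization constants, the layer widths and every function name are assumptions for illustration; check them against `def agent` and the network definition in the repo.

```python
import numpy as np

# Hypothetical sizes: k past chunks, m bitrates, hidden units, conv filter width.
K, M, HIDDEN, WIDTH = 8, 6, 128, 4
BITRATES_KBPS = np.array([300, 750, 1200, 1850, 2850, 4300], dtype=float)  # assumed ladder
rng = np.random.default_rng(0)


def update_state(state, last_bitrate_idx, buffer_s, throughput_mbps,
                 download_s, next_sizes_bytes, chunks_left, total_chunks):
    """Shift the 6 x K history left by one chunk and write the newest s_t values."""
    state = np.roll(state, -1, axis=1)
    state[0, -1] = BITRATES_KBPS[last_bitrate_idx] / BITRATES_KBPS.max()  # l_t
    state[1, -1] = buffer_s / 10.0                      # b_t, scaled by an assumed 10 s
    state[2, -1] = throughput_mbps                      # x_t, newest of k samples
    state[3, -1] = download_s / 10.0                    # tau_t, newest of k samples
    state[4, :] = 0.0                                   # clear values shifted in by the roll
    state[4, :M] = np.asarray(next_sizes_bytes) / 1e6   # n_t, in MB
    state[5, -1] = chunks_left / total_chunks           # c_t as a fraction
    return state


def conv1d_relu(x, w, b):
    """'Valid' 1D convolution with stride 1: x (L,), w (F, WIDTH) -> (L - WIDTH + 1, F)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[1])
    return np.maximum(windows @ w.T + b, 0.0)


def dense(x, w, b, relu=True):
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_net(out_dim):
    """Random weights for one network; actor and critic get separate copies."""
    def w(*shape):
        return rng.normal(0.0, 0.1, shape)
    merged = (2 * (K - WIDTH + 1) + (M - WIDTH + 1) + 3) * HIDDEN
    net = {name: (w(HIDDEN, WIDTH), np.zeros(HIDDEN))
           for name in ("conv_x", "conv_tau", "conv_n")}
    net.update({name: (w(1, HIDDEN), np.zeros(HIDDEN))
                for name in ("fc_l", "fc_b", "fc_c")})
    net["merge"] = (w(merged, HIDDEN), np.zeros(HIDDEN))
    net["out"] = (w(HIDDEN, out_dim), np.zeros(out_dim))
    return net


def forward(net, state):
    feats = [
        conv1d_relu(state[2], *net["conv_x"]).ravel(),      # x_t: past k throughputs
        conv1d_relu(state[3], *net["conv_tau"]).ravel(),    # tau_t: past k download times
        conv1d_relu(state[4, :M], *net["conv_n"]).ravel(),  # n_t: next chunk sizes
        dense(state[0, -1:], *net["fc_l"]),                 # l_t
        dense(state[1, -1:], *net["fc_b"]),                 # b_t
        dense(state[5, -1:], *net["fc_c"]),                 # c_t
    ]
    hidden = dense(np.concatenate(feats), *net["merge"])
    return dense(hidden, *net["out"], relu=False)


actor, critic = init_net(M), init_net(1)
state = update_state(np.zeros((6, K)), last_bitrate_idx=1, buffer_s=4.0,
                     throughput_mbps=0.5, download_s=1.2,
                     next_sizes_bytes=[110e3, 280e3, 450e3, 690e3, 1060e3, 1600e3],
                     chunks_left=40, total_chunks=48)
probs = softmax(forward(actor, state))  # pi(a | s_t) over the M bitrates
value = forward(critic, state)[0]       # V(s_t), used as the baseline
```

With random weights the probabilities are meaningless; the point is only the data flow. The actor's `probs` supply the log π term of the objective, and the critic's `value` plays the role of baseline_t.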