
Combine RetrieveStreamInfo() and RetrieveStreamData() into RetrieveStream() #97

Open
orao opened this issue Oct 8, 2016 · 1 comment

Comments

Collaborator

orao commented Oct 8, 2016

The RetrieveStreamInfo() function generates a sequence of commits which the RetrieveStreamData() function then reads. The Git file IO involved is small but still unnecessary, since all of the information needed by RetrieveStreamData() is available while RetrieveStreamInfo() is executing. They were originally separated because they operate on two distinct Git refs with completely different file contents. Using the git worktree command, we could remove the need for this separation and process them one after the other.

This solution would be pretty good, although placing these two steps in a kind of pipeline may be an even better approach. Having two separate processes would let us retrieve the metadata and the actual stream contents in parallel, and make better use of the Accurev server's processing capacity by running more than one operation at a time.

orao added this to the v0.7 milestone Oct 8, 2016

ghost commented Oct 10, 2016

Hence the proposal is to make the ac2git.py script capable of invoking itself with some special arguments, such that:

  1. The user invokes ac2git.py. We will refer to this instance as the ac2git main process.
  2. The ac2git main process invokes ac2git.py so that it executes only RetrieveStreamInfo() and prints to stdout just one transaction number per line, followed by an EOF string when it completes. We will call this the ac2git info process.
  3. The ac2git main process invokes ac2git.py again and pipes the stdout of the ac2git info process to the stdin of this new ac2git data process. The ac2git data process will read stdin as a newline-separated list of transactions, terminated by an EOF string.
  4. The ac2git main process will gather the information from the two child processes and print a sensible progress message to the user.
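
The line protocol in steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not code from ac2git.py: the function names `write_transactions` and `read_transactions` and the literal sentinel value `"EOF"` are assumptions, since the exact spelling of the EOF string is not specified above.

```python
import io

# Hypothetical sentinel; the proposal only says "an EOF string".
EOF_MARKER = "EOF"


def write_transactions(out, transactions):
    """Info-process side: emit one transaction number per line, then the sentinel."""
    for tr in transactions:
        out.write("{0}\n".format(tr))
        out.flush()  # flush each line so the reader can start before the writer is done
    out.write(EOF_MARKER + "\n")
    out.flush()


def read_transactions(inp):
    """Data-process side: yield transaction numbers until the sentinel is seen."""
    for line in inp:
        line = line.strip()
        if line == EOF_MARKER:
            return
        if line:  # tolerate blank lines
            yield int(line)


if __name__ == "__main__":
    # Round-trip through an in-memory buffer standing in for the pipe.
    buf = io.StringIO()
    write_transactions(buf, [101, 102, 105])
    buf.seek(0)
    print(list(read_transactions(buf)))
```

In the real script the writer would presumably be handed `sys.stdout` and the reader `sys.stdin`; flushing after every line matters because otherwise buffering would make the data process wait until the info process has finished.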

The effect of this is:

  1. The file IO of the currently implemented sequential process is removed completely and replaced by IPC via pipes.
  2. Invoking several Accurev commands simultaneously (in two independent processes) should increase our utilization of the processing power of the Accurev server.
  3. The network throughput limitations remain, but can be eliminated by running directly on the Accurev server.
  4. Local file IO throughput limitations remain.
  5. We can do more useful processing between network and file IO calls.
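
As a rough sketch of the first effect, the main process could connect the two children with an OS pipe using the standard `subprocess` module. The inline `INFO_SCRIPT` and `DATA_SCRIPT` below are hypothetical stand-ins for "ac2git.py invoked with special arguments" (those arguments are not defined here), and `run_pipeline` is an illustrative name.

```python
import subprocess
import sys

# Stand-in for the info process: prints transaction numbers, then the sentinel.
INFO_SCRIPT = (
    "for tr in (101, 102, 105):\n"
    "    print(tr, flush=True)\n"
    "print('EOF', flush=True)\n"
)

# Stand-in for the data process: reads transactions from stdin until the sentinel.
DATA_SCRIPT = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    line = line.strip()\n"
    "    if line == 'EOF':\n"
    "        break\n"
    "    print('processed', line, flush=True)\n"
)


def run_pipeline():
    """Main-process side: start both children and pipe info's stdout into data's stdin."""
    info = subprocess.Popen([sys.executable, "-c", INFO_SCRIPT],
                            stdout=subprocess.PIPE)
    data = subprocess.Popen([sys.executable, "-c", DATA_SCRIPT],
                            stdin=info.stdout,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    # Close the parent's copy of the pipe so the info process notices
    # (via a broken pipe) if the data process exits early.
    info.stdout.close()
    out, _ = data.communicate()
    info.wait()
    return out.splitlines(), info.returncode, data.returncode


if __name__ == "__main__":
    lines, info_rc, data_rc = run_pipeline()
    for line in lines:
        print(line)
```

Here the main process only sees the data process's output. To report progress from both children, as the proposal describes, the info process would need a second channel (for example its stderr), which is a design choice left open above.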
