Added onclick for organization icons and limit in github data fetcher #217
base: master
Conversation
@saiteja13427 I think this one should block until the one you raised is merged.
@ashutosh1919 I think this is a better approach than mine, though I will review it properly once and then see which one we go with.
@KasRoudra I have tried testing the git script and ran it. Did you test that script properly, or am I doing something wrong?
@saiteja13427 You aren't doing anything wrong; that's the correct behaviour of the code. Currently the limit isn't sharp enough: if you give a number of 101-199, it will give you 100 results, and for 201-299 you will get 200 results. Hope you got the point. I've just implemented the core logic focusing on data fetching. However, I will try to fix the limit issue in the next commit. Thanks for your review.
@KasRoudra Yeah sure, do fix this and push a commit! Let us make it work with any number as the limit.
@saiteja13427 I've pushed the commit. Give it a test!
@KasRoudra Tested it. I ran the script again. Also, for orgs and issues, if I give a limit larger than the number of issues and orgs I have, it doesn't fetch anything at all. This needs to be corrected. I hope you understood both points!
@saiteja13427 The first issue you mentioned isn't reproducible for me. You can check the code: even if you skip a value, it will automatically take a value of 100. Maybe that issue was a temporary GitHub bug. However, as of the last commit, limits larger than the data length will be ignored. Let me know if you find any other issue.
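The rounding behaviour discussed above follows from the GitHub API's cap of 100 items per request. A minimal sketch of an exact limit over 100-item pages, assuming a hypothetical `fetchPage(cursor)` wrapper that returns one page as `{ nodes, endCursor, hasNextPage }`:

```js
// Sketch only: fetchPage is an assumed helper, not the PR's actual code.
async function fetchWithLimit(fetchPage, limit = 100) {
  const results = [];
  let cursor = null;
  let hasNextPage = true;
  // Keep requesting full pages until we have at least `limit` items
  // or the data runs out (an oversized limit then just returns everything).
  while (results.length < limit && hasNextPage) {
    const page = await fetchPage(cursor);
    results.push(...page.nodes);
    cursor = page.endCursor;
    hasNextPage = page.hasNextPage;
  }
  // Trim to the exact limit; without this slice, a limit of 101-199
  // would come back rounded to a multiple of 100.
  return results.slice(0, limit);
}
```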
Yup, now it seems to work well. Basically, both issues are linked; that's why the first one was not reproducible for you. @ashutosh1919 I have checked this PR. Please have a look and then we can merge it. Thanks, @KasRoudra, for the changes.
@KasRoudra @saiteja13427 I think it should depend upon the already produced data. What I expected originally was this: we first need to see whether we have any PRs which are already fetched. If we have PRs, then we need to find the PRs which are not available locally. For example, if you run git_data_fetcher.mjs -pl 120, then it should fetch up to 120 PRs which are not already available in the website's local data. We can distinguish what is available and what is not from the PR id. It is okay to make multiple API calls internally to fetch more than 100 PRs. The same thing should happen for issues and orgs. What do you both think about this expected behaviour?
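A rough sketch of that expected behaviour, assuming the local data sits in a JSON file and every PR object carries a unique id (the file path, field names, and `fetchPage` helper are illustrative assumptions, not the PR's code):

```js
import { readFileSync, writeFileSync } from "fs";

// Assumed location of the website's local PR data.
const LOCAL_PATH = "./src/shared/opensource/pull_requests.json";

function loadLocalPRs() {
  try {
    return JSON.parse(readFileSync(LOCAL_PATH, "utf8"));
  } catch {
    return []; // first run: no local data yet
  }
}

// Fetch up to `limit` PRs whose ids are not already stored locally,
// making as many API calls as needed, then append them to the file.
async function fetchMissingPRs(fetchPage, limit) {
  const local = loadLocalPRs();
  const knownIds = new Set(local.map((pr) => pr.id));
  const fresh = [];
  let cursor = null;
  let hasNextPage = true;
  while (fresh.length < limit && hasNextPage) {
    const page = await fetchPage(cursor);
    fresh.push(...page.nodes.filter((pr) => !knownIds.has(pr.id)));
    cursor = page.endCursor;
    hasNextPage = page.hasNextPage;
  }
  const merged = [...fresh.slice(0, limit), ...local];
  writeFileSync(LOCAL_PATH, JSON.stringify(merged, null, 2));
}
```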
@ashutosh1919 That would be a nice feature, but I don't get the point of using local data. The local data will also be available in the GitHub database, and during API calls it will be fetched automatically. So why should we read and append that local data?
We are storing data locally in order to improve performance; fetching everything remotely every time would introduce latency issues.
I think the behaviour should be like this: by default, the script should fetch all the PRs, issues and orgs the first time, and from the second run onwards it should just append the remaining items to the local data. On top of that, we can also offer a limit feature where we do the same thing but cut the data to the specified number of PRs, orgs or issues, so that users can keep a specific number of latest contributions if they want. What do you think @KasRoudra @ashutosh1919?
@saiteja13427 If we fetch "all" PRs, the fetched data will automatically contain the local data, and therefore we don't need to read or append the local copies. @ashutosh1919 suggests that we should only fetch those PRs which aren't available locally; currently I have no idea how to fetch those accurately. If the user doesn't use a limit, we can save the cursor of the last fetch and continue from there.
@KasRoudra no need to get confused; you don't need to worry about cursors. For each script run, we need to fetch all PRs/issues/orgs irrespective of the arguments it receives. And all means all, even if the number of PRs is >>100. Once we get the data, we can compare it with the PR ids. Also, the GitHub API doesn't show data which is more than 1 year old, IMO. That is why we are appending the new data and not replacing it.
Alright, so let us make it clear: we are going with the appending approach. When a user runs the script, if they don't have old data, then we just fetch everything. If the user has data, then we fetch all data and keep appending it until the id matches the latest data id from the old data.
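A minimal sketch of that stop condition, assuming the API returns items newest first (`takeUntilKnown` is a hypothetical helper name):

```js
// Collect newly fetched items until we reach the newest id we already
// have locally; everything from that point on is old data.
function takeUntilKnown(fetched, latestLocalId) {
  const fresh = [];
  for (const item of fetched) {
    if (item.id === latestLocalId) break; // caught up with local data
    fresh.push(item);
  }
  return fresh;
}

// Usage: prepend the new items to the existing local data.
// const updated = [...takeUntilKnown(apiItems, oldData[0]?.id), ...oldData];
```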
So, our next goal is now clear. We can already fetch all PRs, issues and orgs, and we also have the limiting feature. The only remaining feature is appending to local data. I think @saiteja13427 can do it better as he has already worked on it. I suggest merging this one for now, and then @saiteja13427 should raise a new PR for that feature.
@ashutosh1919 Any decision on the pending pull requests? |
@saiteja13427 @KasRoudra what is the status of this PR? I think it is a much needed feature and we should get it done with this one.
@ashutosh1919 this got left out, actually. Let me test it once again and we will merge this PR first. Then on top of it I will add the appending feature we discussed!
Summary of changes
Details on fetcher
Now `git_data_fetcher.mjs` accepts four arguments:
- `--pr-limit` (`-pl`) for pull requests
- `--issue-limit` (`-il`) for issues
- `--org-limit` (`-ol`) for organizations
- `--limit` (`-l`) for all of the above

Example: `node git_data_fetcher.mjs -l 200`. Until the limit is reached, we keep requesting the GitHub API and append the data together, so user-contributed organizations will now appear properly.
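As a rough illustration, the four flags could be read from `process.argv` like this (a sketch under assumed conventions; the PR's actual parsing may differ):

```js
// Parse the four limit flags; --limit/-l applies to all three at once.
function parseLimits(argv) {
  const limits = { pr: Infinity, issue: Infinity, org: Infinity };
  for (let i = 0; i < argv.length - 1; i++) {
    const value = Number(argv[i + 1]);
    if (!Number.isInteger(value) || value <= 0) continue;
    switch (argv[i]) {
      case "--pr-limit": case "-pl": limits.pr = value; break;
      case "--issue-limit": case "-il": limits.issue = value; break;
      case "--org-limit": case "-ol": limits.org = value; break;
      case "--limit": case "-l":
        limits.pr = limits.issue = limits.org = value; break;
    }
  }
  return limits;
}

// Example: node git_data_fetcher.mjs -l 200
// parseLimits(process.argv.slice(2)) -> { pr: 200, issue: 200, org: 200 }
```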