Research similar projects #18
Comments
Internet Archive's Wayback Machine From #9:
Klaxon. Description: You list websites you want monitored and Klaxon will visit them and, if they change, email you what's different. It saves you having to reload dozens of links yourself every day.
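For anyone comparing these tools, the core loop described above (visit a page, check whether it changed, report what's different) can be sketched roughly as below. This is only an illustration and not Klaxon's actual implementation: the function names `fingerprint` and `describe_change` are hypothetical, and fetching, scheduling, and emailing are left out so the example runs on two in-memory snapshots.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest, used as a cheap 'did anything change?' check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def describe_change(old: str, new: str) -> str:
    """Return a unified diff of two snapshots, or an empty string if they match.

    Hypothetical helper for illustration; a real monitor would fetch `new`
    from the watched URL and load `old` from storage.
    """
    if fingerprint(old) == fingerprint(new):
        return ""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Made-up snapshots standing in for two visits to the same page.
    before = "Office hours: 9-5\nContact: info@example.org"
    after = "Office hours: 10-4\nContact: info@example.org"
    print(describe_change(before, after) or "no change")
```

Most of the differences between the projects in this thread probably come down to what surrounds this step: how the page is fetched and cleaned before comparison, and how the diff is delivered.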
List of Similar projects - Website Change Monitoring / Notification
Diffbot: A tool for web data extraction
@daas-ankur-shukla - Well, this adds a whole new dimension to the problem.
Yes. After going through the website and its different products (Diffbot, Crawlbot, and custom-made APIs, plus the Article, Discussion, Image, Product, and Video APIs), it's believable that the bot returns structured data. To understand the mechanics of how they do it, I went through the official Diffbot GitHub organization at https://github.com/diffbot. Unfortunately, all it offers is client libraries (Ruby, JS), documentation of API calls, etc. So, all in all, despite being a promising product, it remains a "product to be purchased" (after the 14-day trial, of course).
Another monitoring one... Description: Intro Screencast
Deployment: Docker / Heroku / local Ubuntu or Debian
I tried setting up Huginn on Fedora 25 with an i3 (4 cores, 4 GB DDR3).
Here is another one: Features:
Deployment: Local install
@dcwalk - I installed Diff Engine on my machine and it's quite simple to use.
Glad to have these projects listed and thoughts on use/disadvantages here!
I'm thinking of making a detailed study and comparative analysis of each of these projects for everyone's benefit. What do you reckon? Useful or not needed? @dcwalk
Hello! I've tried really all of them, along with dozens of others from the internet, and none of them works properly. I tried to monitor this site: http://www.yapo.cl/chile/vehiculos?ca=15_s&l=0&cmn=&st=s to create alerts for a specific model, but nothing works. I tried wachete, versionista, webchangedetection, webwatcher, visual ping, follow that page, changedetection, distill, onewebchange, changetower, etc. Only thewebwatcher.com works, and only partially.
Also trackly and watchthatpage. Still trying.
A new one which looks interesting:
As per the above linked issue, I've created this as an example of the sort of public resource we could maintain with the larger community in this space: https://github.com/patcon/awesome-website-change-monitoring I've added the tools from this thread, and some others that were linked from the diffengine readme (incl newsdiffs, a mozilla project). I've also created a copy of @mhucka's web archiving spreadsheet, stubbing out more in-depth info, and linked it from the awesome-list repo. If we're down with this approach, thinking next steps could be:
Would this be a path that made sense to people? Again, the main perk would be that it's a simple and useful collaborative resource that could be a social hack to help us intersect more with communities in the same space :)
I really dig the idea of this @patcon! My one thought is: how does the awesome list not over-duplicate the spreadsheet? I have a link dump of papers on this issue (comparing web archivers), just point me to where they should go! (sorry, out of sync email checking :))
Metamorphosis Foundation in Macedonia has developed Time Machine: a website to track where a news article originated and which other outlets copied it. They also track changes on a given website. Website: http://timemachine.truthmeter.mk/ It is quite custom, so I think it could be hard to reuse, but I'm leaving it here for reference.
@KrzysztofMadejski nice! EDIT: Added to list (edgi-govdata-archiving/awesome-website-change-monitoring#2) and spreadsheet.
@dcwalk The thinking was that it might be easier to ask maintainers to point their READMEs to an awesome list, but you're right that it is a bit weird to have it in two places. Maybe the repo could be a thin README for the spreadsheet itself, and just point there, with a nice screenshot like @mhucka did in the research repo? The trade-off is that we then don't have a clear process for saying to project maintainers "let's look after this resource together", because editing Google spreadsheets is much more opaque than submitting pull requests. (Who made the change? Was it a new person who we should reach out to? Does the tool even fit?) I like the pretty of the Google spreadsheet, but maybe it should just be a CSV in a GitHub repo.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.
I feel it would be a loss not to surface all these resources even if it's not something that we can actively maintain. Will throw into
I agree, but we absolutely should not have an issue doing that work for us; it's effectively invisible here. This should be a doc in the repo.
In our 2017-03-11 Dev standup, the question was raised about what comparable projects are out there. We should compile a list and pay attention to their features/implementation specifics.
@ambergman mentioned Klaxon as one.
This could also be a great first-timer issue: we could collect those projects and document important details?