Fix search #14
@@ -108,12 +108,19 @@ def get_connector(config):

async def RequestUrl(config, init):
    logme.debug(__name__ + ':RequestUrl')
    csrf_token = random.randbytes(16).hex() # Looks like any random string works
Just a note: randbytes is Python 3.9+ (late 2020). No idea whether you want to support older versions; if so, it can be made to work. I guess even hardcoding a fixed string might be feasible.
Personally, I'd like to run the code on Python 3.6. It seems there would be enough options available, though...
This is definitely feasible.
It's just that Python 3.6 is already 7 years old and EOL, and 3.7 will reach EOL in 6 weeks according to https://devguide.python.org/versions/. Debian Bullseye (the current stable release) ships 3.9, and I usually find that a reasonable reference for setting the cutoff point.
@LinqLover Python 3.6 has already reached end of support, and Python 3.7 reaches end of support on 2023-06-27 (1 month 17 days away). TWINT should not care about Python versions that have reached end of support.
Fair point 👍
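For reference, a minimal sketch of interchangeable ways to produce the same kind of 32-character hex string as `random.randbytes(16).hex()` on Python versions older than 3.9, as discussed above. The helper names are hypothetical and not part of twint; `secrets` exists since Python 3.6 and `bytes.hex()` since 3.5.

```python
import os
import random
import secrets


def csrf_token_secrets() -> str:
    # Hypothetical helper: 16 random bytes as 32 hex chars (Python 3.6+).
    return secrets.token_hex(16)


def csrf_token_urandom() -> str:
    # Hypothetical helper: os.urandom gives raw bytes; bytes.hex() exists since 3.5.
    return os.urandom(16).hex()


def csrf_token_random() -> str:
    # Hypothetical helper closest in spirit to random.randbytes: 128 bits from
    # the random module, zero-padded to 32 lowercase hex digits. Not
    # cryptographically secure, which seems acceptable if any string works.
    return "%032x" % random.getrandbits(128)
```

Any of the three can replace the `random.randbytes(16).hex()` call; `secrets` is the natural choice when the minimum supported version is 3.6.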
Branch force-pushed from 7540876 to 9f52444.
Tests can now pass on this branch. The first two commits (including #8) fix bugs that were already preventing the tests from working, independently of Twitter's latest changes.
The search endpoint asks for it.
That sounds great! Can you tell how stable this solution will be if you run twint regularly, e.g. on a daily basis? That is, how quickly will the token/cookie expire? Will it work when you use the token/cookie to run twint on a different machine/IP address?
No idea yet, but we run a twint job every 12 hours on GitHub Actions (https://github.com/catgirl-v/cubari/actions), so we'll find out soon enough.
It's working so far.
Is it working for you guys? In my case this error pops up; any advice? "ConnectionError: Access forbidden, try passing --auth-token."
Yes, it's working. I'm gonna need more details to help you. Did you in fact pass a valid authentication cookie as per the OP? If so, please post a minimal example that reproduces the problem.
Do I need to pass a valid authentication cookie? How? I just applied the changes in this PR and tried to run my previous code, and then that error message popped up. How can I do what you recommend?
Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear: you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the OP.
Brilliant solution, works just fine. Thanks.
My bad, I thought `csrf_token = random.randbytes(16).hex()` was it, but I need to replace it with my auth token, which I get from the Firefox browser, right? I did make that change and I'm still getting the same error ("ConnectionError: Access forbidden, try passing --auth-token."). Am I doing something wrong? Some help would be nice, please :)
No, you don't have to modify the code. Pass the token with the `--auth-token` flag. The CSRF token is unrelated; it's just that both changes were required to actually get it to work.
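A minimal sketch of passing the token on the command line, assuming the `--auth-token` flag named in the error message takes the cookie value as its argument, and falling back to the `TWITTER_AUTH_TOKEN` environment variable mentioned later in the thread. `build_search_argv` and `run_search` are hypothetical helpers, not part of twint.

```python
import os
import subprocess
from typing import List, Optional


def build_search_argv(query: str, token: Optional[str] = None) -> List[str]:
    """Return argv for a twint search that passes the login cookie explicitly."""
    # Prefer an explicit token, else the TWITTER_AUTH_TOKEN environment variable.
    token = token or os.environ.get("TWITTER_AUTH_TOKEN")
    if not token:
        raise RuntimeError(
            "No auth token: log in to Twitter in a browser and copy the auth token cookie"
        )
    # -s is twint's search option; --auth-token is assumed to take the cookie value.
    return ["twint", "-s", query, "--auth-token", token]


def run_search(query: str) -> None:
    # Requires the patched twint to be installed and on PATH.
    subprocess.run(build_search_argv(query), check=True)
```

Keeping the token out of the source code (in the environment or a secret store) also avoids committing it by accident.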
I have my code running in AWS Lambda with twint's library as a layer. I updated the library and set the environment variable as mentioned, but I'm still getting the same error. Locally, I get the same result. If you could help, I'd love it :) [CRITICAL] 2023-05-12T20:53:44.334Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data'
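The `noData'data'` suffix looks like twint's Feed error handler appending `str()` of the caught exception; for a `KeyError`, Python renders the missing key in quotes. If that reading is right, the JSON Twitter returned simply had no `data` key (later in the thread the same happens with `globalObjects`). A minimal sketch reproducing the message shape, with a hypothetical `feed_log_suffix` helper:

```python
from typing import Any, Dict


def feed_log_suffix(payload: Dict[str, Any], key: str) -> str:
    """Mimic the 'noData' + str(exception) text seen in the CRITICAL log lines."""
    try:
        payload[key]
    except KeyError as exc:
        # str(KeyError('data')) is "'data'", quotes included.
        return "noData" + str(exc)
    return "ok"


# An error response without the expected key, e.g. {"errors": [...]}.
print(feed_log_suffix({"errors": []}, "data"))           # noData'data'
print(feed_log_suffix({"errors": []}, "globalObjects"))  # noData'globalObjects'
```

In other words, this log line says the response had an unexpected shape, not necessarily that the auth token is wrong.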
Thank you for the fix @9ary, works great! 😃 Tiny request: is it possible to add a wait time to prevent rate limits? Looks like `ap.add_argument("--min-wait-time", type=float, default=15,
help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")`
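On the wait-time question, a minimal sketch of retrying after a rate-limit error with a minimum wait that doubles on each retry. `RateLimited` and `retry_with_min_wait` are hypothetical names; this is not how twint's `--min-wait-time` option is implemented, just the general idea.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by a fetch function when Twitter reports a rate limit."""


def retry_with_min_wait(
    fetch: Callable[[], T],
    min_wait: float = 15.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on RateLimited, sleep and retry, doubling the wait each time."""
    wait = min_wait
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(wait)
            wait *= 2  # exponential backoff, never shorter than min_wait
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting.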
For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint. @leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.
Makes sense, thanks!
I applied all the changes to the .py files of my twint, but I keep getting the error below on all of my searches: module 'random' has no attribute 'randbytes'
You have to use Python 3.9 or above. It's mentioned in some of the early comments.
Thanks. I switched to 3.9, but then I got 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function? I'm very new at this. How do I pass the auth token?
Log in to Twitter in Firefox -> developer tools -> storage -> auth token. Then, wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN. It should then run. Good luck!
Thanks it's working now. |
Hey @batmanscode, would you mind testing my code and telling me whether you get the same error message? I'm trying to run it in a Jupyter notebook and then in AWS Lambda: `import twint` `os.environ["TWITTER_AUTH_TOKEN"] = "my_token"` `nest_asyncio.apply()` `c = twint.Config()` I'm getting this error: Hope you can give me a hand! Thanks in advance.
I can only find authentication tokens, and they're in the developer portal; I didn't see any 'developer tools' or 'storage' in Firefox. Which of them should I use?
@JoelBird hopefully this is detailed enough:
Hi @9ary, thanks for the fix. But for now, using the command line, only the -u parameter works; the search parameter -s doesn't. Any idea why? I'm trying to debug it here. I'm getting CRITICAL:root:twint.run:Twint:Feed:noData'data' with
I'm running into "Rate limit exceeded" errors. How do I fix this? What should I keep retrying in a loop to get around it?
I'm not sure, sorry. I'm having the same issue :( |
For those this is working for, would someone be able to run through
Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21.
It's working fine on my end. Output:
Thanks @corpuzdonn, maybe it's a token issue on my end. I attempted a huge scrape (4 weeks via search terms) and that got rate limited. Maybe that token wasn't valid after that. Have you tried long scrapes? I saw there's a timeout parameter, but even setting that very high didn't work for me.
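One way to make a multi-week scrape less likely to trip the rate limit, offered as an untested assumption rather than something confirmed in this thread, is to split the date range into smaller windows and run one search per window, pausing in between, with twint's `--since`/`--until` options (e.g. `twint -s "query" --since 2023-04-01 --until 2023-04-08`). A minimal sketch of the windowing part, using a hypothetical `date_windows` helper:

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int = 7) -> Iterator[Tuple[str, str]]:
    """Yield (since, until) pairs as YYYY-MM-DD strings covering [start, end)."""
    step = timedelta(days=days)
    current = start
    while current < end:
        upper = min(current + step, end)
        # Adjacent windows share a boundary date; adjust if your twint
        # version treats --until as inclusive and you see duplicates.
        yield current.isoformat(), upper.isoformat()
        current = upper


# Four weeks split into one-week windows.
for since, until in date_windows(date(2023, 4, 1), date(2023, 4, 29)):
    print(since, until)
```

Smaller windows also mean a rate-limited run only has to repeat one window instead of the whole range.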
Would you mind packaging up your twint library and sharing it with us, please? I might be doing something wrong, because I have just tried it and got the same result: CRITICAL:root:twint.run:Twint:Feed:noData'data' I don't think this message is related to the auth token; it has to be something else...
All of a sudden, I'm getting the following. Did something change? CRITICAL:root:twint.get:User:Expecting value: line 1 column 1 (char 0)
Most likely the latest Twitter changes require more API calls to be authenticated. Our scripts broke too but I'm currently on vacation. I'll have a look in a few days.
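For context, `Expecting value: line 1 column 1 (char 0)` is the message Python's `json` module gives when a response body is empty or not JSON at all (for example an HTML error page), which fits the guess that the endpoint now rejects the request. A minimal sketch of a hypothetical helper that reports the HTTP status and the start of the body instead of the bare decode error:

```python
import json
from typing import Any


def parse_json_or_explain(status: int, body: str) -> Any:
    """Parse a JSON body, or raise an error that shows what was actually returned."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        preview = body[:80].replace("\n", " ")
        raise RuntimeError(
            f"HTTP {status}: expected JSON but got {preview!r} ({exc})"
        ) from exc
```

Seeing the status code (e.g. 401, 403 or 404) usually tells you more than the decode error alone.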
Have there been any updates? I don't know if there were, but my output has become: CRITICAL:root:twint.run:Twint:Feed:noData'globalObjects'
The search endpoint returns 404; it looks like they've finally killed it off. This means twint will need to be reworked to use the GraphQL API, which is a lot more work than I'm willing to put in personally.
I see. It's ok. Will find alternative solutions. Thanks for your hard work! |
Search now requires being logged in + a CSRF token.
This PR adds a CLI flag to provide an authentication cookie (must be obtained by logging in with a browser, in Firefox the cookie can be found in the developer toolbox under the storage tab).
It looks like a randomly generated CSRF token works, so no complicated mechanism is required to obtain one.
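A minimal sketch of what "logged in + a random CSRF token" can look like at the HTTP level. The cookie and header names (`auth_token`, `ct0`, `x-csrf-token`) are an assumption based on how Twitter's web client sends these values, not copied from this PR's diff; `build_auth` is a hypothetical helper.

```python
import secrets
from typing import Dict, Tuple


def build_auth(auth_token: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, cookies) for an authenticated request with a random CSRF token."""
    # Any random string appears to be accepted, so no token endpoint is needed.
    csrf_token = secrets.token_hex(16)
    # Assumed double-submit pattern: the same value in a cookie and in a header.
    cookies = {"auth_token": auth_token, "ct0": csrf_token}
    headers = {"x-csrf-token": csrf_token}
    return headers, cookies
```

With aiohttp, which twint uses for its requests, the two dicts could be passed as `aiohttp.ClientSession(headers=headers, cookies=cookies)`.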
Fixes #11.
Fixes #13.