Improved search keyword encoding with support for exact phrase #80
Comments
Great suggestion for improved usability.
- For Indeed and Monster, the query string was not properly encoded when a quoted phrase with spaces between its words was provided. The fix was to encode all spaces with the proper character (+/-). This issue and fix also applied to city names.
- For GlassDoorStatic, the query string was encoded for a URL and returned improper results. Since this class searches using a JSON payload, the solution was to combine the keywords with a space instead.
- The old query construction function was moved from GlassDoorBase to GlassDoorDynamic to prevent the dynamic scraper class from breaking.

Fixes issue PaulMcInnis#80.
* Fixes search issues due to bugs in search query encodings
  - For Indeed and Monster, the query string was not properly encoded when a quoted phrase with spaces between its words was provided. The fix was to encode all spaces with the proper character (+/-). This issue and fix also applied to city names.
  - For GlassDoorStatic, the query string was encoded for a URL and returned improper results. Since this class searches using a JSON payload, the solution was to combine the keywords with a space instead.
  - The old query construction function was moved from GlassDoorBase to GlassDoorDynamic to prevent the dynamic scraper class from breaking.
  - Fixes issue #80.
* Radius function cleanup
* Cleaning and networking code adjustments
  - Removed unused requests imports.
  - Changed URL strings that used http to https.
  - Set the provider header dictionary as the default headers on the provider's session object. Setting headers on the actual post/get method call is only necessary for temporarily overriding the session headers on an individual request.
  - Adjusted the search_page_for_job_soups method of the GlassDoorStatic class so that it uses GET instead of POST. Sending payload data when we already have the search page URL is unnecessary and can trigger bot detection measures more frequently.
* Updated Indeed test URL
  - Updated the test URL to test for https instead of http.
* Fixes to asynchronous parsing code
  - Previously, futures would be deleted whether they had finished parsing or not.
  - Added code to delete the HTML page after it is parsed.
  - Added code to log any errors during blurb retrieval and parsing.
* Version bump
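The asynchronous parsing fixes listed above could look roughly like the sketch below. It is only an illustration under assumptions, not the project's actual code: `parse_blurb`, `parse_all`, and the `pages` dictionary are hypothetical names, and a thread pool from the standard library stands in for whatever executor the scraper really uses. The idea is to handle each future only once it has completed, log any parsing error instead of dropping it silently, and release the HTML page once its future is done.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_blurb(html):
    """Hypothetical stand-in for the real blurb parser."""
    if not html:
        raise ValueError("empty page")
    return html.strip().upper()


def parse_all(pages):
    """Parse every page in `pages` (job id -> HTML) and return job id -> blurb.

    `pages` is emptied as futures complete, so raw HTML is not kept around.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Map each future back to the job it belongs to.
        futures = {pool.submit(parse_blurb, html): job_id
                   for job_id, html in pages.items()}
        # as_completed only yields futures that have finished,
        # so nothing is discarded while it is still running.
        for future in as_completed(futures):
            job_id = futures[future]
            try:
                results[job_id] = future.result()
            except Exception as err:
                # Log the failure rather than losing it.
                logging.error("blurb parsing failed for %s: %s", job_id, err)
            finally:
                # Drop the HTML once its future is done.
                pages.pop(job_id, None)
    return results
```

Calling `parse_all({"a": " hello ", "b": ""})` would return a result only for job `"a"` and log an error for job `"b"`.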
Hi, @akifusenet, I just added a commit that should fix this issue. Could you pull the latest commit and let us know if it fixed the problem?
Assigning to myself because I need to port this fix to the new master.
I am thinking it might be wiser if we provide a search config parameter such as
I think the simplest way would be to split the search URL into two parts: The latter can be simplified and clarified by using the
Issue Template
Description
For example, on Indeed, when you want to search for an exact phrase (multiple words) as a keyword, you put the phrase between double quotes.
When I try to use this feature in funnel, it removes the double quotes and returns the wrong results.
Steps to Reproduce
Expected behavior
Normally, when you enter these keywords on the Indeed website, this is the URL that is generated:
https://www.indeed.com/jobs?q=%22data+distribution+service%22&l=Saratoga%2C+CA&radius=25
Actual behavior
But funnel generates this URL:
getting indeed page 0 : http://www.indeed.com/jobs?q=Data Distribution Service&l=Saratoga%2C+CA&radius=25&limit=50&filter=0&start=0
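For reference, the expected URL can be reproduced with Python's standard `urllib.parse.quote_plus`, which percent-encodes the double quotes and the comma and turns spaces into `+`. This is a minimal sketch under assumptions: `build_indeed_url` is a hypothetical helper written for illustration, not a function from the project.

```python
from urllib.parse import quote_plus


def build_indeed_url(keywords, city, radius=25):
    """Hypothetical helper: build an Indeed search URL from keywords and a city."""
    # Join keywords with spaces first, then encode the whole query, so the
    # double quotes around an exact phrase survive as %22 and spaces become '+'.
    query = quote_plus(" ".join(keywords))
    location = quote_plus(city)
    return f"https://www.indeed.com/jobs?q={query}&l={location}&radius={radius}"


url = build_indeed_url(['"data distribution service"'], "Saratoga, CA")
print(url)
```

With the keyword and city from this report, the printed URL has the same `q`, `l`, and `radius` values as the one Indeed itself generates.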
Environment
* Windows 10 Home