Updated entity URL fetcher algorithm to stop more consistently
dev-aravind committed Dec 19, 2024
1 parent ee0a203 commit 6d0b2ef
Showing 1 changed file with 7 additions and 5 deletions.
src/lib/entity_fetcher.rb (12 changes: 7 additions & 5 deletions)
@@ -37,12 +37,14 @@ def self.fetch_entity_urls(page_url, entity_identifier, is_paginated, fetch_enti
         href = entity["href"]
         entity_urls << (href.start_with?('http') ? href : base_url + (href.start_with?('/') ? href : "/#{href}"))
       end

-      break if entity_urls.length == number_of_entities || page_number.nil?

+      entity_urls = entity_urls.uniq
+      if entity_urls.length == number_of_entities || page_number.nil?
+        puts "All entity URLs have been successfully fetched. Total entities: #{entity_urls.length}."
+        break
+      end
       page_number += offset
     end
-    entity_urls.uniq
+    entity_urls
   end

   def self.fetch_entity_urls_headful(url, headers)
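The first hunk moves deduplication ahead of the stop check: previously the loop compared the raw `entity_urls` count against `number_of_entities` and only called `.uniq` on return. A minimal Ruby sketch of the revised loop, under stated assumptions: `collect_entity_urls` and its `fetch_page` block are hypothetical stand-ins for the page fetching and parsing inside `fetch_entity_urls`, the empty-page guard is an added assumption so the sketch always terminates, and `>=` is used instead of `==` so a page that pushes the count past the target still ends the loop.

```ruby
# Hypothetical sketch (not the repository's code): collect entity URLs page by
# page, deduplicating BEFORE the stop check so repeated links are not counted.
def collect_entity_urls(base_url, number_of_entities, page_number: 1, offset: 1, &fetch_page)
  entity_urls = []

  loop do
    # fetch_page is an assumed callback returning the raw href strings on a
    # page; a nil page_number stands for a single, unpaginated page.
    hrefs = fetch_page.call(page_number)

    hrefs.each do |href|
      # Same normalisation as the diff: keep absolute URLs, prefix relative ones.
      entity_urls << (href.start_with?('http') ? href : base_url + (href.start_with?('/') ? href : "/#{href}"))
    end

    # Deduplicate first, so the count below reflects distinct entities only.
    entity_urls = entity_urls.uniq

    # Assumed extra guard: a page with no links also ends the loop.
    break if entity_urls.length >= number_of_entities || page_number.nil? || hrefs.empty?

    page_number += offset
  end

  entity_urls
end

pages = {
  1 => ['/a', 'b', 'https://example.com/a'],
  2 => ['/a', '/c']
}
urls = collect_entity_urls('https://example.com', 3) { |n| pages.fetch(n, []) }
p urls
# prints ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
```

With the old ordering, page 1's three raw links would already satisfy `entity_urls.length == 3`, so the loop would stop and the final `.uniq` would return only two distinct URLs; checking after `.uniq` lets it continue to page 2.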
@@ -55,7 +57,7 @@ def self.fetch_entity_urls_headful(url, headers)
       if retry_count < max_retries
         retry
       else
-        puts "Max retries reached. Unable to fetch the content for page #{page_number}."
+        puts "Max retries reached. Unable to fetch the content for page #{url}."
         puts e.message
       end
     end
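The second hunk only changes the value interpolated into the give-up message: `fetch_entity_urls_headful(url, headers)` has no `page_number` parameter, so the old line most likely raised a `NameError` inside the rescue handler instead of logging. The sketch below isolates the surrounding retry-with-limit pattern. `fetch_with_retries`, its `max_retries:` keyword and the block standing in for the HTTP request are hypothetical, and where `retry_count` is incremented is an assumption, since the hunk does not show it.

```ruby
# Hypothetical sketch of the begin/rescue/retry pattern in the second hunk.
# The block stands in for the real page request (an assumption).
def fetch_with_retries(url, max_retries: 3)
  retry_count = 0 # lives outside begin, so `retry` does not reset it
  begin
    yield(url)
  rescue StandardError => e
    retry_count += 1 # assumed placement of the increment
    if retry_count < max_retries
      retry # re-runs the begin block from the top
    else
      # Report the url argument; page_number is not defined in this scope.
      puts "Max retries reached. Unable to fetch the content for page #{url}."
      puts e.message
      nil
    end
  end
end
```

With the default `max_retries: 3`, a block that fails twice and then succeeds returns its value on the third attempt; one that keeps failing prints the message with the URL and returns `nil`.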