Class Documentation
Main class inherited by all crawler classes.
Constructor for the class.
Arguments -
- name : Name of crawler.
- start_url : Base URL of the website.
Base class : BaseCrawler
Crawler for websites Hindi Lyrics, Smriti and Lyrics Masti.
Constructor for the class.
Arguments -
- name : Name of crawler.
- start_url : Base URL of the website.
- list_of_url : List of URL(s) to start with.
- number_of_threads : Number of threads to use to crawl.
Worker method. Gets a task from task_queue and does the corresponding job.
Arguments -
- thread_id : Assigned ID of thread.
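The worker loop described above can be sketched roughly as follows. This is only an illustration, not the project's code: it assumes task_queue is a standard `queue.Queue`, and the task layout (a dict with a `type` key), the handler name `handle_movie_page`, the `HANDLERS` table and the `None` stop sentinel are all hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def handle_movie_page(thread_id, task):
    # Hypothetical handler: just record which thread saw which URL.
    with results_lock:
        results.append((thread_id, task["url"]))


# Hypothetical mapping from task type to the method that handles it.
HANDLERS = {"movie_page": handle_movie_page}


def threader(thread_id):
    """Take tasks from task_queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            HANDLERS[task["type"]](thread_id, task)
        finally:
            # Runs for real tasks and for the sentinel alike.
            task_queue.task_done()


def run(number_of_threads, tasks):
    """Start the workers, feed them tasks, then stop them."""
    threads = [
        threading.Thread(target=threader, args=(i,))
        for i in range(number_of_threads)
    ]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
```

Dispatching through a table keyed on task type keeps the worker itself unaware of what each job does, which matches the description of a single worker serving several task types.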
Method called from derived classes to start the crawling process.
Method called from threader if the task type is to get songs from a movie. Gets all songs from a movie and saves each of them in the database.
Arguments -
- thread_id : Assigned ID of thread.
- url : URL for the movie.
- movie : Name of movie.
Method called from threader if the task type is to get movies from a page. Gets movies from a webpage.
Arguments -
- thread_id : Assigned ID of thread.
- url : URL of page.
User overrides this method to get the list of movies with their URLs in the following format -
[
('link1', 'movie1'),
('link2', 'movie2'),
]
Arguments -
- raw_html : Raw HTML code of the page.
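As a rough sketch of what such an override might do, the snippet below pulls every link out of the raw HTML and returns it in the `('link', 'movie')` shape shown above. The function name `extract_movies` and the helper class `LinkCollector` are hypothetical, and treating every `<a>` tag as a movie is an assumption; a real site would need a narrower selection.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_movies(raw_html):
    """Return [('link1', 'movie1'), ...] for the links in raw_html."""
    parser = LinkCollector()
    parser.feed(raw_html)
    parser.close()
    return parser.links
```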
User overrides this method to get the list of songs with their URLs from a movie page in the following format -
[
('link1', 'song1'),
('link2', 'song2'),
]
Arguments -
- raw_html : Raw HTML of the page.
User overrides this method to get details for a song from raw HTML in the following format -
(
    'lyrics',
    [
        'singer1',
        'singer2',
    ],
    [
        'director1',
        'director2',
    ],
    [
        'lyricist1',
        'lyricist2',
    ]
)
Arguments -
- raw_html : Raw HTML of song page.
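Because the song details come back as a positional tuple, the order of the three lists is easy to mix up. A small sketch of how calling code might unpack the tuple into named fields is given here; the function name `song_details_to_record` and the dictionary keys are hypothetical and say nothing about the project's actual database schema.

```python
def song_details_to_record(details):
    """Turn (lyrics, singers, directors, lyricists) into a dict."""
    lyrics, singers, directors, lyricists = details
    return {
        "lyrics": lyrics,
        "singers": list(singers),
        "directors": list(directors),
        "lyricists": list(lyricists),
    }
```

Unpacking into exactly four names also fails loudly (with a `ValueError`) if an override returns a tuple of the wrong length.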
Base class : BaseCrawler
Crawler for Az Lyrics.
Constructor of the class.
Arguments -
Same as that for CrawlerType0.
Same as that for CrawlerType0.
Same as that for CrawlerType0.
Method called from threader if the task type is to get artists from a page. Gets artists from a page and puts each of them back in the task_queue.
Arguments -
As usual.
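Putting each artist back in the task_queue might look roughly like the sketch below. The function name `enqueue_artists`, the task dictionary layout and the `"artist_songs"` type label are hypothetical; resolving links against start_url with `urljoin` is an assumption about how relative links would be handled.

```python
import queue
from urllib.parse import urljoin


def enqueue_artists(task_queue, start_url, artists):
    """Queue one follow-up task per (link, artist) pair."""
    for link, artist in artists:
        task_queue.put({
            "type": "artist_songs",            # hypothetical task type
            "url": urljoin(start_url, link),   # make the link absolute
            "artist": artist,
        })
```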
Method called from threader if the task type is to get songs for an artist. Gets all songs for an artist along with their details and stores them in the database.
Arguments -
As usual.
User overrides this method to get artists with their URLs in the following format -
[
('link1', 'artist1'),
('link2', 'artist2'),
]
Arguments -
As usual.
User overrides this method to get the albums for an artist, with all the songs in each album, from the artist's page in the following format -
[
    (
        'album1',
        [
            ('url1', 'song1'),
            ('url2', 'song2')
        ]
    ),
    (
        'album2',
        [
            ('url3', 'song3'),
            ('url4', 'song4')
        ]
    )
]
Arguments -
As usual.
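The nested album structure above can be walked with two loops. The generator below is a minimal sketch of how a caller might flatten it into one row per song; the name `iter_album_songs` is hypothetical.

```python
def iter_album_songs(albums):
    """Yield (album, url, song) for every song in every album."""
    for album, songs in albums:
        for url, song in songs:
            yield album, url, song
```

For example, `list(iter_album_songs([('album1', [('url1', 'song1')])]))` gives `[('album1', 'url1', 'song1')]`.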
User overrides this method to get the lyrics for the song from the raw HTML of the song page.
Arguments -
As usual.
Base class : BaseCrawler
Crawler for website Metro Lyrics.
Constructor for the class.
Arguments -
As usual.
Same as that for CrawlerType0.
Same as that for CrawlerType0.
Same as that for CrawlerType1.
Method called by threader when the task type is to get artist songs. Gets all the songs for an artist and puts each of them in the task_queue.
Arguments -
As usual.
Method called by threader when task type is to get songs from an artist page.
Arguments -
As usual.
Method called by threader to get song details.
Arguments -
As usual.
User overrides this method to get song details from raw HTML in the following format -
(
    'album',
    'lyrics',
    [
        'lyricist1',
        'lyricist2'
    ],
    [
        'other_artist1',
        'other_artist2',
    ]
)
Arguments -
As usual.
User overrides this method to get artists with their URLs from a page in the following format -
[
('url1', 'artist1'),
('url2', 'artist2')
]
Arguments -
As usual.
User overrides this method to get all pages that contain songs by an artist in the following format -
[
'url1',
'url2'
]
Arguments -
As usual.
User overrides this method to get the list of songs with their URLs in the following format -
[
('url1', 'song1'),
('url2', 'song2'),
]
Arguments -
As usual.
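The list of artist pages and the per-page song list fit together naturally: fetch each page, extract its songs, and merge the results. The sketch below shows one way this could be done, under stated assumptions: `collect_songs` is a hypothetical helper, `fetch` stands in for whatever downloads a page, `get_songs` stands in for the user override, and dropping duplicate song URLs is a design choice of this sketch rather than documented behaviour.

```python
def collect_songs(pages, fetch, get_songs):
    """Merge the songs found on several artist pages.

    pages     -- list of page URLs, e.g. ['url1', 'url2']
    fetch     -- callable taking a URL and returning raw HTML
    get_songs -- callable taking raw HTML and returning
                 [('url', 'song'), ...]
    """
    seen = set()
    songs = []
    for page_url in pages:
        for url, name in get_songs(fetch(page_url)):
            if url not in seen:  # keep the first occurrence only
                seen.add(url)
                songs.append((url, name))
    return songs
```

Passing `fetch` and `get_songs` in as callables keeps the sketch free of network access, so it can be exercised with canned data.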