New bio page #303

Open
GeorgeFive opened this issue Apr 12, 2023 · 21 comments

@GeorgeFive

GeorgeFive commented Apr 12, 2023

I've noticed that this has been going on for a few days now... maybe even a few weeks. Pieces of person data are randomly not being returned. Sometimes I get them, sometimes I don't. Refreshing may return some pieces, or it may not. I don't see any rhyme or reason to it...?

I do not try to grab all data, but of the data I do grab....

Always works:
IMDb number
Name
Image

Sometimes works:
Birthday / born location
Died date / location / cause
Birth name
Nicknames

Test case: nm0001032
Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!

Reload....

Name: Success!
Birth Name: Failed!
Born: Failed!
Born Location: Failed!
Date Of Death: Failed!
Location Of Death: Failed!

Reload....

Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!

Reload....

Name: Success!
Birth Name: Success!
Born: Success!
Born Location: Success!
Date Of Death: Success!
Location Of Death: Success!

@duck7000
Contributor

duck7000 commented Apr 12, 2023

Most likely IMDb has started updating other parts/pages to the new UI style.
The last time they updated a few pages, the behavior you describe lasted months (at least it did for me).

Or it happens because the search function does not always return results. I noticed that if I search for the same title repeatedly in quick succession, sometimes the first and sometimes the second part of the function is used. Apparently IMDb blocks repeated use of the same function within a short time.

@GeorgeFive
Author

They're definitely updating the bio pages. I had a little more time to look into the problem, and I've noticed that there are at least two versions of the page...

Version 1 - the class properly grabs everything (old IMDb page)
Version 2 - the class grabs name and image properly, but chokes on everything else (that I'm trying to grab)
Version 3 (???) - I haven't seen this one live yet, but it's the only thing I can think of to explain the test cases where some data works (i.e., birth name) but other data doesn't (birthday) in the same instance.

My regex skills are severely lacking, so I guess I'll leave my observations here and hope someone picks this up?
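
For anyone picking this up, the versions described above can be told apart before any field is parsed. The following is only a minimal sketch: `detectBioLayout()` is a hypothetical helper that is not part of the library, and the markers it checks (`...</td>` table cells for the old page, `...</span>` labels or embedded `"htmlContent"`/`"listContent"` JSON keys for the new one) are simply the ones seen in this thread. IMDb may change them at any time.

```php
<?php
// Hypothetical helper (not part of the library): guess which bio layout was served.
function detectBioLayout(string $html): string
{
    // Old layout: labels sit in table cells, e.g. "Born</td>".
    if (preg_match('!(Birth Name|Born|Died|Nicknames?)</td>!', $html)) {
        return 'old';
    }
    // New layout: labels sit in spans, or the data is embedded as JSON.
    if (preg_match('!(Born|Died)</span>|"(htmlContent|listContent)":!', $html)) {
        return 'new';
    }
    // Neither marker found: a third layout, or a person without these fields.
    return 'unknown';
}
```

Logging the result next to each failed field would show whether the mixed results really come from a third page version or just from a person lacking the data.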

GeorgeFive changed the title from "Data Randomly Not Returning" to "New bio page" on Apr 15, 2023
@Thomasdouscha

Thomasdouscha commented Apr 15, 2023

The bio page is returning less and less data. Soon it will return nothing at all, I think. This is a serious issue!

image

It was like this before,

image

@Thomasdouscha

Now there is no issue. It is OK! It was the old type of page. Once it is switched to the new type of IMDb page, it works well enough.

@GeorgeFive
Author

It's random which page you'll get. Next time you scan the page, it may not work again.

@duck7000
Contributor

Yep, same as the last time they changed the IMDb website.

@GeorgeFive
Author

GeorgeFive commented Apr 16, 2023

I seem to remember there being some code in place last time to force the old version until the main code was updated. Anyone remember how to do that?

@GeorgeFive
Author

I did a bit of a crash course in regex to get this going. I've fixed the following functions... they search for either the old bio page or the new one and will work with either. Works at the moment, subject to break whenever...

This can likely be done smarter, but hey, it works....

public function birthname()
{
    if (empty($this->birth_name)) {
        $this->getPage("Bio");
        if (preg_match("!Birth Name</td><td>(.*?)</td>\n!m", $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        } elseif (preg_match('|Birth name","htmlContent":"(.*?)"}|ims', $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        }
    }
    return $this->birth_name;
}


public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            $nicks = explode(",", $match[1]);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}


public function born()
{
    if (empty($this->birthday)) {
        if (preg_match('|Born</td>(.*)</td|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
              "day" => @$daymon[2],
              "month" => @$daymon[3],
              "mon" => @$daymon[1],
              "year" => @$dyear[1],
              "place" => @$dloc[1]
            );
        } elseif (preg_match('|Born</span>(.*)</div></div></div></li>|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name/\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name/\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name/\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
              "day" => @$daymon[2],
              "month" => @$daymon[3],
              "mon" => @$daymon[1],
              "year" => @$dyear[1],
              "place" => @$dloc[1]
            );
        }

    }
    return $this->birthday;
}


public function died()
{
    if (empty($this->deathday)) {
        $page = $this->getPage("Bio");
        if (preg_match('|Died</td>(.*?)</td|ims', $page, $match)) {
            preg_match('|/search/name\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
              "day" => @$daymonyear[3],
              "month" => @$daymonyear[4],
              "mon" => @$daymonyear[2],
              "year" => @$daymonyear[1],
              "place" => @trim(strip_tags($dloc[1])),
              "cause" => @$dcause[1]
            );
        } elseif (preg_match('|Died</span>(.*)</div></div></div></li>|iUms', $page, $match)) {
            preg_match('|/search/name/\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name/\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
              "day" => @$daymonyear[3],
              "month" => @$daymonyear[4],
              "mon" => @$daymonyear[2],
              "year" => @$daymonyear[1],
              "place" => @trim(strip_tags($dloc[1])),
              "cause" => @$dcause[1]
            );
        }
    }
    return $this->deathday;
}

@GeorgeFive
Author

GeorgeFive commented Apr 30, 2023

It's possible that a nickname may have quotes or a comma in it on IMDb. These would break the function. So....

public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $match[1] = str_replace('\\"', "", $match[1]);
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames?","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            // New bio page (embedded JSON); "Nicknames?" covers both the singular and plural label.
            $nicks = explode("},{", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $m)) {
                    $nick = trim($m[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}
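
To illustrate why the JSON branch now splits on `},{` instead of `,`, here is a small standalone sketch. The `splitNicknames()` helper and the sample fragment (including its `"text"` key) are hypothetical and only mimic the shape the regex above captures; the real page data may look different. Unlike the method above, the sketch drops pieces in which no quoted value is found.

```php
<?php
// Hypothetical helper: split a captured JSON fragment and pull out each quoted value.
function splitNicknames(string $captured, string $separator): array
{
    $names = [];
    foreach (explode($separator, $captured) as $piece) {
        // Take the first quoted value that follows a colon, if there is one.
        if (preg_match('|:"(.*?)"|s', $piece, $m)) {
            $names[] = trim($m[1]);
        }
    }
    return $names;
}

// Assumed sample: two nicknames, the first one containing a comma.
$captured = '"text":"Bob, the Builder"},{"text":"Bobby"}';

$byComma  = splitNicknames($captured, ',');   // the comma inside the first name breaks it apart
$byObject = splitNicknames($captured, '},{'); // splitting between objects keeps both names whole
```

With this sample, `$byComma` holds only `Bobby`, while `$byObject` holds `Bob, the Builder` and `Bobby`.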

@Thomasdouscha

It does not work anymore. It does not return any person data, just the name.

@GeorgeFive
Author

I haven't had any problems. Try copying from here again? I may have edited the first post a day or so after I originally posted it.

@Thomasdouscha

Thanks a lot!

It does get age, birth name, date and place, but it does not get height, spouses and biography.

@Thomasdouscha

> I haven't had any problems. Try copying from here again? I may have edited the first post a day or so after I originally posted it.

Hello,
first of all, thanks for the age and birthday info. But it does not get height, spouses and biography. Can you help with that?

@Thomasdouscha

@tboothman what about the Person class?

@Thomasdouscha

I guess the Person class is not urgent for you :(

@jcvignoli

Hi @GeorgeFive! Any chance you also worked on updating the bio() method?
Your regex skills may not be great, but they are better than mine, it seems!

@Thomasdouscha

Hi @GeorgeFive, I also need your assistance with the bio method, spouse and height. Please, if you have time...

@Thomasdouscha

> @Thomasdouscha, @jcvignoli Take a look at my repo, I added back the person and personSearch classes: https://github.com/duck7000/imdbphp6 I discussed it with GeorgeFive and agreed to add it back.

Wonderful, I am going to check it tonight!

@Thomasdouscha

@duck7000 Unfortunately it does not work, because the arrays differ. This is what I have:

$this->spouses[] = array(
    'imdb' => $mid,
    'name' => $name,
    'from' => $from,
    'to' => $to,
    'comment' => $comment,
    'children' => (int)$children
);

and this is yours:

$this->spouses[] = array(
    'imdb' => $imdbId,
    'name' => $name,
    'from' => $fromDate,
    'to' => $toDate,
    'comment' => $comments
);

The problem is children: that key is missing.

@duck7000
Contributor

duck7000 commented Aug 5, 2023

I have combined comment and children because I think it is all a comment.
If you want them separated, you can do that in your program, or use the comment field and remove the children field from your program.
This is a small issue that you can easily fix yourself.
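
As a rough illustration of doing the split on the caller's side, the sketch below derives a `children` key from the combined `comment` field. `withChildren()` is a hypothetical adapter that exists in neither version of the library, and it assumes the comment contains text such as "(2 children)" or "(1 child)"; the actual comment format may differ.

```php
<?php
// Hypothetical adapter: add a 'children' key to a spouse entry that only has 'comment'.
function withChildren(array $spouse): array
{
    $spouse['children'] = 0;
    $comment = $spouse['comment'] ?? null;
    // Assumed format: "(2 children)" or "(1 child)" somewhere in the comment.
    if (is_string($comment) && preg_match('/\(?(\d+)\s+child(?:ren)?\)?/i', $comment, $m)) {
        $spouse['children'] = (int)$m[1];
        // Remove the matched text so it is not reported twice.
        $spouse['comment'] = trim(str_replace($m[0], '', $comment));
    }
    return $spouse;
}
```

Applied to each entry, for example with `array_map('withChildren', $spouses)`, this gives every entry a `children` key again. Entries without a recognizable child count get `0` and keep their comment unchanged.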

You have to remember that my version is different/stripped down, and I only added most (not all) methods from the person class on request, as I don't use it myself.

And please don't comment here on methods used in my version; start a new issue on my version instead. This way the comments get mixed up and are confusing to others.

@Thomasdouscha

Thomasdouscha commented Aug 5, 2023

I already tried to fix it as you said, but I ran into other new issues and gave up.
Yes, you are right. I will comment on your page next time. And I know you coded your version for yourself. You have supported us many times with issues. I appreciate it, thanks a lot!
