Welcome to the 2000s Movie Database. The dataset contains 2100 films released between 2000 and 2009. Data points include title, genre, year, language and country of production, content rating, duration, aspect ratio, director, cast, budget, box office, number of reviews (by critics and users) and IMDB score.

cla-cif/movie-DB-2000s

The Heroku-based command line interface (CLI) allows the user to browse the dataset and retrieve statistics, rankings and specific information. The instructions are written very simply and require only minimal interaction to achieve the desired result.

Here is the live version

Content

How it's done
How it works
Features
User Stories
Testing
Technologies
Deployment
Credits

How it's done

How I developed this project: my story told through the Design Thinking Process

This project is inspired by the five stages of Design Thinking, and its further development will strictly follow the same principles.

Define

I'm a film blogger. I want to write about what ingredients make a film successful, and I want to do this by analysing film-related data from the last few decades. I also want to be able to explore and query my database whenever I write an article. But I don't know how to get meaningful information from my database.

I turned this problem into an opportunity.
I tried to understand why this problem is important to the blogger by getting to the heart of the matter and developing a targeted solution from there, keeping this question in mind: "What solves the problem according to the blogger's needs and goals?" I shared the blogger's vision and hit their needs right where they needed to be addressed.
To do this, I researched film blogs and conducted interviews that led to the creation of a potential user persona.

Empathise

This program was written with the potential needs of film bloggers in mind: people in their thirties or forties with intermediate to low IT skills who want to gain insights into their personalised film database and tell their followers about it.

Let us call our blogger Nastya.

  • She wants to set up her database on her own with information about films that is relevant to her.
  • The database should be hosted on software Nastya is familiar with.
  • She wants to gain insight into this data through an "old-fashioned" and easy-to-use interface.
  • Nastya wants a developer to code a program that processes the data and turns it into the information she is looking for, which she can access through the interface.
  • She has followers to whom she needs to deliver content, so it is important that the developer doesn't break this chain of expectations. Therefore, she is keen to find a developer with whom she can establish a close working relationship.

Ideate

The brainstorming phase, followed by research, challenges and discussions, led to the integration of the following tools to solve the stated problem.

  • Python and its libraries are ideal for working with small dataframes.
  • The Heroku-based app provides an easy-to-use solution with an old-fashioned interface. (What is Heroku)
  • Google Sheets spreadsheets can be shared and are intuitive and easy to edit.
  • GitHub is the go-to solution for software development.

Prototype

And that is how I arrived at the current prototype. It is a scaled-down version of the main idea with a demo of the potential features:
SEARCH the database by keywords and get pre-calculated DATA.

  • The functions have been developed with the user's goal in mind and designed so that the user does not get lost when using the programme.
  • The prompts are obvious, short and direct, but a HELP option is always available.

Other important points are:

  • Consistency of the displayed messages in colour and language.
  • Constant availability of information and functions.
  • Creation of a recursive architecture to avoid dead ends.
  • Handling of invalid inputs and the avoidance of unexpected behaviour.

My personal challenge was to put myself in the shoes of the user: What is clear, obvious and self-evident to the developer may not be to the user.

Test

I brought together people who matched the persona (our film blogger Nastya) and who have different levels of IT ability.
I presented the prototype to them and asked them to comment and raise questions while using the app.
I listened to their comments, observed their reactions, took notes and showed my appreciation for their feedback.

I then used their feedback to move back and forth between the Design Thinking stages.

The guiding principles

This app, created with Heroku, offers simple functions but is well structured to facilitate further development, expansion of the dataset and troubleshooting.

I believe that in development, work is better than rework. Adding features according to the client's and team's input is more efficient and saves more time than removing features that the developer has spent time on but the client doesn't actually need.

My goal is to meet the clients' needs by taking their suggestions and frequently keeping in touch with them through chat, emails and video calls. This way I can make timely and frequent adjustments and fix problems as they arise. I believe in offering a tailor-made solution that adds value to the client.

My promise to the client is that I'll take care of all phases of development while striving for improvement. The client is on board with the developer throughout the plan > design > develop > test > release > review cycle. I'm highly motivated to develop this project and want to assemble a team of goal-oriented, autonomous and empowered programmers.

Read more about the guiding principles of Agile Development

How it works

Welcome

  • The user is welcomed by a large title and a short message presenting the dataset and its main functionalities.
  • The app has two main features: display processed data and perform queries.
  • Side features are HELP and EXIT, which can be invoked at any point by typing the command in response to any question (outside the search functionalities).
  • The first time the app is launched, the user is offered the choice to get HELP, EXIT the program or press the Enter key to continue (useful when the user is already familiar with the app and wants to skip the HELP section).
  • Through a series of questions, the user is led to the desired output.
  • Each answer (input) from the user is verified. If the check fails, a message explaining the error is shown and the question is asked again.
  • After each result (output), the user is returned to the main question and can choose again how to explore the database (SEARCH/DATA).
  • The app won't terminate unless the user types exit, closes the window or refreshes the page.
  • The app is designed to avoid dead ends that would force the user to restart the app in order to continue. The user can always type a command.
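
The validate-and-ask-again behaviour described above can be sketched as two small loops. This is only an illustration under stated assumptions: the names ask and main_loop, the message wording and the read/show parameters are hypothetical and are not taken from the project's code.

```python
def ask(question, valid_answers, read=input, show=print):
    """Repeat the question until the reply is one of valid_answers."""
    while True:
        reply = read(question).strip().lower()  # input is not case-sensitive
        if reply in valid_answers:
            return reply
        # Explain the error, then loop so the same question is asked again.
        show(f"'{reply}' is not valid. Please type one of: "
             f"{', '.join(valid_answers)}")


def main_loop(read=input, show=print):
    """Return to the main question after every output; only EXIT ends it."""
    while True:
        choice = ask("Type SEARCH or DATA to explore the database: ",
                     ("search", "data", "help", "exit"), read, show)
        if choice == "exit":
            show("Thank you! Goodbye!")
            return
        if choice == "help":
            show("Type SEARCH to query the dataset or DATA for statistics.")
            continue
        # Placeholder for the real SEARCH / DATA menus (hypothetical).
        show(f"(the {choice.upper()} menu would run here)")
```

Calling main_loop() with no arguments starts an interactive session. Passing read and show as parameters is a design choice made here only so that the loop can be exercised without a terminal.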

Flowchart

A flowchart of the program's main process was created with Lucid.app.

Data option

The option is available by typing data in response to the question "Type SEARCH or DATA to explore the database:", which is asked after each output. Users are offered ten options with pre-calculated statistics and rankings to choose from.

  1. The average budget, score and duration of the decade's films.
  2. Number of films in each language.
  3. Number of films produced each year.
  4. The most prolific directors of the decade and their scores.
  5. Top 10 countries that produced films with the highest IMDB score.
  6. The 10 best films of the decade.
  7. The 10 worst films of the decade.
  8. The most profitable films in terms of return on investment.
  9. Top 10 box-office flops: the most unprofitable films.
  10. The content ratings and their average IMDB Score.

After the choice is validated and the output displayed, the user is returned to the main question and can choose again how to explore the database (SEARCH/DATA).
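
As a rough illustration of how statistics like options 1, 8 and 10 could be computed with pandas, here is a minimal sketch. The sample rows are made up, the column names (budget, gross, duration, imdb_score, content_rating) are assumptions rather than the project's actual schema, and the return-on-investment formula is one common definition, not necessarily the one the app uses.

```python
import pandas as pd

# Made-up sample rows; column names are assumptions, not the real schema.
films = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "budget": [10_000_000, 50_000_000, 2_000_000],
    "gross": [40_000_000, 25_000_000, 30_000_000],
    "duration": [95, 130, 88],
    "imdb_score": [7.1, 5.4, 8.0],
    "content_rating": ["PG-13", "R", "PG-13"],
})

# Option 1: averages over the whole dataset.
averages = films[["budget", "imdb_score", "duration"]].mean()

# Option 8: return on investment, assumed here to be
# (box office - budget) / budget, sorted from best to worst.
films["roi"] = (films["gross"] - films["budget"]) / films["budget"]
most_profitable = films.sort_values("roi", ascending=False).head(10)

# Option 10: average IMDB score for each content rating.
score_by_rating = films.groupby("content_rating")["imdb_score"].mean()
```

Reversing the sort order (ascending=True) on the same roi column would give a ranking in the spirit of option 9, the box-office flops.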

Search option

The option is available by typing search in response to the question "Type SEARCH or DATA to explore the database:", which is asked after each output.

  • Users can browse the dataset searching by title, genre, actor and director and get info related to that entry.
  • Partial text also matches, but the output is limited to 10 results because of Heroku's terminal constraints (80 characters by 24 rows), so a more specific entry yields more accurate results.
  • Searching by title is the only query that returns all available information (genre, year, language and country of production, content rating, duration, aspect ratio, director, cast, budget, box office, number of reviews and IMDB score).
  • The other options, which are more likely to find multiple matches, display only the most relevant information (title, genre, director, cast and IMDB score) to improve readability given Heroku's terminal limitations mentioned above.
  • After the choice is validated and the output displayed, the user is returned to the main question and can choose again how to explore the database (SEARCH/DATA).
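
A partial-text search capped at 10 rows, as described above, might look like the following sketch. The function name search, the MAX_RESULTS constant, the column names and the sample rows are all hypothetical; only the 10-result limit and the use of pandas come from this document.

```python
import pandas as pd

MAX_RESULTS = 10  # the 10-result limit mentioned above


def search(df, column, text, limit=MAX_RESULTS):
    """Case-insensitive substring search on one column, capped at `limit`."""
    mask = df[column].str.lower().str.contains(
        text.lower(), regex=False, na=False)  # treat the text literally
    return df[mask].head(limit)


# Made-up sample rows, used only to exercise the function.
films = pd.DataFrame({
    "title": [f"Sample Sequel {n}" for n in range(1, 13)] + ["Gladiator"],
    "director": ["Jane Doe"] * 12 + ["Ridley Scott"],
})

broad = search(films, "title", "sequel")     # 12 matches, only 10 returned
narrow = search(films, "director", "ridley")  # one targeted match
```

The broad query shows why a targeted entry is more useful: two of the twelve matching rows never reach the screen.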

Exit option

The exit option can be called by typing exit after each prompt (outside the search functions). The function prints the message Thank you! Goodbye!, clears the screen and causes the program to quit after 3 seconds.
The exit function is not available in the search functions because the word could be part of a name or title, which would cause the app to quit without showing the result. The app can be restarted by clicking Heroku's red "RUN PROGRAM" button above the terminal.

Help option

The help option can be called by typing help after each prompt (outside the search functions). The function provides basic information about the dataframe and instructions on how to explore the program. The help function is not available in the search functions because the word could be part of a name or title, so the help text would be displayed instead of the search result ('Help' is a movie from 2021 and 'The Help' is a movie from 2011). After the help text is displayed, the user is asked to press the Enter key to continue.

Features

All functions have a general purpose and can be applied to a similar dataset or, for this particular project, allow the current dataset to be extended with minimal further implementation.

Existing Features

  • The app is intuitive and the instructions are clear and simple, requiring minimal interaction from the user to achieve the result.
  • The text displayed on the black background of Heroku's CLI is legible and bright. The four colours (blue, yellow, red, white) are chosen consistently to differentiate instructions, messages, errors and outputs.
  • Input isn't case-sensitive, but output is consistently presented with the first letter capitalised.
  • The code is iterative so that users can perform multiple searches/actions without restarting the program.
  • The program can be restarted at any time by clicking the red "Run Program" button on the Heroku app page.
  • The app is accessible from desktop only and is not available on mobile.

Future Features

Some potential features include:

  • Searches possible with two or more options at the same time (e.g.: search by genre AND actor, search by actor AND director).
  • A collection of films from the 90s and 10s to be added to the dataset.
  • Additional statistics and lists.
  • Deployment with Jupyter Lab to create meaningful histograms, distributions and charts.

Future features will be based on the users' requests and consequent necessities.

User Stories

The following user stories with their respective acceptance criteria and tasks are available on the Issues tab of this repository. The user stories were considered completed and subsequently closed.

Search actors and directors by name

Looking at our "persona" from the design thinking process, the following user story was the crucial point around which I created an efficient query. The acceptance criteria points have been addressed and documented in the following Fixed Issue section. The User Story is available here.

Clear and direct instructions

The instructions have been tailored looking at our "persona" with intermediate to low IT skills. All the tasks were accomplished and documented in the How it works and Features sections of this file. The User Story is available here.

Testing

I manually tested this project throughout the development process by doing the following:

  • Ran the code through the PEP8 linter.
  • Gave invalid input and checked the logical and visual consistency of the error messages.
  • Entered substrings, extended ASCII characters, strings containing ' (apostrophe), lower and upper case letters.
  • Checked how many lines to display for better readability.
  • Tested colours and their consistency for better readability.

Users will test the program simply by using it and will be asked to provide feedback.

Issues

The program has so far proven to be free of arithmetic, syntax, resource, multi-threading and interfacing bugs. The program operates correctly and doesn't terminate abnormally. The following logical errors produced undesired output: while the output was consistent with the input, a much broader result was desired.

Fixed

  1. Matching was not possible with a partial string, e.g. the title had to be complete and an actor or director had to be searched by full name in order to display the desired result.

    • Solution:
      A nested loop was implemented to work efficiently with a multi-dimensional data structure like this dataset (a list that contains other lists). The loop iterates through the spreadsheet and its columns; if the substring provided by the user is matched, a boolean variable is set to true and the output is displayed.
  2. Extended ASCII characters (character codes 128-255) present in some names couldn't be matched by typing printable ASCII characters (character codes 32-127).

    • Solution:
      In each search function (title, director, actor, genres) I created a copy of the dataframe and applied the normalize, encode and decode methods to the Series (columns) I wanted to parse. I also applied unicodedata.normalize to the user's input.
    • Explanation:
      In this way, strings with diacritics (extended ASCII) can be matched by typing the closest Latin letter (printable ASCII). The normalization method decomposes a letter with a diacritic into its base Latin letter and its diacritic symbol. Additionally, similar names with different diacritics, such as (Zoe/Zoë/Zoé) and (Chloe/Chloë/Chlöe/Chloé), are matched in all of their forms. E.g., Input: "Zoe" Output: "Zoe Saldaña", "Zoë Kravitz".
  3. Entries with ' (apostrophe) are not matched by the queries.

    • Definition of the problem:
      Apostrophes are found in movie's titles as contractions or possessives and in names, as part of the name or quoting a nickname.
    • Context of the problem:
      In an attempt to match the user's input (whether lowercase or uppercase) with the case of the dataset entries, the .title() method was applied to the user's input.
    • Reason of the problem:
      Apostrophes act as word boundaries. This became evident when applying the .title() method to the user's input.
      Example - which doesn't match (Ripley's) as present in the dataset:
       user_input = "ripley's"
       user_input = user_input.title()  # Ripley'S
    
    • Attempt to solve the problem:
      I worked around the .title() method's behaviour by passing the user's input as an argument to a function that used a regex, as suggested here.
      • It failed because it worked for the abovementioned example but not for names containing quoted nicknames like (Joanna 'JoJo' Levesque) or names like (Mo'Nique) or (DJ Pooh).
    • Solution:
      I applied the .lower() method to the user's input and to the copy of the dataframe in order for the query to make an exact comparison. The .lower() method also proved to be useful to match movie titles such as (Mission: Impossible II) or (Jurassic Park III) which otherwise would have escaped the query with the .title() method.
  4. Entering ? in the search function resulted in an error and an app crash.

    • Solution:
      The following message was shown: Error : nothing to repeat at position 0, and no further action was possible through the app's CLI. The issue was fixed by passing regex=False to the .contains() method, so that the input is treated as a literal string. Documentation is available here.
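
The accent-insensitive matching from fix 2 can be illustrated with plain Python strings. This is a minimal sketch, not the project's code: the helper names strip_accents and matches are hypothetical, and the choice of the NFKD normalization form is an assumption, since the text above does not say which form was used.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters, then drop the non-ASCII combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)  # NFKD is an assumption
    return decomposed.encode("ascii", "ignore").decode("ascii")


def matches(query, name):
    """Compare both sides after removing accents and lowering the case."""
    return strip_accents(query).lower() in strip_accents(name).lower()


names = ["Zoe Saldaña", "Zoë Kravitz", "Chloë Sevigny", "Penélope Cruz"]
hits = [name for name in names if matches("zoe", name)]
```

Because both the query and the stored names go through the same helper, typing either "Zoe" or "Zoë" finds the same entries, while the original spelling is kept for display.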
By fixing the above issues I've learnt more about:

Unicode normalization, the .lower() and .title() methods, regular expressions, lambda functions and the nature and behaviour of Python's pandas objects. I also practised debugging by printing intermediate results.
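
The behaviour behind fixes 3 and 4 can be reproduced in a few lines. The sketch below uses a small illustrative list of titles and hypothetical variable names; the invalid-pattern error is shown with Python's re module directly, which is where the "nothing to repeat" message originates.

```python
import re

import pandas as pd

# .title() capitalises after every non-letter, including an apostrophe,
# and it also lowers the tail of Roman numerals.
titled = "ripley's".title()           # "Ripley'S"
roman = "jurassic park iii".title()   # "Jurassic Park Iii"

# A lone "?" is not a valid regular expression.
try:
    re.compile("?")
    regex_error = None
except re.error as err:
    regex_error = str(err)

# Lowering both sides and matching literally avoids both problems.
titles = pd.Series(["Ripley's Game", "Jurassic Park III",
                    "Dude, Where's My Car?"])
lowered = titles.str.lower()
apostrophe_hits = titles[lowered.str.contains("ripley's", regex=False)]
literal_hits = titles[lowered.str.contains("?", regex=False)]
```

Keeping the original titles Series for display and using the lowered copy only for comparison mirrors the approach of working on a copy of the dataframe.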

Remaining

The terminal constraints don't allow large results and graphs to be displayed. When the project is developed further, a different deployment system may be considered.

Validator

Technologies used

The project is coded in Python and relies on pandas 1.4.2 to analyse data.

Languages used

Frameworks, Libraries & Programs Used

Deployment

The project is coded and hosted on GitHub and deployed with Heroku.

Creating the Heroku app

The steps needed to deploy this project are as follows:

  1. Create a requirements.txt file in the GitHub repository, for Heroku to read, listing the dependencies the program needs in order to run.
  2. Push the recent changes to GitHub and go to your Heroku account page to create and deploy the app running the project.
  3. Choose "CREATE NEW APP", give it a unique name, and select a geographical region.
  4. From the Settings tab, configure the environment variables (config var section).
  5. If the project has credentials, copy the contents of the CREDS.json file and paste them into the VALUE field, type CREDS in the corresponding KEY box and click the "ADD" button.
  6. Create another config var, set PORT as KEY and assign it the VALUE 8000.
  7. Add two buildpacks from the Settings tab, in this order: heroku/python, then heroku/nodejs.
  8. From the Deployment tab, choose GitHub as deployment method, connect to GitHub and select the project's repository.
  9. Click "Enable Automatic Deploys" or choose "Deploy Branch" from the Manual Deploy section.
  10. Wait for the logs to run while the dependencies are installed and the app is being built.
  11. The mock terminal is then ready and accessible from a link similar to https://your-projects-name.herokuapp.com/
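
Heroku exposes config vars to the running app as environment variables. The sketch below shows one way a Python program could read a value like CREDS, assuming it holds JSON text. The helper load_creds is hypothetical, and the project itself may load its credentials differently (for example from a file).

```python
import json
import os


def load_creds(env=os.environ, key="CREDS"):
    """Return the JSON stored in a config var as a dict, or None if unset."""
    raw = env.get(key)
    return json.loads(raw) if raw else None


# Example with a stand-in environment instead of the real one.
example = load_creds({"CREDS": '{"type": "service_account"}'})
```

Returning None when the variable is missing lets the caller decide whether credentials are required, which fits step 5 applying only "if the project has credentials".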

Update APR 16, 2022

Extract from Heroku Incident 2413:
Based on Salesforce’s initial investigation, it appears that unauthorized access to Heroku's GitHub account was the result of a compromised OAuth token. Salesforce immediately disabled the compromised user’s OAuth tokens and disabled the compromised user’s GitHub account. Additionally, GitHub reported that the threat actor was enumerating GitHub customer accounts using OAuth tokens issued to Heroku’s OAuth integration dashboard hosted on GitHub.
Since this issue arose, and until further notice or whenever automatic deployments are not available for whatever reason, the steps to deploy the Heroku app are as follows.
A visual example of the following instructions can be found here.
Deploying your app to Heroku:

  1. Log in to Heroku and enter your details. From the Gitpod bash terminal, enter the command: heroku login -i
  2. Get your app name from Heroku. Command: heroku apps
  3. Set the Heroku remote (replace <app_name> with your actual app name). Command: heroku git:remote -a <app_name>
  4. Add and commit your changes. Command: git add . && git commit -m "Deploy to Heroku via CLI"
  5. Push to both GitHub and Heroku. Commands:
     git push origin main
     git push heroku main

If the login requires an API key (for example, when MFA/2FA is enabled on the Heroku account), these additional steps apply:

  1. Click on Account Settings (under the avatar menu).
  2. Scroll down to the API Key section and click Reveal. Copy the key.
  3. Enter the command heroku_config and, when prompted, enter the API key you copied.
  4. Complete the steps above. If you see an input box at the top middle of the editor:
     a. Enter your Heroku username.
     b. Enter the API key you just copied.

Note: Thanks to Code Institute for providing the abovementioned Heroku app deployment steps.

Forking the Repository

By forking this GitHub Repository you make a copy of the original repository in your GitHub account to view and/or make changes without affecting the original repository. The steps to fork the repository are as follows:

  1. Log into your GitHub account and locate this project's GitHub Repository.
  2. Click on the "Fork" button, on the top right of the page, just above the Settings.
  3. Decide where to fork the repository (your account for instance)
  4. You now have a copy of the original repository in your GitHub account.

Making a local clone

Cloning a repository pulls down a full copy of all the repository data that GitHub.com has at that point in time, including all versions of every file and folder for the project. The steps to clone a repository are as follows:

  1. Log into your GitHub account and locate this project's GitHub Repository.
  2. Click on the "Code" button, on the top right of the page, next to the green "Gitpod" button.
  3. Choose one of the available options: Clone with HTTPS, Open with GitHub Desktop, Download ZIP.
  4. To clone the repository using HTTPS, under "Clone with HTTPS", copy the link.
  5. Open Git Bash. How to download and install.
  6. Choose the location where you want the repository to be created.
  7. Type:
    $ git clone https://github.com/cla-cif/movie-DB-2000s.git
    
  8. Press Enter, the following lines will appear and your repository is now created.
    Cloning into 'movie-DB-2000s'...
    remote: Enumerating objects: 257, done.
    remote: Counting objects: 100% (257/257), done.
    remote: Compressing objects: 100% (182/182), done.
    remote: Total 257 (delta 157), reused 158 (delta 72), pack-reused 0
    Receiving objects:  81% (209/257)
    Receiving objects: 100% (257/257), 54.76 KiB | 549.00 KiB/s, done.
    Resolving deltas: 100% (157/157), done.
    
  9. Click here for a more detailed explanation.

Credits

  • All content written by developer Claudia Cifaldi - cla-cif on GitHub.
  • The template used for this project belongs to CodeInstitute - GitHub and website.
  • The dataset is part of Kaggle's "The Movies Dataset" under CC0: Public Domain Licence.

Here is the live version
