A package that uses Selenium to check product availability for El Corte Ingles Webshop. Package can be edited to work for any webshop.

Armandopdw/selenium-product-availability-check

Selenium Product Availability Check

A package that uses Selenium to check product availability for El Corte Ingles Webshop. Package can be edited to work for any webshop.
About The Project

Built With

Getting Started

To get a local copy up and running follow these simple steps.

Prerequisites

You need the following to use this package. The commands below install them on a Debian-based Linux system.

  • wget
$ apt install wget
  • Google Chrome (Linux)
$ wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
$ dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

Installation

  1. Clone the repo
$ git clone https://github.com/Armandopdw/selenium-product-availability-check.git
  2. Install required Python packages (run from the repository root)
$ cd selenium-product-availability-check
$ pip install .
  3. Enter configuration settings in config.py
# Navigate to chrome://version/ to see your Google Chrome Version
CHROME_VERSION = "85.0.4183.121"
# Possible options are "mac", "linux", "windows"
OS_NAME = "linux"
# After running main.py for the first time set to True
CHROMEDRIVER_DOWNLOADED = False
# Your email address
SENDER_EMAIL = "<[email protected]>"
# Email address of recipient
RECEIVER_EMAIL = "<[email protected]>"
# Product name that you are interested in
PRODUCT = "PS5"
# URL of Product (Currently only El Corte Ingles Canarias supported)
URL = "https://www.elcorteingles.es/canarias/videojuegos/A37046604/"
# Plain text for your email
PLAIN_TEXT = f"""\
Hi,
{PRODUCT} is finally back at El Corte Ingles!
You should go to: {URL}
Sent from automated Selenium Product Availability Check script.
"""
# Formatted HTML for your email
HTML = f"""\
<html>
<body>
    <p>Hi,<br>
    {PRODUCT} is finally back at El Corte Ingles<br>
    You should go to: <a href="{URL}">El Corte Ingles</a> <br>
    Sent from automated Selenium Product Availability Check script.
    </p>
</body>
</html>
"""
  4. Enter your email password in mail/pw/pw.txt
<Password>
  5. Run main.py
$ python3 main.py
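
The PLAIN_TEXT and HTML settings suggest that the notification is sent as a multipart email with a plain-text fallback. The following is a minimal sketch of how such a message could be assembled with Python's standard library. It is not the package's actual code: build_message is a hypothetical helper, and the addresses and subject are placeholder values.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, receiver, subject, plain_text, html):
    """Build a multipart/alternative email with plain-text and HTML parts.

    Hypothetical helper for illustration; not part of this package.
    """
    message = MIMEMultipart("alternative")
    message["Subject"] = subject
    message["From"] = sender
    message["To"] = receiver
    # Attach the plain-text part first: mail clients prefer the last
    # alternative they can render, so the HTML part wins when supported.
    message.attach(MIMEText(plain_text, "plain"))
    message.attach(MIMEText(html, "html"))
    return message


if __name__ == "__main__":
    msg = build_message(
        sender="sender@example.com",      # placeholder address
        receiver="receiver@example.com",  # placeholder address
        subject="PS5 is available",       # placeholder subject
        plain_text="Hi,\nPS5 is finally back at El Corte Ingles!",
        html="<html><body><p>Hi,<br>PS5 is finally back at El Corte Ingles</p></body></html>",
    )
    print(msg.get_content_type())
```

The returned object can be serialized with as_string() and handed to smtplib for sending. Which SMTP server and port to use depends on your email provider.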

Usage

After running main.py, you will receive an email if the product is available again; otherwise nothing happens. The complete package can be installed on a virtual machine (e.g. Compute Engine in Google Cloud Platform) to check availability hourly. The steps below walk through running this code every hour on a Google Cloud Compute Engine instance:

Step 1: Create Compute Engine Instance

The standard settings will suffice. For more info, please refer to the GCP Documentation.

Step 2: Upload file to the Compute Engine instance

Upload the zipped package by using the UI in the top right corner, or upload it to a GCP bucket and copy it to the instance with the following command:

gsutil cp gs://my-bucket/selenium-product-availability-check.zip .

Step 3: Unzip file & install packages

By default, the Compute Engine instance does not have unzip installed, so install it first, then unzip the uploaded file and install the package.

$ apt install unzip
$ unzip selenium-product-availability-check.zip
$ pip install .

Step 4: Install Chrome

Selenium requires Google Chrome to be installed, so run the following commands

$ apt install wget
$ wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
$ dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

Step 5: Set up Cron Job

For main.py to run hourly, you need to set up a cron job. The crontab entry below runs main.py every time the virtual machine boots; Step 6 explains how to schedule the start and stop of the virtual machine so that it boots every hour. For more information, check the Documentation.

crontab -e

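Add the following line to the crontab. It runs the script each time the virtual machine boots, so the script runs hourly once the machine is started every hour (Step 6). Because crontab -e edits a per-user crontab, the entry takes no user field.

@reboot python3 /home/{user-name}/main.py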

Step 6: Set up Google Cloud Scheduler

To reduce the use of the virtual machine (and thereby the costs), turn it off when it is not running the script. Google Cloud Scheduler can do this; follow the instructions in the following Documentation. Those instructions start and stop instances based on their labels. To start and stop one specific virtual machine instead, change the Node.js script slightly, as described in the following sections.

Obtain instance ID & instance zone

Go to your Compute Engine instance and note the Instance ID and zone.

Edit startInstancePubSub function

Edit the index.js of the startInstancePubSub function.

const Compute = require('@google-cloud/compute');
const compute = new Compute();

/**
 * Starts Compute Engine instances.
 *
 * Expects a PubSub message with JSON-formatted event data containing the
 * following attributes:
 *  zone - the GCP zone the instances are located in.
 *  id - the id of instances to start.
 *
 * @param {!object} event Cloud Function PubSub message event.
 * @param {!object} context Cloud Function event metadata (unused).
 * @param {!object} callback Cloud Function PubSub callback indicating
 *  completion.
 */
exports.startInstancePubSub = async (event, context, callback) => {
  try {
    const payload = _validatePayload(
      JSON.parse(Buffer.from(event.data, 'base64').toString())
    );
    const options = {filter: `id = ${payload.id}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async (instance) => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .start();

          // Operation pending
          return operation.promise();
        }
      })
    );

    // Operation complete. Instance successfully started.
    const message = `Successfully started instance(s)`;
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};

/**
 * Validates that a request payload contains the expected fields.
 *
 * @param {!object} payload the request payload to validate.
 * @return {!object} the payload object.
 */
const _validatePayload = (payload) => {
  if (!payload.zone) {
    throw new Error(`Attribute 'zone' missing from payload`);
  } else if (!payload.id) {
    throw new Error(`Attribute 'id' missing from payload`);
  }
  return payload;
};

Edit stopInstancePubSub function

Edit the index.js of the stopInstancePubSub function.

const Compute = require('@google-cloud/compute');
const compute = new Compute();

/**
 * Stops Compute Engine instances.
 *
 * Expects a PubSub message with JSON-formatted event data containing the
 * following attributes:
 *  zone - the GCP zone the instances are located in.
 *  id - the id of instances to stop.
 *
 * @param {!object} event Cloud Function PubSub message event.
 * @param {!object} context Cloud Function event metadata (unused).
 * @param {!object} callback Cloud Function PubSub callback indicating completion.
 */
exports.stopInstancePubSub = async (event, context, callback) => {
  try {
    const payload = _validatePayload(
      JSON.parse(Buffer.from(event.data, 'base64').toString())
    );
    const options = {filter: `id = ${payload.id}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async (instance) => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .stop();

          // Operation pending
          return operation.promise();
        } else {
          return Promise.resolve();
        }
      })
    );

    // Operation complete. Instance successfully stopped.
    const message = `Successfully stopped instance(s)`;
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};

/**
 * Validates that a request payload contains the expected fields.
 *
 * @param {!object} payload the request payload to validate.
 * @return {!object} the payload object.
 */
const _validatePayload = (payload) => {
  if (!payload.zone) {
    throw new Error(`Attribute 'zone' missing from payload`);
  } else if (!payload.id) {
    throw new Error(`Attribute 'id' missing from payload`);
  }
  return payload;
};
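
Both functions decode the Pub/Sub message from base64, parse it as JSON, and require the zone and id attributes. To check a payload locally before publishing it, the same logic can be mirrored in Python. This is only a sketch for local testing, not something the Cloud Functions use; decode_payload is a hypothetical name, and the zone and ID in the usage example are placeholder values.

```python
import base64
import json


def decode_payload(data):
    """Decode base64 Pub/Sub data and check the required attributes.

    Mirrors the checks in _validatePayload from the Cloud Functions above.
    Hypothetical helper for local testing only.
    """
    payload = json.loads(base64.b64decode(data).decode("utf-8"))
    for attribute in ("zone", "id"):
        if not payload.get(attribute):
            raise ValueError(f"Attribute '{attribute}' missing from payload")
    return payload


if __name__ == "__main__":
    # Placeholder zone and instance ID, encoded the way Pub/Sub delivers them.
    raw = json.dumps({"zone": "europe-west1-b", "id": "1234567890123456789"})
    encoded = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    print(decode_payload(encoded))
```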

Testing Pub Sub --> Cloud Function

Pub Sub requires base64-encoded data when testing messages. Use a base64 encoder to encode the following text:

{"zone":"{ZONE}", "id":"{VIRTUAL MACHINE INSTANCE ID}"}

Using the base64-encoded string, add the following message to the Pub Sub topic:

{"data":"{BASE64ENCODED}"}
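
Instead of an online encoder, the payload can also be encoded with Python's standard library. A minimal sketch, assuming you fill in your own values: encode_payload is a hypothetical helper, and the zone and instance ID shown are placeholders.

```python
import base64
import json


def encode_payload(zone, instance_id):
    """Return the base64-encoded JSON payload expected by the functions.

    Hypothetical helper for illustration.
    """
    payload = json.dumps({"zone": zone, "id": instance_id})
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder values; replace with your own zone and instance ID.
    encoded = encode_payload("europe-west1-b", "1234567890123456789")
    # Print the test message in the {"data": ...} shape shown above.
    print(json.dumps({"data": encoded}))
```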

For the actual Cloud Scheduler you can use the following message:

{"zone":"{ZONE}", "id":"{VIRTUAL MACHINE INSTANCE ID}"}

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Armando Panman de Wit - [email protected]

Project Link: https://github.com/Armandopdw/selenium-product-availability-check
