This is our submission for the event Brain Dead, an event of Revelation'23. Revelation'23 is the official technical fest of the Department of Computer Science and Technology, IIEST Shibpur.
Problem Statement 1 : Analyze Placement Data
Problem Statement 2 : Detecting Emotional Sentiment in Cartoons
Please download the notebooks to view their contents, as there is an issue with viewing Colab notebooks on the GitHub website. Downloading works fine.
Challenge Description:
In this challenge, you are supposed to analyze the placement records of the students of an MBA college.
The dataset includes secondary and higher secondary school percentages and specializations. It also contains degree specialization, work experience, and the salary offered to the students. Your main task is to analyze the factors that affect the placement and salary of students.
Analyze the dataset and derive meaningful insights from the data.
Here are some examples that you might consider for your analysis:
- What are the factors affecting the placement of a student?
- Which degree specializations are in high demand in the industry?
- Does MBA percentage matter in placement?
Present your analysis in a report. You can use charts, tables, and graphs to elaborate your analysis. You are not required to create any prediction model for this problem statement.
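As a rough starting point for questions like the ones above, the placement rate can be compared across the values of any categorical column. The sketch below uses only the Python standard library and a handful of made-up rows. The helper name `placement_rate_by` is hypothetical, the column names `workex` and `status` follow the dataset description, and the `Yes`/`No` values for `workex` are an assumption about how the file encodes work experience.

```python
from collections import defaultdict


def placement_rate_by(rows, column):
    """Return {category: fraction of rows whose status is 'Placed'}."""
    placed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        if row["status"] == "Placed":
            placed[key] += 1
    return {key: placed[key] / total[key] for key in total}


# Made-up rows for illustration only. Real rows could come from
# csv.DictReader over the downloaded dataset file.
sample_rows = [
    {"workex": "Yes", "status": "Placed"},
    {"workex": "Yes", "status": "Placed"},
    {"workex": "No", "status": "Placed"},
    {"workex": "No", "status": "Not placed"},
]

rates = placement_rate_by(sample_rows, "workex")
print(rates)
```

The same helper can be pointed at `specialisation`, `degree_t`, or `gender` to compare groups. A gap between groups is only a descriptive signal, not evidence of cause.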
Dataset Link:
Dataset Description:
The dataset has 215 rows and 15 columns. Each row represents a student and his/her/their corresponding data.
The columns and their description:
👉 sl_no: Serial Number
👉 gender: Gender- Male='M', Female='F'
👉 ssc_p: Secondary Education percentage- 10th Grade
👉 ssc_b: Board of Education- Central/ Others
👉 hsc_p: Higher Secondary Education percentage- 12th Grade
👉 hsc_b: Board of Education- Central/ Others
👉 hsc_s: Specialization in Higher Secondary Education
👉 degree_p: Degree Percentage
👉 degree_t: Under-Graduation(Degree type)- Field of degree education
👉 workex: Work Experience
👉 etest_p: Employability test percentage (conducted by the college)
👉 specialisation: Post Graduation(MBA)- Specialization
👉 mba_p: MBA percentage
👉 status: Status of placement- Placed/Not placed
👉 salary: Salary offered by corporate to candidates
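Before starting the analysis, it can help to confirm that the loaded file really exposes all 15 columns listed above. The following is a minimal sketch using plain Python; the helper name `missing_columns` and the truncated example header are hypothetical, and only the column names themselves come from the description.

```python
# Column names as listed in the dataset description above.
EXPECTED_COLUMNS = [
    "sl_no", "gender", "ssc_p", "ssc_b", "hsc_p", "hsc_b", "hsc_s",
    "degree_p", "degree_t", "workex", "etest_p", "specialisation",
    "mba_p", "status", "salary",
]


def missing_columns(header):
    """Return the expected columns that are absent from `header`."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Hypothetical header with only the first 10 columns, for illustration.
partial_header = EXPECTED_COLUMNS[:10]
missing = missing_columns(partial_header)
print(missing)
```

With a real file, the header could be taken from `csv.DictReader(f).fieldnames`. An empty result means every described column is present.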
If after downloading the dataset, there are only 10 columns, then select the drop-down marked with red in the figure below, and apply "Select All".
Tools used for analysis:
You are free to use any tool of your choice. But preferred tools include:
- MS Excel
- Tableau/ PowerBI, etc
- Jupyter Notebook/ Google Colab, Matplotlib.
Deliverables:
A complete report of the methodology employed in your work. The report should be concise. This report might include references, tables, figures, and results. The file format should be either a ppt or a pdf. If you are using tools like Excel, also mention the Excel formulas that you used for the analysis.
Marking criteria:
Some of the metrics our team will use for evaluating your reports include:
- Writing Style, references, figures, etc.
- Dataset exploration
- Methods
- Results of analysis
- Discussion
Challenge Description:
Social media platforms are widely used by individuals and organizations to express emotions, opinions, and ideas. These platforms generate vast amounts of data, which can be analyzed to gain insights into user behavior, preferences, and sentiment. Accurately classifying the sentiment of social media posts can provide valuable insights for businesses, individuals, and organizations to make informed decisions.
To accomplish this task, a customized private cartoon dataset (original images) of social media posts has been provided, which contains labels for each post's emotion category, such as happy, angry, sad, or neutral.
The task is to build and fine-tune a machine-learning model that accurately classifies social media posts into their corresponding emotion categories, using synthetic images.
To achieve this, the following steps are required:
⭐ Generate synthetic images using any image generation techniques (e.g., GAN, Diffusion Models, Autoencoder Decoder) to augment the dataset and increase its size.
⭐ For example, use the images in the "happy" category to synthetically generate similar images, and repeat the same for each category.
⭐ Use the original and synthetic images to build a machine-learning model that accurately classifies social media posts into their corresponding emotion categories.
⭐ Evaluate the performance of the model using appropriate metrics such as accuracy, precision, recall, and F1-score.
⭐ Compare the performance of the model when trained on the original dataset only, the synthetic dataset only, and the combination of both.
⭐ Analyze the results to determine the effectiveness of using synthetic images for improving classification accuracy.
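One way to set up the three-way comparison from the steps above is to hold out part of the original images as a shared test set and then build the three training sets from what remains. The sketch below is an assumption about how this could be organised, not a procedure prescribed by the challenge: the function name `build_training_regimes`, the 20% test fraction, and the `(image_id, label)` sample format are all hypothetical.

```python
import random


def build_training_regimes(original, synthetic, test_fraction=0.2, seed=0):
    """Split original samples into train/test and build three training sets.

    Each sample is an (image_id, label) pair. The test set is drawn from
    original images only, so all three regimes are scored on the same data.
    """
    rng = random.Random(seed)
    shuffled = list(original)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]
    original_train = shuffled[n_test:]
    regimes = {
        "original_only": original_train,
        "synthetic_only": list(synthetic),
        "combined": original_train + list(synthetic),
    }
    return regimes, test_set


# Made-up sample identifiers for illustration only.
labels = ["happy", "angry", "sad", "neutral"]
original = [(f"orig_{i}", labels[i % 4]) for i in range(10)]
synthetic = [(f"synth_{i}", labels[i % 4]) for i in range(6)]

regimes, test_set = build_training_regimes(original, synthetic)
print({name: len(samples) for name, samples in regimes.items()}, len(test_set))
```

To keep the comparison fair, the synthetic images would ideally be generated from the original training split only, so that nothing derived from the held-out test images leaks into training.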
The dataset consists of a diverse range of cropped cartoon face images. The data has been pre-processed and cleaned, but you can apply additional data cleaning or pre-processing techniques if necessary. You can use any machine learning or deep learning algorithm or technique of your choice to build and fine-tune your model, as long as it can accurately classify the posts into their corresponding emotion categories.
Based on a previous study, the best classification algorithms for emotion detection reach an accuracy of 0.906. Your goal is to beat this with your models, but your model should not be overfitting or underfitting.
To evaluate the performance of your model, you will be using standard evaluation metrics such as accuracy, precision, recall, F1 score, and confusion matrix. The submission with the highest evaluation score will be declared the winner. The top submissions will also be invited to present their solutions and insights to the community.
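For reference, the metrics named above can all be derived from a confusion matrix. The sketch below computes them in plain Python on a small set of made-up predictions. The helper names `confusion_matrix` and `summarize_metrics` are hypothetical, and macro averaging (an unweighted mean over classes) is an assumption, since the challenge text does not say how per-class scores are combined.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def summarize_metrics(matrix):
    """Return accuracy and macro-averaged precision, recall and F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Made-up labels and predictions for illustration only.
labels = ["happy", "angry", "sad", "neutral"]
y_true = ["happy", "happy", "angry", "angry", "sad", "sad", "neutral", "neutral"]
y_pred = ["happy", "happy", "angry", "sad", "sad", "sad", "neutral", "happy"]

matrix = confusion_matrix(y_true, y_pred, labels)
metrics = summarize_metrics(matrix)
print(matrix)
print(metrics)
```

In a notebook, scikit-learn's `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` provide the same quantities. The hand-written version is shown only to make the definitions explicit.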
Dataset Link: