The team project consists of two modules. Each module requires participants to apply the skills they have learned to date, and explore a dataset of their choosing. The first part of the team project involves creating a simple program with a database in order to analyze a dataset from an open source, such as Kaggle. In the second part of the team project, teams will come together again and apply the skills developed in each of the data science or machine learning foundations certificate streams. Teams will either create a data visualization or a machine learning model.
Participants will work in assigned teams of 4-5.
By the end of Team Project Module 1, participants will be able to:
- Resolve merge conflicts
- Describe common problems or challenges a team encounters when working collaboratively using Git and GitHub
- Create a program to analyze a dataset with contributions from multiple team members
By the end of Team Project Module 2, participants will be able to:
- Create a data visualization as a team
- Create a machine learning model as a team
Questions can be submitted to the #cohort-3-help channel on Slack.
- Technical Facilitator:
  - Phil Van-Lane (he/him)
- Learning Support Staff:
  - Taneea Agrawaal (she/her)
  - Farzaneh Hashemi (she/her)
  - Tong Su (she/her)
Each Team Project module will include two live learning sessions and one case study presentation. During live learning sessions, facilitators will introduce the project, walk through relevant examples, and introduce various team skills that support project success. The remaining time will be used for teams to assemble and work on their projects, as well as get help from the facilitator or the learning support to troubleshoot any issues a team may be encountering.
Work periods will also be used as opportunities for teams to collaborate and work together, while accessing learning support.
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
---|---|---|---|---|
Live Learning Session | Live Learning Session | Case Study | Work Period | Work Period |
- Participants are expected to attend live learning sessions and the case study as part of the learning experience. Participants are encouraged to use the scheduled work period time to complete their projects.
- Participants are encouraged to ask questions and collaborate with others to enhance learning.
- Participants must have a computer and an internet connection to participate in online activities.
- Participants must not use generative AI such as ChatGPT to generate code to complete assignments. Generative AI may be used only as a supportive tool to seek out answers to questions you may have.
- We expect participants to have completed the onboarding repo.
- We encourage participants to keep their camera on by default and to turn it off only as needed. This greatly enhances the learning experience for all participants and provides real-time feedback for the instructional team.
|-- data
|---- processed
|---- raw
|---- sql
|-- reports
|-- src
|-- README.md
|-- .gitignore
|-- data
|---- processed
|---- raw
|---- sql
|-- experiments
|-- models
|-- reports
|-- src
|-- README.md
|-- .gitignore
- Data: Contains the raw, processed and final data. For any data living in a database, make sure to export the tables out into the `sql` folder, so it can be used by anyone else.
- Experiments: A folder for experiments
- Models: A folder containing trained models or model predictions
- Reports: Generated HTML, PDF etc. of your report
- src: Project source code
- README: This file!
- .gitignore: Files to exclude from this folder, specified by the Technical Facilitator
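The instruction to export database tables into the `sql` folder can be followed in several ways. The sketch below is one option, assuming the database is SQLite; it uses the standard library's `Connection.iterdump()` to write a plain-text SQL dump. The function name `export_database`, the file names, and the demo table are hypothetical and only illustrate the idea.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_database(db_path, out_path):
    """Write a SQL dump of every table in db_path to out_path.

    Returns the number of SQL statements written.
    """
    out_file = Path(out_path)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        statements = list(conn.iterdump())
    finally:
        conn.close()
    out_file.write_text("\n".join(statements) + "\n", encoding="utf-8")
    return len(statements)


# Demo on a throwaway database so the sketch runs anywhere.
# The table and its columns are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "demo.db"
    demo_conn = sqlite3.connect(str(db_file))
    demo_conn.execute("CREATE TABLE Generations (name TEXT, min_age INTEGER)")
    demo_conn.execute("INSERT INTO Generations VALUES ('demo', 0)")
    demo_conn.commit()
    demo_conn.close()

    dump_file = Path(tmp) / "data" / "sql" / "dump.sql"
    n_statements = export_database(str(db_file), str(dump_file))
    dump_text = dump_file.read_text(encoding="utf-8")
```

In a real repository, the two paths would point at the team's database file and a file under `data/sql`, so that teammates can rebuild the tables from the committed dump.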
- Angel Yang
- Alison Wu
- Ernani Fantinatti
- Fredy Rincón
- James Li
This diagram provides a detailed overview of an E-commerce Customer Behavior Dataset project workflow. Here’s a step-by-step explanation of each component:
- Team Members:
  - The project team consists of five members:
    - Angel Yang
    - Alison Wu
    - Ernani Fantinatti
    - Fredy Rincón
    - James Li
- Data Source:
  - The dataset, "E-commerce Customer Behavior - Sheet1.csv," is sourced from Kaggle, a well-known platform for data science competitions and datasets.
- Local Database:
  - The dataset is ingested into a local SQLite database. The main table created from the CSV has 11 columns and 350 rows.
  - The DB Browser tool is used to visualize and manage the SQLite database.
- Database Schema:
  - The database contains several tables, such as:
    - E_Comm_Customer_Behavior
  - Manually added tables to improve our dataset:
    - Generations
    - ecommerce_sales
    - income_by_city
    - kaggle_income
- Version Control and Collaboration:
  - GitHub is used for version control and team collaboration.
  - The team's interactions and code contributions are managed through individual branches for each team member (AYang, AlisonWu, EFantinatti, FredyRincon, JamesLi).
  - These individual branches are merged into the main project branch `team-project-1`.
- Final Delivery:
  - The `team-project-1` branch is eventually merged into the `main` branch for the final delivery of Project 1.
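The ingestion step described above (loading the Kaggle CSV into a local SQLite table) could be sketched roughly as follows. This is not the team's actual script: the helper `load_csv`, the all-`TEXT` column types, and the two sample rows are assumptions made for illustration. Only the table name and the 11 column names come from this README.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every data row.

    All columns are stored as TEXT for simplicity; returns the row count.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Two made-up rows standing in for the real Kaggle file.
sample_csv = io.StringIO(
    "Customer ID,Gender,Age,City,Membership Type,Total Spend,"
    "Items Purchased,Average Rating,Discount Applied,"
    "Days Since Last Purchase,Satisfaction Level\n"
    "1,Female,30,CityA,Gold,100.0,10,4.5,TRUE,20,Satisfied\n"
    "2,Male,40,CityB,Bronze,50.0,5,3.5,FALSE,40,Neutral\n"
)

conn = sqlite3.connect(":memory:")
row_count = load_csv(conn, "E_Comm_Customer_Behavior", sample_csv)
cursor = conn.execute('SELECT * FROM "E_Comm_Customer_Behavior"')
column_count = len(cursor.description)
```

With the real file, `sample_csv` would be replaced by `open(path, newline="", encoding="utf-8")` and the connection would point at a database file on disk, which can then be opened in DB Browser.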
Details of each component:
- Customer
  - Attributes:
    - CustomerID (Primary Key)
    - Name
    - Email
    - Phone
    - Address
  - Description:
    - Represents the customers using the system.
- Order
  - Attributes:
    - OrderID (Primary Key)
    - OrderDate
    - TotalAmount
    - CustomerID (Foreign Key)
  - Description:
    - Represents the orders placed by customers. Each order is linked to a specific customer through CustomerID.
- Product
  - Attributes:
    - ProductID (Primary Key)
    - Name
    - Description
    - Price
    - Stock
  - Description:
    - Represents the products available in the system.
- OrderDetail
  - Attributes:
    - OrderDetailID (Primary Key)
    - OrderID (Foreign Key)
    - ProductID (Foreign Key)
    - Quantity
    - Price
  - Description:
    - Represents the details of each product within an order. Links to both Order and Product entities.
- Customer to Order
  - Type: One-to-Many
  - Description: One customer can place multiple orders. This relationship is represented by CustomerID being a foreign key in the Order entity.
- Order to OrderDetail
  - Type: One-to-Many
  - Description: One order can have multiple order details. This relationship is represented by OrderID being a foreign key in the OrderDetail entity.
- Product to OrderDetail
  - Type: One-to-Many
  - Description: One product can appear in multiple order details. This relationship is represented by ProductID being a foreign key in the OrderDetail entity.
- Customer Entity:
  - Contains attributes related to customer information such as CustomerID, Name, Email, Phone, and Address.
  - Is related to the Order entity, indicating that customers can place orders.
- Order Entity:
  - Contains attributes like OrderID, OrderDate, TotalAmount, and a foreign key CustomerID.
  - Is related to the OrderDetail entity, showing that an order consists of multiple order details.
- Product Entity:
  - Contains attributes such as ProductID, Name, Description, Price, and Stock.
  - Is related to the OrderDetail entity, indicating that products can be part of multiple order details.
- OrderDetail Entity:
  - Contains attributes like OrderDetailID, OrderID, ProductID, Quantity, and Price.
  - Links the Order and Product entities, showing which products are included in which orders.
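The entities and one-to-many relationships described above map naturally onto SQLite tables with foreign keys. The following is a minimal sketch under stated assumptions: the column types are guesses (the source lists only attribute names), and the inserted rows are invented for the demonstration.

```python
import sqlite3

# Column types are assumptions; only the names come from the entity list.
SCHEMA = """
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name TEXT, Email TEXT, Phone TEXT, Address TEXT
);
CREATE TABLE "Order" (          -- quoted because ORDER is an SQL keyword
    OrderID INTEGER PRIMARY KEY,
    OrderDate TEXT,
    TotalAmount REAL,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (
    ProductID INTEGER PRIMARY KEY,
    Name TEXT, Description TEXT, Price REAL, Stock INTEGER
);
CREATE TABLE OrderDetail (
    OrderDetailID INTEGER PRIMARY KEY,
    OrderID INTEGER NOT NULL REFERENCES "Order"(OrderID),
    ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
    Quantity INTEGER,
    Price REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# One customer with two orders: the one-to-many side of Customer to Order.
conn.execute("INSERT INTO Customer (CustomerID, Name) VALUES (1, 'Example Customer')")
conn.execute('INSERT INTO "Order" (OrderID, CustomerID) VALUES (10, 1)')
conn.execute('INSERT INTO "Order" (OrderID, CustomerID) VALUES (11, 1)')

# An order pointing at a customer that does not exist should be rejected.
try:
    conn.execute('INSERT INTO "Order" (OrderID, CustomerID) VALUES (12, 999)')
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

orders_for_customer = conn.execute(
    'SELECT COUNT(*) FROM "Order" WHERE CustomerID = 1'
).fetchone()[0]
```

The foreign key on `"Order".CustomerID` is what makes the relationship one-to-many: many order rows may share a `CustomerID`, but each order references exactly one customer.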
- Be open and transparent in your communication to ensure everyone shares information.
- Acknowledged - Angel
- Acknowledged - Alison Wu
- Acknowledged - Ernani Fantinatti
- Acknowledged - Fredy Rincón
- Acknowledged - James Li
- Clearly define each team member's role to avoid confusion and ensure everyone is accountable.
- Acknowledged - Angel
- Acknowledged - Alison Wu
- Acknowledged - Ernani Fantinatti
- Acknowledged - Fredy Rincón
- Acknowledged - James Li
- Encourage all team members to participate and respect different perspectives.
- Acknowledged - Angel
- Acknowledged - Alison Wu
- Acknowledged - Ernani Fantinatti
- Acknowledged - Fredy Rincón
- Acknowledged - James Li
- Address disagreements promptly and positively manage them.
- Acknowledged - Angel
- Acknowledged - Alison Wu
- Acknowledged - Ernani Fantinatti
- Acknowledged - Fredy Rincón
- Acknowledged - James Li
- Prioritize essential issues, stay focused, and make good use of time during meetings and collaborations.
- Acknowledged - Angel
- Acknowledged - Alison Wu
- Acknowledged - Ernani Fantinatti
- Acknowledged - Fredy Rincón
- Acknowledged - James Li
- What is the primary focus within the dataset?
- This dataset gives a detailed look at customer behavior on an e-commerce platform. Each record represents a unique customer, showing their interactions and transactions. The information helps analyze customer preferences, engagement, and satisfaction. Businesses can use this data to make informed decisions to improve the customer experience.
- What are potential relationships in the data that you could explore?
  1. Analysis of cities and average income.
  2. Purchase habits by Average Rating.
  3. `Membership Type` against `Items Purchased`. (See Membership Level Purchase Analysis for details)
- What are key questions your project could answer?
  - How sensitive is each gender to customer satisfaction in relation to discounts being applied while purchasing?
  - Are age and gender statistically significant predictors of a high-value customer?
  - Do customers with a Gold membership buy more items than those with Silver or Bronze memberships? (See Membership Level Purchase Analysis for details)
  - Are customers who receive discounts more satisfied than those who do not?
- What are the key variables and attributes in your dataset?
  - Alison Wu: Age, Gender and Transaction details
  - Angel Yang: Gender, Total Spend
  - Ernani Fantinatti: Age, Membership Type, Items Purchased, Average Rating, Generation, Satisfaction Level.
  - Fredy Rincón: Membership Type, Items Purchased and Total Spend
  - James Li: Gender, Discount Applied and Satisfaction Level
- How can we explore the relationships between different variables?
  - Alison Wu: Relationships can be explored through correlation analysis or regression modeling.
  - Angel Yang: We can use Python to create correlation plots, or run code using the pandas package.
  - Ernani Fantinatti: Yes, especially between Age, Membership Type and Satisfaction Level.
  - Fredy Rincón: We can use visualizations like boxplots and histograms to compare distributions, and statistical tests like ANOVA to identify significant differences. Regression models help quantify relationships between variables.
  - James Li: Using the chi-square test.
- Are there any patterns or trends in the data that we can identify?
  - Alison Wu: Patterns can include purchasing trends across different demographics.
  - Angel Yang: Overall, male customers spent more than female customers.
  - Ernani Fantinatti: Yes, the prices paid tend to grow with age.
  - Fredy Rincón: Customers with Gold and Silver memberships tend to purchase slightly more items than Bronze members. Also, higher total spend is strongly associated with purchasing more items.
  - James Li: Yes, for males, discounts seem to cause dissatisfaction, while for females, the response to discounts is mixed and might depend on other factors not captured in this dataset.
- Who is the intended audience for our data analysis?
  - Alison Wu: The intended audience can include e-commerce businesses, marketing teams, and customer experience managers looking to optimize their strategies and improve customer satisfaction.
  - Angel Yang: The marketing team of the e-commerce company.
  - Ernani Fantinatti: Companies interested in understanding the consumer market in general by age group, generation and city.
  - Fredy Rincón: The intended audience includes marketing teams, business analysts, and decision-makers interested in understanding customer purchasing behavior and improving membership benefits.
  - James Li: Marketing department
- What is the question our analysis is trying to answer?
  - Alison Wu: How do different factors such as demographics, membership levels, and discounts influence customer behavior and satisfaction?
  - Angel Yang: Which demographic variables can be used as predictors of a high-value customer?
  - Ernani Fantinatti: Which age group intends to spend more?
  - Fredy Rincón: Do customers with a Gold membership buy more items than those with Silver or Bronze memberships?
  - James Li: The difference between genders in sensitivity to discounts being applied or not.
- Are there any specific libraries or frameworks that are well-suited to our project requirements?
  - Alison Wu: Pandas, NumPy and Matplotlib
  - Angel Yang: sklearn, pandas, matplotlib
  - Ernani Fantinatti: Pandas, matplotlib. SM Model Spec, one-hot-encoding, SQLite3.
  - Fredy Rincón: Libraries like Pandas and NumPy for data manipulation, Matplotlib and Seaborn for visualization, and scikit-learn for statistical modeling and regression analysis are well-suited for our project.
  - James Li: chi-square
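Several answers above mention the chi-square test for relating two categorical variables, such as Discount Applied and Satisfaction Level. As a rough illustration of what that test computes, the sketch below derives the Pearson chi-square statistic by hand in plain Python. The function name and the counts are made up and are not results from the dataset. In practice, a library routine such as `scipy.stats.chi2_contingency` would usually be preferred because it also returns a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a contingency table given as a list of rows of counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts, NOT taken from the dataset.
# Rows: discount applied (yes, no); columns: three satisfaction levels.
counts = [
    [10, 20, 30],
    [30, 20, 10],
]
stat, dof = chi_square_statistic(counts)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests the two variables are not independent. Running the same calculation separately for each gender would be one way to approach the discount sensitivity question.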
Table | Description |
---|---|
E_Comm_Customer_Behavior | Main table from the designated dataset. |
income_by_city | Income by city, extracted from Kaggle. |
Generations | Team-created table mapping age bins to generations. |
Column | Description |
---|---|
Customer ID | A unique identifier for each customer. |
Gender | The gender of the customer. |
Age | The age of the customer. |
City | The city where the customer lives. |
Membership Type | The type of membership (Gold, Silver, Bronze). |
Total Spend | The total amount spent by the customer. |
Items Purchased | The number of items purchased by the customer. |
Average Rating | The average rating given by the customer. |
Discount Applied | Indicates whether a discount was applied. |
Days Since Last Purchase | The number of days since the last purchase. |
Satisfaction Level | The satisfaction level of the customer. |
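Using the column names from the table above, a question such as whether Gold members buy more items than Silver or Bronze members can be approached with a simple aggregate query. The sketch below assumes a SQLite table named `E_Comm_Customer_Behavior` that keeps the original column names (including spaces). The five rows are invented, so the resulting averages illustrate the query only and say nothing about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE E_Comm_Customer_Behavior ('
    '"Customer ID" INTEGER, "Membership Type" TEXT, "Items Purchased" INTEGER)'
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO E_Comm_Customer_Behavior VALUES (?, ?, ?)",
    [(1, "Gold", 12), (2, "Gold", 16), (3, "Silver", 10),
     (4, "Bronze", 6), (5, "Bronze", 8)],
)

# Column names with spaces must be double-quoted in SQL.
query = """
SELECT "Membership Type", AVG("Items Purchased") AS avg_items
FROM E_Comm_Customer_Behavior
GROUP BY "Membership Type"
ORDER BY avg_items DESC
"""
avg_items = dict(conn.execute(query).fetchall())
```

The same query can be run against the team's database file by changing the `sqlite3.connect` argument and dropping the table-creation and insert steps.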