ODHack: Analyze user behavior across different lending protocols #100

Open
lukaspetrasek opened this issue May 22, 2024 · 18 comments · May be fixed by #255

@lukaspetrasek
Collaborator

lukaspetrasek commented May 22, 2024

"Analyze user behavior across different lending protocols.

Steps:

  1. Load the data on loans for all lending protocols from Google Storage. For example, https://storage.googleapis.com/derisk-persistent-state/zklend_data/loans.parquet is the file with loans for zkLend. It contains information about the user, the protocol, and the user's collateral and debt (tokens and amounts). Write the loading part so that the source can easily be changed from Google Storage to a local database.
  2. Visualize the behavior of a single user across the lending protocols in a Jupyter notebook. Use the following tokens: "ETH", "wBTC", "USDC", "DAI", "USDT", "wstETH", "LORDS", "STRK", "UNO" and "ZEND". You should be able to use the visualizations to answer the following questions:
  • How many users provide liquidity on just 1 protocol? How many users use 2 or more protocols?
  • How many users borrow on just 1 protocol? How many users use 2 or more protocols?
  • Create a Venn diagram to visualize the above.
  • Visualize the above taking into account the staked/borrowed capital, i.e. if a user has e.g. 10k USD (parameter) worth of capital deposited in the pools, how is it distributed across lending protocols?
  • Visualize the above on a per-token basis.
  • Feel free to come up with your own questions, metrics, hypotheses and visualizations."

Definition of Done
The code functions well and is documented, the analysis provides meaningful outputs and answers the questions from the setup.
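
One possible way to meet the "easily changed source" requirement from step 1 is a small registry of loader functions. This is only a sketch under assumptions: the names `load_loans`, `SOURCES` and `load_from_gcs` are hypothetical, and the URL pattern generalizes the single zkLend example above to other protocols, which may not match the real bucket layout.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one dict per loan row.
Records = List[dict]
Loader = Callable[[str], Records]

# Assumption: other protocols follow the same path pattern as the zkLend example.
GCS_URL = "https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"


def load_from_gcs(protocol: str) -> Records:
    """Read the public parquet file for one protocol (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so other sources work without pandas

    return pd.read_parquet(GCS_URL.format(protocol=protocol)).to_dict("records")


# Registry of named sources; supporting a local database means adding one entry.
SOURCES: Dict[str, Loader] = {"gcs": load_from_gcs}


def load_loans(protocol: str, source: str = "gcs") -> Records:
    """Dispatch to the loader registered under `source`."""
    try:
        loader = SOURCES[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None
    return loader(protocol)
```

With this shape, a local database loader would be registered as `SOURCES["local_db"] = my_db_loader`, and the notebook code calling `load_loans` would not need to change.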

@lukaspetrasek added the enhancement and medium labels May 22, 2024
@vibenedict

Hi, can I jump on this issue?

@NueloSE
Contributor

NueloSE commented May 23, 2024

Hi @lukaspetrasek, can I work on this?

@lukaspetrasek
Collaborator Author

lukaspetrasek commented May 23, 2024

Hi, can you guys please tell me something about yourselves: what skills/experience do you have, and how do you plan to tackle this issue? This task is not simple, so I need more information before I assign anyone 🙏🏼

@NueloSE
Contributor

NueloSE commented May 23, 2024

> Hi, can you guys please tell me something about you, what skills/experience do you have and how do you plan to tackle this issue? This task is not simple, so I have to learn more information before I assign anyone 🙏🏼

I have worked on something similar before; the difference was that the dataset was stored in a CSV file, not on Google Storage.

This project basically involves data visualization for informed decision making.

For this project I will be using Python.

Steps to tackle the task

  1. Install required libraries such as pandas, matplotlib and seaborn
  2. Load the data from Google Storage using the Google Cloud SDK
  3. Analyze and visualize user behavior
  • using a Jupyter notebook
  • visualize user behavior across protocols
  • create the Venn diagram
  • visualize staked capital using the appropriate chart
  • perform token analysis
  4. Document the code

@lukaspetrasek
Collaborator Author

Okay, assigning you @NueloSE 👍🏼

@NueloSE Let me know if everything is clear. If you have any questions, please ask here. What is your TG handle, please? 🙏🏼

Consider joining our TG group.
See also our contributor guidelines.

@lukaspetrasek
Collaborator Author

Hi @NueloSE , I assume the PR is ready for review, right?

@NueloSE
Contributor

NueloSE commented May 30, 2024

@lukaspetrasek, I have implemented all requested changes. It is ready for review: a676ecc

@lukaspetrasek changed the title from "ODHack: Analyze user behavior across different lending protocols." to "ODHack: Analyze user behavior across different lending protocols" Aug 14, 2024
@lukaspetrasek
Collaborator Author

@NueloSE has started working on this, but the task is still not completely finished.

What's been done: #107

@tosoham
Contributor

tosoham commented Sep 27, 2024

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I am a Python developer who has worked in data science and ML. I am a newcomer here and I am interested in solving this issue.

How I will approach this issue?

I would start by loading the data from Google Storage. I have experience with Google Storage and Jupyter Notebook, so I'll load the data into a pandas DataFrame and analyze it as described. Visualizations can be done with matplotlib and seaborn, and with Dash for interactive dashboards. After carefully analyzing, manipulating and visualizing the data, I'll be able to answer the questions listed in the issue.

@gregemax

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

With a background in data analysis using Python, experience with Google Cloud, and proficiency in Jupyter notebooks, I have worked on projects that involve complex data visualization and user behavior analysis. My expertise with tools like Pandas, Matplotlib, and Seaborn allows me to efficiently analyze, manipulate, and visualize large datasets, making me well-suited for this project.

How I plan on tackling this issue

I would start by loading the data from Google Storage, ensuring the code is flexible enough to switch between cloud and local databases. I’ll perform an initial exploration of the data, using Pandas to handle the loan data and creating visualizations in Jupyter notebooks. For visualizations, I’ll use Venn diagrams to show user engagement across protocols and dive into token-specific behavior. Additional insights, such as staked/borrowed capital distribution across tokens and protocols, will be highlighted, ensuring the analysis is both thorough and meaningful.

@Luluameh

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I have experience in Python, data analysis, and blockchain protocols. I’ve worked with datasets in Jupyter notebooks, performing behavior analysis and creating visualizations. My background in DeFi and lending platforms makes me well-suited for this task.

How I plan on tackling this issue

I would first create a flexible data loader to handle both Google Storage and local databases. Then, I’d analyze user behavior by visualizing data across protocols and answering key questions with Venn diagrams and token-specific graphs. I’d ensure the code is well-documented and capable of answering additional hypotheses.

@ShantelPeters

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I am a blockchain developer with experience in Cairo, JavaScript, TypeScript, Solidity, CSS, HTML, etc. I am an active contributor here on OnlyDust. This is my first time contributing to this repo. Please assign me; I am ready to work.

How I plan on tackling this issue

I intend to approach the issue by carrying out the following:
1. Load Data: I will write a function to load loan data from Google Storage or a local database.
2. Visualize Behavior: Show how users interact with one or more lending protocols and tokens.
3. Venn Diagram: Create a Venn diagram to display users’ protocol participation.
4. Capital Distribution: Visualize capital distribution across protocols and tokens.
5. Document: I will ensure the code is clear, functional, and well-documented.

@vic-Gray

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Background and Leverage: I have experience in building modular and scalable systems that handle data efficiently. I have worked extensively with APIs, databases, and data visualization libraries, allowing me to approach this problem with a solid foundation in both back-end development and data analysis. My background in both front-end and back-end development will enable me to handle the data-loading part flexibly and create meaningful visualizations to answer key questions.

How I plan on tackling this issue

  1. Loading Data from Google Storage or a Local Database
    I will design the data loading functionality so that it can be easily switched between Google Cloud Storage and a local database, using a modular function that abstracts the data source.

Implementation Plan:

  • Use Pandas to load the data from Google Storage or the local database (e.g., PostgreSQL).
  • Create a function to switch between the data sources dynamically (Google Storage or local DB).
  • Use parquet for loading files from Google Cloud Storage (as provided in the example).

    import pandas as pd
    from sqlalchemy import create_engine

    def load_data(source="google", protocol="zklend"):
        """Load the loans of one protocol from Google Storage or a local database."""
        if source == "google":
            url = f"https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"
            return pd.read_parquet(url)
        elif source == "local_db":
            # Placeholder credentials; `protocol` must be a trusted value because
            # it is interpolated into the table name.
            engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
            query = f"SELECT * FROM {protocol}_loans"
            return pd.read_sql(query, engine)
        raise ValueError(f"Unknown source: {source}")

    # Load zkLend data
    loans_data = load_data(source="google", protocol="zklend")

This structure allows me to switch between loading from Google Storage and a local database with minimal code changes.
2. Data Aggregation for Users Across Lending Protocols
Once data is loaded, the next step involves:

  • Aggregating the data based on users, protocols, and their collateral and debt.
  • Ensuring that the token types (ETH, wBTC, USDC, etc.) are properly parsed and aggregated.

Implementation:

    # Aggregate loan data by user and protocol
    def aggregate_user_data(data):
        return data.groupby(["user", "protocol"]).agg({
            "collateral_amount": "sum",
            "debt_amount": "sum"
        }).reset_index()

    # Example: Aggregate zkLend data
    aggregated_data = aggregate_user_data(loans_data)
3. Visualize the Behavior of Users Across Lending Protocols
A. Number of Protocols Used by Users
To visualize the number of users who use 1 protocol, 2 protocols, or more:

  • Use the aggregated data to group users by the number of protocols they interact with.
  • This can be visualized with a Venn diagram or bar plot.

Implementation:

    import matplotlib.pyplot as plt
    from matplotlib_venn import venn2, venn3

    # Count users by number of protocols
    user_protocol_count = aggregated_data.groupby('user').protocol.nunique()

    # Visualize the number of protocols used by users
    protocol_count_distribution = user_protocol_count.value_counts()
    protocol_count_distribution.plot(kind="bar", title="Number of Protocols Used by Users")
    plt.show()
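
The issue asks for the one-protocol versus multi-protocol counts separately for liquidity providers and borrowers. A minimal, pandas-free sketch of that split follows; the row layout `(user, protocol, collateral_usd, debt_usd)`, the sample values and the protocol name `other` are assumptions for illustration, not the real schema.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (user, protocol, collateral_usd, debt_usd).
loans = [
    ("0xa", "zklend", 500.0, 0.0),
    ("0xa", "other", 200.0, 50.0),
    ("0xb", "zklend", 0.0, 0.0),
    ("0xb", "other", 100.0, 0.0),
    ("0xc", "zklend", 300.0, 120.0),
]


def protocols_per_user(rows, amount_index):
    """Map each user to the set of protocols where the chosen amount is positive."""
    used = defaultdict(set)
    for row in rows:
        if row[amount_index] > 0:
            used[row[0]].add(row[1])
    return used


def bucket_counts(used):
    """Count users in the buckets '1' and '2+' by number of protocols used."""
    return Counter("1" if len(p) == 1 else "2+" for p in used.values())


# Index 2 is collateral (liquidity provided), index 3 is debt (borrowing).
lender_buckets = bucket_counts(protocols_per_user(loans, amount_index=2))
borrower_buckets = bucket_counts(protocols_per_user(loans, amount_index=3))
```

Users with a zero amount on a protocol are not counted as using it, so a user can be a multi-protocol lender while borrowing on only one protocol.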
B. Venn Diagram for Users Borrowing/Providing Across Protocols
To create a Venn diagram:

  • Identify users who interact with different protocols (e.g., zkLend, another protocol).
  • Use the matplotlib_venn package to visualize overlaps.

Implementation:

    # For simplicity, assume we have user sets for zkLend and another protocol
    users_zklend = set(aggregated_data[aggregated_data['protocol'] == 'zklend'].user)
    users_other_protocol = set(aggregated_data[aggregated_data['protocol'] == 'other'].user)

    # Create a Venn diagram
    venn2([users_zklend, users_other_protocol], set_labels=("zkLend", "Other Protocol"))
    plt.title("Users Borrowing/Providing Across Protocols")
    plt.show()
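
With more than two protocols, it may help to compute the size of each Venn region before plotting. The sketch below counts users per exact combination of protocols; the function name `combination_counts`, the sample user sets and the protocol name `other` are hypothetical.

```python
from collections import Counter


def combination_counts(users_by_protocol):
    """Count users per exact set of protocols they appear in (one count per Venn region)."""
    membership = {}
    for protocol, users in users_by_protocol.items():
        for user in users:
            membership.setdefault(user, set()).add(protocol)
    return Counter(frozenset(protocols) for protocols in membership.values())


# Hypothetical user sets for two protocols.
regions = combination_counts({
    "zklend": {"0xa", "0xb", "0xc"},
    "other": {"0xa", "0xd"},
})
```

Each key is the exact set of protocols a user appears in, so the counts add up to the number of distinct users and work unchanged for three or more protocols.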
4. Capital Distribution Across Protocols
This analysis looks at the total capital (collateral and debt) each user has distributed across the protocols.
The distribution can be visualized in a bar plot or pie chart, restricted to users above a capital threshold.
Implementation:

    # Keep users whose total capital (collateral + debt, summed over all
    # protocols) is at least 10k; assumes the amounts are already in USD
    capital = aggregated_data['collateral_amount'] + aggregated_data['debt_amount']
    user_total = capital.groupby(aggregated_data['user']).transform('sum')
    high_capital_users = aggregated_data[user_total >= 10000]

    # Visualize capital distribution
    high_capital_users.groupby('protocol').agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).plot(kind='bar', stacked=True, title="Capital Distribution Across Protocols")
    plt.show()
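
The question in the issue is how a given user's capital is split across protocols, which totals per protocol do not show directly. A minimal sketch of per-user shares follows, assuming hypothetical rows of `(user, protocol, deposited_usd)` already converted to USD; `capital_shares` and the sample values are illustrative only.

```python
from collections import defaultdict


def capital_shares(rows, min_total_usd=10_000.0):
    """Return {user: {protocol: share}} for users whose total capital meets the threshold."""
    per_user = defaultdict(lambda: defaultdict(float))
    for user, protocol, usd in rows:
        per_user[user][protocol] += usd

    shares = {}
    for user, by_protocol in per_user.items():
        total = sum(by_protocol.values())
        # Skip users below the threshold and guard against division by zero.
        if total >= min_total_usd and total > 0:
            shares[user] = {p: v / total for p, v in by_protocol.items()}
    return shares


# Hypothetical deposits: 0xa holds 10k USD in total, 0xb only 100 USD.
shares = capital_shares([
    ("0xa", "zklend", 7500.0),
    ("0xa", "other", 2500.0),
    ("0xb", "zklend", 100.0),
])
```

The threshold is a parameter, matching the "10k USD (parameter)" wording in the issue, and each user's shares sum to 1.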
5. Token-Specific Analysis
To break down the data on a per-token basis (e.g., ETH, wBTC, USDC):

  • Group data by both token and protocol.
  • Visualize how the capital is distributed for each token across protocols.

Implementation:

    # Group by token and protocol; this uses the raw loans data because the
    # per-user aggregation above drops the token (assumes a 'token' column)
    token_data = loans_data.groupby(['token', 'protocol']).agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).reset_index()

    # Visualize capital distribution per token across protocols
    token_data.pivot(index='token', columns='protocol', values='collateral_amount').plot(kind='bar', stacked=True, title="Capital by Token Across Protocols")
    plt.show()

@bruhhgnik

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I am a Python developer who also works on many blockchain projects. In general, I am looking to diversify my portfolio.

How I plan on tackling this issue

Data Loading: Load the Parquet file into a Jupyter notebook using Pandas, ensuring flexibility for local or cloud data sources.

Data Preprocessing: Clean and filter key columns like user ID, protocol, collateral, debt, and tokens.

User Behavior Visualization: Calculate how many users interact with one or multiple protocols. Use bar charts and Venn diagrams to visualize liquidity and borrowing behavior.

Advanced Analysis: Analyze staked/borrowed capital distribution across protocols and visualize it by token type.

Additional Insights: Explore additional metrics like protocol popularity and document findings.

@vic-Gray

I am a Python developer with experience working on blockchain projects, aiming to broaden my portfolio.

Approach to the Issue

Data Loading: Use Pandas to load the Parquet file into Jupyter for local or cloud analysis.
Data Preprocessing: Clean and filter key fields like user ID, protocol, collateral, and tokens.
Visualization: Analyze user interaction with protocols using bar charts and Venn diagrams for liquidity and borrowing behavior.
Advanced Insights: Examine capital distribution across protocols and visualize token types.
This version keeps your plan intact while being more concise. Would you like to refine any part of it further?

@Ndifreke000

this looks awesome tbvh

@UzNaZ
Contributor

UzNaZ commented Oct 1, 2024

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I am a backend developer and I'd like to take this task.

@AndriiBogomolov
Contributor

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I'm a developer with experience in Starknet, and I have worked closely with blockchain and web3 technologies.
