diff --git a/Documentation/Data-Field-Definition.md b/Documentation/Data-Field-Definition.md deleted file mode 100644 index c7dab11..0000000 --- a/Documentation/Data-Field-Definition.md +++ /dev/null @@ -1,29 +0,0 @@ - - -# Data Field Definitions - -This document outlines the data fields obtained for each lead. The data can be sourced from the online _Lead Form_ or be retrieved from the internet using APIs. It is currently unfinished, and will be updated once we finalise what data points will be used for the AI model. - -The data types selected are on the assumption that we’re using the PostgreSQL database. - -## Data Field Table - -| Field Name | Data Type | Description | Validation Rules | Data Source | Sample Data (if available) | Name Convention | -| -------------------------------- | :-------: | ----------------------------------------------------------------------------------------------------- | -------------------------------- | :------------: | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------: | -| First Name | text | First name of business owner | | Lead Form | | first_name | -| Last Name | text | Last name of business owner | | Lead Form | | last_name | -| Email Address | text | Owner’s email address (doesn’t specify business or personal) | | Lead Form | | email_address | -| Telephone Number | varchar | Owner’s telephone number (doesn’t specify business or personal) | Length dependent on country code | Lead Form | | phone_number | -| Annual Income from Card Payments | enum | Enumerated income-ranges that indicate how much of the company’s income is comprised of card payments | | Lead Form | Categories:
Keine
0 – 35.000
35.000 - 60.000
60.000 - 100.000
100.000 - 200.000
200.000 - 400.000
400.000 - 600.000
600.000 - 1 Mio.
1 Mio. – 2 Mio.
2 Mio. – 5 Mio.
Mehr als 5 Mio. | annual_income | -| Products of Interest | enum | Enumerated categories indicating SumUp products the owner is interested in | | Lead Form | Categories:
Keine
Alle
Kartenterminals
Kassensystem
Geshäftskonto
Andere | products_of_interest | -| Email Domain | text | Domain of the email address provided by the lead form | | Pre-processing | | domain | - -## Links to Data Sources: - -Lead form: https://www.sumup.com/de-de/kontaktieren-vertriebsteam/ \ -Google Places API: https://developers.google.com/maps/documentation/places/web-service/overview \ -OpenAI API: https://platform.openai.com/docs/overview \ -Meta API: https://developers.facebook.com/docs/graph-api/overview diff --git a/Documentation/Data-Fields.md b/Documentation/Data-Fields.md new file mode 100644 index 0000000..1043218 --- /dev/null +++ b/Documentation/Data-Fields.md @@ -0,0 +1,23 @@ + + +# Data Field Definitions + +This document outlines the data fields obtained for each lead. The data can be +sourced from the online _Lead Form_ or be retrieved from the internet using +APIs. + +## Data Field Table + +The most recent Data Fields table can now be found in a +[separate CSV File](./data-fields.csv). + +## Links to Data Sources: + +Lead form: https://www.sumup.com/de-de/kontaktieren-vertriebsteam/ \ +Google Places API: https://developers.google.com/maps/documentation/places/web-service/overview \ +OpenAI API: https://platform.openai.com/docs/overview \ +Meta API: https://developers.facebook.com/docs/graph-api/overview diff --git a/Documentation/data-fields.csv b/Documentation/data-fields.csv new file mode 100644 index 0000000..d69c013 --- /dev/null +++ b/Documentation/data-fields.csv @@ -0,0 +1,58 @@ +Field Name,Type,Description,Data source,Dependencies,Example +Last Name,string,Last name of the lead,Lead data,-,Mustermann +First Name,string,First name of the lead,Lead data,-,Mustername +Company / Account,string,Company name of the lead,Lead data,-,Mustercompany +Phone,string,Phone number of the lead,Lead data,-,49 1234 56789 +Email,string,Email of the lead,Lead data,-,musteremail@example.com +domain,string,"The domain of the email is the part that follows the "@" symbol, indicating the organization or service 
hosting the email address.",processing,Email,example.com +email_valid,boolean,Checks if the email is valid.,email_validator package,Email,True/False +first_name_in_account,boolean,Checks if first name is written in "Account" input,processing,First Name,True/False +last_name_in_account,boolean,Checks if last name is written in "Account" input,processing,Last Name,True/False +number_formatted,string,Phone number (formatted),phonenumbers package,Phone,49123456789 +number_country,string,Country derived from phone number,phonenumbers package,Phone,Germany +number_area,string,Area derived from phone number,phonenumbers package,Phone,Erlangen +number_valid,boolean,Indicator whether a phone number is valid,phonenumbers package,Phone,True/False +number_possible,boolean,Indicator whether a phone number is possible,phonenumbers package,Phone,True/False +google_places_place_id,string,Place ID used by Google,Google Places API,Company / Account,- +google_places_business_status,string,Business Status,Google Places API,Company / Account,Operational +google_places_formatted_address,string,Formatted address,Google Places API,Company / Account,Musterstr.1 +google_places_name,string,Business Name,Google Places API,Company / Account,Mustername +google_places_user_ratings_total,integer,Total number of ratings,Google Places API,Company / Account,100 +google_places_rating,float,Average star rating,Google Places API,Company / Account,4.5 +google_places_price_level,float,Price level (1-3),Google Places API,Company / Account,- +google_places_candidate_count_mail,integer,Number of results from E-Mail based search,Google Places API,Company / Account,1 +google_places_candidate_count_phone,integer,Number of results from Phone based search,Google Places API,Company / Account,1 +google_places_place_id_matches_phone_search,boolean,Indicator whether phone based and E-Mail based search gave the same result,Google Places API,Company / Account,True/False +google_places_confidence,float,Indicator of 
confidence in the Google result,processing,,0.9 +google_places_detailed_website,string,Link to business website,Google Places API,Company / Account,www.musterwebsite.de +google_places_detailed_type,list,Type of business,Google Places API,Company / Account,"[""florist"", ""store""]" +reviews_sentiment_score,float,Sentiment score between -1 and 1 for the reviews,GPT,Google reviews,0.9 +regional_atlas_pop_density,float,Population density,Regional Atlas,google_places_formatted_address,2649.6 +regional_atlas_pop_development,float,Population development,Regional Atlas,google_places_formatted_address,-96.5 +regional_atlas_age_0,float,Age group,Regional Atlas,google_places_formatted_address,16.3 +regional_atlas_age_1,float,Age group,Regional Atlas,google_places_formatted_address,8.2 +regional_atlas_age_2,float,Age group,Regional Atlas,google_places_formatted_address,31.1 +regional_atlas_age_3,float,Age group,Regional Atlas,google_places_formatted_address,26.8 +regional_atlas_age_4,float,Age group,Regional Atlas,google_places_formatted_address,17.7 +regional_atlas_pop_avg_age,float,Average population age,Regional Atlas,google_places_formatted_address,42.1 +regional_atlas_per_service_sector,float,-,Regional Atlas,google_places_formatted_address,88.4 +regional_atlas_per_trade,float,-,Regional Atlas,google_places_formatted_address,28.9 +regional_atlas_employment_rate,float,Employment rate,Regional Atlas,google_places_formatted_address,59.9 +regional_atlas_unemployment_rate,float,Unemployment rate,Regional Atlas,google_places_formatted_address,6.4 +regional_atlas_per_long_term_unemployment,float,Long term unemployment,Regional Atlas,google_places_formatted_address,49.9 +regional_atlas_investments_p_employee,float,Investments per employee,Regional Atlas,google_places_formatted_address,6.8 +regional_atlas_gross_salary_p_employee,float,Gross salary per employee,Regional Atlas,google_places_formatted_address,63.9 +regional_atlas_disp_income_p_inhabitant,float,Income per 
inhabitant,Regional Atlas,google_places_formatted_address,23703 +regional_atlas_tot_income_p_taxpayer,float,Income per taxpayer,Regional Atlas,google_places_formatted_address,45.2 +regional_atlas_gdp_p_employee,float,GDP per employee,Regional Atlas,google_places_formatted_address,84983 +regional_atlas_gdp_development,float,GDP development,Regional Atlas,google_places_formatted_address,5.2 +regional_atlas_gdp_p_inhabitant,float,GDP per inhabitant,Regional Atlas,google_places_formatted_address,61845 +regional_atlas_gdp_p_workhours,float,GDP per workhours,Regional Atlas,google_places_formatted_address,60.7 +regional_atlas_pop_avg_age_zensus,float,Average population age (from zensus),Regional Atlas,google_places_formatted_address,41.3 +regional_atlas_regional_score,float,Regional score,Regional Atlas,google_places_formatted_address,3761.93 +review_avg_grammatical_score,float,Average grammatical score of reviews,processing,google_places_place_id,0.56 +review_polarization_type,string,Polarization type of review ratings,processing,google_places_place_id,High-Rating Dominance +review_polarization_score,float,Polarization score of review ratings ,processing,google_places_place_id,1 +review_highest_rating_ratio,float,Ratio of the highest review ratings,processing,google_places_place_id,1 +review_lowest_rating_ratio,float,Ratio of the lowest review ratings,processing,google_places_place_id,0 +review_rating_trend,float,Value indicating the trend of ratings,processing,google_places_place_id,0 diff --git a/Documentation/data_fields.csv.license b/Documentation/data-fields.csv.license similarity index 61% rename from Documentation/data_fields.csv.license rename to Documentation/data-fields.csv.license index 2b087bd..407cbca 100644 --- a/Documentation/data_fields.csv.license +++ b/Documentation/data-fields.csv.license @@ -1,2 +1,3 @@ # SPDX-License-Identifier: MIT # SPDX-FileCopyrightText: 2023 Lucca Baumgärtner +# SPDX-FileCopyrightText: 2024 Ahmed Sheta diff --git 
a/Documentation/data_fields.csv b/Documentation/data_fields.csv deleted file mode 100644 index fe475c6..0000000 --- a/Documentation/data_fields.csv +++ /dev/null @@ -1,57 +0,0 @@ -column_id,description,source -domain,Custom domain (if any),calculated -email_valid,Indicator if E-Mail is valid,calculated -first_name_in_account,Indicator if first name is part of the E-Mail Account,calculated -last_name_in_account,Indicator if last name is part of the E-Mail Account,calculated -email,Normalized version of the E-Mail,calculated -category,, -reviews_sentiment_score,Sentiment score between -1 and 1 for the reviews,GPT/calculated -sales_person_summary,A summary of the website to support a salesperson in their call,GPT -google_places_place_id,Place ID used by Google,Google Places API -google_places_business_status,Business Status,Google Places API -google_places_formatted_address,Formatted address,Google Places API -google_places_name,Business Name,Google Places API -google_places_user_ratings_total,Total number of ratings,Google Places API -google_places_rating,Average star rating,Google Places API -google_places_price_level,Price level (1-3),Google Places API -google_places_candidate_count_mail,Number of results from E-Mail based search,calculated -google_places_candidate_count_phone,Number of results from Phone based search,calculated -google_places_place_id_matches_phone_search,Indicator weather phone based and E-Mail based search gave the same result,calculated -google_places_confidence,Indicator of confidence in the Google result,calculated -google_places_detailed_website,Link to business website,Google Places (detailed) API -google_places_detailed_type,Type of business,Google Places (detailed) API -number_formatted,Phone number (formatted),calculated -number_country,Country derived from phone number,calculated -number_area,Area derived from phone number,calculated -number_valid,Indicator weather a phone number is valid,calculated -number_possible,Indicator weather a 
phone number is possible,calculated -regional_atlas_pop_density,Population density,Regional Atlas -regional_atlas_pop_development,Population development,Regional Atlas -regional_atlas_age_0,,Regional Atlas -regional_atlas_age_1,,Regional Atlas -regional_atlas_age_2,,Regional Atlas -regional_atlas_age_3,,Regional Atlas -regional_atlas_age_4,,Regional Atlas -regional_atlas_pop_avg_age,Average population age,Regional Atlas -regional_atlas_per_service_sector,,Regional Atlas -regional_atlas_per_trade,,Regional Atlas -regional_atlas_employment_rate,Employment rate,Regional Atlas -regional_atlas_unemployment_rate,Unemployment rate,Regional Atlas -regional_atlas_per_long_term_unemployment,Long term unemployment,Regional Atlas -regional_atlas_investments_p_employee,Investments per employee,Regional Atlas -regional_atlas_gross_salary_p_employee,Gross salary per employee,Regional Atlas -regional_atlas_disp_income_p_inhabitant,Income per inhabitant,Regional Atlas -regional_atlas_tot_income_p_taxpayer,Income per taxpayer,Regional Atlas -regional_atlas_gdp_p_employee,GDP per employee,Regional Atlas -regional_atlas_gdp_development,GDP development,Regional Atlas -regional_atlas_gdp_p_inhabitant,GDP per inhabitant,Regional Atlas -regional_atlas_gdp_p_workhours,GDP per workhours,Regional Atlas -regional_atlas_pop_avg_age_zensus,Average population age (from zensus),Regional Atlas -regional_atlas_regional_score,Regional score,calculated -address_ver_1,?,? -review_avg_grammatical_score,Average grammatical score of reviews,calculated -review_polarization_type,Polarization type of review ratings,calculated -review_polarization_score,Polarization score of review ratings ,calculated -review_highest_rating_ratio,Ratio of the highest review ratings,calculated -review_lowest_rating_ratio,Ratio of the lowest review ratings,calculated -review_rating_trend,Value indicating the trend of ratings,calculated
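
Reviewer note: the new `data-fields.csv` lists `domain`, `first_name_in_account` and `last_name_in_account` as fields derived by "processing" from the lead data. The sketch below is one possible reading of those descriptions, not the project's actual implementation. The function names `email_domain` and `name_in_account` are hypothetical, and it assumes that "Account" refers to the `Company / Account` column (the removed `data_fields.csv` described it as the e-mail account instead, so this is worth confirming).

```python
from typing import Optional


def email_domain(email: str) -> Optional[str]:
    """Return the part after the last "@", lowercased, or None if it is missing."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def name_in_account(name: str, account: str) -> bool:
    """Case-insensitive substring check; an empty name never matches.

    Assumption: "Account" is the free-text "Company / Account" input.
    """
    name = name.strip().lower()
    return bool(name) and name in account.lower()


# Example lead; the values are illustrative, in the style of the CSV examples.
lead = {
    "First Name": "Max",
    "Last Name": "Mustermann",
    "Company / Account": "Blumen Mustermann",
    "Email": "musteremail@example.com",
}

derived = {
    "domain": email_domain(lead["Email"]),
    "first_name_in_account": name_in_account(
        lead["First Name"], lead["Company / Account"]
    ),
    "last_name_in_account": name_in_account(
        lead["Last Name"], lead["Company / Account"]
    ),
}
print(derived)
```

A plain substring check is the simplest interpretation; if short names produce false positives inside longer words, a token-based comparison may be a better fit.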