Skip to content

JaimeStill/schema-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Schema Scraper

This repository contains a Schema Scraper CLI that allows you to connect to a SQL Server database and generate metadata documentation for all tables that contain data. This repository is configured to scrape the schema for the AdventureWorks 2022 OLTP database on a local SQL Server instance named .\DevSql. If you want to run the CLI against a different SQL database, just adjust the keys in connections.json.

The AdventureWorks directory was generated from the src directory by running:

dotnet run -- generate -k "AdventureWorks"

This generated the readme.md document, which contains all of the tables that have data along with the number of records in that table. Each table is linked to the specific metadata document for that table and tables are organized in sub-directories around their schema. For instance Production.Product is located at AdventureWorks/Production/Product.md.

Each table metadata document contains three sections:

  • Columns - Provides the column metadata for the table.
  • Relationships - Provides the relationship metadata for the table broken down into the following sections:
    • Dependencies - Provides the foreign key relationships defined on the table.
    • Dependents - Provides the foreign key relationships other tables define against the table.
  • Maps - Provides a unique collection of tables that can be mapped to table through a relationship dependency. If you made a database diagram consisting of every table in the database, these would be all of the tables that you could trace to this table through diagram connections. It is broken down into the following sections:
    • Dependency Map - Recursively analyzes the foreign key dependencies defined by the table to identify all tables that can be mapped through those relationships.
    • Dependent Map - Recursively analyzes the foreign key dependencies defined by tables that point to the table to identify all tables that can be mapped through those relationships.

Schema Scraper CLI

Back to Top

The src directory contains the schemascraper CLI tool. The sections that guide you through using the tool.

Run the CLI

Back to Top

To run the CLI tool, you will need access to a SQL database on a machine with the .NET 8 SDK installed.

Open a terminal pointed to the src directory.

Commands can be executed as follows:

dotnet run -- [command] [options]

# example
dotnet run -- generate -k "AdventureWorks"

Commands

Back to Top

Every command has the following options:

Option Type Description
--key
-k
string? Database connection configuration key in the JSON connections file, specified with --connections.
--server
-s
string? SQL server name. If used, --database is required.
--database
--db
-d
string? SQL database name. If used, --server is required.
--connections
-c
FileInfo SQL connection configuration JSON file. Default value is ./connections.json.

Columns

Back to Top

Scrape and output table columns.

Option Type Description
--table
-t
string SQL database table.

Example

dotnet run -- columns -t "Product" -k "AdventureWorks"

Output

Column, Type, IsNullable
ProductID, int, False
Name, nvarchar, False
ProductNumber, nvarchar, False
MakeFlag, bit, False
FinishedGoodsFlag, bit, False
Color, nvarchar, True
SafetyStockLevel, smallint, False
ReorderPoint, smallint, False
StandardCost, money, False
ListPrice, money, False
Size, nvarchar, True
SizeUnitMeasureCode, nchar, True
WeightUnitMeasureCode, nchar, True
Weight, decimal, True
DaysToManufacture, int, False
ProductLine, nchar, True
Class, nchar, True
Style, nchar, True
ProductSubcategoryID, int, True
ProductModelID, int, True
SellStartDate, datetime, False
SellEndDate, datetime, True
DiscontinuedDate, datetime, True
rowguid, uniqueidentifier, False
ModifiedDate, datetime, False

Generate

Back to Top

Scrape and generate schema metadata documentation in markdown format.

Option Type Description
--root
-r
string? Root directory to generate schema metadata documentation. If not provided, defaults to the value provided through the --key option. If the key value is null, will use the value provided through the --database option.

Example

dotnet run -- generate -k "AdventureWorks"

The AdventureWorks directory is an example of the output this command generates.

Maps

Back to Top

Scrape and output recursive relationship maps for a table.

Option Type Description
--table
-t
string SQL database table.

Example

dotnet run -- maps -t "Product" -k "AdventureWorks"

Output

Table, IsDependency
Production.ProductModel, True
Production.ProductSubcategory, True
Production.UnitMeasure, True
Production.BillOfMaterials, False
Production.ProductCostHistory, False
Production.ProductDocument, False
Production.ProductInventory, False
Production.ProductListPriceHistory, False
Production.ProductProductPhoto, False
Production.ProductReview, False
Production.TransactionHistory, False
Production.WorkOrder, False
Purchasing.ProductVendor, False
Purchasing.PurchaseOrderDetail, False
Sales.ShoppingCartItem, False
Sales.SpecialOfferProduct, False

Relationships

Back to Top

Scrape and output table relationships.

Option Type Description
--table
-t
string SQL database table.

Example

dotnet run -- relationships -t "Product" -k "AdventureWorks"

Output

Table, PrimaryKey, ForeignKey, ForeignKeyName, IsDependency
Production.ProductModel, ProductModelID, ProductModelID, FK_Product_ProductModel_ProductModelID, True
Production.ProductSubcategory, ProductSubcategoryID, ProductSubcategoryID, FK_Product_ProductSubcategory_ProductSubcategoryID, True
Production.UnitMeasure, UnitMeasureCode, SizeUnitMeasureCode, FK_Product_UnitMeasure_SizeUnitMeasureCode, True
Production.UnitMeasure, UnitMeasureCode, WeightUnitMeasureCode, FK_Product_UnitMeasure_WeightUnitMeasureCode, True
Production.BillOfMaterials, ProductID, ProductAssemblyID, FK_BillOfMaterials_Product_ProductAssemblyID, False
Production.BillOfMaterials, ProductID, ComponentID, FK_BillOfMaterials_Product_ComponentID, False
Production.ProductCostHistory, ProductID, ProductID, FK_ProductCostHistory_Product_ProductID, False
Production.ProductDocument, ProductID, ProductID, FK_ProductDocument_Product_ProductID, False
Production.ProductInventory, ProductID, ProductID, FK_ProductInventory_Product_ProductID, False
Production.ProductListPriceHistory, ProductID, ProductID, FK_ProductListPriceHistory_Product_ProductID, False
Production.ProductProductPhoto, ProductID, ProductID, FK_ProductProductPhoto_Product_ProductID, False
Production.ProductReview, ProductID, ProductID, FK_ProductReview_Product_ProductID, False
Production.TransactionHistory, ProductID, ProductID, FK_TransactionHistory_Product_ProductID, False
Production.WorkOrder, ProductID, ProductID, FK_WorkOrder_Product_ProductID, False
Purchasing.ProductVendor, ProductID, ProductID, FK_ProductVendor_Product_ProductID, False
Purchasing.PurchaseOrderDetail, ProductID, ProductID, FK_PurchaseOrderDetail_Product_ProductID, False
Sales.ShoppingCartItem, ProductID, ProductID, FK_ShoppingCartItem_Product_ProductID, False
Sales.SpecialOfferProduct, ProductID, ProductID, FK_SpecialOfferProduct_Product_ProductID, False

Tables

Back to Top

Scrape and output database tables and their record count.

Example

dotnet run -- tables -k "AdventureWorks"

Output

Table, RecordCount
dbo.AWBuildVersion, 1
dbo.DatabaseLog, 1596
HumanResources.Department, 16
HumanResources.Employee, 290
HumanResources.EmployeeDepartmentHistory, 296
HumanResources.EmployeePayHistory, 316
HumanResources.JobCandidate, 13
HumanResources.Shift, 3
Person.Address, 19614
Person.AddressType, 6
Person.BusinessEntity, 20777
Person.BusinessEntityAddress, 19614
Person.BusinessEntityContact, 909
Person.ContactType, 20
Person.CountryRegion, 238
Person.EmailAddress, 19972
Person.Password, 19972
Person.Person, 19972
Person.PersonPhone, 19972
Person.PhoneNumberType, 3
Person.StateProvince, 181
Production.BillOfMaterials, 2679
Production.Culture, 8
Production.Document, 13
Production.Illustration, 5
Production.Location, 14
Production.Product, 504
Production.ProductCategory, 4
Production.ProductCostHistory, 395
Production.ProductDescription, 762
Production.ProductDocument, 32
Production.ProductInventory, 1069
Production.ProductListPriceHistory, 395
Production.ProductModel, 128
Production.ProductModelIllustration, 7
Production.ProductModelProductDescriptionCulture, 762
Production.ProductPhoto, 101
Production.ProductProductPhoto, 504
Production.ProductReview, 4
Production.ProductSubcategory, 37
Production.ScrapReason, 16
Production.TransactionHistory, 113443
Production.TransactionHistoryArchive, 89253
Production.UnitMeasure, 38
Production.WorkOrder, 72591
Production.WorkOrderRouting, 67131
Purchasing.ProductVendor, 460
Purchasing.PurchaseOrderDetail, 8845
Purchasing.PurchaseOrderHeader, 4012
Purchasing.ShipMethod, 5
Purchasing.Vendor, 104
Sales.CountryRegionCurrency, 109
Sales.CreditCard, 19118
Sales.Currency, 105
Sales.CurrencyRate, 13532
Sales.Customer, 19820
Sales.PersonCreditCard, 19118
Sales.SalesOrderDetail, 121317
Sales.SalesOrderHeader, 31465
Sales.SalesOrderHeaderSalesReason, 27647
Sales.SalesPerson, 17
Sales.SalesPersonQuotaHistory, 163
Sales.SalesReason, 10
Sales.SalesTaxRate, 29
Sales.SalesTerritory, 10
Sales.SalesTerritoryHistory, 17
Sales.ShoppingCartItem, 3
Sales.SpecialOffer, 16
Sales.SpecialOfferProduct, 538
Sales.Store, 701

Configuration

Back to Top

ConnectorConfig schema

Property Type Default Value Required
DataSource string Yes
InitialCatalog string Yes
UserID string? null No
Password string? null No
IntegratedSecurity bool true No
TrustServerCertificate bool true No
ConnectRetryCount int 3 No
ConnectRetryInterval int 10 No

These properties map to the SqlConnectionStringBuilder properties of the same name.

If no UserID and Password are specified, it is assumed that the current logged in user has access to connect to the configured SQL database.

You can provide configuration to the CLI in two ways:

  • Specifying a JSON configuration file:

    {
        "AdventureWorks": {
            "DataSource": "SqlDb01",
            "InitialCatalog": "AdventureWorks2022",
            "UserID": "viewer",
            "Password": "Password1234!@#$"
        }
    }

    The CLI is configured to look for a file named connections.json in the location you are executing the command from. If you have a different name / location for your JSON file, specify it with the --connections option.

  • Configuring environment variables:

    Environment variables can be configured in PowerShell as follows:

    $env:Key__Property = Value
    
    # example
    $env:AdventureWorks__DataSource = SqlDb01
    $env:AdventureWorks__InitialCatalog = AdventureWorks2022
    $env:AdventureWorks__UserID = viewer
    $env:AdventureWorks__Password = 'Password1234!@#$'

Architecture

Back to Top

The project depends on three libraries:

  • System.CommandLine - A library that provides functionality that is commonly needed by command-line apps, such as parsing the command-line input and displaying help text.
  • Microsoft.Data.SqlClient - A data provider for Microsoft SQL Server and Azure SQL Database.
  • Dapper - A simple object mapper for .NET.

The project is structured as follows:

Notes

Back to Top

Currently the Schema Scraper CLI tool only supports connections to SQL Server. If a need is ever encountered, the Connector class could be abstracted around multiple providers and their interface libraries, e.g. Npgsql for PostgreSQL. Would need to add additional metadata to ConnectorConfig to indicate the intended provider.

About

CLI for generating SQL database metadata documentation for all tables that contain data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published