sayantikabanik/DataJourney



DJ rocks

🚌 DataJourney

🪶 Short version

Design-first, open-source data management toolkit. Simplifies data workflows with modular, reproducible solutions.

🌲 Long version

DataJourney demonstrates how organizations can effectively manage and utilize data by harnessing the power of open-source technologies. It's designed to help navigate the complex landscape of data tools, offering a structured approach to building scalable and reproducible data workflows.

Built on open-source principles, the framework guides users through essential steps, from identifying goals and selecting tools to testing and customising workflows. With its flexible, modular design, DataJourney can be tailored to individual needs, making it an invaluable toolkit for data professionals.

🧱 Design Philosophy (LEGO)

Built with additive and subtractive capabilities, glued together with open source. Each layer has a certain strength of communication built in.

  • P0 (Base): Static home(s) to keep it together (GitHub)
  • P1 (Tooling): Tooling, strings (Powered by open source)
  • P2 (Maintenance + Monitoring): Env, automations (Pixi + GHA)
  • P3 (Abstraction): Layer(s), CLI/task manager for users to interact with (Pixi)

DJ Design

🛠 Current workflows covered

{✨ = Experimental, ✅ = Implemented}

Status Workflow description
✅ Python packaging framework design principles
✅ GitHub Actions configured
✅ Vale.sh configured at PR level
✅ Pre-commit hooks configured for code linting/formatting
✨ Hello world LLM design example based on LangChain
✅ Environment management via pixi
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Building a dashboard using holoviews + panel
✅ Exploratory data analysis (EDA) using mito
✅ Web UI built on Flask
✅ Web UI re-done and expanded with FastHTML
✅ Leveraging AI models to analyse data (GitHub AI models, Beta)

☕️ Quickly getting started with DataJourney

  • Clone DJ: git clone git@github.com:sayantikabanik/DataJourney.git
  • Generate & add GITHUB_TOKEN, instructions here
    • Additional requirement for running the LLM workflows
  • Switch directory: cd DataJourney
  • Download pixi: prefix.dev
  • Activate env: pixi shell
  • Install DJ framework locally: pixi run DJ_package
  • List all the tasks: pixi task list
  • Execute a task from the list: pixi run <TASK>
  • Execute a task with verbosity enabled: pixi run -v <TASK>
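The GITHUB_TOKEN step above is easy to miss, so a pre-flight check can help before running the LLM workflows. The following is a minimal sketch, assuming the token is exposed as an environment variable named GITHUB_TOKEN. The function name `github_token_status` is hypothetical, and the repository's own GIT_TOKEN_CHECK task may work differently.

```python
import os


def github_token_status(env=None):
    """Return a short, human-readable status for the GITHUB_TOKEN variable."""
    # Assumption: the token is supplied through an environment variable
    # named GITHUB_TOKEN; the project's GIT_TOKEN_CHECK task may differ.
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        return "missing: set GITHUB_TOKEN before running the LLM workflows"
    # Never print the token itself; report only that it is present.
    return "ok: GITHUB_TOKEN is set"


if __name__ == "__main__":
    print(github_token_status())
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.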

🏃🏽‍♀️ Active tasks under DJ

  • GIT_TOKEN_CHECK
  • DJ_package
  • DJ_pre_commit
  • DJ_dagster
  • DJ_fasthtml_app
  • DJ_flask_app
  • DJ_mito_app
  • DJ_panel_app
  • DJ_llm_analysis
  • DJ_hello_world_langchain
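For scripting, the task names listed above can be combined with the `pixi run <TASK>` and `pixi run -v <TASK>` forms from the quick start. This is a small sketch rather than part of the DJ framework: the helper names `build_pixi_command` and `run_task` are hypothetical, and the task tuple is simply copied from the list above.

```python
import shutil
import subprocess

# Task names copied from the list above; update if the project's tasks change.
DJ_TASKS = (
    "GIT_TOKEN_CHECK",
    "DJ_package",
    "DJ_pre_commit",
    "DJ_dagster",
    "DJ_fasthtml_app",
    "DJ_flask_app",
    "DJ_mito_app",
    "DJ_panel_app",
    "DJ_llm_analysis",
    "DJ_hello_world_langchain",
)


def build_pixi_command(task, verbose=False):
    """Build the argument list for `pixi run [-v] <TASK>` without executing it."""
    if task not in DJ_TASKS:
        raise ValueError(f"unknown DJ task: {task!r}")
    command = ["pixi", "run"]
    if verbose:
        command.append("-v")  # same flag as `pixi run -v <TASK>`
    command.append(task)
    return command


def run_task(task, verbose=False):
    """Run a DJ task through pixi and return its exit code."""
    if shutil.which("pixi") is None:
        raise RuntimeError("pixi not found on PATH; install it first")
    return subprocess.run(build_pixi_command(task, verbose), check=False).returncode
```

For example, `build_pixi_command("DJ_dagster", verbose=True)` produces the same arguments as typing `pixi run -v DJ_dagster` in the shell.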

🔌 About pre-commit hooks and activating them

As the name suggests, pre-commit hooks run before each commit. Here they format the code according to PEP standards.

pixi run DJ_pre_commit

🦭 Executing LLM script: Generate stock price recommendations

pixi run DJ_llm_analysis

🪼 Execute pre-configured Dagster pipeline

pixi run DJ_dagster

Dagit UI output

🐙 Panel app

pixi run DJ_panel_app

NOTE: The generated dashboard is exported to HTML and saved as stock_price_twilio_dashboard

Panel app output

Mito

To explore further, visit trymito.io

pixi run DJ_mito_app
mito_output

🦋 Display all data sources present via web UI

# Run FastHTML app
pixi run DJ_fasthtml_app

data_sources_fasthtml.png