- A machine learning tool that uses large language models to analyze chat conversations and detect potentially concerning patterns related to child safety.
- The system uses LLaMA models to process conversations and identify markers like age discussions, meetup requests, gift exchanges, and media sharing.
- The Interface Layer contains RescueBox for web display and a Command Line Interface for batch processing.
- RescueBox provides custom UI components for analysis display and handles file upload processing.
- The Command Line Interface focuses on batch processing capabilities and output file generation.
- The Server Layer manages request handling, CSV parsing, and results formatting.
- Ollama Integration takes care of LLaMA model setup, API communication, and response handling.
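As a minimal sketch of the API-communication step, the snippet below posts a prompt to Ollama's default local endpoint and reads back the generated text. This assumes Ollama is serving on its default port 11434; the `ask_llama` helper name is hypothetical, not taken from the project.

```python
import requests

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llama(prompt: str, model: str = "llama3.1") -> str:
    """Send one prompt to a locally running Ollama server and return its text reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]
```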
- The process uses a two-stage system for initial detection and evidence extraction.
- Stage 1: YES/NO detection
- Basic classification against the following question list (a prompt sketch follows the list):
- Has any person given their age? (and what age was given)
- Has any person asked the other for their age?
- Has any person asked to meet up in person? Where?
- Has any person given a gift to the other? Or bought something from a list like an Amazon wish list?
- Have any videos or photos been produced? Requested?
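To make Stage 1 concrete, here is a hedged sketch (not the project's actual prompt): it sends all five questions in a single prompt and parses line-by-line YES/NO answers, reusing the hypothetical `ask_llama` helper from the sketch above.

```python
# Hypothetical Stage 1 sketch: ask all five questions in one prompt and
# parse the model's line-by-line answers. Parsing here is deliberately simple.
QUESTIONS = [
    "Has any person given their age? (and what age was given)",
    "Has any person asked the other for their age?",
    "Has any person asked to meet up in person? Where?",
    "Has any person given a gift to the other? Or bought something from a list like an Amazon wish list?",
    "Have any videos or photos been produced? Requested?",
]

def analyze_stage1(conversation: str) -> dict[str, bool]:
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(QUESTIONS, 1))
    prompt = (
        "Answer each question about the conversation below with YES or NO only, "
        f"one answer per line.\n\nQuestions:\n{numbered}\n\nConversation:\n{conversation}"
    )
    answers = [line.strip().upper() for line in ask_llama(prompt).splitlines() if line.strip()]
    # Tolerates numbered replies like "1. YES" because we only test for YES.
    return {q: "YES" in a for q, a in zip(QUESTIONS, answers)}
```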
- Stage 2: Evidence extraction
- Evidence processing includes pattern matching, context extraction, and multi-evidence handling.
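A hedged sketch of how such evidence extraction could look (`extract_evidence` is illustrative, not the project's implementation): ask the model to quote the supporting lines, pattern-match the quotes out of the reply, and keep only quotes that literally occur in the conversation. Returning a list naturally handles multiple pieces of evidence.

```python
import re

def extract_evidence(conversation: str, question: str) -> list[str]:
    """Hypothetical Stage 2 sketch: ask the model to quote every supporting
    line, then keep only quotes that literally appear in the conversation."""
    prompt = (
        f'The answer to "{question}" is YES for the conversation below. '
        "Quote every line that supports this answer, each in double quotes.\n\n"
        f"Conversation:\n{conversation}"
    )
    reply = ask_llama(prompt)
    quotes = re.findall(r'"([^"]+)"', reply)          # pattern matching on the reply
    return [q for q in quotes if q in conversation]   # drop quotes the model invented
```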
- Output Generation supports both markdown reports for the frontend and CSV output for CLI use.
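As an illustration of both output paths, the sketch below writes CSV rows for the CLI and renders a markdown report for the frontend. The column names and report layout are assumptions, not the project's actual schema.

```python
import csv

def write_csv(results: list[dict], path: str) -> None:
    """Write one row per analyzed conversation (assumes results is non-empty)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)

def to_markdown(result: dict) -> str:
    """Render one analysis as a markdown report for the frontend."""
    lines = ["# Conversation Analysis", ""]
    for question, finding in result.items():
        lines.append(f"- **{question}**: {finding}")
    return "\n".join(lines)
```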
- Project requires Python version 3.11 or higher.
- Download and install Ollama from GitHub
- Pull the LLaMA 3.1 model:
ollama pull llama3.1
- Install the requirements:
python3 -m pip install -r requirements.txt
- Start the backend server:
python3 -m src.backend.server
- Run the API client:
python3 src/client/client.py
- Run the command line client for batch processing:
python3 -m src.client.cmd_client --input_file ./src/data_processing/cornell_movie_dialogs/split_conversations/conversations_part_000.json --output_file ./analysis_results.csv --model=llama3.1
- CLI Parameters
  - --input_file: Path to the input JSON file containing conversations
  - --output_file: Path where the analysis results will be saved (CSV format)
  - --model: Name of the LLM model to use (default: llama3.1)
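For reference, an argparse setup matching the documented flags might look like the sketch below; whether --input_file and --output_file are required in the real cmd_client is an assumption.

```python
import argparse

# Sketch of a parser mirroring the documented flags; the real cmd_client
# may define them differently (e.g., which flags are required).
parser = argparse.ArgumentParser(description="Batch conversation analysis")
parser.add_argument("--input_file", required=True,
                    help="Path to the input JSON file containing conversations")
parser.add_argument("--output_file", required=True,
                    help="Path where the analysis results will be saved (CSV format)")
parser.add_argument("--model", default="llama3.1",
                    help="Name of the LLM model to use")
args = parser.parse_args()
```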
- Evaluation documentation: evaluation/evaluation_readme.md
- Command line interface: 5 samples take 2 minutes and 4 seconds on a MacBook Air (8 GB memory), about 25 seconds per sample.
- Frontend: 5 samples take 15 minutes, so each job takes around 3 minutes to complete on a MacBook Air (8 GB memory). Conversation sizes range from 8 to 20 lines.
message-analyzer/
├── src/
│ ├── backend/ # Flask server implementation
│ ├── client/ # API and CLI clients
│ └── data_processing/ # Data processing utilities
├── evaluation/
│ ├── api_doc.md # Flask-ML related doc
│ ├── evaluation_readme.md # Evaluation doc
│ └── evaluation_result.md # Evaluation result
├── requirements.txt # Python dependencies
└── README.md # This file
- Enhanced data validation with combined AI and human checking
- Long conversation handling optimization
- Improved model accuracy through few-shot prompting
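As a taste of that last item, few-shot prompting simply prepends worked examples so the model imitates the expected answer format. The sketch below is illustrative; the example conversations are invented, not project data.

```python
# Hypothetical few-shot prompt: worked examples teach the model the
# expected YES/NO answer format before it sees the real conversation.
FEW_SHOT_EXAMPLES = """Conversation:
A: how old are you?
B: im 13
Question: Has any person given their age?
Answer: YES (age 13)

Conversation:
A: nice weather today
B: sure is
Question: Has any person given their age?
Answer: NO"""

def few_shot_prompt(conversation: str, question: str) -> str:
    return f"{FEW_SHOT_EXAMPLES}\n\nConversation:\n{conversation}\nQuestion: {question}\nAnswer:"
```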