v5.0.0

@t83714 released this 22 Feb 10:36

We are excited to announce the release of Magda v5.0.0, a major update that adds new features for data discovery, querying, and exploration. This release includes three new capabilities:

  • In-Browser LLM-Powered Chatbot – An AI-driven chat interface for intuitive data exploration.
  • SQL Console – A built-in, browser-based SQL tool for powerful data querying.
  • Hybrid Search Engine – A smart combination of lexical and semantic search for improved dataset discovery.

💬 In-Browser LLM Chatbot: AI-Powered Data Exploration

Magda Chatbot Demo Video

Understanding and analyzing datasets has never been easier. Magda v5.0.0 introduces an LLM-powered chatbot that operates entirely within your web browser, making data exploration more intuitive than ever.

🧠 Key Features

  • Conversational Dataset Search – Ask the chatbot to find datasets using natural language queries.
  • Automated Data Analysis – Upload tabular data, and the chatbot will analyze, visualize, and summarize key insights.
  • SQL Query Generation – Generate SQL queries dynamically from chat prompts, which can be executed in Magda’s new SQL Console.

🔹 Why In-Browser AI?

Unlike traditional server-based AI solutions, Magda’s chatbot runs directly in the browser using WebGPU, reducing infrastructure costs and enhancing privacy by keeping inference local to the user’s device.

Learn more about Magda Chatbot in our Intro Document.

📊 SQL Console: Powerful Data Querying in Your Browser

[Screenshot: SQL Console]

With Magda v5.0.0, we’re introducing SQL Console, a built-in browser-based SQL tool that enables users to query datasets including Excel spreadsheets, CSV, TSV, and TAB-separated data files, regardless of whether the data is stored and managed by Magda or a third-party platform. The queries are executed on the client side using browser resources, making it a scalable and efficient solution.

⚡ Key Capabilities

  • Query CSV, Excel, and other tabular data — Directly from Magda or third-party sources.
  • Fast, client-side execution — Runs entirely in-browser for scalability.
  • One-click result export — Download query results as CSV files.

🖥️ How to Access SQL Console

You can launch the SQL Console using a simple keyboard shortcut:

  • Mac: Command + Shift + S
  • Windows: Ctrl + Shift + S

Additionally, other Magda features (such as the LLM-powered chatbot) may open the SQL Console automatically when needed.

🔎 Querying Data with the source() Function

Magda extends SQL with the source() function, allowing you to query datasets without needing to worry about underlying storage details or file formats.

Basic Usage:

SELECT * FROM source(0)

The source() function accepts one of the following as its parameter:

  • Magda Data Distribution ID (String)
SELECT * FROM source("dist-dga-xxxxx-xxx-xxx-xxx")

The function resolves the distribution ID to its access URL and selects the appropriate query engine based on the file format.

  • Index Number (Starting from 0)
SELECT * FROM source(0)

This refers to the data resources available on the current page. source(0) accesses the first available dataset, source(1) the second, and so on.
If you are on a Magda distribution page with a single dataset, source(0) is the default option.

  • String “this” or No Parameter
SELECT * FROM source()

Equivalent to source(0), it queries the first available dataset on the page.
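
The three parameter forms above can be pictured as a small dispatch step. The following is a minimal Python sketch of that idea, not Magda's implementation: `resolve_source`, the `page_resources` list, and the sample IDs are hypothetical names introduced only for illustration.

```python
from typing import List, Optional, Union


def resolve_source(arg: Optional[Union[int, str]], page_resources: List[str]) -> str:
    """Return the distribution ID that a source() argument would refer to.

    Hypothetical helper: it mirrors the three documented parameter forms only.
    """
    # No parameter or the string "this": treated the same as source(0).
    if arg is None or arg == "this":
        arg = 0
    # Index number: position among the data resources on the current page.
    if isinstance(arg, int):
        if arg < 0 or arg >= len(page_resources):
            raise IndexError(f"no data resource at index {arg}")
        return page_resources[arg]
    # Any other string is taken to be a distribution ID and passed through.
    return arg


# Made-up IDs standing in for the resources listed on the current page.
resources = ["dist-example-a", "dist-example-b"]
first = resolve_source(None, resources)
second = resolve_source(1, resources)
explicit = resolve_source("dist-example-z", resources)
```

In this sketch an out-of-range index raises an error; how the SQL Console itself reports that case is not covered by these notes.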

🚀 Performance Benchmark

Tested on a 16GB M1 MacBook Pro using a 1-million-record (174.2MB) CSV dataset:

  • Simple Query:
SELECT * FROM source(0) LIMIT 10

Query time: 2.458s (average of 3 runs, excluding file download time)

  • Aggregation Query:
SELECT Company, COUNT(*) AS CustomerNum FROM source(0) GROUP BY Company ORDER BY CustomerNum DESC LIMIT 10

Query time: 3.470s (average of 3 runs, excluding file download time)
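
To see what the aggregation query computes, the same GROUP BY / ORDER BY shape can be run against a tiny in-memory table. This sketch uses Python's built-in sqlite3 module rather than Magda's SQL Console. The `customers` table and its rows are made-up stand-ins for source(0), and nothing here reproduces or measures the benchmark above.

```python
import sqlite3

# In-memory table standing in for source(0); the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Name TEXT, Company TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ann", "Acme"),
        ("Bob", "Acme"),
        ("Cy", "Birch"),
        ("Di", "Acme"),
        ("Ed", "Birch"),
        ("Flo", "Cedar"),
    ],
)

# Same shape as the benchmark query: count rows per company, largest first.
rows = conn.execute(
    "SELECT Company, COUNT(*) AS CustomerNum FROM customers "
    "GROUP BY Company ORDER BY CustomerNum DESC LIMIT 10"
).fetchall()
print(rows)  # [('Acme', 3), ('Birch', 2), ('Cedar', 1)]
```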

Learn more about SQL Console in our Quick Guide.

🚀 Hybrid Search Engine: The Best of Keyword & AI-Powered Search

Magda v5.0.0 introduces a hybrid search engine that combines keyword-based (lexical) search and semantic (vector) search to improve search relevance and precision.

🔍 Lexical vs. Vector Search

  • Lexical Search – Traditional keyword matching: fast and deterministic, but lacking contextual understanding.
  • Vector Search – AI-powered semantic search that finds conceptually relevant results based on meaning rather than exact matches.
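
The difference between the two approaches can be illustrated with a toy example. This is a sketch only: the three-dimensional vectors below are invented by hand, whereas a real system would obtain them from an embedding model, and `lexical_match` and `cosine` are illustrative helpers rather than Magda functions.

```python
import math


def lexical_match(query: str, text: str) -> bool:
    """True if every query term appears as a whole word in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy dataset titles with hand-made vectors (assumed values, not model output).
docs = {
    "rainfall records by region": [0.9, 0.1, 0.0],
    "company customer list": [0.0, 0.2, 0.9],
}
query_text = "precipitation data"
query_vec = [0.8, 0.2, 0.1]  # assumed embedding of the query

# Keyword matching finds nothing because no title contains the query terms.
lexical_hits = [title for title in docs if lexical_match(query_text, title)]

# Vector similarity still ranks the rainfall dataset as the closest in meaning.
best_semantic = max(docs, key=lambda title: cosine(query_vec, docs[title]))
```

Here the keyword search returns no hits, while the vector comparison picks the rainfall dataset, which is the kind of gap a hybrid engine is meant to close.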

✅ Why Hybrid Search?

By combining lexical and vector search, Magda’s hybrid engine:

  • Delivers more relevant search results by understanding query intent.
  • Supports structured filtering while incorporating semantic relevance.
  • Enhances the LLM-powered chatbot by enabling intelligent dataset retrieval.
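
These notes do not describe how the lexical and vector result lists are merged. One widely used technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration of the general idea. It is not a description of Magda's engine, and the dataset IDs and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list using RRF.

    Each ID scores 1 / (k + rank) in every list it appears in (rank starts
    at 1), and the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a vector search.
lexical_results = ["ds-1", "ds-2", "ds-3"]
vector_results = ["ds-3", "ds-1", "ds-4"]

fused = reciprocal_rank_fusion([lexical_results, vector_results])
print(fused)  # ['ds-1', 'ds-3', 'ds-2', 'ds-4']
```

Datasets that rank well in both lists (ds-1 and ds-3) rise to the top, while those found by only one method are kept but ranked lower.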

Learn more about the Magda Hybrid Search Engine in our Intro Document.

Changes since v4.2.4

  • #3549 New feature: Hybrid search engine
  • #3570 New feature: In-browser Local LLM powered Chatbot
  • #3571 New feature: a SQL console UI that allows users to run SQL on tabular data files on the client side
  • #3572 Allow different helmet config to be specified per request path & upgrade helmet to v8.0.0
  • #3573 Make JVM MaxRAMPercentage option configurable for all scala services via helm charts
  • #3575 Remove unused & obsolete isAdmin field on users table
  • #3576 Avoid search engine entering idle mode
  • #3574 Allow disabling frontend auto metadata extraction feature via config
  • Include magda-embedding-api v1.1.0 to support #3549
  • fixes: avoid bundling auth-api-client typescript definition in authentication-plugin-sdk to prevent TS2345 type error
  • #3567: Upgrade opensearch to v2.17.1

Migration

If you are on version < v4.0.0, you should upgrade to the latest v4 version before upgrading to v5.0.0.

For more detailed migration notes, see: https://github.com/magda-io/magda/blob/main/docs/docs/migration/5.0.0.md

Full Changelog: v4.2.4...v5.0.0