Releases: magda-io/magda

v4.2.5

28 Feb 03:58

What's New

  • Upgrade dompurify to v3.2.4

Full Changelog: v4.2.4...v4.2.5

v5.0.0

22 Feb 10:36

We are excited to announce the release of Magda v5.0.0, a major update introducing groundbreaking features to enhance data discovery, querying, and exploration. This release includes three powerful new capabilities:

  • In-Browser LLM-Powered Chatbot – An AI-driven chat interface for intuitive data exploration.
  • SQL Console – A built-in, browser-based SQL tool for powerful data querying.
  • Hybrid Search Engine – A smart combination of lexical and semantic search for improved dataset discovery.

💬 In-Browser LLM Chatbot: AI-Powered Data Exploration

Magda Chatbot Demo Video

Understanding and analyzing datasets has never been easier. Magda v5.0.0 introduces an LLM-powered chatbot that operates entirely within your web browser, making data exploration more intuitive than ever.

🧠 Key Features

  • Conversational Dataset Search – Ask the chatbot to find datasets using natural language queries.
  • Automated Data Analysis – Upload tabular data, and the chatbot will analyze, visualize, and summarize key insights.
  • SQL Query Generation – Generate SQL queries dynamically from chat prompts, which can be executed in Magda’s new SQL Console.

🔹 Why In-Browser AI?

Unlike traditional server-based AI solutions, Magda’s chatbot runs directly in the browser using WebGPU, reducing infrastructure costs and enhancing privacy by keeping inference local to the user’s device.

Learn more about Magda Chatbot in our Intro Document.

📊 SQL Console: Powerful Data Querying in Your Browser

With Magda v5.0.0, we’re introducing SQL Console, a built-in browser-based SQL tool that lets users query tabular data files, including Excel spreadsheets, CSV files, and TSV (tab-separated) files, regardless of whether the data is stored and managed by Magda or a third-party platform. Queries are executed on the client side using browser resources, which keeps the feature scalable because query work is not performed on the server.

⚡ Key Capabilities

  • Query CSV, Excel, and other tabular data — Directly from Magda or third-party sources.
  • Fast, client-side execution — Runs entirely in-browser for scalability.
  • One-click result export — Download query results as CSV files.
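
The export capability above comes down to serialising a result set as CSV text. A minimal TypeScript sketch, assuming results are available as a list of column names plus rows of cells; `rowsToCsv` and `escapeCsvField` are hypothetical helpers written for illustration, not Magda functions:

```typescript
// Hypothetical cell type for a query result; not taken from Magda's code.
type Cell = string | number | boolean | null;

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the usual CSV convention).
function escapeCsvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: a header line followed by one line per row.
function rowsToCsv(columns: string[], rows: Cell[][]): string {
  const table: Cell[][] = [columns, ...rows];
  return table.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}

const csvText = rowsToCsv(
  ["Company", "CustomerNum"],
  [
    ["Acme, Inc.", 42],
    ["Globex", 7],
  ]
);
console.log(csvText);
```

In a browser, text like this would typically be wrapped in a `Blob` and offered as a file download; that step is omitted here so the sketch runs anywhere.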

🖥️ How to Access SQL Console

You can launch the SQL Console using a simple keyboard shortcut:

  • Mac: Command + Shift + S
  • Windows: Ctrl + Shift + S
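
For readers embedding or extending the UI, the shortcut check can be sketched as a pure function. This is an illustrative TypeScript sketch only: `ShortcutEvent` and `isSqlConsoleShortcut` are hypothetical names, and the field names simply mirror those of a standard browser `KeyboardEvent`.

```typescript
// Minimal shape of the KeyboardEvent fields the check reads.
interface ShortcutEvent {
  key: string;
  shiftKey: boolean;
  metaKey: boolean; // Command key on macOS
  ctrlKey: boolean;
}

// True when the event matches Command + Shift + S (Mac) or Ctrl + Shift + S (Windows).
function isSqlConsoleShortcut(ev: ShortcutEvent, isMac: boolean): boolean {
  const modifierHeld = isMac ? ev.metaKey : ev.ctrlKey;
  return modifierHeld && ev.shiftKey && ev.key.toLowerCase() === "s";
}

// In a browser this could be wired up roughly as follows (not executed here;
// openConsole is a placeholder for whatever opens the console):
// window.addEventListener("keydown", (ev) => {
//   if (isSqlConsoleShortcut(ev, isMac)) openConsole();
// });
const macPress: ShortcutEvent = { key: "S", shiftKey: true, metaKey: true, ctrlKey: false };
console.log(isSqlConsoleShortcut(macPress, true));
```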

Additionally, other Magda features (such as the LLM-powered chatbot) may open the SQL Console automatically when needed.

🔎 Querying Data with source() Function

Magda extends SQL with the source() function, allowing you to query datasets without needing to worry about underlying storage details or file formats.

Basic Usage:

SELECT * FROM source(0)

The source() function accepts one of the following as its parameter:

  • Magda Data Distribution ID (String)
SELECT * FROM source("dist-dga-xxxxx-xxx-xxx-xxx")

The function resolves the distribution ID to its access URL and selects the appropriate query engine based on the file format.

  • Index Number (Starting from 0)
SELECT * FROM source(0)

This refers to the data resources available on the current page. source(0) accesses the first available dataset, source(1) the second, and so on.
If you are on a Magda distribution page with a single dataset, source(0) is the default option.

  • String “this” or No Parameter
SELECT * FROM source()

Equivalent to source(0), it queries the first available dataset on the page.
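
The three parameter forms above can be summarised as a small dispatch. The TypeScript sketch below is an illustration of the rules as described in these notes, not Magda's actual implementation; `SourceRef` and `parseSourceArg` are hypothetical names.

```typescript
// Hypothetical result type: either a distribution ID or an index into the
// data resources available on the current page.
type SourceRef =
  | { kind: "distribution"; id: string }
  | { kind: "pageIndex"; index: number };

function parseSourceArg(arg?: string | number): SourceRef {
  // No parameter, or the string "this": same as source(0).
  if (arg === undefined || arg === "this") {
    return { kind: "pageIndex", index: 0 };
  }
  // A number is an index starting from 0.
  if (typeof arg === "number") {
    if (!Number.isInteger(arg) || arg < 0) {
      throw new Error(`source() index must be a non-negative integer, got ${arg}`);
    }
    return { kind: "pageIndex", index: arg };
  }
  // Any other string is treated as a Magda distribution ID.
  return { kind: "distribution", id: arg };
}

console.log(parseSourceArg());
console.log(parseSourceArg(1));
console.log(parseSourceArg("dist-dga-xxxxx-xxx-xxx-xxx"));
```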

🚀 Performance Benchmark

Tested on a 16GB M1 MacBook Pro using a 1-million-record (174.2MB) CSV dataset:

  • Simple Query:
SELECT * FROM source(0) LIMIT 10

Query time: 2.458s (average of 3 runs, excluding file download time)

  • Aggregation Query:
SELECT Company, COUNT(*) AS CustomerNum FROM source(0) GROUP BY Company ORDER BY CustomerNum DESC LIMIT 10

Query time: 3.470s (average of 3 runs, excluding file download time)

Learn more about SQL Console in our Quick Guide.

🚀 Hybrid Search Engine: The Best of Keyword & AI-Powered Search

Magda v5.0.0 introduces a hybrid search engine that combines keyword-based (lexical) search and semantic (vector) search to improve search relevance and precision.

🔍 Lexical vs. Vector Search

  • Lexical Search – Traditional keyword matching, fast and deterministic but lacks contextual understanding.
  • Vector Search – AI-powered semantic search that finds conceptually relevant results based on meaning rather than exact matches.

✅ Why Hybrid Search?

By combining lexical and vector search, Magda’s hybrid engine:

  • Delivers more relevant search results by understanding query intent.
  • Supports structured filtering while incorporating semantic relevance.
  • Enhances the LLM-powered chatbot by enabling intelligent dataset retrieval.
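
These notes do not say how the lexical and vector result lists are merged. One widely used technique for this kind of merge is reciprocal rank fusion (RRF); the TypeScript sketch below illustrates it under that assumption and should not be read as a description of Magda's implementation. The function name, the dataset IDs, and the constant k = 60 are illustrative choices.

```typescript
// Reciprocal rank fusion: each document scores the sum of 1 / (k + rank)
// over every ranking it appears in, with ranks starting at 1.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, position) => {
      const rank = position + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Made-up result lists for illustration.
const lexicalHits = ["ds-rainfall", "ds-roads", "ds-population"];
const vectorHits = ["ds-climate", "ds-rainfall", "ds-population"];

const fusedHits = reciprocalRankFusion([lexicalHits, vectorHits]);
console.log(fusedHits);
// ["ds-rainfall", "ds-population", "ds-climate", "ds-roads"]
```

Documents that appear in both lists rise to the top, while documents found by only one method are still retained, which is the general behaviour a hybrid engine aims for.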

Learn more about Magda Hybrid Search Engine from our Intro Document.

Changes since v4.2.4

  • #3549 New feature: Hybrid search engine
  • #3570 New feature: In-browser Local LLM powered Chatbot
  • #3571 New feature: a SQL console UI that allows users to run SQL on tabular data files on the client side
  • #3572 Allow different helmet config to be specified per request path & upgrade helmet to v8.0.0
  • #3573 Make JVM MaxRAMPercentage option configurable for all scala services via helm charts
  • #3575 Remove unused & obsolete isAdmin field on users table
  • #3576 Avoid search engine entering idle mode
  • #3574 Allow disabling frontend auto metadata extraction feature via config
  • Include magda-embedding-api v1.1.0 to support #3549
  • fixes: avoid bundling auth-api-client typescript definition in authentication-plugin-sdk to prevent TS2345 type error
  • #3567: Upgrade opensearch to v2.17.1
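
The TS2345 fix listed above stems from how TypeScript compares classes that declare private properties: two classes with identical text are still treated as incompatible when each has its own declaration of the private member. The sketch below reproduces the effect with two hypothetical classes (`ClientA` and `ClientB` stand in for two bundled copies of the same definition; they are not the real auth-api-client classes).

```typescript
// Two copies of "the same" class, as happens when a definition is bundled twice.
class ClientA {
  private token: string;
  constructor(token: string) {
    this.token = token;
  }
  describe(): string {
    return `client(${this.token.length})`;
  }
}

class ClientB {
  private token: string;
  constructor(token: string) {
    this.token = token;
  }
  describe(): string {
    return `client(${this.token.length})`;
  }
}

function useClient(client: ClientA): string {
  return client.describe();
}

// The compiler rejects this call with TS2345 because the two classes have
// separate declarations of the private property 'token'.
// @ts-expect-error
const viaClass = useClient(new ClientB("abc"));

// Depending on a shared interface (or a single shared declaration) avoids the clash.
interface ClientLike {
  describe(): string;
}
function useClientLike(client: ClientLike): string {
  return client.describe();
}
const viaInterface = useClientLike(new ClientB("abc"));
console.log(viaClass, viaInterface);
```

The call still works at runtime; the incompatibility exists only in the type checker, which is why referencing one shared definition rather than bundling a second copy resolves it.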

Migration

If you are on a version earlier than v4.0.0, upgrade to the latest v4 release before upgrading to v5.0.0.

For more detailed migration notes, see: https://github.com/magda-io/magda/blob/main/docs/docs/migration/5.0.0.md

Full Changelog: v4.2.4...v5.0.0

v5.0.0-alpha.2

20 Feb 10:56
Pre-release

What's Changed

  • Hybrid Search #3549
  • In-browser Local LLM powered Chatbot #3570
  • Make the JVM MaxRAMPercentage option configurable for all scala services via helm chart #3573
  • Allow different helmet config to be specified per request path & upgrade helmet to v8.0.0 #3572
  • A SQL console UI that allows users to run SQL on tabular data files on the client side #3571
  • Allow disabling frontend auto metadata extraction feature via config #3574
  • Avoid search engine entering idle mode #3576
  • fixes: authentication-plugin-sdk should not bundle the definition of auth-api-client as TypeScript treats classes with private properties as structurally incompatible, even if they have identical definitions (TS2345)
  • Remove unused & obsolete isAdmin field on the "users" table #3575

Full Changelog: v4.2.4...v5.0.0-alpha.2

v5.0.0-alpha.1

09 Feb 07:29
Pre-release

What's Changed

  • New hybrid search engine
  • In-browser Local LLM powered Chatbot
  • New SQL Console that allows querying data files using SQL
  • fixes: authentication-plugin-sdk should not bundle definition of auth-api-client as TypeScript treats classes with private properties as structurally incompatible, even if they have identical definitions (TS2345)

Full Changelog: v4.2.4...v5.0.0-alpha.1

v5.0.0-alpha.0

09 Feb 01:31
Pre-release

What's Changed

  • New hybrid search engine
  • In-browser Local LLM powered Chatbot
  • New SQL Console that allows querying data files using SQL

Full Changelog: v4.2.4...v5.0.0-alpha.0

v4.2.4

23 Sep 08:26

What's Changed

  • #3559: Set conflicts to true when the Indexer performs the trim operation.
  • Increase indexer client connection idle-timeout to avoid connection reset errors when downloading large region files
  • Upgraded OpenSearch to v2.16.0
  • #3556: Serve robots.txt with content-type text/plain, plus other sitemap & crawler view related improvements.
  • #3564: Add rel="canonical" annotations to dataset & distribution page crawler views

Full Changelog: v4.2.3...v4.2.4

v4.2.4-alpha.1

16 Sep 06:47
Pre-release

What's Changed

  • #3559: Set conflicts to true when the Indexer performs the trim operation.
  • Increase indexer client connection idle-timeout to avoid connection reset errors when downloading large region files
  • Upgraded OpenSearch to v2.16.0
  • #3556: Serve robots.txt with content-type text/plain, plus other sitemap & crawler view related improvements.

Full Changelog: v4.2.3...v4.2.4-alpha.1

v4.2.4-alpha.0

30 Aug 12:32
Pre-release

What's Changed

  • #3559 Set conflicts to true when the Indexer performs the trim operation
  • Increase indexer client connection idle-timeout to avoid connection reset errors when downloading large region files

Full Changelog: v4.2.3...v4.2.4-alpha.0

v4.2.3

26 Aug 07:20

What's New

  • #3554: Replace the "schema" portion of URL-like strings with [URL] in user-supplied content of any emails sent out
  • #3553: Make Chart Preview & Table Preview Configurable per Dataset
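
To illustrate the idea behind #3554, here is a minimal TypeScript sketch. It assumes the change means replacing the leading scheme of a URL-like string (for example `https://`) with the literal text `[URL]`, so that mail clients do not turn user-supplied text into clickable links. That reading, the function name `neutraliseUrlSchemes`, and the regular expression are assumptions made for illustration, not Magda's actual code.

```typescript
// Replace any "<scheme>://" prefix with the literal text "[URL]".
// A scheme is taken to be a letter followed by letters, digits, "+", "-" or ".".
function neutraliseUrlSchemes(text: string): string {
  return text.replace(/\b[a-z][a-z0-9+.-]*:\/\//gi, "[URL]");
}

const userMessage = "Please see https://example.com/data and ftp://example.org/file.csv";
const safeMessage = neutraliseUrlSchemes(userMessage);
console.log(safeMessage);
// "Please see [URL]example.com/data and [URL]example.org/file.csv"
```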

Full Changelog: v4.2.2...v4.2.3

v4.2.2

25 Aug 08:22

What's Changed

  • Removed the create-secrets deployment helper command-line tool script, as it has not been used since Magda v1.

Full Changelog: v4.2.1...v4.2.2