#579 remove rdf search
joepio committed Feb 4, 2023
1 parent 59ac5f0 commit 3c1a427
Showing 8 changed files with 30 additions and 180 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ See [STATUS.md](server/STATUS.md) to learn more about which features will remain
- Refactor static file asset hosting #578
- Meta tags server side #577
- Include JSON-AD in initial response, speed up first render #511
- Remove feature to index external RDF files and search them #579

## [v0.34.0] - 2022-10-31

11 changes: 3 additions & 8 deletions server/README.md
@@ -36,7 +36,7 @@ https://user-images.githubusercontent.com/2183313/139728539-d69b899f-6f9b-44cb-a
- [Table of contents](#table-of-contents)
- [When should you use this](#when-should-you-use-this)
- [When _not_ to use this](#when-not-to-use-this)
- [Installation & getting started](#installation--getting-started)
- [Installation \& getting started](#installation--getting-started)
- [1. Run using docker](#1-run-using-docker)
- [2. Install desktop build (macOS only)](#2-install-desktop-build-macos-only)
- [3. Run pre-compiled binary](#3-run-pre-compiled-binary)
@@ -45,19 +45,19 @@ https://user-images.githubusercontent.com/2183313/139728539-d69b899f-6f9b-44cb-a
- [Initial setup and configuration](#initial-setup-and-configuration)
- [Running using a tunneling service (easy mode)](#running-using-a-tunneling-service-easy-mode)
- [HTTPS Setup on a VPS (static IP required)](#https-setup-on-a-vps-static-ip-required)
- [HTTPS Setup using external HTTPS proxy](#https-setup-using-external-https-proxy)
- [Usage](#usage)
- [Using Atomic-Server with the browser GUI](#using-atomic-server-with-the-browser-gui)
- [Use `atomic-cli` as client](#use-atomic-cli-as-client)
- [API](#api)
- [FAQ & Troubleshooting](#faq--troubleshooting)
- [FAQ \& Troubleshooting](#faq--troubleshooting)
- [Can / should I create backups?](#can--should-i-create-backups)
- [I lost the key / secret to my Root Agent, and the `/setup` invite is no longer usable! What now?](#i-lost-the-key--secret-to-my-root-agent-and-the-setup-invite-is-no-longer-usable-what-now)
- [How do I migrate my data to a new domain?](#how-do-i-migrate-my-data-to-a-new-domain)
- [How do I reset my database?](#how-do-i-reset-my-database)
- [How do I make my data private, yet available online?](#how-do-i-make-my-data-private-yet-available-online)
- [Items are missing in my Collections / Search results](#items-are-missing-in-my-collections--search-results)
- [I get a `failed to retrieve` error when opening](#i-get-a-failed-to-retrieve-error-when-opening)
- [What is `rdf-search` mode?](#what-is-rdf-search-mode)
- [Can I embed Atomic-Server in another application?](#can-i-embed-atomic-server-in-another-application)
- [Where is my data stored on my machine?](#where-is-my-data-stored-on-my-machine)

@@ -279,11 +279,6 @@ Also, if you can, recreate and describe the indexing issue in the issue tracker,

Try re-initializing atomic server `atomic-server --initialize`.

### What is `rdf-search` mode?

This turns `atomic-server` into a full-text search server that indexed RDF Turtle documents.
Check out [the readme](./rdf-search.md).

### Can I embed Atomic-Server in another application?

Yes. This is what I'm doing with the Tauri desktop distribution of Atomic-Server.
10 changes: 0 additions & 10 deletions server/example_requests.http
@@ -22,16 +22,6 @@ Accept: application/ld+json
GET http://localhost:9883/search?q=Foo&include=true HTTP/1.1
Accept: application/ld+json

### Index at (RDF) document for search
POST http://localhost:9883/search HTTP/1.1
Content-Type: text/turtle

@prefix schema: <http://schema.org/> .
<http://example.com/foo> a schema:Person ;
schema:name "Foo" .
<http://example.com/bar> a schema:Person ;
schema:name "asdfsajhdfgbasdf" .

### Send a Commit
### The hard part here is setting the correct signature.
### Use a library (@tomic/lib for JS, and atomic_lib for Rust).
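
The `GET /search` request that remains in `example_requests.http` passes the search text as the `q` query parameter. As an illustration, here is a minimal Rust sketch for building such a URL from arbitrary user input. The helpers `percent_encode` and `search_url` are hypothetical and not part of `atomic_lib` or `atomic-server`; they assume the server decodes standard percent-encoding, and they simply pass `include` through as in the example request.

```rust
/// Percent-encode a string for use inside a URL query component.
/// Keeps the RFC 3986 unreserved characters and encodes every other byte as %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Multi-byte UTF-8 characters are encoded one byte at a time.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: build a GET URL for the `/search` endpoint.
/// `server` is the base URL, e.g. "http://localhost:9883".
fn search_url(server: &str, query: &str, include: bool) -> String {
    format!(
        "{}/search?q={}&include={}",
        server.trim_end_matches('/'),
        percent_encode(query),
        include
    )
}
```

For example, `search_url("http://localhost:9883", "a b&c", false)` yields `http://localhost:9883/search?q=a%20b%26c&include=false`; without encoding, the `&` in the query would start a new query parameter.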
56 changes: 0 additions & 56 deletions server/rdf-search.md

This file was deleted.

4 changes: 0 additions & 4 deletions server/src/config.rs
@@ -69,10 +69,6 @@ pub struct Opts {
#[clap(long, env = "ATOMIC_DATA_DIR")]
pub data_dir: Option<PathBuf>,

/// CAUTION: Makes data publicly readable on the `/search` endpoint. When enabled, it allows POSTing to the /search endpoint and returns search results as single triples, without performing authentication checks. See https://github.com/atomicdata-dev/atomic-data-rust/blob/master/server/rdf-search.md
#[clap(long, env = "ATOMIC_RDF_SEARCH")]
pub rdf_search: bool,

/// By default, Atomic-Server keeps previous versions of resources indexed in Search. When enabling this flag, previous versions of resources are removed from the search index when their values are updated.
#[clap(long, env = "ATOMIC_REMOVE_PREVIOUS_SEARCH")]
pub remove_previous_search: bool,
114 changes: 23 additions & 91 deletions server/src/handlers/search.rs
@@ -88,7 +88,7 @@ pub async fn search_query(
.search(&query, &TopDocs::with_limit(initial_results_limit))
.map_err(|e| format!("Error with creating search results: {} ", e))?;

let (subjects, _atoms) = docs_to_resources(top_docs, &fields, &searcher)?;
let subjects = docs_to_resources(top_docs, &fields, &searcher)?;

// Create a valid atomic data resource.
// You'd think there would be a simpler way of getting the requested URL...
@@ -101,92 +101,35 @@ pub async fn search_query(
let mut results_resource = atomic_lib::plugins::search::search_endpoint().to_resource(store)?;
results_resource.set_subject(subject.clone());

if appstate.config.opts.rdf_search {
// Always return all subjects in `--rdf-search` mode, don't do authentication
results_resource.set_propval(urls::ENDPOINT_RESULTS.into(), subjects.into(), store)?;
} else {
// Default case: return full resources, do authentication
let mut resources: Vec<Resource> = Vec::new();

// This is a pretty expensive operation. We need to check the rights for the subjects to prevent data leaks.
// But we could probably do some things to speed this up: make it async / parallel, check admin rights.
// https://github.com/atomicdata-dev/atomic-data-rust/issues/279
// https://github.com/atomicdata-dev/atomic-data-rust/issues/280
let for_agent = crate::helpers::get_client_agent(req.headers(), &appstate, subject)?;
for s in subjects {
match store.get_resource_extended(&s, true, for_agent.as_deref()) {
Ok(r) => {
if resources.len() < limit {
resources.push(r);
} else {
break;
}
}
Err(_e) => {
tracing::debug!("Skipping search result: {} : {}", s, _e);
continue;
// Default case: return full resources, do authentication
let mut resources: Vec<Resource> = Vec::new();

// This is a pretty expensive operation. We need to check the rights for the subjects to prevent data leaks.
// But we could probably do some things to speed this up: make it async / parallel, check admin rights.
// https://github.com/atomicdata-dev/atomic-data-rust/issues/279
// https://github.com/atomicdata-dev/atomic-data-rust/issues/280
let for_agent = crate::helpers::get_client_agent(req.headers(), &appstate, subject)?;
for s in subjects {
match store.get_resource_extended(&s, true, for_agent.as_deref()) {
Ok(r) => {
if resources.len() < limit {
resources.push(r);
} else {
break;
}
}
Err(_e) => {
tracing::debug!("Skipping search result: {} : {}", s, _e);
continue;
}
}
results_resource.set_propval(urls::ENDPOINT_RESULTS.into(), resources.into(), store)?;
}
results_resource.set_propval(urls::ENDPOINT_RESULTS.into(), resources.into(), store)?;
let mut builder = HttpResponse::Ok();
// TODO: support other serialization options
Ok(builder.body(results_resource.to_json_ad()?))
}

/// Posts an N-Triples RDF document to index the triples in search
#[tracing::instrument(skip(appstate))]
pub async fn search_index_rdf(
appstate: web::Data<AppState>,
body: String,
) -> AtomicServerResult<HttpResponse> {
// Parse Turtle
use rio_api::parser::TriplesParser;
use rio_turtle::{TurtleError, TurtleParser};

let mut writer = appstate.search_state.writer.write()?;
let fields = crate::search::get_schema_fields(&appstate.search_state)?;

TurtleParser::new(body.as_ref(), None)
.parse_all(&mut |t| {
match (
get_inner_value(t.subject.into()),
get_inner_value(t.predicate.into()),
get_inner_value(t.object),
) {
(Some(s), Some(p), Some(o)) => {
crate::search::add_triple(&writer, s, p, o, None, &fields).ok();
}
_ => return Ok(()),
};
Ok(()) as Result<(), TurtleError>
})
.map_err(|e| format!("Error parsing turtle: {}", e))?;

// Store the changes to the writer
writer.commit()?;
let mut builder = HttpResponse::Ok();
Ok(builder.body("Added turtle to store"))
}

// Returns the innver value of a Term in an RDF triple. If it's a blanknode or triple inside a triple, it will return None.
use rio_api::model::Term;
fn get_inner_value(t: Term) -> Option<String> {
match t {
Term::Literal(lit) => match lit {
rio_api::model::Literal::Simple { value } => Some(value.into()),
rio_api::model::Literal::LanguageTaggedString { value, language: _ } => {
Some(value.into())
}
rio_api::model::Literal::Typed { value, datatype: _ } => Some(value.into()),
},
Term::NamedNode(nn) => Some(nn.iri.into()),
Term::BlankNode(_bn) => None,
Term::Triple(_) => None,
}
}

#[derive(Debug, std::hash::Hash, Eq, PartialEq)]
pub struct StringAtom {
pub subject: String,
@@ -286,29 +229,18 @@ fn docs_to_resources(
docs: Vec<(f32, tantivy::DocAddress)>,
fields: &Fields,
searcher: &tantivy::LeasedItem<tantivy::Searcher>,
) -> Result<(Vec<String>, Vec<StringAtom>), AtomicServerError> {
) -> Result<Vec<String>, AtomicServerError> {
let mut subjects: HashSet<String> = HashSet::new();
// These are not used at this moment, but would be quite useful in RDF context.
let mut atoms: HashSet<StringAtom> = HashSet::new();

// convert found documents to resources
for (_score, doc_address) in docs {
let retrieved_doc = searcher.doc(doc_address)?;
let subject_val = retrieved_doc.get_first(fields.subject).ok_or("No 'subject' in search doc found. This is required when indexing. Run with --rebuild-index")?;
let prop_val = retrieved_doc.get_first(fields.property).ok_or("No 'property' in search doc found. This is required when indexing. Run with --rebuild-index")?;
let value_val = retrieved_doc.get_first(fields.value).ok_or("No 'value' in search doc found. This is required when indexing. Run with --rebuild-index")?;

let subject = unpack_value(subject_val, &retrieved_doc, "Subject".to_string())?;
let property = unpack_value(prop_val, &retrieved_doc, "Property".to_string())?;
let value = unpack_value(value_val, &retrieved_doc, "Value".to_string())?;

subjects.insert(subject.clone());
atoms.insert(StringAtom {
subject,
property,
value,
});
}

Ok((subjects.into_iter().collect(), atoms.into_iter().collect()))
Ok(subjects.into_iter().collect())
}
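
After this change, `search_query` always takes the authenticated path: it walks the subjects returned by `docs_to_resources`, skips any that `get_resource_extended` refuses for the requesting agent, and stops once `limit` resources are collected. Note that `docs_to_resources` gathers subjects in a `HashSet<String>` and then collects it into a `Vec`, so the returned order is unspecified rather than the best-first order of tantivy's `TopDocs`; combined with the `limit` cut-off, a higher-scoring readable subject can be dropped in favour of a lower-scoring one. The sketch below shows an order-preserving alternative next to the filtering loop. `dedup_in_rank_order`, `filter_readable` and the `can_read` callback are hypothetical names; `can_read` stands in for `get_resource_extended` returning `Ok`.

```rust
use std::collections::HashSet;

/// Deduplicate subjects while keeping the order in which they first appear.
/// `hits` is assumed to be sorted best-first, as tantivy's `TopDocs` returns them.
fn dedup_in_rank_order(hits: Vec<(f32, String)>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut subjects = Vec::new();
    for (_score, subject) in hits {
        // `insert` returns false when the subject is already present.
        if seen.insert(subject.clone()) {
            subjects.push(subject);
        }
    }
    subjects
}

/// Keep at most `limit` subjects accepted by `can_read`, skipping the rest.
/// Produces the same result as the skip-on-error / break-at-limit loop above.
fn filter_readable<F>(subjects: Vec<String>, limit: usize, can_read: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    let mut out = Vec::new();
    for s in subjects {
        if out.len() >= limit {
            break;
        }
        if can_read(&s) {
            out.push(s);
        }
    }
    out
}
```

Using a `HashSet` only as a "seen" marker while pushing into a `Vec` keeps deduplication O(1) per hit without losing the ranking.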
12 changes: 2 additions & 10 deletions server/src/routes.rs
@@ -1,7 +1,7 @@
//! Contains routing logic, sends the client to the correct handler.
//! We should try to minimize what happens in here, since most logic should be defined in Atomic Data - not in the server itself.
use crate::{config::Config, content_types, handlers};
use crate::{content_types, handlers};
use actix_web::{guard, http::Method, web};
use actix_web_static_files::ResourceFiles;

@@ -15,7 +15,7 @@ include!(concat!(env!("OUT_DIR"), "/generated.rs"));
/// Set up the Actix server routes. This defines which paths are used.
// Keep in mind that the order of these matters. An early, greedy route will take
// precedence over a later route.
pub fn config_routes(app: &mut actix_web::web::ServiceConfig, config: &Config) {
pub fn config_routes(app: &mut actix_web::web::ServiceConfig) {
app.service(web::resource("/ws").to(handlers::web_sockets::web_socket_handler))
.service(web::resource("/download/{path:[^{}]+}").to(handlers::download::handle_download))
// This `generate` imports the static files from the `app_assets` folder
@@ -45,14 +45,6 @@ pub fn config_routes(app: &mut actix_web::web::ServiceConfig, config: &Config) {
.guard(guard::Method(Method::GET))
.to(handlers::search::search_query),
);
if config.opts.rdf_search {
tracing::info!("RDF search enabled. You can POST to /search to index RDF documents.");
app.service(
web::resource("/search")
.guard(guard::Method(Method::POST))
.to(handlers::search::search_index_rdf),
);
}
app.service(web::resource(ANY).to(handlers::resource::handle_get_resource))
// Also allow the home resource (not matched by the previous one)
.service(web::resource("/").to(handlers::resource::handle_get_resource));
2 changes: 1 addition & 1 deletion server/src/serve.rs
@@ -52,7 +52,7 @@ pub async fn serve(config: crate::config::Config) -> AtomicServerResult<()> {
.wrap(tracing_actix_web::TracingLogger::default())
.wrap(middleware::Compress::default())
// Here are the actual handlers / endpoints
.configure(|app| crate::routes::config_routes(app, &appstate.config))
.configure(crate::routes::config_routes)
.default_service(web::to(|| {
tracing::error!("Wrong route, should not happen with normal requests");
actix_web::HttpResponse::NotFound()
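
With the `rdf_search` branch gone, `config_routes` no longer needs the `Config`, so `serve.rs` can hand the function to `.configure` by name instead of wrapping it in a closure. actix-web's `App::configure` accepts any `FnOnce(&mut ServiceConfig)`, and a one-argument `fn` item satisfies that bound directly, while a two-argument function needs a closure to capture the extra value. The sketch below reproduces this with hypothetical stand-in types (`ServiceConfig`, `configure`, `config_routes_with_flag`) so it compiles without actix-web; the route strings are placeholders.

```rust
/// Minimal stand-in for actix-web's `ServiceConfig` (hypothetical).
#[derive(Default)]
struct ServiceConfig {
    routes: Vec<&'static str>,
}

/// Stand-in for `App::configure`, which accepts any `FnOnce(&mut ServiceConfig)`.
fn configure<F: FnOnce(&mut ServiceConfig)>(f: F) -> ServiceConfig {
    let mut cfg = ServiceConfig::default();
    f(&mut cfg);
    cfg
}

/// One-argument signature: can be passed by name, as in the new `serve.rs`.
fn config_routes(app: &mut ServiceConfig) {
    app.routes.push("/ws");
    app.routes.push("/search");
}

/// Old-style two-argument signature: must be wrapped in a closure.
fn config_routes_with_flag(app: &mut ServiceConfig, rdf_search: bool) {
    config_routes(app);
    if rdf_search {
        app.routes.push("POST /search");
    }
}
```

Here `configure(config_routes)` compiles as-is, whereas the old shape needs `configure(|app| config_routes_with_flag(app, flag))`.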