Natural Language Query #3
bshambaugh
started this conversation in
Ideas
**Rationale: Further Areas to Explore:**
Original: Google Groups - Natural Language Query
seeAlso: Wireframes
seeAlso: YouTube Wireframe Animation
ch1_1_nlq (Natural Language Query)
{transcript}
Slide 1: (EISPP_3p)
The following is a mockup for EISPP, or Enterprise Information System for Peer Production. It is based on linked data and semantic web technologies, and it is a decentralized and distributed data management platform with an interface to payment systems and applications, allowing for interoperability among them.
Slide 2: (EISPP_3p2e)
I can run a natural language query, say Car Projects with Brent Shambaugh.
Slide 3: (EISPP_3_M_Fernandez_NLQ_2)
And press the query button to execute the query, which is based on Miriam Fernandez Sanchez's dissertation.
On the right, under ontology instances, I see triples in subject-predicate-object form,
retrieved from an ontology index as a result of the query.
On the top left, I see a ranked list of documents from a document index that are semantically related to
the triples by means of an index containing annotations between them. Below ontology instances, I see 1st degree linked data (or any degree specified by Preferences) related to the URI e.org/contribute_1 and possibly filtered by a Fresnel lens.
In the center, I see a directed graph, constructed with D3.js, containing the triples under ontology instances and
the 1st degree linked data.
On the bottom right I see a list of ontologies used by the triples with corresponding prefixes representing their
full length URIs.
Please note that for the moment I have not explored generalizing URIs as IRIs.
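As a rough illustration of the panels described above, the following Python sketch stores subject-predicate-object triples as tuples, expands prefixed names to full URIs, and selects the 1st degree linked data around a resource. The `ex` prefix, the `http://` scheme for e.org, the `ex:partOf` predicate, and the sample triples are assumptions made for illustration only; they do not come from the mockup.

```python
# Hypothetical prefix table: "ex" and its http://e.org/ base are assumptions.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://e.org/",
}

# Illustrative triples only; the mockup's real data is not shown in the transcript.
TRIPLES = [
    ("ex:contribute_1", "ex:partOf", "ex:car_project"),
    ("ex:contribute_1", "foaf:maker", "ex:brent"),
    ("ex:brent", "foaf:name", "Brent Shambaugh"),
]


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' to its full-length URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # a full URI or a plain literal is returned unchanged


def first_degree(triples, resource):
    """Return the triples in which `resource` is the subject or the object."""
    return [t for t in triples if resource in (t[0], t[2])]


if __name__ == "__main__":
    for s, p, o in first_degree(TRIPLES, "ex:contribute_1"):
        print(expand(s), expand(p), expand(o))
```

Higher degrees could be obtained by repeating `first_degree` on the neighbors found in the previous step.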
Slide 4: (EISPP_3_M_Fernandez_NLQ_2)
Miriam Fernandez Sanchez's dissertation extends semantic search with techniques from the document-based IR field. Using TF-IDF weighting, she creates an inverted document index and an ontology entity index with Lucene. She also creates an annotation database that links ontology instances with documents, and she weights these annotations with TF-IDF as well. This is key for extending semantic search. To deal with heterogeneity, and to make it easier for applications and users to interact with web-scale underlying data, she uses a Semantic Web Gateway. This gateway also feeds processed information into index construction. For natural language queries, she uses PowerAqua, a key part of which is the PowerMap algorithm.
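To make the TF-IDF idea concrete, here is a minimal Python sketch of an inverted index whose postings carry TF-IDF weights, plus a ranked lookup. It uses the textbook formula tf × log(N/df) and whitespace tokenization, which is an assumption for illustration; it is not Lucene's scoring function and not the dissertation's implementation. The sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_tfidf_index(docs):
    """Build an inverted index mapping term -> {doc_id: tf-idf weight}."""
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total_terms = sum(counts.values())
        for term, count in counts.items():
            tf = count / total_terms
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return index


def search(index, query):
    """Rank documents by the summed weights of the query terms they contain."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return [doc_id for doc_id, _ in scores.most_common()]


if __name__ == "__main__":
    docs = {
        "d1": "electric car project notes",
        "d2": "car engine design",
        "d3": "garden project plan",
    }
    print(search(build_tfidf_index(docs), "electric car"))
```

The same weighting could in principle be applied to an annotation table keyed by ontology entity label instead of by term.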
For experimental purposes, she creates her own Semantic Web Gateway called WebCore. She mentions another Semantic Web Gateway called Watson that she would like to move to. In her discussion about natural language queries, she mentions a Golden Standard which PowerAqua does not consider. It could allow for richer contextual queries.
Storage of the indices for documents and ontology entities may be more appropriate in a distributed architecture. It may also be more appropriate if the annotation database is distributed. Perhaps distributed hash tables could be used.
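One way to picture a distributed hash table holding index entries is a consistent hashing ring, sketched below in Python. Each index term is hashed onto the ring and assigned to the first peer clockwise from it. The peer names and the choice of SHA-1 are assumptions for illustration, and real DHTs add routing tables, replication, and handling of joins and departures that are omitted here.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Assign keys (for example index terms) to peers by consistent hashing."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        """Return the peer responsible for `key`: its clockwise successor."""
        i = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c"])  # hypothetical peer names
    for term in ("car", "project", "ontology"):
        print(term, "->", ring.node_for(term))
```

A useful property of this layout is that removing a peer only moves the keys that peer owned; keys held by other peers stay where they are.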
Distributed hash tables route lookups by exact key, which does not by itself capture semantic similarity, while simple unstructured P2P networks rely on query flooding, which creates an abundance of network traffic. Neither may be what is desired, and since our system involves semantic data it may be appropriate to consider semantic social overlay networks to lessen traffic. A semantic social overlay network appropriate for unstructured P2P networks, called INGA, is described by Löser et al. It allows for clustering based on interest.
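The idea of routing by interest rather than flooding can be sketched as follows. This Python example is only loosely inspired by interest-based shortcut overlays; it is not INGA's actual algorithm. The `Peer` class, its `learn` and `route` methods, and the `fanout` parameter are all hypothetical names chosen for illustration.

```python
class Peer:
    """A peer that remembers which other peers answered which topics."""

    def __init__(self, name, neighbors=()):
        self.name = name
        self.neighbors = list(neighbors)  # default overlay links
        self.shortcuts = {}               # peer name -> set of topics it answered

    def learn(self, peer_name, topics):
        """Record that `peer_name` gave useful answers for `topics`."""
        self.shortcuts.setdefault(peer_name, set()).update(topics)

    def route(self, query_topics, fanout=2):
        """Forward to the peers whose known topics overlap the query most.

        Falls back to the default neighbors when no shortcut matches,
        instead of flooding the whole network.
        """
        query_topics = set(query_topics)
        ranked = sorted(
            ((len(query_topics & topics), name)
             for name, topics in self.shortcuts.items()),
            key=lambda pair: (-pair[0], pair[1]),
        )
        targets = [name for overlap, name in ranked if overlap > 0][:fanout]
        return targets or self.neighbors[:fanout]


if __name__ == "__main__":
    me = Peer("me", neighbors=["n1", "n2", "n3"])
    me.learn("alice", {"car", "engine"})
    me.learn("bob", {"garden"})
    print(me.route({"car", "project"}))
    print(me.route({"music"}))
```

Over time, peers with similar interests accumulate shortcuts to one another, which is the clustering effect the paragraph above refers to.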
In addition, a triple repository could be added, which could be used for retrieving triples (including linked data) in the queries.
This may be more appropriate as a distributed architecture. SwarmLinda is used for triple retrieval in Sebastian Koske's thesis, which relies on swarm intelligence and clustering of semantically similar things. It seems to be one of the most scalable solutions and lacks centralized control.
Reasoning is used in semantic web gateways such as Watson, and it could be used by an implementation such as PowerMap to improve the semantic quality of queries. Dentler et al., in a paper titled Semantic Web Reasoning by Swarm Intelligence, proposed a way to scale reasoning to growing amounts of distributed and dynamic resources. This could be considered instead of reasoners built around more centralized solutions.
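For readers unfamiliar with what reasoning adds to a query, the Python sketch below applies two standard RDFS entailment rules (rdfs9, type inheritance through `rdfs:subClassOf`, and rdfs11, transitivity of `rdfs:subClassOf`) until no new triples appear. It is a naive, centralized fixed-point loop for illustration only; it is not the swarm-based method of Dentler et al., and the `ex:` class names are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(triples):
    """Return the closure of `triples` under RDFS rules rdfs9 and rdfs11."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only pair a triple with a subclass axiom about its object.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == TYPE:
                    new.add((s, TYPE, o2))       # rdfs9
                elif p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))   # rdfs11
        if not new <= closure:
            closure |= new
            changed = True
    return closure


if __name__ == "__main__":
    facts = {
        ("ex:car1", TYPE, "ex:ElectricCar"),
        ("ex:ElectricCar", SUBCLASS, "ex:Car"),
        ("ex:Car", SUBCLASS, "ex:Vehicle"),
    }
    for triple in sorted(infer_types(facts) - facts):
        print(triple)
```

With the inferred triples available, a query for vehicles would also match `ex:car1`, even though that fact was never stated directly.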
It seems likely that the final architecture will be peer-to-peer, oriented by swarm intelligence, and organized semantically into groups of semantically similar things. In this way, network traffic will not be wasted looking in improbable places, the system may be more scalable and resilient, and small players may be able to exploit the system's functionality on mesh networks. In the meantime, centralized solutions may be easier for a proof of concept at small scales. Centralized solutions such as Google's and Yahoo's indices may be considered for a document index. Perhaps YaCy could be considered as an easy way to move toward peer-to-peer search with some decentralization and a distributed architecture.
According to Ilya Rudomilov et al., in their paper Semantic P2P Search Engine, YaCy is not fully decentralized P2P since it uses four predefined servers with node lists. To make YaCy, or any distributed index that is settled on, fully peer-to-peer, perhaps the node that hosts these lists, if they exist, should change. A possible answer comes from a peer-to-peer implementation of MapReduce in a paper by Marozzo et al. In this paper, they derive a model to address "node churn, master failures, and job recovery" through master, slave, and user nodes with primary and backup masters. Perhaps the primary and backup master model could be used to inspire a way to decentralize the node list. And, as in INGA, part of the index could be held at each peer. This index part might be used to find other peers with other parts of the index, and the node list could be used for bootstrapping.
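The primary and backup idea for the node list could look roughly like the Python sketch below. It is a single-process toy under strong assumptions: one object stands in for state that would really be replicated between masters, failure detection is assumed to happen elsewhere, and the class and method names are hypothetical. It is not the P2P-MapReduce protocol of Marozzo et al.

```python
class NodeListRegistry:
    """Toy model of a node list kept by a primary master with backups."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = list(backups)  # ordered by promotion priority
        self.node_list = set()        # peers known to the network

    def register(self, peer):
        """Add a peer; a real primary would replicate this to the backups."""
        self.node_list.add(peer)

    def handle_failure(self, failed):
        """React to a failed node and return the current primary."""
        if failed == self.primary:
            if not self.backups:
                raise RuntimeError("no backup master left to promote")
            # Promote the first backup so the node list stays reachable.
            self.primary = self.backups.pop(0)
        elif failed in self.backups:
            self.backups.remove(failed)
        self.node_list.discard(failed)
        return self.primary


if __name__ == "__main__":
    registry = NodeListRegistry("m1", ["m2", "m3"])
    registry.register("p1")
    print(registry.handle_failure("m1"))
    print(sorted(registry.node_list))
```

Because any backup can take over, no fixed server has to host the list permanently, which is the property missing from a design with predefined node-list servers.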
In addition, to keep indexing from becoming such a heavy task, and to make the indices smaller, perhaps grouping by resources with the linked data platform could be considered.
References Include:
D3.js, http://d3js.org/
Ilya Rudomilov, Prof. Ivan Jelinek, Semantic P2P Search Engine, Proceedings of the Federated Conference on Computer Science and Information Systems, pg. 991 - 995, http://fedcsis.eucip.pl/proceedings/pliks/237.pdf
Marozzo, Fabrizio et al., P2P-MapReduce: Parallel data processing in dynamic Cloud environments, Journal of Computer and System Sciences, May 20, 2011, http://www.academia.edu/2821203
Michael Herrmann et al., Description of the YaCy Distributed Web Search Engine, iMinds, Leuven Belgium, National Chiao Tung University, Hsinchu, Taiwan, https://www.cosic.esat.kuleuven.be/publications/article-2459.pdf
Sebastian Koske, Swarm Approaches For Semantic Triple Clustering And Retrieval In Distributed RDF Spaces, Masters Thesis, Freie Universität Berlin, Fachbereich Mathematik Und Informatik, February 2009, http://www.mi.fu-berlin.de/inf/publications/techreports/tr2009/B-09-04/TR-B-09-04.pdf?1346662692
Kathrin Dentler et al., Semantic Web Reasoning by Swarm Intelligence, Department of Artificial Intelligence, Vrije Universiteit Amsterdam, The Netherlands, http://www.few.vu.nl/~kdr250/publications/Reasoning-by-Swarm-Intelligence.pdf
Löser, Alexander et al., Semantic Social Overlay Networks, IEEE Journal on Selected Areas in Communications, Vol. 25, No. 1, January 2007, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.7668&rep=rep1&type=pdf
Mathieu d'Aquin et al., Watson: Supporting Next Generation Semantic Web Applications, Knowledge Media Institute, The Open University, Milton Keynes, UK, http://watson.kmi.open.ac.uk/DownloadsAndPublications_files/www-int07.pdf
Vanessa Lopez et al., PowerMap: Mapping the Real Semantic Web on the Fly, Knowledge Media Institute, The Open University, Milton Keynes, UK, http://technologies.kmi.open.ac.uk/aqualog/powerMap-iswc06-camera-ready.pdf
Vanessa Lopez, PowerAqua: Open Question Answering on the Semantic Web, Thesis, Semantic Web and Knowledge Services, 2011, The Open University, Milton Keynes, UK, http://technologies.kmi.open.ac.uk/poweraqua/thesis-master-viva.pdf
Miriam Fernandez Sanchez, Semantically enhanced Information Retrieval: an ontology-based approach, Dissertation, 2009, Universidad Autonoma, Madrid, http://nets.ii.uam.es/miriam/thesis.pdf