content.txt

01.	Title
02.	Goal:
Main goal: The main goal of the subgroup Data Extraction, is to create a knowledge extraction platform. These should unify the access to different tools for extraction and converting of any dataset to RDF as knowledge graph format. Find the effiecency of tools of extraction. In the next step the extracted knowledge should be reused from other tools of the DICE group.
\\\\
	First semester:
1.	Requirement phase: user stories, use cases, architecture & technology meetings
2.	Setup Eureka
3.	Implement Basic UI functionality
4.	Implement repository functionality
5.	Implement Fred, Fox, Sorokin, Cederic, OpenIE
6.	Implement basic database functionality
7.	Implement basic executer functionality
Second semester:
1.	Extend UI and executer functionality for complex workflows
2.	Extend Database functionality
3.	Implement Exception & Message handling
4.	Integrate chatbot and autoindex
5.	Implement TAIPAN
6.	Dockern of microservices
7.	Setup the staging machine
8.	Implement SASK-Commons
9.      Predict the efficieny of extractor using Machine Learning

03.	Software Architecture [Kevin Haack, André Sonntag]

04.	UI [Kevin Haack]
05.	Repository [André Sonntag]

06.	Extractors [André Sonntag, Kevin Haack, Sepide Tari, Suganya Kannan,]
Current implemented extractors: 
•	Cederic [Kevin]
•	DBPedia [Kevin]
•	FOX [André]
•	FRED [André]
•	Open-IE [Suganya]
•	Sorookin[Suganya, Kevin]
•	TAIPAN [André]

07.	Database [Sepide Tari, Suganya Kannan]

1. Set up database using Apache jena and Fuseki server. We use a common dataset named SASK.
2. Methods added to query the default graph and the named graph, update the dataset, getting the named graphs
and to process SPARQL queries.
3. Functionality is implemented to update AutoIndex once the updation is done on the dataset.
4. At first the triples were sent to the Fuseki server as a String. Later, the functionality is modified to transform the output of the extractors into a Jena Model and storing it into the database.


08.	Exception & Message handling 
1. The application logs everything on a central location. 
2. We use ELK(ElasticSearch, Logstash, Kibana) Stack for this purpose. 
3. Every microservice sends logs to a centralized location and logstash sends the logs to ElasticSearch.
4. The logs can be visualized in Kibana.
4. The ELK Stack has been deployed in the VM as well.

09.	Demo
---------------------------------------------------------------------------------
10.	Predict the efficieny of extractor using Machine Learning
1. Extract input sentences from training dataset  from oke dataset and give it to every extractor 
2. Take the responses of every extractors and store them
3. Transform the response of every extractor in unified form to make it comparable
4. Compre the output of extractor against the ideal answer of OKE 
5.  Select the classes based on the performance of every extractor
6. Build machine learning model which can predict the which extractor output is more efficient