The third-party libraries can be found in the local directory .\lib. Alternatively, you can download them yourself from the links, or use Maven with .\pom.xml to add them. All the libraries and their links (in the Maven repository) are listed below.
The .\postsXML directory is where you put the large XML file Posts.xml. To start the system, either put Posts.xml in the .\postsXML directory or edit the class utils.Paths in the code, changing POSTSPATH to the location of the Posts.xml dataset on your computer.
After setting the location of Posts.xml, all Python-related questions and answers can be filtered by running FilterPythonMain.java in the default package of src/main/java. When filtering is finished, python.xml and pythonanswer.xml will be created in the .\filteredXML directory. The filtering process may take several hours.
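The filtering step can be pictured with a minimal streaming sketch. This is not the code in FilterPythonMain.java; it assumes that Posts.xml follows the Stack Exchange data dump layout (one row element per post, with Id, PostTypeId, Tags and ParentId attributes), that tags are stored as <python>, and that a question always appears before its answers. The class name PythonPostFilterSketch is hypothetical, and the sketch only collects post IDs instead of writing python.xml and pythonanswer.xml.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: streams a Posts.xml-style file and collects the IDs of
// Python questions and of the answers that belong to them.
class PythonPostFilterSketch {
    final List<String> questionIds = new ArrayList<>();
    final List<String> answerIds = new ArrayList<>();

    void filter(InputStream in) {
        // IDs of questions tagged <python>, used to recognise their answers.
        Set<String> pythonQuestions = new HashSet<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Only <row .../> start elements carry post data (assumed layout).
                if (reader.next() != XMLStreamConstants.START_ELEMENT
                        || !"row".equals(reader.getLocalName())) {
                    continue;
                }
                String id = reader.getAttributeValue(null, "Id");
                String type = reader.getAttributeValue(null, "PostTypeId");
                String tags = reader.getAttributeValue(null, "Tags");
                String parent = reader.getAttributeValue(null, "ParentId");
                if ("1".equals(type) && tags != null && tags.contains("<python>")) {
                    // Assumption: PostTypeId 1 marks a question.
                    pythonQuestions.add(id);
                    questionIds.add(id);
                } else if ("2".equals(type) && pythonQuestions.contains(parent)) {
                    // Assumption: PostTypeId 2 marks an answer; ParentId is its question.
                    answerIds.add(id);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the posts file", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PythonPostFilterSketch.java <path to Posts.xml>
        PythonPostFilterSketch sketch = new PythonPostFilterSketch();
        try (InputStream in = new FileInputStream(args[0])) {
            sketch.filter(in);
        }
        System.out.println(sketch.questionIds.size() + " questions, "
                + sketch.answerIds.size() + " answers");
    }
}
```

Streaming the file one element at a time keeps memory use roughly constant, which matters for a file of this size. On the full dataset, the JDK's default XML processing limits may need to be raised before a run like this completes.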
The index is built from the two filtered XML files, python.xml and pythonanswer.xml. Run PostIndexMain.java in the default package of src/main/java, and the index will be created in the .\index\postindex directory. Building the whole index may take 10 minutes.
After the index is created, you can query the indexed fields. Run the PostSearchMain class in the default package of src/main/java to search. The first step is to enter the query; a phrase query can be specified with double quotation marks. The picture below shows an example of a query.
Then specify the fields you want to query. The figures below show examples of querying all fields and querying customized fields.
Input "0" to query on all fields.
Input "1" to customize the fields, then input the corresponding numbers to choose them.
After choosing the query fields, you can set the number N of top results to return. Here is an example with a specific N.
The results are then returned, as in the two examples below. Note: if you set a large number of returned results, uncheck the Limit console output option under Preferences -> Run/Debug -> Console.
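The double-quote convention for phrase queries mentioned in the search step can be illustrated with a small sketch. It is not how PostSearchMain parses its input (the search library may well handle this itself); the class QuerySplitSketch and its methods are hypothetical and only show how quoted phrases can be told apart from single terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: separates "quoted phrases" from plain terms in a query string.
class QuerySplitSketch {
    // Group 1: text inside double quotes. Group 2: a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Returns the phrases that were written inside double quotation marks.
    static List<String> phrases(String query) {
        return collect(query, 1);
    }

    // Returns the remaining single terms.
    static List<String> terms(String query) {
        return collect(query, 2);
    }

    private static List<String> collect(String query, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String s = m.group(group);
            // Skip tokens matched by the other alternative and empty phrases.
            if (s != null && !s.trim().isEmpty()) {
                out.add(s.trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "\"list comprehension\" python speed";
        System.out.println("phrases: " + phrases(query));
        System.out.println("terms:   " + terms(query));
    }
}
```

For the query "list comprehension" python speed, this yields one phrase (list comprehension) and two terms (python, speed). A phrase is meant to match its words in that exact order, while single terms can match independently.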
The IR applications can be found in the exampleapp.app1 and exampleapp.app2 packages. Each can be started by running its own Main.java. After running, the index and results will be created in .\exampleappindex\01 and .\exampleappindex\02, respectively.