
Chr1st0p/SearchStackOverflow


PyFlowSearch - A Search Engine for Retrieving Python-Related Topics on Stack Overflow

1. Third-party libraries

The third-party libraries can be found in the local directory .\lib.

You can also download them yourself using the links, or use Maven with .\pom.xml to add them. All the libraries and their links (in the Maven repository) are listed below.

2. Set file directory

The .\postsXML directory is where the large Posts.xml file goes. To start the system, either put Posts.xml in the .\postsXML directory, or edit the class utils.Paths in the code and change POSTSPATH to the location of the Posts.xml dataset on your computer.
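As a rough illustration of the second option, the sketch below shows what a path-constants class of this kind might look like. Only the names utils.Paths and POSTSPATH come from this README; the class name PathsSketch, the default value, and the resolvePostsPath helper are hypothetical and the real class may differ.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the project's utils.Paths class.
final class PathsSketch {
    // Assumed default: Posts.xml inside the local postsXML directory.
    static final String POSTSPATH = "postsXML/Posts.xml";

    private PathsSketch() {}

    // Returns the override when one is given, otherwise the default POSTSPATH.
    static Path resolvePostsPath(String override) {
        boolean useDefault = override == null || override.trim().isEmpty();
        return Paths.get(useDefault ? POSTSPATH : override);
    }

    public static void main(String[] args) {
        // An optional first argument overrides the default location.
        Path posts = resolvePostsPath(args.length > 0 ? args[0] : null);
        System.out.println("Posts.xml expected at: " + posts.toAbsolutePath());
        System.out.println("File exists: " + Files.exists(posts));
    }
}
```

Checking that the file exists before starting a filtering run of several hours is a cheap way to catch a wrong path early.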

3. Filter Python-related questions and answers

After setting the Posts.xml directory, filter all Python-related questions and answers by running FilterPythonMain.java in the default package of src/main/java. When filtering finishes, python.xml and pythonanswer.xml will be created in the .\filteredXML directory. The filtering process may take several hours.
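The sketch below illustrates one way such a filter could work; it is not the code in FilterPythonMain.java. It assumes the usual Stack Overflow dump layout, in which each post is a `row` element with `Id`, `PostTypeId` (1 for questions, 2 for answers), `Tags`, and `ParentId` attributes, that tags are written as `<python>`, and that an answer appears after its question in the file. The class name PythonFilterSketch and its methods are hypothetical. A streaming parser (StAX) is used because a file of this size should not be loaded into memory at once.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: collects Ids of Python questions and of their answers.
final class PythonFilterSketch {
    static final class Result {
        final List<String> questionIds = new ArrayList<>();
        final List<String> answerIds = new ArrayList<>();
    }

    // Assumption: tags are stored in the form "<python><list>".
    static boolean hasPythonTag(String tags) {
        return tags != null && tags.contains("<python>");
    }

    static Result filter(Reader xml) {
        Result result = new Result();
        Set<String> pythonQuestions = new HashSet<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                // Only <row> start tags carry post data; skip everything else.
                if (reader.next() != XMLStreamConstants.START_ELEMENT
                        || !"row".equals(reader.getLocalName())) {
                    continue;
                }
                String id = reader.getAttributeValue(null, "Id");
                String type = reader.getAttributeValue(null, "PostTypeId");
                if ("1".equals(type) && hasPythonTag(reader.getAttributeValue(null, "Tags"))) {
                    pythonQuestions.add(id);
                    result.questionIds.add(id);
                } else if ("2".equals(type)
                        && pythonQuestions.contains(reader.getAttributeValue(null, "ParentId"))) {
                    // Answers have no tags, so they are matched through their parent question.
                    result.answerIds.add(id);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse posts XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" Tags=\"&lt;python&gt;&lt;list&gt;\"/>"
                + "<row Id=\"2\" PostTypeId=\"2\" ParentId=\"1\"/>"
                + "<row Id=\"3\" PostTypeId=\"1\" Tags=\"&lt;java&gt;\"/>"
                + "</posts>";
        Result r = filter(new StringReader(sample));
        System.out.println("questions=" + r.questionIds + " answers=" + r.answerIds);
    }
}
```

A real filter would write the matching rows out to python.xml and pythonanswer.xml instead of collecting Ids; the in-memory lists here only keep the example short.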

4. Create index

The index is built from the two filtered XML files, python.xml and pythonanswer.xml. Run PostIndexMain.java in the default package of src/main/java; the index will be created in the .\index\postindex directory. Creating the whole index may take about 10 minutes.
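This README does not say how the index is built internally, so the following is only a toy in-memory inverted index meant to show the general idea of mapping terms to the posts that contain them. It is not the project's implementation, and the class name InvertedIndexSketch and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of ids of the posts containing it.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercases the text, splits it on non-alphanumeric characters, and records each term.
    void add(int postId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(postId);
            }
        }
    }

    // Returns the ids of posts containing the term, or an empty set if there are none.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "How to sort a list in Python");
        index.add(2, "Python dict vs list");
        System.out.println(index.lookup("list")); // prints [1, 2]
        System.out.println(index.lookup("sort")); // prints [1]
    }
}
```

A persistent index on disk also stores term positions and per-field data, which is what makes phrase queries and field-specific search possible.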

5. Search

After the index is created, you can query the indexed fields. Run the PostSearchMain class in the default package of src/main/java to search. The first step is to input the query; a phrase query can be specified with double quotation marks. The picture below shows an example of a query.

[Figure: queryexample]
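To make the phrase syntax concrete, here is a minimal sketch of how an input line could be split into single terms and double-quoted phrases. It only illustrates the convention described above; the class name QuerySplitSketch is hypothetical and PostSearchMain may parse queries differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splits a query into terms and double-quoted phrases.
final class QuerySplitSketch {
    // Group 1: the inside of a double-quoted phrase. Group 2: a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> split(String query) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String part = m.group(1) != null ? m.group(1).trim() : m.group(2);
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [python, list comprehension, speed]
        System.out.println(split("python \"list comprehension\" speed"));
    }
}
```

The quoted part stays together as one unit, so it can be matched as an exact word sequence, while the unquoted words are matched independently.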

Then specify the fields you want to query. The figures below show examples of querying all fields and querying customized fields.

Input "0" to query on all fields.

[Figure: queryonall]

Input "1" to customize the fields, then input the corresponding numbers to choose them.

[Figure: queryonspecific]

After choosing the query fields, you can set the number N of top results to be returned. Here is an example with a specific N.

[Figure: topN]
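As an illustration of what returning the top N results involves, the sketch below keeps only the N highest-scoring hits using a bounded min-heap. This is a generic technique, not necessarily how this project ranks results; the names TopNSketch and Hit are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: selects the n best-scoring hits with a size-bounded min-heap.
final class TopNSketch {
    static final class Hit {
        final int postId;
        final double score;

        Hit(int postId, double score) {
            this.postId = postId;
            this.score = score;
        }
    }

    static List<Hit> topN(List<Hit> hits, int n) {
        List<Hit> best = new ArrayList<>();
        if (n <= 0) {
            return best;
        }
        // The lowest score sits at the head of the heap, so it is the first to be evicted.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Hit h : hits) {
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        best.addAll(heap);
        // Present the survivors from highest to lowest score.
        best.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return best;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(1, 0.2), new Hit(2, 0.9), new Hit(3, 0.5), new Hit(4, 0.7));
        for (Hit h : topN(hits, 2)) {
            System.out.println(h.postId + " " + h.score);
        }
    }
}
```

The heap never holds more than N + 1 hits, so memory use stays small even when the number of matching posts is large.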

Two example results are shown below. Note: if you set a large number of returned results, please deselect "Limit console output" under Preferences -> Run/Debug -> Console.

[Figure: queryallresult]

[Figure: phrasequeryspecificfields]

6. Two IR Applications

The IR applications can be found in the exampleapp.app1 and exampleapp.app2 packages. Each can be started by running its Main.java. After running, the index and results will be created in .\exampleappindex\01 and .\exampleappindex\02 respectively.
