racket src/main.rkt 2010-01.xml
It will print an output which is a list of questions of the selected exam with the answers:
-question (document struct)
-min-dist (float)
-best-article (document struct)
-best-answer (docuemnt struct)
You can just add more exams at data/raw/exams/
and call
them at racket src/main.rkt \<your-exam>
.
All the articles are saved at data/raw/articles/
.
If you need to change this path, you can pass a modifier
-a --articles-path
to change the articles path
-e --exams-path
to change the exams path
From root directory, install dependencies with
raco pkg install src
raco pkg install https://github.com/n3mo/data-science.git
if its packages are already intalled, update the dependencies
raco pkg update --link src
Struct that defines the node in graph
(struct node (document vector [neineighbors #:mutable #:auto])
#:auto-value (list)
#:transparent)
Function that returns Dijkstra algorithm from a distance function
(dij-from dist)
Transform into graph from question, answers and a list of intermediary layers of articles
(to-graph question answers . list-articles)
Calculates the shortest distance, the best article and the best answer of a graph with a question, an intermediate layer of articles and a final layer of answers
(get-distance-article-answer question articles answers)
(tf-idf corpus)
Calculates tf-idf vector for each Document
in corpus
and returns:
- a list of words/tokens found across all statments corresponding to each dimension on the tf-idf vector Space (The order of the list of tokens corresponds to the columns in the returned tf-idf)
- a list of Document wherin each
Document
'srep
field points to the respective tf-idf vector.
corpus
should be a list of two or more Documents
.
;Just a simple corpus
> (define i1 (item 'a "string item 1"))
> (define i2 (item 'b "string item 2"))
> (define doc-item1 (document i1))
> (define doc-item2 (document i2))
> (define doc-qt (document (question 1 'a "ethics" "string question 1" (list i1 i2))))
> (define doc-art (document (article "lei8096" 1 "string article 1")))
> (define corpus (list doc-qt doc-item1 doc-item2 doc-art))
;Convert a list of strings in a tf-idf matrix
> (tf-idf corpus)
(list
'("article" "string" "item" "question")
(list
(document (question 1 'a "ethics" "string question 1" (list (item 'a "string item 1" 1) (item 'b "string item 2" 1))) '#(0 0 0 0.30102999566398114))
(document (item 'a "string item 1" 1) '#(0 0 0.15051499783199057 0))
(document (item 'b "string item 2" 1) '#(0 0 0.15051499783199057 0))
(document (article "lei8096" 1 "string article 1") '#(0.30102999566398114 0 0 0))))
To execute coverage command, run:
raco cover -f html src
Aluno | Tarefas |
---|---|
Guilherme | Grafos (dijkstra), testes, cobertura de testes |
João | Juntar partes do projeto |
Hugo | TF-IDF & Data-Structures |
Pedro | Cálculo de Distâncias |
Alexandre | Parser do documento |