Known problems / TODO:
-<base> confuses the web indexer because of this line in get_url:
  $url = $response->base;
-"join parts of hyphenated words": why is this done at all?!
-doesn't work anymore with mod_perl -> change documentation
-make -htmlmeta for pdftotext default?
-matches in title and CONTEXT_SIZE set -> no summary
-http://www.psyke.org/about/tech/patch.txt
-no "highlight" link in German template
-tools.pl, line 87:
  return grep(/^$ext$/i, @HIGHLIGHT_EXT);
  -> quote $ext?!
-error "not below $HTTP_LIMIT_URL" -> it's now $HTTP_LIMIT_URLS (note the "S")
-*.PDF is indexed as plain text by default because of the uppercase
  extension -> lc() in filterFile()?
-does $CONTEXT_SIZE > 0 work with PDF files? (e.g. 5465.pdf)
-visited links are not recognized by e.g. Mozilla because
  both "+" and "%20" can be used to escape a space (Mozilla
  uses "+", Perlfect Search uses "%20")
-redirected urls can be indexed twice (a second check after get_url() is needed)
-translated template files are not up-to-date
-"highlight matches" doesn't work with umlauts
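
A minimal sketch for the "quote $ext" and "*.PDF" items above, not the
actual tools.pl code. The helper names is_highlight_ext() and
normalized_ext() and the sample @HIGHLIGHT_EXT value are assumptions made
for illustration; only @HIGHLIGHT_EXT itself and the grep() line come from
the notes above.

```perl
use strict;
use warnings;

# Assumed sample value; the real list is set in the search configuration.
my @HIGHLIGHT_EXT = ('html', 'htm', 'c++');

# Hypothetical helper: \Q...\E escapes regex metacharacters in $ext,
# so "c++" no longer breaks the pattern and "h.ml" no longer matches "html".
sub is_highlight_ext {
    my ($ext) = @_;
    return scalar grep { /^\Q$ext\E$/i } @HIGHLIGHT_EXT;
}

# Hypothetical helper: extract the file extension and lowercase it,
# so that "REPORT.PDF" and "report.pdf" are treated alike.
sub normalized_ext {
    my ($file) = @_;
    return $file =~ /\.([^.\/]+)$/ ? lc($1) : '';
}
```

Whether lowercasing belongs in filterFile() or earlier is left open here;
the sketch only shows the comparison itself.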
WISHES:
-normalize config values wrt ending slash
-make the 65,000 files limit optional
-remove quotemeta in load_excludes() so that full regexp can be used?
-optionally use @EXT for http too (default to "yes"), so zip etc. isn't downloaded at all
-make the template valid XHTML by allowing ##cgi: varname## or so
  (isn't completely possible anyway, b/c of generated attributes)
-"Searched yyy for xxx": the stopwords should not appear here
  (??? Google shows them, too)
-better test for visited links (hash for urls, another hash for md5 checksums?)
-test if @HTTP_CONTENT_TYPES option still works
-"use strict"
-fix "deep recursion" warning?!? perl warns if recursion > 100 :-(
-setup: automatically find pdftotext
-url rewriting function that can easily be modified by anybody
-make "follow symlink" optional
-make a case option (for http on windows servers)
-command line search always uses OR -> make an option?
-"in %.2f seconds" -> i18n
-general DEBUG info, not HTTP_DEBUG. show result of to_be_ignored() in indexer_filesystem.pl
-"getting the file '...' is not allowed": no highlighting (doesn't matter?)
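
A rough sketch of the "better test for visited links" wish (one hash for
URLs, another for MD5 checksums), which would also catch redirected URLs
that lead to a page that was already indexed. The name already_visited()
and both hash names are hypothetical; Digest::MD5 is a core Perl module.
The content is assumed to be a byte string, since md5_hex() rejects wide
characters.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# One hash keyed by URL, one keyed by the MD5 checksum of the content.
my (%seen_url, %seen_md5);

# Hypothetical helper: returns 1 if the URL or identical content was
# seen before, otherwise records both and returns 0.
sub already_visited {
    my ($url, $content) = @_;
    my $sum = md5_hex($content);
    return 1 if $seen_url{$url} || $seen_md5{$sum};
    $seen_url{$url} = 1;
    $seen_md5{$sum} = 1;
    return 0;
}
```

Calling this once more after get_url() returns, with the final URL, is one
way to do the second check mentioned under the known problems.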