Commit

Merge branch 'develop'
essiembre committed Aug 22, 2013
2 parents ac3e511 + 3907cfd commit 4a7e393
Showing 9 changed files with 40 additions and 20 deletions.
4 changes: 0 additions & 4 deletions norconex-collector-http/TODO
@@ -1,16 +1,12 @@
The following are either things to be done, or ideas to consider:

- Add Norconex Commons Lang 1.0.1 everywhere before release 1.1.
- Use toSafeFileName and fromSafeFileName from classe FileUtil class to
ensure directories can be created without issues.

- Redo crawler event model so only 1 method needs to be implemented
and an event object is passed, allowing any custom event implementation
to be added, each event having a unique id.

- Packaging: Deploy the source and javadoc jars for dependent libs as well
(importer, commmitter, etc) and add link to their site in javadoc.

- DONE, need testing: Package source jars with Maven repository deployments.

- Consider BlockingQueue for fast memory access of queued URL, with the
4 changes: 2 additions & 2 deletions norconex-collector-http/pom.xml
@@ -22,13 +22,13 @@ along with Norconex HTTP Collector. If not, see <http://www.gnu.org/licenses/>.
<modelVersion>4.0.0</modelVersion>
<groupId>com.norconex.collectors</groupId>
<artifactId>norconex-collector-http</artifactId>
<version>1.1.0-SNAPSHOT</version>
<version>1.1.0</version>
<name>Norconex HTTP Collector</name>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<site.baseurl/>
<currentStableVersion>1.0.1</currentStableVersion>
<currentStableVersion>1.1.0</currentStableVersion>
</properties>
<inceptionYear>2009</inceptionYear>

2 changes: 1 addition & 1 deletion norconex-collector-http/src/changes/changes.xml
@@ -7,7 +7,7 @@
</properties>
<body>

<release version="1.1.0" date="2013-??-??" description="Feature release.">
<release version="1.1.0" date="2013-08-21" description="Feature release.">
<action dev="essiembp" type="add">
Crawler now fires additional events. Added "documentRobotsMetaRejected"
and "documentImportRejected" methods to IHttpCrawlerEventListener.
11 changes: 11 additions & 0 deletions norconex-collector-http/src/main/resources/adsense.txt
@@ -0,0 +1,11 @@
<script type="text/javascript"><!--
google_ad_client = "ca-pub-4572498016361754";
/* Norconex HTTP Collector Site */
google_ad_slot = "8193066023";
google_ad_width = 728;
google_ad_height = 90;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
2 changes: 2 additions & 0 deletions norconex-collector-http/src/site/apt/configuration.apt.vm
@@ -20,6 +20,8 @@
Norconex Inc.
------

%{snippet|file=src/main/resources/adsense.txt|verbatim=false}

Configuration

<b>Note:</b> The following documentation covers version 1.1 or higher.
10 changes: 7 additions & 3 deletions norconex-collector-http/src/site/apt/download.apt.vm
@@ -20,6 +20,8 @@
Norconex Inc.
------

%{snippet|file=src/main/resources/adsense.txt|verbatim=false}

Download

<<WAIT!>> Don't forget you need a
@@ -30,13 +32,15 @@ Download

Latest releases

~~ * {{{TODO} 1.1.0-RC1.zip}} (Release Candidate 1)

* {{{http://norconex.s3.amazonaws.com/repo/release/com/norconex/collectors/norconex-collector-http/1.0.1/norconex-collector-http-1.0.1.zip} 1.0.1.zip}} (Stable)
* {{{http://norconex.s3.amazonaws.com/repo/release/com/norconex/collectors/norconex-collector-http/${currentStableVersion}/norconex-collector-http-${currentStableVersion}.zip} ${currentStableVersion}.zip}}

[]

Older releases:

* {{{http://norconex.s3.amazonaws.com/repo/release/com/norconex/collectors/norconex-collector-http/1.0.2/norconex-collector-http-1.0.2.zip} 1.0.2.zip}}

* {{{http://norconex.s3.amazonaws.com/repo/release/com/norconex/collectors/norconex-collector-http/1.0.1/norconex-collector-http-1.0.1.zip} 1.0.1.zip}}

* {{{http://norconex.s3.amazonaws.com/repo/release/com/norconex/collectors/norconex-collector-http/1.0.0/norconex-collector-http-1.0.0.zip} 1.0.0.zip}}

23 changes: 13 additions & 10 deletions norconex-collector-http/src/site/apt/index.apt.vm
@@ -20,14 +20,16 @@
Norconex Inc.
------

%{snippet|file=src/main/resources/adsense.txt|verbatim=false}

Welcome to Norconex HTTP Collector

Norconex HTTP Collector is a web crawler that aims to make
Enterprise Search integrators and developers's life easier.

The current stable version is <<${currentStableVersion}>>.

Quick Links:
<<Quick Links:>>

* {{{./download.html}Download}}

@@ -37,10 +39,10 @@ Welcome to Norconex HTTP Collector

[]

* Version 1.1 Release Candidate 1 now available
* Version 1.1 now available

Upcoming 1.1 version will bring you new features we hope you will like.
You can {{{./download.html}download}} a "release candidate" version right
Version 1.1 brings you new features we hope you will like.
You can {{{./download.html}download}} this version right
away. Amongst new features or enhancements you will find:

* Faster and more constant crawling performance at high volume
@@ -55,7 +57,7 @@ Welcome to Norconex HTTP Collector

* Support for <<<ftp://>>> URLs.

* More. See {{{./changes-report.html}Release notes}} for a more complete list.
* See {{{./changes-report.html}Release notes}} for a complete list of changes.

[]

@@ -73,9 +75,9 @@ Welcome to Norconex HTTP Collector
as Enterprise Search integrators. While they all have
their strength and weaknesses, we always wished we could get our hands
on one that combines all the things we like, while minimizing many of the
reccurent pain points we kept experiencing. After years of waiting for it
recurrent pain points we kept experiencing. After years of waiting for it
we took matters into our own hands and the results is here. While at first
its main goal was to faciltate our own job as integrators, we now hope it
its main goal was to facilitate our own job, we now hope it
can benefit you too. Please be vocal about things you would like to see
included in future releases.

@@ -298,8 +300,8 @@ Welcome to Norconex HTTP Collector
[]

* <<Tested with millions of URLs>>\
A single crawler instance has been tested with a few million URLs without
issues.
Single or multiple crawler instances have been tested with millions
of web pages and documents without issues.

[]

@@ -318,7 +320,8 @@ Welcome to Norconex HTTP Collector
for, even though it may be one day (provided by Norconex or community).
For example:

* Does not focus on BigData.
* Does not focus on BigData (but has been tested with millions -- see feature
list above).

* Does not focus on preserving document structure.

2 changes: 2 additions & 0 deletions norconex-collector-http/src/site/apt/support.apt
@@ -20,6 +20,8 @@
Norconex Inc.
------

%{snippet|file=src/main/resources/adsense.txt|verbatim=false}

Support Options

Community support is available on
2 changes: 2 additions & 0 deletions norconex-collector-http/src/site/apt/usage.apt
@@ -20,6 +20,8 @@
Norconex Inc.
------

%{snippet|file=src/main/resources/adsense.txt|verbatim=false}

Usage

Use the Norconex HTTP Collector as a command-line application or java library.