Work with pages? #25

Open
BA88 opened this issue Sep 18, 2017 · 8 comments

@BA88

BA88 commented Sep 18, 2017

I am trying to use searchyll to add Elasticsearch (ES) search to my GitHub Pages website. My site is made up of "pages", not "posts", so I wonder whether that's the root of the issue. (I also use collections in my _config.yml file.)

I've gotten as far as trying to add documents to my ES database via jekyll build, but I don't see my pages being added.

Below I have included details of what I've done, but an overview is:

  1. Updated my _config.yml:
    a. Added the searchyll gem.
    b. Added an elasticsearch section.

  2. Updated my _layouts/page.html to include <article>...</article>

  3. Ran elasticsearch locally (for now)

  4. Ran jekyll build
    a. I can see the "indexing document" puts output.
    b. I added some additional puts to searchyll.rb just in case. All seems okay.

  5. In my elasticsearch output, I do not see any new messages
    a. I expected a message as each document is indexed into the ES database, but nothing appeared

  6. GET _search returns nothing
    a. Not surprising

  7. To test my ES:
    a. I manually PUT a document
    b. I saw a message in my elasticsearch output
    c. I ran a manual GET _search and the document came back
My environment:

  • Mac OS Sierra v 10.12.6
  • Gems 2.0.0
  • searchyll 0.10.1
  • jekyll 3.4.5
  • elasticsearch 5.6.0

Details:

#-------
$ cat _config.yml
[ snip ]
# stuff BA added
gems: [
  jekyll-paginate, jekyll-feed, rouge, searchyll
]
 
elasticsearch:
  url: http://localhost:9200
  index_name: CSG-Wiki
  default_type: "page"          # Optional. Default type is "post".
 
collections:
  general:
    title: General
    output: true
    permalink: /:collection/:path/:title.html
 
#-------
$ cat _layouts/page.html
---
layout: default
---
 
<div class="page">
  <h1 class="page-title">{{ page.title }}</h1>
    <!-- this will be sent to elasticsearch, along with full page metadata -->
    <article class="page-content">
      {{ content }}
    </article>
</div>
 
#-------
$ elasticsearch --verbose
[2017-09-18T08:04:30,005][INFO ][o.e.n.Node               ] [] initializing ...
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [126.1gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] node name [7_61xZT] derived from node ID [7_61xZTTSr6bdGqad_FYTQ]; set [node.name] to override
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] version[5.6.0], pid[42125], build[781a835/2017-09-07T03:09:58.087Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/local/Cellar/elasticsearch/5.6.0/libexec]
[2017-09-18T08:04:30,706][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [aggs-matrix-stats]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [ingest-common]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-expression]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-groovy]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-mustache]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-painless]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [parent-join]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [percolator]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [reindex]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty3]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty4]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] no plugins loaded
[2017-09-18T08:04:31,815][INFO ][o.e.d.DiscoveryModule    ] [7_61xZT] using discovery type [zen]
[2017-09-18T08:04:32,191][INFO ][o.e.n.Node               ] initialized
[2017-09-18T08:04:32,192][INFO ][o.e.n.Node               ] [7_61xZT] starting ...
[2017-09-18T08:04:32,358][INFO ][o.e.t.TransportService   ] [7_61xZT] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2017-09-18T08:04:35,401][INFO ][o.e.c.s.ClusterService   ] [7_61xZT] new_master {7_61xZT}{7_61xZTTSr6bdGqad_FYTQ}{5kt0gbCuQZ2ZDrjkx6cImg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-09-18T08:04:35,419][INFO ][o.e.h.n.Netty4HttpServerTransport] [7_61xZT] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2017-09-18T08:04:35,419][INFO ][o.e.n.Node               ] [7_61xZT] started
[2017-09-18T08:04:35,529][INFO ][o.e.g.GatewayService     ] [7_61xZT] recovered [1] indices into cluster_state
[2017-09-18T08:04:35,675][INFO ][o.e.c.r.a.AllocationService] [7_61xZT] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[library][1]] ...]).
 
#-------
$ jekyll build
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
Incremental build: disabled. Enable with --incremental
      Generating...
        indexing document /general/AE_job_desc.html
        indexing document /general/index.html
        indexing document /general/setup_elasticsearch.html
        indexing document /general/setup_phone_cisco_unity.html
        [ snip ]
        indexing document /unix/setup_linux_analytics_server.html
        indexing page /404.html
        indexing page /atom.xml
        indexing page /
        indexing page /feed.xml
       Old indices:
                    done in 5.697 seconds.
Auto-regeneration: disabled. Use --watch to enable.
 
#-------
$ elasticsearch --verbose
[ no new messages ]
 
#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
 
#-------
$ curl -X PUT 'localhost:9200/library/books/1?pretty' -H 'Content-Type: application/json' -d'
{
  "title" : "A fly on the wall",
  "name"  : {
    "first": "Drosophila",
    "last" : "Melanogaster"
  },
  "publish_date" : "2015-06-21T23:39:40-0400",
  "price"        : 19.95
}
'
 
# output:
{
  "_index" : "library",
  "_type" : "books",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
 
#-------
$ elasticsearch --verbose
[ new messages: ]
[2017-09-18T08:59:10,176][INFO ][o.e.c.m.MetaDataCreateIndexService] [7_61xZT] [library] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-09-18T08:59:10,252][INFO ][o.e.c.m.MetaDataMappingService] [7_61xZT] [library/qqN2Ig5uSQO7HBI94tp6fQ] create_mapping [books]
 
#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "books",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "A fly on the wall",
          "name" : {
            "first" : "Drosophila",
            "last" : "Melanogaster"
          },
          "publish_date" : "2015-06-21T23:39:40-0400",
          "price" : 19.95
        }
      }
    ]
  }
}

@robsears
Member

It's possible that Searchyll isn't (yet) compatible with Elasticsearch 5.x. I would be interested to see a traffic capture between Jekyll and Elasticsearch. You could use socat for that:

  1. brew install socat
  2. Set url: http://localhost:9400 in _config.yml
  3. Run sudo socat -v TCP-LISTEN:9400,fork TCP:localhost:9200 &> data.log
  4. In another terminal window, re-run jekyll build
  5. Inspect the data.log file: cat data.log

In this case, socat will bind to port 9400 and it will take whatever traffic is sent to this port and pass it along to localhost:9200, then relay the response. As far as Jekyll knows, it's talking directly to Elasticsearch, but socat is a middleman logging everything that passes through it to data.log. There may be some interesting information in there that explains what's happening.
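
To make the middleman idea concrete, here is a minimal Ruby sketch of the same pattern: accept a connection on one port, forward the request to another, and log both directions. `relay_once` is a hypothetical helper written for illustration only, and it assumes one small request per connection read in a single chunk. It would likely mishandle large or multi-packet requests, so socat is still the better tool for the actual capture.

```ruby
require 'socket'

# Hypothetical helper: accept one client on `listener`, forward its request
# to the upstream server, relay the response back, and append both to `log`.
# `log` can be anything that responds to << (an Array, a File, $stdout).
# Simplification: one request per connection, read in a single chunk.
def relay_once(listener, upstream_host, upstream_port, log)
  client   = listener.accept
  upstream = TCPSocket.new(upstream_host, upstream_port)

  request = client.readpartial(65_536)
  log << "> #{request}\n"
  upstream.write(request)
  upstream.close_write            # tell the upstream the request is complete

  response = upstream.read        # read until the upstream closes its side
  log << "< #{response}\n"
  client.write(response)
ensure
  client.close if client
  upstream.close if upstream
end

# Example wiring, using the ports from the steps above:
#   listener = TCPServer.new('127.0.0.1', 9400)
#   File.open('data.log', 'a') do |log|
#     loop { relay_once(listener, 'localhost', 9200, log) }
#   end
```

The `>` and `<` prefixes mark client-to-server and server-to-client traffic, so a request that never shows up with a `>` prefix never left Jekyll.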

@BA88
Author

BA88 commented Sep 18, 2017 via email

@BA88
Author

BA88 commented Sep 18, 2017

I am attaching two (2) files here.

File data.txt is from running the five steps outlined above.

File data2.txt is from running a successful PUT. I ran these curl commands:

curl -XGET localhost:9400/_search?pretty
curl -XGET 'http://localhost:9400/_cat/health?v'
curl -X PUT 'localhost:9400/library/books/1?pretty' -H 'Content-Type: application/json' -d'
{
  "title" : "A fly on the wall",
  "name"  : {
    "first": "Drosophila",
    "last" : "Melanogaster"
  },
  "publish_date" : "2015-06-21T23:39:40-0400",
  "price"        : 19.95
}
'
curl -XGET localhost:9400/_search?pretty
curl -XDELETE localhost:9400/library
curl -XGET localhost:9400/_search?pretty

data.txt
data2.txt

@allizad
Member

allizad commented Sep 18, 2017

@BA88 can you paste those .txt files in a GitHub gist and share a link? Thanks!

@BA88
Author

BA88 commented Sep 18, 2017

data.txt: https://gist.github.com/BA88/d7bd52688328c99c9c81f19089a547b4
data2.txt: https://gist.github.com/BA88/ece564bb9859c75c9761003dcee3303a

I've never created gists before. Please let me know if these links don't work. Or if you need something else.

Thanks!

@BA88
Author

BA88 commented Sep 27, 2017

Hi all. Do you think I should downgrade from Elasticsearch 5.x if that is the issue with searchyll? If so, is there a version of Elasticsearch that you recommend?

I've looked around the searchyll code a bit. I don't know enough to determine where the problem is.

I have a professional goal of adding search ability to our GitHub Pages wiki. If there is something I can / should do, I'll do it! I hope to help however I can.

Thank you!

@robsears
Member

Hey there,

We just pushed a change to authentication settings. Not sure if that's the root cause, but I do see auth exceptions in the logs. Want to pull the latest changes and test again?

@BA88
Author

BA88 commented Oct 9, 2017

I incorporated the changes you made and worked around a few errors until I got stuck again. I suspect I'm hitting these problems because I'm running different versions of things than you are.

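Problem 1 + Resolution:
In lib/searchyll/indexer.rb, I removed all the double quotes from the hash keys in the definition of update_aliases.body. In the diff below, the < lines are my edited version and the > lines are the original: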

162,165c162,165
<         actions: [
<            { remove: { index: old_indices.join(','), alias: configuration.elasticsearch_index_base_name }},
<            { add:    { index: elasticsearch_index_name, alias: configuration.elasticsearch_index_base_name }}
<          ]
---
>         "actions": [
>           { "remove": { "index": old_indices.join(','), "alias": configuration.elasticsearch_index_base_name }},
>           { "add":    { "index": elasticsearch_index_name, "alias": configuration.elasticsearch_index_base_name }}
>         ]
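
Some background on why removing the quotes helps: the quoted-key form (`"actions":`) is only accepted by Ruby 2.2 and later, so an older interpreter, which the Gems 2.0.0 environment suggests here, cannot parse it. Plain symbol keys parse on older and newer Rubies and serialize to the same JSON. The sketch below shows that equivalence. The method name `alias_actions_body` and the index and alias values are hypothetical stand-ins, not searchyll's own.

```ruby
require 'json'

# Hypothetical stand-in for the body that update_aliases builds.
# Plain symbol keys (actions:, remove:, add:) parse on Ruby 1.9 and later,
# whereas the quoted form ("actions":) needs Ruby 2.2 or later.
def alias_actions_body(old_indices, new_index, alias_name)
  {
    actions: [
      { remove: { index: old_indices.join(','), alias: alias_name } },
      { add:    { index: new_index,             alias: alias_name } }
    ]
  }
end

# Illustrative values only; the real names come from the site configuration.
body = alias_actions_body(%w[wiki-20170918 wiki-20170919], 'wiki-20171009', 'wiki')

# Symbol keys become ordinary JSON strings when serialized.
puts JSON.generate(body)
```

Either spelling produces symbol keys, so the request sent to Elasticsearch is unchanged. Only the Ruby version needed to parse the file differs.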

Problem 2 + Resolution:
undefined method present?:
Fix -- added to lib/searchyll/indexer.rb:
require 'active_support/all'

Error (before the fix):

$ jekyll build --trace
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
   1. begin searchyll.rb
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
     2. begin Jekyll::Hooks.register :site, :pre_render
/Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:104:in `http_request': undefined method `present?' for nil:NilClass (NoMethodError)
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:83:in `http_put'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:52:in `prepare_index'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:70:in `start'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll.rb:17:in `block in <top (required)>'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:98:in `call'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:98:in `block in trigger'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:97:in `each'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:97:in `trigger'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:188:in `render'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:69:in `process'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/command.rb:26:in `process_site'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:63:in `build'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:34:in `process'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:16:in `block (2 levels) in init_with_program'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `call'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/exe/jekyll:13:in `<top (required)>'
            from /Users/bfo7328/bin/jekyll:23:in `load'
            from /Users/bfo7328/bin/jekyll:23:in `<main>'
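
For anyone hitting the same NoMethodError: `present?` is provided by ActiveSupport's core extensions rather than by Ruby itself, which is why requiring ActiveSupport makes the error go away. If loading all of ActiveSupport feels heavy, a nil-safe check in plain Ruby might look roughly like the sketch below. `value_present?` is a hypothetical name, not something searchyll defines, and it only approximates ActiveSupport's behaviour.

```ruby
# Hypothetical helper (not part of searchyll): true when a value is neither
# nil, false, blank, nor empty. Approximates ActiveSupport's `present?`
# without loading ActiveSupport.
def value_present?(value)
  return false if value.nil? || value == false
  return !value.strip.empty? if value.is_a?(String)   # whitespace-only counts as blank
  return !value.empty? if value.respond_to?(:empty?)  # arrays, hashes, etc.
  true
end

puts value_present?(nil)        # prints false
puts value_present?('')         # prints false
puts value_present?('elastic')  # prints true
```

Because the helper takes the value as an argument instead of being called on it, a nil value cannot raise `undefined method ... for nil:NilClass`.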

Problem 3 -- no resolution:
Error:
indexer.rb:138: stack level too deep

NB: I added debug puts statements to lib/searchyll/indexer.rb to help me ...

$ jekyll build --trace
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
   1. begin searchyll.rb
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
     2. begin Jekyll::Hooks.register :site, :pre_render
     5. begin Jekyll::Hooks.register :documents, :post_render
        indexing document /general/index.html
          5a. document.id /general/index
     5. begin Jekyll::Hooks.register :documents, :post_render
        indexing document /general/setup_elasticsearch.html
          5a. document.id /general/setup_elasticsearch
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /404.html
          4a. page.name 404.html
          4b. page.url  /404.html
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /atom.xml
          4a. page.name atom.xml
          4b. page.url  /atom.xml
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /
          4a. page.name index.html
          4b. page.url  /
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /feed.xml
          4a. page.name feed.xml
          4b. page.url  /feed.xml
     3. begin Jekyll::Hooks.register :site, :post_render
/Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:138: stack level too deep (SystemStackError)

While the above is running, my elasticsearch process does not "see" any of these pages.

What additional information can I provide?
