An SEO Solution for SPAs #1

sweeetcc · 2016-05-28T09:41:07Z

An SEO Solution for SPAs

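This post sat unwritten the longest, because the topic is an old one. Around the time I first started with Angular, I had to solve the SEO problem for an Angular project. I put a solution into practice, ran into plenty of problems along the way, and refined the solution as I went.

Background:

This was the Angular era. React and Vue had not yet caught on, so today's server-side rendering and isomorphic solutions did not exist. The approach described here still works for today's single-page applications, but it is no longer the recommended one.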

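Approach:

The whole flow fits in one diagram:

Technology:

Based on the flow above, I settled on the following stack:

Built on PhantomJS + Express
Nginx inspects the User-Agent and forwards crawler requests
Express acts as the SEO service server that receives and handles those requests
PhantomJS acts as the rendering tool that generates the pages
Redis / file storage handles reading, writing, and deleting cached pages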
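
As a rough illustration of the User-Agent check that Nginx performs in this setup, the same decision can be written as a small JavaScript function. The bot names in the pattern and the names `isCrawler` and `routeFor` are assumptions made for this sketch, not the original configuration.

```javascript
// Hypothetical list of crawler User-Agent fragments; extend it for the bots you care about.
var CRAWLER_PATTERN = /googlebot|bingbot|baiduspider|yandex|sogou|360spider/i;

/**
 * Return true when the User-Agent header looks like a search-engine crawler.
 */
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}

/**
 * Decide where a request should go: the SEO (snapshot) service or the normal SPA.
 */
function routeFor(userAgent) {
  return isCrawler(userAgent) ? 'seo-service' : 'spa';
}
```

A request with a missing User-Agent header is treated as a normal visitor here, so real users never get routed to the snapshot service by accident.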

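Problems encountered:

Requests for static files such as js, css, and img caused the program to exit; the fix was to filter those requests out;
Snapshots are stored and read as key-value pairs, so the key format had to be decided;
Pages with a lot of content overflowed the size of the command line's stdout stream;
The reconnect timeout after a lost redis connection had to be set via connect_timeout;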
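
The first two items can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the extension list, the `snapshot:` prefix, the choice of SHA-256, and the names `isStaticAsset` and `cacheKey` are all illustrative and do not come from the original project.

```javascript
var crypto = require('crypto');

// Assumed list of static-asset extensions that should never be rendered as pages.
var STATIC_EXT = /\.(js|css|png|jpe?g|gif|svg|ico|woff2?|ttf|eot|map)$/i;

/**
 * Return true when the request targets a static file rather than a page.
 * The query string and fragment are ignored when checking the extension.
 */
function isStaticAsset(requestUrl) {
  var pathname = requestUrl.split(/[?#]/)[0];
  return STATIC_EXT.test(pathname);
}

/**
 * Build a fixed-length cache key from a URL (hypothetical key format).
 * Hashing keeps the key short and free of characters that are awkward in keys or file names.
 */
function cacheKey(requestUrl) {
  return 'snapshot:' + crypto.createHash('sha256').update(requestUrl).digest('hex');
}
```

Hashing the URL means the same helper can serve both Redis keys and file names, at the cost of keys that are no longer human-readable.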

The last few were mostly configuration issues.
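
For the stdout overflow, one possible configuration fix is to raise the output buffer limit of the child process. The sketch below assumes the renderer is launched through Node's `child_process` module, which the post does not state; `runWithLargeOutput` is a hypothetical helper name.

```javascript
var childProcess = require('child_process');

/**
 * Run a command and return its stdout as a string,
 * allowing up to maxBytes of output before Node aborts the call.
 */
function runWithLargeOutput(command, args, maxBytes) {
  return childProcess.execFileSync(command, args, {
    encoding: 'utf8',
    maxBuffer: maxBytes // raise the limit for large rendered pages
  });
}
```

A call such as `runWithLargeOutput('phantomjs', ['render.js', pageUrl], 16 * 1024 * 1024)` would then tolerate pages of up to 16 MB, where `render.js` stands in for whatever PhantomJS script prints the rendered HTML.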

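Later, once the service was running, more problems naturally showed up:

When pre-crawling falls short, crawler visits trigger a lot of on-demand rendering, and the machine's CPU and memory usage spikes;
Caching several hundred thousand pages in redis quickly filled its storage, and the cache lifetime could not be made too short, so we switched to file storage;
File storage brought its own problem: if everything lives in one directory, you hit the Linux file system's limit on the number of entries in a single directory (I recalled a cap of 31998 per directory; on ext3 that number is the limit on subdirectories, and very large directories are slow to work with in any case), so we adopted the following scheme:

Compute the hash of the URL;
Take the first ten characters of the hash;
Create ten levels of directories from those ten characters;
Store the file in the deepest directory;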

var fs = require('fs');
var mkdirp = require('mkdirp'); // callback-style API, as in mkdirp versions before 1.0

/**
 * Build the nested directory for a hashed path:
 * each of the first 10 characters of the hash becomes one directory level.
 */
function snapshotDir(snapshotFolder, hashedPath) {
  return snapshotFolder + '/' + hashedPath.substring(0, 10).split('').join('/');
}

function FileSystem(options) {
  this.options = options;
  // Make sure the root snapshot folder exists before any read or write.
  mkdirp.sync(this.options.snapshotFolder);
}

/**
 * Read an html snapshot from its path
 */
FileSystem.prototype.get = function (hashedPath, callback) {
  var filePath = snapshotDir(this.options.snapshotFolder, hashedPath) + '/' + hashedPath;

  // callback receives (err, data); data is undefined when err is set
  fs.readFile(filePath, 'utf8', callback);
};

/**
 * Write an html snapshot to its path
 */
FileSystem.prototype.set = function (hashedPath, snapshot, callback) {
  var dir = snapshotDir(this.options.snapshotFolder, hashedPath);

  mkdirp(dir, function (err) {
    if (err) {
      // Report directory creation failures instead of silently continuing.
      return callback(err);
    }

    fs.writeFile(dir + '/' + hashedPath, snapshot, callback);
  });
};

/**
 * Delete an html snapshot from its path
 */
FileSystem.prototype.del = function (hashedPath, callback) {
  var filePath = snapshotDir(this.options.snapshotFolder, hashedPath) + '/' + hashedPath;

  fs.unlink(filePath, callback);
};

module.exports = function (options) {
  return new FileSystem(options);
};
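
The module above expects an already hashed path and does not show how the hash is produced. To trace the four steps for a single URL, here is a minimal sketch; the use of SHA-256 and the names `hashUrl` and `snapshotPath` are assumptions, since the post does not say which hash function was used.

```javascript
var crypto = require('crypto');

/**
 * Step 1: hash the URL (SHA-256 is an assumption; any hex digest works the same way).
 */
function hashUrl(pageUrl) {
  return crypto.createHash('sha256').update(pageUrl).digest('hex');
}

/**
 * Steps 2 to 4: take the first ten characters, turn each into one directory level,
 * and place the file, named after the full hash, in the deepest directory.
 */
function snapshotPath(snapshotFolder, hashedPath) {
  var levels = hashedPath.substring(0, 10).split('').join('/');
  return snapshotFolder + '/' + levels + '/' + hashedPath;
}
```

For example, `snapshotPath('/tmp/snapshots', 'abcdef0123456789')` gives `/tmp/snapshots/a/b/c/d/e/f/0/1/2/3/abcdef0123456789`. With a hex digest, every directory holds at most 16 subdirectories, so no level comes anywhere near a per-directory limit.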

That's it. Writing up this short summary ticks off something I had long wanted to do.

sweeetcc added question and removed question labels May 28, 2016

sweeetcc changed the title ~~Markdown test~~ An SEO Solution for SPAs May 28, 2016

sweeetcc added the 解决问题 label May 28, 2016

sweeetcc self-assigned this Mar 5, 2017
