Skip to content

lagner/QGumboParser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

If you need to parse HTML page in Qt application it can be a problem. Qt doesn't have a HTML parser. You can use gumbo-parser developed by google but it was written in pure C and doesn't provide Qt-like interface. Therefore it is not so comfortable to work with. This small library solves the issue.

Quick Start

The easiest way to use QGumboParser is add it to your project as git submodule.
To add the library use the following steps:

  • Create Subdirs Project.
  • Add application subproject. "Qt Console Application" for example
  • Open project folder and create libs directory
  • Run git submodule add [email protected]:lagner/QGumboParser.git libs/QGumboParser in terminal.
  • Run git submodule update --init --recursive
  • Add SUBDIRS += libs/QGumboParser/QGumboParser into the root project. QGumboParser have to appear in your project tree
  • Right click to application project that need Html parser and hit Add library -> Internal library -> select QGumboParser in combobox. Click finish.

The library is ready to use.
Please pay attention to the library requires c++11 support(just add "CONFIG += c++11" into your .pro file).

Example

#include <QCoreApplication>
#include <QDebug>
#include <qgumbodocument.h>
#include <qgumbonode.h>


const char* HTML_PAGE = R"~("
<!DOCTYPE html>
<html>
  <head>
    <title>Title text</title>
    <meta content="">
    <style></style>
  </head>
  <body>
    <h3>First header</h3>
    <p>text text text</p>
    <div class="content">
        <h3>Nested header <a href="">with link</a></h3>
    </div>
  </body>
</html>
")~";

int main()
{
    auto doc = QGumboDocument::parse(HTML_PAGE);
    auto root = doc.rootNode();
    auto nodes = root.getElementsByTagName(HtmlTag::TITLE);
    Q_ASSERT(nodes.size() == 1);

    auto title = nodes.front();
    qDebug() << "title is: " << title.innerText();

    nodes = root.getElementsByTagName(HtmlTag::H3);
    for (const auto& node: nodes) {
        qDebug() << "h3: " << node.innerText();
    }

    auto container = root.getElementsByClassName("content");
    Q_ASSERT(container.size() == 1);

    auto children = container.front().children();
    for (const auto& node: children) {
        qDebug() << "Tag: " << node.tagName();
    }

    return 0;
}

License

MIT License. See LICENSE file

About

C++ wrapper for gumbo-parser library

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •