Merge branch 'develop' for MeTA v2.3.0.
Chase Geigle committed Aug 2, 2016
2 parents 5d726cf + 2871f18 commit 57382ba
Showing 49 changed files with 269 additions and 132 deletions.
6 changes: 5 additions & 1 deletion .appveyor.yml
@@ -5,16 +5,20 @@ os: Visual Studio 2015

install:
- set PATH=C:\msys64\usr\bin;%PATH%
- set MSYSTEM=MINGW64
- bash -lc ""
- bash -lc "pacman --noconfirm --needed -Sy bash pacman pacman-mirrors msys2-runtime msys2-runtime-devel"
# we don't actually need ada, fortran, libgfortran, or objc, but in
# order to update gcc we need to also update those packages as well...
- bash -lc "pacman --noconfirm -S mingw-w64-x86_64-{gcc,gcc-ada,gcc-fortran,gcc-libgfortran,gcc-objc,cmake,make,icu,jemalloc,zlib}"
before_build:
- set MSYSTEM=MINGW64
- cd C:\projects\meta
- git submodule update --init --recursive
- bash -lc "export PATH=/mingw64/bin:$PATH && cd $APPVEYOR_BUILD_FOLDER && mkdir build && cd build && cmake .. -G \"MSYS Makefiles\""
build_script:
- bash -lc "export PATH=/mingw64/bin:$PATH && cd $APPVEYOR_BUILD_FOLDER/build && make"
- set MSYSTEM=MINGW64
- bash -lc "export PATH=/mingw64/bin:$PATH && cd $APPVEYOR_BUILD_FOLDER/build && make -j2"
test_script:
- set MSYSTEM=MINGW64
- bash -lc "export PATH=/mingw64/bin:$PATH && cd $APPVEYOR_BUILD_FOLDER/build && cp ../config.toml . && ./unit-test --reporter=spec"
18 changes: 17 additions & 1 deletion .travis.yml
@@ -5,6 +5,10 @@ language: cpp

sudo: false

cache:
directories:
deps/icu

addons:
apt:
packages: &default-packages
@@ -49,6 +53,18 @@ matrix:
- gcc-5
- g++-5

# Linux/GCC 6
- os: linux
env: COMPILER=gcc GCC_VERSION=6
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- *default-packages
- gcc-6
- g++-6

# Linux/Clang 3.6
- os: linux
env: COMPILER=clang CLANG_VERSION=3.6
@@ -81,7 +97,7 @@ matrix:
osx_image: xcode7.2
env: COMPILER=clang

# OS X/GCC 5
# OS X/GCC 6
- os: osx
env: COMPILER=gcc

57 changes: 56 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,57 @@
# [v2.3.0][2.3.0]
## New features
- Forward and inverted indexes are now stored in one directory. **To make
use of your existing indexes, you will need to move their
directories.** For example, a configuration that used to look like the
following

```toml
dataset = "20newsgroups"
corpus = "line.toml"
forward-index = "20news-fwd"
inverted-index = "20news-inv"
```

will now look like the following

```toml
dataset = "20newsgroups"
corpus = "line.toml"
index = "20news-index"
```

and your folder structure should now look like

```
20news-index
├── fwd
└── inv
```

You can do this by simply moving the old folders around like so:

```bash
mkdir 20news-index
mv 20news-fwd 20news-index/fwd
mv 20news-inv 20news-index/inv
```
- `stats::multinomial` now can report the number of unique event types
counted (`unique_events()`)
- `std::vector` can now be hashed via `hash_append`.

## Bug fixes
- Fix rounding bug in language model-based rankers. This bug caused
severely degraded performance for these rankers with short queries. The
unit tests have been improved to prevent such a regression in the
future.

## Enhancements
- The bundled ICU version has been bumped to ICU 57.1.
- MeTA will now attempt to build its own version of ICU on Windows if it
fails to find a suitable ICU installed.
- CI support for GCC 6.x was added for all three major platforms.
- CI support also uses a fixed version of LLVM/libc++ instead of trunk.

# [v2.2.0][2.2.0]
## New features
- Parallelized versions of PageRank and Personalized PageRank have been
@@ -381,7 +435,8 @@
# [v1.0][1.0]
- Initial release.

[unreleased]: https://github.com/meta-toolkit/meta/compare/v2.2.0...develop
[unreleased]: https://github.com/meta-toolkit/meta/compare/v2.3.0...develop
[2.3.0]: https://github.com/meta-toolkit/meta/compare/v2.2.0...v2.3.0
[2.2.0]: https://github.com/meta-toolkit/meta/compare/v2.1.0...v2.2.0
[2.1.0]: https://github.com/meta-toolkit/meta/compare/v2.0.1...v2.1.0
[2.0.1]: https://github.com/meta-toolkit/meta/compare/v2.0.0...v2.0.1
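The changelog's new `unique_events()` entry separates the number of distinct event types a `stats::multinomial` has seen from the total count it has accumulated. The sketch below illustrates that distinction with a hypothetical `event_counter` class; the class name and its `increment`, `counts`, and `probability` members are assumptions for illustration and are not MeTA's actual `stats::multinomial` interface.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical counter illustrating unique_events(); this is NOT MeTA's
// stats::multinomial, only a sketch of the idea.
template <class T>
class event_counter
{
  public:
    // Adds `count` observations of `event`.
    void increment(const T& event, double count)
    {
        counts_[event] += count;
        total_ += count;
    }

    // Total count recorded for `event` (0 if it was never seen).
    double counts(const T& event) const
    {
        auto it = counts_.find(event);
        return it == counts_.end() ? 0.0 : it->second;
    }

    // Maximum-likelihood estimate: count(event) / total count.
    double probability(const T& event) const
    {
        return total_ == 0 ? 0.0 : counts(event) / total_;
    }

    // Number of distinct event types seen, independent of their counts.
    std::uint64_t unique_events() const
    {
        return counts_.size();
    }

  private:
    std::unordered_map<T, double> counts_;
    double total_ = 0;
};
```

Incrementing event `1` twice (by 2 and by 1) and event `2` once leaves `unique_events()` at 2 while the total mass is 4, so `probability(2)` is 0.25.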
8 changes: 4 additions & 4 deletions CMakeLists.txt
@@ -41,9 +41,9 @@ list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/deps/meta-cmake/)

# We require Unicode 8 for the unit tests, which was added in ICU 56.1
FindOrBuildICU(
VERSION 56.1
URL http://download.icu-project.org/files/icu4c/56.1/icu4c-56_1-src.tgz
URL_HASH MD5=c4a2d71ff56aec5ebfab2a3f059be99d
VERSION 57.1
URL http://download.icu-project.org/files/icu4c/57.1/icu4c-57_1-src.tgz
URL_HASH MD5=976734806026a4ef8bdd17937c8898b9
)

add_library(meta-definitions INTERFACE)
@@ -54,7 +54,7 @@ if(UNIX OR MINGW)
target_compile_options(meta-definitions INTERFACE -Wall -Wextra -pedantic)

if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
SetClangOptions()
SetClangOptions(meta-definitions)
endif()
endif()

2 changes: 1 addition & 1 deletion STYLE.md
@@ -30,7 +30,7 @@ void myclass::member(const type&);
- Prefer `enum class` (strongly typed `enum`s).
- Prefer no pointer over `unique_ptr` over `shared_ptr`.
- Do not use `rand()` [deprecated in
C++14](www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3841.pdf).
C++14](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3841.pdf).
- Use `#ifndef META_FILE_NAME_H_` for double inclusion guards.
- `#define` kept to a minimum, and ALL_CAPS_SNAKE if used.
- Lines should be no longer than 80 characters
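The `rand()` rule in STYLE.md links to N3841, which recommends the `<random>` facilities in place of `rand()`. A minimal sketch of the replacement, assuming a caller-owned `std::mt19937` engine; the helper name `uniform_int` is hypothetical.

```cpp
#include <random>

// Returns an integer drawn uniformly from the closed range [lo, hi].
// Unlike rand() % n, the distribution avoids modulo bias and is not
// limited by RAND_MAX (which may be as small as 32767).
int uniform_int(std::mt19937& rng, int lo, int hi)
{
    std::uniform_int_distribution<int> dist{lo, hi};
    return dist(rng);
}
```

Passing the engine by reference keeps seeding and state under the caller's control, so a fixed seed gives a reproducible sequence within one standard library implementation.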
4 changes: 1 addition & 3 deletions config.toml
@@ -10,8 +10,7 @@ query-path = "../queries.txt" # create this file

dataset = "ceeaus"
corpus = "line.toml" # located inside dataset folder
forward-index = "ceeaus-fwd"
inverted-index = "ceeaus-inv"
index = "ceeaus"
indexer-ram-budget = 1024 # **estimated** RAM budget for indexing in MB
# always set this lower than your physical RAM!
# indexer-num-threads = 8 # default value is system thread concurrency
@@ -32,7 +31,6 @@ method = "one-vs-all"
[classifier.base]
method = "sgd"
loss = "hinge"
prefix = "sgd-model"

[lda]
inference = "gibbs"
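With `forward-index` and `inverted-index` replaced by a single `index` key, the changelog asks users to move existing index directories into `fwd` and `inv` subdirectories of the new index directory. The same migration can be sketched in C++17 with `<filesystem>`; the `migrate_index` helper is hypothetical, and the example directory names mirror the old and new `ceeaus` values in `config.toml`.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: moves an old forward/inverted index pair into the
// combined layout <index>/fwd and <index>/inv.
// Returns false without changing anything if either source directory is
// missing or the destination already exists. fs::rename may still throw
// fs::filesystem_error if the move itself fails.
bool migrate_index(const fs::path& old_fwd, const fs::path& old_inv,
                   const fs::path& index)
{
    if (!fs::is_directory(old_fwd) || !fs::is_directory(old_inv)
        || fs::exists(index))
        return false;

    fs::create_directory(index);        // mkdir <index>
    fs::rename(old_fwd, index / "fwd"); // mv <old_fwd> <index>/fwd
    fs::rename(old_inv, index / "inv"); // mv <old_inv> <index>/inv
    return true;
}
```

For the bundled configuration this corresponds to `migrate_index("ceeaus-fwd", "ceeaus-inv", "ceeaus")`, with the paths resolved from the same working directory the old values were.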
2 changes: 1 addition & 1 deletion deps/meta-cmake
2 changes: 1 addition & 1 deletion deps/meta-stlsoft
2 changes: 1 addition & 1 deletion include/meta/corpus/corpus.h
@@ -78,7 +78,7 @@ class corpus
/**
* @return the corpus' metadata schema
*/
virtual metadata::schema schema() const;
virtual metadata::schema_type schema() const;

/**
* Destructor.
2 changes: 1 addition & 1 deletion include/meta/corpus/file_corpus.h
@@ -58,7 +58,7 @@ class file_corpus : public corpus
/**
* @return the metadata schema for this corpus
*/
metadata::schema schema() const override;
metadata::schema_type schema() const override;

private:
/// the current document we are on
2 changes: 1 addition & 1 deletion include/meta/corpus/libsvm_corpus.h
@@ -55,7 +55,7 @@ class libsvm_corpus : public corpus

uint64_t size() const override;

metadata::schema schema() const override;
metadata::schema_type schema() const override;

private:
/// The current document we are on
16 changes: 12 additions & 4 deletions include/meta/corpus/metadata.h
@@ -62,9 +62,9 @@ class metadata

// I want the below to be a const field_info, but g++ gives a cryptic
// compiler error in that case... clang++ accepts it just fine. -sigh-
using schema = std::vector<field_info>;
using schema_type = std::vector<field_info>;

metadata(const char* start, const schema& sch)
metadata(const char* start, const schema_type& sch)
: schema_{&sch}, start_{start}
{
// nothing
@@ -124,6 +124,14 @@ class metadata
return util::nullopt;
}

/**
* Returns the schema for this metadata object.
*/
const schema_type& schema() const
{
return *schema_;
}

/**
* Tagged union to represent a single metadata field.
*/
@@ -303,7 +311,7 @@ class metadata
};

/// pointer to the metadata_file's schema
const schema* schema_;
const schema_type* schema_;

/// the start of the metadata within the metadata_file
const char* start_;
@@ -314,7 +322,7 @@ class metadata
* @param config The configuration group that specifies the metadata
* @return the corresponding metadata::schema object.
*/
metadata::schema metadata_schema(const cpptoml::table& config);
metadata::schema_type metadata_schema(const cpptoml::table& config);

/**
* Exception class for metadata operations.
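The rename from `schema` to `schema_type` appears to be what makes room for the new `schema()` accessor: C++ does not allow a class to declare a member type alias and a member function with the same name. A trimmed-down sketch of the resulting shape, using a hypothetical `metadata_view` class with a one-field `field_info` rather than MeTA's real `metadata`:

```cpp
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for a metadata class; not MeTA's
// actual definition.
class metadata_view
{
  public:
    struct field_info
    {
        std::string name;
    };

    // If this alias were still named `schema`, the member function
    // `schema()` below would conflict with it and fail to compile.
    using schema_type = std::vector<field_info>;

    explicit metadata_view(const schema_type& sch) : schema_{&sch}
    {
    }

    // Accessor mirroring the one added in this commit.
    const schema_type& schema() const
    {
        return *schema_;
    }

  private:
    // Non-owning pointer: the schema must outlive this object.
    const schema_type* schema_;
};
```

Storing a pointer rather than a copy keeps each object small at the cost of requiring the schema to outlive it, which matches the `const schema_type* schema_` member in the diff.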
6 changes: 3 additions & 3 deletions include/meta/corpus/metadata_parser.h
@@ -32,7 +32,7 @@ class metadata_parser
* @param filename The name of the file to parse
* @param schema The schema to parse the file with
*/
metadata_parser(const std::string& filename, metadata::schema schema);
metadata_parser(const std::string& filename, metadata::schema_type schema);

/**
* @return the metadata vector for the next document in the file
@@ -42,14 +42,14 @@ class metadata_parser
/**
* @return the schema for the metadata in this file
*/
const metadata::schema& schema() const;
const metadata::schema_type& schema() const;

private:
/// the parser used to extract metadata
io::mifstream infile_;

/// the schema for the metadata being extracted
metadata::schema schema_;
metadata::schema_type schema_;
};
}
}
25 changes: 25 additions & 0 deletions include/meta/hashing/hash.h
@@ -181,6 +181,14 @@ template <class HashAlgorithm, class T1, class T2, class... Ts>
void hash_append(HashAlgorithm& h, const T1& first, const T2& second,
const Ts&... ts);

template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v);

template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<!is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v);

// begin implementations for hash_append

template <class HashAlgorithm, class T, std::size_t N>
@@ -258,6 +266,23 @@ hash_append(HashAlgorithm& h, const std::basic_string<Char, Traits, Alloc>& s)
hash_append(h, s.size());
}

template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v)
{
h(v.data(), v.size() * sizeof(T));
hash_append(h, v.size());
}

template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<!is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v)
{
for (const auto& val : v)
hash_append(h, val);
hash_append(h, v.size());
}

template <class HashAlgorithm, class T1, class T2, class... Ts>
void hash_append(HashAlgorithm& h, const T1& first, const T2& second,
const Ts&... ts)
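The two new `hash_append` overloads for `std::vector` dispatch on `is_contiguously_hashable<T>`: vectors of contiguously hashable elements are fed to the hasher as one byte range, while other element types are hashed one at a time, and both then append the size. The self-contained sketch below reproduces that dispatch outside MeTA. The `fnv1a` hasher, the simplified `is_contiguously_hashable` trait (integral types only), and the `digest` helper are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in HashAlgorithm: 64-bit FNV-1a fed with raw byte ranges.
struct fnv1a
{
    std::uint64_t state = 14695981039346656037ULL;

    void operator()(const void* data, std::size_t len)
    {
        auto bytes = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < len; ++i)
        {
            state ^= bytes[i];
            state *= 1099511628211ULL;
        }
    }
};

// Simplified trait: only integral types count as contiguously hashable.
template <class T>
struct is_contiguously_hashable : std::is_integral<T>
{
};

// Contiguously hashable scalars: hash the object's bytes directly.
template <class HashAlgorithm, class T>
typename std::enable_if<is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const T& t)
{
    h(&t, sizeof(t));
}

// Strings: hash the characters, then the length.
template <class HashAlgorithm, class Char, class Traits, class Alloc>
void hash_append(HashAlgorithm& h,
                 const std::basic_string<Char, Traits, Alloc>& s)
{
    h(s.data(), s.size() * sizeof(Char));
    hash_append(h, s.size());
}

// Vectors of contiguously hashable elements: one call over the buffer.
template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v)
{
    h(v.data(), v.size() * sizeof(T));
    hash_append(h, v.size());
}

// Other vectors: hash each element in order, then the length.
template <class HashAlgorithm, class T, class Alloc>
typename std::enable_if<!is_contiguously_hashable<T>::value>::type
hash_append(HashAlgorithm& h, const std::vector<T, Alloc>& v)
{
    for (const auto& val : v)
        hash_append(h, val);
    hash_append(h, v.size());
}

// Convenience wrapper: hash one value with a fresh hasher state.
template <class T>
std::uint64_t digest(const T& value)
{
    fnv1a h;
    hash_append(h, value);
    return h.state;
}
```

Appending each length is what keeps `{"ab", "c"}` and `{"a", "bc"}` from producing the same byte stream: every string contributes its own size after its characters, and the vector contributes its element count last.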
5 changes: 0 additions & 5 deletions include/meta/index/disk_index.h
@@ -31,11 +31,6 @@ class string_list;
class vocabulary_map;
}

namespace tokenizers
{
class tokenizer;
}

namespace util
{
template <class>