diff --git a/README.4bin b/README.4bin deleted file mode 100644 index dfd4554..0000000 --- a/README.4bin +++ /dev/null @@ -1,107 +0,0 @@ - - Apache UIMA C++ (Unstructured Information Management Architecture) v3.0.0 - ------------------------------------------------------------------------- - -Getting Started ----------------- - -Apache UIMA C++ can be used as a standalone framework, but it is primarily intended to be integrated with the Apache UIMA Java framework. Interoperability is enhanced if the uimacpp SDK package is installed directly under the top level directory of the Apache UIMA Java framework. - -For information about the Apache UIMA C++ package go to index.html in the uimacpp/docs directory. For more information about Apache UIMA, go to http://uima.apache.org, or to the documentation in the Apache UIMA Java package. - - -Supported Platforms --------------------- - -The Apache UIMA C++ SDK has been built and tested in 64-bit mode on Linux systems with gcc version 4.8.5. MacOS and Windows versions are delayed pending user requests. - -UIMACPP has dependencies on APR, ICU, Xerces-C and optionally ActiveMQ-cpp libraries. ActiveMQ-cpp has a dependency on APR-UTIL. - -This UIMA C++ SDK has been built with the following versions of these dependencies: -- APR 1.6.5 -- ICU 50.2 -- XERCES 3.1.4 -- ACTIVEMQ CPP 3.9.3 -- APR-UTIL 1.6.1 - - -Environment Variables ----------------------- - -The following environmental variables are needed for UIMA C++ to function properly. - - * UIMACPP_HOME should point to the uimacpp directory of your unpacked Apache UIMA C++ - distribution. UIMACPP_HOME is used when compiling & linking UIMA C++ components. - * Append $UIMACPP_HOME/bin to your PATH to pick up the runAECpp test driver and - deployCppService utility. - * Append $UIMACPP_HOME/lib to your LD_LIBRARY_PATH (Linux) or DYLD_LIBRARY_PATH (MacOSX) - so that the necessary shared libraries can be found. 
- -Also note that UIMA C++ annotators are built as shared libraries, so they must be in a directory in the LD_LIBRARY_PATH, DYLD_LIBRARY_PATH or PATH (as appropriate to your platform) as well. An example of this is given in the next section. - -For better runtime integration between Java and C++, the Apache UIMA Java SDK command line utilities and Eclipse run configurations automatically add $UIMA_HOME/uimacpp/lib to LD_LIBRARY_PATH and DYLD_LIBRARY_PATH, and add $UIMA_HOME/uimacpp/bin to PATH. - - -Verifying Your Installation ----------------------------- - -The procedure here is to first test that Apache UIMA C++ is installed and operating correctly. Then if desired, check if the code is interoperating properly with Apache UIMA Java. - -Set up the environment as described above. Go to $UIMACPP_HOME/examples/src to build the sample code and add the src directory to the appropriate path as follows. - -On Linux: (Please see below Notes on lib/base.mak) - * make -f DaveDetector.mak - * LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd` - -On Windows: -From a MSVC Command Prompt: - * devenv DaveDetector.vcproj /build release - * PATH=%PATH%;%CD% - -To test the sample code in the C++ environment, change back to the $UIMACPP_HOME/examples directory and run: - * runAECpp descriptors/DaveDetector.xml data - -The console should show that a Dave was found in some of the files in the data directory. - -To test interoperability with Java using the JNI, enable UIMA-AS by setting UIMA_HOME and adding $UIMA_HOME/bin to PATH. 
Then use the runAE.sh utility (use runAE on Windows) to run DaveDetector from the $UIMACPP_HOME/examples: - * runAE.sh descriptors/DaveDetector.xml data - -To test interoperability with UIMA-AS, continuing with above configuration: - * Start and ActiveMQ broker using the startBroker.sh command - * Build and deploy the uimacpp MeetingAnnotator component - * cd $UIMACPP_HOME/examples/tutorial/src and run "make -f MeetingAnnotator.mak" - * add this directory to C++ library path: "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD" - * deploy the service: - deployCppService $UIMACPP_HOME/examples/tutorial/descriptors/MeetingAnnotatorCPP.xml MeetingAnnotator - * deploy MeetingDetectorTAE service that uses this meeting annotator service: - UIMA_DATAPATH=$UIMA_HOME/examples/descriptors deployAsyncService.sh \ - $UIMACPP_HOME/examples/tutorial/descriptors/Deploy_MeetingDetectorTAE_RemoteMeeting.xml - * Send work to the two services: - runRemoteAsyncAE.sh tcp://localhost:61616 MeetingDetectorTaeQueue \ - -c $UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml - - -For more information about the C++ sample code see $UIMACPP_HOME/examples/readme.html. - -For more information about UIMA C++ $UIMACPP_HOME/RELEASE_NOTES.html. - -Notes on lib/base.mak ---------------------- -The annotator and application Make files include lib/base.mak. - -On Mac OS X, it is necessary to modify base.mak -1) Change APR, ICU and XERCES-C includes - In line - INCLUDES=-I$(UIMACPP_HOME)/include -I$(UIMACPP_HOME)/include/apr-1 - set the correct path to the APR, ICU and XERCES-C include directories. For example: - INCLUDES=-I$(UIMACPP_HOME)/include -I/usr/local/opt/apr/libexec/include/apr-1 -I/usr/local/opt/icu4c/include -I/usr/local/opt/xerces-c/include -2) Add APR, ICU and XERCES-C libraries - In line - LIBRARIES=-L$(UIMACPP_HOME)/lib - include the path to the APR, ICU and XERCES-C lib directories. 
For example: - LIBRARIES=-L$(UIMACPP_HOME)/lib -L/usr/local/opt/apr/libexec/lib -L/usr/local/opt/icu4c/lib -L/usr/local/opt/xerces-c/lib -3) Change the compiler - If you have issues with the default Mac OS X GCC, install GCC-5 and change the line - CC=g++ - to - CC=g++-5 diff --git a/README.4src b/README.4src deleted file mode 100644 index 6a1e540..0000000 --- a/README.4src +++ /dev/null @@ -1,347 +0,0 @@ - -See the LICENSE file for licensing information. - --------------------------------- -The Apache UIMA C++ SDK --------------------------------- -1. General -2. Building on Unix -3. Building on Windows -4. Building, testing and packaging on Mac OSX -5. Building the dependencies -6. UIMA C++ Release Compatibility - ----------------- -1. General ----------------- - -UIMACPP enables analytics written in C++, Python, Perl and Tcl to be -easily integrated into the UIMA Java framework. - -The Apache UIMA C++ SDK contains UIMACPP as well as all prerequisite -components. The SDK is intended to enable user's annotator code -be packaged for integration into a UIMA pipeline and to make it -easier to redistribute UIMACPP-based applications. This also allows -different UIMACPP annotators, each using different versions of UIMACPP, -to be integrated into a common UIMA pipeline on the same machine. For this -reason the SDK contains UIMACPP as well as all prerequisite components. - -To build the SDK, it is recommended that the prerequisites be built from source. -The SDK build process must be configured with the location of the install directory for each -of the dependencies. It locates the headers and libraries of the dependencies relative -to the specified location. - -Alternatively, on Linux, UIMACPP can be built and installed on a machine along -with all its prerequisites in the standard system directories. In this case the -prerequisites can be installed from binary distributions. 
- -UIMACPP runtime prerequisites are: - -3rd Party Recommended Version -------------------------------- -APR >= 1.6.5 -ICU == 50.2 -Xerces-C >= 3.1.4 -Java SDK >= 1.8 -ActiveMQ CPP >= 3.9.3 (optional) to build the UIMA-AS service wrapper -APR-util >= 1.6.1 (optional) an ActiveMQ prerequisite - - -On Linux the following GNU build tools are required: - -Tool Recommended Version -------------------------------- -autoconf >= 2.69 -automake >= 1.13.4 -libtool >= 2.4.2 -g++ >= 4.8.5 - -On Windows, UIMACPP is built with Microsoft Visual Studio. The UIMA C++ MSVC projects -are Microsoft Visual Studio 8 (2005) projects. - -The SDK build process also builds the documentation and requires doxygen for building -the documentation. - -The Apache UIMA C++ SDK has been built and tested in 64-bit mode on Linux systems with gcc version 4.8.5; -Mac OSX 64-bit and Windows versions are currently delayed. - -For up-to-date build instructions. please see these pages on the Apache UIMA site: -https://uima.apache.org/dev-quick.html -https://uima.apache.org/doc-uimacpp-build.html - -------------------------- -2. Building on Unix/Linux -------------------------- - -To build an SDK it is recommended that all prerequisite components, APR, ICU, Xerces-C, -ActiveMQ-cpp and APR-util be built from source and installed by running "make -install". They need not be installed in system directories. - -Prerequisite installs from compatible binary distributions can be used to build -the UIMACPP components. At runtime, only one version of ICU can be loaded. Some -binary distributions of Xerces-C are built with ICU and have been reported to have -problems. In this case, rebuild Xerces-C from source and specify --without-icu argument -to the configure script. - -There is also a dependency on JNI headers from an installed Java development package. 
- - -Build Steps ------------ - -Configure and build - Generate a configure script compatible with the build machine tool level: - ./autogen.sh - - Generate the required makefiles and build UIMACPP components: - ./configure --with-jdk=location_of_jni.h [other options, see below] - make - - By default the install step will put UIMACPP components in - /usr/local/uimacpp. To build an SDK, install them elsewhere by adding - the following option to configure: - --prefix=install_target_loc e.g. --prefix=~/uimacpp-3.0.0/install - - On Linux jni.h is usually in the JDK's "include" directory. On MacOSX - jni.h is in the JDK's "Headers" dirctory. In some cases jni.h will - #include files in other directories. For example, if jni.h includes - jni_md.h located in a subdirectory named linux, use - --with-jdk="loc_of_jdk/include' -I'loc_of_jdk/include/linux" - - If the prerequisites are not installed in the system directories as well as - for a full SDK build, additional parameters must be provided to the - configure script: - For a full SDK build, all of the following are needed: - --with-apr=loc_of_apr_install --with-icu=loc_of_icu_install \ - --with-xerces=loc_of_xerces_install --with-activemq=loc_of_amq_install \ - --with-apr-util=loc_of_apr-util_install - - For a build of UIMACPP without UIMA-AS support, specify the option - --without-activemq and leave out --with-activemq and --with-apr-util. - - For more help on how to customize the build configuration, run: - ./configure --help - -Run the test suite - make check - -Install and Build the SDK tree - make install - make docs - make sdk TARGETDIR=loc_of_sdk_tree [CLEAN=clean] - - The SDK tree will be created in loc_of_sdk_tree/uimacpp. Package by: - cd loc_of_sdk_tree - tar czf uimacpp-X.Y.Z-bin.tgz uimacpp - -For additional information on building on Mac OSX, please see Section 4. - - ----------------------- -3. 
Building on Windows ----------------------- - -To build an SDK all prerequisite components, APR, ICU, Xerces-C, -ActiveMQ-cpp, APR-util and APR-iconv must first be built, and a -JDK installed. The location of the dependencies must be set in -environment variables APR_HOME, ICU_HOME, XERCES_HOME, ACTIVEMQ_HOME -and JAVA_HOME. - -For details on building the dependencies, please see section 5.2 of -this document. - -In order to be able to build annotators, the SDK must be built as -described below in step 4, since the annotator projects use the -environment variable UIMACPP_HOME to locate the UIMA libraries -and dependencies. - -If using MSVC Express Edition, first run these prebuild steps. - - cd uimacpp-X.Y.Z\src - - run vcexpress uimacpp.sln and do any conversions as prompted. - - replace devenv command with vcexpress in the winmake and test/fvt.sh scripts and in all build instructions. - - continue with the instructions below. - -If using a newer version of MSVC, the uimacpp.sln in uimacpp-X.Y.Z\src -must be converted. - -The following commands assume you are running from a Microsoft Visual -Studio 2005 Command Prompt. - -1 Build the UIMA C++ framework in both release and debug: - cd \uimacpp-X.Y.Z\src - winmake /build release - winmake /build debug - -2 Build and run the test suite: - cd \uimacpp-X.Y.Z\src\test - devenv test.sln /build release - fvt - -3 Build the documentation: - Note: The documentation build requires Doxygen 1.3.6 or later. - cd \uimacpp-X.Y.Z\docs - builddocs - -4 Build the SDK tree: - set MSVCRT_HOME to the directory with the required msvc*.dll files. - set ACTIVEMQ_HOME if building the ActiveMQ service wrapper, deployCppService. 
- - cd \uimacpp-X.Y.Z - buildsdk "target_dir [clean]" - -5 Package the SDK zipfile by creating a compressed folder of - target_dir\uimacpp into uimacpp-X.Y.Z-bin.zip - -6 Package a source zipfile by creating a compressed folder of the - the directory containing the uimacpp source from git - - ----------------------------------------------- -4. Building, testing and packaging on Mac OSX: ----------------------------------------------- -Except for note below, building is the same here as outlined -in Building on Unix (section 2). - -Make sure you have built and installed in your system all prerequisites, -as stated in General (section 1). - - -4.1 Patch APR -------------- - -For the Intel-based Mac OSX machines we have tested with, the APR function -to dynamically load shared libraries does not respect DYLD_LIBRARY_PATH. - -A fix is to patch dso/unix/dso.c as follows: - -26a27,31 ->#if defined(DSO_USE_DYLD) ->#define DSO_USE_DLFCN ->#undef DSO_USE_DYLD ->#endif -> - - -4.2 Re-generated configure --------------------------- - -In Mac OSX it is necessary to re-generate the configure file: - - ./autogen.sh - - -4.3 Executing configure ------------------------ - -Mac OSX Sierra default GCC is not capable of building UIMACPP. As alternative, you can -install GCC-5 and pass it as an agument in configure command: - - CC=gcc-5 CXX=g++-5 ./configure - -Make sure to pass the JDK Hearder path (--with-jdk) and the path of each dependency -(--with- syntax). - - -4.4 Packaging UIMA C++ annotators: -On Mac OSX, the install names are embedded in the binaries. 
Run the -following steps manually post build to neutralize the embedded name in -the UIMA C++ binary and to change the dependency path in the -annotator: - -1) changing the install name in libuima, to neutralize it: - -install_name_tool -id libuima.dylib $UIMACPP_HOME/install/lib/libuima.dylib - -2) changing the dependency path in the annotator: - -install_name_tool -change "/install/lib/libuima.dylib" -"/absolute_path_to_uimacpp_home/install/lib/libuima.dylib" MyAnnotator.dylib - - ----------------------------------------------------------------------------- -5. Building the dependencies: APR et al, ICU, Xerses-c and Activemq-cpp ----------------------------------------------------------------------------- - -Download and build information for these libraries are at: - APR - http://apr.apache.org/ - ICU - http://www.icu-project.org/ - XERCES - http://xml.apache.org/xerces-c/ - ACTIVEMQ - http://activemq.apache.org/cms/download.html - -ACTIVEMQ CPP library version 3.2 or higher is required to support -the ActiveMQ failover protocol and to support multi-byte payload data. -ACTIVEMQ CPP 3.2 and higher has a dependency on APR at version 1.3.8 -or higher and APR-util 1.3.8. (On Windows APR-util requires APR-iconv) - - -5.1 Building Dependencies on Unix/MacOSX ----------------------------------------- -The directions for these components is straightforward. The UIMACPP -build expects to find headers in install_loc/include and libraries -in install_loc/lib. - - -5.2 Building Dependencies on Windows: -------------------------------------- -The build of dependent libraries on Windows is less consistent. -The APR components must be checked out and built in parallel -directories (see apr.apache.org) and the libraries are expected -to be located relative to %APR_HOME%. -ActiveMQ libraries are in %ACTIVEMQ_HOME%\vs2008-build\ReleaseDLL -and the headers are expected in %ACTIVEMQ_HOME%\src\main. 
- -On Windows, buildsdk command tries to copy the msvc*.dll runtime libs from -C:\Program Files\Microsoft Visual Studio8\VC\redist\x86\Microsoft.VC80.CRT -To override the location for MSCV redistributable libraries, use MSVCRT_HOME. - -ActiveMQ-CPP - The UIMA C++ MSVC projects are Microsoft Visual Studio -8 (2005) projects. The ActiveMQ CPP source distribution comes with -MSVC 8 (2008) project. These can be down converted to MSVC 2005 by -following these step reproduced from -http://stackoverflow.com/questions/609419/how-do-i-downgrade-a-c-visual-studio-2008-project-to-2005 - -Put the following sed script in a file called downgrade_vc9_to_vc8.sed : -s#Version=\"9.00\"#Version=\"8.00\"#g -s#9.0.21022#8.0.50727#g -s#v2.0##g -s# ToolsVersion=\"3.5\"##g -s#MSBuildToolsPath#MSBuildBinPath#g - -Run -sed.exe -f downgrade_sln_vc9_to_vc8.sed vs2008-build/activemq-cpp.vcproj > vs2008-build/activemq-cpp2005.vcproj - -The only activemq-cpp target needed by uimacpp is ReleaseDLL, - e.g. devenv vs2008-build/activemq-cpp2005.vcproj /build ReleaseDLL - -The three APR libraries can be built by launching aprutil.dsw and -building libaprutil or by following the instructions in Makefile.win. - -XERCES and ICU -Binary distributions are available for Xerces and ICU. -Use only those built with a compatible version of Visual Studio! -Currently the SDK uses xerces-c_2 so if a higher version is -installed the MSVC project files must be edited. - - ----------------------------------- -6. UIMA C++ Release Compatibility ----------------------------------- -There are two distinct features of UIMA C++ to consider when dealing -with release compatibility: - -- The framework dynamically loads annotators which are user code. The - annotators make calls to UIMA C++ APIs and are built with some - version of the SDK. A possible scenario is for an application to - run annotators that were built with different releases of UIMA C++ - SDK. 
-- The SDK depends on ICU, XERCES, APR and ACTIVEMQ-CPP and a release - is built with a particular version of these. Binary compatibility - therefore also depends on the compatibility of these underlying - libraries. In particular, ICU and XERCES encode the major and - minor release numbers in the APIs which restricts binary - compatibility across releases of these libraries. An application - running UIMA C++ is restricted to running one version of the ICU - library in a process and all annotators and underlying libraries - must use the same ICU version. - -In general, different UIMACPP releases are not binary compatible. diff --git a/README.md b/README.md index 63bf3ab..dc31457 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,20 @@ Apache UIMA C++ SDK =================== +The UIMA C++ framework is currently undergoing a number of enhancements to allow for full standalone pipelines written in C++ or in supported scripting languages. As such, there is no available distribution and there are some [major enhancements](https://github.com/apache/uima-uimacpp/issues/6) being [worked on](https://cwiki.apache.org/confluence/display/COMDEV/GSoC+2024+Ideas+list#GSoC2024Ideaslist-UIMA). If interested in contributing, contact the [current maintainer](https://github.com/DrDub). + + What is the UIMA C++ SDK? ------------------------- -The UIMA C++ framework is designed to facilitate the creation of UIMA compliant Analysis Engines (AE) from analytics written in C++, or written in languages that can utilize C++ libraries. The UIMACPP SDK directly supports C++, and indirectly supports Perl, Python and Tcl languages via SWIG (https://www.swig.org/). Existing analytic programs in any of these languages can be wrapped with a UIMACPP annotator and integrated with other UIMA compliant analytics or UIMA-based applications. 
- -![uimaFIT?](docs/images/framework-core.png) +The UIMA C++ framework is designed to facilitate the creation of UIMA compliant Analysis Engines (AE) from analytics written in C++, or written in languages that can utilize C++ libraries. The UIMACPP SDK directly supports C++, and indirectly supports Perl and Python languages via SWIG (https://www.swig.org/). Existing analytic programs in any of these languages can be wrapped with a UIMACPP annotator and integrated with other UIMA compliant analytics or UIMA-based applications. -A UIMA C++ AE can be used anywhere a UIMA Java AE can be used, for example, as a delegate in an aggregate AE, or as a UIMA service (using JMS, Vinci or SOAP protocols). When used in the Java framework, by default a C++ AE is instantiated and called via the JNI, running as part of the JVM process. This is also true for Vinci and SOAP services. For JMS services, the UIMACPP SDK includes a native service wrapper compatible with UIMA-AS. +![Framework Core](docs/images/framework-core.png) The UIMA C++ framework supports testing and embedding UIMA components into native processes. A UIMA C++ test driver, `runAECpp`, is available so that UIMA C++ components can be fully developed and tested in the native environment, no use of Java is needed. -UIMA C++ includes APIs to parse component descriptors, instantiate and call analysis engines, so that UIMA C++ compliant AE can be used in native applications. However, UIMA C++ components are primarily intended to be integrated into applications using UIMA's Java-based interfaces. +UIMA C++ includes APIs to parse component descriptors, instantiate and call analysis engines, so that UIMA C++ compliant AE can be used in native applications. The Apache UIMA C++ SDK is Docker-based. For interoperability, UIMA C++ components are expected to be built and distributed against a particular Docker image, thus ensuring correct compiler and dependent library settings. 
+ Building -------- @@ -23,111 +25,62 @@ Checkout the source code as follows: git clone https://github.com/apache/uima-uimacpp.git -UIMACPP runtime prerequisites are APR, ICU, Xerces-C, ActiveMQ-cpp, -APR-Util and a JDK for building the JNI interface. The SDK also -requires doxygen for building the documentation. - -### Building dependencies - -The Apache UIMA C++ SDK has been built and tested in 32-bit mode on Linux systems with gcc version 3.4.6 and on Windows using MSVC version 8. 64-bit builds have only been tested on Linux with gcc 4.3.2 and 4.4.6. - -The UIMA C++ SDK has been built with the following versions of these dependencies: - -- APR 1.3.8 -- ICU 3.6 -- XERCES 2.8.0 -- ACTIVEMQ CPP 3.4.1 -- APR-UTIL 1.3.8 - -If changes are made to `configure.ac` or `Makefile.am`, then configure needs to be re-generated by running `./autogen.sh` in the root of the SVN extract. - -`autogen.sh` requires GNU tools at or above the following versions: automake v1.9.6, autoconf v2.59 and libtool v1.5.24. - -To build the SDK, all prerequisites need to be built from source. -Alternatively UIMACPP can be built and installed on a machine with all the prerequisites available in system directories. -In this case the prerequisites can be installed from binary distributions. - -Download and build information for these libraries are at: - -- APR - http://apr.apache.org/ -- APR-Util - http://apr.apache.org/ -- ICU - http://www.icu-project.org/ -- XERCES - http://xml.apache.org/xerces-c/ -- ACTIVEMQ - http://activemq.apache.org/cms/download.html/ - -ACTIVEMQ CPP library version 3.2 or higher is required to support the ActiveMQ failover protocol and to support multi-byte payload data. ACTIVEMQ CPP 3.2 and higher has a dependency on APR at version 1.3.8 or higher and APR-Util 1.3.8. 
- -### Checking on Unix - -To build and install on a machine with prerequisites available in system directories: - - cd uima-uimacpp - ./configure --with-jdk=location_of_jni.h [other options] - make - make check - -For a full SDK build, +UIMACPP runtime prerequisites are APR, ICU, Xerces-C, APR-Util and a JDK for building the JNI interface. +The SDK also requires doxygen for building the documentation. See the [Dockerfile](Dockerfile) for details. - ./configure --with-apr=loc_of_apr_install --with-icu=loc_of_icu_install --with-xerces=loc_of_xerces_install --with-activemq=loc_of_amq_install --with-apr-util=loc_of_apr-util_install - make install - make sdk TARGETDIR="loc_of_sdk_tree [clean]" -For a build of UIMACPP without UIMA-AS support, specify the option -`--without-activemq`. The options `--with-activemq` and `--with-apr-util` can be left out. +### Building the Docker image -### Building on Windows +The Docker image is built on top of the Debian stable slim image. After cloning the project, run the following in the root directory: -To build an SDK all prerequisite components, APR, ICU, Xerces-C, -ActiveMQ-cpp and APR-Util must first be built on the machine, and a -JDK installed. The location of the dependencies must be set in -environment variables `APR_HOME`, `ICU_HOME`, `XERCES_HOME`, `ACTIVEMQ_HOME`, `APU_HOME` and `JAVA_INCLUDE`. +```bash +sudo docker build . -t apache:uimacpp +``` +This should create an image of 250 MB or more. - cd /myWorkingCopyUimacpp - winmake /build release (or debug) - cd src\test - devenv test.sln /build release - fvt - cd /myWorkingCopyUimacpp/docs - builddocs - buildsdk "target_dir [clean]" +### Testing the Docker image -### Building on OS X (experimental) +The easiest way to test it is by running the Perltator: -These instructions should work on the Max OSX but have not been tested. 
+```bash +mkdir out +sudo docker run --interactive --tty --name uimacppdev \ + --mount type=bind,source="$(pwd)"/examples/data,target=/data \ + --mount type=bind,source="$(pwd)"/out,target=/out \ + apache:uimacpp \ + /usr/local/uimacpp/desc/Perltator.xml /data /out +``` -Except for one problem with APR, building is the same here as on Linux. For the Intel-based Mac OSX machines we have tested with, the APR function to dynamically load shared libraries does not respect DYLD_LIBRARY_PATH. +The `out` folder will be populated by XMI files with the same name as the original files in `data`. -A fix is to patch dso/unix/dso.c as follows: +Other useful Docker commands: - 26a27,31 - >#if defined(DSO_USE_DYLD) - >#define DSO_USE_DLFCN - >#undef DSO_USE_DYLD - >#endif +```bash +sudo docker rm uimacppdev +``` -Packaging UIMA C++ annotators: +To remove an old container. -On Mac OSX, the install names are embedded in the binaries. Run the following steps manually post build to neutralize the embedded name in the UIMA C++ binary and to change the dependency path in the annotator: +```bash +sudo docker run --interactive --tty --name uimacppdev --entrypoint /bin/bash apache:uimacpp +``` -* changing the install name in libuima, to neutralize it: - - install_name_tool -id libuima.dylib $UIMACPP_HOME/install/lib/libuima.dylib -* changing the dependency path in the annotator: +To run a container interactively using `bash`. - install_name_tool -change "/install/lib/libuima.dylib" "/absolute_path_to_uimacpp_home/install/lib/libuima.dylib" MyAnnotator.dylib Examples -------- -The UIMACPP package includes several sample UIMA C++ annotators and a sample C++ application that instantiates and uses a C++ annotator. Please go to the UIMA Download Page and get the "UIMACPP Framework" package for Linux or Windows as appropriate. For best interaoperability with the Java version of UIMA, unpack into the $UIMA_HOME directory. 
See the README file in the top level directory for instructions on testing the package, and follow the links there to the sample code in C++, Perl, Python and Tcl. +The UIMACPP package includes several sample UIMA C++ annotators and a sample C++ application that instantiates and uses a C++ annotator. More details on how to build and run the examples will be available over time. A UIMA C++ annotator descriptor differs from a Java descriptor in the frameworkImplementation, specifying org.apache.uima.cpp -For a C++ annotator, the annotatorImplementationName specifies the name of a dynamic link library. UIMACPP will add the OS appropriate suffix and search the active dynamic libary path: LD_LIBRARY_PATH for Linux, PATH for Windows, and DYLD_LIBRARY_PATH for MacOSX. The suffix is not automatically added when the annotatorImplementationName includes a path. -An annotator library is derived from the UIMACPP class "Annotator" and must implement basic annotator methods. Annotators in Perl, Python and Tcl languages each use a C++ annotator to instantiate the appropriate interpreter, load the specified annotator source and call the annotator methods. +For a C++ annotator, the annotatorImplementationName specifies the name of a dynamic link library. UIMACPP will add the OS-appropriate suffix and search the active dynamic library path (`LD_LIBRARY_PATH` for Linux). The suffix is not automatically added when the annotatorImplementationName includes a path. +An annotator library is derived from the UIMACPP class "Annotator" and must implement basic annotator methods. Annotators in Perl and Python languages each use a C++ annotator to instantiate the appropriate interpreter, load the specified annotator source and call the annotator methods. 
+ UIMACPP Example - Running a C++ analytic in a Native Process @@ -137,46 +90,9 @@ As in UIMA, UIMACPP includes application level methods to instantiate an Analysi `examples/src/ExampleApplication.cpp` is a simple program that instantiates the specified annotator, reads a directory of txt files, and for each file sets the document text in a CAS and calls the AE process method. For annotator development, this program can be modified to create arbitrary CAS content to drive the annotator. Because the entire application is C++, standard tools such as `gdb` or `devenv` can be easily used for debugging. -`runAECpp` is a UIMA C++ application driver modeled closely after the Java tool runAE. Like `ExampleApplication`, this tool can read a directory of text files and exercise the given annotator. In addition, `runAECpp` can take input from XML format CAS files, call the annotator's `process()` method, and output the resultant CAS in XML format files. XML format CAS input files can be created from upstream UIMA components, or created manually with the content needed to develop and unit test an annotator. - -![uimaFIT?](docs/images/uimacppnative.png) - - - -UIMACPP Example - Running a C++ analytic in a JVM Process ---------------------------------------------------------- - -Using the UIMA or UIMA AS packages, a UIMA C++ Analysis Engine can be used anywhere a UIMA Java AE can be used, for example, as a delegate in an aggregate AE, or as a UIMA service (using JMS, Vinci or SOAP protocols). When used in the Java framework, by default a C++ AE is instantiated and called via the JNI, running as part of the JVM process. - -When a UIMA component descriptor specifies the frameworkImplementation as `org.apache.uima.cpp`, UIMA's Java framework instantiates a proxy annotator that transparently creates the UIMACPP component through the JNI. When the process(cas) method is called on the proxy, the CAS is binary serialized through the JNI into the native environment. 
The UIMA C++ annotator operates on the native copy of the CAS, and then the CAS is serialized back to the Java environment. - -There are some limitations to this configuration: - -* When more than one UIMA C++ component is colocated in the JVM, all must share identical versions of the UIMACPP framework. -* Runtime problems in the C++ code can crash the entire JVM process. -* Standard OS parameters for a process, such as program stack size, are different for a JVM process than a native process. -* Debugging native code running in a JVM process can be problematic. - - -![uimaFIT?](docs/images/uimacppthrujni.png) - - -UIMACPP Example - Running a C++ analytic as a Native UIMA AS Service --------------------------------------------------------------------- - -With the UIMA AS package, a UIMA C++ component can be run as a UIMA AS service using the UIMA C++ application `deployCppService`. This application instantiates a UIMA C++ AE from the specified annotator descriptor, and then connects to the specified ActiveMQ broker and input queue. In order to take advantage of multi-core hardware, `deployCppService` supports instantiating multiple copies of the C++ analytic, each in a different thread; this option requires the analytic to be designed for multithreaded operation. - -Once deployed, the service can be utilized from UIMA applications and aggregate analysis engines in exactly the same way as other UIMA AS services written in Java. - -UIMA AS services written in Java are deployed using UIMA Deployment Descriptors. These descriptors, which specify the UIMA component descriptor to instantiate and the connectivity and error handling options, are used by the UIMA utility `deployAsyncService` to launch a Java service. Deployment Descriptors have special support for UIMA C++ services, with the ability to provide lifecycle management, JMX monitoring and integrated logging of C++ native services. 
This support is enabled when the UIMA AS Deployment Descriptor specifies - - +`runAECpp` is a UIMA C++ application driver modeled closely after the Java tool runAE. Like `ExampleApplication`, this tool can read a directory of text files and exercise the given annotator. In addition, `runAECpp` can take input from XML format CAS files, call the annotator's `process()` method, and output the resultant CAS in XML format files. XML format CAS input files can be created from upstream UIMA components, or created manually with the content needed to develop and unit test an annotator. This is the default [entrypoint](docker-entrypoint.sh) for the Docker image. -in which case Java will launch deployCppService as a separate process on the same machine and establish socket connections for logging and monitoring. -Note that in this case the Deployment Descriptor can also specify the environment for the native process using entries such as +![UIMA CPP Native Deployment](docs/images/uimacppnative.png) - /home/user/apache-uima-as/uimacpp/lib -This feature enables multiple UIMA C++ components with different levels of UIMACPP to be managed by the same JVM. -![uimaFIT?](docs/images/deploycppservice.png) diff --git a/docs/images/deploycppservice.png b/docs/images/deploycppservice.png deleted file mode 100644 index 61d1b06..0000000 Binary files a/docs/images/deploycppservice.png and /dev/null differ diff --git a/docs/images/uimacppthrujni.png b/docs/images/uimacppthrujni.png deleted file mode 100644 index 59f5859..0000000 Binary files a/docs/images/uimacppthrujni.png and /dev/null differ