Natural Language Processing

The new generation of Natural Language Processing solutions requires a comprehensive approach to managing Big Data in any environment. In today’s global economy, information comes not only from sources in many different languages but also from sources that use multiple languages within a single information element.

Natural Language Processing Software Products

Rosoka Natural Language Processing software is available as an entity extraction engine, a language identification engine, a cloud service, and a complete product suite with full capabilities. Geotagging and geospatial analysis are available in Rosoka Extraction and Rosoka Toolkit, or in a complete product bundle with Rosoka GeoGravy.


Rosoka Extraction is a multilingual, API-driven NLP engine that performs entity and relationship extraction with integrated geotagging. It is accessed through a Java API or a REST web service, and results are output in XML, JSON, or POJO form. Users must have their own separate data store to persist results. Rosoka Extraction ships with the Core Rosoka LxBase; however, a different LxBase can be deployed with this product. Rosoka Toolkit is needed to modify or enhance the Core Rosoka LxBase. The intended Rosoka Extraction user is a tech-savvy developer familiar with API integration programming who wants to plug extraction into a production pipeline. It is also an appropriate product for OEM integration. Rosoka Extraction provides the basic workflow for processing documents: document retrieval, automatic routing based on licenses, tokenization, rule checking, output formatting, and many other utilities.
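
To make the tokenization, rule checking, and output formatting steps more concrete, the following is a minimal, generic sketch in Python. It is not the Rosoka Java API or REST interface, and it does not reflect the Rosoka output schema; the lexicon, the function names (tokenize, extract, to_json), and the JSON field names are all assumptions made for illustration.

```python
import json
import re

# Hypothetical lexicon mapping surface forms to entity types (illustrative only).
LEXICON = {"paris": "PLACE", "acme": "ORGANIZATION"}


def tokenize(text):
    """Split text into word tokens paired with their character offsets."""
    return [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]


def extract(text):
    """Tokenize the text, check each token against the lexicon, collect matches."""
    entities = []
    for token, offset in tokenize(text):
        entity_type = LEXICON.get(token.lower())
        if entity_type:
            entities.append({"text": token, "type": entity_type, "offset": offset})
    return entities


def to_json(text):
    """Format the extraction results as a JSON string (assumed field names)."""
    return json.dumps({"entities": extract(text)}, ensure_ascii=False)
```

Calling to_json("Acme opened an office in Paris.") returns a JSON document with one ORGANIZATION entity and one PLACE entity, each carrying its character offset. A production engine would add language identification, relationship extraction, and geotagging on top of a skeleton like this.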


Rosoka Toolkit is a development tool for the data scientist. A data scientist can create and modify entity types, relationship definitions, lexicons, character-based regex rules, and semantic vector regex rules, and can maintain quality control with regression testing. Rosoka Toolkit also lets the data scientist create domain-specific document sets (corpora) and corpora baselines through an integrated results store. The output of Rosoka Toolkit is a Rosoka LxBase, which can be deployed with any of the other Rosoka products that require an LxBase. The intended Rosoka Toolkit user is a knowledge engineer or data scientist who needs to modify or maintain a Rosoka LxBase.
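
The idea of regression testing rules against a corpus baseline can be sketched in a few lines of Python. This is a generic illustration and not how Rosoka Toolkit or an LxBase is implemented; the two regex rules, the in-memory corpus and baseline dictionaries, and the function names are hypothetical stand-ins for a rule set and a results store.

```python
import re

# Hypothetical character-based rules: entity type -> compiled regular expression.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def apply_rules(text, rules=RULES):
    """Return a sorted list of (entity_type, matched_text) pairs found in text."""
    found = [
        (entity_type, match.group())
        for entity_type, pattern in rules.items()
        for match in pattern.finditer(text)
    ]
    return sorted(found)


def regression_diff(corpus, baseline, rules=RULES):
    """Compare current rule output with a stored baseline, document by document.

    Returns {doc_id: {"missing": [...], "unexpected": [...]}} for every
    document whose results differ from the baseline.
    """
    diffs = {}
    for doc_id, text in corpus.items():
        current = set(apply_rules(text, rules))
        expected = set(baseline.get(doc_id, []))
        if current != expected:
            diffs[doc_id] = {
                "missing": sorted(expected - current),
                "unexpected": sorted(current - expected),
            }
    return diffs
```

An empty result from regression_diff means the modified rules still reproduce the baseline. Any entry in the result points at a document where a rule change removed an expected entity or introduced a new one, which is the signal a rule author would review before accepting the change.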


Rosoka Analyst is a guided analytics tool for use with Rosoka Extraction or Rosoka Extraction Plus. Extraction results are tabulated and resolved across a document collection, allowing users to explore and visualize their data. Results can be viewed in several different visualization styles, with the ability to pivot between views and to run full-text searches. Visualizations can be saved as PNG image files for easy embedding in reports. The intended Rosoka Analyst user is an analyst or researcher who wants to explore extracted data in an easy and intuitive way without having to integrate additional visualization and analytical tools.
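
As a rough illustration of what tabulating and resolving results across a document collection involves, the Python sketch below aggregates per-document entity mentions into one row per entity. It is not Rosoka Analyst's method: the input shape, the tabulate function, and the naive resolution step (case-folding and whitespace normalization) are assumptions chosen to keep the example short.

```python
from collections import Counter, defaultdict


def tabulate(results):
    """Aggregate per-document extraction results across a collection.

    `results` maps a document id to a list of (entity_type, text) pairs.
    Mentions are resolved naively by case-folding and collapsing whitespace.
    """
    mention_counts = Counter()
    documents = defaultdict(set)
    for doc_id, entities in results.items():
        for entity_type, text in entities:
            key = (entity_type, " ".join(text.split()).casefold())
            mention_counts[key] += 1
            documents[key].add(doc_id)
    # One row per resolved entity: type, normalized text, mentions, documents.
    rows = [
        (entity_type, name, count, len(documents[(entity_type, name)]))
        for (entity_type, name), count in mention_counts.items()
    ]
    # Most frequently mentioned entities first, ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))
```

Each returned row holds the entity type, the normalized name, the total number of mentions, and the number of distinct documents that mention it. A table of this kind is the sort of input a chart or pivot view would be built from.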