Acquire, Extract, Analyze Web Data

Empowering you to make data-driven decisions.

Leverage Open Source Datasets

Process and leverage existing datasets for your business objectives:

  • Wikipedia, YAGO, and DBpedia for entity resolution
  • GDELT project for global event and news data
  • GeoNames for resolving addresses to geolocations

Text and Data Analysis

  • Approximate record matching of products, companies, people, etc.
  • Content extraction from HTML, PDF, and other documents
  • Natural language processing
  • Document clustering
  • Image classification using deep learning networks
  • Parsing of postal addresses
  • Parsing of phone numbers
  • Indexing and querying thousands of websites
  • Website technology detection
  • Automatic extraction of tabular data
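
As a rough illustration of approximate record matching, the sketch below compares normalized company names using Python's standard `difflib` module. The helper names (`normalize`, `similarity`, `best_match`) and the 0.8 threshold are assumptions made for this example only; production matching typically adds blocking, domain-specific normalization, and tuned thresholds.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return the most similar candidate, or None if none reaches the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    scored = [(similarity(query, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None
```

For instance, `best_match("ACME Corp.", ["Acme Corp", "Apex Corporation"])` selects `"Acme Corp"`, because both names normalize to the same string.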

Custom Broad Crawls

Extract information from millions of websites and make it actionable and queryable.

We routinely crawl thousands or millions of websites, extracting either standard information or custom details per use case:

  • contact data (postal address, phone, VAT, etc.)
  • high-level information about the company
  • industry classification
  • etc.

Extraction of Structured Data

Extract data from millions of structured websites (e.g. e-commerce):

  • product name
  • manufacturer
  • SKU
  • price
  • etc.
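
A minimal sketch of how fields like these could be pulled from a product page, assuming the page embeds schema.org `Product` markup as JSON-LD (many shops do, but not all, and this is not necessarily the approach used for any given site). It uses only the Python standard library; the names `JsonLdCollector` and `extract_product` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD instead of failing the whole page.


def extract_product(html: str):
    """Return name, manufacturer, SKU, and price of the first Product, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        if not isinstance(block, dict) or block.get("@type") != "Product":
            continue
        manufacturer = block.get("manufacturer")
        if isinstance(manufacturer, dict):
            manufacturer = manufacturer.get("name")
        offers = block.get("offers") or {}
        if isinstance(offers, list):
            offers = offers[0] if offers else {}
        return {
            "name": block.get("name"),
            "manufacturer": manufacturer,
            "sku": block.get("sku"),
            "price": offers.get("price"),
        }
    return None
```

Pages without such markup need other techniques (for example, template-specific selectors), which this sketch does not cover.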