Data Collection & Data Fusion

In a Big Data world, meaningful context begins with the right connections.

Big Data is Everywhere

Big Data is everywhere today: in constant streams of data flowing from networked machines, in data warehouses, in legacy apps and mainframes, on the Web... Accordingly, the first challenge in extracting value from Big Data is getting it into a repository where you can exploit it without impacting your existing operations.

To meet this challenge, EXALEAD CloudView offers an advanced Web crawler and exploitable WWW index, and a powerful portfolio of connectors to unstructured and structured Big Data sources inside and outside the enterprise.

The second challenge is integrating multiple data sources in an automated, industrial way to transform raw, heterogeneous data into action-guiding wisdom. To this end, EXALEAD CloudView features a powerful semantic processing pipeline for meaningfully structuring and enriching unstructured content and correlating it with structured data.


Benefits

  • Guarantees non-intrusive, secure and automated data collection
  • Delivers a fully-unified view of information
  • Ensures high performance at Big Data scale

Data Collection

  • Web Content
    With an HTTP crawler honed for performance in the noisy, massively voluminous world of the Web, EXALEAD offers organizations a uniquely powerful and intelligent tool for extracting quality content from the Internet, including structured and unstructured data from secure and open sources (respecting access rules and rights).

    EXALEAD customers can also quickly and easily enrich their databases and applications with high-quality content from EXALEAD's public WWW search engine index, now the world's third largest behind Google and Microsoft's Bing (following Yahoo!'s switch to the Bing search infrastructure).

    Specialized social media connectors further extend EXALEAD's Web data collection capabilities, making it easy to capture relevant information from sources such as Facebook, LinkedIn and Twitter.

  • Enterprise Content
    Packaged CloudView connectors are available for an extensive range of enterprise sources, including file servers, XML systems, databases, email systems, directories, content management and collaboration systems, and the ENOVIA platform.

    EXALEAD's OEM agreement with Informatica extends this connectivity with advanced support for dozens of Big Data sources, including enterprise applications, data warehouses, Business Intelligence platforms, mainframes, NoSQL and Hadoop stores (e.g., HDFS), and real-time message queue data.

  • Custom & Legacy Systems
    EXALEAD's portfolio of packaged connectors is complemented by a public, fully-documented Application Programming Interface (API) for connecting to legacy or custom (bespoke) repositories using standard protocols and languages (HTTP/REST, Java, C#, etc.).

To learn more about EXALEAD CloudView's data capture capabilities, download the EXALEAD Connectors and Formats data sheet.

Data Fusion

While there is much value to be gained from being able to search, explore and analyze individual Big Data collections, the highest potential for breakthrough insights and innovation lies in meaningfully cross-referencing diverse data silos.

With a MapReduce-style processing framework and a high-performance semantic processing pipeline, EXALEAD CloudView is engineered for aggregating heterogeneous Big Data sources. Use it to unearth the hidden meanings and relationships within and between collections comprising:

  • Unstructured content like documents, emails, call recordings and videos
  • Semi-structured data such as XML records and machine data produced by smart meters, RFID readers, barcode scanners, weblogs and GPS tracking units
  • Highly-structured relational data like that housed in transactional databases and data warehouses
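
To make the idea of MapReduce-style fusion concrete, the sketch below joins three tiny sample collections (unstructured emails, semi-structured meter readings and structured account rows) on a shared key. It is a generic illustration in plain Python, not CloudView's API: every name in it (`customer_id`, `map_phase`, `shuffle`, `reduce_phase`, `fuse`) and all sample data are hypothetical, and it assumes the sources already share a common identifier.

```python
from collections import defaultdict

# Hypothetical sample data; field names and values are illustrative only.
emails = [
    {"customer_id": "C1", "text": "Meter keeps resetting after the update."},
    {"customer_id": "C2", "text": "Thanks, billing issue resolved."},
]
meter_readings = [
    {"customer_id": "C1", "kwh": 12.5},
    {"customer_id": "C1", "kwh": 0.0},
    {"customer_id": "C2", "kwh": 8.1},
]
accounts = [
    {"customer_id": "C1", "name": "Acme Corp"},
    {"customer_id": "C2", "name": "Globex"},
]


def map_phase(source_name, records):
    """Emit (key, (source, record)) pairs keyed on the shared customer_id."""
    for record in records:
        yield record["customer_id"], (source_name, record)


def shuffle(pairs):
    """Group emitted values by key, as a map/reduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Merge all records for one key into a single fused view."""
    fused = {"customer_id": key, "name": None, "emails": [], "readings": []}
    for source, record in values:
        if source == "accounts":
            fused["name"] = record["name"]
        elif source == "emails":
            fused["emails"].append(record["text"])
        elif source == "meter_readings":
            fused["readings"].append(record["kwh"])
    return fused


def fuse(sources):
    """Run map, shuffle and reduce over a dict of {source_name: records}."""
    pairs = (
        pair
        for name, records in sources.items()
        for pair in map_phase(name, records)
    )
    return {key: reduce_phase(key, vals) for key, vals in shuffle(pairs).items()}


fused = fuse(
    {"emails": emails, "meter_readings": meter_readings, "accounts": accounts}
)
```

In this sketch, `fused["C1"]` brings together the account name, the complaint email and both meter readings for one customer, which is the kind of cross-silo view described above. A real deployment would also need entity resolution where sources do not share a clean key.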