Let's analyze your data

Big Data

The volume and variety of company data are exploding. This is what is now called Big Data, a term that is often used loosely.

It refers to data volumes so large that traditional tools are no longer sufficient, or to data whose variety, velocity and volume make processing with those tools difficult or even impossible.

In short, a Big Data project can be summarized in four verbs:

  • Describe a phenomenon: What? When?
  • Explain this phenomenon: Why?
  • Predict it: What is going to happen if...?
  • Prescribe a course of action: What can we do to...?
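
As a rough illustration of these verbs, here is a minimal Python sketch on toy data. The sales figures, the function names and the decision rule are all hypothetical, chosen only for illustration. "Explain" is not covered: it usually requires domain analysis rather than a formula, although the fitted slope gives a first hint about the trend.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from a real project).
monthly_sales = [100.0, 110.0, 120.0, 130.0]


def describe(values):
    """Describe: summarize what happened."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


def fit_trend(values):
    """Fit a least-squares line y = intercept + slope * x, with x = 0, 1, 2, ..."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return y_bar - slope * x_bar, slope


def predict(values, steps_ahead=1):
    """Predict: extrapolate the fitted trend a few periods ahead."""
    intercept, slope = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


def prescribe(values, target):
    """Prescribe: a toy rule comparing the forecast with a target (assumption)."""
    if predict(values) >= target:
        return "keep current plan"
    return "take corrective action"
```

Calling `describe(monthly_sales)` summarizes the past, `predict(monthly_sales)` extrapolates the next month, and `prescribe(monthly_sales, target=150.0)` turns the forecast into a suggested action. Real projects would replace the straight-line trend with a model suited to the data.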

What's Big Data?

Here are some definitions of Big Data for companies that want to get more value from their data.

Definition in terms of possibilities

  • "An opportunity to gain knowledge on new data and content types" (IBM)
  • "Analyzing data that was previously ignored because of technology limitations" (451 Research), for example:
    • Fraud detection
    • Customer sentiment analysis
    • Study of the human genome

Technical definition

  • A set of tools for tackling problems that cannot be solved on a single machine
  • The logical extension of Business Intelligence

Big Data is generally associated with the implementation of a Data Lake within the company:

  • A single storage space for all the information available in the company
  • Flexibility in how the data can be accessed and queried
  • No strict schema imposed on incoming data flows
  • Beyond storage, the main challenge is to process and transform the data in order to accelerate innovation cycles
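
The absence of a strict schema on incoming flows is often described as "schema on read": records are stored as they arrive, and a structure is applied only when the data is queried. Here is a minimal Python sketch of that idea, in which the sample events, the field names and the `read_with_schema` helper are hypothetical and no actual Data Lake technology is involved.

```python
import json

# Hypothetical raw events, stored as-is: JSON lines whose fields differ per record.
RAW_EVENTS = [
    '{"customer": "A", "amount": "12.50", "channel": "web"}',
    '{"customer": "B", "amount": 7}',
    '{"customer": "C", "comment": "no amount recorded"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` maps field names to casting functions. Records that lack a
    requested field, or whose value cannot be cast, are skipped.
    """
    for line in lines:
        record = json.loads(line)
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue


# Two different "views" over the same raw data, each with its own schema.
sales_view = list(read_with_schema(RAW_EVENTS, {"customer": str, "amount": float}))
customer_view = list(read_with_schema(RAW_EVENTS, {"customer": str}))
```

The same raw records serve both views: the sales view keeps only the records with a usable amount, while the customer view keeps all three. Nothing had to be decided about the structure when the data was stored.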

The Data Lake is the evolution of the Data Warehouse (see BI), but the two remain complementary.

Our Big Data Solutions

To implement our Big Data solutions, we mainly use the Hortonworks software suite, which is built around Hadoop, a framework for the distributed processing of large data sets across clusters of computers.

The tools in this suite include:
  • HDFS: distributed storage of large volumes of files across several machines
  • Hive: distributed processing of structured data using SQL-like queries
  • Sqoop: tool for transferring data from a relational database into the Big Data platform
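
To give an idea of what "distributed processing" means in practice, here is a minimal Python sketch of the map and reduce pattern popularized by Hadoop. It runs in a single process and does not use Hadoop, HDFS or Hive. The text blocks stand in for data split across several machines, and all names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical data blocks, standing in for a file split across several machines.
BLOCKS = [
    "big data big value",
    "data lake data warehouse",
    "big lake",
]


def map_block(block):
    """Map step: count the words of one block, independently of the others."""
    return Counter(block.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Each block could be processed on a different machine; here they run in turn.
partial_counts = [map_block(block) for block in BLOCKS]
word_counts = reduce(reduce_counts, partial_counts, Counter())
```

Because each block is counted without looking at the others, the map step can be spread over as many machines as there are blocks, and only the small partial counts need to be brought together at the end.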

Example of use

Repository of unstructured data (less structured than a data warehouse, but containing more data):

[Figure: Unstructured data repository]

We also use Talend, an ETL tool, for data import and integration.
