Let's analyze your data
Big Data
The volume and variety of company data are exploding. We now speak of Big Data, a term that is often used loosely.
It describes situations where the data volume is so large that traditional tools are no longer sufficient, and/or where the variety, velocity and volume of the data make processing difficult or even impossible.
In short, a Big Data project can be summarized in four verbs:
- Describe a phenomenon: What? When?
- Explain this phenomenon: Why?
- Predict it: What is going to happen if...?
- Prescribe a course of action: What can we do to...?
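As an illustration only, the four verbs can be sketched on a toy dataset. The sales figures, the function names and the decision rule below are all hypothetical, and the "prediction" is just a naive group average rather than a real model:

```python
from statistics import mean

# Hypothetical daily sales records: (day, promotion_running, units_sold)
sales = [
    (1, False, 100),
    (2, False, 110),
    (3, True, 150),
    (4, True, 160),
    (5, False, 105),
    (6, True, 155),
]

def describe(records):
    """Describe: what happened? Total and average units sold."""
    units = [u for _, _, u in records]
    return {"total": sum(units), "average": mean(units)}

def explain(records):
    """Explain: why? Compare average sales with and without a promotion."""
    with_promo = mean([u for _, p, u in records if p])
    without_promo = mean([u for _, p, u in records if not p])
    return {
        "with_promo": with_promo,
        "without_promo": without_promo,
        "uplift": with_promo - without_promo,
    }

def predict(records, promotion):
    """Predict: what happens if...? Naive estimate taken from the group average."""
    averages = explain(records)
    return averages["with_promo"] if promotion else averages["without_promo"]

def prescribe(records, promo_cost_in_units):
    """Prescribe: what can we do? Run the promotion only if the uplift exceeds its cost."""
    uplift = explain(records)["uplift"]
    return "run promotion" if uplift > promo_cost_in_units else "no promotion"
```

Each function answers one of the questions above. In a real project every step would rely on far richer data and models than this sketch assumes.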
What's Big Data?
Here are some definitions of Big Data for companies that want to add value to their data.
Definition in terms of possibilities
- "An opportunity to gain knowledge on new data and content types" (IBM)
- "Analyzing data that was previously ignored because of technology limitations" (451 Research)
- Fraud detection
- Customer sentiment analysis
- Study of the human genome
Technical definition
- A set of tools for tackling problems that cannot be solved on a single machine
- The logical extension of Business Intelligence
Big Data is generally associated with the implementation of a Data Lake within the company:
- A central storage space for all the information available in the company
- Flexibility in how users interact with the data
- No strict schema imposed on incoming data flows
- Beyond storage, the main challenge is to process and transform the data in order to accelerate innovation cycles
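The absence of a strict schema on incoming flows is often called "schema on read": records are stored as they arrive, and a structure is applied only when they are read. The following is a minimal sketch of that idea, assuming JSON records. The field names and the `read_with_schema` helper are hypothetical and do not come from any particular product:

```python
import json

# Raw events land in the lake as-is; records need not share the same fields.
raw_lines = [
    '{"customer": "A", "amount": "12.50", "channel": "web"}',
    '{"customer": "B", "amount": 7}',
    '{"customer": "C", "comment": "no purchase"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time.

    Keeps only the requested fields, casts each one to the requested type,
    and uses None when a field is absent from the raw record.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }

# Hypothetical schema chosen by one consumer of the data.
purchase_schema = {"customer": str, "amount": float}
purchases = list(read_with_schema(raw_lines, purchase_schema))
```

Another consumer could read the same raw lines with a different schema, which is where the flexibility comes from.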
The Data Lake is the evolution of the Data Warehouse (cf. BI), although the two remain complementary.
Our Big Data Solutions
Among the tools of this suite, we find:
- HDFS: Storage of large numbers of files across several machines
- Hive: Distributed processing of structured data (SQL)
- Sqoop: Tool for transferring data from a relational database into the Big Data platform
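To give an intuition for what processing "in a distributed way" means, here is a small sketch in plain Python. It is not how Hive is implemented. It only mimics the general pattern behind a query such as `SELECT region, SUM(amount) ... GROUP BY region`: each machine aggregates its own share of the data, then the partial results are merged. The data and function names are hypothetical:

```python
from collections import Counter

# Hypothetical rows (region, amount), split across three machines
# in the way a distributed file system would spread a large file.
partitions = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("east", 2), ("north", 1)],
]

def partial_sum(rows):
    """Runs on each machine independently: sums amounts per region for local rows."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals

def merge(partials):
    """Combines the per-machine results into the final totals."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values for matching keys
    return dict(result)

totals = merge(partial_sum(p) for p in partitions)
```

Because each partition is processed without reference to the others, the work can be spread over as many machines as there are partitions.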
Example of use
Repository of unstructured data (less structured than a Data Warehouse, but containing more data).
We also use the Talend ETL tool for data import and integration.