
Big Data

Collecting data on this scale, in terms of volume, velocity and variety, requires advanced, high-performance solutions for both storage and processing.

Storage and Computation

The already countless data sources, both structured (such as relational databases) and unstructured (such as images, emails and GPS logs), keep growing. As a result, we are witnessing a proliferation of heterogeneous data that is set to multiply exponentially in the future. Consider the Internet of Things (IoT): extending the Internet to real-world objects and places gives them the ability, typical of computers, to collect, process and exchange data.


The concept of big data covers more than the quantity and complexity of the data: it also extends to the infrastructure needed to collect and store it.

At Koros Consulting we believe that the Apache Hadoop architecture is the optimal tool for quickly storing large amounts of structured and unstructured data. It guarantees high reliability and availability and, by supporting distributed, data-intensive applications, it allows them to work with thousands of nodes and petabytes of data. For batch and stream processing, we consider Spark essential, especially when combined with a Hadoop architecture based on the HDFS file system.
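
To make the distributed processing model more concrete, here is a minimal single-machine sketch in plain Python of the map, shuffle and reduce phases on which Hadoop-style batch jobs are built. It does not use the Hadoop or Spark APIs. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample partitions are hypothetical and chosen only for illustration. On a real cluster, each partition would typically be a block of a file stored in HDFS and processed on a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(partition: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one partition of the input."""
    pairs = []
    for line in partition:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the values emitted by all partitions under their key."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the grouped values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(partitions: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Run map on each partition independently, then shuffle and reduce."""
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input, split into two partitions as a cluster would do.
    partitions = [
        ["big data needs storage", "big data needs processing"],
        ["storage and processing scale out"],
    ]
    print(word_count(partitions))
```

Because each call to map_phase depends only on its own partition, the map step can run in parallel on as many nodes as there are partitions. This is the property that lets this style of processing scale out as data volumes grow.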

For specific applications, we believe it is appropriate to combine Hadoop with other types of storage that are better suited to streaming and real-time data processing, such as MongoDB and RethinkDB.

For our big data projects to succeed, we strongly believe that the right skills are needed to master the technologies of the Hadoop stack. For this reason, our collaborators hold the CCA Spark and Hadoop Developer certification from Cloudera, whose Hadoop distribution we offer for on-premises deployments. For cloud environments, we offer AWS Big Data services such as EMR for processing and S3 for storage.

Technologies


Cloudera

CDH is Cloudera's open source distribution of Apache Hadoop. By integrating Hadoop and its core projects, CDH allows you to develop end-to-end Big Data pipelines and manage every aspect of them, from storage to computation to analysis.

Spark

Spark is a Big Data processing engine that delivers high performance for both streaming and batch analysis. It also provides interfaces for interactive data analysis and supports languages such as Scala, Python, Java and R.

Need help?

Contact us.