By using an Apache Hadoop cluster, we can process huge amounts of data and run typical Big Data applications with the MapReduce framework. The tutorial available on the MTA Cloud's official website (https://cloud.mta.hu/apache-hadoop-klaszter-kiepitese) sets up a complete Apache Hadoop infrastructure with the help of the Occopus orchestration tool. Because the Apache Hadoop architecture is built by Occopus, Occopus has to be installed first.
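To make the MapReduce model more concrete, here is a minimal word-count sketch in plain Python that simulates the map, shuffle/sort and reduce phases in a single process. It does not use Hadoop or the tutorial's cluster; the function names (`mapper`, `reducer`, `run_mapreduce`) are illustrative only. On a real cluster, comparable mapper and reducer logic would be submitted as a job (for example through Hadoop Streaming), and the framework would distribute the phases across the nodes.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all the counts emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    # Map: apply the mapper to every input record.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: Hadoop groups intermediate pairs by key;
    # sorting and then grouping by key mimics that step locally.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_mapreduce(["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]))
```

The value of the model is that `mapper` and `reducer` only ever see one record or one key at a time, which is what lets the framework spread the work over many machines.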
Data Avenue is a data storage management service that provides access to different types of storage resources (including S3, SFTP, GridFTP, iRODS and SRM servers) through a uniform interface. Its REST API supports all the typical storage operations, such as creating folders/buckets, renaming or deleting files/folders, uploading/downloading files, and copying/moving files/folders between different storage resources, even with a simple 'curl' call from the command line.
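The idea of a uniform interface can be sketched as a client in which the storage URI, not the client code, selects the backend. This is a minimal sketch only: the base URL, the resource paths (`directory`, `file`), the operation-to-method mapping and the `x-key`/`x-uri` header names are all hypothetical placeholders, not the documented Data Avenue API. The sketch only builds `urllib.request.Request` objects and sends nothing over the network.

```python
from typing import Optional
from urllib.parse import urlparse
from urllib.request import Request

# HYPOTHETICAL placeholder endpoint, not a real Data Avenue deployment.
BASE_URL = "https://dataavenue.example.org/rest"

# HYPOTHETICAL mapping of storage operations to (HTTP method, resource path).
OPERATIONS = {
    "list": ("GET", "directory"),
    "mkdir": ("POST", "directory"),
    "rmdir": ("DELETE", "directory"),
    "download": ("GET", "file"),
    "upload": ("PUT", "file"),
    "delete": ("DELETE", "file"),
}

# URI schemes for the storage types named in the text (GridFTP uses gsiftp://).
SUPPORTED_SCHEMES = {"s3", "sftp", "gsiftp", "irods", "srm"}


def build_request(operation: str, storage_uri: str, api_key: str,
                  body: Optional[bytes] = None) -> Request:
    """Build (but do not send) one REST call; the same code serves every storage type."""
    scheme = urlparse(storage_uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    method, resource = OPERATIONS[operation]
    headers = {
        "x-key": api_key,      # HYPOTHETICAL header carrying the access key
        "x-uri": storage_uri,  # HYPOTHETICAL header naming the target resource
    }
    return Request(f"{BASE_URL}/{resource}", data=body, headers=headers, method=method)


if __name__ == "__main__":
    for uri in ("sftp://host/home/user/", "s3://bucket/data/"):
        req = build_request("list", uri, "MY-KEY")
        print(req.get_method(), req.full_url, req.get_header("X-uri"))
```

Copying or moving between two storage resources would additionally need a source and a target URI in one request; the sketch leaves that out to stay short.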
Flowbster is a new cloud-oriented workflow system designed to build efficient data pipelines in clouds for processing very large data sets. A Flowbster workflow is deployed in the target cloud as a virtual infrastructure through which the data to be processed flows; as the data passes through the workflow, it is transformed according to the workflow's business logic. The workflow can be deployed in the target cloud on demand, using the underlying Occopus cloud deployment and orchestration tool.
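The "data flows through the workflow and is transformed on the way" idea can be illustrated with a small streaming pipeline built from Python generators. This is an analogy, not Flowbster's workflow description format or API: each stage stands in for a workflow node, and the stage names (`strip_blank`, `to_upper`, `tag`) are hypothetical business logic. In Flowbster the nodes run as separate cloud resources deployed by Occopus, whereas this sketch runs in one process.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of items and yields transformed items.
Stage = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(items: Iterable[str]) -> Iterator[str]:
    """Drop empty records and trim whitespace."""
    for item in items:
        if item.strip():
            yield item.strip()


def to_upper(items: Iterable[str]) -> Iterator[str]:
    """Example transformation step."""
    for item in items:
        yield item.upper()


def tag(items: Iterable[str]) -> Iterator[str]:
    """Number the records in the order they leave the pipeline."""
    for i, item in enumerate(items):
        yield f"{i}:{item}"


def pipeline(source: Iterable[str], stages: List[Stage]) -> Iterator[str]:
    """Chain the stages so each record flows through all of them lazily."""
    stream: Iterator[str] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    for record in pipeline(["alpha", "  ", "beta "], [strip_blank, to_upper, tag]):
        print(record)
```

Because the stages are generators, each record is processed as soon as it arrives instead of after the whole data set has been loaded, which mirrors the streaming behaviour described above.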