Most of us are familiar with Logstash. We use it to transfer data to multiple destinations. But how do we transfer data to HDFS without using WebHDFS?
Here is one solution.
The following diagram explains the solution: Logstash => Kafka => Flume => HDFS.
Kafka is a highly reliable message broker that is often used for real-time streaming. Many data processing tools such as Spark, Storm, and Flink have Kafka connectors, so apart from transferring data to HDFS, the same stream can feed other analytics (both batch and streaming).
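Before wiring Logstash to Kafka, the topic has to exist (unless the cluster has topic auto-creation enabled). A minimal sketch using the standard Kafka CLI; the broker address, partition count, and topic name here are assumptions, not values from this setup:

```shell
# Create the topic (broker address, partitions, and topic name are placeholders)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 3 \
  --topic topic
```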
Logstash to Kafka example configuration
The Logstash output configuration below will transfer data to Kafka (the broker list and topic name are placeholders):
output {
  kafka {
    bootstrap_servers => "kafka brokers"
    topic_id => "topic"
  }
}
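For context, a complete minimal Logstash pipeline might look like the following; the file input and its path are hypothetical placeholders, used only to show where the kafka output sits:

```
input {
  # Hypothetical input; replace with your actual source
  file {
    path => "/var/log/app/*.log"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka brokers"
    topic_id => "topic"
  }
}
```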
Flume is a data ingestion tool for transferring data from one place to another. In our case it will read from Kafka and write to HDFS.
Kafka => Flume => HDFS example configuration
# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Kafka source: reads events from the Kafka topic
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = list of brokers in the kafka cluster
a1.sources.r1.kafka.topics = topic
a1.sources.r1.batchSize = 100
a1.sources.r1.channels = c1

# In-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# HDFS sink: writes events to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume
a1.sinks.k1.channel = c1
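Once the configuration above is saved to a file (here assumed to be a1.conf, with the agent named a1, matching the property prefix), the agent can be started with the standard flume-ng CLI. The config directory and file name below are assumptions:

```shell
# Start the Flume agent named a1 (paths and file name are placeholders)
flume-ng agent \
  --conf /etc/flume/conf \
  --conf-file a1.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```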