Hi guys,
Most of us are familiar with Logstash: we use it to ship data to multiple destinations. But how do we transfer data to HDFS without using WebHDFS?

Here is the solution!

The pipeline looks like this:

Logstash => Kafka => Flume => HDFS
Kafka is a highly reliable message broker that is often used for real-time streaming. Many data processing tools like Spark, Storm, and Flink have connectors to Kafka, so apart from transferring data to HDFS, it can be used for any analytics (both batch and streaming).
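If the topic does not exist yet, it can be created with Kafka's bundled CLI. This is a sketch assuming Kafka 2.2+ (which accepts `--bootstrap-server`); the broker address, partition count, replication factor, and topic name are all placeholders to adapt to your cluster:

```shell
# Create the topic that Logstash will write to and Flume will read from.
# broker1:9092, the counts, and the topic name are placeholder values.
kafka-topics.sh --create \
  --bootstrap-server broker1:9092 \
  --replication-factor 3 \
  --partitions 6 \
  --topic topic
```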

Logstash to Kafka example configuration

The Logstash output configuration below sends events to Kafka:

kafka {
    bootstrap_servers => "kafka brokers"
    topic_id => "topic"
}
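For context, a complete minimal pipeline configuration might look like the following sketch. The file path and broker addresses are placeholder assumptions; any Logstash input (beats, tcp, etc.) would work in place of the file input:

```
input {
  file {
    # placeholder path - point this at your application logs
    path => "/var/log/app/*.log"
  }
}

output {
  kafka {
    # placeholder broker list
    bootstrap_servers => "broker1:9092,broker2:9092"
    topic_id => "topic"
  }
}
```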


Flume is a data ingestion tool that transfers data from one place to another. In our case it will read from Kafka and write to HDFS.

Kafka => Flume => HDFS example configuration

The Flume configuration below reads from Kafka and writes to HDFS:

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Kafka source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = list of brokers in the kafka cluster
a1.sources.r1.kafka.topics = topic
a1.sources.r1.batchSize = 100

# Memory channel that buffers events between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
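Assuming the configuration above is saved as kafka-to-hdfs.conf (a hypothetical filename) and Flume's conf directory is at a typical location, the agent could be started like this:

```shell
# Start agent a1 with the configuration above.
# The conf directory and file name are placeholder assumptions.
flume-ng agent \
  --conf /etc/flume/conf \
  --conf-file kafka-to-hdfs.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```

The `--name` argument must match the agent prefix used in the properties file (a1 here), otherwise Flume starts with no components.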
