A step-by-step guide to running Twitter analytics tasks with Elastic MapReduce, DynamoDB and Amazon Data Pipeline.
In this post I will use the Flume agent configured in the previous post to deliver raw JSON data to S3 storage. By Twitter analytics I mean aggregations such as “Top 100 users mentioned per day” and “Top 100 URLs mentioned per day”.
This post is about an advanced custom Twitter source for Apache Flume.
In the previous post I described Flume installation and configuration. I will use the same EC2 node in this article, but everything I cover here will work for any other Apache Flume installation.
Here I will explain how to run a Flume agent on a Windows machine and save the data to HDFS on a remote cluster.
This is the first post in a series about Twitter data processing using Amazon EC2.
Here I will describe, step by step, how to set up Apache Flume Ganglia monitoring on a single EC2 instance.