Excel is still commonly used to store static data, which often needs to be analyzed as part of Power BI reports. There are some guidelines on how to connect Excel to the Power BI web app. But what if we need it in Power BI Desktop?
Also, Power BI is a great tool that allows users to mix different data sources in a single report. So our goal is to configure refresh for a workbook created in Power BI Desktop with mixed data sources: an Excel file stored in OneDrive and a traditional RDBMS source.
This post covers the AWS Data Pipeline configuration to load Twitter data from S3 into DynamoDB using EMR on a daily basis.
In Part 1 I described how to set up and deploy an EMR cluster for our ETL process. Now it is time to automate it with AWS Data Pipeline.
How to capture output from Hive queries in Oozie is an essential question if you’re going to implement any ETL-like solution using Hive. The most commonly used approach is a shell action; however, it requires the Hive CLI to be installed on each node, and it doesn't work for remote clusters. Here I want to share a more generic approach using a custom Java action.
A step-by-step guide to Twitter analytics tasks using Elastic MapReduce, DynamoDB and AWS Data Pipeline.
In this post I will use the Flume agent configured in the previous post to deliver raw JSON data to S3 storage. By Twitter analytics I mean aggregations like “Top 100 users mentioned per day” and “Top 100 URLs mentioned per day”.
This post is about an advanced custom Twitter source for Apache Flume.
In the previous post I described Flume installation and configuration. I will use the same EC2 node in this article, but everything I describe here will work for any other Apache Flume installation.
Here I will explain how to run a Flume agent on a Windows machine and save the data to HDFS on a remote cluster.
This is the first post in a series about Twitter data processing using Amazon EC2.
Here I will describe, step by step, how to set up Ganglia monitoring for Apache Flume on a single EC2 instance.