Refreshing OneDrive-based Excel data source in Power BI

Excel is still commonly used to store static data, which often needs to be analyzed as part of Power BI reports. There are some guidelines on how to connect Excel to the Power BI web app. But what if we need it in Power BI Desktop?

Also, Power BI is a great tool that lets users mix different data sources in a single report. So our goal is to configure refresh for a workbook created in Power BI Desktop with mixed data sources: an Excel file stored in OneDrive and a traditional RDBMS source.

Oozie – Capture output from Hive query

How to capture output from Hive queries in Oozie is an essential question if you’re going to implement any ETL-like solution using Hive. The most commonly used approach is a shell action; however, it requires the Hive CLI to be installed on each node, and it does not work for remote clusters. Here I want to share a more generic approach using a custom Java action.
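As a rough illustration of the Java-action idea (a minimal sketch, not the code from the full post): when a Java action declares `<capture-output/>`, Oozie passes the path of a properties file in the `oozie.action.output.properties` system property, and whatever key/value pairs the action writes there become available to later workflow nodes. The class name `CaptureOutput`, the helper `writeOutput`, and the key `row_count` are hypothetical names chosen for this example, and the Hive query itself is left as a placeholder.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Hypothetical example class; the name is not taken from the original post.
class CaptureOutput {

    // Writes the given key/value pairs to the target file in java.util.Properties format.
    static void writeOutput(Map<String, String> values, File target) throws IOException {
        Properties props = new Properties();
        props.putAll(values);
        try (OutputStream out = new FileOutputStream(target)) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Oozie sets this system property for Java actions that declare <capture-output/>.
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null) {
            throw new IllegalStateException("oozie.action.output.properties is not set");
        }
        // Placeholder (assumption): a real action would run the Hive query here,
        // for example over JDBC, and use its result instead of a command-line argument.
        String rowCount = args.length > 0 ? args[0] : "0";
        writeOutput(Collections.singletonMap("row_count", rowCount), new File(path));
    }
}
```

A later workflow node could then read the value with an EL expression such as `${wf:actionData('my-java-action')['row_count']}`, where `my-java-action` stands for whatever name the action has in the workflow.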

Twitter analytics with Amazon EMR and DynamoDB. Part 1

A step-by-step guide to Twitter analytics tasks using Elastic MapReduce, DynamoDB and Amazon Data Pipeline.

In this post I will use the Flume agent configured in the previous post to deliver raw JSON data to S3 storage. By Twitter analytics I mean aggregations such as “Top 100 users mentioned per day” and “Top 100 URLs mentioned per day”.