Power BI supports three methods for connecting to data sources, called Live Connection, DirectQuery, and Import Data (also called Scheduled Refresh). In the following points we will analyze what each of these types means, their characteristics, pros, cons, and which one to choose in each situation.
Determining which type of connection to choose between these (or Composite mode that integrates DirectQuery + Import Data) is a major decision in defining the architecture of a Power BI solution, since it is expensive to change the connection type once when one of the ways is defined.
Import Data
This method is called “Import Data” because the data is loaded into Power BI. Some call it “Scheduled Refresh” since you must schedule the upload and refresh of data in the Power BI service. Loading the data into Power BI means consuming memory and disk space, either on the development machine’s memory and disk space (when using Power BI Desktop) or in the Power BI service (when the report is published to the cloud). The greater the number of tables and data, the greater the use of resources.
There is no simple, straight forward calculation to know how much your source data is going to occupy in Power BI memory (i.e., it is not rows by columns by data type), as Power BI uses the xVelocity compression engine, which compresses the data and stores them in a compressed format. A 1 GB Excel file might be as small as 10 MB in Power BI, depending on the number of unique values in the column, data types, and other variables. In short, the size of the data in Power BI will typically be much smaller than the size of the data source.
Import data important pros and cons
PRO – Power BI 100% functional
With this method you can use all Power BI features and components, both Power Query to combine data from multiple sources and DAX to write advanced time intelligence expressions or visualizations. There are no limitations on Power BI functionality with this method.
CON – Size limitation
Size and capacity limits apply with this method. On the one hand, each model or file in Power BI cannot occupy more than 1 GB (with Premium capacity or Premium licenses this limit is 100 GB), always considering the compressed model. Then, you have a size of up to 10 GB in the account, while Power BI Premium has a limit of 100 TB. If using Power BI Report Server (the on-premises version of the Power BI service), the size limit is 2 GB per file.
PRO – Fastest method
This connection method is the fastest viable option. Report queries will be evaluated against the uploaded data and used from server memory, with no latency issues.
DirectQuery
DirectQuery is a direct connection to the data source. The data is NOT stored in Power BI, so Power BI is primarily a visualization layer and queries the data from the data source whenever it requires it. Power BI only stores the metadata of the tables (table names, column names, relationships), but not the data. The Power BI file size will be much smaller and will most likely never reach the size limit because there is no data stored in the model.
DirectQuery is only feasible for certain data sources. Here are a few (full list in Power BI data sources):
- Amazon Redshift
- Azure HDInsight Spark
- Azure SQL Database
- Azure SQL Datawarehouse
- IBM Netezza
- Impala
- Oracle Database
- SAP Business Warehouse
- SAP HANA
- Snowflake
- Spark
- SQL Server Database
- Teradata Database
Important pros and cons of DirectQuery
PRO – Scalability
This method has no size limitation, because the data is not stored in the Power BI file. Data sources with petabytes of data can be used in SQL Server, Oracle, or any other supported data source and connected from Power BI.
CON – Limited functionality
This method does not make the full functionality of Power BI available. As seen in the image below, there will only be two tabs in Power BI Desktop: Report (for visualization design) and Model (for changing relationships if necessary).
Power Query is limited to folding operations (more information in Power Query query folding), but DAX can be used in the display layer.
CON – Slow connection
A big disadvantage of this method is that the connection is slower than other types of connection. This is because in a report, made up of multiple visuals, each visual sends a query to the data sources, and then the data comes back. It is particularly important to evaluate the latencies between the servers and analyze the performance of the data sources, making performance adjustments on them.
Live connection
Live Connection is like DirectQuery in the way it works with the data source since the data is not stored in Power BI and the data sources are queried each time. Live Connection is only compatible with these data sources:
- Azure Analysis Services
- SQL Server Analysis Services (SSAS) Tabular
- SQL Server Analysis Services (SSAS) Multi-Dimensional
- Power BI Dataset in the Service
Because these data sources are modeling engines themselves, Power BI just connects to them and gets all the model metadata (table names, column names, relationships). With this method, Power BI is used for visualization only, and all modeling requirements must be performed at the data source.
Important Pros and Cons of Live Connection
PRO – Large model
The great benefit of this model is that you can have a large data model and you can also take advantage of the SSAS (SQL Server Analysis Services) modeling layer, if you use the Tabular engine, you will have DAX, and if you use the OLAP Multidimensional model, you will have MDX. With any of these two languages, all calculation and modeling needs can be covered. This method has better modeling capabilities than DirectQuery.
CON – Visualization only
The big disadvantage of this method is that you do not have the Power Query transformation. As seen in the image, in Power BI Desktop there is only the Report tab (for visualization design).
PRO – Measurements at report level
As a complement to the model made in the data source, with this type of connection DAX measurements can be made at the report level. However, it is not the most appropriate, since the measures will only be visible in the report where they are developed and will not remain in the model for reuse.
Composite model
Power BI allows you to generate composite models that consist of the combination of DirectQuery sources with Import Data data sources, delivering the performance and flexibility of Import Data, along with the scalability and large data size of DirectQuery.
Composite models not only support DirectQuery for data sources such as SQL Server and other DirectQuery sources but are also supported for Power BI datasets (dataflows or datasets), allowing you to create a chained dataset from the Power BI main dataset.
What are the architecture scenarios to use for each method?
Import data for agility and performance
Using Import Data you will have a fully functional Power BI with exceptional performance, so if your data set is not huge then you can easily use this method and produce reports in an extremely fast development period.
Live connection for business solutions
Many companies already have pre-built models in tabular or multidimensional SSAS. These models can be easily used in Power BI Live Connection. Even if an enterprise has not yet started the AAS (Azure Analysis Services) or SSAS solution, and is dealing with a large data set, this option is better than Direct Query, since SSAS has MDX or DAX analytical expression languages for cope with multiple calculation and modeling challenges, plus higher performance. Here is a sample architecture that can be used with this method.
Additionally, in a business environment there are important benefits to using Live Connection. If you are working on an analysis project with multiple report authors and team members, the dataset can be shared with a live connection, and the report authors can consume the dataset simultaneously.
DirectQuery for non-Microsoft sources
The DirectQuery connection is not widely used in Microsoft’s solution architecture. The main reason is that if you have a Microsoft-based solution architecture, you will probably use SSAS as mentioned above to take advantage of their analytics engine. DirectQuery mode is mainly used in non-Microsoft architecture, when the data model is managed in Oracle system, MySQL, SAP HANA, or something like that.
Summary
In summary, the three types of connection have been described: Live Connection, Import Data and DirectQuery; along with their differences, pros and cons, and the scenarios in which each should be used. This is an introduction, and more information can be found about it.
Sources: