Power BI Premium offers a powerful feature known as Large Semantic Models, which lets semantic models grow beyond the default 10 GB size limit. Whether you are new to this concept or already using it, this article will give you an in-depth understanding of its benefits and technical background.

What is the Large Semantic Model Storage Format?

This is one of the Premium features that Power BI offers to users with a Premium Per User license or with access to a Premium Capacity. While the most well-known advantage is the ability to surpass the 10 GB model size limit, this feature provides several additional benefits:

  1. Scalability: Models can exceed 10 GB, constrained only by Capacity limits.
  2. XMLA Performance Enhancement: Can potentially improve the performance of XMLA write operations.
  3. On-Demand Load: Faster loading of evicted Semantic Models.
  4. Larger Segments: Default segment size increased from 1 million to 8 million rows.
  5. Scale-Out Capability: Enables read-only replicas for concurrent query processing.

Technical background

Before I jump into each benefit, I think it will be useful to explain what technically happens behind the scenes that allows us to store larger models. If you look at the feature name, you can read directly from it that this is all about a specific storage format that allows storing Large Semantic Models. When going through Microsoft’s documentation, you will find a reference to Azure Premium Files storage. This premium tier uses Solid State Drives (SSDs) to store our data, which are of course much faster than standard HDDs. Unfortunately, not much is documented about the exact file format used to store Large Semantic Models. If you are in possession of this valuable information, please let me know 🙂

Standard Semantic Models, on the other hand, are stored as ABF files (Analysis Services Backup Files). If you consider backup scenarios for your Semantic Models using Azure Storage, this is the format you will work with. Choosing the premium tier storage seems like the better solution, but it comes with a strict data residency policy, which I will describe closer to the end of this article. For now, let’s start with all the benefits mentioned above.

 

Handling Larger Models

In Power BI Premium/Fabric Capacities, by default you can’t have a Semantic Model bigger than 10 GB, even though your Capacity documentation could state something else:
Figure 1. Power BI Semantic Model size per Capacity SKU.
 
Now, the size matrix is a bit tricky and deserves a separate article. What we actually see here is not the size of our Semantic Model, but the maximum memory in GB that can be allocated for the model. It is still technically possible to push your Semantic Model size close to 25 GB, but this requires a longer explanation. In a nutshell, when a Power BI Semantic Model is refreshed, the Power BI Service lets report consumers keep using the report based on the old version of the Semantic Model while the refresh takes place in the background. Once it is finished, the refreshed model is swapped in for the old version. This means that the Semantic Model being refreshed and the read-only copy still serving reports combined can’t exceed that 25 GB limit. Still, even on F64/P1 you can expand the model size beyond 10 GB. To do this – you guessed it – you must enable the Large Semantic Model Storage Format. For larger Capacities, the limit goes up to 400 GB.
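To give a rough, hypothetical illustration of this limit (the exact memory accounting also includes query-time overhead, so treat the numbers as approximations only): on F64/P1 with its 25 GB cap, a 14 GB model would need roughly 14 GB + 14 GB = 28 GB during a full refresh and would not fit, while a model of around 11-12 GB would need roughly 22-24 GB and could still squeeze through.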
It is important to note here that enabling Large Semantic Models still does not let you publish a PBIX file that is larger than 10 GB. It means that you must publish a smaller model and then load the full volume of data once it is published to the Power BI Service.
 

Performance boost for XMLA write operations

If you are completely new to this concept, XMLA is XML for Analysis, a protocol that allows communication between Semantic Models and external client tools (DAX Studio, Tabular Editor, Visual Studio, etc.). Depending on the settings in your tenant / Capacity, it allows read and write operations. Write operations can range from heavy data processing to quick metadata updates. Here, the documentation encourages you to consider enabling the Large Semantic Model Storage Format even for smaller models, as it may significantly improve write operations via the XMLA endpoint.
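To make the “write operation” part more concrete, below is a minimal example of a TMSL refresh command that you could send over the XMLA endpoint (for example from SSMS or Tabular Editor); the database name is just a placeholder:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "Sales Model"
      }
    ]
  }
}
```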
 

On-Demand Load for evicted models

In order for Power BI Semantic Models to be processed or queried, they first must be loaded into memory. The Power BI Service manages memory efficiently by evicting data that is no longer being used. This means that the next time the model is needed, it must be loaded back into memory, and this process may cause noticeable delays. Chris Webb has a fantastic article that touches on this topic.
 
Now that we know, at least at a conceptual level, what the eviction process is, how do Large Semantic Models help with it? Large Semantic Models have the On-Demand Load feature enabled by default. This means that when a query hits our Semantic Model, only the relevant portion of the data is loaded into memory. You may often hear that it is paged in to memory. Standard Semantic Models, on the other hand, must be loaded entirely back into memory when evicted, which may take precious time and resources. Now that you know about the On-Demand Load feature, you may also have a better understanding of why write operations via the XMLA endpoint might be faster as well. This makes it a very important Premium feature of Power BI.
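If you want to see On-Demand Load in action on your own model, you can query the VertiPaq segment DMV from DAX Studio or SSMS. To my knowledge, models using the large storage format expose additional columns in this DMV, such as ISPAGEABLE, ISRESIDENT and TEMPERATURE, that show what is currently paged into memory (treat the exact column names as an assumption and verify them against your model):

```sql
SELECT
    TABLE_ID,
    COLUMN_ID,
    SEGMENT_NUMBER,
    ISPAGEABLE,   -- can this segment be paged out of memory?
    ISRESIDENT,   -- is this segment currently loaded in memory?
    TEMPERATURE   -- how recently/frequently this segment has been accessed
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
```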
 

Default segment size increased to 8 million rows

If you are like me, and your first interaction with building Analysis Services solutions was through Power BI, I bet that for many of you Segment Size was a completely new concept. So let’s start by explaining what a Segment is and what the benefit of an increased segment size is. You probably already know that the storage engine responsible for Import mode in Power BI is VertiPaq. When data from a single table is ingested, VertiPaq splits each column into internal data structures called Segments. This is done for a couple of reasons:
  • Faster Query Processing: during DAX query execution, the engine can leverage multiple CPU cores to process several Segments in parallel and later combine the results.
  • Better Data Compression: VertiPaq uses three different data compression techniques to reduce memory demand. Without going into too many details, two of them can benefit from segmentation: Value Encoding and Run-Length Encoding (for example, a segment where a column holds the values A, A, A, B, B can be stored as just two pairs: A×3 and B×2). Hash Encoding, which builds a dictionary of values, doesn’t benefit from multiple segments, as the dictionary is created at the table level.

By default, the segment size for standard Power BI Semantic Models is 1 million rows. This created a challenge for large, enterprise-grade models, especially when compared to the capabilities of Analysis Services, where the segment size is 8 million rows. Around 2021, Microsoft announced that the default segment size grows to 8 million rows when the Large Semantic Model format is enabled:

Figure 2. Segments – Standard vs Large Semantic Models.
 
Looking at this picture, we can see that having fewer segments means fewer cycles for the CPU to process all of them in parallel. How is each segment processed? Let’s look at the components of the equation:
  • The segment is dispatched for processing
  • The segment is processed
  • The results are collected
Processing the segment is the variable here, as it depends on the size of the segment, and of course it will take more time to process 8 million rows than 1 million. However, the components marked in red are fixed costs for every segment. With the Large Semantic Model example in the picture, the engine must repeat these fixed costs only 7 times, while with the Standard Model this must be done 50 times. This is where we get the query processing savings.
 
On top of that, if larger Segments help us achieve better data compression, it will take less time to read them, as there will be less memory to scan. However, as mentioned earlier, a higher compression rate is not guaranteed.
 
One quick thing to clarify before I wrap this topic up. I often hear Segments being confused with Partitions, but they are not the same thing. Partitions are a table-level structure, created to speed up data refresh and enable incremental loading, all based on user-defined logic. Segments, on the other hand, are a column-level structure, created automatically during data ingestion, and they aim to improve query performance and data compression. It’s also important to mention that a Partition defines the upper bound of the Segment size. Meaning, if you have 50 million rows in a table and 50 partitions (1 million rows each), each segment will also have at most 1 million rows, regardless of whether you enabled Large Semantic Models or not. Keep that in mind when you design your solution.
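If you would like to check the actual segment sizes in your own model, you can query the same DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV from DAX Studio or SSMS. This is just a minimal sketch, and the exact set of columns may vary between engine versions:

```sql
SELECT
    TABLE_ID,
    COLUMN_ID,
    SEGMENT_NUMBER,
    RECORDS_COUNT,   -- number of rows stored in this segment
    USED_SIZE        -- memory used by this segment, in bytes
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
```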
 
If you can spend around 30 minutes to learn more, Marco Russo has a great video explaining the concept of Segments:
 

Semantic Model Scale-out

Very briefly about this one, as it is quite a complex topic; if you would like to read more, I’ll leave a link to Microsoft Docs. In a nutshell, it helps reduce latency at peak times, when you have many concurrent users querying your model or many complex queries consuming a lot of resources. In these scenarios, Power BI can create additional read-only replicas (copies) of your Semantic Model and distribute the query processing between them. It also protects your read operations from being impacted by refreshes, as the original Semantic Model handles the write operations. You can’t enable this feature unless you have first enabled the Large Semantic Model setting. It’s not clearly stated in the documentation, but I believe this mechanism also benefits from On-Demand Load, which allows for more efficient memory management.
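For completeness, here is a minimal, hypothetical sketch of enabling scale-out programmatically. It assumes the Update Dataset In Group REST API accepts a queryScaleOutSettings property (as described in the scale-out documentation); the IDs are placeholders and the exact payload should be verified against the docs:

```powershell
# Requires the MicrosoftPowerBIMgmt PowerShell module
Connect-PowerBIServiceAccount

# Placeholder workspace and semantic model (dataset) IDs
$workspaceId = "<workspace-id>"
$datasetId   = "<dataset-id>"

# maxReadOnlyReplicas = -1 lets the service decide how many read-only replicas to create
$body = '{ "queryScaleOutSettings": { "autoSyncReadOnlyReplicas": true, "maxReadOnlyReplicas": -1 } }'

Invoke-PowerBIRestMethod -Method Patch -Url "groups/$workspaceId/datasets/$datasetId" -Body $body
```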
 

Where is the catch?

Great question! The entire article so far has been all about the benefits, so if this feature is so great, why is it not enabled by default in Premium Capacities? Previously, there was quite an important drawback: if you enabled the setting on your Semantic Model, you couldn’t download the PBIX file from the Power BI Service. However, this limitation is no longer there, so why is it still not the default? There are a couple of reasons.
  1. Depending on the organization you work for, Enterprise-grade solutions may be only a small percentage of all Semantic Models. In this case, a smaller segment size makes a lot more sense, because a small model split into 8-million-row segments could end up with just one or two segments, which limits the distributed (parallel) query processing.
  2. Premium Files have a strict policy related to Data Residency. More on that one below.
  3. Cost – I will finish with a pure guess here, but Premium Files as a technology are more expensive in Azure than standard storage, so keeping the standard format as the default is probably cheaper for Microsoft.

What is up with the data residency factor? First of all, to enable Large Semantic Models, your Capacity must be located in an Azure region that supports Azure Premium Files. Once a Semantic Model is moved to this storage system, it is bound to that region. If you move your Capacity to a different region, your Large Semantic Models will no longer work.

There are two pieces of information about this in Microsoft Docs. The first one comes from the feature documentation and states that you can’t move a workspace to a different region if it contains Large Semantic Models. My understanding of this would be that the system blocks you when you try to move such a workspace, but this is not what happens. The second one, related to setting up Multi-Geo support, gives a more accurate explanation. It states: “Moving large-storage format semantic models from the region where they were created results in reports failing to load the semantic model.” This clearly indicates that the move is possible, but the related reporting will not work. It happens because the report moves to the new region, while the Semantic Model stays in the previous one. You will not be able to run a refresh or consume the report. In this case you can either move the workspace back to the original region, or re-publish the report to the new region (and run a full refresh again).

There is also another trick you could use here, but it works only for Semantic Models below 10 GB: turn off the Large Semantic Model setting before you migrate the workspace to the new region, and once it is migrated, turn the setting back on. Power BI will not allow turning off this setting if the size of your Semantic Model is above 10 GB, as the standard storage format doesn’t support it. Still, it can be a useful trick for smaller models that leverage the Large Semantic Model feature.

How to enable Large Semantic Models

Let’s start with where you can’t do it, and that is Power BI Desktop. This setting is maintained in the Power BI Service, and we have a couple of options to enable it. First, you can decide that Large Semantic Model will be the default storage format for all Semantic Models published to your workspace. Go to your Workspace settings -> License Info, and modify the setting:
Figure 3. Setup Large Semantic Models on a workspace level.
If this is too generic a setting, you can make the decision at the Semantic Model level instead. Go to the Semantic Model settings and scroll down to the setting:
Figure 4. Setup Large Semantic Models on a model level.
Here you can also find the size of your model without using external tools – pretty cool. And maybe you’ve also noticed the Query scale-out setting right below Large Semantic Model Storage Format.
 
There are also more advanced methods that allow you to set up the feature programmatically. If you are interested in this approach, you can check how to do it in PowerShell or with the Power BI REST APIs.
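As a minimal sketch of the REST-based approach (following the same Invoke-PowerBIRestMethod pattern as in the scale-out example earlier, and assuming the Update Dataset In Group API with its targetStorageMode property; the IDs are placeholders):

```powershell
# Requires the MicrosoftPowerBIMgmt PowerShell module
Connect-PowerBIServiceAccount

# Placeholder workspace and semantic model (dataset) IDs
$workspaceId = "<workspace-id>"
$datasetId   = "<dataset-id>"

# "PremiumFiles" = Large Semantic Model Storage Format, "Abf" = standard storage format
$body = '{ "targetStorageMode": "PremiumFiles" }'

Invoke-PowerBIRestMethod -Method Patch -Url "groups/$workspaceId/datasets/$datasetId" -Body $body
```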
 

Conclusion

The Large Semantic Model Storage Format is a very important feature with a lot of benefits, but it is equally important to know the details and potential drawbacks. I hope this article helped you understand the concept.
 
As always, thank you for reading and see you in the next article 🙂

Pawel Wrona

Lead author and founder of the blog | Works as a Power BI Architect in global company | Passionate about Power BI and Microsoft Tech
