Hello Fabricators! I am still not sure if I like that term 🙂 but I hope it works for today. Today I will not talk about Power BI, but about the Fabric Lakehouse, which is part of the Data Engineering experience within Microsoft Fabric.

I recently struggled with one setup, and since network security settings have nothing to do with my daily duties, I figured I would share my experience with other people like me. I hope it helps you avoid spending too much time on troubleshooting.

What is the SQL endpoint in Fabric?

As always, let’s do some groundwork first. In Microsoft Fabric you can take different paths when it comes to data storage. The Fabric Warehouse will be a safe choice for anyone who has worked with SQL Server in the past, and it’s fairly safe to assume that the way to interact with the data there is… SQL. It is not that obvious, though, when it comes to a Lakehouse. When you provision a Lakehouse within your workspace, not one but three objects are created: a Lakehouse, a Default Semantic Model, and a SQL analytics endpoint. This concept may be familiar to Power BI developers who used Datamarts. In another part of the Microsoft realm, Dataverse, it is known as the TDS endpoint. TDS stands for Tabular Data Stream, and it is the protocol clients use to communicate with SQL Server.
 
 
This already suggests that the SQL analytics endpoint lets us connect to a given data source as if it were a SQL Server.
 

Why is this a big deal?

While a Power BI developer could live without any of the available Fabric experiences, OneLake is definitely a big deal. It is a game changer in terms of how we handle our data within workspaces, for many reasons, and the SQL endpoint is definitely one of them. Thanks to it we gain much better access control than we had with standard dataflows. We can design views in SQL that let other people in our organization use the data in the best possible way. Thanks to the SQL endpoint we can leverage query folding, incremental refresh, and even connect to the data using DirectQuery mode (or even Direct Lake mode, but more on that in the future).
 

How to use the SQL endpoint?

The easiest way (from a technical perspective) is to use Power BI Desktop. In the OneLake Data Hub, we can find our Lakehouse, or even more specifically, its SQL analytics endpoint:
 
Why are these separate options if the SQL endpoint is created together with the Lakehouse? It is because we can share only the SQL analytics endpoint with other users, without sharing access to the Lakehouse itself. Lakehouse-level access means the end user will see all the tables and can work with the data using Apache Spark. If you share your Lakehouse using the SQL endpoint option, you also give access to all the tables inside, but only SQL interaction is allowed. You might also share your Lakehouse without sharing anything, however that sounds. Once you do, you can grant access to specific tables only, through the SQL endpoint, using DCL (Data Control Language) commands. And this was the scenario I was using.
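To make that last scenario more tangible, here is a minimal sketch of what granting access to a single table could look like. The endpoint address, Lakehouse name, table and user below are made-up placeholders, and it assumes the SqlServer and Az.Accounts PowerShell modules are installed; you could just as well run the same GRANT statement from SSMS or Azure Data Studio:

# Sketch only – replace the placeholders with your own values.
# Assumes you are already signed in with Connect-AzAccount.
Import-Module SqlServer
$token = (Get-AzAccessToken -ResourceUrl "https://database.windows.net/").Token
Invoke-Sqlcmd -ServerInstance "xxxxx.datawarehouse.fabric.microsoft.com" `
    -Database "MyLakehouse" `
    -AccessToken $token `
    -Query "GRANT SELECT ON dbo.Sales TO [jane.doe@contoso.com];"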
It is important to mention here that you can interact with the data directly in the Power BI service (or should it now be the Fabric service? doesn’t sound right…) and, using the SQL connection string, consume the data with desktop tools like SQL Server Management Studio or Azure Data Studio. Let’s finish connecting to the data through Power BI Desktop, because as easy as it is to find the data you need with the OneLake Data Hub, for whatever reason Microsoft made it difficult for people to quickly understand what they are doing:
 
 
I will not blame anyone who thinks “let’s just hit that Connect button, the default option is the best”. Not in this case. The standard “Connect” option creates a Live Connection to the Default Semantic Model (the one created together with the Lakehouse). When you select the side option – Connect to SQL endpoint – that is where you can choose between DirectQuery and Import modes. Remember, DirectQuery is not the same as a Live connection, but that is a broader topic. If we choose the SQL endpoint and go with Import mode, here is what we will see in Power Query:
 
 
We can see that Power Query used the Sql.Database connector to fetch the data. It took the SQL connection string from the Lakehouse as the SQL Server name, and the name of the Lakehouse as the database name. If you would like to use the Sql.Databases connector instead (a very minor difference in connector naming), the Database parameter is optional there, so we can delete it and connect using only the SQL connection string as the SQL Server. Here is what you get:
What we get here is a full list of objects related to the SQL endpoint. This could be a Data Warehouse, another Lakehouse, or even Datamarts. To summarize, we can think of the SQL endpoint as a SQL Server instance that contains multiple databases (Lakehouses, Warehouses, Datamarts). The SQL endpoint lets us enter the data at the workspace level, while specific artifacts are accessed by specifying the database name. This makes things a lot easier when we have multiple databases in our workspace, as consumers only have to store a single SQL connection string.
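To illustrate that mental model, here is a small sketch (same assumptions and placeholder endpoint as the earlier example) that lists every database visible behind a single connection string:

# Lists the Lakehouses / Warehouses / Datamarts behind one SQL connection string.
$endpoint = "xxxxx.datawarehouse.fabric.microsoft.com"   # placeholder
$token = (Get-AzAccessToken -ResourceUrl "https://database.windows.net/").Token
Invoke-Sqlcmd -ServerInstance $endpoint -AccessToken $token -Query "SELECT name FROM sys.databases;"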
 

Sharing with remote users

Sharing the data through the Lakehouse / SQL endpoint is very easy. It worked like a charm for all the users, except the remote ones… First of all, what do I mean by a remote user? It is not an external user; those usually have guest accounts created in Azure AD (now Microsoft Entra ID) so they can access our cloud resources. Remote users are the ones who require VDI or Remote Desktops to connect to the organizational infrastructure. In this case, they are usually hidden behind stricter security rules.
 

Step 1 – Check TCP Port

It all started with the end user contacting us and providing the following error message:
 
 
The most important part of this error is at the end, in the brackets: (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server). I found some issues described online, and one of the resources was related to Dataverse. The remote user must be asked to run the following command in PowerShell:
Test-NetConnection -ComputerName NAME -Port 1433

TCP port 1433 is the default port used to communicate with SQL Server. Replace NAME with the SQL connection string you get from your Lakehouse/Warehouse:
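For example, with a made-up endpoint address it could look like this:

Test-NetConnection -ComputerName "xxxxx.datawarehouse.fabric.microsoft.com" -Port 1433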

The response you get in PowerShell should look like this:
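The exact values will differ (the addresses below are placeholders), but the shape of the output is roughly:

ComputerName     : xxxxx.datawarehouse.fabric.microsoft.com
RemoteAddress    : 20.0.0.1
RemotePort       : 1433
InterfaceAlias   : Ethernet
SourceAddress    : 10.0.0.5
TcpTestSucceeded : True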

What we are looking for here is the status of the TCP connection. If TcpTestSucceeded is False, it means that port 1433 must be opened. When we asked the end user to check the connection to the SQL endpoint, he received the following error:

(provider: TCP Provider, error: 0 – A connection attempt failed because the connected part did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)

Step 2 – Enable SQL Endpoint

The port is open, but we must also ask for our SQL endpoint to be allowed to communicate on port 1433. If your network allows wildcards, it will be more convenient to provide the address *.datawarehouse.fabric.microsoft.com. If not, you must provide the full SQL connection string we used in the previous step.
 

Step 3 – Check IP Address

Even though the TCP port is already open, it may be the case that the specific IP address must be whitelisted. How do we get it? It is already there, in the response we saw above. Grab the IP address from the RemoteAddress field and run the same PowerShell command, but this time use the RemoteAddress as NAME.
We again look at the TcpTestSucceeded value to see whether there is an issue with the IP address. If the value is False, the IP address must be added to the whitelist.
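With a placeholder address, that second check could look like this:

# Same test as before, but against the IP address instead of the host name.
Test-NetConnection -ComputerName "20.0.0.1" -Port 1433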
 

Step 4 – Enable more IP Addresses

The IP address you see in the response is a public IP address and belongs to one of the Microsoft datacenters. But of course, it is not the only IP address related to this datacenter. Therefore, you might need to include an entire IP subrange in the firewall rules just to be sure. You might also track specific IP addresses and find the exact values, if the rules in your organization must be a bit stricter (see the small sketch at the end of this step). After enabling the IP addresses related to the datacenter, we got a different error:
(provider: TCP Provider, error: 0 – The semaphore timeout period has expired.)
 
When you read the full error message, there is at least a light at the end of the tunnel – the connection was successfully established. It means we are getting lost somewhere in between.
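If you do need to pin down individual addresses rather than open a whole subrange, a small loop like this (the addresses are placeholders) can show which candidates respond on port 1433:

# Test a list of candidate datacenter addresses and report only the verdict.
$addresses = "20.0.0.1", "20.0.0.2", "20.0.0.3"   # placeholders
foreach ($ip in $addresses) {
    Test-NetConnection -ComputerName $ip -Port 1433 |
        Select-Object ComputerName, TcpTestSucceeded
}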
 

Step 5 – Enable even more IP Addresses

During the troubleshooting we found another IP address involved in the whole process, one that didn’t look anything like the IP addresses related to the SQL endpoint. This was another public IP address belonging to a Microsoft datacenter, this time located in the Netherlands. Why is that? The answer, which hit me only after I found the location of the IP address, should have been fairly obvious from the beginning… Our Premium capacity is set up in the West Europe region, and the datacenter serving the West Europe region is located in the Netherlands. So this part, again, will be specific to your case.
 
After all these steps were done, the end user managed to connect to the SQL endpoint using VDI. It took a couple of days, but we learned important things along the way.
 

Conclusion

There is probably a low chance that you will run all these steps by yourself; it will rather be handled by the network team. Still, I hope that this article saves you some time and gives the network team more fighting power.
 
To summarize the steps that we took:
  1. Make sure that port 1433 is open.
  2. Enable the SQL endpoint to communicate on port 1433.
  3. Allow connections from the specific IP addresses related to the SQL endpoint.
  4. Allow connections from the additional IP addresses of the datacenter serving the Azure region of your Premium capacity.

I hope this will be useful. For now, thank you for reading the article and see you next time 🙂
