Background

In my previous post, I discussed our reliance on the Internet and our experience installing Starlink on our boat for the first time.

Since then, a lot has happened and our reliance on Starlink has only increased! As of this writing, we have been cruising the remote areas of British Columbia (Princess Louisa Inlet, Desolation Sound and the Broughtons) for over a month. Our work schedules don't allow us to take significant time off, so we do this while working full time from the boat. We work during the weekdays. We hop to different anchorages over the weekends and normally stay put during the week. And if we need to travel for work, we get a float plane to the closest major airport!

There is no cell coverage in almost any of these places, so we rely completely on Starlink to perform two full time tech jobs. In practice, this means about 15 video conferences on a given workday between the two of us.

Starlink performs pretty well in almost any condition for typical Internet usage (e-mail, web browsing, video streaming, etc.). It is also pretty good for regular voice and video calls. However, our demand for video calls and our expectations are really high. While we used one Starlink, it worked mostly well and at times great, but we also encountered occasional disconnects and freezes in our video calls (anywhere from 0 to 10-20 seconds in total for a given 30 minute call). This may be acceptable depending on your job and your participation in those meetings, but it is not an option for us. We are active participants and drivers for most of our meetings. So a 10-20 second outage may not stop the work from getting done, but it comes across as pretty unprofessional. I quickly got tired of apologizing to people when part of my talk froze as the boat swung at anchor. So something needed to be done.

Normally, when you have an alternate connection (like LTE, or even a low quality marina Wifi) and you are using that connection along with Starlink, this is not much of an issue, since a short disconnect or packet loss gets covered by the alternate connection (this still depends on how you combine those connections, and I explain this in more detail below).

But if you are truly in the middle of nowhere, there is no possibility of an alternate connection so Starlink becomes the single, sole connection option.

So we decided to get an additional Starlink and find ways to configure the two Starlinks differently, so that the likelihood of both encountering a short glitch at the same time is lower. But how do you configure a Starlink differently, since there is no configuration at all?! Well, it turns out that disabling the motors of one of the dishes is the perfect way to do this!

The Starlink dish is a phased array antenna, meaning it steers its beam electronically over roughly 100 degrees. It also uses mechanical steering to point the Dishy north (unless you are really far north, close to 58 degrees latitude yourself, in which case it will point south), where there is a higher concentration of satellites. I won't go into too much additional detail here, as I explained this in my previous article.

At some point, someone decided to test a Dishy by forcing it to stay flat/horizontal (there are several ways to do this without flat out breaking it) and see how well it performed when it looked straight up. The theory is that it doesn't need to point North, since with all the Starlink launches and many birds hanging out in space, there is always a satellite right above you anyways. And it worked, and somewhat strangely, it worked better! (I provide some comparative tests below)

I am not sure who came up with this originally, but at least I learned it from Marcus Tuck who does a bunch of cool hacks with Starlink on his truck.

Disabling the motors was relatively easy with round dishes since there is a way to open them up. However, there is no way to open a rectangular Dishy, and there certainly is no software option. So the only option with a rectangular Dishy is to drill a hole in your $700 dish! There is a connector on the board for the motor controls, roughly 5 inches from the edges. If you unplug it while the Dishy is flat, it stays that way! Needless to say, this was a bit nerve-racking, but as you can see in the photos, we did it!

There are enough instructions on the Internet, so I am not going to provide a set of additional instructions. For more information, see this video.

An obvious question is, how much better or worse does Starlink perform when the motors are disabled? We tested this over a 26 mile trip from Campbell River to Handfield Bay and at anchor in Handfield Bay.

Starlink with motors disabled performed surprisingly better, much better! The screenshots below show the difference. To start the test, I rebooted both units at 11:26 and we left the dock at 11:40. We made a short stop at the fuel dock and made a slow pass to time the rapids. Then we dropped the hook at 16:18. The screenshots below were taken at 21:28.

After an initial startup time, when both Starlinks downloaded satellite schedules and got their bearings (roughly after 11:40), regular Starlink had outages for 74 seconds while the one with disabled motors had only 7! That is a whopping 10x difference in reliability!

One thing to note here is that this is reported by the Starlink app itself. You can see the disclaimer there that these include "outages 2 seconds or longer", meaning not all packet losses are shown here. Depending on what you do, recurring outages shorter than ~2 seconds can still be impactful for videoconferencing. So I wanted to conduct a separate test to see the actual packet loss on the network.

For packet loss, I simply used a ping test, which is admittedly not foolproof, but since there is nothing that is both super reliable and easy, I accepted that limitation. I picked 3 different IP addresses (1.1.1.1, 1.0.0.1 and 8.8.8.8) that are offered as public DNS servers by Cloudflare and Google. These servers are distributed around the world using a technology called anycast and can be considered extremely reliable. That doesn't mean they could not introduce some errors into the picture, but for my simple ping test they were acceptable. I then configured my router to route two of these IP addresses through different Starlink connections. I routed the third IP address through a SpeedFusion Cloud tunnel that uses WAN smoothing and hot-failover (more on these below). After this setup, I sent 30,000 test packets to these three addresses simultaneously over ~8.5 hours, during a workday while the boat was swinging at anchor. I sent these from my boat computer, which is connected to the router using a wired ethernet link, eliminating potential wireless reliability issues.

Below are the results:

Renaissance$ ping 1.1.1.1 #Routed through unmodified Starlink 
64 bytes from 1.1.1.1: icmp_seq=1 ttl=59 time=47.3 ms
....
64 bytes from 1.1.1.1: icmp_seq=29999 ttl=59 time=46.9 ms
64 bytes from 1.1.1.1: icmp_seq=30000 ttl=59 time=73.6 ms
^C
--- 1.1.1.1 ping statistics ---
30000 packets transmitted, 29330 received, 2.23333% packet loss, time 30841ms
rtt min/avg/max/mdev = 18.208/46.631/452.581/29.818 ms

Out of 30,000 packets sent through unmodified Starlink, 670 were lost, representing 2.23% packet loss.

Renaissance$ ping 1.0.0.1 #Routed through Starlink with disabled motors
64 bytes from 1.0.0.1: icmp_seq=1 ttl=58 time=35.9 ms
....
64 bytes from 1.0.0.1: icmp_seq=29999 ttl=58 time=59.2 ms
64 bytes from 1.0.0.1: icmp_seq=30000 ttl=58 time=55.3 ms
^C
--- 1.0.0.1 ping statistics ---
30000 packets transmitted, 29648 received, 1.17333% packet loss, time 30695ms
rtt min/avg/max/mdev = 17.182/46.576/368.723/26.625 ms

Out of 30,000 packets sent through the Starlink that has its motors disabled, 352 were lost, representing 1.17% packet loss. It wasn't the 10x difference observed earlier, but it is still about 2x.

Renaissance$ ping 8.8.8.8 #Routed through SpeedFusion Cloud
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=38.3 ms
....
64 bytes from 8.8.8.8: icmp_seq=29999 ttl=57 time=39.6 ms
64 bytes from 8.8.8.8: icmp_seq=30000 ttl=57 time=67.2 ms
^C
--- 8.8.8.8 ping statistics ---
30000 packets transmitted, 29948 received, 0.173333% packet loss, time 30883ms
rtt min/avg/max/mdev = 17.223/43.158/537.159/19.834 ms

Out of 30,000 packets sent through the SpeedFusion tunnel that bonds both Starlink connections and is configured for WAN smoothing, only 52 were lost, representing a mere 0.17% packet loss, which is super impressive! (I discuss WAN smoothing and SpeedFusion in much more detail below)
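If you want to repeat this kind of comparison yourself, the loss figures can be recomputed straight from the ping summary lines. The following is a minimal Python sketch, not the script I used; the function name and the dictionary labels are made up for illustration, and it assumes the Linux-style summary format shown above.

```python
import re

# Summary lines copied from the three ping runs above.
SUMMARIES = {
    "unmodified Starlink": "30000 packets transmitted, 29330 received, 2.23333% packet loss, time 30841ms",
    "motors disabled": "30000 packets transmitted, 29648 received, 1.17333% packet loss, time 30695ms",
    "SpeedFusion tunnel": "30000 packets transmitted, 29948 received, 0.173333% packet loss, time 30883ms",
}

# Only the sent/received counters are needed; the loss is derived from them.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def parse_summary(line):
    """Return (sent, lost, loss_percent) for one ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    sent, received = int(match.group(1)), int(match.group(2))
    lost = sent - received
    return sent, lost, 100.0 * lost / sent


for name, line in SUMMARIES.items():
    sent, lost, percent = parse_summary(line)
    print(f"{name}: {lost} of {sent} lost ({percent:.2f}%)")
```

Deriving the loss from the counters rather than trusting the printed percentage is a small sanity check that the numbers quoted in the text match the raw output.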

What About Speedtest?

I try to avoid speedtests as a leading indicator of connection quality and reliability. Speed and reliability are different things. But I also recognize the difficulty of comparing two Starlinks without talking about throughput or speed (i.e. speedtest!). Trying to normalize these results is non-trivial since speedtest results are not very consistent (i.e. you will get varying results even in consecutive tests).

So instead of taking a single snapshot, I decided to take 10 different measurements in 10 minute intervals and provide a bit broader view. I wrote a script that ran over ~100 minutes and consolidated the results. While this is still not perfect, I think it is much better than a single screenshot. Here is the summary:

Run     Regular Starlink                Starlink with Disabled Motors
#       Latency  Download    Upload     Latency  Download    Upload
1       29 ms    125.9 Mbps  20.9 Mbps  24 ms    218.4 Mbps  16.0 Mbps
2       40 ms    152.1 Mbps  20.3 Mbps  21 ms    212.5 Mbps  7.2 Mbps
3       40 ms    380.8 Mbps  15.6 Mbps  28 ms    215.6 Mbps  16.8 Mbps
4       24 ms    149.4 Mbps  15.9 Mbps  40 ms    172.9 Mbps  5.2 Mbps
5       20 ms    203.6 Mbps  23.1 Mbps  24 ms    115.2 Mbps  24.9 Mbps
6       37 ms    131.3 Mbps  17.7 Mbps  20 ms    122.6 Mbps  19.1 Mbps
7       33 ms    116.7 Mbps  20.8 Mbps  31 ms    143.5 Mbps  14.7 Mbps
8       25 ms    177.9 Mbps  18.2 Mbps  27 ms    122.3 Mbps  24.7 Mbps
9       21 ms    94.7 Mbps   16.2 Mbps  24 ms    97.8 Mbps   13.5 Mbps
10      31 ms    95.3 Mbps   16.1 Mbps  20 ms    123.9 Mbps  15.0 Mbps
Avg     30 ms    162.8 Mbps  18.5 Mbps  26 ms    154.5 Mbps  15.7 Mbps
Median  30 ms    140.4 Mbps  17.9 Mbps  24 ms    133.7 Mbps  15.5 Mbps

The results are not super consistent, which is somewhat typical with a speedtest. If I sum this up, maybe there is a small throughput degradation with the dish that has its motors disabled, but it is not significant.
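The average and median rows can be reproduced with a few lines of Python. This is only a sketch of the consolidation step, not the script that ran the speedtests; the variable and function names are made up, and it covers just the download column.

```python
from statistics import mean, median

# Download speeds in Mbps, copied from the ten measurements above.
REGULAR_DOWN = [125.9, 152.1, 380.8, 149.4, 203.6, 131.3, 116.7, 177.9, 94.7, 95.3]
DISABLED_DOWN = [218.4, 212.5, 215.6, 172.9, 115.2, 122.6, 143.5, 122.3, 97.8, 123.9]


def summarize(samples):
    """Return the average, median, minimum and maximum of a list of samples."""
    return {
        "avg": mean(samples),
        "median": median(samples),
        "min": min(samples),
        "max": max(samples),
    }


for label, samples in (("regular", REGULAR_DOWN), ("motors disabled", DISABLED_DOWN)):
    stats = summarize(samples)
    print(f"{label}: avg {stats['avg']:.1f} Mbps, median {stats['median']:.1f} Mbps, "
          f"range {stats['min']:.1f}-{stats['max']:.1f} Mbps")
```

Looking at the median next to the average is useful here because a single outlier (like the 380.8 Mbps reading) pulls the average up much more than the median.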

Getting two Starlinks is only half the issue (though admittedly, an important half!). How do you then make simultaneous use of multiple connections?

First of all, you need a router that you can use to terminate both connections. The regular Starlink router is not designed for this, so the first step is to get rid of it. You can put the Starlink router in bypass mode and use an ethernet adaptor, but that is still a cluttered solution for a variety of reasons (the physical dimensions of the router and the reliance on 120V AC being two of them). So we eliminated both of the routers and powered the dishes from 12V DC (I explained how to do this in detail in my previous article).

Then you need to choose a router. There are a ton of options here but I will shortcut this discussion. I think there is only one practical solution here, which I explain below.

If you use two Starlinks for reliable work, you need some key technologies known as Wide Area Network (WAN) Smoothing and Hot Failover. These are not actual names for the underlying technology per se but are the marketing names given for certain implementation methods. Regardless of the marketing angle, these are key to making a single reliable connection out of two or more connections.

Enabling these key technologies requires more than a simple router and a one-time hardware purchase. It requires a service offering, or you need to build that service yourself (I provide more details on this below). But before that, let's go into a bit of theory.

What Happens When You Have Multiple Connections?

Routers use something called a routing table to decide where to send packets. For a simple home (or boat) router, this is mostly trivial. There are a couple of clients connected via wired or wireless connections and there is usually a single outgoing connection (also known as a Wide Area Network or WAN). So, all incoming packets get routed out of the WAN connection.

When there are multiple connections, the notion of a metric comes into play. In simplistic terms, the connection with the lower metric takes precedence. If the metrics of multiple connections are equal, traffic is load-balanced between them.
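The metric rule can be sketched in a few lines of Python. This is a simplified illustration and not how any particular router implements it; the class, the link names and the metric values are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WanLink:
    name: str
    metric: int
    up: bool = True


def active_links(links):
    """Return the usable links that share the lowest metric.

    One link in the result means all traffic goes out of that link;
    several links mean traffic is load-balanced between them.
    """
    usable = [link for link in links if link.up]
    if not usable:
        return []
    best = min(link.metric for link in usable)
    return [link for link in usable if link.metric == best]


links = [
    WanLink("starlink-1", metric=10),
    WanLink("starlink-2", metric=10),
    WanLink("lte", metric=20),  # higher metric: only used as a backup
]
print([link.name for link in active_links(links)])
```

With these hypothetical values, the two Starlinks share the traffic and the LTE link is only picked once both of them are marked down.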

What is Load-Balancing?

Let's say you have two WAN connections (say two Starlinks) with an equal metric. Your connections will be load-balanced between them. In its most basic form, load-balancing means some of your packets will go out of one link (and responses to those packets will come back via the same link, unless you own your own IP address space and do some sophisticated routing, which is beyond the scope of our discussion) and others will go out of the other link(s). But which packets go where, and how is it decided?

There are different load-balancing algorithms, but that is mostly irrelevant for you. For all practical intents and purposes, this will be what is known as flow-based (or connection-based) load-balancing (you don't want to do per-packet load-balancing on your home router unless you really know what you are doing).

A flow is generally a 5-tuple identifying a connection (protocol, source address, destination address, source port and destination port), although it can be a subset of these in simpler implementations. When any of these 5 parameters is different, it is a different flow. With flow-based load-balancing, each flow is assigned to one of the WAN links. If all 5 parameters stay the same, the same outgoing link is used. This is important since it ensures that a single connection always stays on the same link. Otherwise, every other packet in your Zoom call could, for example, be routed over a different link. Imagine one of the links has higher latency than the other; you will have a hell of an experience on that Zoom call!
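One common way to picture flow-based load-balancing is hashing the 5-tuple and using the result to pick a link. The Python sketch below shows that idea only; real routers may use connection tracking or weighted algorithms instead, and the addresses, ports and link names here are made up.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    protocol: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


def pick_link(flow, links):
    """Map a flow to one link; identical 5-tuples always map to the same link."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(links)
    return links[index]


links = ["starlink-1", "starlink-2"]
call = Flow("udp", "192.168.1.20", 50000, "203.0.113.10", 8801)

# Every packet of the same call stays on the same link.
print(pick_link(call, links))

# A second device (different source address) talking to the same service
# is a different flow and may land on the other link.
other_device = call._replace(src_addr="192.168.1.21")
print(pick_link(other_device, links))
```

Because the choice depends only on the 5-tuple, packets of a single connection never get split across links with different latency.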

So, in practice, if you have multiple WAN connections, some of your connections will go out of one link, and others will go out of another link. If some other device on the local network is doing the same, even if it is connecting to the exact same service on the exact same address, it may be going out of a different link (since the source address will be different, and likely the source port too). Protocols and ports can be confusing, so ignore them. Basically, imagine each service (like Netflix, Gmail, Zoom etc.) is associated with some address and port, and that is enough for this discussion. Reality is much more complicated of course. But these companies employ enough professionals to make this irrelevant for you!

A net implication of load-balancing for you though is the inability to use multiple links for a single connection (defined as a single flow with the same 5-tuple). So if all 5 parameters of the connection are the same, the speed of that connection is limited to the speed of one of your multiple WAN connections. There is no such thing as adding them up. When you have multiple connections (because of multiple devices or different ports), you can have the perceived notion of aggregated bandwidth over multiple links, since you will have many connections and some of them will go via one link while others will go through other links.

What Happens When You Lose a Connection?

When you have multiple connections and temporarily lose one (say you are having a 10 second outage on Starlink), things get interesting.

Routers are normally smart enough to not route packets out of connections that are unavailable. But how do they really know a connection is unavailable and more importantly how long does it take for a router to detect it?

Detecting something like an unplugged ethernet cable is easy. Since the physical signal is lost, the router almost immediately knows that the connection went down. But most outages are not that clear cut. In most outages, the actual link (the physical link, for example the ethernet connection to Dishy) likely stays up, but packets may never reach the other side (maybe they get lost in space, pun intended). A router will not immediately know this. There are ways to configure ping, DNS or HTTP tests over WAN links (usually known as health checks), but they also take some time to detect a down connection. You can (and should) configure these health checks, but there is no ideal solution. Even in the best case, it will take multiple seconds for a router to reliably conclude that a link is down (the same amount of time will be required to reliably conclude that a link came back up too).
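To see why detection takes multiple seconds, here is a small Python sketch of a health check that only changes its mind after several consecutive probe results disagree with the current state. It is a generic illustration, not the logic of any specific router; the thresholds and the probe interval are assumed values.

```python
class HealthCheck:
    """Track link state from probe results using consecutive-result thresholds."""

    def __init__(self, fail_threshold=3, recover_threshold=3):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.up = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok):
        """Feed one probe result (True = reply received) and return the link state."""
        if probe_ok == self.up:
            self._streak = 0
            return self.up
        self._streak += 1
        threshold = self.fail_threshold if self.up else self.recover_threshold
        if self._streak >= threshold:
            self.up = not self.up
            self._streak = 0
        return self.up


def detection_delay(interval_s, threshold):
    """Rough time needed to notice a state change, ignoring probe timeouts."""
    return interval_s * threshold


check = HealthCheck()
for result in (True, False, False, False):
    state = check.record(result)
print("link up?", state)
print("approx. detection delay:", detection_delay(interval_s=5, threshold=3), "seconds")
```

Lowering the threshold or the interval shortens the blind window, but it also makes the router more likely to flap on a single lost probe, which is why there is no ideal setting.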

So what happens during that time? Packets going out of that link will be lost. While you may have an alternative WAN connection, a given packet is only sent via one of the links. So if that link goes down even for a few seconds, and those packets are for your Zoom call, you are frozen out of that call during that time (and then you apologize to people, since you were a ghost on their screen).

And even if your router detects the link failure and switches over to another link, it may take Zoom longer to recover. This is because in the event of a failover (the router switching a flow to a different link), your source address as seen by the service will change. So this may require a session reinitiation between your computer and the service. While many applications are being optimized for these failures, not all will work seamlessly (as a result, you may need to reload, relaunch etc.). There are many more nuances here, but we will skip those for this discussion; the basic theory holds true.

What is WAN Bonding?

In its most simplistic terms, bonding means aggregating multiple links to work as a single larger capacity link. Things get a bit complex here since this is not always as easy as sending each packet out of a different link. These links need to be very similar in their characteristics or you need smarter algorithms. But ignoring how it is done, with bonding, even a single connection can use the aggregated capacity of multiple links, which is not possible with pure flow-based load-balancing.

We can argue how useful WAN bonding is on its own for a home use-case with something like Starlink since the bandwidth is usually quite high anyways. But ignoring that, in most implementations, it also forms the basis of additional features like WAN smoothing or hot failovers. I discuss them more below.

Bonding is also a topic that usually gets confusing for people. There are bonding technologies available from different vendors (like Mikrotik). When people see references to bonding in different documentations, they immediately think it is usable for making their own home or boat connections reliable with a simple configuration change on their router. Reality is more complex. Unless you have more control over both ends of that bonding setup (which requires running or relying on a service rather than turning on a switch on your router), it is practically not very useful for you. I explain this in more detail below.

What is WAN Hot Failover?

This is mostly a marketing term, but in simplified terms it means a connection is not visibly impacted when an underlying link becomes unavailable (assuming there are alternative links). Normally, after a connection is switched over to a different link, some sessions will need to be re-initiated (the impact will depend on the application, but this could mean a drop and re-connect). With a hot failover, session persistence is ensured and re-initiation becomes a non-issue for you, preventing drops and reconnects.

What is WAN Smoothing?

There is usually a delay between the moment a connection becomes unavailable and the moment your router detects it and brings the connection down (discussed in more detail above). You lose packets during this time.

Also, some packets will naturally be lost on a link, even if it doesn't fully become unavailable. No connection is 100% reliable at all times, especially something that needs to go up to a moving satellite and come back down using electronically steered antennas. Almost all applications tolerate some minimal packet loss, but if it increases beyond a negligible rate, it becomes visible to you or the people working with you.

WAN smoothing is a mechanism to improve this. It essentially sends the same packet out of multiple connections to ensure redundancy and picks the packets that have the lowest latency if all of them end up arriving. So when there is no packet loss, it improves latency.

However, if one duplicated packet arrives but the others do not (due to packet loss on the alternate links), WAN smoothing essentially saves the day by decreasing packet loss.

WAN smoothing essentially trades more bandwidth usage for greater connection reliability, reducing packet loss and latency.

As you can see in the test results I provided above, it really works!
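The duplication idea is easy to model. The Python sketch below is a toy simulation under a big simplifying assumption, namely that losses on the two links are independent of each other; it is not how SpeedFusion is implemented, and the fixed 45 ms latency and the function names are made up. The two loss rates are the ones measured in the ping tests above.

```python
import random


def smooth(copy_a, copy_b):
    """Return the latency of the first copy to arrive, or None if both were lost.

    Each argument is a latency in ms, or None if that copy was lost.
    """
    arrived = [latency for latency in (copy_a, copy_b) if latency is not None]
    return min(arrived) if arrived else None


def simulate_loss(n_packets, loss_a, loss_b, seed=1):
    """Send every packet over both links and count what gets lost where."""
    rng = random.Random(seed)
    lost_a = lost_b = lost_smoothed = 0
    for _ in range(n_packets):
        # Assumption: each link drops packets independently of the other.
        copy_a = None if rng.random() < loss_a else 45.0
        copy_b = None if rng.random() < loss_b else 45.0
        lost_a += copy_a is None
        lost_b += copy_b is None
        lost_smoothed += smooth(copy_a, copy_b) is None
    return lost_a / n_packets, lost_b / n_packets, lost_smoothed / n_packets


rate_a, rate_b, rate_smoothed = simulate_loss(1_000_000, loss_a=0.0223, loss_b=0.0117)
print(f"link A: {rate_a:.2%}  link B: {rate_b:.2%}  smoothed: {rate_smoothed:.3%}")
```

In this idealized model a packet is only lost when both copies are lost, so the smoothed loss approaches the product of the two loss rates (roughly 0.03% here). That is lower than the 0.17% I measured; glitches on two dishes mounted on the same boat are unlikely to be fully independent, so the real-world gain is smaller than the model suggests, but the direction is the same.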

Management of Multiple Connections

After a bit of theory, let's bring things together. Having two Starlinks (or any two alternate connections) is great, but without doing something extra, it doesn't mean much for connection reliability in a practical way. Ensuring uninterrupted, non-freezing, high quality Zoom (or Google Meet for the Admiral) calls is a must for us, and terminating those two connections on a router and doing pure load-balancing doesn't cut it.

Bonding is mostly meaningless for our use-cases since the bandwidth of a single connection is normally enough. But WAN smoothing specifically, and to some degree hot-failover, are crucial. They do depend on bonding and are not trivial to achieve. In order to handle these reliably, you need to be able to control both sides of a connection. The only practical way to do this is over a Virtual Private Network (VPN) tunnel. A tunnel has by definition two endpoints, one on your local network and one somewhere stable on the Internet, likely at a cloud provider. There are different solutions for this requiring varying degrees of geekiness: OpenMPTCProuter, Speedify or SpeedFusion Cloud.

One thing common among all options is the need for a remote endpoint. So you either need to have your own server on the Internet and run the necessary services on it yourself, or pay someone to build and run it as a service for you.

SpeedFusion Cloud is a service offered by Peplink to increase resiliency of multiple connections. It essentially creates a VPN tunnel between your router and a cloud location of your choosing. This VPN tunnel connection is then routed through multiple WAN links. By controlling both endpoints of this connection, Peplink is then able to offer a bunch of interesting things: WAN bonding, smoothing and hot-failover.

Needless to say, this feature is only available on Peplink routers, which are known to be a premium brand with higher prices. Also, by definition this is a service, as it requires cloud resources to be available to reliably terminate a VPN tunnel somewhere on the Internet. So you need to subscribe to a service for this, and its price is linked to the amount of data you pass through it. But considering what it enables, I actually find it very reasonably priced ($20 for 500 GB of data, discounts available for higher amounts).

As much as I like geeking out with DIY solutions, and even though I already have numerous servers operating in different cloud providers, basic connectivity for work is not something I wanted to spend a lot of time building my own solution for. So I went for SpeedFusion.

In order to use SpeedFusion, you also need a Peplink router. So we picked the Peplink Balance One. Without SpeedFusion, I would normally pick a Mikrotik router, which gives much more granular, lower level controls. We use numerous Mikrotik routers for terminating our LTE connections and love them for the controls they offer. But Mikrotik has no equivalent for SpeedFusion, and my requests to Mikrotik to consider such a service have so far gone unanswered.

Anyway, Balance One comes with two WAN ports, which can be upgraded to 5 WAN ports with a separate feature license and has a separate USB port that can be used as another WAN port with an LTE modem. Before getting our second Starlink, we used all 6 WAN ports for numerous LTE connections but now streamlined them to two Starlinks and two LTE connections (to be used as further backups when we have cell coverage, which we don't have right now, while sitting at an anchorage in Broughtons).

Peplink Balance One with two Starlinks in Broughtons
SpeedFusion Cloud configuration with WAN Smoothing
Peplink Balance One with 6 WAN connections (prior to us getting our second Starlink)

Conclusion

It has been a dream for us to cruise while continuing our jobs. Previously, we were constrained to where we could find cell signals. Starlink has truly become a game changer for us. While a single Starlink worked pretty well, it did not always meet the extremely high expectations we have for high quality Internet (something that is crucial for our lifestyle).

Using two Starlinks and using WAN smoothing increased the reliability of our connections a lot, allowing us to truly be in the middle of nowhere and continue to do our jobs. While Starlink Maritime was announced, it targets a different segment at a significantly different price point. We managed to create our own coastal, and anchorage-specific Starlink solution combining two Starlink Dishes, a similar approach to what Starlink did for their maritime offering (which seems to use two of the high performance business dishes).

Two Starlinks obviously cost 2x (not to mention the extra power consumption) and even a single Starlink is not necessarily cheap depending on your lifestyle. We are lucky enough to continue our jobs during the week while cruising mostly over the weekends. Two Starlinks enable it for us so we find it reasonable. We understand it is not the same for everyone, with different trade-offs in life. We also make our trade-offs (working while sitting next to a waterfall in Princess Louisa can feel like an insult to where you are!) but we feel lucky and privileged to have the option.

And lastly, with everything moving at a very fast pace in this space and at work, we don't know if or how long this will last, but so far we have been very happy!