Thursday, May 7, 2020

How to Serve over 100K Web Pages a Day on a Slower Home Internet Connection

Server and Aliens

The cheapest way to run a website is from the home Internet connection that you already have. It's essentially free if you host your website on a very low-power computer like a Raspberry Pi. The problem with this approach is that upload bandwidth is often very limited on home Internet connections, so your website may occasionally miss potential traffic during "high-traffic" periods. For this reason, and because of the headache associated with hosting a website from a computer in your physical possession, the conventional wisdom is that you should not host a website from a server based in your home. However, there are ways of capturing high-traffic spikes even on many low-bandwidth home Internet connections.

Let's look at some numbers. The type of Internet service one can get in the United States varies widely from place to place. So, many in the US are stuck with home Internet connections that are less conducive to hosting a home webserver. Let's assume your only option is AT&T's 12 Mbit/s DSL service, which has an upload speed that fluctuates from second-to-second between 64 KBytes/s and 128 KBytes/s (between 512Kbits/s and 1 Mbit/s). This means that after the overhead associated with the connection, your average data upload speed is only about 90 KBytes/s. Now, let's say your website is a fairly average blog that gets around 3000 visitors and 6000 page views a month. On a 90 KB/s connection, your webserver should have no problem at all handling that kind of traffic on normal days.

On rare occasions, however, you post that golden article on social media that goes viral and many more people than usual want to read it. Let's assume that a particular article takes an average of 2.16 seconds to be accessed from your server by each of your readers. That means that it can only be read about 40,000 times in a 24-hour period. Things are a little more complicated by a number of factors, including the fact that readers will not come in a steady stream. But, let's say for this hypothetical scenario, which is a best-case scenario for you, that readers do come in a fairly steady stream. Let's say 70,000 people attempt to read the article in the first day after you post it. That means 30,000 readers will not be able to read your article during that first-day traffic spike. Since your website normally gets only around 72,000 page views per year, missing out on 30,000 page views during that one-day traffic spike means you lose a substantial portion of your potential readers for the year.

Don't give up hope just yet, however. There are ways of capturing those rare traffic spikes. Given that you can't increase your upload speed, and you can't tell your readers to wait until tomorrow to read your article, you generally have only two viable options for getting your article to more readers in a short amount of time: 1) reduce the amount of data that must be uploaded, and 2) maximize the percentage of time that your server is actually transmitting data, as opposed to doing other things like connecting to the visitor's browser, SSL handshaking, waiting for the visitor's browser to make the next request, etc.

Before we get into the details, let me talk generally about what happens when a visitor brings up one of your web pages in his browser. I'll only give generalities, because we are only really concerned about the chain of events and how to increase the speed of each event. First, the visitor's browser queries a DNS server to get your website's IP address. Then his browser connects to your webserver and a mostly one-way conversation takes place about the characteristics of his browser and his computer. This is called the connection portion of the page transfer. Then, if your website is an HTTPS website, an SSL handshake takes place where the TLS protocols that will be used are established, the webserver gives it's SSL certificate, and the browser verifies that it is legitimate. The SSL handshake alone can take as long as a third of a second. Once the browser trusts the webserver and a secure connection has been established, the data transfer is ready to begin.

The data on your web page is transmitted in stages, called requests. Each type of data is requested in a different request. First the HTML file for the page is requested by the visitor's browser. Then, data is sent one request at a time for each file that is referenced inside the HTML code. Examples of files that can be referenced in HTML code are CSS data files, picture files, video files, sound files, and favicon files. If your web page contains 50 pictures, the visitor's browser must make 50 requests to get all 50 pictures.

After each request, the browser waits for the webserver to receive the request and begin sending back data. Some amount of wait time is always required for the beginning of the data transmission to reach the browser. Depending on the distance from the webserver to the visitor's computer, the wait time associated with each request can be anywhere from a few milliseconds to a quarter of a second or more. If a web page has 50 pictures and the visitor is on the other side of the world from the webserver, it is possible that transmitting those 50 pictures one at a time could take over 12.5 seconds plus the time required to transmit the data in the picture files. Often, many requests occur in parallel, however, so the elapsed time to send 50 pictures may be significantly less than 12.5 seconds. But, it doesn't have to be.

Timelines can also be complicated even more by the fact that a webserver may have several visitors all trying to download the same web page at the same time, and many of the requests by different visitors may be occurring in parallel. This is another reason that predicting the exact time required to download a web page can be difficult.

Now that you have a basic understanding of what happens when a visitor's browser downloads web pages from a webserver, let's talk about 7 ways you can increase the amount of traffic your server can handle on a low-bandwidth connection.

1. Make Sure You don't have any Page Loading Issues:

You can make many types of mistakes in creating a web page, so I won't list them all here. The Google web console and web page load-speed testers like the Pingdom Website Speed Test will look for mistakes on your web page and show you any that they find. The first thing you need to do to increase the load speeds of your web pages is to remove all of these mistakes.

2. Create Static or Mostly Static Web Pages:

Static web pages generally load much faster than dynamic web pages and require less bandwidth. One reason is that static pages do not require JavaScript to be sent to a visitor's browser. If you must have some dynamic content, consider using PHP code that runs on your server instead of your visitor's computer. But remember, the more code you run on your server, the more powerful your server must be, and increased power means increased electricity costs. If you have a low-bandwidth Internet connection and don't want to have high electricity costs, you should forget about animation on your web pages. Forget about database programs. Forget about anything other than very simple code running inside your web pages. Most of the eye candy you see on the Internet these days is served via high-speed Internet connections by servers costing thousands of dollars each that are owned by large companies that can afford to pay for them and the electricity to run them. Unfortunately, nearly all the information that you are likely to find about the best approach for developing a website will be about these types of websites. Don't fall into the trap of thinking these are the only types of websites you can build.

If you absolutely must have some dynamic content, think very carefully about ways of minimizing it. For example, instead of a full analytics package, write your own PHP code that does nothing more than count the number of times a page is visited. Instead of running some commercial visitor comment code that also requires you to run a database program, write your own smaller, faster code. Or, use Babbleweb for visitor comments. And, consider disabling comments after 20 or 30 have accumulated on a particular article.

3. Reduce the Amount of Data on Each Web Page:

Reducing the amount of data per web page that your server must upload can increase page load speeds significantly. For example, going from three, 500 KB pictures on a page to one 50 KB picture can save over 16 seconds on a 90 KByte/s Internet connection. Even reducing the amount of HTML text by writing more succinctly or breaking a long article up into two or more shorter articles can help. And, if your CSS code is in a separate file from your HTML code, you should clean up your CSS code to get rid of anything that is no longer used and minimize the number of comments. It goes without saying that a text file takes much less time to upload than the same information presented in an audio or video file format.

Don't forget to use picture file formats that minimize the file size of your pictures. You should experiment with sites like Picresize and Compress.io to see how much compression you can get away with and still have pictures that will not detract from the overall appearance of your website.

4. Reduce the Number of Requests per Page:

As the introduction to this article notes, the number of requests a visitor's browser must make to get the entire contents of a web page can have a large effect on the page load time. So, it pays to minimize the number of unnecessary pictures and other types of data included in your web pages. Also consider putting CSS code inside HTML files, rather than in separate files. Putting CSS code in a separate file that is used by multiple HTML files allows you to more easily change the look of your website, but it also means that an additional request must be made for each web page that includes that CSS file.

Any visitor that tries to access a page on your website via HTTP and must be transferred to the HTTPS page takes up an additional request. Think of ways of minimizing the number of times that happens. Speaking of HTTP vs. HTTPS, no one seems to mention that the price you pay for the added security of HTTPS is increased bandwidth and CPU requirements. The SSL handshake alone can increase the page load time by more than a quarter of a second.

5. Compress Your Web Pages:

Most webservers can automatically compress content before uploading it to visitors' browsers. This can reduce the amount of data that must be uploaded for HTML and CSS code by 50 to 70%. As far as I can tell, compression does not take a lot of CPU power. I've also noticed that when one dynamic web page is receiving a large amount of traffic, the Lighttpd webserver appears to hold that dynamically-compressed page in RAM for a while, so it doesn't have to re-compress the page before each upload. Static web pages are only compressed once, no matter how many times they are uploaded.

You should compress your picture files before you include them in your web pages. As a result, asking your server to compress pictures again does not reduce their file sizes significantly. It only increases your CPU requirements. So, you should tell your webserver not to compress pictures.

6. Don't use Website Development Software to Create Your Website:

Not using commercial website development software goes against just about everything everyone else on the Internet is telling you. Nevertheless, it is the right thing to do for low-bandwidth Internet connections. To illustrate my point, I picked this web page from Matt Mullenweg's blog that runs on Wordpress. I chose this web page because it is mostly text with just a few pictures and no advertising. This page has 9669 bytes of text that the reader sees, but the HTML file contains 144591 bytes of HTML code, thanks to Wordpress! For comparison, I picked this article from my website that I wrote myself in HTML. It has one picture and no advertising. The reader sees 9698 bytes of text, and the HTML file contains only only 15074 bytes of code. That's about one tenth of the code in Matt's article! I could probably have reduced that to less than 12000 bytes if I had really wanted to. By the way, I also have a 3105 byte CSS file that is included in my HTML file. Matt's HTML file includes three CSS files that I did not bother to find the sizes of.

Now, let's look at the cost of letting Wordpress create your HTML code. The time required to upload the 15074 bytes of HTML code in my article over a 90 KB/s Internet connection would be 0.167 seconds. The time to upload Matt Mullenweg's HTML code over the same Internet connection would be 1.607 seconds. HTML code is usually compressed before it is sent to the reader, so the time difference would not be as large as that, but it would still be significant.

The average time to load just the compressed HTML portion of my article, the associated compressed CSS file, and a small favicon file according to https://ift.tt/1XDWZp3 with a slight adjustment to simulate a 90 KB/s connection is 0.496 seconds, and 0.076 seconds of that is due to uploading my HTML file that has been compressed to 6.8 KB. Assuming Matt's article is compressed by the same ratio, just the HTML code of his article would take about (0.496 - 0.076 + (6.8/15.1)*144.6/90) = 1.144 seconds to come up in the reader's browser. This means that without considering embedded pictures about 2.3 of my articles (1.144/0.496 = 2.31) can be uploaded to readers in the same amount of time as Matt's article. So, over an Internet connection that can upload 90 KBytes/s, readers could view the text-only portion of Matt's article about 76000 time per day and the text-only portion of my article about 174000 times per day. In other words, the text of my article can be viewed about 98000 times a day more than Matt's. That 98000 additional page views could be a significant portion of the yearly traffic for a small website. So, even with HTML code compression, it is still worthwhile to write your own HTML code.

Note that I have simplified my anlaysis by not taking into account the slight improvement that many visitors requesting pages at the same time could receive due to some amount of parallel delivery of web pages. However, the basic fact that my web page can be read by several tens of thousands more visitors per day than Matt's is still correct.

7. Use a Free CDN service to Temporarily Host Pictures in New Articles

For typical web pages, picture files often contain more data than HTML files, so it can be very beneficial in terms of bandwidth to host pictures somewhere other than on your webserver. One place you can move your pictures is to a CDN (Content Delivery Network). The HTML file of your web page will then have a link to the picture on the CDN, instead of a link to the picture on your webserver. For those of you who do not want to rely on a CDN to permanently host part of your content, there is another way. Most traffic spikes occur within the first week of posting a new article on social media. So, you will probably only need to host pictures in new articles on a CDN for the first week after publishing them. After that, you can put your pictures on your own webserver. For small websites, CDN's are likely to charge by the amount of content hosted, rather than by the number of downloads, so hosting only a few pictures at a time can mean you pay little or nothing for the use of the CDN service.

Summary

Here are the seven ways that I have suggested for capturing those peak traffic spikes that may occasionally occur on your webserver:

7 Tips for Hosting a Website on a Low-Bandwidth Internet Connection

  1. Make Sure You don't have any Page Loading Issues.
  2. Create Static or Mostly Static Web Pages.
  3. Reduce the Amount of Data on Each Web Page.
  4. Reduce the Number of Requests per Page.
  5. Compress Your Web Pages.
  6. Don't use Website Development Software to Create Your Web Pages.
  7. Use a Free CDN service to Temporarily Host Pictures in New Articles.

By using these methods, you may be able to increase the peak traffic that your webserver can handle on your low-bandwidth home Internet connection by a factor of four, five, or even more. This can allow you to continue hosting your website from home virtually for free, even when your website is getting occasional traffic spikes of well over a hundred thousand page views a day.

If you've found this article worthwhile, please share it on your favorite social media. You'll find links at the top of the page.

Related Articles:

Running a Small Website without Commercial Software or Hosting Services: Lessons Learned

How to have Your Own Website for $2 a Year

How to Host Your Own Decentralized Site for Free on ZeroNet

How to Stop Bad Robots from Accessing Your Lighttpd Webserver

A List of Free Online Tools for Web Developers

How to Create a Family Website



from Hacker News https://ift.tt/3cfVNSW

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.