In part one of this series I covered the original strategy, in part two the RFP and site, and in part three the design.
In part four I will do a deep dive under the hood into our content management system (CMS) choices. This is the longest part, which reflects the importance of the decisions we made in this area to support the overall strategy.
The CMS
We chose MySource Matrix, provided by Squiz in Australia, as the foundation for the site. At the time this was chosen it was used by a couple of large radio networks in Australia, so it seemed like a good fit. In 2005 it was an ‘open core’ platform, with an open source version and a supported commercial version with extra features. We took up a support option to allow us to fund customisations and have those commercially supported.
One of these customisations was the addition of an audio asset type to the CMS, along with scripts to import news and audio content (metadata only; the audio itself was stored elsewhere).
The CMS was chosen because its design allowed us to add functionality to the site without requiring a developer. The system was made up of modules that could be ‘wired up’ to provide pages, lists, forms, embedded content, and so on. We certainly could not have grown as fast as we did without it.
We were very fortunate to have Colin Macdonald as our energy-drink powered sysadmin, and he applied many tweaks to keep the site stable as traffic quickly grew. He developed some very sophisticated cache management that allowed us to quickly publish, clear any cached content, and then refresh the cache again. Those in the know can pause and appreciate how hard that is; the rest of you can read on…
Later, when Squiz opened an office in New Zealand, we had Murray Fox as our locally-based sysadmin, also a genius, and like Colin, completely unflappable under pressure.
Around 2007 the single server was starting to struggle with load and we upgraded to multiple, larger servers. These in turn ran out of steam in late 2008, and by the end of 2009 we were struggling again. The suggested upgrade required over 100 CPU cores, and this triggered a deeper philosophical review of our long-term needs.
In a nutshell, we had outgrown the software, both in terms of our publishing needs and from a performance perspective. We’d heard that other media clients in Australia had dropped the platform for similar reasons, and we made the decision to replace it too.
Much work went into evaluating alternatives, including commercial and open source platforms. The commercial platforms—some used by other public broadcasters—were very expensive and all required customisation for each installation, essentially becoming bespoke in the process.
The open source platforms were capable of media publishing, but also required a lot of customisation that would have made upgrades complex, and ultimately we’d have ended up with a hard-to-maintain fork of the original project. I had seen this happen elsewhere: in one case a team had used Drupal as the base, adding bits and pieces as needed, and in the end the codebase was crumbling under years of technical debt and was abandoned.
The lesson from this was that if you do build on an open source platform, it is vital to follow its conventions for adding features and customisations; otherwise it becomes hard to take updates from upstream, and you end up with an orphaned project that bears little resemblance to, and gets no benefit from, the original. In that case you might as well have built something yourself from scratch and avoided the overhead of what is inevitably a reworking of the existing code.
You also won’t be able to contribute back to the project, which surely is the whole point of open source.
An alternative to these was a framework such as Django or Ruby on Rails. We had considered Bricolage early on in the process, and while its approach of creating static pages based on known workflows was sound, the framework was not seeing much industry support. In the end we picked Rails because a number of local companies were able to provide services, it solved our performance and content-structuring problems, and it had a solid future.
The other factor that tipped the scales in favour of Rails over (say) WordPress or Drupal was that we only built what we needed, and we were in complete control, allowing for greater optimisation of workflows and tooling.
For example, in Rails we were able to have objects called ‘station’, ‘programme’, ‘episode’, ‘presenter’, and ‘recipe’, to name a few. These objects had associations with each other in the CMS: a station has many programmes, a programme has many episodes, and so on. This is much more powerful than the alternatives we looked at, which are really just built on structured pages with no concept of a ‘programme’; it was also something we struggled with in Matrix.
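To make that concrete, here is a minimal sketch of how such associations look in Rails. The class and column names are illustrative assumptions, not ELF’s actual schema:

```ruby
# Illustrative ActiveRecord models; names and columns are assumptions,
# not ELF's actual schema.
class Station < ActiveRecord::Base
  has_many :programmes
end

class Programme < ActiveRecord::Base
  belongs_to :station
  has_many :episodes
end

class Episode < ActiveRecord::Base
  belongs_to :programme
  has_many :audio_items
end

class AudioItem < ActiveRecord::Base
  belongs_to :episode
end
```

With the domain modelled like this, “all programmes for a station” or “all audio for an episode” are natural one-line queries rather than traversals of a generic page tree.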
In June 2011 I wrote about some of the problems we had structuring content in Matrix.
The structure we created in Rails meant we never had to work around the generic constructs of an existing system. From a coding point of view, our approach was also much faster, both in terms of updating existing code and bug fixes, and building new features.
There was also a performance aspect to this. In Rails the query to fetch the audio items for an episode was extremely fast because both audio and episodes were native objects with a natural database association. In generic content platforms these objects and relationships don’t exist, sometimes making it impossible to get good performance.
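As a rough illustration (using the hypothetical models sketched above), the lookup is a single association call backed by a foreign key:

```ruby
# Hypothetical query using the models sketched earlier.
episode = Episode.find(episode_id)
audio_items = episode.audio_items.order(:position)

# When rendering a whole programme page, eager loading avoids N+1 queries:
episodes = programme.episodes.includes(:audio_items).order(id: :desc)
```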
I really cannot emphasise enough how important this is. An existing open source project might get you going quickly at first, but as many have found, when you depart from the ‘happy path’ things slow down very quickly.
There is an argument to say that you should do just what the platform provides, because all media businesses are basically the same. That is both true and false. As I heard someone once say: generic business, generic platform, big savings. Many businesses are essentially the same—compare most news-only companies as a prime example, or companies that make biscuits (cookies in the US and Canada)—but if they want to differentiate, then a key component of this will be their choice and use of technology. (And many other things.)
And if all businesses were the same, then surely they could all use Salesforce or SAP ‘out of the box’ too. 😉
If a business can differentiate itself on a generic content platform, then that is, of course, a good strategy choice. I am going to give a shoutout here to The Kākā, an extremely innovative and successful venture started by Bernard Hickey. It runs on Substack, a publishing platform as a service, and The Kākā has differentiated itself by staying local and covering just politics, the economy, and the housing market. The content has a unique voice: content with flair for thinking people (that’s my description). It currently has around 3,400 paid subscribers.
Likewise, if you can differentiate using Drupal or WordPress, and can continue to do so, that is a good strategy choice. You just have to go deep on the analysis to check all your assumptions first.
The corollary to that is that if you cannot differentiate on an existing platform, you should do something else.
In March 2010 work began on the new CMS. Because we had taken a modular approach to the technology stack, and to the way content was structured on the website, it was relatively easy to progressively replace Matrix. This modular approach, and the fact that Rails itself was modular, allowed us to continue to decouple audio hosting, and opened the way to decouple image hosting should that later be needed.
The new CMS was called ELF. Dempsey Woodley, who was web producer at the time, came up with the name, which is an acronym for Eight Legged Freak. AKA a spider who lives (of course) on a web. Also ELFs are helpers, so a good double meaning.
An initial prototype of the recipes section was built by Nigel Ramsey and Marcus Baguley from AbleTech. The recipes section was chosen because it was independent of the rest of the site (it did not share any content), and we could therefore easily move it to ELF without any integration issues.
The great thing about building your own CMS is that you get exactly what you need, and no more, as you can see from this edit screen for the recipes section (at right).
The prototype was tested with siege (software for performance testing), and was found to be able to serve about 100 times more traffic than our existing CMS, without any optimisations such as caching. I cannot recall exactly when we put that into service, but it was probably around April 2010.
We set up an instance of nginx running as a proxy server in front of Matrix and ELF, and this routed requests for /recipes to ELF, and everything else to Matrix. As sections or pages were replaced, the nginx config was updated to redirect the traffic to ELF.
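The routing layer might have looked roughly like this; a simplified sketch, with made-up upstream names and ports rather than our actual configuration:

```nginx
# Simplified sketch of the proxy layer; upstream names and ports are assumptions.
upstream matrix { server 127.0.0.1:8080; }
upstream elf    { server 127.0.0.1:3000; }

server {
    listen 80;
    server_name www.radionz.co.nz;

    # Sections already migrated to ELF
    location /recipes {
        proxy_pass http://elf;
    }

    # Everything else stays on Matrix until it is replaced
    location / {
        proxy_pass http://matrix;
    }
}
```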
Another advantage of nginx was that it could gzip compress pages passing through more efficiently (less server load) than the Apache software running Matrix. This was an unexpected benefit, and it bought us a little extra performance (and time) for free.
The recipes section had tagging, which was new for the site, and this was set up and managed by Helena Nimmo. Helena later started a number of recipe features, such as “what’s in season” with a selection of recipes for the month, and the tagging of every single recipe with its ingredients to help with search.
The most popular recipe when I left RNZ in 2016 was Alison Holst’s recipe for quince paste.
The National and Concert homepages came next, as these had mostly static content. They were followed by news, programme pages and the site homepage.
We used a small number of private RSS feeds to share content between the two CMSs during the migration process.
The whole process took three years, and while this seems like a long time, we had a limited budget, business-as-usual to cater for, and we were also making constant improvements to the site and the administration section of the ELF CMS.
The news section and site home page went live a week before 4 September 2010. This was the day of the first Christchurch earthquake, and the old CMS would certainly have collapsed under the load, which was about 20 times normal. ELF handled it without breaking a sweat.
The performance of ELF was continually tweaked, with various caching regimes being introduced as they were added to the Rails framework. These worked well, as cached content would automatically be updated when new content was published, ensuring that pages loaded fast.
One of the innovations in the new platform was the automatic creation of RSS and podcast feeds. For each new news section or programme, we could switch on a feed as needed, and the CMS would surface it on the relevant pages. A new series page, and podcast feed, could be created in under five minutes.
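In Rails, a podcast feed like this largely falls out of a builder template. A rough sketch, with hypothetical view, model and field names:

```ruby
# app/views/programmes/podcast.rss.builder -- hypothetical names and fields.
xml.instruct! :xml, version: "1.0"
xml.rss version: "2.0" do
  xml.channel do
    xml.title       @programme.title
    xml.link        programme_url(@programme)
    xml.description @programme.description

    @programme.episodes.limit(20).each do |episode|
      xml.item do
        xml.title     episode.title
        xml.pubDate   episode.published_at.to_s(:rfc822)
        xml.link      episode_url(episode)
        xml.enclosure url: episode.audio_url, type: "audio/mpeg",
                      length: episode.audio_bytes
      end
    end
  end
end
```

A per-programme flag in the admin interface could then control whether the feed, and the link to it, is exposed.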
Another was the ability to import programme schedules from Word files. These files were compiled in a standard format to send out to The Listener. We leveraged this format to reliably extract programme events, and to store and present them in a structured way. I wrote a parser in Ruby, and this was built into ELF.
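I can’t reproduce the original parser here, but the general shape of such a thing is a line-based scan that turns “time plus title” lines into structured events. A heavily simplified sketch, assuming the Word documents have already been converted to plain text and use a hypothetical line format:

```ruby
# Heavily simplified, hypothetical schedule parser. Assumes lines such as:
#   "6:00 Morning Report - News and current affairs"
ScheduleEvent = Struct.new(:starts_at, :title, :description)

def parse_schedule(text, date)
  text.each_line.filter_map do |line|
    next unless line =~ /\A(\d{1,2})[:.](\d{2})\s+(.+?)(?:\s+-\s+(.+))?\s*\z/

    hour, minute, title, description = $1.to_i, $2.to_i, $3, $4
    ScheduleEvent.new(
      Time.new(date.year, date.month, date.day, hour, minute),
      title,
      description
    )
  end
end
```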
The new CMS also allowed us to provide raw data of any schedule for reuse under a creative commons license. To see this in action, go to a schedule page like this one, and then append ‘.xml’ to the URL to get the raw data. (This works in August 2023.)
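That behaviour comes from standard Rails content negotiation; appending ‘.xml’ simply selects a different representation of the same resource. A hypothetical controller sketch:

```ruby
# Hypothetical controller; appending .xml to the URL selects the XML format.
class SchedulesController < ApplicationController
  def show
    @schedule = Schedule.find_by!(date: params[:date])

    respond_to do |format|
      format.html                                   # the normal schedule page
      format.xml { render xml: @schedule.events }   # the raw, reusable data
    end
  end
end
```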
The schedule data drives all the “What’s On” sections around the website. The underlying code, which was built by Shevaun Cocker, had to allow for news bulletins at the top of the hour as well as other complex scenarios. The test suite for just that module covered dozens of permutations because obviously this is not something that we wanted to ever break!
It is worth mentioning that all the old content from Matrix had to be moved to ELF. This was a big challenge because of the way stories were stitched together and presented in Matrix. There was no way to extract the content based on URL, but because the information architecture (IA) was consistent across the whole site it was possible to design a web crawler to do this work.
I wrote custom software that visited each page after hours (to avoid load issues) and saved the raw HTML to local files. It would then extract only the content portion of each page, automatically clean up any formatting issues, and add each item to the new CMS, associating it with the correct programme and date. In all, about 120,000 pages were migrated.
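As a flavour of the approach, a crawler of this kind can be sketched with Ruby and Nokogiri; the selectors and paths here are illustrative assumptions, not the code I actually used:

```ruby
# Hypothetical sketch of a content-extraction crawler; selectors and paths are assumptions.
require "net/http"
require "nokogiri"
require "fileutils"

CACHE_DIR = "raw_html"

def archive_page(url)
  FileUtils.mkdir_p(CACHE_DIR)
  html = Net::HTTP.get(URI(url))
  File.write(File.join(CACHE_DIR, url.gsub(%r{[/:.]+}, "_") + ".html"), html)
  html
end

def extract_story(html)
  doc = Nokogiri::HTML(html)
  {
    title: doc.at_css("h1")&.text&.strip,
    body:  doc.at_css("#content")&.inner_html,  # only the content portion of the page
  }
end

# Each extracted item would then be cleaned up and written into ELF,
# associated with the correct programme and date.
```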
In a blogpost written at the time, I shared some of the code used to do the work.
ELF had a battery of technical SEO features built into it. These ensured that the site could be easily indexed by Google, but also that Google would come back frequently to index news content. As at 2016 every page on the site had embedded schema.org markup to support search indexing and SEO, and this seems to be largely intact in August 2023.
These SEO features were added in 2014, and caused a large uptick in traffic coming to the site via search and Google News.
We also had a big focus on accessibility, so that the site worked well with screen reader technology, and for those with other disabilities. It is never possible to get it perfect, but our aim was never to give up working on this important area as time and resources allowed.
Performance was also always front of mind. When I left in 2016, it was the fastest loading media website in New Zealand by a considerable margin. I will cover some of this in a later post.
The RNZ search was, and probably still is, based on Apache Solr. This was a technology I was very familiar with, and the search I built on top of it could return a result for any query within a second, usually faster.
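A rough sketch of querying Solr from Ruby with the rsolr gem; the Solr URL, core and field names are assumptions, not RNZ’s actual index:

```ruby
# Hypothetical Solr query via the rsolr gem; URL, core and field names are assumptions.
require "rsolr"

solr = RSolr.connect(url: "http://localhost:8983/solr/elf")

response = solr.get("select", params: {
  q:    "christchurch earthquake",
  fq:   "type:news_story",        # restrict to a content type
  sort: "published_at desc",
  rows: 10
})

response["response"]["docs"].each do |doc|
  puts "#{doc['title']} (#{doc['url']})"
end
```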
Updates to the ELF CMS could be deployed at any time, without any downtime, and during heavy periods of development this could be 10-15 times a day. The deployment system was set up to do this even if the site was under heavy load. I once deployed a bug fix on the night of a New Zealand general election, with the site under 10 times normal load.
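With Passenger, a zero-downtime deploy can be as simple as updating the code and touching a restart file, after which workers are replaced gracefully. A minimal Capistrano-style sketch, not our actual deployment scripts:

```ruby
# Sketch of a Capistrano 3 restart task. Passenger reloads the application
# when tmp/restart.txt is touched, replacing workers without dropping requests.
namespace :deploy do
  desc "Restart application"
  task :restart do
    on roles(:app) do
      execute :touch, release_path.join("tmp", "restart.txt")
    end
  end

  after :publishing, :restart
end
```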
There is one question I have not answered yet. How many full-time developers did we have working on the ELF CMS? The answer is none. We used AbleTech for the initial design, and to do large blocks of feature work three or four times a year, and I did the rest of the work. This arrangement worked well, as I could focus on small pieces of work that fitted between my other duties, and I could schedule them to do the larger blocks of work. We did hire a full-time junior developer in 2015.
Having said how great a bespoke solution was for us, I should note that I set the bar very high when considering whether to depart from off-the-shelf solutions. I have seen many times the downsides of the build-your-own approach, and the regret of off-the-shelf-customised-beyond-recognition solutions. Spending some time really calculating the total cost of ownership is just one tool to avoid bad decision-making in this area.
ELF-marks
One innovation we added to the CMS was the concept of ELF-marks. These were special codes recognised by ELF as instructions to do something with the line of text that followed. They were used primarily by news editors to format text and reference content in the system without having to know any HTML. Also, the iNews system used by the news team had only limited HTML support. These are examples of some formatting codes:
[h] Heading
[b] Bold paragraph
[[text]] Italicize
An example of an instruction code is [s].
[s] Summary
In this case, the text after the [s] was extracted and used as the summary of the article, and the line was deleted from the story before it was added to the database. This was done to work around the lack of sufficient fields in iNews.
To add a half-sized image with its default caption:
[image:12345:half]
Which rendered like this:
There was also a code for audio content. This one-liner:
http://www.radionz.co.nz/national/programmes/morningreport/audio/201807917/grey-power-chapter-petitions-for-medical-marijuana “We want to grow it in our gardens” – Beverley Adridge on Morning Report
rendered like this:
There was a code for a pull-quote:
[pullquote] The whare is more to us than wood and nails. It’s an ancestral house. -Pare Sannyasi
That rendered like this:
The system was able to render external media such as Tweets, YouTube and Facebook video, and Instagram. For example this:
was converted to a YouTube player embedded in the page. These embeds are cached so that the text of the embed can still be displayed if the remote service is down, a benefit in the early days of Twitter.
ELF-marks removed the need for editors to cut and paste embed codes from other sites, avoiding errors, and it gave us control over how and when any third-party JavaScript was loaded. For example, the site would only load the Facebook code when there was Facebook video on a page, something that was done to limit leakage of visitors’ personal data whenever possible.
Most CMSs allow raw HTML, images or links to be added manually into pages. This is error-prone and everyone likes to have a play to make it look how they think it should. ELF-marks eliminated these problems as all HTML code for ELF-marks was rendered consistently and without error across the whole site.
One by-product of our approach was that links based on ELF-marks for audio or related stories would never break, because the system checked whether that content was still available. If the original audio or news headline was updated, that change was also reflected everywhere (unless overriding text was provided), because the mark is a reference to the original content object.
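As an illustration only (ELF’s real implementation also resolved references against live content objects, as described above), a mark processor of this kind can be a line-by-line substitution pass:

```ruby
# Illustrative ELF-mark processor; the real ELF code was more sophisticated
# and looked up referenced content (images, audio, stories) before rendering.
def render_elf_marks(text)
  text.each_line.map { |line| render_line(line.rstrip) }.join("\n")
end

def render_line(line)
  line = line.gsub(/\[\[(.+?)\]\]/, '<em>\1</em>')   # inline italics

  case line
  when /\A\[h\]\s*(.+)/          then "<h2>#{$1}</h2>"
  when /\A\[b\]\s*(.+)/          then "<p><strong>#{$1}</strong></p>"
  when /\A\[pullquote\]\s*(.+)/  then %(<blockquote class="pullquote">#{$1}</blockquote>)
  when /\A\[image:(\d+):(\w+)\]/ then render_image(id: $1.to_i, size: $2)
  else line
  end
end

def render_image(id:, size:)
  # In ELF this would load the image asset and render its markup and default
  # caption; here it is just a placeholder.
  %(<figure class="image #{size}" data-image-id="#{id}"></figure>)
end
```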
Having control over the rendered HTML is a huge advantage: it was possible to completely restyle all content during a redesign without having to touch any markup. The screenshots I included above are from an old blog post of mine. If you go to the site today, you won’t see this old audio player on any page; because the rendering is independent of the content, every player was updated automatically when the design changed. In August 2023 it looks like this:
ELF also has the ability to render a quote inside an audio player differently, but no one uses it.
Editors using the CMS directly could still use basic formatting in the WYSIWYG editor, as well as ELF-marks to insert other content.
There is a lot of extra functionality built into ELF that has never been used, and plenty that was on the roadmap for development, such as a fully integrated news editor, live blogging, and feature stories. Technical planning for these was partially complete when I left.
While we were building ELF I was blogging about it. RNZ were very generous in allowing me to do this, including posting code snippets under an MIT license, and screen shots of non-public parts of our systems.
If you are interested here are links to the 12 episodes of Rebuilding Radio NZ:
Part 4: Content Extraction & Recipes
Part 7: iPhone App Data (and an iPhone app)
Part 10: Going treeless and Modules
Site Speed
Once we had our own hosting and CMS, it was possible to start fine-tuning the performance of the site. It is well-known that faster loading sites are perceived as having higher credibility. Also, because this was a service funded by the public, I wanted it to work well with dial-up, so if we could make it fast on dial-up, it would be even better on fixed connections.
This probably seems an odd goal given quickly declining dial-up numbers, even in 2008. Despite that, a lot of people had slow ADSL, and 3G is not dissimilar to dial-up in terms of performance. If the RNZ site worked well for dialup—notwithstanding that it should work well for everyone, being publicly funded—it might also be preferred over other sites.
There was some other secret sauce applied to get the site to be so fast.
The site used a combination of staggered caching and triggered rebuilds in various places to ensure that frequently visited pages would be served faster. This is built in to Ruby on Rails, and only needed to be planned and coded.
For example, if only one story on the home page was updated, then only that story would be fetched from the database when the page was requested, the rest of the content would come from faster memory cache.
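In Rails terms this is fragment caching keyed on each record, so a republished story invalidates only its own fragment. A simplified sketch of a home page partial (names are illustrative):

```erb
<%# Each fragment's cache key includes the story's updated_at, so only edited stories are re-rendered. %>
<% @stories.each do |story| %>
  <% cache story do %>
    <article>
      <h2><%= link_to story.headline, story_path(story) %></h2>
      <p><%= story.summary %></p>
    </article>
  <% end %>
<% end %>
```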
The whole site also sat behind Varnish, an HTTP cache designed to maximise performance. Varnish was set to cache the home page for only 10 seconds, with a stale-while-revalidate time of five seconds. In Varnish this is called grace mode, and it ensures that visitors never see slow page rebuilds, or indeed any rebuilds at all. Longer times were set on less visited pages, or pages that got less frequent updates.
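In Varnish 3-style VCL, that policy can be expressed roughly like this; a sketch of the idea with assumed values for the quieter pages, not RNZ’s production configuration:

```vcl
# Sketch of the caching policy in Varnish 3-style VCL; not the production config.
sub vcl_recv {
    # Allow grace-mode delivery of a slightly stale object while it is refreshed.
    set req.grace = 5s;
}

sub vcl_fetch {
    if (req.url == "/") {
        set beresp.ttl   = 10s;   # home page is at most ~10 seconds old
        set beresp.grace = 5s;    # serve stale while a fresh copy is fetched
    } else {
        set beresp.ttl   = 5m;    # quieter pages can be cached for longer (assumed value)
        set beresp.grace = 30s;
    }
}
```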
Having a range of cache times across the site, combined with random visitation patterns, ensured that the site never ran into cold-cache-meltdown conditions. These meltdowns can occur when the cache time for all pages is set to the same value, and certain traffic conditions occur. These were well-understood at the time, so we tuned accordingly to avoid them.
The simple case of a radio host telling listeners to visit the site to see content related to the show was easily handled: the first visitor would cause the page to be built and served, the remainder of requests, sometimes in the tens of thousands, would be served out of Varnish. As long as requests continued, Varnish would make asynchronous requests back to ELF, based on the cache and grace mode settings, to refresh its copy of the page.
The difficult case of large amounts of traffic arriving more randomly, and then moving across pages in the site, was handled by differential cache times on different pages, and careful tuning of the caching in ELF.
This general approach meant that during the hours New Zealanders were awake, the home page and content for ‘today’ was always served from a fast cache, and was never more than (at worst) 15 seconds old.
We operated two Varnish caches, one in Auckland and one in Wellington. These caches were connected via dark fibre directly to the FX networks backbone. This backbone was, in turn, connected to every other major ISP in as many places as possible.
The settings on the Varnish cache in Wellington had headers that said it was powered-by “One small piece of fairy cake”, with the server-type set as “The Total Perspective Vortex”. The Auckland cache was powered-by “The Infinite Improbability Drive”.
We were solving a problem in a way that is now used by modern CDNs; the difference was that the diversity of the connections between RNZ’s Varnish servers and all other NZ ISPs far exceeded that of any CDN at the time.
In 2005 all content delivery networks had only one node in New Zealand, in Auckland, and none of them were optimally connected for local speed across the whole country. They did save bandwidth for anyone using them, and avoided having to run the required infrastructure, but they were typically less performant than the approach we took. Yes, our approach does not scale, but it was good enough, at least until 2016/17.
In 2023 many CDNs now have more than one node in NZ—Fastly, for example, has three, one each in Auckland, Wellington, and Christchurch—which is an acknowledgement from the industry of the need for greater local diversity.
There are many optimisations that can be used with modern CDNs that can make a huge difference in performance, but many sites choose to just run the basic out-of-the-box settings.
Our approach meant that most people in New Zealand who requested a page would see at least the text and images in less than 700 milliseconds. Our target was to achieve a Speed Index (SI) of 1.5 seconds, which is considered very quick, and we were always ahead of most other media websites, both globally and locally.
To put that into context, in August 2023 RNZ’s SI is 1.6 seconds. By way of comparison, NZ Herald is 9.5 seconds and Stuff is 5.6 seconds, so RNZ still has a significant advantage. I have to acknowledge the impressive work done by the team at RNZ to maintain this edge.
The ELF Rails application was running under Phusion Passenger (Pro edition) with nginx, a robust and highly performant combination that also allows seamless updates to code on the live site.
The RNZ site was (and still is) so fast that it is quicker to browse the website via the iPhone’s web browser than to use the native app that was released in 2017!
Development Process
Some companies will have a separate product strategy for their website, and may even have a separate team working within the IT department. This is a mistake. A major choice we made was to have an integrated strategy and a cross-functional team working on CMS problems alongside the staff using the system.
My team would also talk to public users of the site on a weekly basis.
This ensured a closed-loop connection between those using the site, those creating the content, and those making the features to support the story-telling. In this model we were focussed on creating solutions to solve the problems of our colleagues (and the public), and this was done collaboratively.
Anyone could report a problem that needed solving, and we would work on finding a solution. One process I used a lot (and still use) was story-boarding a proposed solution on paper. This is a fast way to prototype, and has the advantage that you can quickly redraw any frame in the story and get immediate feedback, iterating until something workable is arrived at. The paper-based approach also has the benefit of being rough enough that it does not create expectations in the way that computer-drawn mockups can. The ELF CMS allowed us to create working prototypes too, for testing on real content.
The time-to-live for a small feature was typically a day or two. Larger pieces of work might take a week, but generally things would be broken into small pieces and released incrementally.
In the next post I will cover choices we made for audio, programmes and schedules.