Tuesday, April 5, 2022

Design of This Website

  • Gitit wiki: I preferred to edit files in Emacs⁠/​Bash rather than a GUI/​browser-based wiki.

    A Pandoc-based wiki using Darcs as its history mechanism, serving mostly as a demo; the requirement that ‘one page edit = one Darcs revision’ quickly became stifling, and I began editing my Markdown files directly, recording patches at the end of each day, and syncing the HTML cache with my host (at the time, a personal directory on code.haskell.org). Eventually I got tired of that and figured that since I wasn’t using the wiki, but only the static compiled pages, I might as well switch to Hakyll and a normal static website approach.

  • jQuery sausages: unhelpful UI visualization of section lengths.

    A UI experiment, ‘sausages’ add a second scroll bar in which vertical lozenges correspond to each top-level section of the page, indicating to the reader how long each section is and where they are. (They look like a long link of pale white sausages.) I thought it might assist the reader in positioning themselves, like the popular ‘floating highlighted Table of Contents’ UI element, but without text labels, the sausages were meaningless. After a jQuery upgrade broke it, I didn’t bother fixing it.

  • Beeline Reader: a ‘reading aid’ which just annoyed readers.

    BLR tries to aid reading by coloring the beginnings & endings of lines to indicate the continuation and make it easier for the reader’s eyes to saccade to the correct next line without distraction (apparently dyslexic readers in particular have trouble correctly fixating on the continuation of a line). The A/B test indicated no improvements in the time-on-page metric, and I received many complaints about it; I was not too happy with the browser performance or the appearance of it, either.

    I’m sympathetic to the goal and think syntax highlighting aids are underused, but BLR was a bit half-baked and not worth the cost compared to more straightforward interventions like reducing paragraph lengths or more rigorous use of ‘structural reading’ formatting. (We may be able to do typography differently in the future with new technology, like VR/AR headsets which come with eye tracking intended for foveated rendering—forget simple tricks like emphasizing the beginning of the next line as the reader reaches the end of the current line, do we need ‘lines’ at all if we can do things like just-in-time display the next piece of text in-place to create an ‘infinite line’?)

  • Google CSE: site search feature which too few people used.

    A ‘custom search engine’, a CSE is a souped-up site:gwern.net/ Google search query; I wrote one covering Gwern.net and some of my accounts on other websites, and added it to the sidebar. Checking the analytics, perhaps 1 in 227 page-views used the CSE, and a decent number of them used it only by accident (eg. searching “e”); an A/B test for a feature used so little would be statistically powerless, and so I removed it rather than try to formally test it.

  • Tufte-CSS Sidenotes: fundamentally broken, and superseded.

    An early admirer of Tufte-CSS for its sidenotes, I gave a Pandoc plugin a try only to discover a terrible drawback: the CSS didn’t support block elements & so the plugin simply deleted them. This bug apparently can be fixed, but the density of footnotes led to using sidenotes.js instead.

  • DjVu document format use: DjVu is a space-efficient document format with the fatal drawback that Google ignores it, and “if it’s not in Google, it doesn’t exist.”

    DjVu is a document format superior to PDFs, especially standard PDFs: I discovered that space savings of 5× or more were entirely possible, so I used it for most of my book scans. It worked fine in my document viewers, Internet Archive & Libgen preferred them, and so why not? Until one day I wondered if anyone was linking them and tried searching in Google Scholar for some. Not a single hit! (As it happens, GS seems to specifically filter out books.) Perplexed, I tried Google—also nothing. Huh‽ My scans have been visible for years, DjVu dates to the 1990s and was widely used (if not remotely as popular as PDF), and G/​GS picks up all my PDFs which are hosted identically. What about filetype:djvu? I discovered to my horror that on the entire Internet, Google indexed about 50 DjVu files. Total. While apparently at one time Google did index DjVu files, that time must be long past.

    Loath to take the space hit, which would noticeably increase my Amazon AWS S3 hosting costs, I looked into PDFs more carefully. I discovered PDF technology had advanced considerably over the default PDFs that gscan2pdf generates, and with JBIG2 compression, they were closer to DjVu in size; I could conveniently generate such PDFs using ocrmypdf (sketched below). This let me convert over at moderate cost, and now my documents do show up in Google.
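
    A minimal sketch of how such a batch conversion might be scripted. The ocrmypdf flags are my recollection (--skip-text, --optimize 3, with jbig2enc installed for JBIG2 compression) and should be checked against `ocrmypdf --help`; the directory layout is invented for illustration.

```typescript
// Sketch: batch-convert scanned PDFs with ocrmypdf's JBIG2-capable optimizer.
// Assumes ocrmypdf (and jbig2enc, for JBIG2 compression) are installed locally;
// flag names are from memory and should be verified against `ocrmypdf --help`.
import { execFileSync } from "node:child_process";
import { readdirSync } from "node:fs";
import { join } from "node:path";

function optimizeScans(dir: string): void {
  for (const name of readdirSync(dir).filter(f => f.endsWith(".pdf"))) {
    const input = join(dir, name);
    const output = join(dir, name.replace(/\.pdf$/, "-optimized.pdf"));
    // --skip-text: don't re-OCR pages that already have a text layer;
    // --optimize 3: most aggressive (lossy) size optimization.
    execFileSync("ocrmypdf", ["--skip-text", "--optimize", "3", input, output]);
  }
}

optimizeScans("docs/"); // hypothetical directory of book scans
```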

  • Darcs/​Github repo: no useful contributions or patches submitted, added considerable process overhead, and I accidentally broke the repo by checking in too-large PDFs from a failed post-DjVu optimization pass (I misread the result as being smaller, when it was much larger).

  • Spaces in URLs: an OK idea but users are why we can’t have nice things.

    Gitit assumed ‘titles = filenames = URLs’, which simplified things, and I liked space-separated filenames; I carried this over to Hakyll, but gradually, by monitoring analytics, I realized this was a terrible mistake—as straightforward as URL-encoding spaces as %20 may seem, no one can do it properly. I didn’t want to fix it because by the time I realized how bad the problem was, it would have required breaking (or later on, redirecting) hundreds of URLs and updating all my pages. The final straw was when The Browser linked a page incorrectly, sending ~1500 people to the 404 page. I finally gave in and replaced spaces with hyphens. (Underscores are the other main option, but because Markdown uses underscores for emphasis, I worry that would trade one error for another.) I suspect I should have also lower-cased all links while I was at it, but thus far it has not proven too hard to fix case errors & lower-case URLs are ugly.

    In retrospect, Sam Hughes was right: I should have made URLs as simple as possible (and then a bit simpler): a single word, lowercase alphanum, with no hyphens or underscores or spaces or punctuation of any sort. I am, however, locked in to longer hyphen-separated mixed-case URLs now.
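
    A sketch of the kind of slug normalization in question (hyphens for spaces, mixed case retained); the lowercasing variant shows roughly what the simpler Sam-Hughes-style URL would look like. This is a hypothetical helper for illustration, not the site’s actual code.

```typescript
// Sketch: normalize a page title into a URL slug. Spaces become hyphens so
// nothing ever needs to be %20-escaped; the lowercasing step is shown but is
// the one I never adopted.
function titleToSlug(title: string, lowercase = false): string {
  let slug = title
    .trim()
    .replace(/\s+/g, "-")       // spaces → hyphens
    .replace(/[^\w\-.]/g, "");  // drop punctuation that invites mis-linking
  if (lowercase) slug = slug.toLowerCase();
  return slug;
}

console.log(titleToSlug("Design Of This Website"));        // "Design-Of-This-Website"
console.log(titleToSlug("Design Of This Website", true));  // "design-of-this-website"
```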

  • AdSense banner ads (and ads in general): reader-hostile and probably a net financial loss.

    I hated running banner ads, but before my Patreon began working, it seemed the lesser of two evils. As my finances became less parlous, I became curious as to how much lesser—but I could find no Internet research whatsoever measuring something as basic as the traffic loss due to advertising! So I decided to run an A/B test myself, with a proper sample size and cost-benefit analysis; the harm turned out to be so large that the analysis was unnecessary, and I removed AdSense permanently the first time I saw the results. Given the measured traffic reduction, I was probably losing several times more in potential donations than I ever earned from the ads. (Amazon affiliate links appear not to trigger this reaction, and so I’ve left them alone.)

  • Bitcoin/​PayPal⁠/​Gittip/​Flattr donation links: never worked well compared to Patreon.

    These methods were either single-shot or never hit a critical mass. One-off donations failed because donating manually was too inconvenient for people to ever make a habit of it. Gittip/Flattr resembled Patreon in bundling donors and making donations a regular thing, but never reached an adequate scale.

  • Google Fonts web fonts: slow and buggy.

    Google Fonts turned out to introduce noticeable latency in page rendering; further, its selection of fonts is limited, and the fonts outdated or incomplete. We got both faster and nicer-looking pages by taking the master Github versions of Adobe Source Serif/​Sans Pro (the Google Fonts version was both outdated & incomplete then) and subsetting them for Gwern.net specifically.

  • MathJax JS: switched to static rendering during compilation for speed.

    For math rendering, MathJax and KaTeX are reasonable options (inasmuch as MathML browser adoption is dead in the water). MathJax rendering is extremely slow on some pages: up to 6 seconds to load and render all the math. Not a great reading experience. When I learned that it was possible to preprocess MathJax-using pages, I dropped MathJax JS use the same day.
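
    The compile-time idea is simple; here is a minimal sketch using KaTeX’s renderToString, swapped in purely for illustration because its Node API is a one-liner (the actual pipeline pre-renders MathJax): the TeX is converted to HTML once, at build time, instead of in every reader’s browser. The $$…$$ placeholder convention here is an assumption.

```typescript
// Sketch: render TeX to HTML at build time instead of shipping a math JS runtime.
// Illustrated with KaTeX's renderToString; the real pipeline pre-renders MathJax,
// but the compile-time principle is identical.
import katex from "katex";

function renderMathPlaceholders(html: string): string {
  // Assume display math was left in the compiled HTML as $$...$$ placeholders.
  return html.replace(/\$\$([\s\S]+?)\$\$/g, (_match, tex: string) =>
    katex.renderToString(tex, { displayMode: true, throwOnError: false })
  );
}

console.log(renderMathPlaceholders("The identity $$e^{i\\pi} + 1 = 0$$ needs no JS."));
```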

  • <q> quote tags for English syntax highlighting: divisive and a maintenance burden.

    I like the idea of treating English as a little (not a lot!) more like a formal language, such as a programming language, as it comes with benefits like syntax highlighting. In a program, the reader gets guidance from syntax highlighting indicating logical nesting and structure of the ‘argument’; in a natural language document, it’s one damn letter after another, spiced up with the occasional punctuation mark or indentation. (If Lisp looks like “oatmeal with fingernail clippings mixed in” due to the lack of “syntactic sugar”, then English must be plain oatmeal!) One of the most basic kinds of syntax highlighting is simply highlighting strings and other literals vs code: I learned early on that syntax highlighting was worth it just to make sure you hadn’t forgotten a quote or parenthesis somewhere! The same is true of regular writing: if you are extensively quoting or naming things, the reader can get a bit lost in the thickets of curly quotes and be unsure who said what.

    I discovered an obscure HTML tag enabled by an obscurer Pandoc setting: the quote tag <q>, which replaces quote characters and is rendered by the browser as quotes (usually). Quote tags are parsed explicitly, rather than being opaque natural-language text blobs, and so they, at least, can be manipulated easily by JS/CSS and syntax-highlighted. Anything inside a pair of quotes would be tinted gray to visually set it off, similarly to the blockquotes. I was proud of this tweak, which I’ve seen nowhere else.

    The problems with it were that not everyone was a fan (to say the least); it was not always correct (there are many double-quotes which are not literal quotes of anything, like rhetorical questions); and it interacted badly with everything else. There were puzzling drawbacks: eg. web browsers delete them from copy-paste, so we had to use JS to convert them to normal quotes (sketched below). Even when that was worked out, all the HTML/CSS/JS had to be constantly rejiggered to deal with interactions with them, browser updates would silently break what was working, and Said Achmiz hated the look. I tried manually annotating quotes to ensure they were all correct and not used in dangerous ways, but even with interactive regexp search-and-replace to assist, the manual toil of constantly marking up quotes was a major obstacle to writing. So I gave in. It was not meant to be.
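
    For the record, the copy-paste workaround looked roughly like this—a sketch reconstructed from the description, not the actual production code: intercept the copy event and swap each <q> element for literal curly quotes before the text reaches the clipboard.

```typescript
// Sketch: browsers drop <q> elements' implied quote marks when copying,
// so rewrite the copied fragment to use literal curly quotes instead.
document.addEventListener("copy", (event: ClipboardEvent) => {
  const selection = window.getSelection();
  const clipboard = event.clipboardData;
  if (!selection || selection.rangeCount === 0 || !clipboard) return;

  const fragment = selection.getRangeAt(0).cloneContents();
  const container = document.createElement("div");
  container.appendChild(fragment);

  // Replace every <q>…</q> with “…” in the copied fragment.
  container.querySelectorAll("q").forEach(q => {
    q.replaceWith(document.createTextNode(`\u201C${q.textContent ?? ""}\u201D`));
  });

  clipboard.setData("text/plain", container.textContent ?? "");
  clipboard.setData("text/html", container.innerHTML);
  event.preventDefault(); // use our rewritten clipboard contents
});
```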

  • Typographic rubrication: a solution in search of a problem.

    Red emphasis is a visual strategy that works wonderfully well for many styles, but not, so far as I could find, for Gwern.net. Using it on the regular website resulted in too much emphasis, and the lack of color anywhere else made the design inconsistent; we tried using it in dark mode to add some color & preserve night vision by making headers/links/drop-caps red, but it looked like, as one reader put it, “a vampire fansite”. It is a good idea, but we just haven’t found a use for it. (Perhaps if I ever make another website, it will be designed around rubrication.)

  • wikipedia-popups.js: a JS library written to imitate Wikipedia popups, which used the WP API to fetch article summaries; obsoleted by the faster & more general local static link annotations.

    I disliked the delay, and as I thought about it, it occurred to me that it would be nice to have popups for other websites, like Arxiv/BioRxiv links—but they didn’t have APIs which could be queried. If I fixed the first problem by fetching WP article summaries while compiling articles and inlining them into the page (sketched below), then there was no reason to include summaries for only Wikipedia links: I could get summaries from any tool or service or API, and I could of course write my own! But that required an almost complete rewrite, to turn it into popups.js.
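
    A minimal sketch of the compile-time summary fetch for the Wikipedia step only, assuming the Wikipedia REST summary endpoint; the real pipeline runs inside the site’s Haskell build and grew to cover sources well beyond Wikipedia.

```typescript
// Sketch: fetch a Wikipedia article summary at compile time so it can be
// inlined into the page as a static link annotation (no API call at read time).
interface WikipediaSummary {
  title: string;
  extract: string; // plain-text summary of the article
}

async function fetchWikipediaSummary(articleTitle: string): Promise<string> {
  const url =
    "https://en.wikipedia.org/api/rest_v1/page/summary/" +
    encodeURIComponent(articleTitle);
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Wikipedia API returned ${response.status}`);
  const summary = (await response.json()) as WikipediaSummary;
  return summary.extract;
}

fetchWikipediaSummary("Darcs").then(extract => console.log(extract));
```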

  • Link screenshot previews: automatic screenshots too low-quality, and unpopular.

    To compensate for the lack of summaries for almost all links (even after I wrote the code to scrape various sites), I tried a feature I had seen elsewhere of ‘link previews’: small thumbnail-sized screenshots of a web page or PDF, loaded using JS when the mouse hovered over a link. (They were much too large, ~50kb, to inline statically like the link annotations.) They gave some indication of what the target content was, and could be generated automatically using a headless browser. I used Chromium’s built-in screenshot mode for web pages, and took the first page of PDFs.

    The PDFs worked fine, but the webpages often broke: thanks to ads, newsletters, and the GDPR, countless webpages will pop up some sort of giant modal blocking any view of the page content, defeating the point. (I have extensions installed like AlwaysKillSticky to block that sort of spam, but Chrome screenshot cannot use any extensions or customized settings, and the Chrome devs refuse to improve it.) Even when it did work and produced a reasonable screenshot, many readers disliked it anyway and complained. I wasn’t too happy either about having 10,000 tiny PNGs hanging around. So as I expanded link annotations steadily, I finally pulled the plug on the link previews. Too much for too little.
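
    The screenshots themselves were trivial to generate; a sketch of the headless-Chromium invocation, with flag names from memory rather than the original scripts—and, as noted above, this mode runs without any extensions or profile, which is exactly the problem.

```typescript
// Sketch: take a link-preview thumbnail with headless Chromium's built-in
// screenshot mode. Note this mode ignores extensions and user settings,
// which is why sticky modals ruin so many of the shots.
import { execFileSync } from "node:child_process";

function screenshotUrl(url: string, outputPng: string): void {
  execFileSync("chromium", [
    "--headless",
    "--disable-gpu",
    `--screenshot=${outputPng}`,
    "--window-size=800,600",
    url,
  ]);
}

screenshotUrl("https://example.com", "previews/example.png"); // hypothetical paths
```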

    • Link Archiving: my link archiving improved on the link screenshots in several ways. First, SingleFile saves pages inside a normal Chromium browsing instance, which does support extensions and user settings. Killing stickies alone eliminates half the bad archives, ad-block extensions eliminate a chunk more, and NoScript blacklists specific domains. (I initially used NoScript on a whitelist basis, but disabling JS breaks too many websites these days.) Finally, I decided to manually review every snapshot before it went live, to catch bad examples and either fix them by hand or add them to the blacklist.

  • Auto-dark mode: a good idea but “users are why we can’t have nice things”.

    OSes/browsers have defined a ‘global dark mode’ toggle the user can set if they want dark mode everywhere, and this preference is available to a web page; if you are implementing a dark mode for your website, it then seems natural to just make it a feature and turn it on iff the toggle is on. There is no need for complicated UI-cluttering widgets with complicated implementations. And yet—if you do do that, users will regularly complain about the website acting bizarre or being dark in the daytime, having apparently forgotten that they enabled it (or never having understood what that setting meant).

    A widget is necessary to give readers control, although even there it can be screwed up: many websites settle for a simple negation switch of the global toggle, but if you do that, someone who sets dark mode during the day will be exposed to blinding white at night… Our widget (sketched below) works better than that. Mostly.
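
    The widget’s logic is essentially a three-way preference layered over the OS toggle—a minimal sketch, assuming an ‘auto’/‘light’/‘dark’ setting stored in localStorage and a `dark-mode` class on the root element; the real widget has considerably more edge-case handling.

```typescript
// Sketch: three-way dark-mode control ("auto" defers to the OS toggle,
// "light"/"dark" override it), persisted in localStorage.
type ModePreference = "auto" | "light" | "dark";

const darkQuery = window.matchMedia("(prefers-color-scheme: dark)");

function applyMode(): void {
  const pref = (localStorage.getItem("mode") ?? "auto") as ModePreference;
  const dark = pref === "dark" || (pref === "auto" && darkQuery.matches);
  document.documentElement.classList.toggle("dark-mode", dark);
}

function setModePreference(pref: ModePreference): void {
  localStorage.setItem("mode", pref);
  applyMode();
}

// Re-apply when the OS setting changes (only matters while the preference is "auto").
darkQuery.addEventListener("change", applyMode);
applyMode();
```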

  • Multi-column footnotes: mysteriously buggy and yielding overlaps.

    Since most footnotes are short, and no one reads the endnote section, I thought rendering them as two columns, as many papers do, would be more space-efficient and tidy. It was a good idea, but it didn’t work.

  • Hyphenopoly: it turned out to be more efficient (and not much harder to implement) to hyphenate the HTML during compilation than to run JS clientside.

    To work around Google Chrome’s 2-decade-long refusal to ship hyphenation dictionaries on desktop and enable justified text (and incidentally use the better TeX hyphenation algorithm), the JS library Hyphenopoly will download the TeX English dictionary and typeset a webpage itself. While the performance cost was surprisingly minimal, it was there, and it caused problems with obscurer browsers like Internet Explorer.

    So we scrapped Hyphenopoly, and I later implemented a Hakyll function using a Haskell version of the TeX hyphenation algorithm & dictionary to insert, at compile time, a ‘soft hyphen’ everywhere a browser could usefully break a word (sketched below). This enables Chrome to hyphenate correctly, at the moderate cost of the extra inlined characters and a few edge cases.
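
    The compile-time pass boils down to inserting U+00AD at legal break points. A minimal TypeScript sketch of the mechanism only—the real pass is the Haskell/Hakyll function described above, driven by the actual Knuth-Liang patterns; the `hyphenationPoints` helper here is a hypothetical stand-in for a pattern-based hyphenator.

```typescript
// Sketch: insert soft hyphens (U+00AD) so browsers without hyphenation
// dictionaries can still break long words when justifying text.
const SOFT_HYPHEN = "\u00AD";

function hyphenationPoints(word: string): number[] {
  // Hypothetical stand-in: a real implementation consults Knuth-Liang patterns.
  // Here, crudely split long words near the middle just to show the mechanism.
  return word.length > 10 ? [Math.floor(word.length / 2)] : [];
}

function softHyphenate(text: string): string {
  return text.replace(/[A-Za-z]{11,}/g, word => {
    let result = word;
    // Insert from the rightmost point to the leftmost so indices remain valid.
    for (const i of hyphenationPoints(word).sort((a, b) => b - a)) {
      result = result.slice(0, i) + SOFT_HYPHEN + result.slice(i);
    }
    return result;
  });
}

console.log(softHyphenate("Supercalifragilisticexpialidocious hyphenation"));
```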

    Desktop Chrome finally shipped hyphen support in early 2020, and I removed the soft-hyphen hyphenation pass in April 2021 when CanIUse indicated >96% global support.

    • Knuth-Plass Linebreaking: not to be confused with the Knuth-Liang hyphenation discussed above, which merely determines the set of legal hyphenation points; Knuth-Plass linebreaking tries to optimize the actual chosen linebreaks.

      Particularly on narrow screens, justified text does not fit well and must be distorted to fit, by microtypographic techniques like inserting spaces between/within words or changing glyph size. The default linebreaking that web browsers use is a bad one: it is a greedy algorithm, which produces many unnecessarily poor layouts, with stretched-out words and blatant rivers. This bad layout gets worse the narrower the text, and so in Gwern.net lists on mobile, there are a lot of bad-looking list items when fully justified with greedy layout.

      Knuth-Plass instead looks at paragraphs as a whole, and calculates every possible layout to pick the best one. As can be seen in any TeX output, the results are much better. Knuth-Plass (or its competitors) would solve the justified mobile layout problem.

      Unfortunately, no browser implements any such algorithm (aside from a brief period where Internet Explorer, of all browsers, apparently did); there is a property in CSS4⁠, text-wrap: pretty, which might someday be implemented somehow by some browsers and be Knuth-Plass, but no one has any idea when or how. And doing it in JavaScript is not an option, because the available JS prototypes fail on Gwern.net pages. (Bramstein’s typeset explicitly excludes lists and blockquotes, Bramstein commenting in 2014 that “This is mostly a tech-demo, not something that should be used in production. I’m still hopeful browser will implement this functionality natively at some point.” Knight’s tex-linebreak suffers from fatal bugs too. Matthew Petroff has a demo which uses the brilliantly stupid brute-force approach of precalculating offline the Knuth-Plass linebreaks for every possible width—after all, monitor widths can only range ~1–4000px, so it’s a small finite number of possibilities—but it’s unclear, to say the least, how I’d ever use such a thing for Gwern.net, and doubtless has serious bugs of its own.) There are also questions about whether the performance on long pages would be acceptable, as the JS libraries rely on inserting & manipulating a lot of elements in order to force the browser to break where it should break, but those are largely moot when the prototypes are so radically incomplete. (My prediction is that the cost would be acceptable with some optimizations and constraints like considering a maximum of n lines.)

      So the linebreaking situation is insoluble for the foreseeable future. We decided to disable full justification on narrow screens, and settle for ragged-right.

  • Autopager keyboard shortcuts: binding Home/PgUp & End/PgDwn keyboard shortcuts to go to the ‘previous’/‘next’ logical page turned out to be glitchy & confusing.

    HTML supports prev/next link relations (rel="prev"/rel="next") which specify what URL is the logical next or previous one, which makes sense in many contexts like manuals, webcomics/webserials, or series of essays; browsers make little use of this metadata—typically not even to preload the next page! (Opera apparently was one of the few exceptions.)

    Such metadata was typically available in older hypertext systems by default, and so older, more reader-oriented interfaces like pre-Web hypertext readers such as info browsers frequently overloaded the standard page-up/down keybindings so that, if one was already at the beginning/ending of a hypertext node, they went to the logical previous/next node. This was convenient, since it made paging through a long series of info nodes fast, almost as if the entire info manual were a single long page, and it was easy to discover: most users will accidentally tap the keys twice at some point, either reflexively or by not realizing they were already at the top/bottom (as is the case on most info nodes, due to their egregious shortness). In comparison, navigating the HTML version of an info manual is frustrating: not only do you have to use the mouse to page through potentially dozens of 1-paragraph pages, each page takes noticeable time to load (because of the failure to exploit preloading), whereas a local info browser is instantaneous.

    After defining a global sequence for Gwern.net pages, and adding a ‘navbar’ to the bottom of each page with previous/next HTML links encoding that sequence, I thought it’d be nice to support continuous scrolling through Gwern.net, and wrote some JS to detect whether the reader was at the top/bottom of the page and, on each Home/PgUp/End/PgDwn, whether that key had been pressed in the previous 0.5s—and if so, proceed to the previous/next page.

    This worked, but proved buggy and opaque in practice, and tripped up even me occasionally. Since so few people know about that pre-WWW hypertext UI pattern (useful as it is), and those who don’t would be unlikely to discover it or use it much if they did, I removed it.
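
    For reference, a sketch of the mechanism described above: a second Home/PgUp (or End/PgDwn) press within half a second, while already at the top (or bottom) of the page, follows the page’s prev (or next) link. The selector and timing are reconstructed from the description, not taken from the original code.

```typescript
// Sketch: double-press PgUp/Home at the top of the page to go to the logical
// previous page, or PgDn/End at the bottom to go to the next one, using the
// page's <link rel="prev">/<link rel="next"> metadata.
const DOUBLE_PRESS_WINDOW_MS = 500;
let lastPress: { key: string; time: number } | null = null;

function atTop(): boolean {
  return window.scrollY === 0;
}
function atBottom(): boolean {
  return window.innerHeight + window.scrollY >= document.documentElement.scrollHeight - 1;
}

document.addEventListener("keydown", (event: KeyboardEvent) => {
  const prevKeys = ["PageUp", "Home"];
  const nextKeys = ["PageDown", "End"];
  if (![...prevKeys, ...nextKeys].includes(event.key)) return;

  const goingBack = prevKeys.includes(event.key);
  const atEdge = goingBack ? atTop() : atBottom();
  const now = Date.now();
  const doublePress =
    lastPress !== null &&
    lastPress.key === event.key &&
    now - lastPress.time < DOUBLE_PRESS_WINDOW_MS;
  lastPress = { key: event.key, time: now };

  if (!atEdge || !doublePress) return;
  const link = document.querySelector<HTMLLinkElement>(
    goingBack ? 'link[rel="prev"]' : 'link[rel="next"]'
  );
  if (link) window.location.href = link.href;
});
```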


