Sunday, March 28, 2021

Kanji Club: Search Kanji by Parts with Instant Feedback

Last month I released Kanji Club, a kanji search site with a unique interface. It scratches an itch I'd had for a long time and the response on Twitter was encouraging.

Every so often I'll come across a kanji where I don't know how to read the whole thing, but I can read individual parts of it. While radical search sites are pretty common, it's often easier to punch the parts into a generic search engine than to work out which radicals make up the kanji. Still, that process is somewhat hit-or-miss; Kanji Club takes the uncertainty out of it and gives me the answer I'm looking for directly.

If you speak Japanese I encourage you to check out the site; any feedback would be appreciated. In the rest of this article I'm going to go over some of the technical details and design decisions.

Photo by Yifeng Lu on Unsplash

Parts vs. Radicals

When describing Kanji Club I'm careful to use the term "parts" or 部品 rather than "radicals". Radicals, or 部首, are somewhat arbitrarily defined components of kanji with a long history. While there are exceptions, one key property of radicals is that they can't be further broken down - so 木 is a radical, but 夏 isn't, because technically it's made up of 一夂目自 (according to Kanjidic2, see jisho.org). Determining which radicals make up a kanji isn't usually very hard, but you do have to think about it a bit.

In contrast, it's often obvious that a kanji is a juxtaposition of two other kanji. For example, 榎 (enoki, a kind of tree) is obviously a combination of 木 and 夏, but most dictionaries don't allow you to search for 榎 by entering 木 and 夏. Luckily there are a few freely available databases that collect this kind of compositional information; for Kanji Club I use data from Wikimedia Commons.
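To make the idea of compositional data concrete, here is a minimal sketch of how such data could be represented and expanded. The `COMPOSITION` table and the `all_parts` helper are hypothetical, not the site's actual format; the entry for 夏 simply reuses the component list quoted above for illustration.

```python
# Hypothetical composition table: each kanji maps to its immediate parts.
# This is illustrative data, not the format of the Wikimedia Commons dataset.
COMPOSITION = {
    "榎": ["木", "夏"],
    "夏": ["一", "夂", "目", "自"],
    "林": ["木", "木"],
}

def all_parts(kanji, composition=COMPOSITION):
    """Return every part of `kanji`, following compositions recursively."""
    found = set()
    stack = list(composition.get(kanji, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.add(part)
            # A part may itself be decomposable (夏 inside 榎).
            stack.extend(composition.get(part, []))
    return found
```

With a table like this, 榎 can be found from the large, easy-to-recognize part 夏 as well as from any of the smaller components inside it.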

Another advantage of searching by parts rather than radicals is that it's easy to type normal characters, but harder to type radicals. Radical search is usually implemented by having you select radicals from a menu. Even if it's possible to type radicals, it's necessary to remember their often arbitrary names and dig through possible conversions for them. This means that if you search by characters instead of radicals you can just use the keyboard for input, which I find to be much faster and more convenient.
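As a rough illustration of keyboard-driven search by parts, the sketch below builds an inverted index from parts to kanji and intersects the candidates for each typed character. The data and the names `PARTS`, `build_index`, and `search` are assumptions made for this example; the real site runs in the browser and may work quite differently.

```python
from collections import defaultdict

# Hypothetical sample data: kanji -> the parts it contains.
PARTS = {
    "榎": {"木", "夏"},
    "林": {"木"},
    "校": {"木", "交"},
    "嗄": {"口", "夏"},
}

def build_index(parts_by_kanji):
    """Invert the mapping: part -> set of kanji containing that part."""
    index = defaultdict(set)
    for kanji, parts in parts_by_kanji.items():
        for part in parts:
            index[part].add(kanji)
    return index

def search(query, index):
    """Return the kanji that contain every character typed in `query`."""
    typed = [ch for ch in query if not ch.isspace()]
    if not typed:
        return []
    candidate_sets = [index.get(ch, set()) for ch in typed]
    return sorted(set.intersection(*candidate_sets))

INDEX = build_index(PARTS)
```

Typing 木 alone leaves several candidates, and adding 夏 narrows the result to 榎. Because each keystroke is just a set intersection, results can be refreshed on every character typed.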

Technical Details

Kanji Club is a static site. The search is powered by a single JSON file, roughly 2MB in size, and all search happens in the browser. The pages for the individual kanji are rendered ahead of time from a JSON file with slightly more information than the one used for search.
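One way to picture the two-file setup is a build step that derives both payloads from the same records. This is only a sketch under assumed names: the record fields, `search_payload`, and `detail_payload` are hypothetical and not taken from the site.

```python
import json

# Hypothetical full records; the field names are illustrative only.
RECORDS = [
    {"kanji": "榎", "parts": ["木", "夏"], "meaning": "enoki (a kind of tree)"},
    {"kanji": "林", "parts": ["木", "木"], "meaning": "grove"},
]

def search_payload(records):
    """Compact JSON for the browser: only the fields search needs."""
    slim = {r["kanji"]: "".join(r["parts"]) for r in records}
    # ensure_ascii=False keeps kanji as UTF-8 instead of \uXXXX escapes,
    # and tight separators drop whitespace; both reduce file size.
    return json.dumps(slim, ensure_ascii=False, separators=(",", ":"))

def detail_payload(records):
    """Richer JSON used when rendering the static per-kanji pages."""
    return json.dumps({r["kanji"]: r for r in records}, ensure_ascii=False)
```

Keeping the search file slim matters because every visitor downloads it, while the detail file is only read at build time.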

This design means I don't have to worry about the server going down, and even when traffic peaked over the first few days after release there were no ill effects of any kind. The only issue I had was a misunderstanding over how gzip settings work in nginx - fixing them so the search JSON was compressed drastically improved load times and reduced bandwidth usage.
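To get a feel for why compression matters for a file like this, the following sketch measures how well gzip shrinks a repetitive JSON payload. The sample data is synthetic and hypothetical, so the ratio it gives says nothing precise about the real search file. As background, nginx compresses only `text/html` responses by default, and other MIME types such as `application/json` have to be listed in its `gzip_types` directive; whether that was the misunderstanding here is not stated.

```python
import gzip
import json

def gzip_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)

# Hypothetical stand-in for the search file: many small, similar entries.
sample = json.dumps(
    {chr(0x4E00 + i): ["木", "夏"] for i in range(5000)},
    ensure_ascii=False,
    separators=(",", ":"),
).encode("utf-8")

ratio = gzip_ratio(sample)
print(f"{len(sample)} bytes, compressed to {ratio:.0%} of original")
```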

Kanji Club is hosted on a $10/month VPS with some other services I host for clients, though you couldn't tell since none of them have any performance effects on each other.

The one place the system isn't efficient is the build process. My main focus was adjusting details to get the site working, so I haven't put enough work into setting up partial builds for when I need to tweak only a subset of the data. Even so, a build only takes three to six minutes, so it's not so bad.
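A partial build could, for example, hash each kanji's record and re-render only the pages whose hash changed since the last run. This is a sketch of that general technique, not something the site does today; `record_hash` and `changed_kanji` are hypothetical helpers.

```python
import hashlib
import json

def record_hash(record):
    """Stable fingerprint of one kanji's data."""
    # sort_keys makes the hash independent of dict key order.
    canonical = json.dumps(record, ensure_ascii=False, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def changed_kanji(records, previous_hashes):
    """Return (kanji whose pages need rebuilding, updated hash map)."""
    current = {kanji: record_hash(rec) for kanji, rec in records.items()}
    stale = sorted(k for k, h in current.items() if previous_hashes.get(k) != h)
    return stale, current
```

The returned hash map would be saved to disk after each build and passed back in as `previous_hashes` on the next one.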

The one ongoing technical issue I have is that Google refuses to read the sitemap. I think there are characters in the list of kanji I'm working with that Google considers invalid, but I can't get a detailed error report so I need to spend more time figuring out what's up there.
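Without knowing what Google actually objects to, one thing that can be ruled out is encoding. The Sitemaps protocol asks for UTF-8 files with URL-escaped and entity-escaped locations, so the sketch below percent-encodes each kanji and skips anything that is not a legal XML 1.0 character. The base URL and function names are hypothetical, and this is not a diagnosis of the real problem.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

BASE_URL = "https://example.com/kanji/"  # hypothetical, not the real site layout

def is_xml_safe(ch):
    """True if `ch` is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def sitemap(kanji_list):
    """Build a sitemap whose URLs contain only percent-encoded ASCII."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for kanji in kanji_list:
        if not all(is_xml_safe(ch) for ch in kanji):
            continue  # skip entries that could never appear in valid XML
        loc = escape(BASE_URL + quote(kanji, safe=""))
        lines.append(f"  <url><loc>{loc}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Here `quote` turns 榎 into `%E6%A6%8E`, so the resulting file is plain ASCII even though it describes kanji pages.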

Future and Improvements

Kanji Club satisfies my initial goal of making the fastest, most natural kanji search site out there, but there are still a lot of improvements I'd like to make. I'm busy with other projects now so I don't have too much time to spend on it, but since it's not hard to keep it running I should be able to tinker with it now and again as time allows.

If you have any thoughts or suggestions, feel free to send me an email - I'd love to hear from you. Ψ



from Hacker News https://ift.tt/3w5kwDI
