Since most reasonable people are familiar with the Harry Potter books, their content serves as ideal material for building a mnemonic system. The mnemonic major system, in particular, is used to memorize number sequences.
To implement the steps outlined in this post, you need the text of the Harry Potter books (or of other books, if you prefer). In a previous post I showed you how to download fantasy books and extract their text. The downloaded data included the Harry Potter books, which I will use in this post.
Step 1: Learn the sound-number mapping
In the mnemonic major system each number from 0 to 9 is associated with one or more consonant sounds. Use the following table as a reference.
| Number | Sounds (IPA) | Letters with example words |
|---|---|---|
| 0 | s, z | s (see), c (city), z (zero), x (xylophone) |
| 1 | t, d, ð | t (tee), d (dad), th (though) |
| 2 | n, ŋ | n (nail) |
| 3 | m | m (monster) |
| 4 | r | r (right), l (colonel) |
| 5 | l | l (left) |
| 6 | ʤ, ʧ, ʃ, ʒ | ch (cheese), j (juice), g (ginger), sh (shell), c (cello, special), cz (czech), s (tissue, vision), sc (fascist), sch (eschew), t (ration), tsch (putsch), z (seizure) |
| 7 | k, ɡ | k (kid), c (cake), q (quarter), g (good), ch (loch) |
| 8 | f, v, θ | f (face), ph (phone), v (alive), gh (laugh), th (think) |
| 9 | p, b | p (power), b (baby) |
In English, letters are pronounced in different ways depending on the context, which is why some letters appear in several rows. In the end, though, only the sound matters, not the spelling.
Here are a few example words with their IPA representations and the numbers they encode.
| Word | IPA | Number |
|---|---|---|
| action | ækʃən | 762 |
| muddy | mədi | 31 |
| midday | mɪddeɪ | 311 |
| accept | æksɛpt | 7091 |
| fax | fæks | 870 |
| exam | ɪgzæm | 703 |
| anxious | æŋkʃəs | 2760 |
| luxury | ləgʒəri | 5764 |
| pizza | pitsə | 910 |
| ghost | goʊst | 701 |
| enough | inəf | 28 |
| fear | fɪr | 84 |
To familiarize yourself with the IPA notation, try to read the following excerpt.
Consider the same lines converted to number sequences.
In Step 4, I’m going to show you how to write a training program to internalize the concept of mapping sounds to numbers. But first, you need the ability to convert text to numbers automatically. This happens in two steps: first the text is converted to IPA, then the IPA is converted to numbers.
Step 2: Converting IPA to numbers
The process of converting IPA to numbers is very simple: I iterate through the IPA characters and, whenever a number is associated with a character, append that number to the result.
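A minimal sketch of this function, with the digit mapping taken straight from the table in Step 1 (only single-character IPA symbols are needed, so a simple dictionary lookup suffices):

```python
# Major-system digit for each IPA consonant sound (table from Step 1).
IPA_TO_DIGIT = {
    's': 0, 'z': 0,
    't': 1, 'd': 1, 'ð': 1,
    'n': 2, 'ŋ': 2,
    'm': 3,
    'r': 4,
    'l': 5,
    'ʤ': 6, 'ʧ': 6, 'ʃ': 6, 'ʒ': 6,
    'k': 7, 'g': 7, 'ɡ': 7,  # both ASCII g and IPA ɡ (U+0261)
    'f': 8, 'v': 8, 'θ': 8,
    'p': 9, 'b': 9,
}

def major_decode_from_ipa(ipa):
    """Collect the digit for every mapped character; skip everything else."""
    return [IPA_TO_DIGIT[ch] for ch in ipa if ch in IPA_TO_DIGIT]
```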
For example, major_decode_from_ipa('dɪfəkəlt') yields [1, 8, 7, 5, 1].
Additionally, I define a couple of functions for converting number sequences to and from strings.
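They might look like this (a sketch; the originals aren't shown):

```python
def numseq_to_str(numseq):
    """e.g. [1, 8, 7, 5, 1] -> '18751'"""
    return ''.join(str(digit) for digit in numseq)

def numseq_from_str(s):
    """e.g. '18751' -> [1, 8, 7, 5, 1]"""
    return [int(ch) for ch in s]
```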
For example, numseq_to_str([1, 8, 7, 5, 1]) yields '18751'.
Step 3: Converting text to IPA
In order to automatically convert text to IPA (and then to numbers) you need to use an IPA dictionary.
Python’s eng-to-ipa package is able to convert text to IPA using the Carnegie-Mellon University Pronouncing Dictionary.
This yields:
According to the docs, eng-to-ipa reprints words that cannot be found in the CMU dictionary with an asterisk. Thus, “i’ll” and “sandwiches.” have not been found. Clearly, the punctuation is the issue. I preprocess the text to ease the conversion to IPA.
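A preprocessing function along these lines should do (an assumption: lowercasing, normalizing apostrophes and dropping punctuation):

```python
import re

def preprocess(text):
    """Lowercase and keep only letters, apostrophes and spaces."""
    text = text.lower().replace('’', "'")  # normalize curly apostrophes
    text = re.sub(r"[^a-z' ]+", ' ', text)
    return re.sub(r' +', ' ', text).strip()
```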
And…
As you can see, there are three ways to pronounce this sentence, depending on whether you pronounce sandwich with an m, an n or an nd sound. To use the major system effectively, you should use the version that sounds most natural to you.
Let’s look at another example.
The word gryffindor was not found in the CMU dictionary, which is to be expected. After a quick search for the word’s pronunciation I found YouGlish, which uses YouTube videos to find IPA transcriptions. While their API is not free, a limited number of IPAs can be scraped for our purpose.
As you can see, this function is semi-interactive. Without user intervention it will get stuck on a CAPTCHA, and even then you’ll eventually reach their daily usage limit and won’t be able to continue. For our purpose, though, this is good enough.
I have shown that many words have several possible pronunciations, from which you need to choose your preferred one, and that some words are not in the CMU dictionary and either require scraping the IPA from another source or are not available at all. For these two reasons, you will need to build your own personal IPA dictionary.
I’m going to build my IPA dictionary by iterating through the words of the Harry Potter books and adding each word and the corresponding IPA to my dictionary.
First, I define some functions for managing my dictionary (a simple JSON file in this case).
Next, I iterate through the words of the books and enter each word and whatever is returned by eng-to-ipa into my dictionary.
Each value in the dictionary is now a list of possible IPAs, as that is what eng-to-ipa returned. Before the dictionary can be used, we need to ensure that each word has exactly one IPA. If the different IPAs for a word decode to different numbers, we need to ask the user for their preferred pronunciation. Otherwise, the choice is of no consequence and we simply choose the first one.
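That selection step might look like this (a sketch; the decoder from Step 2 is repeated in compact form so the snippet is self-contained):

```python
# Compact version of the Step 2 sound-to-digit mapping.
IPA_TO_DIGIT = {ch: digit for digit, sounds in enumerate(
    ['sz', 'tdð', 'nŋ', 'm', 'r', 'l', 'ʤʧʃʒ', 'kgɡ', 'fvθ', 'pb'])
    for ch in sounds}

def decode(ipa):
    return [IPA_TO_DIGIT[ch] for ch in ipa if ch in IPA_TO_DIGIT]

def choose_ipa(word, ipas):
    """Reduce a list of candidate IPAs to a single one. The user is
    asked only when the candidates encode different numbers."""
    if len({tuple(decode(ipa)) for ipa in ipas}) == 1:
        return ipas[0]  # the choice is of no consequence
    for n, ipa in enumerate(ipas):
        print(f'{n}: {ipa} -> {decode(ipa)}')
    return ipas[int(input(f'preferred pronunciation of {word!r}: '))]
```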
Example:
Next, I search the IPA dictionary for values ending with an asterisk (which, as you’ve seen earlier, indicates that no IPA was found in the CMU dictionary) and attempt to scrape them from YouGlish.
And that’s all, my IPA dictionary is ready! I define a few functions for converting text to IPA conveniently.
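For instance (a sketch; the inline cleanup mirrors the preprocessing from Step 3):

```python
import re

def text_to_ipa(ipa_dict, text):
    """Convert text to IPA word by word using the personal dictionary.
    Unknown words are passed through with a trailing asterisk."""
    cleaned = re.sub(r"[^a-z' ]+", ' ', text.lower())
    return ' '.join(ipa_dict.get(word, word + '*') for word in cleaned.split())
```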
For example, text_to_ipa(load_ipa_dict(), 'Well done, Harry!') yields wɛl dən hɛri.
Using the major_decode_from_ipa function from the previous step I can now convert text to numbers.
To make it even more convenient, I define a function major_decode_from_text.
This function has the added benefit that it can group the result by the words in the source sentence, which is useful when you’re printing the number sequence for a longer text.
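A self-contained sketch of major_decode_from_text, including the compact digit mapping from Step 2:

```python
import re

IPA_TO_DIGIT = {ch: digit for digit, sounds in enumerate(
    ['sz', 'tdð', 'nŋ', 'm', 'r', 'l', 'ʤʧʃʒ', 'kgɡ', 'fvθ', 'pb'])
    for ch in sounds}

def major_decode_from_text(ipa_dict, text, group_by_word=False):
    """Decode text straight to digits, optionally one group per word."""
    words = re.sub(r"[^a-z' ]+", ' ', text.lower()).split()
    groups = [[IPA_TO_DIGIT[ch] for ch in ipa_dict.get(word, '')
               if ch in IPA_TO_DIGIT] for word in words]
    return groups if group_by_word else [d for group in groups for d in group]
```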
Step 4: Practice decoding words
In order to use the major system effectively, you need to be able to quickly convert text to numbers in your mind.
Now that you can convert text to numbers using the computer, it’s easy to write a simple training program for practicing doing the same thing in your mind.
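Such a trainer could be as simple as this (a sketch; the decoder is repeated so the snippet stands alone):

```python
import random

IPA_TO_DIGIT = {ch: digit for digit, sounds in enumerate(
    ['sz', 'tdð', 'nŋ', 'm', 'r', 'l', 'ʤʧʃʒ', 'kgɡ', 'fvθ', 'pb'])
    for ch in sounds}

def decode_to_str(ipa):
    return ''.join(str(IPA_TO_DIGIT[ch]) for ch in ipa if ch in IPA_TO_DIGIT)

def practice_decoding(ipa_dict, rounds=10):
    """Show random words; you type the number each one encodes."""
    for word in random.choices(list(ipa_dict), k=rounds):
        expected = decode_to_str(ipa_dict[word])
        answer = input(f'{word} -> ').strip()
        print('correct!' if answer == expected else f'no, it is {expected}')
```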
Example:
Continue practicing until the text-to-number conversion becomes second nature to you.
Step 5: Practice encoding numbers
When you want to memorize a number sequence using the major system, you need to find an appropriate encoding for it. The following code helps you practice this concept.
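A sketch of such a practice loop, with the pure checking logic split out from the interactive part:

```python
import random

IPA_TO_DIGIT = {ch: digit for digit, sounds in enumerate(
    ['sz', 'tdð', 'nŋ', 'm', 'r', 'l', 'ʤʧʃʒ', 'kgɡ', 'fvθ', 'pb'])
    for ch in sounds}

def decode_to_str(ipa):
    return ''.join(str(IPA_TO_DIGIT[ch]) for ch in ipa if ch in IPA_TO_DIGIT)

def check_encoding(ipa_dict, word, target):
    """True if the given word encodes the target number string."""
    ipa = ipa_dict.get(word.lower())
    return ipa is not None and decode_to_str(ipa) == target

def practice_encoding(ipa_dict, rounds=10, digits=2):
    """Show random numbers; you answer with a word encoding each one."""
    for _ in range(rounds):
        target = ''.join(random.choice('0123456789') for _ in range(digits))
        word = input(f'a word for {target}: ').strip()
        print('correct!' if check_encoding(ipa_dict, word, target)
              else 'try another word')
```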
Example:
Finding good (memorable) encodings for a given number sequence is a much more creative (and laborious) process than decoding text. Let’s see if we can use the content of the Harry Potter books to encode number sequences.
Step 6: Find encodings automatically
Given a number sequence, I’m going to search the Harry Potter books for suitable encodings. The processes involved can be slow, so it makes sense to precompute all possible encodings and save them to index files.
Here are a few convenience functions that allow me to create index files from lists of strings and query them for number sequences.
Next, I’m going to create several indexes for different types of text chunks extracted from the Harry Potter books.
Words
Well duh! All words that can be encoded are already in our IPA dictionary, so finding words for number sequences is straightforward.
To use this index, I defined the following function.
Example:
Nouns
Nouns are generally easier to imagine (and thus memorize) than other types of words. For this reason, it makes sense to look for nouns specifically when trying to find encodings for a number sequence.
I’m going to use Python’s NLTK package to process the Harry Potter text and identify nouns. The process required for this is called part-of-speech (POS) tagging.
Here is an example to give you an idea of what the POS tagger does.
Now, I’m not a linguist, but unknown seems like a noun to me in this case, yet it is marked as an adjective (JJ). Anyway, NLTK’s POS tagger generally does a good job identifying nouns. You can see that the tags for nouns start with NN.
Note: If you remove the call to preprocess, the tag for Dumbledore will be NNP (proper noun), not NN. That is because preprocess lowercases the entire text and NLTK uses casing to determine the correct tag.
Thus, the following code will extract all nouns from a piece of text.
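A sketch of that extraction (it assumes NLTK's tokenizer and POS-tagger models have been downloaded):

```python
import nltk

def is_noun_tag(tag):
    """NLTK noun tags all start with NN (NN, NNS, NNP, NNPS)."""
    return tag.startswith('NN')

def extract_nouns(text):
    tokens = nltk.word_tokenize(text.lower())  # lowercased, as in preprocess
    return [word for word, tag in nltk.pos_tag(tokens) if is_noun_tag(tag)]
```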
Then I can build my noun index like so.
Noun phrases
Noun phrases are nouns with modifiers, e.g. ridiculous muggle protection act, restricted section, giant gryffindor hourglass, slow-acting venoms, mrs weasley, ….
Noun phrases are useful for my purpose because they are as easy to remember as individual nouns (if not easier due to being more concrete) and have the potential to encode longer number sequences.
The technique I’m going to use to find noun phrases is called chunking. Chunking is the segmentation and labelling of multi-token sequences. In other words, it takes tokens (e.g. a tokenized sentence) as input and produces non-overlapping subsets of those tokens.
The way a list of tokens is chunked is defined by a chunk grammar. Here’s the grammar I’m going to use to find noun phrases.
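A noun-phrase grammar matching this description (a standard one; treat the exact patterns as an assumption):

```python
GRAMMAR = r"""
    NBAR: {<JJ.*>*<NN.*>+}   # adjectives (any type) followed by nouns
    NP: {<NBAR><IN><NBAR>}   # two NBARs connected by a preposition
        {<NBAR>}             # or a single NBAR
"""
```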
The first rule in this grammar says that an NBAR chunk should be formed whenever the chunker finds zero or more adjectives (of any type, which is why the pattern is JJ.* rather than just JJ) followed by one or more nouns (of any type).
The second rule says that an NP chunk should be formed whenever the chunker finds a single NBAR chunk or two NBAR chunks connected by a preposition.
Examples for phrases that would be caught by the second rule but not by the first are head protruding over ron, death eater in disguise, ministry of magic, sound of laughter, ray of purest sunlight, cup of strong tea.
The following is the code I use for extracting noun phrases from any text.
The first function is just a helper function that preprocesses, tokenizes and POS tags the text, then feeds it to the chunker, iterates over the produced subsets and reassembles the chunks I’m interested in into text.
Simple nouns would also be considered noun phrases by the grammar I used, so I only consider the subsets consisting of more than one token.
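A reconstruction of that helper (a sketch; it takes already-POS-tagged tokens so the moving parts stay visible):

```python
import nltk

GRAMMAR = r"""
    NBAR: {<JJ.*>*<NN.*>+}
    NP: {<NBAR><IN><NBAR>}
        {<NBAR>}
"""
CHUNKER = nltk.RegexpParser(GRAMMAR)

def extract_noun_phrases(tagged_tokens):
    """Chunk POS-tagged tokens and reassemble multi-token NP chunks into
    text. (In the full pipeline the input comes from preprocessing,
    tokenizing and POS-tagging the book text.)"""
    phrases = []
    for subtree in CHUNKER.parse(tagged_tokens).subtrees():
        if subtree.label() == 'NP':
            tokens = [word for word, tag in subtree.leaves()]
            if len(tokens) > 1:  # skip simple one-word nouns
                phrases.append(' '.join(tokens))
    return phrases
```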
And I build the index for noun phrases.
Clauses
A clause is a group of words containing a subject and a verb and functions as a member of a sentence.
I used the same technique as for noun phrases, but with a different grammar, which I took from the NLTK docs.
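In code, that might look like this (the grammar follows the NLTK book's chunking chapter; treat the rest as a sketch):

```python
import nltk

# Grammar from the NLTK book (chapter 7 on chunking).
CLAUSE_GRAMMAR = r"""
    NP: {<DT|JJ|NN.*>+}           # chunk sequences of DT, JJ, NN
    PP: {<IN><NP>}                # chunk prepositions followed by NP
    VP: {<VB.*><NP|PP|CLAUSE>+$}  # chunk verbs and their arguments
    CLAUSE: {<NP><VP>}            # chunk NP, VP
"""
CLAUSE_CHUNKER = nltk.RegexpParser(CLAUSE_GRAMMAR, loop=2)

def extract_clauses(tagged_tokens):
    """Reassemble every CLAUSE chunk into text."""
    return [' '.join(word for word, tag in subtree.leaves())
            for subtree in CLAUSE_CHUNKER.parse(tagged_tokens).subtrees()
            if subtree.label() == 'CLAUSE']
```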
A few example clauses:
And I build the index.
Sentences
Sentences are easily extracted by a simple call to nltk.tokenize.sent_tokenize.
Double quotation marks need to be removed otherwise they affect sentence tokenization. Consider the following example.
This would be sent-tokenized like so:
The quotation mark following the exclamation mark in the third sentence prevents the sentence tokenizer from breaking it up into two sentences.
With double quotation marks removed this problem is solved.
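In code (assuming NLTK's punkt model is available):

```python
import re

import nltk

def strip_double_quotes(text):
    """Remove straight and curly double quotation marks."""
    return re.sub(r'[“”"]', '', text)

def extract_sentences(text):
    return nltk.tokenize.sent_tokenize(strip_double_quotes(text))
```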
Now that we have all the indexes we can define the remaining find functions.
Let’s do a quick integrity check and search for a number sequence mentioned earlier in this post.
All good.
Measuring the coverage of the indexes
Try finding encodings for a random number sequence and you’ll quickly realize that in many cases there won’t be any results.
In order to determine how useful our indexes are we need to measure the probability of at least one encoding being found for a random number sequence of a certain length.
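A simple Monte Carlo estimate does the job (a sketch; here the index is an in-memory dict mapping number strings to encodings):

```python
import random

def measure_coverage(index, length, trials=10000):
    """Estimate the probability that a random number sequence of the
    given length has at least one encoding in the index."""
    hits = sum(
        1 for _ in range(trials)
        if index.get(''.join(random.choice('0123456789')
                             for _ in range(length)))
    )
    return hits / trials
```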
And the result is:
Not very impressive! You are very unlikely to find a match even for something as simple as a phone number. To improve the numbers, one could use more books (and thus more content) and repeat the previous steps. One could also use more sophisticated NLP techniques to recombine text chunks into new phrases and sentences.
I will, however, show you a technique for encoding number sequences of any length that doesn’t rely purely on high coverage of encodings for number sequences.
Step 7: Find noun sequences
In order to encode number sequences of an arbitrary length you need to combine several encodings and link them together. For example, to encode 2184775142 you could use the words wand frog cauldron. Then, create a story with those words to link them. For example, to memorize the word sequence wand frog cauldron you could imagine using a wand to conjure a frog inside a cauldron.
I’m using nouns since they are especially well suited for building stories.
Here’s a function to find noun sequences automatically from our indexes. The function is interactive: it repeatedly asks you to choose a word from a list of possible nouns until the entire number sequence is encoded.
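A sketch of such a function (here the index is a plain dict mapping number strings to noun lists):

```python
def prefix_candidates(index, remaining):
    """(number, nouns) pairs whose number is a prefix of `remaining`,
    longest numbers first."""
    pairs = [(num, nouns) for num, nouns in index.items()
             if remaining.startswith(num)]
    return sorted(pairs, key=lambda pair: -len(pair[0]))

def find_noun_sequence(index, numstr):
    """Interactively encode a number string as a sequence of nouns."""
    chosen, remaining = [], numstr
    while remaining:
        options = [(num, noun)
                   for num, nouns in prefix_candidates(index, remaining)
                   for noun in nouns]
        if not options:
            raise ValueError(f'no encoding found for {remaining}')
        for i, (num, noun) in enumerate(options):
            print(f'{i}: {noun} ({num})')
        num, noun = options[int(input(f'{remaining} - your choice: '))]
        chosen.append(noun)
        remaining = remaining[len(num):]  # encode what's left
    return chosen
```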
Step 8: Build your personal word list
You don’t always have your computer at hand to find good encodings for you. Plus, picturing new encodings every time you need to memorize a number requires a higher mental effort and is time-consuming.
It’s better (and faster) to reuse the same encodings over and over again. In order to memorize number sequences efficiently, you should have a list of encodings for all numbers from 00 to 99.
For this purpose, I created a few functions that will help you build your personal word list and save it to a JSON file.
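They might look like this (a sketch; the file name is an assumption):

```python
import json

def word_list_keys():
    """'00' through '99'."""
    return [f'{n:02d}' for n in range(100)]

def build_word_list(index, path='word_list.json'):
    """Show the known encodings for each two-digit number, let the user
    pick (or invent) a favorite word, and save the result as JSON."""
    word_list = {}
    for key in word_list_keys():
        print(f'{key}: {index.get(key, [])[:10]}')
        word_list[key] = input(f'your word for {key}: ').strip()
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(word_list, f, ensure_ascii=False, indent=2)
    return word_list
```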
Step 9: Memorize your word list with the help of the Leitner system
Memorizing a hundred number-word pairs does require a significant effort but with the help of the Leitner system nothing shall stop you.
The Leitner system is an implementation of the good old principle of spaced repetition. Spaced repetition is a strategy for learning facts. The main idea is that you review facts less frequently the more often you remembered them correctly.
You may use real flashcards of course but here are a few functions that implement a simple Leitner system.
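A sketch of the core mechanics (the box-movement rules follow the description below; an interactive_leitner() wrapper would loop over these helpers):

```python
import json
import random

MAX_BOX = 5  # facts promoted past box 5 (i.e. into box 6) count as learned

def load_boxes(path='leitner.json'):
    try:
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_boxes(boxes, path='leitner.json'):
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(boxes, f, ensure_ascii=False, indent=2)

def add_fact(boxes, question, answer):
    boxes[question] = {'answer': answer, 'box': 0}

def move_fact(fact, box_no, correct):
    """Box 0 is a staging box: correct answers jump straight to box 2.
    Otherwise move up one box on success, back to box 1 on failure."""
    fact['box'] = (2 if box_no == 0 else box_no + 1) if correct else 1

def review_box(boxes, box_no):
    """Review every fact currently in a box (press Ctrl-C to stop early)."""
    facts = [(q, f) for q, f in boxes.items() if f['box'] == box_no]
    random.shuffle(facts)
    for question, fact in facts:
        answer = input(f'{question}? ').strip()
        move_fact(fact, box_no, answer == fact['answer'])
```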
Run interactive_leitner() to start training. It will show you the number of facts in each box and ask you which box you would like to review.
In this case, I used box 0 as a staging box for new cards. This way you can add hundreds of facts into the system at once without overwhelming your learning capacity.
When reviewing box 0, if you answer correctly the fact jumps to box 2, otherwise to box 1. If you think you introduced enough new facts (I suggest 5-10), press Ctrl-C to stop.
When reviewing the other boxes, if you answer correctly the fact jumps to the next-higher box, otherwise back to 1. Finish reviewing the box or press Ctrl-C to stop.
Once a fact reaches the box after max_box, so box 6 in this case, it is considered successfully learned and no longer comes up for review.
Review the boxes with decreasing frequency e.g. box 1 daily, box 2 every other day, box 3 once a week and so on.
Adding the word list created in the previous step to our Leitner system is straightforward.
Continue practicing until all facts are beyond box 5.
Step 10: Practice memorizing number sequences
You are now fully ready to use the major system to memorize number sequences. Consider the following training program. It shows you a number to memorize, then distracts you with a mental math exercise before it asks you to enter the memorized number.
I used curses for some advanced user interaction in the terminal.
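Stripped of the curses screen handling, the trainer boils down to something like this (a plain-input sketch):

```python
import random

def memorize_trial(digits=6):
    """Show a number, distract with mental arithmetic, then quiz."""
    number = ''.join(random.choice('0123456789') for _ in range(digits))
    input(f'Memorize: {number}   (press Enter when ready)')
    print('\n' * 40)  # crude way to push the number off the screen
    a, b = random.randint(10, 99), random.randint(10, 99)
    while input(f'Distraction: {a} + {b} = ').strip() != str(a + b):
        print('Try again.')
    recalled = input('Now enter the memorized number: ').strip()
    print('Correct!' if recalled == number else f'Nope, it was {number}.')
```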
Step 11: Apply the major system to real life situations
While the mnemonic major system seems tedious at first, with enough practice it becomes an incredibly efficient technique for memorizing long (or short) number sequences quickly.
In everyday life, there are many opportunities to memorize number sequences. For example, the next time you’re eating out, memorize the prices of the dishes you order and impress everyone when it’s time to split the check and no one knows what they owe.
Add numbers you want to memorize long-term to the Leitner system created in Step 9. For example, to add important emergency phone numbers in case you lose your phone:
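Something like this, reusing the fact format from Step 9 (the names and numbers are placeholders; use your own):

```python
# Placeholder contacts and numbers, stored as new facts in staging box 0.
emergency_numbers = {
    'mom': '5551234',
    'family doctor': '5555678',
}
boxes = {}  # or load your existing leitner.json from Step 9
for name, number in emergency_numbers.items():
    boxes[name] = {'answer': number, 'box': 0}
```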
Conclusion
In this post, I introduced the mnemonic major system and showed you how to decode text to number sequences both manually and automatically, find encodings in the content of your favorite fantasy books, build your personal word list, use it to memorize number sequences of arbitrary length, and write your own code to train all of these concepts.
I hope you found the information in this post useful. In future posts, I will explore other ways to apply data science to your favorite fantasy literature and maybe have a look at other memory techniques as well.
Share your feedback in the comments and most importantly, start memorizing!
from Hacker News https://ift.tt/39RjccZ