Number Words
Number Words builds numeric expressions for natural numbers, percentages and fractions. For example, 0.231 will be converted to "less than a quarter", and 102 to "over one hundred".
Supports multiple languages.
The implementation is based on ideas expressed in Generating Numerical Approximations.
Numerical Approximations
Numerical approximations are everywhere in texts based on data:
- Water temperature is below 10C (input data would be 9.53C)
- A third of students failed the exam (34.3%)
- Q2 sales were around 1M$ (1,002,184 $)
Numeric data describing a metric of interest often carries more precision than we need. If we see 9.382%, the information we need is likely "almost 10%" rather than the precise number. Furthermore, different approximation strategies are often used within one report for the same metric. At the beginning of the report we might say "almost 10%" or "below 10%", while later in the text we might choose a more precise expression, "around 9.4%".
Number Words helps you build such numerical approximations and makes them available to text generation systems.
Features
Number Words uses the following abstractions:
- Actual Value is a number which needs to be approximated, an input to the approximation function. In the examples above it is the temperature 9.53C, or the percentage 34.3%.
- Scale of approximation. It is a snapping grid across the range of numbers along which the approximation is done. The scale to use is determined by the domain. For example:
  - 1/4 scale will form approximation steps starting at 0, then 1/4, 1/2, 3/4, ending with 1;
  - 1/10 scale will express percentages with one precision point;
  - scales which are multiples of 10 are useful for natural number approximation. The 10 will round to tens: 1007 -> 1010, the 100 to hundreds: 1003 -> 1000, and so on.
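The snapping grid can be illustrated with a few lines of Clojure. This is a minimal sketch, not the library's implementation: snap-down, snap-up and snap-nearest are hypothetical helper names, the scale is assumed to be positive, and ties are assumed to resolve to the lower grid point. Inputs are converted with rationalize so that the arithmetic stays exact.

```clojure
;; Hypothetical helpers sketching the "snapping grid" idea.
;; These names are NOT part of the Number Words API.

(defn- floor-steps
  "Number of whole `scale` steps at or below `value` (scale must be positive)."
  [value scale]
  (let [q (quot value scale)]
    ;; quot truncates toward zero, so step one further down for negatives.
    (if (neg? (rem value scale)) (dec q) q)))

(defn snap-down
  "Largest grid point that is <= value."
  [value scale]
  (* scale (floor-steps (rationalize value) scale)))

(defn snap-up
  "Smallest grid point that is >= value."
  [value scale]
  (let [v    (rationalize value)
        down (snap-down v scale)]
    (if (== down v) v (+ down scale))))

(defn snap-nearest
  "Grid point closest to value; ties go to the lower point (an assumption)."
  [value scale]
  (let [v    (rationalize value)
        down (snap-down v scale)
        up   (snap-up v scale)]
    (if (<= (- v down) (- up v)) down up)))

(snap-nearest 1007 10)    ;; => 1010
(snap-nearest 1003 100)   ;; => 1000
(snap-down 0.258 1/4)     ;; => 1/4
(snap-up 0.258 1/4)       ;; => 1/2
```

The same value lands on different grid points depending on the scale: 9.382 snaps to 10 on a scale of 10 and to 47/5 (that is, 9.4) on a scale of 1/10, which mirrors the "almost 10%" versus "around 9.4%" wording discussed earlier.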
The result of actual value approximation to a given scale provides:
- Given Value: a discrete value along the scaled number range to which the actual value is the closest.
- Hedge: a commonly used word describing the relation between the actual and given values. An actual value of 9.5 is below the given value of 10. An actual value of 101 is over the given value of 100.
- Text: a textual spell-out of the given value. 2666 is "two thousand six hundred sixty six".
- Favorite Number: a common-language name for certain numbers. 0.25 is a favorite number in that it has the name "a quarter".
A full approximation result returns three such approximation data structures, one for each given value which is:
- smaller than the actual value on the scaled number range.
- greater than the actual value on the scaled number range.
- around the actual value on the scaled number range. For this, whichever of the above two is closer to the actual value is chosen.
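The three-way result can be sketched as follows. This is an illustration under assumptions rather than the library's code: relation and approximation-triple are hypothetical names, the two neighbouring grid points are passed in directly, plain keywords are used instead of the namespaced keys the library returns, and a tie is assumed to resolve to the lower point.

```clojure
;; Hypothetical helpers; NOT part of the Number Words API.

(defn relation
  "How the actual value relates to a given value: :equal, :more or :less."
  [actual given]
  (cond
    (== actual given) :equal   ; == compares across numeric types
    (> actual given)  :more
    :else             :less))

(defn approximation-triple
  "Given values below, above and around `actual`, for grid neighbours
  `lower` and `upper`."
  [actual lower upper]
  {:more-than lower   ; actual is more than the smaller grid point
   :less-than upper   ; actual is less than the greater grid point
   :around    (if (<= (- actual lower) (- upper actual)) lower upper)})

(relation 9.5 10)                      ;; => :less
(relation 101 100)                     ;; => :more
(approximation-triple 0.258 1/4 1/2)
;; => {:more-than 1/4, :less-than 1/2, :around 1/4}
```

The relation keyword is what selects a hedge: :less leads to words such as "below" or "under", and :more to "over".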
Languages
Numeric approximation has two language-dependent parts:
- Hedges, which differ from language to language. See the Configuration section for how this can be controlled.
- Text, the number-to-text translation of a given value. For this translation Number Words relies on ICU4J.
Currently supported languages: English (:en) and German (:de).
Usage
Number Words exposes approximation functionality through the approximations function, which takes the following parameters:
- language - :de or :en
- actual-value - the number to approximate
- scale - at which the approximation is to be performed.
    (require '[numberwords.core :as nw])

    (nw/approximations :en 0.258 1/4)
    =>
    #:numwords{:around    #:numwords{:hedges          #{"approximately" "about" "around"}
                                     :text            "zero point two five"
                                     :given-value     1/4
                                     :favorite-number #{"a quarter"}}
               :more-than #:numwords{:hedges          #{"over" "more than"}
                                     :text            "zero point two five"
                                     :given-value     1/4
                                     :favorite-number #{"a quarter"}}
               :less-than #:numwords{:hedges          #{"nearly" "under" "less than"}
                                     :text            "zero point five"
                                     :given-value     1/2
                                     :favorite-number #{"a half"}}}
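A result like this still has to be turned into a phrase. The sketch below shows one possible way to do it; phrase is a hypothetical helper that is not part of Number Words, and the result map is copied from the output above so the example runs without the library. It assumes that a favorite number, when present, is preferred over the spelled-out text, and it sorts the sets only to make the choice deterministic.

```clojure
;; Result data copied from the example output above.
(def result
  #:numwords{:around    #:numwords{:hedges          #{"approximately" "about" "around"}
                                   :text            "zero point two five"
                                   :given-value     1/4
                                   :favorite-number #{"a quarter"}}
             :more-than #:numwords{:hedges          #{"over" "more than"}
                                   :text            "zero point two five"
                                   :given-value     1/4
                                   :favorite-number #{"a quarter"}}
             :less-than #:numwords{:hedges          #{"nearly" "under" "less than"}
                                   :text            "zero point five"
                                   :given-value     1/2
                                   :favorite-number #{"a half"}}})

(defn phrase
  "Hypothetical helper: joins a hedge with the favorite number if there is
  one, otherwise with the spelled-out text."
  [result relation]
  (let [{:numwords/keys [hedges text favorite-number]} (get result relation)
        hedge   (first (sort hedges))                     ; alphabetically first
        wording (or (first (sort favorite-number)) text)]
    (str hedge " " wording)))

(phrase result :numwords/around)     ;; => "about a quarter"
(phrase result :numwords/less-than)  ;; => "less than a half"
```

A real text generator would more likely pick a hedge at random or by context than alphabetically.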
Configuration
Hedges and favorite numbers can be modified, and new languages added, by editing the configuration file resources/numwords.edn:
    {;; Configuration is structured by language
     :en {;; Hedges section specifies which words are associated with
          ;; given actual to given value relations
          :hedges {:equal  #{"exactly"}
                   :around #{"around" "approximately" "about"}
                   :more   #{"more than" "over"}
                   :less   #{"less than" "under" "nearly"}}
          ;; Favorite numbers map a special number to its textual expressions
          :favorite-numbers {1/4 #{"a quarter" "a fourth"}
                             1/2 #{"a half"}}}}
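To show how a configuration of this shape can be consumed, here is a small sketch using clojure.edn. It only illustrates the data layout: hedges-for and favorite-names are hypothetical helpers, the configuration is inlined as a string instead of being read from resources/numwords.edn, and nothing here is claimed to match how Number Words loads its configuration internally.

```clojure
(require '[clojure.edn :as edn])

;; The configuration from above, inlined as a string for illustration.
(def config
  (edn/read-string
   "{:en {:hedges {:equal  #{\"exactly\"}
                   :around #{\"around\" \"approximately\" \"about\"}
                   :more   #{\"more than\" \"over\"}
                   :less   #{\"less than\" \"under\" \"nearly\"}}
          :favorite-numbers {1/4 #{\"a quarter\" \"a fourth\"}
                             1/2 #{\"a half\"}}}}"))

(defn hedges-for
  "Hypothetical lookup of the hedge words for a language and relation."
  [config language relation]
  (get-in config [language :hedges relation]))

(defn favorite-names
  "Hypothetical lookup of the favorite-number names for a given value."
  [config language given-value]
  (get-in config [language :favorite-numbers given-value]))

(hedges-for config :en :more)      ;; => #{"more than" "over"}
(favorite-names config :en 1/4)    ;; => #{"a quarter" "a fourth"}
(favorite-names config :en 0.25)   ;; => nil
```

The last call returns nil because the keys are ratios: in Clojure a double such as 0.25 is not equal to the ratio 1/4 as a map key, so values should be looked up as ratios.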
License
Copyright © 2020 TokenMill UAB.
Distributed under the Apache License, Version 2.0.