Book ; Online: Types, Tokens, and Hapaxes
A New Heap's Law
2018
Abstract | Heap's Law states that in a large enough text corpus, the number of types as a function of tokens grows as $N=KM^\beta$ for some free parameters $K,\beta$. Much has been written about how this result and various generalizations can be derived from Zipf's Law. Here we derive from first principles a completely novel expression of the type-token curve and prove its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates for counting hapaxes and higher $n$-legomena. |
---|---|
Keywords | Computer Science - Computation and Language |
Publication date | 2018-12-31 |
Country of publication | us |
Document type | Book ; Online |
Data source | BASE - Bielefeld Academic Search Engine (life sciences selection) |
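
The classic power-law form quoted in the abstract, $N=KM^\beta$, can be illustrated with a minimal sketch. It assumes a pre-tokenized list of strings and estimates $K$ and $\beta$ by ordinary least squares on the log-log type-token curve. The function names (`type_token_curve`, `fit_heaps`, `count_hapaxes`) are hypothetical and chosen for illustration only. This is not the novel expression the paper derives, which this record does not reproduce.

```python
import math
from collections import Counter


def type_token_curve(tokens):
    """Return (M, N) pairs: after reading M tokens, N distinct types were seen."""
    seen = set()
    curve = []
    for m, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((m, len(seen)))
    return curve


def fit_heaps(curve):
    """Fit log N = log K + beta * log M by least squares; return (K, beta).

    Assumes the curve holds at least two points with distinct M > 0 and N > 0.
    """
    xs = [math.log(m) for m, _ in curve]
    ys = [math.log(n) for _, n in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def count_hapaxes(tokens):
    """Count types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1)
```

For example, `fit_heaps(type_token_curve(tokens))` returns the estimated pair $(K, \beta)$ for a token list, and `count_hapaxes(tokens)` gives the hapax count at the full corpus size. A log-log regression is only one of several possible fitting choices and weights small $M$ heavily.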