Inverse Text Normalization

A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.

Overview

This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into digits and for other translation tasks that do not change word order. A CSV file defines the basic rules for transforming spoken tokens into written tokens, and extra pre- and post-processing can be applied for more specific formatting requirements, e.g. dates, measurements, and money.

These examples were produced by running this script.
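
As a rough illustration of the rule-driven idea described in this overview, the sketch below loads spoken-to-written pairs from CSV text and rewrites tokens one by one without changing their order. It is not this package's implementation or API: the column names (`spoken`, `written`), the rule rows, and the helpers `load_rules` and `convert_tokens` are assumptions made for the example. It also skips what a real number converter has to do, such as combining "twenty three" into 23.

```python
import csv
import io

# Hypothetical rules in CSV form: spoken token -> written token.
# Column names and rows are illustrative assumptions, not the package's rule file.
RULES_CSV = """spoken,written
zero,0
one,1
two,2
three,3
percent,%
"""


def load_rules(csv_text):
    """Read spoken -> written pairs from CSV text into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["spoken"]: row["written"] for row in reader}


def convert_tokens(text, rules):
    """Replace each whitespace-separated token that has a rule.

    Tokens without a rule are kept unchanged, and word order is preserved.
    """
    return " ".join(rules.get(token.lower(), token) for token in text.split())


rules = load_rules(RULES_CSV)
print(convert_tokens("room two on floor three", rules))  # prints "room 2 on floor 3"
```

Keeping the rules in a CSV file means new mappings can be added without touching the conversion code, which is the extensibility the overview refers to.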

Installation

This package supports Python 3.7 and later.

To install from PyPI:

pip install itnpy2

To install locally:

pip install -e .

Tests

To run tests, use pytest in the root folder of this repository:

pytest

Issues

This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that updates failing.csv with the input, the expected output, and the mistake. Thanks!
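
A failing case can be added by hand, or with a few lines of Python as sketched below. The column order (input, expected output, mistake) is an assumption taken from the sentence above, and `append_failing_case` is a hypothetical helper, not part of this package; check the existing rows of failing.csv before relying on it. The demonstration writes to a temporary file so that it does not touch the repository.

```python
import csv
import os
import tempfile


def append_failing_case(path, spoken, expected, mistake):
    """Append one row to a CSV file: input, expected output, mistake.

    The column order is an assumption; match it to the existing file.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([spoken, expected, mistake])


# Demonstration against a temporary file rather than the real failing.csv.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "failing.csv")
append_failing_case(demo_path, "one hundred and two", "102", "100 and 2")

with open(demo_path, newline="", encoding="utf-8") as f:
    demo_rows = list(csv.reader(f))
print(demo_rows)
```

Using `csv.writer` rather than string concatenation keeps inputs that contain commas or quotes correctly escaped.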

Citation

If you find this work useful, please consider citing it.

@misc{hsu2022itn,
  title        = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
  author       = {Brandhsu},
  howpublished = {https://github.com/barseghyanartur/itnpy},
  year         = {2022}
}

Reason for this “fork” and PyPI resurrection

The original itnpy project has been removed from PyPI and GitHub. That destructive action broke some dependent packages.

It has been recovered from a cached version of the PyPI page (through Bing) and this early fork.

Maintainer

Artur Barseghyan <artur.barseghyan@gmail.com>
