texttaglib’s documentation!

Warning

⚠️ THIS PROJECT HAS BEEN RENAMED AND ARCHIVED. ALL FUTURE DEVELOPMENT WILL TAKE PLACE IN THE speach LIBRARY ⚠️

Migration to speach

Migrating from texttaglib to speach should be trivial:

# just change import statements from something like
from texttaglib import elan
# to the new package name
from speach import elan
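If a script has to keep working in environments where only one of the two packages is installed, a fallback import can smooth the transition. The sketch below uses a hypothetical helper, `import_first`, which is not part of either library; it simply returns the first module in the list that imports successfully.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first importable module from module_names (hypothetical helper)."""
    failures = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            failures.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported: " + "; ".join(failures))


# Prefer the new package name and fall back to the legacy one:
# elan = import_first("speach.elan", "texttaglib.elan")
```

Once every environment has speach installed, the helper can be dropped in favour of a plain `from speach import elan`.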

Installation

# change
pip install texttaglib
# into
pip install speach

For more information, please visit: https://speach.readthedocs.io/

(Legacy) Introduction

texttaglib is a Python library for managing and annotating text corpora in different formats.


Main features:

Multiple storage formats (text files, JSON files, SQLite databases)
TTLIG - A human-friendly interlinear gloss format for linguistic documentation
Manipulating transcription files directly in ELAN Annotation Format (EAF)

Installation

texttaglib is available on PyPI.

pip install texttaglib

Basic usage

>>> from texttaglib import ttl
>>> doc = ttl.Document('mydoc')
>>> sent = doc.new_sent("I am a sentence.")
>>> sent
#1: I am a sentence.
>>> sent.ID
1
>>> sent.text
'I am a sentence.'
>>> sent.import_tokens(["I", "am", "a", "sentence", "."])
>>> sent.tokens
[`I`<0:1>, `am`<2:4>, `a`<5:6>, `sentence`<7:15>, `.`<15:16>]
>>> doc.write_ttl()

The script above generates the following corpus files:

-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_concepts.txt
-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_links.txt
-rw-rw-r--.  1 tuananh tuananh      20 Mar 29 13:10 mydoc_sents.txt
-rw-rw-r--.  1 tuananh tuananh       0 Mar 29 13:10 mydoc_tags.txt
-rw-rw-r--.  1 tuananh tuananh      58 Mar 29 13:10 mydoc_tokens.txt
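The `<cfrom:cto>` pairs printed by `sent.tokens` in the basic usage example are character offsets into the sentence text. As a rough illustration (a hypothetical helper, not texttaglib's own code), such offsets can be recovered by searching for each token from left to right, starting where the previous token ended:

```python
from typing import List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each token in text, left to right, returning (cfrom, cto) offsets."""
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start < 0:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        spans.append((start, end))
        cursor = end  # continue searching after this token
    return spans


print(token_spans("I am a sentence.", ["I", "am", "a", "sentence", "."]))
# [(0, 1), (2, 4), (5, 6), (7, 15), (15, 16)]
```

Searching from the end of the previous match keeps repeated tokens (e.g. two occurrences of "a") aligned to the correct positions instead of always matching the first occurrence.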

ELAN support

The texttaglib library includes a command-line tool for converting EAF files into CSV files:

python -m texttaglib eaf2csv input_elan_file.eaf -o output_file_name.csv
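To convert a whole folder of transcripts, the command above can be run once per file. The following is a minimal sketch using `subprocess`; the `convert_folder` name and the choice to write each CSV next to its source EAF file are assumptions made for illustration, and it assumes texttaglib is installed for the Python interpreter running the script.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List


def convert_folder(folder: str, run: Callable[..., object] = subprocess.run) -> List[Path]:
    """Convert every .eaf file in folder into a .csv file next to it (hypothetical helper)."""
    outputs = []
    for eaf_path in sorted(Path(folder).glob("*.eaf")):
        csv_path = eaf_path.with_suffix(".csv")
        # Same eaf2csv command as above, invoked with the current interpreter;
        # check=True makes a failed conversion raise CalledProcessError.
        run([sys.executable, "-m", "texttaglib", "eaf2csv",
             str(eaf_path), "-o", str(csv_path)], check=True)
        outputs.append(csv_path)
    return outputs


# Example: convert_folder("./data")
```

The `run` parameter defaults to `subprocess.run` and exists only so the command construction can be checked without actually invoking the converter.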

For more complex analyses, texttaglib can be used from Python scripts to extract metadata and annotations from ELAN transcripts, for example:

from texttaglib import elan

# Test ELAN reader function in texttaglib
eaf = elan.open_eaf('./data/test.eaf')

# accessing metadata
print(f"Author: {eaf.author} | Date: {eaf.date} | Format: {eaf.fileformat} | Version: {eaf.version}")
print(f"Media file: {eaf.media_file}")
print(f"Time units: {eaf.time_units}")
print(f"Media URL: {eaf.media_url} | MIME type: {eaf.mime_type}")
print(f"Media relative URL: {eaf.relative_media_url}")

# accessing tiers & annotations
for tier in eaf.tiers():
    print(f"{tier.ID} | Participant: {tier.participant} | Type: {tier.type_ref}")
    for ann in tier.annotations:
        print(f"{ann.ID.rjust(4, ' ')}. [{ann.from_ts.ts} -- {ann.to_ts.ts}] {ann.value}")
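For readers curious about the data such a reader works with: an EAF file is XML in which `TIME_SLOT` elements map slot IDs to time values in milliseconds, and each `TIER` holds `ALIGNABLE_ANNOTATION` elements that point at a start and an end slot. The stdlib-only sketch below is a simplified illustration, not texttaglib's implementation; it ignores reference annotations (`REF_ANNOTATION`) and returns `None` for time slots that carry no `TIME_VALUE`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple, Union

Row = Tuple[str, str, Optional[int], Optional[int], str]


def read_alignable_annotations(xml_data: Union[str, bytes]) -> List[Row]:
    """Return (tier_id, annotation_id, start_ms, end_ms, value) rows from EAF XML."""
    root = ET.fromstring(xml_data)
    # Map TIME_SLOT_ID -> TIME_VALUE; TIME_VALUE may be absent for unaligned slots
    slots: Dict[str, Optional[int]] = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None
    rows: List[Row] = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                ann.get("ANNOTATION_ID"),
                slots.get(ann.get("TIME_SLOT_REF1")),
                slots.get(ann.get("TIME_SLOT_REF2")),
                # findtext returns "" for an empty ANNOTATION_VALUE element
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows
```

Passing the raw file bytes, e.g. `read_alignable_annotations(Path("./data/test.eaf").read_bytes())` with `pathlib.Path`, lets the XML parser honour the encoding declared in the file.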

SQLite support

TTL data can be stored in a SQLite database for better corpus analysis. Sample code will be added soon.
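As a rough illustration of the idea (not texttaglib's own SQLite layer), sentences and tokens can be kept in two related tables using Python's built-in `sqlite3` module. The schema and the `store_sentence` helper below are hypothetical and do not reflect the schema texttaglib actually uses; the token offsets reuse the values from the basic usage example.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration only; it is not texttaglib's TTL schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentence (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS token (
    sid INTEGER NOT NULL REFERENCES sentence(id),
    cfrom INTEGER NOT NULL,
    cto INTEGER NOT NULL,
    text TEXT NOT NULL
);
"""


def store_sentence(conn: sqlite3.Connection, sid: int, text: str,
                   spans: List[Tuple[int, int]]) -> None:
    """Insert one sentence and its tokens, given as (cfrom, cto) character offsets."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sentence (id, text) VALUES (?, ?)", (sid, text))
        conn.executemany(
            "INSERT INTO token (sid, cfrom, cto, text) VALUES (?, ?, ?, ?)",
            [(sid, start, end, text[start:end]) for start, end in spans],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_sentence(conn, 1, "I am a sentence.", [(0, 1), (2, 4), (5, 6), (7, 15), (15, 16)])

# A corpus-level query: how often does each token form occur?
counts = conn.execute(
    "SELECT text, COUNT(*) FROM token GROUP BY text ORDER BY COUNT(*) DESC, text"
).fetchall()
```

Storing offsets rather than only token strings keeps every token traceable back to its exact position in the sentence text.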


Useful Links

texttaglib documentation: https://texttaglib.readthedocs.io/
texttaglib on PyPI: https://pypi.org/project/texttaglib/
Source code: https://github.com/letuananh/texttaglib/
