BoNLTK aims to provide out-of-the-box support for the NLP tasks that an application developer might need for the Boyig (Tibetan) language.
pip install bonltk
Todo:
- Tokenizers:
  - [ ] Hugging Face tokenizers
  - [x] SentencePiece tokenizer
  - [ ] Compare the above tokenizers with botok
- Word Vectors:
  - [x] Word2Vec with gensim
  - [ ] ELMo
- Language Models:
  - [ ] Hugging Face transformers
  - [ ] ULMFiT language model with fastai
- Text Similarity: