BookNLP is a natural language processing pipeline for books

One of the most challenging things about natural language processing (NLP) is how broad the field is. Where do you start? One of the best things about NLP is that trained models and ready-made pipelines let you narrow it down to a specific task.

BookNLP is an NLP pipeline that works specifically on books and long-form documents in English. Features include:

  • Part-of-speech (POS) tagging
  • Dependency parsing
  • Named entity recognition (NER)
  • Character name clustering and coreference resolution
  • Event tagging
  • Referential gender inference

BookNLP comes with a small and a large BERT-based model for different needs. The small model is the better fit for personal computers, while the large model works best on more powerful machines and bigger projects.
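To make the choice between the two models concrete, here is a minimal sketch of how a run might be set up in Python. It assumes the `booknlp` package is installed and follows the usage pattern shown in the project's README as I recall it: a `BookNLP("en", model_params)` constructor and a `process(input_file, output_directory, book_id)` method, with the model size given as `"small"` or `"big"`. The helper names `build_model_params` and `run_booknlp`, and the file paths, are hypothetical and only there for illustration, so check the current README before relying on them.

```python
# A minimal sketch, not the official way to run BookNLP.
# Assumption: the "pipeline" and "model" keys mirror the project's README example.

VALID_SIZES = ("small", "big")


def build_model_params(size="small"):
    """Return a BookNLP-style settings dict for the chosen model size.

    Hypothetical helper: it only builds the dictionary and does not load anything.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return {
        # Pipeline stages to run, as a comma-separated string.
        "pipeline": "entity,quote,supersense,event,coref",
        # "small" for personal computers, "big" for more powerful machines.
        "model": size,
    }


def run_booknlp(input_file, output_directory, book_id, size="small"):
    """Process one plain-text book and write the results to output_directory.

    Hypothetical wrapper: booknlp is imported here so that the settings helper
    above can be used and tested without the package installed.
    """
    from booknlp.booknlp import BookNLP

    booknlp = BookNLP("en", build_model_params(size))
    booknlp.process(input_file, output_directory, book_id)


if __name__ == "__main__":
    # Only the settings are printed here; call run_booknlp(...) with your own
    # paths to process a book.
    print(build_model_params("small"))
```

With the package installed, a call such as `run_booknlp("my_book.txt", "output/my_book/", "my_book")` would process a single text file with the small model. The paths and the book ID in that call are placeholders.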
