👍🏻 Libraries
==============

Language Libraries
------------------

Node.js and JavaScript
~~~~~~~~~~~~~~~~~~~~~~

Node.js libraries for NLP

- `Twitter-text `__ - A JavaScript implementation of Twitter's text processing library
- `Knwl.js `__ - A natural language processor in JS
- `Retext `__ - Extensible system for analyzing and manipulating natural language
- `NLP Compromise `__ - Natural language processing in the browser
- `Natural `__ - General natural language facilities for Node.js
- `Poplar `__ - A web-based annotation tool for natural language processing (NLP)

Python
~~~~~~

Python NLP libraries

- `TextBlob `__ - Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of `Natural Language Toolkit (NLTK) `__ and `Pattern `__, and plays nicely with both :+1:
- `spaCy `__ - Industrial strength NLP with Python and Cython :+1:
- `textacy `__ - Higher level NLP built on spaCy
- `gensim `__ - Python library to conduct unsupervised semantic modelling from plain text :+1:
- `scattertext `__ - Python library to produce d3 visualizations of how language differs between corpora
- `AllenNLP `__ - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
- `PyTorch-NLP `__ - NLP research toolkit designed to support rapid prototyping with better data loaders, word vector loaders, neural network layer representations, and common NLP metrics such as BLEU
- `Rosetta `__ - Text processing tools and wrappers (e.g. Vowpal Wabbit)
- `PyNLPl `__ - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for `FoLiA `__, but also ARPA language models, Moses phrasetables, and GIZA++ alignments.
- `jPTDP `__ - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
- `BigARTM `__ - A fast library for topic modelling
- `Snips NLU `__ - A production ready library for intent parsing
- `Chazutsu `__ - A library for downloading and parsing standard NLP research datasets
- `Word Forms `__ - Word Forms can accurately generate all possible forms of an English word
- `Multilingual Latent Dirichlet Allocation (LDA) `__ - A multilingual and extensible document clustering pipeline
- `NLP Architect `__ - A library for exploring the state-of-the-art deep learning topologies and techniques for NLP and NLU
- `Flair `__ - A very simple framework for state-of-the-art multilingual NLP built on PyTorch. Includes BERT, ELMo and Flair embeddings.
- `Kashgari `__ - Simple, Keras-powered multilingual NLP framework that allows you to build your models in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS) and text classification tasks. Includes BERT and word2vec embeddings.

C++
~~~

C++ libraries

- `MIT Information Extraction Toolkit `__ - C, C++, and Python tools for named entity recognition and relation extraction
- `CRF++ `__ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data and other natural language processing tasks.
- `CRFsuite `__ - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
- `BLLIP Parser `__ - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
- `colibri-core `__ - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
- `ucto `__ - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
- `libfolia `__ - C++ library for the `FoLiA format `__
- `frog `__ - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
- `MeTA `__ - `MeTA : ModErn Text Analysis `__ is a C++ Data Sciences Toolkit that facilitates mining big text data.
- `Mecab (Japanese) `__
- `Moses `__
- `StarSpace `__ - A library from Facebook for creating word-level, paragraph-level, and document-level embeddings and for text classification

Java
~~~~

Java NLP libraries

- `Stanford NLP `__
- `OpenNLP `__
- `NLP4J `__
- `Word2vec in Java `__
- `ReVerb `__ - Web-Scale Open Information Extraction
- `OpenRegex `__ - An efficient and flexible token-based regular expression language and engine.
- `CogcompNLP `__ - Core libraries developed in the University of Illinois' Cognitive Computation Group.
- `MALLET `__ - MAchine Learning for LanguagE Toolkit, a package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- `RDRPOSTagger `__ - A robust POS tagging toolkit available (in both Java and Python) together with pre-trained models for 40+ languages.

Kotlin
~~~~~~

Kotlin NLP libraries

- `Lingua `__ - A language detection library for Kotlin and Java, suitable for long and short text alike
- `Kotidgy `__ - An index-based text data generator written in Kotlin

Scala
~~~~~

Scala NLP libraries

- `Saul `__ - Library for developing NLP systems, including built-in modules like SRL, POS, etc.
- `ATR4S `__ - Toolkit with state-of-the-art `automatic term recognition `__ methods.
- `tm `__ - Implementation of topic modeling based on regularized multilingual `PLSA `__.
- `word2vec-scala `__ - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
- `Epic `__ - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.

R
~

R NLP libraries

- `text2vec `__ - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
- `wordVectors `__ - An R package for creating and exploring word2vec and other word embedding models
- `RMallet `__ - R package to interface with the Java machine learning tool MALLET
- `dfr-browser `__ - Creates d3 visualizations for browsing topic models of text in a web browser.
- `dfrtopics `__ - R package for exploring topic models of text.
- `sentiment_classifier `__ - Sentiment classification using Word Sense Disambiguation and WordNet Reader
- `jProcessing `__ - Japanese natural language processing libraries, with Japanese sentiment classification

Clojure
~~~~~~~

- `Clojure-openNLP `__ - Natural language processing in Clojure (opennlp)
- `Inflections-clj `__ - Rails-like inflection library for Clojure and ClojureScript
- `postagga `__ - A library to parse natural language in Clojure and ClojureScript

Ruby
~~~~

- Kevin Dias's `A collection of Natural Language Processing (NLP) Ruby libraries, tools and software `__
- `Practical Natural Language Processing done in Ruby `__

Rust
~~~~

- `whatlang `__ - Natural language recognition library based on trigrams
- `snips-nlu-rs `__ - A production ready library for intent parsing

Services
--------

NLP as an API with higher-level functionality such as NER, topic tagging, and so on

- `Wit-ai `__ - Natural language interface for apps and devices
- `IBM Watson's Natural Language Understanding `__ - API and GitHub demo
- `Amazon Comprehend `__ - NLP and ML suite covering the most common tasks such as NER, tagging, and sentiment analysis
- `Google Cloud Natural Language API `__ - Syntax analysis, NER, sentiment analysis, and content tagging in at least 9 languages, including English and Chinese (Simplified and Traditional).
- `ParallelDots `__ - High-level text analysis API service ranging from sentiment analysis to intent analysis
- `Microsoft Cognitive Services `__
- `TextRazor `__
- `Rosette `__

Annotation Tools
----------------

- `GATE `__ - General Architecture for Text Engineering is 15 years old, free and open source
- `Anafora `__ is a free and open source, web-based raw text annotation tool
- `brat `__ - brat rapid annotation tool is an online environment for collaborative text annotation
- `tagtog `__, costs $
- `prodigy `__ is an annotation tool powered by active learning, costs $
- `LightTag `__ - Hosted and managed text annotation tool for teams, costs $