Natural Language Processing
Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages; it began as a branch of artificial intelligence. In theory, natural language processing is a very attractive method of human–computer interaction. Natural language understanding is sometimes referred to as an AI-complete problem because it seems to require extensive knowledge about the outside world and the ability to manipulate it.

Whether NLP is distinct from, or identical to, the field of computational linguistics is a matter of perspective. The Association for Computational Linguistics defines the latter as focusing on the theoretical aspects of NLP. On the other hand, the open-access journal "Computational Linguistics" styles itself as "the longest running publication devoted exclusively to the design and analysis of natural language processing systems" (see Computational Linguistics (journal)).

Modern NLP algorithms are grounded in machine learning, especially statistical machine learning. Research into modern statistical NLP algorithms requires an understanding of a number of disparate fields, including linguistics, computer science, and statistics. For a discussion of the types of algorithms currently used in NLP, see the article on pattern recognition.

Source: http://en.wikipedia.org/wiki/Natural_language_processing
Here are five case studies that use scikit-learn specifically for text and document classification. Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
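To make the text-classification setting concrete, here is a minimal sketch of a scikit-learn pipeline that turns raw documents into TF-IDF features and fits a multinomial naive Bayes classifier. It is not taken from any of the case studies: the toy documents, the "sports"/"tech" labels, and the choice of classifier are assumptions made purely for illustration.

```python
# A minimal text-classification sketch with scikit-learn.
# The toy documents and labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: a few short documents with topic labels.
train_docs = [
    "the striker scored a late goal in the match",
    "the team won the football championship",
    "the goalkeeper saved a penalty kick",
    "the new processor doubles battery life",
    "the laptop ships with a faster graphics card",
    "developers released a software update for the phone",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

# Pipeline: raw text -> TF-IDF weighted bag-of-words -> multinomial naive Bayes.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_docs, train_labels)

# Classify unseen documents; the vectorizer fitted above is reused automatically.
new_docs = ["the match ended with a goal", "a faster processor for the laptop"]
predictions = classifier.predict(new_docs)
print(predictions.tolist())  # expected: ['sports', 'tech']
```

Because the pipeline bundles vectorization and classification, the vocabulary and IDF weights learned from the training text are applied unchanged to new documents. Swapping `MultinomialNB` for, say, `LinearSVC` from `sklearn.svm` would use one of the support vector machines mentioned above instead.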