Machine Learning for Text Classification
Machine learning can be used to classify texts into categories with a variety of classifiers. Common choices include support vector machines, Naive Bayes, linear models trained with stochastic gradient descent, and ensemble models. Deep neural networks are another option for text classification. These classifiers and networks let us assign a text to a category and even attribute a text to its likely author.
The general idea is to start from texts whose category or author is known and extract the count or frequency of the words in each text. This information is then used to build a classifier, which can classify future inputs.
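The count-then-classify idea can be sketched with a small multinomial Naive Bayes classifier written in plain Python. This is a minimal illustration, not a production implementation: the whitespace tokenizer, the Laplace smoothing, the function names, and the four training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace (an assumed, very simple tokenizer).
    return text.lower().split()

def train_naive_bayes(labeled_texts):
    """Collect word counts per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of texts
    vocabulary = set()
    for text, category in labeled_texts:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Start from the log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical toy training data.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train_naive_bayes(training_data)
prediction = classify("free prize money", model)
print(prediction)
```

Here the model is nothing more than the stored word counts, which is why Naive Bayes is quick to train and a common baseline for text classification.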
In the real world, text classification lets organizations sort text into categories automatically. For example, a social media company can use the text that users write on its platform to identify the types of commercial products they may want to buy. Text classification is also widely used for spam detection and for analyzing customer feedback, which saves time compared with checking each text by hand.
One of the most common classifiers for text classification is the support vector machine (SVM). It generally achieves high accuracy and is reliable in most cases. To use an SVM, we first compute TF-IDF (term frequency–inverse document frequency) features, which represent how often each word occurs in a text, weighted by how rare the word is across all texts. We then train a linear support vector machine on the TF-IDF vectors. Given the TF-IDF vector of a new text, the trained SVM outputs the predicted category.
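The TF-IDF plus linear SVM pipeline can be sketched in plain Python as follows. This is a rough illustration under stated assumptions: the smoothed IDF formula, the L2 normalization, the subgradient-descent training of the hinge loss, the hyperparameters, and the toy data are choices made for the example. In practice a library implementation would normally be used instead.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def fit_idf(documents):
    """Smoothed inverse document frequency for every word in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Sparse, L2-normalized TF-IDF vector; unseen words are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {w: c * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def score(weights, bias, vec):
    return sum(weights.get(w, 0.0) * v for w, v in vec.items()) + bias

def train_linear_svm(vectors, labels, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            margin = y * score(weights, bias, vec)
            # Shrink the weights (gradient of the regularization term).
            weights = {w: (1 - lr * reg) * val for w, val in weights.items()}
            if margin < 1:  # the example violates the margin: push it outward
                for w, v in vec.items():
                    weights[w] = weights.get(w, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias

# Hypothetical toy data: +1 marks spam, -1 marks ham.
texts = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
labels = [1, 1, -1, -1]

idf = fit_idf(texts)
weights, bias = train_linear_svm([tfidf_vector(t, idf) for t in texts], labels)

def predict(text):
    return "spam" if score(weights, bias, tfidf_vector(text, idf)) >= 0 else "ham"

print(predict("free prize now"), predict("team meeting monday"))
```

Because the model is linear, each learned weight can be read as how strongly a word pushes a text toward one category or the other.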
Apart from the above, a multi-layer perceptron (MLP) is also a good choice for text classification. It requires a word embedding of the input text, for example one produced by Word2vec. We then stack multiple layers, ending in a softmax or sigmoid output layer, and train the parameters of the neural network. After the training period, we can use the network to classify new texts.
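To make the structure concrete, here is a minimal sketch of the forward pass of such a network in plain Python. Everything specific is an assumption for illustration: the toy embeddings are random stand-ins for vectors that would normally come from a model such as Word2vec, the text is represented by averaging its word vectors, the hidden layer uses ReLU, and the weights are untrained, so the output probabilities carry no meaning yet. Training (for example, by backpropagation) is omitted.

```python
import math
import random

random.seed(0)  # make the illustrative random weights reproducible

EMBED_DIM, HIDDEN_DIM = 4, 3
CATEGORIES = ["spam", "ham"]

# Hypothetical toy embeddings standing in for pretrained word vectors.
VOCAB = ["free", "prize", "meeting", "monday"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}

def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Untrained parameters: one hidden layer and one output layer.
W1, b1 = random_matrix(HIDDEN_DIM, EMBED_DIM), [0.0] * HIDDEN_DIM
W2, b2 = random_matrix(len(CATEGORIES), HIDDEN_DIM), [0.0] * len(CATEGORIES)

def embed_text(text):
    """Average the embeddings of known words; zero vector if none are known."""
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def dense(matrix, bias, x):
    """Fully connected layer: matrix times x plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(matrix, bias)]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(text):
    x = embed_text(text)
    hidden = [max(0.0, h) for h in dense(W1, b1, x)]  # ReLU activation
    return softmax(dense(W2, b2, hidden))             # class probabilities

probabilities = forward("free prize")
print(dict(zip(CATEGORIES, probabilities)))
```

The softmax layer turns the final scores into probabilities that sum to one, so the predicted category is simply the one with the largest probability.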
Machine learning gives companies many useful applications of text classification, such as spam detection, interest categorization, and customer feedback analysis. These applications save the company money and help it operate more efficiently across more commercial areas. As machine learning for text classification develops, industry is gaining substantial benefits from it.
Peizhen Tong