NLTK is a Python-based natural language processing toolkit that is widely used
for text processing tasks. Some of these tasks are:
1. Sentence detection:
If you have a large, noisy corpus and need to extract the actual sentences from it, you can use sent_tokenize:
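from nltk.tokenize import sent_tokenize
# sent_tokenize needs the Punkt models: run nltk.download('punkt') once
# beforehand (newer NLTK releases may ask for 'punkt_tab' instead)
with open('input.txt', 'r') as f:
    text = f.read()
sentences = sent_tokenize(text)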
Next, to write the sentences to a separate output file, you can do:
with open('out.txt', 'w') as out:
    for line in sentences:
        # sentences carry no trailing newline, so add one to keep them apart
        out.write(line + '\n')
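The two steps above can also be wrapped into one small helper. This is only a rough sketch, not part of NLTK: the function name split_to_file and its tokenizer parameter are made up for illustration, and it assumes the input file is UTF-8. By default it uses NLTK's sent_tokenize, but any function that turns a string into a list of sentences can be passed in. It also collapses line breaks inside a sentence, since a sentence in a noisy corpus may be spread over several input lines.

```python
import re
from typing import Callable, List, Optional


def split_to_file(
    in_path: str,
    out_path: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> int:
    """Split the text in in_path into sentences and write one per line.

    Hypothetical helper for illustration; returns the number of
    sentences written.
    """
    if tokenizer is None:
        # Imported lazily so the helper can be tried with a stand-in tokenizer.
        from nltk.tokenize import sent_tokenize
        tokenizer = sent_tokenize

    # Assumption: the corpus is UTF-8 encoded.
    with open(in_path, "r", encoding="utf-8") as src:
        text = src.read()

    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for sentence in tokenizer(text):
            # A sentence may span several input lines; collapse the
            # whitespace so it occupies exactly one output line.
            cleaned = re.sub(r"\s+", " ", sentence).strip()
            if cleaned:
                out.write(cleaned + "\n")
                count += 1
    return count
```

Calling split_to_file('input.txt', 'out.txt') would then do the same job as the snippets above, with each sentence on its own line in out.txt.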