Text Processing: Sentence & Word Tokenisation, Stemming and Lemmatisation

⚡️Hudson Ⓜ️endes
4 min read · Jan 17, 2023
"robot eating a lot of text in a fantasy world", generated by https://stablediffusionweb.com/

“Data are represented in ways natural to problems from which they were derived”[1]. Unsurprisingly, then, we don’t encode our love letters into a vector representation of our feelings and their meaning before writing them down.

Data Scientists must undergo a process named Text Processing in order to prepare the…
