Throughout this course we will look at many types of data structures.
We have already seen a handful of structures in CC 310. In that course, we looked at how those structures worked, including how to access their elements and determine their size. We also looked at how to implement our own versions of those structures. This course will follow a similar pattern as we work through new data structures.
We begin the course with an often overlooked but important data structure: strings. Strings are a natural choice to start with because so much data is text-based: social media posts, product reviews, article abstracts, and much more.
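To connect strings to the operations we used on earlier structures, here is a minimal Python sketch. It treats a string as a sequence of characters and shows accessing elements, determining size, and slicing. The review text is a made-up sample for illustration only.

```python
# Hypothetical sample data: a short product review.
review = "Great product, would buy again!"

# Accessing elements: index into the string like a list.
first_char = review[0]    # 'G'
last_char = review[-1]    # '!'

# Determining size: len() counts the characters.
size = len(review)

# Slicing returns a new string containing part of the original.
first_word = review[:5]   # 'Great'

# Python strings are immutable, so assigning to an index fails.
try:
    review[0] = "g"
    immutable = False
except TypeError:
    immutable = True

# "Changing" a string means building a new one instead.
lowered = review.lower()

# Splitting on whitespace is a common first step in text analysis.
words = review.split()

print(first_char, last_char, size, first_word)
print(words)
```

Immutability is worth keeping in mind: because every modification produces a new string, repeatedly editing large amounts of text can be costly, which is one reason efficient string handling matters.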
The field of data science is growing rapidly, and many of its applications benefit from text analysis. These tasks include sentiment analysis, recommendation systems, and categorization and classification, to name just a few.
As a real-world example, TensorFlow is an open-source machine learning library, most commonly used through its Python API, that supports natural language processing among many other tasks. Many researchers in industry and academia use TensorFlow and its pre-trained models for tasks like sentiment analysis and categorization. The TensorFlow developers have built a visualizer, the Embedding Projector at projector.tensorflow.org, which is quite interesting to explore. It displays word representations learned by machine learning models, so words the model considers related appear close together. If we look up ‘cat’, we see some words that are intuitively related: mouse, dog, tiger, animal. Others are less obvious at first: blue (as in blue catfish) and like (as in cat-like reflexes).
While these tasks have very different goals, they share a common requirement: an efficient way to work with text-based data. In the second section of the module, we will discuss how to handle strings and some of the complications they present.