✅ LAB 1: Introduction to NLP Text Processing | MLT Sprint 5 - Case Study 1
🧠 Task 1: NLP - Python - Processing Raw Text | MLT Sprint 5 – LAB 1
Are you ready to dive into the fundamentals of Natural Language Processing (NLP) using Python? In this Sprint 5 case study, we’ll fetch raw text from the web, tokenize it, and compute basic statistics such as word counts and frequency distributions using Python’s popular NLTK library.
This is a perfect hands-on mini-project for anyone just starting out with Machine Learning and Text Analytics.
📌 Objective:
Build a function called processRawText that:
- Fetches and processes raw text from a URL.
- Tokenizes the text.
- Counts total and unique words.
- Computes word coverage and frequency distribution.
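The steps above can be sketched in plain Python as follows. This is a minimal, hypothetical sketch, not the lab's reference solution: to stay dependency-free it swaps NLTK's `word_tokenize` and `FreqDist` for a simple regex tokenizer and `collections.Counter`, and it assumes "word coverage" means total words divided by unique words (the average number of times each distinct word appears). The helper `analyze_text` and the keys of the returned dictionary are illustrative names, not part of the original task.

```python
import re
import urllib.request
from collections import Counter


def tokenize(text):
    # Regex stand-in for nltk.word_tokenize: lowercase, keep runs of letters/apostrophes
    return re.findall(r"[a-z']+", text.lower())


def analyze_text(text, top_n=10):
    # Hypothetical helper that computes the statistics on already-fetched text
    words = tokenize(text)
    unique_words = set(words)
    # Assumed definition of word coverage: average occurrences per distinct word
    coverage = len(words) / len(unique_words) if unique_words else 0.0
    freq = Counter(words)  # plays the role of nltk.FreqDist
    return {
        "total_words": len(words),
        "unique_words": len(unique_words),
        "word_coverage": coverage,
        "most_common": freq.most_common(top_n),
    }


def processRawText(url):
    # Fetch the raw text (assumed to be UTF-8) and pass it to the analysis step
    with urllib.request.urlopen(url) as response:
        raw = response.read().decode("utf-8", errors="replace")
    return analyze_text(raw)
```

For example, `analyze_text("the cat sat on the mat")` reports 6 total words, 5 unique words, a word coverage of 1.2, and `("the", 2)` as the most common word. Keeping the fetch separate from the analysis lets the statistics be checked without a network connection.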
LAB 2: Case Study 2 - NLP - Text Representation
Task 1: Use CountVectorizer to find the vocabulary of the given dataset and store it in the variable S1. Note: the output must be a DataFrame whose column name is ‘order’.
Task 2: Find the Bag of Words for the given dataset and store it in the variable S2. Note: the output must be a DataFrame whose column names are the feature names returned by get_feature_names (renamed get_feature_names_out in scikit-learn 1.0 and later).
Task 3: Find the Term Frequency (TF) with norm ‘l1’ and use_idf disabled for the given dataset and store it in the variable S3. Note: the output must be a DataFrame whose column names are the feature names.
Task 4: Find the Term Frequency (TF) with norm ‘l2’ and use_idf disabled for the given dataset and store it in the variable S4. Note: the output must be a DataFrame whose column names are the feature names.
Task 5: Find the TF*IDF (TF-IDF) values for the given dataset and store them in the variable S5. Note: the output must be a DataFrame whose column names are the feature names.
Task 6: Find the Inverse Document Frequency (IDF) values with smooth_idf set to False for the given dataset and store them in the variable S6.
LAB 3: MLT Sprint 5 - Case Study 3 - NLP Sentiment Analysis
Read each question, carry out the solution, and assign the answer to the variable named in the corresponding notebook cell.
Start by importing the packages and reading the dataset.
🧠 Use Case: Sentiment Analysis on Text Data with CSV Output
In this hands-on Natural Language Processing (NLP) task, we’ll walk through how to perform sentiment analysis on a dataset and export the result as a CSV file. The goal is to classify each text as either Positive or Negative and store that label, together with a corresponding numeric value, in a new file named sentiment.csv.
🎯 Objective:
Build an end-to-end text classification pipeline that:
- Reads the dataset from a CSV file.
- Cleans and preprocesses the textual content.
- Removes unnecessary noise like numbers and special characters.
- Filters out common stop words.
- Predicts sentiment (positive or negative) using a trained model or rule-based logic.
- Stores the sentiment label and its corresponding numeric value (1 for Positive, 0 for Negative) in a CSV file.
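The pipeline above can be sketched with a simple rule-based classifier using only the standard library. This is a minimal sketch under loudly stated assumptions: the tiny stop-word list, the positive and negative word lists, the `text` column name, and the rule that ties count as Negative are all hypothetical placeholders. The lab may instead expect NLTK's stop-word corpus and a trained model.

```python
import csv
import re

# Hypothetical, deliberately tiny word lists; swap in NLTK stopwords or a real lexicon
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "i", "and", "was", "to", "of"}
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def clean_text(text):
    # Lowercase, replace digits and special characters with spaces, then drop stop words
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [word for word in text.split() if word not in STOP_WORDS]


def predict_sentiment(tokens):
    # Rule-based score: +1 per positive word, -1 per negative word; ties fall to Negative
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    return ("Positive", 1) if score > 0 else ("Negative", 0)


def run_pipeline(in_path, out_path="sentiment.csv", text_col="text"):
    # Read the input CSV, label every row, and write the text, label, and numeric value
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([text_col, "sentiment", "value"])
        for row in rows:
            label, value = predict_sentiment(clean_text(row[text_col]))
            writer.writerow([row[text_col], label, value])
    return len(rows)
```

For example, `run_pipeline("reviews.csv")`, where reviews.csv is a hypothetical input file with a `text` column, would write sentiment.csv with one labelled row per input row. A trained classifier would replace `predict_sentiment` while the cleaning and CSV steps stay the same.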