S-STEM Scholar Bao Linh's Blog

Posts

Showing posts from October, 2021

#4 Project progress: Building a basic machine learning model successfully

October 28, 2021

In this journal, we will continue building our first basic machine learning model after importing and cleaning the data. Besides the pandas module, we will use two more modules, sklearn.feature_extraction.text and sklearn.metrics.pairwise, to import TfidfVectorizer and linear_kernel. TfidfVectorizer is a class from sklearn that converts strings (words) into a matrix of TF-IDF vectors. linear_kernel is a function from sklearn that computes the dot product between two vectors, which we use as the similarity score. The cosine_similarity function can do the same work, because TfidfVectorizer normalizes its vectors by default, but we use linear_kernel because it is faster to compute. The third new tool that we use in this project is the pandas Series. We use it to build an index for the dataframe; you can take a look at line 31. Line 36 displays the code for creating the system. We use the built-in enumerate function to pair each similarity score with its position, so that it can be matched back to a title, and we wrap it in list() because we want the result as a list. Then we use sorted meth...
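Here is a minimal sketch of the steps described above, not the exact code from the post. The three-movie dataframe, the column names title and overview, and the helper name recommend are all assumptions made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical toy data standing in for the cleaned movie dataframe
df = pd.DataFrame({
    "title": ["Space War", "Galaxy Battle", "Love Story"],
    "overview": [
        "a space pilot fights an alien fleet",
        "a pilot battles an alien fleet in space",
        "two people fall in love in paris",
    ],
})

# Convert the text into a matrix of TF-IDF vectors (one row per movie)
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(df["overview"])

# Dot product of every row with every other row; because the TF-IDF rows
# are L2-normalized by default, this equals the cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# A Series that maps each title to its row position in the dataframe
indices = pd.Series(df.index, index=df["title"])


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title (hypothetical helper)."""
    idx = indices[title]
    # Pair each similarity score with its row position
    scores = list(enumerate(cosine_sim[idx]))
    # Highest similarity first
    scores = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Drop the movie itself, then keep the best matches
    scores = [pair for pair in scores if pair[0] != idx][:top_n]
    return [df["title"].iloc[i] for i, _ in scores]


print(recommend("Space War", top_n=1))
```

With this toy data, the movie that shares the most words with "Space War" comes first. On the real dataset the same pattern applies, only with many more rows.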

#3 Project Progress: Import and Clean the Data in Python

October 21, 2021

I am so excited to share my first Python project related to Data Science with you. To complete this project, we first need to import the data. Jupyter Notebook is an interactive computing notebook environment that comes with Anaconda, a Python distribution for scientific computing. We will run our code in this notebook. We will use the read_csv() function in the pandas package to read the CSV file. My file here is Data.csv, which I downloaded from Kaggle ( TMDB 5000 Movie Dataset | Kaggle ). df.head() prints the first few rows of the data. df.shape (an attribute, so it is written without parentheses) gives the shape of the data; you can see that this data set has 8403 rows and 20 different features. After importing the data, we need to clean it so that we don't run into problems when we manipulate it to build a model. We have 20 features in total, but we won't use them all, so we need to decide which features are important and which are not. Because I want to build a content-based movie recommenda...
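A minimal sketch of the import and cleaning steps, under some assumptions: the inline three-row CSV is a made-up stand-in for Data.csv so the example runs on its own, and the choice of the title and overview columns and the use of fillna are illustrative guesses, not necessarily the features or cleaning steps used in the project.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for Data.csv, so the sketch runs without the file
csv_text = """title,genres,overview
Movie A,Action,A hero saves the city
Movie B,Romance,
Movie C,Comedy,Two friends open a cafe
"""

# With the real file this would be: df = pd.read_csv("Data.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the data (up to 5 by default)
print(df.shape)    # (rows, columns); an attribute, so no parentheses

# Keep only the columns assumed to matter for a content-based system
features = df[["title", "overview"]].copy()

# Replace missing text with an empty string so later text processing does not fail
features["overview"] = features["overview"].fillna("")
```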

#2 Project update - General information regarding tools and packages in Python

October 15, 2021

In this journal, I will go over some general information and the tools that I will use for this project. This is what I have researched and learned so far. To complete this project, we will go through 7 steps, which are also the typical steps for building a machine learning model in general. 1. Import the Data: Kaggle.com is a great source of datasets, where we can find many valuable ones. In this project, I will use the Movie List dataset. In this step, we will use the NumPy package in Python to create the array structures that hold the dataset (the dataframe type itself comes from pandas, which is built on top of NumPy). NumPy stands for Numerical Python; it is a general-purpose array-processing package in Python. We will use this package to work with arrays. I will attach the NumPy link here in case someone wants to learn more about it: https://www.w3schools.com/python/numpy ...

#1 Project Overview – Movie Recommendation System Utilizing Python

October 15, 2021

Using data as a tool to give effective suggestions for human activities in modern "4.0" life saves users a lot of time on similar searches. We are certainly no strangers to the recommendation algorithms on YouTube or Facebook, which always show us a list of suggested videos or posts/pages based on what we have already watched or liked. Setting aside personal privacy concerns, I think it is an intelligent tool, both for companies in marketing and for users in making choices more easily. My project is about building a content-based movie recommendation system using the Python language and SQL. In this project, I plan to use the Full MovieLens Dataset, which features 45,000 movies, for analysis and manipulation in order to build the system. A content-based recommendation system gives users movie suggestions based on what they have already watched. In other words, we will track users’ interest in genres (romantic/horror/humo...