S-STEM Scholar Bao Linh's Blog


Showing posts from November, 2021

#8 Sklearn - Python package - Model evaluation metrics for regression

November 24, 2021

One of the most important steps in building a prediction model is evaluating how accurate it is, using metrics that measure the deviation between predicted values and actual values. In this journal entry, we will dive into 4 metrics: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, and the R-squared score. All the functions we use come from the sklearn Python package (from sklearn import metrics). Mean Absolute Error (MAE) is the mean of the absolute values of the errors. In other words, we take the absolute difference between each actual value and its predicted value, add these differences up, then divide by the number of observations. It measures the average magnitude of the residuals in the dataset, which means all errors are weighted equally. For example, if the difference between the first pair, y₁ and ŷ₁, is 5 and the difference between the second pair, y₂ and ŷ₂, is 1, their average is 3, so our predictions are off by 3 on average. So, "MAE is bes...
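To make the formulas concrete, here is a minimal pure-Python sketch of the four metrics. The helper names `mae`, `mse`, `rmse`, and `r2` are hypothetical (they are not sklearn functions), and the sample values are made up so that the two errors are 5 and 1, as in the example above. For the same inputs, the results should agree with sklearn's `metrics.mean_absolute_error`, `metrics.mean_squared_error` (RMSE is its square root), and `metrics.r2_score`.

```python
import math

def mae(y_true, y_pred):
    # Mean Absolute Error: average of |actual - predicted|, every error weighted equally.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared errors, so large errors count more.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE, back in the target's units.
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    # R-squared: 1 - (residual sum of squares / total sum of squares around the mean).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical values: the errors are 10 - 5 = 5 and 2 - 1 = 1.
y_true = [10.0, 2.0]
y_pred = [5.0, 1.0]

print(mae(y_true, y_pred))   # → 3.0
print(mse(y_true, y_pred))   # → 13.0
print(rmse(y_true, y_pred))  # ≈ 3.606
print(r2(y_true, y_pred))    # → 0.1875
```

Note how MAE (3.0) and RMSE (about 3.6) differ even on the same errors: squaring makes the larger error of 5 dominate.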

#7 Sklearn - Python Package - Linear Regression (Part 2)

November 17, 2021

This week, I continue working on my Linear Regression project about predicting house prices in Washington State. Compared to the previous journal entry, this one is elaborated with more explanation and steps. The Linear Regression algorithm generally means we use a straight line to describe the relationship between 2 variables (a dependent and an independent variable, i.e., the target and a feature). In other words, we use this algorithm to predict the output (the dependent variable) based on the input (the independent variable). For the packages needed, we import LinearRegression from sklearn.linear_model (used for creating the linear regression model and making predictions), train_test_split from sklearn.model_selection (used for splitting our data into 4 subsets), metrics from sklearn (used for evaluating the accuracy of our model), and matplotlib.pyplot (used for visualizing data), and we run %matplotlib inline so graphs are displayed inline in the Jupyter notebook. I downloaded the dataset from Ka...
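The workflow described above can be sketched roughly as follows. This is not the original notebook: since the real dataset is not shown here, it uses hypothetical synthetic data (living area vs. price with a known linear trend plus noise), and the column meanings, `test_size=0.2`, and `random_state=42` are assumptions chosen for illustration. Plotting is left out so the script runs outside Jupyter.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: living area (sq ft) vs. price, linear trend plus noise.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
price = 50_000 + 200 * sqft + rng.normal(0, 20_000, size=200)

X = sqft.reshape(-1, 1)  # sklearn expects a 2-D feature matrix (rows x features)
y = price                # target (dependent variable)

# train_test_split returns the 4 subsets: train/test features and train/test targets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)     # learn slope and intercept from the training set
y_pred = model.predict(X_test)  # predict prices for the unseen test set

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("MAE:", metrics.mean_absolute_error(y_test, y_pred))
print("R^2:", metrics.r2_score(y_test, y_pred))
```

In a notebook, one could add `import matplotlib.pyplot as plt` and `plt.scatter(X_test, y_test)` followed by `plt.plot(X_test, y_pred)` to compare the fitted line against the test points.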

#6 Sklearn - Python package - Linear Regression

November 14, 2021

This week, I am learning an algorithm from the sklearn package (Python) called Linear Regression. Linear Regression "is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables)." In the pictures below, I was using Linear Regression to predict house prices in Washington State. I have also been exposed to new graphics packages, seaborn and matplotlib. I don't plan to dig too deeply into these, but I found them helpful for visualizing the data. To me, the best way to learn something is by practicing and working through a specific project. That's why I always try to find something related to what I want to learn and then start on it. It looks pretty messy now, but I will give an update next week after I clean up my code.
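For a single explanatory variable, the "linear approach" in that definition can be written out by hand with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. The sketch below uses a hypothetical helper `fit_simple_line` and made-up points; sklearn's `LinearRegression` fits the same ordinary least squares line, so its `coef_` and `intercept_` should match up to floating-point rounding.

```python
def fit_simple_line(x, y):
    # Ordinary least squares for one feature: y ≈ slope * x + intercept.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_simple_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # → 2.0 1.0
```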

#5 Social Media: Anonymity

November 04, 2021

Social media has facilitated unlimited connection between people all around the world, albeit at a price. One feature of social media that has recently gained notoriety is anonymity, which allows internet users to hide their real identities online. A heated debate has formed over the benefits and consequences of anonymity, and no consensus has been reached as to whether it should be preserved or banned. Its advocates believe this feature is an essential part of online privacy and free speech in an age of online surveillance and self-censorship (Murphy & O’Leary). The opposing side argues that anonymous accounts on social media have done nothing more than act as a breeding ground for fake news, spam, and trolls (Imbellino). A survey by Lee Rainie and his colleagues, featured in their article “Anonymity, Privacy, and Security Online,” shows that 59 percent of users believed they cannot be completely anonymous on...