
Posts

Showing posts from June, 2024

Let's Brew the Soup aka BeautifulSoup - A web scraping journey

Python is a versatile programming language that has gained widespread popularity for its diverse applications, including web scraping and data extraction. It was during the COVID-19 pandemic in 2020 that I developed a keen interest in Robotic Process Automation (RPA). Intrigued by its potential, I considered buying a license for UiPath, a leading RPA platform. However, due to financial constraints, I was unable to proceed with the purchase. At first I felt disheartened, because web scraping through traditional means did not seem as efficient as the automated bots offered by RPA solutions. These bots not only have robust web scraping capabilities but also offer extras such as email notifications and seamless export to formats like CSV and XLS. It was then that I discovered the BeautifulSoup module in Python, a powerful tool designed specifically for web scraping. The installation process for this module is straightforward, making...
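To give a flavour of what BeautifulSoup does, here is a minimal sketch that parses a small, hypothetical HTML table and writes the scraped rows to CSV with Python's built-in csv module (the kind of export the RPA bots offered). The markup, the `prices` table id and the column names are made up for illustration; in a real script the HTML would first be downloaded from a page, which is left out here so the example runs offline.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical page content; in practice this HTML would be downloaded first.
html = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Tea</td><td>2.50</td></tr>
  <tr><td>Coffee</td><td>3.10</td></tr>
</table>
"""

# "html.parser" is Python's built-in parser, so no extra parser install is needed.
soup = BeautifulSoup(html, "html.parser")

rows = []
# Find the table by its id and skip the first <tr>, which holds the headers.
for tr in soup.find("table", id="prices").find_all("tr")[1:]:
    name, price = (td.get_text(strip=True) for td in tr.find_all("td"))
    rows.append({"item": name, "price": float(price)})

# Write the scraped rows as CSV (to an in-memory buffer here; use open(...) for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["item", "price"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Swapping `io.StringIO()` for `open("prices.csv", "w", newline="")` would write the same rows to a file on disk.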

Variance and Bias!

Ever wondered what bias and variance are and how they affect our Machine Learning models? I was looking for a basic definition of bias and variance in ML terms, stumbled upon a site with a beautiful explanation, and this is what I learned. Bias : Bias is the error that comes from the model's simplifying assumptions, that is, how far its average prediction is from the true value. Say I have trained a model and, while testing it, I measure its accuracy by comparing its predictions with the actual values in the test data. If it predicts with an accuracy of 96%, the remaining 4% is the total error: bias contributes to it, but so do variance and irreducible noise, so it is not the bias alone. Lowering bias usually means making the model more flexible, which tends to increase variance; this is the bias-variance trade-off. Variance : Variance is how much the model's predictions spread around their own average when the model is trained on different datasets. It shows how sensitive the model is to the particular data it was trained on, and therefore how it behaves on data apart from the training values. Now, we need to remember that low bias and high variance can result in overfitting of the model, while the ...
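The expected squared test error of a model splits into bias² + variance + irreducible noise. The sketch below estimates the first two terms by simulation, under assumptions chosen only for illustration: a hypothetical "true" function sin(2πx), Gaussian noise, and NumPy polynomial fits of degree 1 (a simple, rigid model) and degree 7 (a flexible one). Each model is retrained on many fresh training sets; bias² is the squared gap between the average prediction and the truth, and variance is how much the individual predictions scatter around that average.

```python
import numpy as np


def true_f(x):
    # Hypothetical "true" relationship the models try to learn.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=30, n_repeats=200, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of `degree`."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0, 1, 50)
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Draw a fresh noisy training set and refit the model on it.
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    # Bias^2: how far the average prediction is from the truth.
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    # Variance: how much predictions vary across training sets.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


for d in (1, 7):
    b, v = bias_variance(d)
    print(f"degree={d}: bias^2={b:.4f}, variance={v:.4f}")
```

The straight line cannot bend to follow the sine curve, so it shows high bias and low variance; the degree-7 polynomial follows the curve closely (low bias) but changes more from one training set to the next (higher variance), which is the overfitting side of the trade-off.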

Astype vs pd.to_datetime

Ever wondered why date-time conversion in Python offers two different methods that seem to do the same job, and how each one actually works? Both astype and pd.to_datetime can convert a column from a string (object) dtype to a datetime dtype. Once converted, we can extract parts of the date such as the day, the week, or the day of the week through the .dt accessor, for example .dt.dayofweek.

astype
Purpose: General type conversion.
Usage: Converts a pandas object (such as a DataFrame column) to a specified dtype.
Example: If you have a column of strings representing dates and you want to convert them to datetime objects, you can use astype:
df['date_column'] = df['date_column'].astype('datetime64[ns]')

pd.to_datetime
Purpose: Specialized function for parsing date and time strings into datetime objects.
Usage: Converts argument to datetime, optionally with more control over ...
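A minimal side-by-side sketch, assuming a small made-up column of ISO-formatted date strings (the `messy` Series is also hypothetical). On clean data both approaches give the same dates; the difference shows up with messy data, where pd.to_datetime can take an explicit `format` and `errors="coerce"` to turn unparseable entries into NaT, while astype simply raises.

```python
import pandas as pd

df = pd.DataFrame({"date_column": ["2024-06-03", "2024-06-04", "2024-06-09"]})

# astype: general-purpose dtype conversion. Works here because the strings
# are already in an unambiguous ISO format.
via_astype = df["date_column"].astype("datetime64[ns]")

# pd.to_datetime: a dedicated parser; `format` states the layout explicitly.
via_to_datetime = pd.to_datetime(df["date_column"], format="%Y-%m-%d")

same = bool((via_astype == via_to_datetime).all())
print(same)  # True

# Once the column is datetime, the .dt accessor exposes date parts
# (dayofweek: Monday=0 ... Sunday=6).
print(via_to_datetime.dt.dayofweek.tolist())  # [0, 1, 6]

# Messy data: one entry cannot be parsed as a date.
messy = pd.Series(["2024-06-03", "not a date"])
try:
    messy.astype("datetime64[ns]")
    astype_failed = False
except ValueError:
    # astype cannot turn individual bad values into NaT, so it raises.
    astype_failed = True

# errors="coerce" replaces unparseable entries with NaT instead of raising.
coerced = pd.to_datetime(messy, format="%Y-%m-%d", errors="coerce")
print(astype_failed, coerced.isna().tolist())  # True [False, True]
```

In short, astype is fine for clean, standard strings, while pd.to_datetime is the safer choice when the format varies or some values may be invalid.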