Businesses often collect data with a location component, but we rarely use it during data analysis. Adding geodata to an investigation gives us a unique opportunity to gain an edge and deliver better results. This year, I worked with several datasets where location was crucial, for example, COVID-19 data and real-estate listings.

JavaScript Object Notation (JSON) is a way to represent values such as numbers, strings, arrays, and dictionaries in a text file. Usually, JSON files have a standardized format for different types of…
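As a minimal sketch (not taken from the original post), Python's standard `json` module converts between Python dictionaries and lists and JSON text. The record below is a made-up example.

```python
import json

# Hypothetical record: a dictionary holding a list, a boolean, and a nested dictionary
record = {"city": "Los Angeles", "zip_codes": [90001, 90002], "active": True, "meta": {"source": None}}

# Serialize to a JSON string (Python True becomes true, None becomes null)
text = json.dumps(record)
print(text)

# Parse the JSON string back into Python objects; the round trip is lossless here
parsed = json.loads(text)
print(parsed == record)
```

Note that JSON has no tuple type: a tuple is written as a JSON array and comes back as a Python list.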

The global pandemic brought many problems to every city. Food insecurity is one of them. The city of Los Angeles decided to work with data science enthusiasts and organized a hackathon to take a fresh look at the problem. The goal of this event was to help connect residents with on-the-ground resources and organizations that can help. I attended this event to learn more about our city and to bring my own view of the data. Unfortunately, for technical reasons, I couldn’t finish the work with my team in the allotted time, but we managed to do some investigation. …

Our world is full of information and data for research. For data scientists, every piece of knowledge can be helpful in their projects. We can download massive datasets from the internet or collect our own. Companies usually offer datasets in CSV format or provide access via an Application Programming Interface (API). But occasionally, we need data from webpages that offer no convenient download option. How can we collect it?

Web scraping is the technique of automatically extracting data from websites using software or a script.

Web scraping structures the data into a more convenient format for your work. How…
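A minimal sketch of the idea, assuming the page content has already been downloaded. It parses a hypothetical HTML table with Python's built-in `html.parser` (third-party libraries such as BeautifulSoup are a common alternative) and turns the rows into a list of dictionaries. The HTML snippet and the `TableScraper` class are illustrative only.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this would come from an HTTP request
HTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apples</td><td>1.20</td></tr>
  <tr><td>Bread</td><td>2.50</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._in_cell:
            self._row[-1] += data.strip()

parser = TableScraper()
parser.feed(HTML)

# Use the first row as column names and structure the rest as records
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Before scraping a real site, check its terms of use and `robots.txt`.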

Every day, data scientists create thousands of plots and tables to present their work. Sharing the results of an analysis sometimes becomes a big challenge. One way to deliver information to people is through dashboards. Most tutorials make it sound plug-and-play, but when you dive into the topic, you find out that you need to know a little more. For example, CSS handles styling and page layout. In this blog, I want to share my experience creating dashboards.

I use Python for work, so I decided to check out some libraries for creating dashboards. Very quickly, I found…

While studying data science, I came across many new algorithms and libraries useful for data analysis and prediction. All of them have their pros and cons, which is no surprise. But only one algorithm has a reputation as lazy and greedy from the very start. Several algorithms could fit this description, and we could debate which one is the laziest! But today, I will talk about the k-nearest neighbors (KNN) algorithm, where k is the number of nearest neighbors. Many blogs start with its weaknesses and forget about its advantages.

KNN is an effective…
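To make the "lazy" label concrete, here is a minimal from-scratch sketch of a KNN classifier using Euclidean distance and a majority vote. The toy points and the `knn_predict` name are hypothetical. In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would typically be used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Lazy" learning: there is no training step; all work happens at prediction time
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by Euclidean distance and keep the k closest neighbors
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two well-separated classes
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X, y, (1.8, 1.8), k=3))  # A
print(knn_predict(X, y, (6.2, 6.4), k=3))  # B
```

Because every prediction scans the whole training set, the cost grows with the data size. This is the main trade-off behind the "lazy" reputation.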

Every project brings a data scientist many questions and challenges. When we face a new dataset, we don’t know how to describe it clearly, which features correlate with each other, how many outliers there are, and more. We start to understand the dataset with a quick exploratory data analysis (EDA), where we need to summarize hundreds of rows and columns. For this step, a simple and powerful tool is a plus.
One tool that helps us save time and avoid frustration is the pivot table. It is useful for slicing, filtering, and…
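The core of a pivot table is grouping rows by one field, spreading another field across columns, and aggregating a value in each cell. The sketch below shows that idea with the standard library only. The sales records and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount)
rows = [
    ("West", "Apples", 10),
    ("West", "Bread", 5),
    ("East", "Apples", 7),
    ("West", "Apples", 3),
    ("East", "Bread", 8),
]

def pivot_sum(records, index, columns, values):
    """Sum `values` for every (index, columns) combination, like a pivot table cell."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[index]][rec[columns]] += rec[values]
    # Convert nested defaultdicts to plain dicts for readable output
    return {row: dict(cols) for row, cols in table.items()}

pivot = pivot_sum(rows, index=0, columns=1, values=2)
print(pivot)  # {'West': {'Apples': 13, 'Bread': 5}, 'East': {'Apples': 7, 'Bread': 8}}
```

With pandas, the same table would come from `pd.pivot_table(df, index="region", columns="product", values="amount", aggfunc="sum")` on a DataFrame with those column names.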

How do you choose the right answer to an SQL interview question?

In almost every data science interview, you can expect questions related to databases. Most workflows involve quick slicing and dicing of data, so companies ask candidates to write basic queries that are relevant to real-life settings.

  1. First, the interviewer wants to know how well you understand the main concepts of working with a given database. Do you know how to retrieve the necessary data?
  2. Then they check how well you understand the data. Do you know how different data types work?
  3. Finally, language features…
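As an illustration of the "slicing and dicing" kind of question, here is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical `orders` table. The query combines `GROUP BY`, `HAVING`, and `ORDER BY`, which often appear together in interview tasks. Exact syntax can vary slightly between database engines.

```python
import sqlite3

# In-memory database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 20.0), ("bob", 15.0), ("ann", 5.0), ("cid", 40.0)],
)

# Total spend per customer, keeping only customers above 20, largest first.
# WHERE filters rows before grouping; HAVING filters groups after aggregation.
query = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 20
ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('cid', 40.0), ('ann', 25.0)]
conn.close()
```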

Python is an interpreted, high-level programming language. Its open-source libraries offer a wide variety of tools, which make it useful for data scientists, data engineers, and data analysts. Python supports various databases such as SQLite, MySQL, Oracle, MongoDB, PostgreSQL, etc. Many enterprise businesses store their data in relational databases. Relational databases are intuitive for users and provide an efficient and powerful way to create, read, update, and delete data of all kinds.

So, combining Python and SQL gives us some advantages when working with data in projects. Usually, each…
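A minimal sketch of the create, read, update, and delete (CRUD) operations from Python, using the standard-library `sqlite3` module because it needs no server. The `listings` table and its values are hypothetical. Drivers for other databases follow a similar connect/cursor/execute pattern, although placeholder syntax can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to persist the data
cur = conn.cursor()

# Create: define a table and insert rows
cur.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, city TEXT, price INTEGER)")
cur.executemany(
    "INSERT INTO listings (city, price) VALUES (?, ?)",
    [("Los Angeles", 900000), ("Pasadena", 750000), ("Burbank", 680000)],
)
conn.commit()

# Read: the ? placeholder passes parameters safely and avoids SQL injection
cur.execute("SELECT city, price FROM listings WHERE price > ? ORDER BY id", (700000,))
expensive = cur.fetchall()

# Update
cur.execute("UPDATE listings SET price = price - 50000 WHERE city = ?", ("Pasadena",))

# Delete
cur.execute("DELETE FROM listings WHERE city = ?", ("Burbank",))
conn.commit()

remaining = cur.execute("SELECT city, price FROM listings ORDER BY id").fetchall()
print(expensive)  # [('Los Angeles', 900000), ('Pasadena', 750000)]
print(remaining)  # [('Los Angeles', 900000), ('Pasadena', 700000)]
conn.close()
```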

In interviews, I often hear different variants of questions about multicollinearity in linear regression. They can sound like: How would you tackle multicollinearity in multiple linear regression? How do you solve for multicollinearity? Why is multicollinearity a potential problem? I decided to combine these questions and find a precise answer.

Let’s start from the beginning. What is multiple linear regression? Multiple linear regression (MLR) or multiple regression is an extension of simple linear regression. We use it to estimate the relationship between two or more independent variables and one dependent variable. …
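One common way to detect multicollinearity (not necessarily the approach the full post takes) is the variance inflation factor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $x_j$ on all the other predictors. The sketch below computes it with NumPy on synthetic data in which `x2` is almost a copy of `x1`. The `vif` helper is hypothetical. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X must not contain an intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors

# Synthetic data: x2 is nearly collinear with x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
factors = vif(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in factors])  # large values for x1 and x2, close to 1 for x3
```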

How often do you see maps in real life? How often do you work with maps as a data scientist? Sometimes we don’t use a dataset’s location information for visualization at all. But a map can bring an understanding of the situation to your research. People grasp the results of an investigation better when they look at an image.

Examples of data where geo maps can be used:

  • political datasets, where you can show voting results by state or city;
  • epidemiological visualizations for the public, where you can explain how a virus spreads;
  • datasets for…

