
Parsing in Python + Visualisation

Published on June 07, 2020 in IT & Programming

About this project


Step 1. Select a dataset that you would be interested in exploring.
For parsing a dataset from any source of your choice – 20 points
For collecting a dataset from different sources (several ready-made datasets taken from different sources and united by a common logic) – 10 points
For just a ready-made dataset – 5 points
Don't forget to insert links to everything you use and describe your actions in detail.
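As an illustration of what parsing can look like, here is a minimal sketch that extracts the rows of an HTML table using only Python's standard library (`html.parser`). The page fragment, the `title`/`price` columns and the `TableParser`/`parse_table` helpers are hypothetical; for a real site you would first download the HTML (for example with `urllib.request`), and the third-party libraries requests and BeautifulSoup are a common, more convenient alternative.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collects the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read
        self._in_cell = False  # are we inside a <td>/<th>?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside cells (e.g. whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_table(html):
    """Return the table body as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a downloaded page.
html = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Flat A</td><td>100</td></tr>
  <tr><td>Flat B</td><td>120</td></tr>
</table>
"""
records = parse_table(html)
print(records)
```

The parsed values are still strings; converting them to numbers belongs to the pre-processing step.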

A few ideas for where to get datasets:
https://www.kaggle.com/datasets
https://datasetsearch.research.google.com/
https://archive.ics.uci.edu/ml/datasets.php (though its datasets are not the most interesting or recent)
https://data.mos.ru/ (data from the Moscow government; these datasets are used mostly for urban studies, but you can take a look)
https://registry.opendata.aws/ (it is hard to extract data from there and hard to find easy-to-study datasets, but if you want to do it anyway, we'll add another 7 points for downloading from there, i.e. 17 for step 1)
These are just ideas! You can use any source, but be sure to specify it and describe your actions.
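When a dataset is collected from several sources, one simple way to keep the sources documented is to tag each row with its origin before combining the parts. The sketch below assumes pandas and two small hypothetical tables with identical columns; the `source` column name is an illustrative choice, not a requirement of the task.

```python
import pandas as pd

# Two hypothetical ready-made datasets with the same logical content
# (e.g. flat listings) downloaded from different sources.
kaggle_part = pd.DataFrame({"city": ["Moscow", "Kazan"], "price": [100, 80]})
portal_part = pd.DataFrame({"city": ["Moscow"], "price": [110]})

# Tag every row with where it came from, so the origin stays traceable.
frames = {
    "https://www.kaggle.com/datasets": kaggle_part,
    "https://data.mos.ru/": portal_part,
}
combined = pd.concat(
    [df.assign(source=url) for url, df in frames.items()],
    ignore_index=True,  # renumber rows 0..n-1 in the combined table
)
print(combined)
```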


Step 2. Pre-processing.
Analyze the columns.
What do they contain? Are there missing values, and where?
Handle the missing values. If most of a column's values are missing, you can delete the column. For the remaining gaps, decide whether to delete them or fill them in with the median, the mean or something else, and explain why you chose that option.
For this step: +5 points.
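A minimal pandas sketch of this step, assuming a small hypothetical table: it measures the share of missing values per column, drops columns where more than half of the values are missing, and fills the remaining numeric gaps with the median. The 50% threshold and the choice of the median are example decisions that you would still need to justify for your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with gaps (NaN marks a missing value).
df = pd.DataFrame({
    "price": [100.0, np.nan, 120.0, 90.0],
    "rooms": [2, 3, np.nan, 1],
    "comment": [np.nan, np.nan, np.nan, "new"],
})

# 1. Where are the gaps? Share of missing values per column.
missing_share = df.isna().mean()
print(missing_share)

# 2. Drop columns where most of the values are missing (more than 50%).
df = df.loc[:, missing_share <= 0.5].copy()

# 3. Fill the remaining numeric gaps with the column median,
#    which is less sensitive to outliers than the mean.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(df)
```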

Step 3. Creating new attributes.
Sometimes you can derive new attributes from existing ones.
For example, from a date column you can get: whether the day is a working day, whether the month is the last month of a quarter, whether the day is a national holiday, and so on. Create attributes that fit logically into your dataset. The maximum score is 15 points, depending on how many features you create and how well they fit logically into the study.
The score is set by the teaching assistant; you can consult them on how to do better.
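For the date example above, the attributes could be derived with pandas roughly as follows. The dates and the `HOLIDAYS` list are hypothetical placeholders (a real project needs an official holiday calendar for the relevant country), and "end of the quarter" is interpreted here as the month being March, June, September or December; pandas also offers `Series.dt.is_quarter_end`, which instead flags only the last day of a quarter.

```python
import pandas as pd

# Hypothetical sample of dates.
df = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-03-31", "2020-06-06"])})

# Hypothetical holiday list: replace with an official calendar for your country.
HOLIDAYS = pd.to_datetime(["2020-01-01", "2020-06-12"])

# Is the date a national holiday (according to the list above)?
df["is_holiday"] = df["date"].isin(HOLIDAYS)
# Working day: Monday..Friday (dayofweek 0..4) and not a holiday.
df["is_working_day"] = (df["date"].dt.dayofweek < 5) & ~df["is_holiday"]
# Does the date fall in the last month of a quarter?
df["is_quarter_end_month"] = df["date"].dt.month.isin([3, 6, 9, 12])
print(df)
```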

Step 4. Visualization
I strongly recommend that you revisit workshop 6. Your task is to build beautiful and interesting visualizations from which you can put forward hypotheses. The maximum score is 20 points, depending on the number of graphs, the complexity of their construction, and the interesting ideas they reveal.
In addition to the graphs built during the workshop, you can look for ideas in the matplotlib gallery. You can also consult the teaching assistants.
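A minimal matplotlib sketch of one such graph, assuming hypothetical aggregated data (`district`, `avg_price`): a labelled bar chart saved to a PNG file. The `Agg` backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: average price per district.
df = pd.DataFrame({
    "district": ["North", "South", "East", "West"],
    "avg_price": [120, 95, 110, 101],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df["district"], df["avg_price"], color="steelblue")  # one bar per district
ax.set_title("Average price by district")
ax.set_xlabel("District")
ax.set_ylabel("Average price")
fig.tight_layout()
fig.savefig("avg_price_by_district.png", dpi=150)
```

A chart like this already suggests a hypothesis (for example, that location affects price), which can then be checked in the next step.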

Step 5. Hypotheses
Put forward hypotheses based on the collected data and test them. You can use table queries and groupby.
The maximum score for this task is 20 points, depending on the number and quality of the tested hypotheses. The more interesting findings you make, the higher your score for the task.
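As an illustration, a hypothesis such as "prices are higher on days off than on working days" can be checked with `groupby` and a table query in pandas. The data and the `is_working_day` column (which could come from a Step 3 attribute) are hypothetical.

```python
import pandas as pd

# Hypothetical data; is_working_day could be an attribute created in Step 3.
df = pd.DataFrame({
    "price": [100, 110, 90, 130, 125, 85],
    "is_working_day": [True, True, True, False, False, False],
})

# Hypothesis: prices are higher on days off than on working days.
summary = df.groupby("is_working_day")["price"].agg(["mean", "median", "count"])
print(summary)

# The same comparison expressed as table queries.
weekend_mean = df.query("is_working_day == False")["price"].mean()
workday_mean = df.query("is_working_day == True")["price"].mean()
print(weekend_mean > workday_mean)
```

Comparing group means only suggests an answer; for a more rigorous check you could add a statistical test, for example `scipy.stats.ttest_ind` on the two groups.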

Project overview

parsing, python, visualization, hypothesis

Category: IT & Programming
Subcategory: Other
Project size: Small
Is this a project or a position? Project
I currently have: An idea
Required availability: As needed

Delivery term: June 10, 2020
