Commit 36882c53 authored by Erxleben, Fredo

Improve the exercise for loading the data

parent cdd31481
Pipeline #148693 passed with stage
in 28 seconds
# Episode 30 - Pandas
# Pandas Exercises
This is a stand-alone, open-ended exercise to practise working with _pandas_ on a real-life data set.
> It is highly recommended to have the _pandas_ documentation open for this exercise; it will be needed a lot.
> Not all required functions will be listed in the exercise document, finding out what to use is part of the intended training.
Please take your time to read the instructions and hints carefully to avoid getting stuck on minor details.
## Getting the Data
The [NOAA][noaa] provides open weather data from stations around the world.
* Here is a [list of all stations][stations] and their respective codes
* This list in itself already makes for an interesting data set to explore
* This is the [weather data archive][archive].
......@@ -23,13 +25,13 @@ The [NOAA][noaa] provides open weather data for stations around the world.
## Loading the Data
* For loading, the `pandas.read_csv()`-function can be used.
* [`read_csv()` documentation][pandas-read-csv-doc]
* The downloaded data is compressed in a `gz`-archive. You _could_ decompress it before working with it (especially useful if you want to inspect the data beforehand with a plain text editor or other tools), but the `read_csv()`-function itself can handle such an archive just fine
* Note that for these data sets the separator between the data fields is not a comma, but a run of whitespace. You can use the [regular expression][regex] `r"\s+"` to express this in Python.
* Note the parameter `parse_dates` which can come in extremely handy
* Note that the data set as provided has **no header**
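The hints above can be tried out on a small stand-in file before touching the real archive. The following is a minimal sketch only: the sample rows, the file name `sample.gz` and the six-column layout are made up for illustration and are not the actual NOAA format. It relies on `read_csv()` inferring the compression from the `.gz` file extension.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows (NOT the real NOAA layout):
# year, month, day, hour and two measurements, aligned with spaces.
sample = (
    "2020 01 01 00   -12   1013\n"
    "2020 01 01 01   -15   1012\n"
    "2020 01 01 02 -9999   1011\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small gz-archive to mimic a downloaded file.
    archive_path = Path(tmp_dir) / "sample.gz"
    with gzip.open(archive_path, "wt") as handle:
        handle.write(sample)

    # sep=r"\s+" treats any run of whitespace as one separator,
    # header=None tells pandas that the first row is already data.
    df = pd.read_csv(archive_path, sep=r"\s+", header=None)

print(df)
```

Without a header, the columns are simply labelled with integers starting at 0. The `parse_dates` parameter is deliberately left out here; how to use it depends on the real column layout.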
### Tasks
* Consider **first** what the loaded data should look like
* Load the data set
* Display the loaded data, compare it with your expectations and do a plausibility check
* Assign a proper header based on the information from the [data documentation][documentation]
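One possible way to approach the last two tasks is sketched below on a tiny hand-made frame. The column names and values are hypothetical placeholders; the real names have to be taken from the data documentation.

```python
import pandas as pd

# Stand-in for a freshly loaded, header-less data set
# (made-up values, columns labelled 0..5 by default).
df = pd.DataFrame(
    [
        [2020, 1, 1, 0, -12, 1013],
        [2020, 1, 1, 1, -15, 1012],
        [2020, 1, 1, 2, -9999, 1011],
    ]
)

# Hypothetical header; replace with the names from the data documentation.
df.columns = ["year", "month", "day", "hour", "temperature", "pressure"]

# Simple plausibility checks: column types and value ranges.
print(df.dtypes)
print(df.describe())

# pd.to_datetime can assemble timestamps from columns named
# year, month, day (and optionally hour).
df["timestamp"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
print(df.head())
```

In a summary like the one from `describe()`, an extreme minimum such as the made-up `-9999` stands out immediately, which is the kind of observation a plausibility check is meant to surface.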