PCDE Course: Module 3 Content

Hello World!

Knowledge Check 3.1: Pandas Series

Knowledge Check 3.2: Reading and Writing in Pandas

Knowledge Check 3.3: Pandas Dataframes

Self-Study Drag & Drop Activity 3.1: Series vs. Dataframes

Series and (dataframes) are both fundamental pandas (data structures) that are designed to store data.

Both pandas series and dataframes are (indexed), meaning that each element can be accessed by specifying an index.

Naturally, in order to access a specific element, the index associated with each element must be (unique).

The main difference between pandas series and dataframes is that the former stores data in (one dimension), whereas the latter stores data in (two dimensions).

Because series represent data one-dimensionally, they can be visualized as indexed arrays with one (column) and as many elements as there are recorded observations.

It follows that dataframes are just an extension of multiple series side by side.

Therefore, pandas dataframes can be viewed as (tables), where each row corresponds to an observation of the data collected in the dataframe, and each column represents a (label) for each measurement or record taken.

It is important to keep in mind that, although the data across multiple columns can be of (different) data types, all the values in a single column must store data of the (same) type.
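The points above can be sketched in a few lines of pandas. This is a minimal illustration rather than part of the course material; the sample data and the names `temps` and `weather` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

# A dataframe assembled from two series that share the same index.
weather = pd.DataFrame({
    "temp_c": temps,
    "sky": pd.Series(["sunny", "cloudy", "rain"], index=temps.index),
})

print(temps.ndim, weather.ndim)   # dimensions: 1 for the series, 2 for the dataframe
print(temps["tue"])               # access a series element by its index label
print(weather.loc["tue", "sky"])  # access a dataframe cell by row and column label
print(weather.dtypes)             # each column reports a single data type
```

Each column of `weather` is itself a series, which is why a dataframe can be read as several series placed side by side.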

Knowledge Check 3.5: Time and Date Functionality in Pandas

Discussion 3.1: Applications of Time and Date Functionality in Pandas

Prompt

As you become an expert Python programmer, you will often come across data that is in a date or time format.

The Python datetime module contains a wide range of functions to manipulate dates and times in your program. Therefore, it’s important that you become familiar with this module and its functions.

First, review this complete list of Python datetime built-in functions.

Next, select three built-in Python functions from the datetime module that you have not yet familiarized yourself with in this module’s videos. For each, provide a summary of its configurations and applications. Be sure to include an example of a practical use for each function that you choose.

This initial post should be between 75 and 100 words.

Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.

Suggested Time: 45 minutes

Suggested Length: 75-100 words

This is a required activity and will count toward course completion.

Response

datetime.datetime.now(tz=None)

This function returns the current date and time as a datetime object. If tz is None (the default), the result is a naive datetime in the operating system's local time. If tz is given, it must be a tzinfo object (for example, timezone.utc) rather than a string, and the result is an aware datetime in that timezone. A datetime object can represent any moment in time; here it is the moment the function executes, with up to microsecond resolution. Immediate timestamps are useful in many situations, including operating system logging, measuring how long a Python function takes to run, and recording exact times in a transaction ledger.

from datetime import datetime

timestamp1 = datetime.now()
print(timestamp1)

which outputs

2022-08-01 18:41:17.134578
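As a hedged sketch of the tz parameter and of the timing use case mentioned above, the following compares a naive and an aware timestamp and then measures a small piece of work. The workload (summing a range) is an arbitrary stand-in chosen only for illustration.

```python
from datetime import datetime, timezone

local_now = datetime.now()                # naive: no timezone information attached
utc_now = datetime.now(tz=timezone.utc)   # aware: carries the UTC tzinfo object

print(local_now.tzinfo)  # None
print(utc_now.tzinfo)    # UTC

# Time a piece of work by subtracting two timestamps.
start = datetime.now()
total = sum(range(1_000_000))             # arbitrary stand-in workload
elapsed = datetime.now() - start          # the difference is a timedelta
print(f"summing took {elapsed.total_seconds():.6f} seconds")
```

For precise benchmarking, `time.perf_counter()` is usually a better fit than wall-clock timestamps, but the subtraction above is enough for coarse measurements.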

datetime.timedelta

The timedelta class represents an interval of time. Its constructor, timedelta(), lets you specify a difference in time using several units: weeks, days, hours, minutes, seconds, milliseconds, and microseconds. All of these parameters default to 0. timedelta objects support most of Python's arithmetic and comparison operators: they can be added to or subtracted from datetime objects and from each other, multiplied or divided by numbers, and compared. This is useful whenever you need to shift a timestamp or compare timestamps. Say you're creating a timed event. You'd take a datetime from the now() function described above, create a timedelta with whatever timer offset you need, and then keep checking whether the current timestamp is still less than the initial timestamp plus the timedelta.

from datetime import datetime, timedelta

timer_start = datetime.now()
print(timer_start)
# Offsets relative to the start: one in the future, one in the past.
timer_period_future = timedelta(days=1, hours=12, minutes=45)
timer_period_past = timedelta(seconds=-1)
print(timer_period_future)
print(timer_period_past)
# Compare the current time against each deadline (start + offset).
for period in (timer_period_future, timer_period_past):
    if datetime.now() < timer_start + period:
        print('Timer has not reached the timedelta yet')
    else:
        print('Timer has reached timedelta')

which outputs

2022-08-01 18:41:17.134578
1 day, 12:45:00
-1 day, 23:59:59
Timer has not reached the timedelta yet
Timer has reached timedelta
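The arithmetic that timedelta supports can be shown deterministically by using a fixed start time instead of now(). This is a small sketch; the start time and the 90-minute period are made-up values.

```python
from datetime import datetime, timedelta

start = datetime(2022, 8, 1, 18, 0, 0)   # fixed, made-up start time
period = timedelta(hours=1, minutes=30)

deadline = start + period                # datetime + timedelta gives a datetime
print(deadline)                          # 2022-08-01 19:30:00
print(deadline - start == period)        # True: datetime - datetime gives a timedelta
print(period * 2)                        # 3:00:00
print(period / timedelta(minutes=15))    # 6.0: dividing two timedeltas gives a float
print(period.total_seconds())            # 5400.0
print(timedelta(minutes=90) == period)   # True: values are normalized before comparing
```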

datetime.strptime()

This parses a string containing date and time information into a datetime object, provided the string matches the given format string. It takes the datetime string as its first positional argument and the format string as its second. There are too many format codes to summarize in this discussion, but the full list is in the Python documentation for strftime() and strptime(). This is useful whenever incoming data uses a custom time format. It is especially useful when ingesting data into data structures such as pandas dataframes, since computations on the values work properly only if Python can treat them as datetime objects. For example, suppose we wanted to parse a short version of the ISO 8601 format from a string, covering years down to seconds but no unit smaller than a second. You'd do something like this.

from datetime import datetime

isotimestamp = datetime.strptime('2022-03-29T19:59:01', "%Y-%m-%dT%H:%M:%S")
print(isotimestamp)

which gives:

2022-03-29 19:59:01
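To connect this with ingesting data into pandas, here is a hedged sketch that parses the same format with strptime() and with pandas' to_datetime(). The two input strings are made-up sample values.

```python
from datetime import datetime
import pandas as pd

fmt = "%Y-%m-%dT%H:%M:%S"
raw = ["2022-03-29T19:59:01", "2022-03-30T08:15:00"]  # made-up sample strings

# Parse each string with strptime, then do arithmetic on the results.
parsed = [datetime.strptime(s, fmt) for s in raw]
print(parsed[1] - parsed[0])                    # 12:15:59

# strftime goes the other way: datetime object back to a formatted string.
print(parsed[0].strftime("%d/%m/%Y %H:%M"))     # 29/03/2022 19:59

# pandas parses a whole series at once with the same format codes.
stamps = pd.to_datetime(pd.Series(raw), format=fmt)
print(stamps.dt.hour.tolist())                  # [19, 8]
```

Once the column holds datetime values, the `.dt` accessor exposes components such as the hour without any further string handling.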

Knowledge Check 3.6: Designing Dataframes and Indexing in Pandas

Self-Study Drag & Drop Activity 3.3: Different Ways to Modify a Dataframe

Sometimes, when data is presented to you in different files, it may be convenient to (combine) your dataframes for easier analysis.

There are two techniques that one can use to combine dataframes: the (union) and the join.

The union is performed whenever you want to (append) the (rows) of one dataframe to another. Naturally, in order for this technique to work, you must ensure that the dataframes contain (exactly) the same columns.

In pandas, the union can be performed by using the (concat()) function.

On the other hand, the operation of combining columns in different dataframes that contain common values is called the (join).

There are four different types of joins: inner, outer, left, and (right).

All of the joins work by combining two dataframes based on a (join key).

In the case of the inner join, the resulting dataframe will contain only the rows that have (matching) values in both of the original dataframes.

Conversely, when performing an outer join, the resulting dataframe will contain (all) the rows from the original dataframes and (NaNs) where data is missing in one of the dataframes.
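The union and the joins described above can be sketched as follows. This is a minimal example under stated assumptions: all four dataframes and their column names are invented for illustration, and merge() is used as one common way to express a join in pandas.

```python
import pandas as pd

# Made-up sample data with identical columns, suitable for a union.
jan = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
feb = pd.DataFrame({"id": [3, 4], "sales": [90, 120]})

# Union: stack the rows of one dataframe under the other.
union = pd.concat([jan, feb], ignore_index=True)
print(union.shape)                 # (4, 2)

# Made-up sample data sharing the join key "id".
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"id": [2, 3, 4], "total": [20.0, 35.5, 12.0]})

inner = customers.merge(orders, on="id", how="inner")  # keys present in both
outer = customers.merge(orders, on="id", how="outer")  # keys present in either
left = customers.merge(orders, on="id", how="left")    # every key from customers

print(sorted(inner["id"]))         # [2, 3]
print(sorted(outer["id"]))         # [1, 2, 3, 4]
print(sorted(left["id"]))          # [1, 2, 3]
print(int(outer.isna().sum().sum()))  # 2 missing cells filled with NaN
```

In the outer join, id 1 has no order and id 4 has no customer, so each of those rows carries one NaN.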

Knowledge Check 3.7: Data Analysis & Time Zones in Pandas
