removed some columns in 0.5.1 #152

koftezz · 2024-12-28T13:52:17Z

Hey, I was going to upgrade to a newer version since I was using 0.4. However, the get_df method no longer returns the words, weekday, and hour columns. Why were they removed?

0.5.1
100%|█████████████████████████████████| 8340/8340 [00:00<00:00, 14199.79it/s]
28.12.2024 16:49:47 INFO Finished parsing raw messages.
Index(['timestamp', 'author', 'message'], dtype='object')

0.5.0
100%|█████████████████████████████████| 8340/8340 [00:00<00:00, 14126.80it/s]
28.12.2024 16:50:47 INFO Finished parsing raw messages.
Index(['timestamp', 'author', 'message', 'weekday', 'hour', 'words',
'letters'],
dtype='object')

joweich · 2025-01-04T14:30:00Z

Hey @koftezz, we decided to remove these aggregations from the default calculation because they can be derived later from the timestamp and message fields (see commit cbd31ba). This keeps the dataframe leaner.

You can simply use this polars snippet to bring them back:

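import polars as pl

df = df.with_columns(
    # dt.weekday() is ISO-numbered in polars (Monday=1 ... Sunday=7), so a
    # 0-based lookup would be off by one; strftime("%A") gives the day name directly.
    pl.col("timestamp").dt.strftime("%A").alias("weekday"),
    pl.col("timestamp").dt.hour().alias("hour"),
    # Number of space-separated tokens in the message.
    pl.col("message").str.split(" ").list.len().alias("words"),
    # Number of characters in the message, spaces included.
    pl.col("message").str.len_chars().alias("letters"),
)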

If you prefer pandas, you can do this instead:

df = df.to_pandas()
df["weekday"] = df["timestamp"].dt.day_name()
df["hour"] = df["timestamp"].dt.hour
df["words"] = df["message"].apply(lambda s: len(s.split(" ")))
df["letters"] = df["message"].apply(len)
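
To make it clear what each of the four columns holds, here is a minimal standard-library sketch that derives them for a single message. The helper name `derive_columns` and the `DAY_NAMES` list are hypothetical and not part of the library, and it assumes that "letters" means the full character count (spaces included), which is what the snippets above compute.

```python
from datetime import datetime

# Python's datetime.weekday() is 0-based with Monday == 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_columns(timestamp: datetime, message: str) -> dict:
    """Recompute weekday, hour, words and letters for one message (illustrative only)."""
    return {
        "weekday": DAY_NAMES[timestamp.weekday()],
        "hour": timestamp.hour,
        # Split on single spaces, mirroring the dataframe snippets above.
        "words": len(message.split(" ")),
        # Character count of the whole message, spaces included.
        "letters": len(message),
    }


row = derive_columns(datetime(2024, 12, 28, 16, 49, 47), "Hello there world")
print(row)
```

Comparing a few rows from this helper against the dataframe columns is a quick way to check that the restored columns match what 0.5.0 used to return.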

Let me know if you have any further concerns!

joweich closed this as completed Jan 4, 2025