Feedback Dec 2024 - all sessions #58

Open
kbjarkefur opened this issue Dec 3, 2024 · 16 comments

Comments

@kbjarkefur
Contributor

kbjarkefur commented Dec 3, 2024

Let's use one issue for all sessions as we do not tend to have that much feedback anymore

@kbjarkefur
Contributor Author

kbjarkefur commented Dec 3, 2024

We should say at the beginning that Python, being a general-purpose programming language, is not built around datasets (unlike R and especially Stata). So before we can talk about datasets, we have to cover some of the basics; only then will it make sense how one interacts with a dataset in Python.

We can frame it this way: the biggest benefit of learning Python and the biggest challenge when coming from R or Stata are the same thing, namely that Python is general purpose. This makes it very powerful, but one needs to think slightly differently when starting to use it.

@kbjarkefur
Contributor Author

We should also say that since Python is open source and general purpose, there are many contexts where we can run Python code. We should say that notebooks tend to work very similarly in Colab, Jupyter Notebooks and Databricks, but that there are other ways to run code, such as scripts. And we should stick to notebooks when answering questions, so as not to make the intro to Python more complex than it already is for someone coming from Stata or R.

@weilu
Member

weilu commented Dec 3, 2024

A participant was confused by isnumeric. It's a neat string method, but indeed confusing for beginners. Consider replacing it with something basic like "hello".startswith("h").
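
For reference, a minimal sketch of the contrast. The strings are made up for illustration and are not taken from the session notebook. It shows why startswith is easy to reason about, while isnumeric can surprise beginners on inputs that look like numbers:

```python
# Illustrative strings only; none of these come from the notebook.
greeting = "hello"

# startswith() answers a simple yes/no question about the beginning of a string.
starts_with_h = greeting.startswith("h")  # True
starts_with_x = greeting.startswith("x")  # False

# isnumeric() is True only when every character is a numeric character,
# so a decimal point or a minus sign makes it False.
digits_only = "123".isnumeric()    # True
with_decimal = "12.5".isnumeric()  # False
with_sign = "-3".isnumeric()       # False

print(starts_with_h, starts_with_x, digits_only, with_decimal, with_sign)
```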

@kbjarkefur
Contributor Author

We should not include dir() in this session. It will likely be a long time before any of the participants encounters a case where it is the required solution.

@weilu
Member

weilu commented Dec 3, 2024

Consider updating the negative index example to something more practically useful, like arr[-1] for indexing from the end of the list
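
Something along these lines could work. This is only a sketch: the list name arr follows the suggestion above, and the values are invented for illustration.

```python
# Hypothetical list; the values are placeholders.
arr = [10, 20, 30, 40]

last_item = arr[-1]       # last element, without needing len(arr)
second_to_last = arr[-2]  # counting backwards from the end
last_two = arr[-2:]       # negative indices also work in slices

print(last_item, second_to_last, last_two)
```

The practical point is that arr[-1] keeps working when the list grows, whereas a hard-coded arr[3] would not.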

@luisesanmartin
Member

Consider moving f-strings and .format() to the bonus materials. Check whether they're used later in any session; if not, they could be moved to leave enough time to reach dictionaries.

@weilu
Member

weilu commented Dec 4, 2024

Session 2: consider converting this string concatenation to use an f-string instead
[image]
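
As a rough illustration of the kind of change meant here (the variable names and message below are hypothetical, not the code from the screenshot), the two forms build the same string:

```python
# Hypothetical values; not the notebook's actual example.
name = "Ada"
n_tasks = 3

# String concatenation: numbers must be converted with str() first.
concatenated = "Hello " + name + ", you have " + str(n_tasks) + " tasks left."

# f-string: values are placed inline and converted automatically.
formatted = f"Hello {name}, you have {n_tasks} tasks left."

print(formatted)
```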

@luisesanmartin
Member

Session 2: include examples of if conditions with parentheses to show it's possible to group them
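
A possible example, sketched with made-up variable names and values. It relies on the fact that and binds more tightly than or in Python, so adding parentheses can change the result:

```python
# Hypothetical values chosen so that the grouping matters.
age = 15
is_member = False
has_coupon = True

# Without parentheses this reads as (age > 18 and is_member) or has_coupon.
ungrouped = age > 18 and is_member or has_coupon

# Parentheses group the "or" part so it is evaluated first.
grouped = age > 18 and (is_member or has_coupon)

if grouped:
    message = "discount applies"
else:
    message = "no discount"

print(ungrouped, grouped, message)
```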

@luisesanmartin
Member

Task 8 should go before task 7 because it's an easier exercise.

@weilu
Member

weilu commented Dec 4, 2024

A participant pointed out: Task 2 and Task 10 are practically the same?

@luisesanmartin
Member

general comment: we might want to reconsider whether to give people time to try exercises/tasks on their own. I noticed a slight drop in the number of connected participants, and a few people in the room left while we were waiting. This method works well for the most engaged participants, but I think some are just not going to try the exercises and will lose interest. My sense is that it produces a trade-off between attendance and learning; we need to decide which we want to prioritize.

@weilu
Member

weilu commented Dec 11, 2024

Session 3:

"converting data types" section code should be better formatted:

cars_monthly_converted = cars_monthly_reordered.astype(
    {
        'number': 'int',
        'year': 'int',
        'month': 'int'
    }
)
cars_monthly_converted.dtypes

Above crosstab: "Note the resulting DataFrame cars_monthly_pivoted has make as its row index, and year its column names." make & year should be formatted as code

crosstab: "The Same can be achieve with the pd.crosstab function, but with different arguments:" Same should be lowercased

"groupby": the existing code results in a warning:

cars_monthly_nonzero.groupby(['make', 'year']).sum()[['number']].unstack()

Update it to:

cars_monthly_nonzero.groupby(['make', 'year'])[['number']].sum().unstack()

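A small self-contained sketch of the updated groupby form. The DataFrame below is an invented stand-in for cars_monthly_nonzero, and the warning is presumably raised because .sum() is applied to every non-key column before number is selected. Selecting the column first avoids that:

```python
import pandas as pd

# Invented stand-in for cars_monthly_nonzero; the values are placeholders.
cars = pd.DataFrame(
    {
        "make": ["Audi", "Audi", "BMW", "BMW"],
        "year": [2015, 2016, 2015, 2015],
        "month": [1, 1, 1, 2],
        "number": [10, 20, 5, 7],
    }
)

# Select the column to aggregate first, then sum, then move year into the columns.
pivoted = cars.groupby(["make", "year"])[["number"]].sum().unstack()

print(pivoted)
```

Combinations with no rows, such as BMW in 2016 in this toy data, come out as missing values after unstack().
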
"exercise 3.3.6" step 4, instead of casefold, use lower

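On the casefold point, a brief sketch of why lower is the simpler choice for beginners. The strings are illustrative and not from the exercise. For plain ASCII text the two methods agree, and they only differ on special cases such as the German ß:

```python
# Illustrative strings; not taken from exercise 3.3.6.
make = "Toyota"
ascii_lower = make.lower()        # "toyota"
ascii_casefold = make.casefold()  # "toyota", same result for plain ASCII text

street = "Straße"
street_lower = street.lower()        # "straße": ß is already lowercase
street_casefold = street.casefold()  # "strasse": casefold is more aggressive

print(ascii_lower, ascii_casefold, street_lower, street_casefold)
```
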
@luisesanmartin
Member

Session 3: we should consider changing the name to only "Pandas" given that we're not covering NumPy anymore

@luisesanmartin
Member

luisesanmartin commented Dec 11, 2024

Session 3: for some participants, Databricks asked to install xlrd in order to load the Excel dataset. We should mention this in the notebook.
This is already at the beginning of the notebook, but apparently not everyone ran it or remembered it; it can be mentioned again for exercise 1.
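
One option for the reminder in exercise 1 is a short check cell. This is only a sketch: the helper name missing_packages is hypothetical, and it assumes xlrd is the only extra package needed, as reported above.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check for xlrd before trying to load the Excel dataset.
missing = missing_packages(["xlrd"])
if missing:
    print("Install before loading the Excel file: %pip install " + " ".join(missing))
else:
    print("xlrd is available")
```

Participants who see the install message can run the printed %pip command in a separate notebook cell and then re-run the exercise.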

@weilu
Member

weilu commented Dec 11, 2024

Session 3: we should consider changing the name to only "Pandas" given that we're not covering NumPy anymore

There's still the python library part though...

@luisesanmartin
Member

libraries and Pandas maybe then?


3 participants