Give examples using easier libraries such as rows #15

Open
turicas opened this issue Jul 19, 2016 · 2 comments

@turicas
turicas commented Jul 19, 2016

Hello,

I'm working on a library that makes tabular data easy to use, no matter the format: CSV, XLS, XLSX, HTML, etc. It's called rows (https://github.com/turicas/rows).
I think it would be great to add a section with examples using this kind of library, since the learner can access data with simple commands and doesn't need to understand the format upfront.

An example: reading the CSV file from coding-for-journalists/2_web_scrape/completed/fun_with_csv_done.py with rows is as easy as:

import rows
for row in rows.import_from_csv('my_test.csv'):
    print row.FIRSTNAME, row.CITY

If the same data were only available as XLS, you could use this code:

import rows
for row in rows.import_from_xls('my_test.xls'):
    print row.FIRSTNAME, row.CITY

So the interface is the same, no matter the format. I think this helps people who are learning the basics; later, they can dig deeper and learn more about each specific format.

Note: rows automatically identifies and converts the data. In this case there are just strings, but it will convert values to int, float, datetime.date, datetime.datetime, among other types, if it detects that kind of information in the file, and this is true for all available formats. So you don't need to explain data conversion upfront, and can instead show examples of converted data being analyzed, which is very motivating.
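
To make the idea of automatic conversion concrete, here is a rough standard-library sketch. It is not rows' implementation, which is not shown in this thread and may well work differently (for example, by deciding a type per column). The function name `guess_value` and the sample values are hypothetical.

```python
from datetime import date, datetime


def guess_value(text):
    """Try int, then float, then an ISO date; otherwise keep the string."""
    # Hypothetical helper for illustration only, not part of rows.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return text


# Invented sample values, as they would appear in a text file.
converted = [guess_value(v) for v in ["42", "3.5", "2016-07-19", "Ada"]]
print(converted)
```

Each value starts out as a string; the helper only returns another type when the text parses cleanly as one.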

@richardsalex
Contributor

Thanks for the suggestion; it's definitely something to think about going forward as this evolves.

@dannguyen

Adding my input, based on what I teach journalism students in a quarter-long course: I like to keep things as "plain" as possible. Stick to builtins when possible, e.g. csv, and work with the most common Python data structures, e.g. a list of dict objects returned from csv.DictReader().
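
A minimal sketch of that "plain" approach, using only the standard library. The column names FIRSTNAME and CITY come from the example earlier in the thread; the sample rows and the helper name `read_rows` are made up for illustration.

```python
import csv
import io

# Invented sample standing in for the contents of my_test.csv.
CSV_TEXT = "FIRSTNAME,CITY\nAda,London\nGrace,New York\n"


def read_rows(text):
    """Parse CSV text into a plain list of dict objects."""
    return list(csv.DictReader(io.StringIO(text)))


people = read_rows(CSV_TEXT)
for person in people:
    # Each row is a dict keyed by the header line.
    print(person["FIRSTNAME"], person["CITY"])
```

With a real file, `open('my_test.csv', newline='')` would take the place of `io.StringIO(text)`.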

The rows library looks cool, like other nifty data-wrangling wrappers such as pandas, agate, pudo/dataset, etc., but novice programmers don't need easier ways to access attributes of a row object. They need affirmation of lists vs. dicts, str vs. int, etc. More importantly, they need to know that data is text. And this requires fundamentally understanding what CSV purports to be, and how this relates to the actual realities of computing: computers don't have "intelligence", they need formats to be able to turn text into data structures, and there is a huge difference between a giant string and that same string deserialized into lists and dicts.
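
The difference between a giant string and its deserialized form can be shown in a few lines. This is only a sketch with invented sample data; the column names NAME and AGE are hypothetical.

```python
import csv
import io

# Invented sample: on disk, a CSV file is nothing more than this one string.
raw = "NAME,AGE\nAda,36\nGrace,45\n"

# Deserializing turns the string into a list of dicts.
records = list(csv.DictReader(io.StringIO(raw)))

# The values are still str, so + concatenates instead of adding.
joined = records[0]["AGE"] + records[1]["AGE"]

# Turning text into numbers is an explicit step.
total = int(records[0]["AGE"]) + int(records[1]["AGE"])

print(type(raw).__name__, type(records).__name__)
print(joined)
print(total)
```

The csv module only splits text into fields; deciding that AGE holds integers is left to the programmer.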

Getting to that concept, and understanding for-loops and iteration, is all I ask of my students for my class. We don't get into making web apps, understanding OOP, doing data analysis or statistics, visualization, etc. -- it's all understanding text, patterns, and loops.

This is informed not just by how I've seen people learning to program struggle, but also by seeing top-flight, award-winning investigative journalists not have a clue that the CSV they open up in Excel is just text. Misunderstanding something this fundamental is not a trivial matter: it leads to measurable problems when it comes to using that data for investigations.

So this is all a long way of saying that text, str, dict, and list are just fine for students, IMHO :)
