forked from araastat/BIOF085
-
Notifications
You must be signed in to change notification settings - Fork 0
/
03_python_vis.Rmd
709 lines (501 loc) · 24.4 KB
/
03_python_vis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
---
jupyter:
jupytext:
formats: ipynb,Rmd
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
name: python3
---
# Data visualization using Python
## Introduction
Data visualization is a basic task in data exploration and understanding. Humans are mainly visual creatures, and data visualization provides an opportunity to enhance communication of the story within the data. Often we find that data and the data-generating process is complex, and a visual representation of the data and our innate ability at pattern recognition can help reveal the complexities in a cognitively accessible way.
### An example gallery
Data visualization has a long and storied history, from Florence Nightangle onwards. Dr. Nightangle was a pioneer in data visualization and developed the *rose plot* to represent causes of death in hospitals during the Crimean War.
![](graphs/rose.jpg)
John Snow, in 1854, famously visualized the cholera outbreak in London, which showed the geographic proximity of cholera prevalence with particular water wells.
![](graphs/snow_map.png)
In one of the more famous visualizations, considered by many to be an optimal use of display ink and space, Minard visualized Napoleon's disastrous campaign to Russia
![](graphs/map-full-size1.png)
In more recent times, an employee at Facebook visualized all connections between users across the world, which clearly showed geographical associations with particular countries and regions.
![](graphs/facebook-high-res-friendship-world-map-paul-butler.png)
### Why visualize data?
We often rely on numerical summaries to help understand and distinguish datasets. In 1973, Anscombe published an influential set of 4 datasets, each with two variables and with the means, variances and correlations being identical. When you graphed these data, the differences in the datasets were clearly visible. This set is popularly known as Anscombe's quartet.
![](graphs/anscombe.png)
A more recent experiment in data construction by Matejka and Fitzmaurice (2017) started with a representation of a dinosaur and created 10 more bivariate datasets which all shared the same univariate means and variances and the same pairwise correlations.
![](graphs/datasaurus.png)
These examples clarify the need for visualization to better understand relationships between variables.
Even when using statistical visualization techniques, one has to be careful. Not all visualizations can discriminate between statistical characteristics. This was also explored by Matejka and Fitzmaurice.
| Strip plot | Boxplot | Violin plot |
| :------------------: | :------------------: | :------------------: |
| ![](graphs/box1.png) | ![](graphs/box2.png) | ![](graphs/box3.png) |
### Conceptual ideas
#### Begin with the consumer in mind
+ You have a deep understanding of the data you're presenting
+ The person seeing the visualization **doesn't**
+ Develop simpler visualizations first that are easier to explain
#### Tell a story
+ Make sure the graphic is clear
+ Make sure the main point you want to make "pops"
#### A matter of perception
+ Color (including awareness of color deficiencies)
+ Shape
+ Fonts
#### Some principles
1. Data-ink ratio
2. No mental gymnastics
1. The graphic should be self-evident
2. Context should be clear
3. Is a graph the wrong choice?
4. Focus on the consumer
> See [my slides](http://araastat.com/BIOF439/lectures/01-DataViz.pdf) for some more opinionated ideas
## Plotting in Python
Let's take a very quick tour before we get into the weeds. We'll use the mtcars dataset as an exemplar dataset that we can import using `pandas`
```{python 03-python-vis-0, fig.show=TRUE}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
```{python 03-python-vis-style1, eval = knitr::is_html_output()}
sns.set_context('paper')
sns.set_style('white', {'font.family':'Futura', 'text.color':'1'})
```
```{python 03-python-vis-style2, eval = knitr::is_latex_output()}
sns.set_context('paper')
sns.set_style('white', {'font.family':'Futura Medium'})
```
```{python 03-python-vis-1, fig.show=TRUE}
mtcars = pd.read_csv('data/mtcars.csv')
```
### Static plots
We will demonstrate plotting in what I'll call the `matplotlib` ecosystem. `matplotlib` is the venerable and powerful visualization package that was originally designed to emulate the Matlab plotting paradigm. It has since evolved and as become a bit more user-friendly. It is still quite granular and can facilitate a lot of custom plots once you become familiar with it. However, as a starting point, I think it's a bit much. We'll see a bit of what it can offer later.
We will consider two other options which are built on top of `matplotlib`, but are much more accessible. These are `pandas` and `seaborn`. The two packages have some different approaches, but both wrap `matplotlib` in higher-level code and decent choices so we don't need to get into the `matplotlib` trenches quite so much. We'll still call `matplotlib` in our code, since both these packages need it for some fine tuning. Both packages are also very much aligned to the `DataFrame` construct in `pandas`, so makes plotting a much more seamless experience.
```{python 03-python-vis-2}
mtcars.plot.scatter(x = 'hp', y = 'mpg');
plt.show()
# mtcars.plot(x = 'hp', y = 'mpg', kind = 'scatter');
```
```{python 03-python-vis-3}
sns.scatterplot(data = mtcars, x = 'hp', y = 'mpg');
plt.show()
```
There are of course some other choices based on your background and preferences. For static plots, there are a couple of emulators of the popular R package `ggplot2`. These are `plotnine` and `ggplot`. `plotnine` seems a bit more developed and uses the `ggplot2` semantics of aesthetics and layers, with almost identical code syntax.
> You can install `plotnine` using `conda`:
>
> ```shell
> conda install -c conda-forge plotnine
> ```
```{python 03-python-vis-4}
from plotnine import *
(ggplot(mtcars) +
aes(x = 'hp', y = 'mpg') +
geom_point())
```
### Dynamic or interactive plots
There are several Python packages that wrap around Javascript plotting libraries that are so popular in web-based graphics like D3 and Vega. Three that deserve mention are `plotly`, `bokeh`, and `altair`.
> If you actually want to experience the interactivity of the plots, please use the "Live notebooks" link in Canvas to run these notebooks. Otherwise, you can download the notebooks from the GitHub site and run them on your own computer.
`plotly` is a Python package developed by the company [Plot.ly](https://www.plotly.com) to interface with their interactive Javascript library either locally or via their web service. Plot.ly also develops an R package to interface with their products as well. It provides an intuitive syntax and ease of use, and is probably the more popular package for interactive graphics from both R and Python.
```{python 03-python-vis-5, results='hide', eval=FALSE}
import plotly.express as px
fig = px.scatter(mtcars, x = 'hp', y = 'mpg')
fig.show()
```
`bokeh` is an interactive visualization package developed by Anaconda. It is quite powerful, but its code can be rather verbose and granular
```{python 03-python-vis-6, results='hide', eval=FALSE, error=TRUE}
from bokeh.plotting import figure, output_file
from bokeh.io import output_notebook, show
output_notebook()
p = figure()
p.xaxis.axis_label = 'Horsepower'
p.yaxis.axis_label = 'Miles per gallon'
p.circle(mtcars['hp'], mtcars['mpg'], size=10);
show(p)
```
`altair` that leverages ideas from Javascript plotting libraries and a distinctive code syntax that may appeal to some
```{python 03-python-vis-7, results='hide', eval=FALSE}
import altair as alt
alt.Chart(mtcars).mark_point().encode(
x='hp',
y='mpg'
).interactive()
```
We won't focus on these dynamic packages in this workshop in the interests of time, but you can avail of several online resources for these.
| Package | Resources |
| ------- | ------------------------------------------------------------ |
| plotly | [Fundamentals](https://plotly.com/python/) |
| bokeh | [Tutorial](https://mybinder.org/v2/gh/bokeh/bokeh-notebooks/master?filepath=tutorial%2F00%20-%20Introduction%20and%20Setup.ipynb) |
| altair | [Overview](https://altair-viz.github.io/getting_started/overview.html) |
## Univariate plots
We will be introducing plotting and code from 3 modules: `matplotlib`, `seaborn` and `pandas`. As we go forth, you may ask the question, which one should I learn? Chris Moffitt has the following advice.
A pathway to learning ([Chris Moffit](https://pbpython.com/effective-matplotlib.html))
1. Learn the basic matplotlib terminology, specifically what is a `Figure` and an `Axes` .
2. Always use the object-oriented interface. Get in the habit of using it from the start of your analysis. (*not really getting into this, but basically don't use the Matlab form I'll show at the end, if you don't have to*)
3. Start your visualizations with basic pandas plotting.
4. Use seaborn for the more complex statistical visualizations.
5. Use matplotlib to customize the pandas or seaborn visualization.
### pandas
#### Histogram
```{python 03-python-vis-8}
mtcars.plot.hist(y = 'mpg');
plt.show()
# mtcars.plot(y = 'mpg', kind = 'hist')
#mtcars['mpg'].plot(kind = 'hist')
```
#### Bar plot
```{python 03-python-vis-9}
mtcars['cyl'].value_counts().plot.bar();
plt.show()
```
#### Density plot
```{python 03-python-vis-10}
mtcars['mpg'].plot( kind = 'density');
plt.show()
```
### seaborn
#### Histogram
```{python 03-python-vis-11}
ax = sns.distplot(mtcars['mpg'], kde=False);
plt.show()
```
#### Bar plot
```{python 03-python-vis-12}
sns.countplot(data = mtcars, x = 'cyl');
plt.show()
```
```{python 03-python-vis-13}
diamonds = pd.read_csv('data/diamonds.csv.gz')
ordered_colors = ['E','F','G','H','I','J']
sns.catplot(data = diamonds, x = 'color', kind = 'count', color = 'blue');
plt.show()
```
#### Density plot
```{python 03-python-vis-14}
sns.distplot(mtcars['mpg'], hist=False);
plt.show()
```
## Bivariate plots
### pandas
#### Scatter plot
```{python 03-python-vis-15}
diamonds = pd.read_csv('data/diamonds.csv.gz')
diamonds.plot(x = 'carat', y = 'price', kind = 'scatter');
plt.show()
```
#### Box plot
```{python 03-python-vis-16}
diamonds.boxplot(column = 'price', by = 'color');
plt.show()
```
### seaborn
#### Scatter plot
```{python 03-python-vis-17}
sns.regplot(data = diamonds, x = 'carat', y = 'price', fit_reg=False);
plt.show()
```
```{python}
sns.scatterplot(data=diamonds, x = 'carat', y = 'price', linewidth=0);
# We set the linewidth to 0, otherwise the lines around the circles
# appear white and wash out the figure. Try with any positive
# value of linewidth
```
#### Box plot
```{python 03-python-vis-18}
ordered_color = ['E','F','G','H','I','J']
sns.catplot(data = diamonds, x = 'color', y = 'price',
order = ordered_color, color = 'blue', kind = 'box');
plt.show()
```
#### Violin plot
```{python 03-python-vis-19}
g = sns.catplot(data = diamonds, x = 'color', y = 'price',
kind = 'violin', order = ordered_color);
plt.show()
```
#### Barplot (categorical vs continuous)
```{python 03-python-vis-20}
ordered_colors = ['D','E','F','G','H','I']
sns.barplot(data = diamonds, x = 'color', y = 'price', order = ordered_colors);
plt.show()
```
```{python 03-python-vis-21}
sns.barplot(data = diamonds, x = 'cut', y = 'price');
plt.show()
```
#### Joint plot
```{python 03-python-vis-22}
sns.jointplot(data = diamonds, x = 'carat', y = 'price');
plt.show()
```
```{python 03-python-vis-23}
sns.jointplot(data = diamonds, x = 'carat', y = 'price', kind = 'reg');
plt.show()
```
```{python 03-python-vis-24}
sns.jointplot(data = diamonds, x = 'carat', y = 'price', kind = 'hex');
plt.show()
```
## Facets and multivariate data
The basic idea in this section is to see how we can visualize more than two variables at a time. We will see two strategies:
1. Put multiple graphs on the same frame, with each graph referring to a level of a 3rd variable
1. Create a grid of separate graphs, with each graph referring to a level of a 3rd variable
This strategy also can work any time we need to visualize the data corresponding to different levels of a variable, say by gender or race or country.
In this example we're going to start with 4 time series, labelled A, B, C, D.
```{python 03-python-vis-25}
ts = pd.read_csv('data/ts.csv')
ts.dt = pd.to_datetime(ts.dt) # convert this column to a datetime object
ts.head()
```
For one strategy we will employ, it is actually a bit easier to change this to a wide data form, using `pivot`.
```{python 03-python-vis-26}
dfp = ts.pivot(index = 'dt', columns = 'kind', values = 'value')
dfp.head()
```
```{python 03-python-vis-27}
fig, ax = plt.subplots()
dfp.plot(ax=ax);
plt.show()
```
This creates 4 separate time series plots, one for each of the columns labeled A, B, C and D. The x-axis is determined by `dfp.index`, which during the pivoting operation, we deemed was the values of `dt` in the original data.
Using `seaborn`...
```{python 03-python-vis-28}
sns.lineplot(data = dfp);
plt.show()
```
However, we can achieve this same plot using the original data, and `seaborn`, in rather short order
```{python 03-python-vis-29}
sns.lineplot(data = ts, x = 'dt', y = 'value', hue = 'kind');
plt.show()
```
In this plot, assigning a variable to `hue` tells seaborn to draw lines (in this case) of different hues based on values of that variable.
We can use a bit more granular and explicit code for this as well. This allows us a bit more control of the plot.
```{python 03-python-vis-30}
g = sns.FacetGrid(ts, hue = 'kind', height = 5, aspect = 1.5)
g.map(plt.plot, 'dt', 'value').add_legend()
g.ax.set(xlabel = 'Date',
ylabel = 'Value',
title = 'Time series');
plt.show()
## All of this code chunk needs to be run at one time, otherwise you get weird errors. This
## is true for many plotting commands which are composed of multiple commands.
```
The `FacetGrid` tells `seaborn` that we're going to layer graphs, with layers based on `hue` and the hues being determined by values of `kind`. Notice that we can add a few more details like the aspect ratio of the plot and so on. The documentation for [FacetGrid](https://seaborn.pydata.org/generated/seaborn.FacetGrid.html), which we will also use for facets below, may be helpful in finding all the options you can control.
We can also show more than one kind of layer on a single graph
```{python 03-python-vis-31}
fmri = sns.load_dataset('fmri')
```
```{python 03-python-vis-32}
plt.style.use('seaborn-notebook')
sns.relplot(x = 'timepoint', y = 'signal', data = fmri);
plt.show()
```
```{python 03-python-vis-33}
sns.relplot(x = 'timepoint', y = 'signal', data = fmri, kind = 'line');
plt.show()
```
```{python 03-python-vis-34}
sns.relplot(x = 'timepoint', y = 'signal', data = fmri, kind = 'line', hue ='event');
plt.show()
```
```{python 03-python-vis-35}
sns.relplot(x = 'timepoint', y = 'signal', data = fmri, hue = 'region',
style = 'event', kind = 'line');
plt.show()
```
Here we use color to show the region, and line style (solid vs dashed) to show the event.
#### Scatter plots by group
```{python 03-python-vis-36}
g = sns.FacetGrid(diamonds, hue = 'color', height = 7.5)
g.map(plt.scatter, 'carat', 'price').add_legend();
plt.show()
```
Notice that this arranges the colors and values for the `color` variable in random order. If we have a preferred order we can impose that using the option `hue_order`.
```{python 03-python-vis-37}
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
sns.scatterplot(x="carat", y="price",
hue="clarity", size="depth",
hue_order=clarity_ranking,
sizes=(1, 8), linewidth=0,
data=diamonds);
plt.show()
```
### Facets
Facets or trellis graphics is a visualization method where we draw multiple plots in a grid, with each plot corresponding to unique values of a particular variable or combinations of variables. This has also been called *small multiples*.
We'll proceed with an example using the `iris` dataset.
```{python 03-python-vis-38}
iris = pd.read_csv('data/iris.csv')
iris.head()
```
```{python 03-python-vis-39}
g = sns.FacetGrid(iris, col = 'species', hue = 'species', height = 5)
g.map(plt.scatter, 'sepal_width', 'sepal_length').add_legend();
plt.show()
```
Here we use `FacetGrid` to indicate that we're creating multiple subplots by specifying the option `col` (for column). So this code says we are going to create one plot per level of species, arranged as separate columns (or in effect along one row). You could also specify `row` which would arrange the plots one to a row, or, in effect, in one column.
The `map` function says, take the facets I've defined and stored in `g`, and in each one, plot a scatter plot with `sepal_width` on the x-axis and `sepal_length` on the y-axis.
We could also use `relplot` for a more compact solution.
```{python 03-python-vis-40}
sns.relplot(x = 'sepal_width', y = 'sepal_length', data = iris,
col = 'species', hue = 'species');
plt.show()
```
A bit more of a complicated example, using the `fmri` data, where we're coloring lines based on the subject, and creating a 2-d grid, where region of the brain in along columns and event type is along rows.
```{python 03-python-vis-41}
sns.relplot(x="timepoint", y="signal", hue="subject",
col="region", row="event", height=3,
kind="line", estimator=None, data=fmri);
plt.show()
```
In the following example, we want to show how each subject fares for each of the two events, just within the frontal region. We let `seaborn` figure out the layout, only specifying that we'll be going along rows ("by column") and also saying we'll wrap around to the beginning once we've got to 5 columns. Note we use the `query` function to filter the dataset.
```{python 03-python-vis-42}
sns.relplot(x="timepoint", y="signal", hue="event", style="event",
col="subject", col_wrap=5,
height=3, aspect=.75, linewidth=2.5,
kind="line", data=fmri.query("region == 'frontal'"));
plt.show()
```
In the following example we want to compare the distribution of price from the diamonds dataset by color, and so it makes sense to create density plots of the price distribution and stack them one below the next so we can visually compare them.
```{python 03-python-vis-43}
ordered_colors = ['E','F','G','H','I','J']
g = sns.FacetGrid(data = diamonds, row = 'color', height = 1.7,
aspect = 4, row_order = ordered_colors)
g.map(sns.kdeplot, 'price');
plt.show()
```
You need to use `FacetGrid` to create sets of univariate plots since there is no particular method that allows univariate plots over a grid like `relplot` for bivariate plots.
### Pairs plots
The pairs plot is a quick way to compare every pair of variables in a dataset (or at least, every pair of continuous variables) in a grid. You can specify what kind of univariate plot will be displayed on the diagonal locations on the grid, and which bivariate plots will be displayed on the off-diagonal locations.
```{python 03-python-vis-44}
sns.pairplot(data=iris);
plt.show()
```
You can achieve more customization using `PairGrid`.
```{python 03-python-vis-45}
g = sns.PairGrid(iris, diag_sharey=False);
g.map_upper(sns.scatterplot);
g.map_lower(sns.kdeplot, colors="C0");
g.map_diag(sns.kdeplot, lw=2);
plt.show()
```
## Customizing the look
### Themes
There are several [themes](https://matplotlib.org/3.2.1/gallery/style_sheets/style_sheets_reference.html) available in the modern `matplotlib`, some of which borrow from `seaborn`. You can see the available themes and play around.
```{python 03-python-vis-46}
plt.style.available
```
See some examples below.
```{python 03-python-vis-47}
plt.style.use('fivethirtyeight')
sns.scatterplot(data = iris, x = 'sepal_width', y = 'sepal_length');
plt.show()
```
```{python 03-python-vis-48}
plt.style.use('bmh')
sns.scatterplot(data = iris, x = 'sepal_width', y = 'sepal_length');
plt.show()
```
```{python 03-python-vis-49}
plt.style.use('classic')
sns.scatterplot(data = iris, x = 'sepal_width', y = 'sepal_length');
plt.show()
```
```{python 03-python-vis-50}
plt.style.use('ggplot')
sns.scatterplot(data = iris, x = 'sepal_width', y = 'sepal_length');
plt.show()
```
```{python 03-python-vis-51}
plt.style.use('Solarize_Light2')
sns.scatterplot(data = iris, x = 'sepal_width', y = 'sepal_length');
plt.show()
```
> One small syntax point. You may have noticed in your own work that you get a little annoying line in the output when you plot. You can prevent that from happening by putting a semi-colon (`;`) after the last plotting command
## Finer control with matplotlib
![https://matplotlib.org/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py](graphs/matplotlib-anatomy.png)
As you can see from the figure, you can control each aspect of the plot displayed above using `matplotlib`. I won't go into the details, and will leave it to you to look at the `matplotlib` [documentation](https://matplotlib.org/contents.html) and [examples](https://matplotlib.org/gallery/index.html) if you need to customize at this level of granularity.
The following is an example using pure `matplotlib`. You can see how you can build up a plot. The crucial part here is that you need to run the code from each chunk together.
```{python 03-python-vis-52}
from matplotlib.ticker import FuncFormatter
data = {'Barton LLC': 109438.50,
'Frami, Hills and Schmidt': 103569.59,
'Fritsch, Russel and Anderson': 112214.71,
'Jerde-Hilpert': 112591.43,
'Keeling LLC': 100934.30,
'Koepp Ltd': 103660.54,
'Kulas Inc': 137351.96,
'Trantow-Barrows': 123381.38,
'White-Trantow': 135841.99,
'Will LLC': 104437.60}
group_data = list(data.values())
group_names = list(data.keys())
group_mean = np.mean(group_data)
```
```{python 03-python-vis-53}
plt.style.use('default')
fig, ax = plt.subplots()
plt.show()
```
```{python 03-python-vis-54}
fig, ax = plt.subplots()
ax.barh(group_names, group_data);
plt.show()
```
```{python 03-python-vis-55}
fig, ax = plt.subplots()
ax.barh(group_names, group_data)
ax.set(xlim = [-10000, 140000], xlabel = 'Total Revenue', ylabel = 'Company',
title = 'Company Revenue');
plt.show()
```
```{python 03-python-vis-56}
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(group_names, group_data)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, horizontalalignment='right')
ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company',
title='Company Revenue');
plt.show()
```
After you have created your figure, you do need to save it to disk so that you can use it in your Word or Markdown or PowerPoint document. You can see the formats available.
```{python 03-python-vis-57}
fig.canvas.get_supported_filetypes()
```
The type will be determined by the ending of the file name. You can add some options depending on the type. I'm showing an example of saving the figure to a PNG file. Typically I'll save figures to a vector graphics format like PDF, and then convert into other formats, since that results in minimal resolution loss. You of course have the option to save to your favorite format.
```{python 03-python-vis-58}
# fig.savefig('sales.png', dpi = 300, bbox_inches = 'tight')
```
### Matlab-like plotting
`matplotlib` was originally developed to emulate Matlab. Though this kind of syntax is no longer recommended, it is still available and may be of use to those coming to Python from Matlab or Octave.
```{python 03-python-vis-59}
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4]);
plt.ylabel('some numbers');
plt.show()
```
```{python 03-python-vis-60}
import numpy as np
# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)
# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^');
plt.show()
```
```{python 03-python-vis-61}
def f(t):
return np.exp(-t) * np.cos(2*np.pi*t)
t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
plt.figure();
plt.subplot(211);
plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k');
plt.subplot(212);
plt.plot(t2, np.cos(2*np.pi*t2), 'r--');
plt.show()
```
## Resources
A really nice online resource for learning data visualization in Python is the [Python Graph Gallery](https://python-graph-gallery.com/). This site has many examples of different kinds of plots using `pandas`, `seaborn` and `matplotlib`