From 95890f04a7355383b6cc9ff40a3160542ca4e5e4 Mon Sep 17 00:00:00 2001 From: Dave Gerrard Date: Fri, 27 Jul 2018 09:34:01 +0100 Subject: [PATCH] Adds R Notebook about Authors data for Issue #42 --- altmetric_data_analysis/Notebook_Authors.Rmd | 196 +++++++ .../Notebook_Authors.nb.html | 496 ++++++++++++++++++ altmetric_data_analysis/Notebook_Mentions.Rmd | 110 ++++ .../Notebook_Mentions.nb.html | 345 ++++++++++++ altmetric_data_analysis/author_summaries.R | 12 + 5 files changed, 1159 insertions(+) create mode 100644 altmetric_data_analysis/Notebook_Authors.Rmd create mode 100644 altmetric_data_analysis/Notebook_Authors.nb.html create mode 100644 altmetric_data_analysis/Notebook_Mentions.Rmd create mode 100644 altmetric_data_analysis/Notebook_Mentions.nb.html diff --git a/altmetric_data_analysis/Notebook_Authors.Rmd b/altmetric_data_analysis/Notebook_Authors.Rmd new file mode 100644 index 0000000..b6eac36 --- /dev/null +++ b/altmetric_data_analysis/Notebook_Authors.Rmd @@ -0,0 +1,196 @@ +--- +title: "Altmetric Authors" +date: 2018-07-27 +output: + html_notebook: default + github_document: default +--- + + +```{r setup, include = FALSE} +library(tidyverse) + +authors <- read_csv('../files_out/20180411_1510_authors.csv') +mentions <- read_csv('../files_out/20180411_1510_mentions.csv') +articles <- read_csv('../files_out/20180411_1510_master.csv') + +articles_with_mentions <- left_join(articles, mentions) +mentions_with_authors <- left_join(mentions, authors) +articles_with_mentions_and_authors <- left_join(articles_with_mentions, authors) + +``` + +A dataset of all the Authors that have ever Mentioned any of the Articles in the set. **Note:** this is Altmetric's definition of the word Author, which means: someone who Mentioned an Article somewhere (they may also use the term 'poster' of a 'post' too, but they're called Authors in the JSON from the API). 
This definition is distinct from the more academic definition, i.e.: 'author of the article itself': the [Dimensions](https://app.dimensions.ai/discover/publication) database might be a better source for finding out more about that type of author. + +Join Authors onto the Mentions dataset and you can see which authors discussed the set the most: + +```{r Most prolific Authors related to the set of Articles} + +arrange( + summarise( + group_by( + filter( + mentions_with_authors, + !is.na(author_name)), + author_name), + total_mentions = n(), + ), + desc(total_mentions) +) + +``` + +Join authors onto Mentions and the master Articles file and you can query how many times the Authors Mentioned a specific journal: + +```{r Most prolific authors about a specific journal} + +arrange( + summarise( + group_by( + filter(articles_with_mentions_and_authors, + journal_title == "Journal of Cancer Policy", + !is.na(author_name)), + author_name + ), + total_mentions = n() + ), + desc(total_mentions) +) + +``` + + +## How authors are managed + +The authors dataset is generated by the Altmetric Client using logic based upon how Altmetric themselves handle data about authors. Unsurprisingly, this logic varies according to the source of the Mention the author wrote. For instance, Twitter Authors are uniquely identified using the 'author_id_on_source'field, while blogging Authors are identified using the blog's web address in the author_url field. + +The Altmetric Client therefore parses each Mention, and then uses the source of the Mention to decide which field uniquely identifies the Author, using rules in the [AuthorManager Python class](https://github.com/CamLib/AltmetricClient/blob/master/altmetric_client/author_manager.py). Authors are then added to the Authors dataset with an id, generated by the Altmetric Client, that is unique to the dataset overall. This unique id is then posted back into the Mentions set, enabling Mentions to be joined back to their Authors. 
This in turn enables Authors that have posted multiple Mentions to be recorded more easily; essentially all of Altmetric's logic regarding 'which field identifies an Author for a given source' is handled as simply as possible in the AuthorManager. Hence this logic doesn't have to be added to the R used to analyse the dataset. + +There is, however, no way of identifying the same author mentioning an Article across various sources (e.g. a journalist mentioning an Article in a news story, and then tweeting about it). This is pretty much par for the course with social media data, though, of course - there are plenty of people on platforms such as Twitter who aren't who they say they are, so you can never really be *sure* if the id from one platform corresponds to the id from another. + +## Authors Data Dictionary + +The Author's dataset contains the following fields: + +### author_description + +**Data type: character** + +The author_description field is the description the Author has provided about themselves. In the test dataset at least, these only seem to be provided when the source is twitter or blogs. + +```{r Filters those Authors that have a description} + +select( + filter(authors, + !is.na(author_description)), + author_name, + author_source, + author_description +) + +``` + +These descriptions might actually provide a decent set of free text for Natural Language Processing, however. (My PhD research indicates that 'Twitter Biography' data potentially contains more useful information than tweets). + +### author_follower_count + +**Data type: int** + +Altmetric record the number of followers each Author has, but only from Twitter and Reddit. This clearly has some potential for assessing the reach of a specific article or journal on either of those platforms (see **Notebook_Mentions.Rmd** for the code to do that). 
+ +```{r Totals of all followers by source} + +summarise( + group_by( + authors, + author_source + ), + total_of_all_followers = sum(author_follower_count) +) + +``` + + + +### author_id + +**Data type: character** + +An id generated by the Altmetric Client itself, which is unique to each analytical dataset, and is used to join each Author to the mentions they have written. The author_id is unique by author **and source** - the same actual Author cannot be identified across multiple sources. (See *How authors are managed* above). + +### author_id_on_source + +**Data type: character** + +The information Altmetric have retrieved about the Author's id *on the platform from which their Mentions have been found*. For instance, my Twitter author_id_on_source (for my largely unused Twitter account) would be EpiphanyLboro. This is the most-commonly-used unique identifier for Authors in sources, but isn't the only field that can be used as an id. Some sources - e.g. blogs or pages about specific journalists in newspapers - potentially don't have a specific Author id. + +### author_image_url + +**Data type: character** + +This is a very odd field that mostly seems to take data from Altmetric's own Amazon cloud web service. However, it does seem to uniquely identify policy 'Authors' - indeed the code chunk below: + +```{r Authors of multiple policy documents} + +arrange( + summarise( + group_by( + filter(mentions_with_authors, source == "policy"), + author_image_url + ), + total = n() + ), + desc(total) +) + +``` + +... outputs a list of policy 'Authors' that are in fact [thumbnails of the front covers of policy documents](https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-a9fd0dcbe86e77e6106c316a34a90235266d3c52b58b94abddb55f9f81c5ffd4.jpg) stored on Altmetric's server, no doubt for display on their own website. From this we can surmise that the policy's 'Author' is in fact the **policy document** itself. 
+ +### author_name + +**Data type: character** + +The given name of the author, if known. Quite often Authors fail to name themselves, and it seems as if finding the names of the Authors of policy Mentions in particular might be problematic, as shown below. + +```{r Missing Author names by Mention source} + +ggplot(data = filter(authors, is.na(author_name))) + + geom_bar(mapping = aes(x = author_source)) + + coord_flip() + + +``` + +### author_source + +**Data type: character** + +The name of the source from which this Author's Mentions were found. This information can also be found in the Mentions set so it's a bit redundant here, though it means you can look for which sources all the Authors come from without having to join to Mentions and group by author: + +```{r Sources Authors come from} + +ggplot(data = authors) + + geom_bar(mapping = aes(x = author_source)) + + coord_flip() + +``` + +### author_url + +**Data type: character** + +Any URL that Altmetric have seen fit to attach to an author. These are used by Altmetric to identify blog, news, and Wikipedia Authors, e.g.: + +```{r List of Wikipedia Author URLs} + +select( + filter(authors, author_source == "wikipedia"), + author_name, + author_url +) + +``` + + + diff --git a/altmetric_data_analysis/Notebook_Authors.nb.html b/altmetric_data_analysis/Notebook_Authors.nb.html new file mode 100644 index 0000000..ac1a61a --- /dev/null +++ b/altmetric_data_analysis/Notebook_Authors.nb.html @@ -0,0 +1,496 @@ + + + + + + + + + + + + + + +Altmetric Authors + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +

A dataset of all the Authors that have ever Mentioned any of the Articles in the set. Note: this is Altmetric’s definition of the word Author, which means: someone who Mentioned an Article somewhere (they may also use the term ‘poster’ of a ‘post’ too, but they’re called Authors in the JSON from the API). This definition is distinct from the more academic definition, i.e.: ‘author of the article itself’: the Dimensions database might be a better source for finding out more about that type of author.

+

Join Authors onto the Mentions dataset and you can see which authors discussed the set the most:

+ + + +
arrange(
+  summarise(
+      group_by(
+        filter(
+          mentions_with_authors, 
+          !is.na(author_name)), 
+        author_name),
+    total_mentions = n()
+  ),
+  desc(total_mentions)
+)
+ + +
# A tibble: 9,653 x 2
+   author_name      total_mentions
+   <chr>                     <int>
+ 1 Gill Livingston              52
+ 2 HIV Insight                  27
+ 3 Yahoo! News                  26
+ 4 Health Medicinet             25
+ 5 ToOur Shores                 22
+ 6 emocha                       20
+ 7 Health Affairs               20
+ 8 RinGs                        20
+ 9 Health Literacy              19
+10 IFS                          19
+# ... with 9,643 more rows
+ + + +

Join authors onto Mentions and the master Articles file and you can query how many times the Authors Mentioned a specific journal:

+ + + +
arrange(
+  summarise(
+    group_by(
+      filter(articles_with_mentions_and_authors, 
+             journal_title == "Journal of Cancer Policy", 
+             !is.na(author_name)),
+      author_name
+    ),
+    total_mentions = n()    
+  ),
+  desc(total_mentions)  
+)
+ + +
# A tibble: 97 x 2
+   author_name              total_mentions
+   <chr>                             <int>
+ 1 Health Medicinet                      4
+ 2 MedicalXpress                         4
+ 3 health enews                          2
+ 4 New York Post                         2
+ 5 Stat News                             2
+ 6 815wrldtrvlr                          1
+ 7 About Kidneys                         1
+ 8 Alexis Dix                            1
+ 9 Alok Khorana                          1
+10 "Ambrosio \"AJ\" Perera"              1
+# ... with 87 more rows
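The nested calls above read inside-out. As a minimal sketch, the same query can be written top-to-bottom with the pipe and `count()`. The `toy` tibble here is a hypothetical stand-in for `articles_with_mentions_and_authors`, and its rows are invented purely for illustration:

```r
library(dplyr)

# Hypothetical stand-in for articles_with_mentions_and_authors;
# only the two columns used by the query are included.
toy <- tibble(
  journal_title = c("Journal of Cancer Policy", "Journal of Cancer Policy",
                    "Journal of Cancer Policy", "Other Journal"),
  author_name   = c("MedicalXpress", "MedicalXpress", NA, "Stat News")
)

# filter -> group -> count -> sort, in reading order
top_authors <- toy %>%
  filter(journal_title == "Journal of Cancer Policy", !is.na(author_name)) %>%
  count(author_name, name = "total_mentions", sort = TRUE)

print(top_authors)
```

`count()` combines the `group_by()`, `summarise(n())` and `arrange(desc())` steps, so the result should match the nested version.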
+ + + +
+

How authors are managed

+

The authors dataset is generated by the Altmetric Client using logic based upon how Altmetric themselves handle data about authors. Unsurprisingly, this logic varies according to the source of the Mention the author wrote. For instance, Twitter Authors are uniquely identified using the ‘author_id_on_source’ field, while blogging Authors are identified using the blog’s web address in the author_url field.

+

The Altmetric Client therefore parses each Mention, and then uses the source of the Mention to decide which field uniquely identifies the Author, using rules in the AuthorManager Python class. Authors are then added to the Authors dataset with an id, generated by the Altmetric Client, that is unique to the dataset overall. This unique id is then posted back into the Mentions set, enabling Mentions to be joined back to their Authors. This in turn enables Authors that have posted multiple Mentions to be recorded more easily; essentially all of Altmetric’s logic regarding ‘which field identifies an Author for a given source’ is handled as simply as possible in the AuthorManager. Hence this logic doesn’t have to be added to the R used to analyse the dataset.
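To make the 'which field identifies an Author for a given source' idea concrete, here is a rough R sketch. It is not the AuthorManager's actual rule set (that lives in the Python class linked above). The mapping is an assumption pieced together from the field descriptions in this notebook, and the toy rows are invented:

```r
library(dplyr)

# Invented example rows; column names follow the Authors data dictionary.
toy_authors <- tibble(
  author_source       = c("twitter", "blogs", "policy"),
  author_id_on_source = c("EpiphanyLboro", NA, NA),
  author_url          = c(NA, "http://example.org/blog", NA),
  author_image_url    = c(NA, NA, "http://example.org/thumbnail.jpg")
)

# Assumed mapping (illustrative only): URL for blog/news/wikipedia,
# image URL for policy, author_id_on_source for everything else.
keyed <- toy_authors %>%
  mutate(
    identifying_value = case_when(
      author_source %in% c("blogs", "news", "wikipedia") ~ author_url,
      author_source == "policy"                          ~ author_image_url,
      TRUE                                               ~ author_id_on_source
    )
  )

print(select(keyed, author_source, identifying_value))
```

Because the client already resolves this per source and writes a single author_id, none of this branching is needed in the analysis code itself.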

+

There is, however, no way of identifying the same author mentioning an Article across various sources (e.g. a journalist mentioning an Article in a news story, and then tweeting about it). This is par for the course with social media data: there are plenty of people on platforms such as Twitter who aren’t who they say they are, so you can never really be sure whether the id from one platform corresponds to the id from another.

+
+
+

Authors Data Dictionary

+

The Authors dataset contains the following fields:

+
+

author_description

+

Data type: character

+

The author_description field is the description the Author has provided about themselves. In the test dataset at least, these only seem to be provided when the source is twitter or blogs.

+ + + +
select(
+  filter(authors,
+         !is.na(author_description)),
+  author_name,
+  author_source,
+  author_description
+)
+ + +
# A tibble: 8,018 x 3
+   author_name      author_source author_description                               
+   <chr>            <chr>         <chr>                                            
+ 1 jesus alfaro     twitter       Catedrático de Derecho Mercantil.                
+ 2 Rolf Groeneveld  twitter       Natural resource economist Wageningen University…
+ 3 Milan Vaishnav   twitter       @CarnegieEndow Sr Fellow + Director, @CarnegieSA…
+ 4 Nasra Roy        twitter       Crazy interest in food & cooking, intrigued by p…
+ 5 Research Ethics… twitter       #economicphilosophy, #researchethics, #researchi…
+ 6 Stefan Thewissen twitter       Research fellow @ODIdev, interested in inclusive…
+ 7 Michal Brzezins… twitter       economist @EconUW; research on inequality, pover…
+ 8 Max Roser        twitter       Researcher @UniOfOxford – Follow me for data vis…
+ 9 PG Kroeger       twitter       historicus, meedenker, penvoerder, dagvoorzitter…
+10 Eric D. Munoz    twitter       Educator, political historian, and human. Podcas…
+# ... with 8,008 more rows
+ + + +

Even so, these descriptions might provide a decent body of free text for Natural Language Processing. (My PhD research indicates that ‘Twitter Biography’ data potentially contains more useful information than tweets.)

+
+
+

author_follower_count

+

Data type: int

+

Altmetric record the number of followers each Author has, but only from Twitter and Reddit. This clearly has some potential for assessing the reach of a specific article or journal on either of those platforms (see Notebook_Mentions.Rmd for the code to do that).

+ + + +
summarise(
+  group_by(
+    authors,
+    author_source
+  ),
+  total_of_all_followers = sum(author_follower_count)
+)
+ + +
# A tibble: 9 x 2
+  author_source total_of_all_followers
+  <chr>                          <int>
+1 blogs                              0
+2 facebook                           0
+3 googleplus                         0
+4 news                               0
+5 policy                             0
+6 reddit                      18481686
+7 twitter                     30453697
+8 video                              0
+9 wikipedia                          0
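Since only Twitter and Reddit carry follower counts, the zero rows add little. A minimal sketch of a narrower summary follows. The toy data is invented, and `na.rm = TRUE` is a defensive assumption in case a dataset records missing counts as `NA` rather than 0:

```r
library(dplyr)

# Invented rows standing in for the authors dataset.
toy_authors <- tibble(
  author_source         = c("twitter", "twitter", "reddit", "blogs"),
  author_follower_count = c(100L, NA, 50L, 0L)
)

follower_totals <- toy_authors %>%
  filter(author_source %in% c("twitter", "reddit")) %>%  # sources with follower data
  group_by(author_source) %>%
  summarise(total_of_all_followers = sum(author_follower_count, na.rm = TRUE))

print(follower_totals)
```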
+ + + +
+
+

author_id

+

Data type: character

+

An id generated by the Altmetric Client itself, which is unique to each analytical dataset, and is used to join each Author to the mentions they have written. The author_id is unique by author and source - the same actual Author cannot be identified across multiple sources. (See How authors are managed above).
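The setup chunk calls `left_join()` without a `by` argument, so dplyr joins on every column name the two tables share. A small sketch of naming the key explicitly is below. The toy tables and their values are invented, and only `author_id` is taken from the data dictionary:

```r
library(dplyr)

# Invented miniature Mentions and Authors tables.
toy_mentions <- tibble(
  doi       = c("10.1/a", "10.1/a", "10.1/b"),
  author_id = c("a1", "a2", "a1")
)
toy_authors <- tibble(
  author_id   = c("a1", "a2"),
  author_name = c("Example One", "Example Two")
)

# The key should be unique in the Authors table, otherwise the join
# would duplicate Mentions.
stopifnot(anyDuplicated(toy_authors$author_id) == 0)

# Naming the key makes the join explicit and silences the
# "Joining, by = ..." message.
joined <- left_join(toy_mentions, toy_authors, by = "author_id")

print(joined)
```

Being explicit also guards against an accidental match if the two tables ever gain another column with the same name.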

+
+
+

author_id_on_source

+

Data type: character

+

The information Altmetric have retrieved about the Author’s id on the platform from which their Mentions have been found. For instance, my Twitter author_id_on_source (for my largely unused Twitter account) would be EpiphanyLboro. This is the most commonly used unique identifier for Authors across sources, but it isn’t the only field that can be used as an id. Some sources - e.g. blogs or pages about specific journalists in newspapers - potentially don’t have a specific Author id.

+
+
+

author_image_url

+

Data type: character

+

This is a very odd field that mostly seems to take data from Altmetric’s own Amazon cloud web service. However, it does seem to uniquely identify policy ‘Authors’ - indeed the code chunk below:

+ + + +
arrange(
+  summarise(
+    group_by(
+      filter(mentions_with_authors, source == "policy"),
+      author_image_url
+    ),
+    total = n()
+  ),
+  desc(total)
+)
+ + +
# A tibble: 393 x 2
+   author_image_url                                                           total
+   <chr>                                                                      <int>
+ 1 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…    11
+ 2 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     9
+ 3 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     7
+ 4 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     6
+ 5 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     6
+ 6 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     3
+ 7 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     3
+ 8 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     3
+ 9 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     3
+10 https://s3.amazonaws.com/cache.altmetric.com/policy/thumbnails/thumbnail-…     2
+# ... with 383 more rows
+ + + +

… outputs a list of policy ‘Authors’ that are in fact thumbnails of the front covers of policy documents stored on Altmetric’s server, no doubt for display on their own website. From this we can surmise that the policy’s ‘Author’ is in fact the policy document itself.

+
+
+

author_name

+

Data type: character

+

The given name of the author, if known. Quite often Authors fail to name themselves, and it seems as if finding the names of the Authors of policy Mentions in particular might be problematic, as shown below.

+ + + +
ggplot(data = filter(authors, is.na(author_name))) +
+  geom_bar(mapping = aes(x = author_source)) +
+  coord_flip()
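A bar chart shows raw counts, which favours sources with many Authors. As a hedged complement, the sketch below tabulates the share of unnamed Authors per source. The toy rows are invented and say nothing about the real proportions:

```r
library(dplyr)

# Invented rows standing in for the authors dataset.
toy_authors <- tibble(
  author_source = c("policy", "policy", "twitter", "twitter", "news"),
  author_name   = c(NA, NA, "Max Roser", NA, "Stat News")
)

missing_names <- toy_authors %>%
  group_by(author_source) %>%
  summarise(
    authors       = n(),                        # Authors seen for this source
    missing       = sum(is.na(author_name)),    # of which unnamed
    share_missing = mean(is.na(author_name))    # proportion unnamed
  ) %>%
  arrange(desc(share_missing))

print(missing_names)
```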
+ + +

+ + + +
+
+

author_source

+

Data type: character

+

The name of the source from which this Author’s Mentions were found. This information can also be found in the Mentions set so it’s a bit redundant here, though it means you can look for which sources all the Authors come from without having to join to Mentions and group by author:

+ + + +
ggplot(data = authors) +
+  geom_bar(mapping = aes(x = author_source)) +
+  coord_flip()
+ + +

+ + + +
+
+

author_url

+

Data type: character

+

Any URL that Altmetric have seen fit to attach to an author. These are used by Altmetric to identify blog, news, and Wikipedia Authors, e.g.:

+ + + +
select(
+  filter(authors, author_source == "wikipedia"),
+  author_name,
+  author_url
+)
+ + +
# A tibble: 20 x 2
+   author_name      author_url                                                      
+   <chr>            <chr>                                                           
+ 1 BullRangifer     http://en.wikipedia.org/wiki/User:BullRangifer                  
+ 2 Bobamnertiopsis  http://en.wikipedia.org/wiki/User:Bobamnertiopsis               
+ 3 Rjwilmsi         http://en.wikipedia.org/wiki/User:Rjwilmsi                      
+ 4 Imanrahul        http://en.wikipedia.org/wiki/User:Imanrahul                     
+ 5 Jeremy112233     http://en.wikipedia.org/wiki/User:Jeremy112233                  
+ 6 Chris Capoccia   http://en.wikipedia.org/wiki/User:Chris Capoccia                
+ 7 12.229.155.21    http://en.wikipedia.org/wiki/Special:Contributions/12.229.155.21
+ 8 Snooganssnoogans http://en.wikipedia.org/wiki/User:Snooganssnoogans              
+ 9 93.44.108.184    http://en.wikipedia.org/wiki/Special:Contributions/93.44.108.184
+10 Doc James        http://en.wikipedia.org/wiki/User:Doc James                     
+11 David.Throop     http://en.wikipedia.org/wiki/User:David.Throop                  
+12 Thumperward      http://en.wikipedia.org/wiki/User:Thumperward                   
+13 RDBrown          http://en.wikipedia.org/wiki/User:RDBrown                       
+14 7mike5000        http://en.wikipedia.org/wiki/User:7mike5000                     
+15 Cannondale0702   http://en.wikipedia.org/wiki/User:Cannondale0702                
+16 Rathfelder       http://en.wikipedia.org/wiki/User:Rathfelder                    
+17 IPFeditor        http://en.wikipedia.org/wiki/User:IPFeditor                     
+18 Basie            http://en.wikipedia.org/wiki/User:Basie                         
+19 AlexanderSilver  http://en.wikipedia.org/wiki/User:AlexanderSilver               
+20 Lsj              http://sv.wikipedia.org/wiki/User:Lsj                           
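Some of these URLs contain raw spaces (e.g. `User:Chris Capoccia`), which may need percent-encoding before they are requested or linked. A minimal sketch using base R's `utils::URLencode()`, with two URLs copied from the output above:

```r
# Two URLs from the table above that contain a raw space.
raw_urls <- c(
  "http://en.wikipedia.org/wiki/User:Chris Capoccia",
  "http://en.wikipedia.org/wiki/User:Doc James"
)

# URLencode() leaves reserved characters such as ':' and '/' alone by
# default and percent-encodes the space.
encoded <- vapply(raw_urls, utils::URLencode, character(1), USE.NAMES = FALSE)

print(encoded)
```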
+ + + + +
+
+ +
+ + + +
+ + + + + + + +
diff --git a/altmetric_data_analysis/Notebook_Mentions.Rmd b/altmetric_data_analysis/Notebook_Mentions.Rmd
new file mode 100644
index 0000000..6caa2c3
--- /dev/null
+++ b/altmetric_data_analysis/Notebook_Mentions.Rmd
@@ -0,0 +1,110 @@
+---
+title: "Altmetric Mentions"
+date: 2018-07-26
+output:
+  html_notebook: default
+  github_document: default
+---
+
+```{r setup, include = FALSE}
+library(tidyverse)
+
+mentions <- read_csv('../files_out/20180411_1510_mentions.csv')
+articles <- read_csv('../files_out/20180411_1510_master.csv')
+authors <- read_csv('../files_out/20180411_1510_authors.csv')
+
+articles_with_mentions <- left_join(articles, mentions)
+articles_with_mentions_and_authors <- left_join(articles_with_mentions, authors)
+
+```
+
+A Mention (sometimes also called a *post* in Altmetric parlance) is a piece of content in which a specific article is mentioned. Collecting these mentions is pretty much the core piece of value that Altmetric add.
+
+As ever, the most fun with Mentions can be had by joining them to the master Articles dataset. However, the Altmetric client also extracts Authors information from each Mention and adds it to a third dataset (see the Authors notebook for more about this set).
+
+```{r Shows the journals that are most mentioned in news articles}
+
+arrange(
+  summarise(
+    group_by(
+      filter(articles_with_mentions,
+             source == "news"),
+      journal_title
+    ),
+    total = n()
+  ),
+  desc(total)
+)
+
+
+```
+
+## Mentions Data Dictionary
+
+The following fields are included in the Mentions dataset
+
+### author_id
+
+**Data type: character**
+
+Used to join Mentions to the Authors dataset. Author information is extracted for each Mention and added to a separate set, which enables analysis of the engagement particular authors have with the Articles in the master dataset. For example, which papers have been mentioned by the Twitter users with the highest numbers of followers?
+
+```{r Articles ordered by the total number of followers of those that tweeted}
+
+arrange(
+  summarise(
+    group_by(filter(articles_with_mentions_and_authors, source == "twitter"),
+    article_title),
+    total_followers = sum(author_follower_count)
+  ),
+  desc(total_followers)
+)
+
+```
+
+The above depends upon counting two tweets about one article by a twitter user with n followers as being tweeted at 2n followers. Given that not all followers of a twitter user see all the tweets that user posts, this seems a reasonable assumption.
+
+### date_posted
+
+**Data type: POSIX Calendar Time**
+
+The date upon which the Mention was posted. This is incredibly useful for trending the timeline of the 'buzz' around a specific Article (in the case below, the most mentioned one in the test set - 10.1920/bn.ifs.2017.bn0211).
+
+```{r Trend of the buzz around a specific article}
+
+article_mentions <- filter(mentions, doi == "10.1920/bn.ifs.2017.bn0211")
+
+ggplot(data = article_mentions, mapping = aes(x = date_posted)) +
+  geom_freqpoly(binwidth = 86400)
+
+```
+
+### doi
+
+**Data type: character**
+
+The key used to link a Mention back to the Article it mentioned in the master Articles set.
+
+### source
+
+**Data type: character**
+
+The source of the Mention. These can be charted using the query below (Twitter usually swamps all the others at time of writing).
+
+```{r Charts a count of mentions by source}
+
+ggplot(data = mentions) +
+  geom_bar(mapping = aes(x = source)) +
+  coord_flip()
+
+```
+
+
+### url
+
+**Data type: character**
+
+The URL of the Mention (i.e. the place on the internet that Altmetric found it).
+ + + diff --git a/altmetric_data_analysis/Notebook_Mentions.nb.html b/altmetric_data_analysis/Notebook_Mentions.nb.html new file mode 100644 index 0000000..1d5b645 --- /dev/null +++ b/altmetric_data_analysis/Notebook_Mentions.nb.html @@ -0,0 +1,345 @@ + + + + + + + + + + + + + + +Altmetric Mentions + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +

A Mention (sometimes also called a post in Altmetric parlance) is a piece of content in which a specific article is mentioned. Collecting these mentions is pretty much the core piece of value that Altmetric add.

+

As ever, the most fun with Mentions can be had by joining them to the master Articles dataset. However, the Altmetric client also extracts Authors information from each Mention and adds it to a third dataset (see the Authors notebook for more about this set).

+ + + +
arrange(
+  summarise(
+    group_by(
+      filter(articles_with_mentions, 
+             source == "news"),
+      journal_title
+    ),
+    total = n()
+  ),
+  desc(total)
+)
+ + +
# A tibble: 45 x 2
+   journal_title                    total
+   <chr>                            <int>
+ 1 The Lancet                         458
+ 2 Institute for Fiscal Studies       178
+ 3 Annals of Internal Medicine        109
+ 4 Journal of Cancer Policy            53
+ 5 Tobacco Control                     45
+ 6 Journal of Economic Perspectives    39
+ 7 Health Affairs                      34
+ 8 New England Journal of Medicine     33
+ 9 JAMA Oncology                       31
+10 Journal of Clinical Oncology        22
+# ... with 35 more rows
+ + + +
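The nested calls above have to be read from the inside out. As a minimal sketch, the same query can be written with the `%>%` pipe so that the steps appear in the order they run. The `toy_mentions` data frame below is invented illustration data standing in for `articles_with_mentions`; only the column names come from the notebook.

```r
library(dplyr)

# Hypothetical stand-in for articles_with_mentions (values are made up)
toy_mentions <- data.frame(
  journal_title = c("Journal A", "Journal A", "Journal B", "Journal B"),
  source        = c("news", "news", "news", "twitter"),
  stringsAsFactors = FALSE
)

news_by_journal <- toy_mentions %>%
  filter(source == "news") %>%   # keep only news Mentions
  group_by(journal_title) %>%    # one group per journal
  summarise(total = n()) %>%     # count the Mentions in each group
  arrange(desc(total))           # most-mentioned journal first

news_by_journal
```

Both forms should give the same table; the piped form is only a readability choice.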
+

Mentions Data Dictionary

+

The following fields are included in the Mentions dataset

+
+

author_id

+

Data type: character

+

Used to join Mentions to the Authors dataset. Author information is extracted for each Mention and added to a separate set, which enables analysis of the engagement particular authors have with the Articles in the master dataset. For example, which papers have been mentioned by the Twitter users with the highest numbers of followers?

+ + + +
arrange(
+  summarise(
+    group_by(filter(articles_with_mentions_and_authors, source == "twitter"),
+    article_title),
+    total_followers = sum(author_follower_count)
+  ),
+  desc(total_followers)
+)
+ + +
# A tibble: 205 x 2
+   article_title                                                     total_followers
+   <chr>                                                                       <int>
+ 1 Three-Year Impacts Of The Affordable Care Act: Improved Medical …         4201721
+ 2 Higher Education funding in England: past, present and options f…         3497734
+ 3 Teaching personal initiative beats traditional training in boost…         3438412
+ 4 Financing transformative health systems towards achievement of t…         3406705
+ 5 Global economic cost of smoking-attributable diseases                     3167965
+ 6 Dementia prevention, intervention, and care                               2409503
+ 7 Long-run comparisons of spending per pupil across different stag…         1953266
+ 8 Evolution and patterns of global health financing 1995–2014: dev…         1790594
+ 9 Update: Interim Guidance for Health Care Providers Caring for Pr…         1529194
+10 Machine Learning: An Applied Econometric Approach                         1358519
+# ... with 195 more rows
+ + + +

The total above counts two tweets about one article by a Twitter user with n followers as reaching 2n followers. Given that not all followers of a Twitter user see every tweet that user posts, this seems a reasonable assumption.
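If that double counting is not wanted, one alternative (a sketch, not what the notebook does) is to count each Author at most once per Article before summing. The `toy` data frame and its values are invented for illustration; only the column names `article_title`, `author_id` and `author_follower_count` are taken from the datasets described here.

```r
library(dplyr)

# Hypothetical stand-in for the Twitter rows of
# articles_with_mentions_and_authors (values are made up)
toy <- data.frame(
  article_title         = c("Paper X", "Paper X", "Paper X", "Paper Y"),
  author_id             = c("a1", "a1", "a2", "a1"),
  author_follower_count = c(100, 100, 50, 100),
  stringsAsFactors = FALSE
)

unique_reach <- toy %>%
  # drop repeat tweets by the same Author about the same Article
  distinct(article_title, author_id, author_follower_count) %>%
  group_by(article_title) %>%
  # na.rm = TRUE ignores Authors with no recorded follower count
  summarise(total_followers = sum(author_follower_count, na.rm = TRUE)) %>%
  arrange(desc(total_followers))

unique_reach
```

Here Author `a1` tweeted "Paper X" twice but contributes their 100 followers only once.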

+
+
+

date_posted

+

Data type: POSIX Calendar Time

+

The date upon which the Mention was posted. This is incredibly useful for trending the timeline of the ‘buzz’ around a specific Article (in the case below, the most mentioned one in the test set - 10.1920/bn.ifs.2017.bn0211).

+ + + +
article_mentions <- filter(mentions, doi == "10.1920/bn.ifs.2017.bn0211")
+ggplot(data = article_mentions, mapping = aes(x = date_posted)) +
+  geom_freqpoly(binwidth = 86400)
+ + +

+ + + +
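The `binwidth = 86400` in the chunk above is the number of seconds in a day: `date_posted` is a POSIX date-time, so bin widths are measured in seconds. As a rough tabular counterpart to the chart, the sketch below counts Mentions per calendar day. The `toy_mentions` values are invented, and the use of UTC is an assumption about how the timestamps should be read.

```r
library(dplyr)

# Hypothetical Mentions with made-up timestamps
toy_mentions <- data.frame(
  date_posted = as.POSIXct(
    c("2018-01-01 09:00:00", "2018-01-01 17:30:00", "2018-01-02 08:15:00"),
    tz = "UTC"
  )
)

seconds_per_day <- 24 * 60 * 60  # the 86400 used as binwidth

daily_counts <- toy_mentions %>%
  mutate(day = as.Date(date_posted, tz = "UTC")) %>%  # truncate to calendar day
  count(day)                                          # Mentions per day, in column n

daily_counts
```

The histogram bins drawn by `geom_freqpoly()` need not start at midnight, so the chart and this table may differ slightly at day boundaries.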
+
+

doi

+

Data type: character

+

The key used to link a Mention back to the Article it mentioned in the master Articles set.
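The setup chunk calls `left_join()` without a `by` argument, in which case dplyr joins on every column name the two tables share and prints a message saying which ones it used. If `doi` is the only intended key, naming it explicitly makes the join easier to audit. The sketch below uses invented DOIs and titles, not rows from the real files.

```r
library(dplyr)

# Hypothetical Articles and Mentions (DOIs and titles are made up)
toy_articles <- data.frame(
  doi           = c("10.1000/a", "10.1000/b"),
  article_title = c("Paper A", "Paper B"),
  stringsAsFactors = FALSE
)
toy_mentions <- data.frame(
  doi    = c("10.1000/a", "10.1000/a"),
  source = c("twitter", "news"),
  stringsAsFactors = FALSE
)

# Keep every Article; Articles with no Mentions get NA in the Mention columns
joined <- left_join(toy_articles, toy_mentions, by = "doi")

joined
```

"Paper B" has no Mentions, so it survives the join as a single row with `source` set to `NA`.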

+
+
+

source

+

Data type: character

+

The source of the Mention. These can be charted using the query below (Twitter usually swamps all the others at time of writing).

+ + + +
ggplot(data = mentions) +
+  geom_bar(mapping = aes(x = source)) +
+  coord_flip()
+ + +

+ + + +
+
+

url

+

Data type: character

+

The URL of the Mention (i.e. the place on the internet that Altmetric found it).

+ +
+
+ +
LS0tCnRpdGxlOiAiQWx0bWV0cmljIE1lbnRpb25zIgpkYXRlOiAyMDE4LTA3LTI2Cm91dHB1dDogCiAgaHRtbF9ub3RlYm9vazogZGVmYXVsdAogIGdpdGh1Yl9kb2N1bWVudDogZGVmYXVsdAotLS0KCmBgYHtyIHNldHVwLCBpbmNsdWRlID0gRkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQoKbWVudGlvbnMgPC0gcmVhZF9jc3YoJy4uL2ZpbGVzX291dC8yMDE4MDQxMV8xNTEwX21lbnRpb25zLmNzdicpCmFydGljbGVzIDwtIHJlYWRfY3N2KCcuLi9maWxlc19vdXQvMjAxODA0MTFfMTUxMF9tYXN0ZXIuY3N2JykKYXV0aG9ycyA8LSByZWFkX2NzdignLi4vZmlsZXNfb3V0LzIwMTgwNDExXzE1MTBfYXV0aG9ycy5jc3YnKQoKYXJ0aWNsZXNfd2l0aF9tZW50aW9ucyA8LSBsZWZ0X2pvaW4oYXJ0aWNsZXMsIG1lbnRpb25zKQphcnRpY2xlc193aXRoX21lbnRpb25zX2FuZF9hdXRob3JzIDwtIGxlZnRfam9pbihhcnRpY2xlc193aXRoX21lbnRpb25zLCBhdXRob3JzKQoKYGBgCgpBIE1lbnRpb24gKHNvbWV0aW1lcyBhbHNvIGNhbGxlZCBhICpwb3N0KiBpbiBBbHRtZXRyaWMgcGFybGFuY2UpIGlzIGEgcGllY2Ugb2YgY29udGVudCBpbiB3aGljaCBhIHNwZWNpZmljIGFydGljbGUgaXMgbWVudGlvbmVkLiBDb2xsZWN0aW5nIHRoZXNlIG1lbnRpb25zIGlzIHByZXR0eSBtdWNoIHRoZSBjb3JlIHBpZWNlIG9mIHZhbHVlIHRoYXQgQWx0bWV0cmljIGFkZC4gCgpBcyBldmVyLCB0aGUgbW9zdCBmdW4gd2l0aCBNZW50aW9ucyBjYW4gYmUgaGFkIGJ5IGpvaW5pbmcgdGhlbSB0byB0aGUgbWFzdGVyIEFydGljbGVzIGRhdGFzZXQuIEhvd2V2ZXIsIHRoZSBBbHRtZXRyaWMgY2xpZW50IGFsc28gZXh0cmFjdHMgQXV0aG9ycyBpbmZvcm1hdGlvbiBmcm9tIGVhY2ggTWVudGlvbiBhbmQgYWRkcyBpdCB0byBhIHRoaXJkIGRhdGFzZXQgKHNlZSB0aGUgQXV0aG9ycyBub3RlYm9vayBmb3IgbW9yZSBhYm91dCB0aGlzIHNldCkuCgpgYGB7ciBTaG93cyB0aGUgam91cm5hbHMgdGhhdCBhcmUgbW9zdCBtZW50aW9uZWQgaW4gbmV3cyBhcnRpY2xlc30KCmFycmFuZ2UoCiAgc3VtbWFyaXNlKAogICAgZ3JvdXBfYnkoCiAgICAgIGZpbHRlcihhcnRpY2xlc193aXRoX21lbnRpb25zLCAKICAgICAgICAgICAgIHNvdXJjZSA9PSAibmV3cyIpLAogICAgICBqb3VybmFsX3RpdGxlCiAgICApLAogICAgdG90YWwgPSBuKCkKICApLAogIGRlc2ModG90YWwpCikKCgpgYGAKCiMjIE1lbnRpb25zIERhdGEgRGljdGlvbmFyeQoKVGhlIGZvbGxvd2luZyBmaWVsZHMgYXJlIGluY2x1ZGVkIGluIHRoZSBNZW50aW9ucyBkYXRhc2V0CgojIyMgYXV0aG9yX2lkCgoqKkRhdGEgdHlwZTogY2hhcmFjdGVyKioKClVzZWQgdG8gam9pbiBNZW50aW9ucyB0byB0aGUgQXV0aG9ycyBkYXRhc2V0LiBBdXRob3IgaW5mb3JtYXRpb24gaXMgZXh0cmFjdGVkIGZvciBlYWNoIE1lbnRpb24gYW5kIGFkZGVkIHRvIGEgc2VwYXJhdGUgc2V0LCB3aGljaCBlbmFibGVzIGFuYWx5c2lzIG9mIHRo
ZSBlbmdhZ2VtZW50IHBhcnRpY3VsYXIgYXV0aG9ycyBoYXZlIHdpdGggdGhlIEFydGljbGVzIGluIHRoZSBtYXN0ZXIgZGF0YXNldC4gRm9yIGV4YW1wbGUsIHdoaWNoIHBhcGVycyBoYXZlIGJlZW4gbWVudGlvbmVkIGJ5IHRoZSBUd2l0dGVyIHVzZXJzIHdpdGggdGhlIGhpZ2hlc3QgbnVtYmVycyBvZiBmb2xsb3dlcnM/CgpgYGB7ciBBcnRpY2xlcyBvcmRlcmVkIGJ5IHRoZSB0b3RhbCBudW1iZXIgb2YgZm9sbG93ZXJzIG9mIHRob3NlIHRoYXQgdHdlZXRlZH0KCmFycmFuZ2UoCiAgc3VtbWFyaXNlKAogICAgZ3JvdXBfYnkoZmlsdGVyKGFydGljbGVzX3dpdGhfbWVudGlvbnNfYW5kX2F1dGhvcnMsIHNvdXJjZSA9PSAidHdpdHRlciIpLAogICAgYXJ0aWNsZV90aXRsZSksCiAgICB0b3RhbF9mb2xsb3dlcnMgPSBzdW0oYXV0aG9yX2ZvbGxvd2VyX2NvdW50KQogICksCiAgZGVzYyh0b3RhbF9mb2xsb3dlcnMpCikKCmBgYAoKVGhlIGFib3ZlIGRlcGVuZHMgdXBvbiBjb3VudGluZyB0d28gdHdlZXRzIGFib3V0IG9uZSBhcnRpY2xlIGJ5IGEgdHdpdHRlciB1c2VyIHdpdGggbiBmb2xsb3dlcnMgYXMgYmVpbmcgdHdlZXRlZCBhdCAybiBmb2xsb3dlcnMuIEdpdmVuIHRoYXQgbm90IGFsbCBmb2xsb3dlcnMgb2YgYSB0d2l0dGVyIHVzZXIgc2VlIGFsbCB0aGUgdHdlZXRzIHRoYXQgdXNlciBwb3N0cywgdGhpcyBzZWVtcyBhIHJlYXNvbmFibGUgYXNzdW1wdGlvbi4gCgojIyMgZGF0ZV9wb3N0ZWQKCioqRGF0YSB0eXBlOiBQT1NJWCBDYWxlbmRhciBUaW1lKioKClRoZSBkYXRlIHVwb24gd2hpY2ggdGhlIE1lbnRpb24gd2FzIHBvc3RlZC4gVGhpcyBpcyBpbmNyZWRpYmx5IHVzZWZ1bCBmb3IgdHJlbmRpbmcgdGhlIHRpbWVsaW5lIG9mIHRoZSAnYnV6eicgYXJvdW5kIGEgc3BlY2lmaWMgQXJ0aWNsZSAoaW4gdGhlIGNhc2UgYmVsb3csIHRoZSBtb3N0IG1lbnRpb25lZCBvbmUgaW4gdGhlIHRlc3Qgc2V0IC0gMTAuMTkyMC9ibi5pZnMuMjAxNy5ibjAyMTEpLgoKYGBge3IgVHJlbmQgb2YgdGhlIGJ1enogYXJvdW5kIGEgc3BlY2lmaWMgYXJ0aWNsZX0KCmFydGljbGVfbWVudGlvbnMgPC0gZmlsdGVyKG1lbnRpb25zLCBkb2kgPT0gIjEwLjE5MjAvYm4uaWZzLjIwMTcuYm4wMjExIikKCmdncGxvdChkYXRhID0gYXJ0aWNsZV9tZW50aW9ucywgbWFwcGluZyA9IGFlcyh4ID0gZGF0ZV9wb3N0ZWQpKSArCiAgZ2VvbV9mcmVxcG9seShiaW53aWR0aCA9IDg2NDAwKQoKYGBgCgojIyMgZG9pCgoqKkRhdGEgdHlwZTogY2hhcmFjdGVyKioKClRoZSBrZXkgdXNlZCB0byBsaW5rIGEgTWVudGlvbiBiYWNrIHRvIHRoZSBBcnRpY2xlIGl0IG1lbnRpb25lZCBpbiB0aGUgbWFzdGVyIEFydGljbGVzIHNldC4KCiMjIyBzb3VyY2UKCioqRGF0YSB0eXBlOiBjaGFyYWN0ZXIqKgoKVGhlIHNvdXJjZSBvZiB0aGUgTWVudGlvbi4gVGhlc2UgY2FuIGJlIGNoYXJ0ZWQgdXNpbmcgdGhlIHF1ZXJ5IGJlbG93IChUd2l0dGVyIHVzdWFsbHkgc3dhbXBzIGFsbCB0
aGUgb3RoZXJzIGF0IHRpbWUgb2Ygd3JpdGluZykuIAoKYGBge3IgQ2hhcnRzIGEgY291bnQgb2YgbWVudGlvbnMgYnkgc291cmNlfQoKZ2dwbG90KGRhdGEgPSBtZW50aW9ucykgKwogIGdlb21fYmFyKG1hcHBpbmcgPSBhZXMoeCA9IHNvdXJjZSkpICsKICBjb29yZF9mbGlwKCkKCmBgYAoKCiMjIyB1cmwKCioqRGF0YSB0eXBlOiBjaGFyYWN0ZXIqKgoKVGhlIFVSTCBvZiB0aGUgTWVudGlvbiAoaS5lLiB0aGUgcGxhY2Ugb24gdGhlIGludGVybmV0IHRoYXQgQWx0bWV0cmljIGZvdW5kIGl0KS4KCgoK
+ + + +
+ + + + + + + +
diff --git a/altmetric_data_analysis/author_summaries.R b/altmetric_data_analysis/author_summaries.R
index b1fdb39..fb0b1bf 100644
--- a/altmetric_data_analysis/author_summaries.R
+++ b/altmetric_data_analysis/author_summaries.R
@@ -73,4 +73,16 @@ ggplot(data = followers_and_news) +
   geom_point(mapping = aes(x = total_followers, y = n))
 
+arrange(
+  summarise(
+    group_by(
+      filter(mentions_with_authors, source == "policy"),
+      author_image_url
+    ),
+    total = n()
+  ),
+  desc(total)
+) %>% write.csv("../files_out/policy_author_urls.csv")
+
+