JournalAnalysis matches Europe PMC articles to journal-level metrics from Scimago or InCites (JCR). This tutorial walks through a typical workflow: define a search, retrieve and inspect results, summarize abstracts, rank journals, and export a shortlist.
Load JournalAnalysis and dplyr for the filtering steps later in the tutorial.
Europe PMC queries use the same boolean syntax as the website search box. You can pass one query or combine several; multiple queries are unioned before filtering.
JournalAnalysis also ships example query strings you can adapt:
query3
#> [1] "microbiome AND (psychiatry OR psychology OR neuroscience) AND (rhesus OR macaque or human or stress or monkey) AND (NOT ecology)"For this tutorial we use query3, which targets
microbiome papers in psychiatry, psychology, and neuroscience-related
contexts while excluding ecology papers.
See the Europe PMC search help for field names and operators.
get_publication_data() is the main entry point. It:
scimago or
incities (InCites / JCR)Start with a modest limit while you refine the
query:
pub_data <- get_publication_data(
journal_source = "scimago",
queries = query3,
limit = 200,
min_year = 2015,
min_citations = 3,
n_cores = 1
)
#> 28861 records found, returning 200
#> Removed records published before 2015.
#> Removed records with less than 3 citations.
#> Removed records with NA values for pmid, doi, and authors.
#> 17 records passed the filter.
length(pub_data)
#> [1] 3
names(pub_data)
#> [1] "journals" "articles" "combined"
vapply(pub_data, nrow, integer(1))
#> journals articles combined
#> 16 17 17The returned list has three elements:
| Element | Description |
|---|---|
pub_data$articles |
Article metadata from Europe PMC |
pub_data$journals |
Journal metrics for matched ISSNs |
pub_data$combined |
Articles joined to journal metadata |
After retrieval, inspect each list element to confirm the query and filters behaved as expected.
dplyr::glimpse(pub_data$articles)
#> Rows: 17
#> Columns: 30
#> $ id <chr> "41077197", "40420360", "40558535", "41092898", …
#> $ source <chr> "MED", "MED", "MED", "MED", "MED", "MED", "MED",…
#> $ pmid <chr> "41077197", "40420360", "40558535", "41092898", …
#> $ pmcid <chr> "PMC12666862", "PMC12106051", "PMC12190894", "PM…
#> $ doi <chr> "10.1016/j.jinf.2025.106626", "10.1002/alz.70273…
#> $ title <chr> "Age-related severity of nontuberculous mycobact…
#> $ authorString <chr> "Napier EG, Doratt BM, Cinco IR, Stuart EV, Gero…
#> $ journalTitle <chr> "J Infect", "Alzheimers Dement", "Cells", "Neuro…
#> $ pubYear <int> 2025, 2025, 2025, 2025, 2025, 2025, 2024, 2025, …
#> $ journalIssn <chr> "01634453; 15322742;", "15525260; 15525279;", "2…
#> $ pageInfo <chr> "106626", "e70273", "908", "4107-4133", "226", "…
#> $ pubType <chr> "research-article; journal article", "review-art…
#> $ isOpenAccess <chr> "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ inEPMC <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ inPMC <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ hasPDF <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ hasBook <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
#> $ hasSuppl <chr> "Y", "Y", "N", "N", "N", "N", "Y", "N", "Y", "Y"…
#> $ citedByCount <dbl> 3, 10, 3, 8, 8, 3, 4, 7, 3, 3, 38, 10, 12, 4, 4,…
#> $ hasReferences <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ hasTextMinedTerms <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ hasDbCrossReferences <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
#> $ hasLabsLinks <chr> "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"…
#> $ hasTMAccessionNumbers <chr> "Y", "N", "N", "N", "N", "N", "Y", "N", "N", "N"…
#> $ firstIndexDate <chr> "2025-10-13", "2025-05-27", "2025-06-26", "2025-…
#> $ firstPublicationDate <chr> "2025-10-10", "2025-05-01", "2025-06-16", "2025-…
#> $ journalVolume <chr> "91", "21", "14", "113", "30", "17", "10", "19",…
#> $ issue <chr> "5", "5", "12", "24", "1", "5", "12", NA, NA, "1…
#> $ ISSN.1 <chr> "01634453", "15525260", "20734409", "08966273", …
#> $ ISSN.2 <chr> "15322742;", "15525279;", "", "10974199;", "2047…
dplyr::glimpse(pub_data$journals)
#> Rows: 16
#> Columns: 28
#> $ Rank <int> 115, 306, 349, 501, 752, 800, 900, 1215, 144…
#> $ Sourceid <dbl> 17978, 19700182758, 3600148102, 50032, 21100…
#> $ Title <chr> "Neuron", "Nature Communications", "Alzheime…
#> $ Type <chr> "journal", "journal", "journal", "journal", …
#> $ Issn <chr> "10974199, 08966273", "20411723", "15525279,…
#> $ Publisher <chr> "Cell Press", "Nature Research", "John Wiley…
#> $ Open.Access <chr> "No", "Yes", "No", "Yes", "No", "Yes", "No",…
#> $ Open.Access.Diamond <chr> "No", "No", "No", "No", "No", "No", "No", "N…
#> $ SJR <dbl> 8.564, 4.904, 4.530, 3.685, 2.906, 2.827, 2.…
#> $ SJR.Best.Quartile <chr> "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q1", "Q…
#> $ H.index <int> 569, 634, 194, 189, 124, 144, 247, 152, 193,…
#> $ Total.Docs...2025. <int> 348, 11528, 1244, 300, 12, 538, 365, 289, 83…
#> $ Total.Docs...3years. <int> 1058, 26313, 1938, 947, 28, 1598, 1033, 1072…
#> $ Total.Refs. <int> 27520, 755955, 72380, 24581, 1953, 30815, 14…
#> $ Total.Citations..3years. <int> 12733, 452297, 17963, 12017, 277, 10831, 542…
#> $ Citable.Docs...3years. <int> 778, 25888, 1394, 946, 25, 1593, 700, 394, 2…
#> $ Citations...Doc...2years. <dbl> 11.25, 16.60, 11.93, 11.19, 7.86, 6.38, 5.10…
#> $ Ref....Doc. <dbl> 79.08, 65.58, 58.18, 81.94, 162.75, 57.28, 3…
#> $ X.Female <dbl> 41.08, 37.59, 49.88, 47.54, 75.56, 43.29, 46…
#> $ Overton <int> 2, 91, 6, 1, 0, 0, 0, 7, 6, 0, 1, 0, 6, 0, 0…
#> $ Country <chr> "United States", "United Kingdom", "United S…
#> $ Region <chr> "Northern America", "Western Europe", "North…
#> $ Coverage <chr> "1988-2026", "2010-2026", "2005-2026", "2004…
#> $ Categories <chr> "Neuroscience (miscellaneous) (Q1)", "Bioche…
#> $ Areas <chr> "Neuroscience", "Biochemistry, Genetics and …
#> $ ISSN <chr> "10974199; 08966273", "20411723", "15525279;…
#> $ ISSN.1 <chr> "10974199", "20411723", "15525279", "1742209…
#> $ ISSN.2 <chr> "08966273", "", "15525260", "", "21683492", …
dplyr::glimpse(pub_data$combined)
#> Rows: 17
#> Columns: 55
#> $ id <chr> "41077197", "40420360", "40558535", "4109289…
#> $ source <chr> "MED", "MED", "MED", "MED", "MED", "MED", "M…
#> $ pmid <chr> "41077197", "40420360", "40558535", "4109289…
#> $ pmcid <chr> "PMC12666862", "PMC12106051", "PMC12190894",…
#> $ doi <chr> "10.1016/j.jinf.2025.106626", "10.1002/alz.7…
#> $ title <chr> "Age-related severity of nontuberculous myco…
#> $ authorString <chr> "Napier EG, Doratt BM, Cinco IR, Stuart EV, …
#> $ journalTitle <chr> "J Infect", "Alzheimers Dement", "Cells", "N…
#> $ pubYear <int> 2025, 2025, 2025, 2025, 2025, 2025, 2024, 20…
#> $ journalIssn <chr> "01634453; 15322742;", "15525260; 15525279;"…
#> $ pageInfo <chr> "106626", "e70273", "908", "4107-4133", "226…
#> $ pubType <chr> "research-article; journal article", "review…
#> $ isOpenAccess <chr> "Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "Y",…
#> $ inEPMC <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ inPMC <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ hasPDF <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ hasBook <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N",…
#> $ hasSuppl <chr> "Y", "Y", "N", "N", "N", "N", "Y", "N", "Y",…
#> $ citedByCount <dbl> 3, 10, 3, 8, 8, 3, 4, 7, 3, 3, 38, 10, 12, 4…
#> $ hasReferences <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ hasTextMinedTerms <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ hasDbCrossReferences <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N",…
#> $ hasLabsLinks <chr> "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
#> $ hasTMAccessionNumbers <chr> "Y", "N", "N", "N", "N", "N", "Y", "N", "N",…
#> $ firstIndexDate <chr> "2025-10-13", "2025-05-27", "2025-06-26", "2…
#> $ firstPublicationDate <chr> "2025-10-10", "2025-05-01", "2025-06-16", "2…
#> $ journalVolume <chr> "91", "21", "14", "113", "30", "17", "10", "…
#> $ issue <chr> "5", "5", "12", "24", "1", "5", "12", NA, NA…
#> $ ISSN.1 <chr> "01634453", "15525260", "20734409", "0896627…
#> $ ISSN.2 <chr> "15322742;", "15525279;", "", "10974199;", "…
#> $ Rank <int> 1215, 349, 1977, 115, 31880, 6244, 2223, 266…
#> $ Sourceid <dbl> 22428, 3600148102, 21100978391, 17978, 13221…
#> $ Title <chr> "Journal of Infection", "Alzheimer's and Dem…
#> $ Type <chr> "journal", "journal", "journal", "journal", …
#> $ Issn <chr> "01634453, 15322742", "15525279, 15525260", …
#> $ Publisher <chr> "W.B. Saunders Ltd", "John Wiley and Sons In…
#> $ Open.Access <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes…
#> $ Open.Access.Diamond <chr> "No", "No", "No", "No", "No", "No", "No", "N…
#> $ SJR <dbl> 2.180, 4.530, 1.687, 8.564, NA, 0.856, 1.575…
#> $ SJR.Best.Quartile <chr> "Q1", "Q1", "Q1", "Q1", "-", "Q2", "Q1", "Q2…
#> $ H.index <int> 152, 194, 196, 569, 78, 33, 62, 152, 115, 19…
#> $ Total.Docs...2025. <int> 289, 1244, 2022, 348, 174, 205, 251, 245, 14…
#> $ Total.Docs...3years. <int> 1072, 1938, 9075, 1058, 1543, 310, 579, 1625…
#> $ Total.Refs. <int> 10607, 72380, 189277, 27520, 0, 12013, 16063…
#> $ Total.Citations..3years. <int> 3945, 17963, 58352, 12733, 7128, 1050, 2788,…
#> $ Citable.Docs...3years. <int> 394, 1394, 8892, 778, 1534, 305, 576, 1459, …
#> $ Citations...Doc...2years. <dbl> 4.41, 11.93, 6.13, 11.25, 4.74, 3.50, 4.24, …
#> $ Ref....Doc. <dbl> 36.70, 58.18, 93.61, 79.08, 0.00, 58.60, 64.…
#> $ X.Female <dbl> 46.76, 49.88, 47.25, 41.08, 41.83, 45.26, 45…
#> $ Overton <int> 7, 6, 0, 2, 1, 0, 1, 0, 0, 0, 1, 91, 6, 0, 0…
#> $ Country <chr> "United Kingdom", "United States", "Switzerl…
#> $ Region <chr> "Western Europe", "Northern America", "Weste…
#> $ Coverage <chr> "1979-2026", "2005-2026", "2011-2026", "1988…
#> $ Categories <chr> "Infectious Diseases (Q1); Microbiology (med…
#> $ Areas <chr> "Medicine", "Medicine; Neuroscience", "Bioch…PubMed IDs for the matched articles:
get_word_cloud() fetches abstracts for a vector of
PubMed IDs and writes a PNG word cloud:
wordcloud_path <- "microbiome_psych_wordcloud.png"
get_word_cloud(
pubmed_ids = pmids,
plot_name = wordcloud_path
)
#> There are total 17 PMIDs
#> Warning in tm_map.SimpleCorpus(abstTxt, removePunctuation): transformation
#> drops documents
#> Warning in tm_map.SimpleCorpus(text2.corpus, function(x) removeNumbers(x)):
#> transformation drops documents
#> Warning in tm_map.SimpleCorpus(text2.corpus, tolower): transformation drops
#> documents
#> Warning in tm_map.SimpleCorpus(text2.corpus, removeWords,
#> stopwords("english")): transformation drops documents
#> png
#> 2
knitr::include_graphics(wordcloud_path)Filter the matched journals to subject areas of interest, then keep titles at or above the median SJR within that subset.
cats <- "Multidisciplinary|Neuroscience|Psychology|Psychiatry"
best_journals <- pub_data$journals |>
filter(
Type == "journal",
SJR >= median(SJR, na.rm = TRUE),
grepl(cats, Categories)
) |>
select(Title, Rank, Type, SJR, Country, Categories)
best_journals
#> # A tibble: 5 × 6
#> Title Rank Type SJR Country Categories
#> <chr> <int> <chr> <dbl> <chr> <chr>
#> 1 Neuron 115 journal 8.56 United States Neuroscie…
#> 2 Nature Communications 306 journal 4.90 United Kingd… Biochemis…
#> 3 Alzheimer's and Dementia 349 journal 4.53 United States Cellular …
#> 4 Journal of Neuroinflammation 501 journal 3.68 United Kingd… Cellular …
#> 5 Alcohol Research: Current Reviews 752 journal 2.91 United States Clinical …Adjust the category pattern and ranking rule to match your field. For
InCites data (journal_source = "incities", or aliases
incites / jcr), use columns such as
Journal.Impact.Factor instead of SJR.
Save the ranked journal table for review outside R.
get_publication_data()scimago and incities (InCites /
JCR) journal sources for your topic?get_publication_data and the function
reference for all parameters