diff --git a/bib/proceedings-of-the-royal-society-b.csl b/bib/proceedings-of-the-royal-society-b.csl
new file mode 100644
index 0000000..93e6707
--- /dev/null
+++ b/bib/proceedings-of-the-royal-society-b.csl
@@ -0,0 +1,197 @@
+
+
diff --git a/ecoevo_1000.Rproj b/ecoevo_1000.Rproj
new file mode 100644
index 0000000..8e3c2eb
--- /dev/null
+++ b/ecoevo_1000.Rproj
@@ -0,0 +1,13 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
diff --git a/ms/ms.qmd b/ms/ms.qmd
index 3936cea..eab57dd 100644
--- a/ms/ms.qmd
+++ b/ms/ms.qmd
@@ -1,7 +1,7 @@
 ---
 title: "The promise of community-driven preprints in ecology and evolution"
 bibliography: ../bib/refs.bib
-csl: ../bib/nature.csl
+csl: ../bib/proceedings-of-the-royal-society-b.csl
 format:
   docx:
     reference-doc: ../bib/template.docx
@@ -21,8 +21,8 @@ crossref:
   fig-title: 'Figure'
   fig-labels: arabic
   title-delim: "-"
-  fig-prefix: "Figure"
-  tbl-prefix: "Table"
+  fig-prefix: "figure"
+  tbl-prefix: "table"
 ---
 ```{r, setup}
@@ -107,7 +107,10 @@ $\ddagger$ corresponding author, daniel.noble@anu.edu.au
 \* equal contribution
-# Introduction
+# Abstract
+Publishing preprints has become an entrenched practice across a multitude of scientific disciplines and it is quickly becoming commonplace in ecology and evolutionary biology. Preprints can facilitate the rapid sharing of scientific knowledge to establish precedence, speed up the dissemination of research findings, and enable feedback from the research community before peer review. Yet, significant barriers to preprint use exist including language barriers, a lack of understanding about the benefits of preprints and a lack of diversity in the types of research outputs accepted (e.g., reports).
Community-driven preprint initiatives can allow a research community to come together to break down these barriers and move scientific publishing practices into new and exciting directions that promote greater equity and better coverage of global knowledge. Here, we explore the first preprints uploaded to *EcoEvoRxiv*, a community-driven preprint server for ecologists and evolutionary biologists, to characterise preprint practices in ecology, evolution and conservation. Our perspective piece highlights some of the unique initiatives that *EcoEvoRxiv* has taken to break down barriers to scientific publishing by exploring the composition of articles, how gender and career stage influence preprint use, whether preprints are associated with greater open science principles (e.g., code and data sharing), and tracking preprint publication outcomes. Our analysis identifies areas that we still need to improve upon but highlights how community-driven initiatives, such as *EcoEvoRxiv*, can play a crucial role in shaping publishing practices in biology.
+
+# 1. Introduction
 Publishing preprints -- papers communicating non-peer-reviewed research findings -- is now an entrenched practice across a multitude of scientific disciplines [@Ginsparg2011]. Preprints in biology have had a slower uptake relative to other disciplines [@Berg2016], but new discipline-specific preprint servers, such as *EcoEvoRxiv* (https://ecoevorxiv.org), provide a means by which ecologists and evolutionary biologists can disseminate research findings.
Preprints attempt to break down barriers to scientific publishing by: 1) increasing the visibility of research and the speed at which research findings become available, which can lead to more citations [e.g., @colavizza2024analysis; @fu2019releasing]; 2) helping establish the precedence of research findings; 3) removing financial barriers to open access publication; and 4) enabling feedback from the research community [@Proulx2013; @bourne2017ten; @vale2015accelerating]. Ultimately, preprints can facilitate the rapid sharing of scientific knowledge that can have significant impacts on fundamental and applied knowledge globally [@ni2024preprint].
@@ -117,13 +120,13 @@ Preprint servers can empower researchers to make their research findings more ac
 Here, we explore the first preprints uploaded to *EcoEvoRxiv* to characterise preprint practices in ecology and evolution. We aim to understand: 1) in what countries authors who use *EcoEvoRxiv* are located; 2) the taxonomic diversity of study systems across preprints; 3) whether preprint server use depends on career stage and gender; 4) the extent to which authors make use of preprint servers for reports and community-driven peer review; 5) the extent to which data and code are shared in preprints; and 6) how many preprints remain unpublished, and for those that are published, how long it took for them to become published. In the process, we also provide a summary of what makes *EcoEvoRxiv* distinct from other preprint servers to help further clarify the benefits of using community-driven preprint servers to disseminate research findings.
-# Getting to know your *EcoEvoRxiv* preprint server
+# 2. Getting to know your *EcoEvoRxiv* preprint server
 *EcoEvoRxiv* is run by the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE) [@o2021towards].
Originally launched in 2018 on the Center for Open Science preprint platform, *EcoEvoRxiv* has become a popular preprint server for ecologists and evolutionary biologists. The server has since been adopted by the California Digital Library (CDL). Editors are ecologists and evolutionary biologists from across the globe who volunteer their time to screen papers and push new initiatives in the preprint space. *EcoEvoRxiv* allows authors to post both preprints and postprints (also known as author-accepted manuscripts). While preprints are versions of manuscripts posted by authors before peer review, postprints are versions of peer-reviewed and accepted articles but without typesetting and formatting by a journal. The main reason for publishing postprints on a preprint server is to ensure published articles are openly accessible to everyone without a paywall (i.e., green open access). Postprints can be posted at any time, provided that journals allow it (which most do; see https://www.sherpa.ac.uk/romeo/).
 ```{r, fig-summary}
 #| label: fig-summary
-#| fig-cap: Summary of articles posted to *EcoEvoRxiv*. A) Number of articles (preprints and postprints) published on *EcoEvoRxiv* between 2018 and 2023. *EcoEvoRxiv* was established in June 2018 before the launch in November 2018. Notable milestones include *EcoEvoRxiv* transitioning to the California Digital Library (CDL), the acceptance of preprints and postprints in Spanish and Portuguese, and the acceptance of the first IUCN Red List Ecosystem report; B) Geographic origin of the preprints and postprints uploaded to *EcoEvoRxiv*, inferred from the country of affiliation of the submitting author; C) Taxa used/covered in the articles posted to *EcoEvoRxiv* (n = 1080 articles); D) Types of preprints accepted on *EcoEvoRxiv* (n = 620 articles). E) Academic age of authors posting to *EcoEvoRxiv* along with the gender of the submitting author.
Values lower than zero are indicative of authors who uploaded preprints before their first scientific publication in a journal. Map base source: R Package "maps" v.3.4.2. Shapefile: Natural Earth https://www.naturalearthdata.com/about/terms-of-use/.
+#| fig-cap: "Summary of articles posted to *EcoEvoRxiv*. A) Number of articles (preprints and postprints) published on *EcoEvoRxiv* between 2018 and 2023. *EcoEvoRxiv* was established in June 2018 before the launch in November 2018. Notable milestones include *EcoEvoRxiv* transitioning to the California Digital Library (CDL), the acceptance of preprints and postprints in Spanish and Portuguese, and the acceptance of the first IUCN Red List Ecosystem report; B) Geographic origin of the preprints and postprints uploaded to *EcoEvoRxiv*, inferred from the country of affiliation of the submitting author; C) Taxa used/covered in the articles posted to *EcoEvoRxiv* (n = 1080 articles); D) Types of preprints accepted on *EcoEvoRxiv* (n = 620 articles). E) Academic age of authors posting to *EcoEvoRxiv* along with the gender of the submitting author. Values lower than zero are indicative of authors who uploaded preprints before their first scientific publication in a journal."
 #| echo: false
 #| warning: false
@@ -351,30 +354,30 @@ ggsave(here("output", "figs", "sex_ratio.svg"), sex_ratio)
 ```
-### *Overview of **EcoEvoRxiv** preprints (and postprints)*
+### (a) *Overview of **EcoEvoRxiv** preprints (and postprints)*
 To better understand preprint (and postprint) use on *EcoEvoRxiv*, we downloaded metadata on the articles available on *EcoEvoRxiv* as of `r gsub("UTC", "", dates[2])` (see Supplement for more details on methods). We consider both preprints and postprints as ‘articles’.
After removing five duplicate titles – suggesting that a few authors created multiple submissions for the same preprint rather than updating the existing submission – we had data for a total of `r pr` articles, with approximately 55--60 preprints posted per month over the last two years ([@fig-summary]A). *EcoEvoRxiv* hosts articles from authors based in `r length(unique(countries$country))` countries, with 90% of the articles coming from just `r nrow(most_use)-1` countries. Authors in North America, Australia, and Europe upload the most preprints, with far fewer coming from countries in Africa, Central America, and parts of Asia ([@fig-summary]B). Articles covered all major taxonomic groups, with the most common being vertebrates (`r sum_taxa$perc[6]`%), plants (`r sum_taxa$perc[1]`%), and invertebrates (`r sum_taxa$perc[5]`%) ([@fig-summary]C).
-### *Diversifying article types on **EcoEvoRxiv**: overcoming the 'grey literature' problem*
+### (b) *Diversifying article types on **EcoEvoRxiv**: overcoming the 'grey literature' problem*
 Accepting a greater diversity of article types allows *EcoEvoRxiv* to help deal with the 'grey literature' problem, whereby data that are relevant for research syntheses are not published in typical peer-reviewed journals [@haddaway2015shades; @haddaway2020eight]. *EcoEvoRxiv* has made a concerted effort to diversify the types of articles accepted. This is reflected by `r nt`% of the articles on *EcoEvoRxiv* being books, book chapters, reports, and other research output types, which are typically considered ‘grey literature’ in ecology and evolutionary biology. As a result, articles on *EcoEvoRxiv* are more diverse than those on preprint servers that have more restrictive submission policies. For example, *bioRxiv* only accepts research articles (https://www.biorxiv.org/submit-a-manuscript).
Typical research articles are still the most common type of preprint on *EcoEvoRxiv* (`r sum_types[sum_types$type == "research article",]$perc`%), followed by reviews and meta-analyses (`r sum_types[sum_types$type == "reviews and meta-analyses",]$perc`%) and opinion papers (`r sum_types[sum_types$type == "opinion",]$perc`%) ([@fig-summary]D). Currently, *EcoEvoRxiv* does not host many reports, particularly from government or industry, but has formed a fruitful partnership with the International Union for Conservation of Nature (IUCN). For example, IUCN Red List Ecosystem reports are now posted to *EcoEvoRxiv*, and our community has been able to work closely with the IUCN to ensure these documents meet the IUCN requirements. We encourage authors to consider posting books, book chapters, and reports to ensure that they are openly accessible and more easily found. Each *EcoEvoRxiv* article is given a unique DOI and is indexed on Google Scholar.
-### *Breaking down language barriers to scientific communication: improving diversity and data representation globally*
+### (c) *Breaking down language barriers to scientific communication: improving diversity and data representation globally*
 A significant barrier to the communication of research findings is the fact that they are primarily communicated in English [@amano2016languages; @amano2013four; @amano2021ten]. Research communication through a single language has major consequences for the global distribution of knowledge, resulting in knowledge gaps across some of the most biodiverse and threatened regions in the world [@amano2023role; @zenni2023multi]. Such gaps also impact research syntheses and meta-analyses because they create a distorted picture of our knowledge base that can affect future research, policy development and decision-making [@hannah2024language; @zenni2023multi; @white2021geographical; @konno2020ignoring].
*EcoEvoRxiv* is the only preprint server to date that breaks down language barriers to scientific communication by accepting not only English, but also Spanish, Portuguese and French language articles. *EcoEvoRxiv* plans to expand to other languages as editors who can handle those languages become available. Such initiatives are incredibly important if we are to begin filling global voids of scientific knowledge. However, multilingual initiatives have been slow to take off on *EcoEvoRxiv*, with only a few Spanish articles, and a single Portuguese article, posted since the server began accepting non-English articles in 2023. Part of the challenge in getting authors to submit non-English articles is the lack of awareness of *EcoEvoRxiv* in non-English-speaking countries, cultural differences in the perception of preprints, and a strong reliance on traditional publishing models that typically mandate publishing in English [@arenas2024academic].
-### *Generational and gender-based gaps in preprinting practices*
+### (d) *Generational and gender-based gaps in preprinting practices*
 Research papers can take a while to be published (see below). However, Early and Mid-Career Researchers (EMCRs) (~10 years post-PhD) are under pressure to publish rapidly to be competitive in job applications, promotions, and obtaining grants to progress their careers [@vale2015accelerating; @sarabipour2019value]. Preprints are one way EMCRs can achieve faster dissemination and greater visibility [@fu2019releasing]. As such, EMCRs may be expected to make use of preprints more than colleagues at later career stages. We collected data on the 'academic age' of submitting authors by looking at Google Scholar profiles of authors (when available) and recording their first year of publication in a peer-reviewed journal.
While this is a rough estimate of career stage, there was evidence that the number of preprints posted decreases with later career stages (negative binomial glm: year slope = `r res[2, "Estimate"]`, SE: `r round(res[2, "Std. Error"], 2)`, *p* < 0.001, n = `r nrow(count_dat[complete.cases(count_dat),])` years). Most preprints were submitted by authors who published their first paper in the last ~10 years ([@fig-summary]E), with the median year since first publication being `r median(data2$submitting_author_first_publication_year, na.rm = TRUE)` (mean = `r mean(data2$submitting_author_first_publication_year, na.rm = TRUE)`; SD = `r sd(data2$submitting_author_first_publication_year, na.rm = TRUE)`, n = `r sum(complete.cases(data2$submitting_author_first_publication_year))`). These patterns support the expectation that EMCRs may use preprints to make their work more visible and disseminate their findings more quickly. However, we acknowledge that validating this conclusion does require more rigorous experimental approaches. Gender differences in preprint use and publication outcomes have also been observed in several research fields, including ecology and evolutionary biology [@fox2019gender; @wehner2020comparison]. Therefore, such discrepancies are expected to manifest in preprint use on *EcoEvoRxiv*, but it is unclear to what extent. Understanding gender publishing patterns is challenging with observational data such as ours because we cannot know the gender of authors for certain, but we can use a data-driven approach to ascertain the probability that a particular name is of a given gender (man or woman). To obtain a rough idea of an author's gender, we used the R package *gender* (v.`r utils::packageVersion("gender")`; @mullen2021predict) to predict the most likely gender of the submitting author of a preprint. We only used algorithm-assigned names where the gender was identified with 95% certainty. 
For the remaining names, we performed manual searches to determine gender based on the pronouns and photographs from professional and personal websites. We acknowledge that our approach does not capture self-assigned and non-binary genders. As such, our assumptions about an author's gender identity may be incorrect. Our data on gender had only two missing values: one where the first name of the submitting author was missing and another for a collective submission. As expected, we found that women were less likely than men to publish on *EcoEvoRxiv* (women: `r (table(data4$gender)[1] / sum(table(data4$gender))*100)`%), which may reflect the broader publishing disparities between male and female scientists [@fox2019gender].
-# Following the journey of a preprint on *EcoEvoRxiv*: from submission to publication
+# 3. Following the journey of a preprint on *EcoEvoRxiv*: from submission to publication
 ```{r, fig-pubsummary}
 #| label: fig-pubsummary
 #| echo: false
@@ -453,19 +456,19 @@ fig2
 ```
-### *Science takes time, but publication could take longer*
+### (a) *Science takes time, but publication could take longer*
 Increased competition in science has raised the bar with respect to the amount of data required for publication [@vale2015accelerating]. This requirement is a good outcome if it results in higher-impact research that better clarifies our understanding of the natural world, but it does come at a cost for the speed of research dissemination [@vale2015accelerating]. Preprints have been proposed as a way to disseminate research more quickly as it can take a long time before results are ultimately published [@bourne2017ten; @vale2015accelerating]. However, data on the time to publication are needed to quantify the real benefit of preprints in this context.
We estimated how long it takes to publish a research paper in ecology and evolution by recording the time between an article's first posting on *EcoEvoRxiv* and its final acceptance in a peer-reviewed journal. In total, `r unpub` papers remained unpublished (`r (unpub / nrow(data2))*100`%, n = `r nrow(data2)`) at the time when these data were collected. Not all of these papers, however, are anticipated to be published in a peer-reviewed journal (e.g., reports). Nonetheless, the median time to publication for preprints was `r sum_preprints$median` days (8 months) (mean = `r sum_preprints$mean`; SD = `r sum_preprints$sd` days) with the maximum time to publication being `r sum_preprints$max` days or `r sum_preprints$max/365` years ([@fig-pubsummary]A). Our results largely confirm the extended timeframes that most authors experience between writing their research papers and their publication.
-### *Cautious 'open'-mindedness of research in preprints*
+### (b) *Cautious 'open'-mindedness of research in preprints*
 In addition to speeding up dissemination, preprints and postprints can also be a useful way to ensure that research remains open and accessible to the research community irrespective of the accessibility of the final peer-reviewed paper [@bourne2017ten; @vale2015accelerating]. We evaluated whether articles hosted on *EcoEvoRxiv* that were also published in a journal were published open access. The open access status of each published article was obtained using the R package *roadoi* (v.`r utils::packageVersion("roadoi")`) to connect to the Unpaywall platform [@jahn2024]. Most of the published articles were open access [`r sum_is_oa$valid_percent[2]*100`% (*n* = `r sum_is_oa$n[2]` out of `r sum(sum_is_oa$n[1:2])` where the status was known); [@fig-pubsummary]A]; however, `r sum_is_oa$valid_percent[1]*100`% (*n* = `r sum_is_oa$n[1]`) were published behind paywalls.
For articles published in open access journals, the type of open access also varied widely (e.g., Gold, Hybrid, Green OA; [@fig-pubsummary]B). Data and code sharing are also key components of open science [@roche2015public]. In the spirit of 'openness', we expected data and code sharing among preprints to be greater than in many papers published in research journals [@roche2015public; @o2021preferred]. Despite this expectation, we found that `r (preprint_data[1,"n"]/sum(preprint_data[c(1,3),"n"]))*100`% (*n* = `r preprint_data[1,2]`) of articles relying on data on *EcoEvoRxiv* did not share data, and `r (preprint_code[1,"n"]/sum(preprint_code[c(1,3),"n"]))*100`% (*n* = `r preprint_code[1,2]`) did not share code (counting only data-based articles, i.e., excluding reviews, commentaries or theoretical works). Authors may be reluctant to share data and code for preprints because of a concern that others may acquire and use their data and code before publication in a journal. Authors of `r pre_article_data[3,"percent"]*100`% (*n* = `r pre_article_data[3,2]`) of articles that did not share data at the preprint stage did ultimately share data when the article was published, whereas authors of `r pre_article_data[2,"percent"]*100`% (*n* = `r pre_article_data[2,2]`) never shared data. However, `r pre_article_data[1,"percent"]*100`% (*n* = `r pre_article_data[1,2]`) shared data at both stages. The same was true for code. Overall, `r pre_article_code[3,"percent"]*100`% (*n* = `r pre_article_code[3,2]`) of preprints had no open code at the preprint stage but did at the published article stage, and authors of `r pre_article_code[2,"percent"]*100`% (*n* = `r pre_article_code[2,2]`) of preprints did not share code at either stage. However, `r pre_article_code[1,"percent"]*100`% (*n* = `r pre_article_code[1,2]`) shared code at both stages.
The relatively low rates of code and data sharing in our sample are consistent with analyses of sharing practices for published articles [e.g., @o2021preferred], even for journals with strict public data archiving policies [@roche2015public].
-# Paving our future to open, transparent and community-driven science
+# 4. Paving our future to open, transparent and community-driven science
 Our analysis has allowed us to better understand preprinting/postprinting practices in *EcoEvoRxiv*. Overall, *EcoEvoRxiv* articles are diverse, although primary research articles and studies on vertebrates are the most common. Authors in North America, Europe and Australia use *EcoEvoRxiv* the most, and very few non-English language articles have been posted to date. Submitting authors who were earlier in their career, and those with ‘male-associated names’, tended to use *EcoEvoRxiv* the most. Articles posted to *EcoEvoRxiv* take a median of about 8 months to be published, and many are not open access. Code and data sharing were also relatively uncommon at the preprint stage. We also attempted to collect data on community discussion around preprints, but no such data were found on preprint landing pages, likely reflecting inadequate functionality and cross-linking with sources where such discussion is occurring. Based on the insights from our analysis, we provide recommendations to authors and the scientific community on ways they can further promote open and transparent research through preprints:
@@ -476,13 +479,16 @@ Our analysis has allowed us to better understand preprinting/postprinting practi
 Despite the early successes of the new initiatives taken by *EcoEvoRxiv*, as described above, much work remains to be done to improve the understanding and use of pre- and postprints within our community. We view this perspective piece as a small step towards achieving that goal.
We hope that readers are now more familiar with the benefits of using community-driven preprint servers and the unique initiatives they can pursue. Community-driven preprint servers can set their own agenda and are driven by the needs and desires of the community. Supporting these initiatives should be a priority for all researchers. Volunteers at *EcoEvoRxiv* are encouraged to remain open to new and innovative ways to improve publication and open science practices. We believe that the future of preprints is bright, and community-driven initiatives, such as *EcoEvoRxiv*, will play a crucial role in the future of scientific publishing.
-# Acknowledgements
-We would like to thank the California Digital Library (CDL) and the CDL team (particularly, Alainna Wrigley, Justin Gonder, Lisa Schiff, Catherine Mitchell, Hardy Pottinger and Amanda Karby) for their support in hosting and maintaining *EcoEvoRxiv* for the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE). We would like to thank Gabriela Hidalgo and Daisy Larios for helping connect us with the IUCN and facilitating discussions to make *EcoEvoRxiv* a place where IUCN reports can be posted. Finally, we would also like to thank the endless number of SORTEE volunteers, and those especially on the *EcoEvoRxiv* Committee, who have helped to make *EcoEvoRxiv* a success. This paper emerged from a hackathon at the 2023 SORTEE conference, and we thank delegates who attended the session but could not be part of this paper. DWAN would also like to thank the Australian Research Council for a Future Fellowship (FT220100276). SN and ML are supported by the Australian Research Council (ARC) Discovery Project Grants (DP210100812 and DP230101248).
-
-# Data and Code Availability
+# Data accessibility
 All data and code can be found on GitHub at: https://github.com/daniel1noble/ecoevo_1000
-# Conflict of Interest
+# Competing interests
 The authors would like to acknowledge competing interests on the perspectives presented in this paper given that many (DWAN, SN, ML) are founding members of *EcoEvoRxiv* and/or are part of the *EcoEvoRxiv* committee.
+# Funding
+DWAN thanks the Australian Research Council for a Future Fellowship (FT220100276). SN and ML are supported by the Australian Research Council (ARC) Discovery Project Grants (DP210100812 and DP230101248).
+
+# Acknowledgements
+We would like to thank the California Digital Library (CDL) and the CDL team (particularly, Alainna Wrigley, Justin Gonder, Lisa Schiff, Catherine Mitchell, Hardy Pottinger and Amanda Karby) for their support in hosting and maintaining *EcoEvoRxiv* for the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE). We would like to thank Gabriela Hidalgo and Daisy Larios for helping connect us with the IUCN and facilitating discussions to make *EcoEvoRxiv* a place where IUCN reports can be posted. Finally, we would also like to thank the many SORTEE volunteers, especially those on the *EcoEvoRxiv* Committee, who have helped to make *EcoEvoRxiv* a success. This paper emerged from a hackathon at the 2023 SORTEE conference, and we thank delegates who attended the session but could not be part of this paper.
+
 # References
\ No newline at end of file
diff --git a/plos_biol/ms.pdf b/plos_biol/ms.pdf
new file mode 100644
index 0000000..62f288a
Binary files /dev/null and b/plos_biol/ms.pdf differ