Volume 25, Issue 8 p. 1465-1483
Research Article
Open Access

Seasonality of biogeochemically relevant microbial genes in a coastal ocean microbiome

Adrià Auladell

Corresponding Author

Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain

Correspondence

Adrià Auladell, Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM CSIC, 08003 Barcelona, Catalunya, Spain.

Email: [email protected]

Isabel Ferrera, Centro Oceanográfico de Málaga, Instituto Español de Oceanografía, IEO-CSIC, 29640 Fuengirola, Málaga, Spain.

Email: [email protected]

Josep M. Gasol, Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain.

Email: [email protected]

Contribution: Visualization (equal), Writing - review & editing (equal)

Isabel Ferrera

Corresponding Author

Centro Oceanográfico de Málaga, Instituto Español de Oceanografía, IEO-CSIC, 29640 Fuengirola, Málaga, Spain

Contribution: Visualization (equal), Writing - review & editing (equal)

Lidia Montiel Fontanet

Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain

Contribution: Data curation (equal), Methodology (equal)

Célio Dias Santos Júnior

Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China

Contribution: Data curation (equal), Methodology (equal)

Marta Sebastián

Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain

Contribution: Writing - review & editing (equal)

Ramiro Logares

Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain

Contribution: Writing - review & editing (equal)

Josep M. Gasol

Corresponding Author

Departament de Biologia Marina i Oceanografia, Institut de Ciències del Mar, ICM-CSIC, 08003 Barcelona, Catalunya, Spain

First published: 12 March 2023

Abstract

Microbes drive the biogeochemical cycles of marine ecosystems through their vast metabolic diversity. While we have a fairly good understanding of the spatial distribution of these metabolic processes in various ecosystems, less is known about their seasonal dynamics. We investigated the annual patterns of 21 biogeochemically relevant functions in an oligotrophic coastal ocean site by analysing the presence of key genes, their high-rank taxonomy and the dynamics of nucleotide variants. Most genes presented seasonality: photoheterotrophic processes were enriched during spring, phosphorus-related genes were dominant during summer, coinciding with potential phosphate limitation, and assimilatory nitrate reductases appeared mostly during summer and autumn, correlating negatively with nitrate availability. Additionally, we identified the main taxa driving each function at each season and described the role of underrecognized taxa such as Litoricolaceae in carbon fixation (rbcL), urea degradation (ureC), and CO oxidation (coxL). Finally, the seasonality of single variants of some families presented a decoupling between the taxonomic abundance patterns and the functional gene patterns, implying functional specialization of the different genera. Our study unveils the seasonality of key biogeochemical functions and the main taxonomic groups that harbour these relevant functions in a coastal ocean ecosystem.

INTRODUCTION

Microbes account for ~70% of the total marine biomass, playing key roles in ocean biogeochemical processes (Bar-On et al., 2018; Falkowski, 2012). Bacteria and archaea represent a large fraction of this biomass and hold tremendous metabolic variability (Falkowski et al., 2008). The introduction of molecular biology techniques in the late 1980s made it possible to distinguish, for the first time, the major taxonomic groups inhabiting the seas (Giovannoni et al., 1990). With the rapid expansion of omics technologies, we are transitioning from ‘who is there’ to ‘what are they doing’, unveiling the repertoire of functional genes and their impact on environmental processes (Gasol & Kirchman, 2018). Relevant findings obtained from such technologies include, among others, the discovery of novel metabolisms such as photoheterotrophy (Béjà et al., 2000), the role of urea in nitrification by marine archaea (Alonso-Sáez et al., 2012), and the importance of heterotrophs in nitrogen fixation in the ocean surface (Delmont et al., 2018). In turn, the discovery of new metabolisms increases our understanding of how marine biogeochemical cycles operate (Grossart et al., 2020). An illustrative case is that of the proteorhodopsin (PR) gene; after its initial discovery in the year 2000, it has been shown to be present in up to 80% of the microbial community members of the sunlit ocean (DeLong & Béjà, 2010; Yooseph et al., 2007).

The study of the abundance, taxonomic diversity and geographic distribution of key marker genes has allowed scientists to investigate various metabolic steps in relevant biogeochemical cycles (see Ferrera et al., 2015 for a review on the topic). Some examples are the study of photoheterotrophy by PR-containing bacteria and aerobic anoxygenic phototrophic (AAP) bacteria (by means of PR and pufM genes, Koblížek, 2015; Pinhassi et al., 2016), carbon monoxide (CO) oxidation and uptake (cox gene, Cordero et al., 2019; Moran & Miller, 2007), nitrate assimilation (narB and nasA genes, Martiny et al., 2009), degradation of urea (Antia et al., 1991; Su et al., 2013), or phosphorus utilization (ppx genes, Dyhrman et al., 2007). Currently, both amplicon sequencing of functional marker genes and metagenomic approaches are used to unveil the environmental distribution of these relevant genes. As an example, AAPs have been investigated in multiple marine biomes through amplicon approaches (Auladell et al., 2019; Gazulla et al., 2022; Lehours et al., 2018), and it has been found that globally the most prevalent groups are Rhodobacteraceae (Alphaproteobacteria) and Haliaceae (Gammaproteobacteria, Gazulla et al., 2022; Lehours et al., 2018). However, metagenomic approaches have allowed scientists to identify previously overlooked AAPs, such as Candidatus Luxescamonaceae (Alphaproteobacteria), which has potential for carbon fixation (Graham et al., 2018). Other study cases include genes for the phosphorus starvation response; phosphorus deficiency in the ocean exerts strong selective pressure on many organisms, and several taxonomic groups have developed strategies to overcome this phosphorus depletion (Martiny et al., 2006). Examples range from lipid remodelling by expressing phospholipases (plcP, Carini et al., 2015; Sebastián et al., 2016) to expression of alkaline phosphatases, exploiting alternative phosphorus sources (phoX and phoD, Sebastián & Ammerman, 2009).
Biogeographically, the presence of phosphorus-related genes in some taxonomic groups such as Prochlorococcus and Pelagibacterales is linked to the specific nutrient stress levels of a particular ocean basin (Haro-Moreno et al., 2020; Ustick et al., 2021).

Global expeditions in the last decade have revealed an unprecedented number of functional genes using metagenomics, providing valuable information about the diversity, phylogeny, and biogeography of these genes (Acinas et al., 2021; Salazar et al., 2019; Sunagawa et al., 2015; Ustick et al., 2021; Yooseph et al., 2007). Still, knowledge on how environmental seasonality influences biogeochemical functions is growing at a slow pace. The seasonal trends of specific groups such as photosynthetic bacteria (Paerl et al., 2012), ammonia oxidizing bacteria (Galand et al., 2010), and photoheterotrophic groups (Auladell et al., 2019; Ferrera et al., 2014; Nguyen et al., 2015) have been described through amplicon analyses. In addition, some temporal metagenomic analyses have focused on studying specific taxonomic groups through the generation of metagenome assembled genomes (MAGs; Kashtan et al., 2014; Pereira et al., 2021). Despite the potential of MAGs for the analysis of functional groups, they usually miss the contribution of rare groups as well as groups without a good genome recovery due to repetitive regions or high microdiversity (Haro-Moreno et al., 2020). Following recent discussions regarding functional redundancy (Louca et al., 2018), Galand et al. (2018) found a link between temporal community turnover and the functional repertoire of the bacterial community, hinting that functional redundancy in marine waters was rather low. Processes such as photoheterotrophy (pufM gene) or carbon fixation (bacterial and archaeal RuBisCO) are known to change between seasons, and functional richness was found to correlate with taxonomic richness (Galand et al., 2018). Another recent study looked at microbial trait variability using multiple metagenomic time series data (Beier et al., 2020) and found that larger metacommunity sizes translate into a higher temporal variability of gene alleles.
Temporal changes in the transcription of several key biogeochemical genes were also analysed during 2 years, albeit with a small sample number (Alonso-Sáez et al., 2020). The temporal expression patterns of several genes for energy conservation, such as carbon monoxide oxidation (coxL), oxidation of reduced sulfur (soxB), and the oxidation of ammonia (amoA), differed. Although metatranscriptomic analyses allow for relevant insights, transcription changes at a faster pace than community composition (Moran et al., 2013), and long-term multiannual data can offer a more robust picture of the seasonal patterns of functional groups. Through metagenomics, it is possible to quantify the enrichment of specific functions following the community turnover typical of temperate locations (Auladell et al., 2022; Fuhrman et al., 2015; Lambert et al., 2018), and through multiyear analyses we can further test whether the pattern is recurrent.

We present here a 7-year metagenomic analysis of monthly sampling in a microbial observatory in the NW Mediterranean coastal sea (Blanes Bay Microbial Observatory, BBMO), focusing on 21 functional genes coding for key biogeochemical functions. Through the analysis of these genes and using information on the environmental variables defining seasonality of the site, we have: (i) determined when each function prevails in the ecosystem, (ii) obtained a detailed picture of the main taxonomic groups harbouring each of the selected functions, and (iii) explored whether the distribution of these functions changes seasonally among genera within the same taxonomic family.

EXPERIMENTAL PROCEDURES

Sampling and sequencing procedures

We collected surface water samples from the Blanes Bay Microbial Observatory (BBMO, 41°40′ N, 2°48′ E) as described in the study by Gasol et al. (2016). This long-term station is a shallow (~20 m) coastal site about 1 km offshore on the NW Mediterranean coast. We sampled monthly from January 2009 to December 2015 (7 years) and obtained 80 samples. Several environmental parameters were collected simultaneously to generate an environmental data table with a total of 23 biotic and abiotic variables (see Auladell et al., 2022 for details). We used the astronomical equinoxes and solstices to define the seasons.

About 4 L of 200-μm pre-filtered surface seawater were sequentially filtered through a 20-μm mesh, a 3-μm pore-size polycarbonate filter (Poretics), and a 0.2-μm Sterivex Millipore filter using a peristaltic pump. Sterivex units were processed to obtain the genomic DNA (see Auladell et al., 2022), which was stored at −80°C. Sequencing of the samples was carried out in two batches. An aliquot of the 0.2–3 μm fraction from each sample was processed using a Kapa Hyper kit and quality control was done with an agarose gel and Qubit. Afterwards, the samples from the first 3 years were sequenced using an Illumina HiSeq 4000 and those from the following 4 years using an Illumina NovaSeq 6000 (2 × 150 bp, Centre Nacional d'Anàlisi Genòmica, CNAG). The 80 samples generated a total of 22.5 billion sequences with an average of 133 million reads per sample (minimum = 2.3; maximum = 232 million reads).

Trimming, assembly and gene prediction

Samples were trimmed with cutadapt v3.5 to remove low-quality reads (Martin, 2011). Each sample was then assembled individually with MEGAHIT to obtain contigs (Li et al., 2015). For each sample, we predicted the protein coding regions in the contigs using Prodigal v2.6.3 and MetaGeneMark v3.38 (Hyatt et al., 2010; Zhu et al., 2010), considering partial ORFs and a minimum fragment of 250 bp. The redundant dataset of 500 million genes was clustered at 95% identity and 80% coverage through Linclust v10 (Steinegger & Söding, 2018). The final catalogue consisted of 231 million gene variants. We do not use the term ortholog here since the 95% identity does not guarantee that each sequence is representative of a different species.

Gene annotation

We focused on a specific subset of genes with known functions involved in relevant metabolic processes (reviewed in the study by Ferrera et al., 2015). In particular, from the whole gene catalogue, we selected and annotated 24 relevant genes for the major biogeochemical cycles: coxL, rbcL subunit I, chiA, pufM, pufL, PR, psbA, tauA, phnD, phnM, pstS, phoX, phoD, ppx, ppk1, plcP, nifH, narB, nasA, hao, amoA, ureC, dmdA, and fecA (see Figure 1 for a description of each gene). These genes were selected based on two main criteria: they had to show high specificity for the function of interest (in order to avoid false positives and mis-assignments) and they had to be well characterized in the protein databases. Most of the genes were exclusive to bacteria and archaea, but rbcL and psbA are also found in eukaryotes. For most genes, we used the KOfam database of the Kyoto Encyclopedia of Genes and Genomes (KEGG), which is based on hidden Markov models (HMMs) (Aramaki et al., 2020). This database consists of an HMM for each specific KEGG ortholog (KO) and a score threshold for filtering unspecific results. For phosphorus-related genes, which are more taxonomically widespread and genetically diverse, we used a reverse PSI-BLAST v2.7 search and a custom Perl script for filtering multiple hits against the clusters of orthologous groups (COGs) (-soft_masking true -evalue 0.1) (Altschul et al., 1990; Galperin et al., 2015). Proteorhodopsins were annotated using the MicRhoDE database through Diamond v2.0.7 (--id 70, --query-cover 80 --evalue 0.1) (Boeuf et al., 2015; Buchfink et al., 2015). The putative PRs were aligned with MAFFT v7.4 together with a set of reference sequences (Olson et al., 2018). Afterwards, we looked for the presence of the amino acids implicated in the PR ion pumping mechanism (residues 97, 101, and 108) and the variations in the amino acid shown to be important for the spectral tuning of the molecule (residue 105) (Olson et al., 2018).
The most common amino acid variants for residue 105 (Q, L and M) were analysed separately (PR blue Q105, PR green L105, and PR green M105), aggregating the other variants as ‘Other PR’. Finally, we also differentiated between two types of coxL—CODHI and II—by checking for the presence of an amino acid signature distinguishing the variants (AYXCSFR, King & Weber, 2007). In this analysis, we only kept CODHI since it is the only variant with proven oxidation potential (King & Weber, 2007). After generating the abundance table, we observed that the chiA, hao, and nifH genes presented a low number of detected variants (min = 1, max = 8 variants) with a small read count per sample (min = 1, max = 420 read counts). Specifically, there was only one variant of chiA detected, while nifH presented four variants with a total read count of 336 reads, and hao presented eight variants with a total read count of 2200 reads. As a comparison, when we observed the distribution of amoA the total read count was 36,000 (min = 1, max = 2774 read counts), 16 times more than hao. Given that these three genes did not present enough data to determine with precision their temporal trends on a multi-year basis, they were excluded from subsequent analyses, which were performed using 21 genes.
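The two sequence-signature rules described above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline used in the study: it assumes each putative PR has already been aligned so that the residue at reference position 105 is known, and it reads the `X` in the coxL signature AYXCSFR as "any amino acid". The names `classify_pr` and `is_codh1_coxl` and the toy sequences are hypothetical.

```python
import re

# Residue-105 variants named in the text; all other residues are pooled.
PR_TYPES = {"Q": "PR blue Q105", "L": "PR green L105", "M": "PR green M105"}

# CODHI signature AYXCSFR, with X read as any amino acid (assumption).
CODH1_MOTIF = re.compile(r"AY.CSFR")


def classify_pr(residue_105: str) -> str:
    """Label a proteorhodopsin by the amino acid aligned to position 105."""
    return PR_TYPES.get(residue_105.upper(), "Other PR")


def is_codh1_coxl(protein: str) -> bool:
    """Return True when the protein sequence contains the CODHI signature."""
    return CODH1_MOTIF.search(protein.upper()) is not None


# Toy inputs (hypothetical sequences, for illustration only).
labels = [classify_pr(r) for r in ("Q", "L", "M", "E")]
keep_first = is_codh1_coxl("MKTAYRCSFRGG")
keep_second = is_codh1_coxl("MKTAYRGAGRGG")
```

In a real workflow the residue at position 105 would be read from the alignment column corresponding to the reference numbering, which this sketch does not attempt.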

FIGURE 1. Diagram presenting the selected functional genes and their relevance in biogeochemical cycles. Solid lines indicate prokaryote-mediated processes and the key functional genes involved in the processes are shown in boxes. The genes ppk1, ppx and plcP are not shown in the diagram. The table below is a summary of the properties of the genes in the study. The column named ‘Relevance in biogeochemical cycles’ specifies the literature references used for selecting the genes.

Read mapping

We used Diamond to match the raw reads from each sample to our gene database (--query-cover 90, --identity 95, --top 5 --min-score 20). The output presented the top five matches for each read. Since proteins present conserved regions that could recruit reads incorrectly, and to avoid misassignments, we filtered the five top matches through the Functional Analysis of Metagenomes by Likelihood Inference (FAMLI v1.2) algorithm (Golob & Minot, 2020). Briefly, FAMLI iteratively assigns multi-mapping reads to the most likely true peptide by checking the coverage evenness along the length of the sequence.

Taxonomy

To assign taxonomy to each gene variant, we used the last common ancestor (LCA) algorithm as implemented in MMseqs2 v13 (Mirdita et al., 2021; Steinegger & Söding, 2017). For each contig in the database, MMseqs2 predicts the individual coding sequences, establishes the putative taxonomy of the genes through the LCA, and checks the whole contig taxonomy concordance. We used the Genome Taxonomy Database (GTDB, release 95), presenting 194,600 genomes in 31,910 species clusters (Parks et al., 2018). The taxonomy was also assigned with UniRef90 to obtain matches for eukaryotic genes (Suzek et al., 2007). For the variants matching Pelagibacteraceae, an additional step was performed to improve the taxonomic assignment. In a previous study, Haro-Moreno et al. (2020) obtained SAGs of SAR11 clade bacteria collected at the BBMO long-term station. Through a BLAST analysis (-perc_identity 95, -max_target_seqs 10, -cov 95), we differentiated the gMED subclade (Haro-Moreno et al., 2020) by matching these variants against the SAG genomes. Table S1 links the classic NCBI nomenclature with the GTDB nomenclature, providing references with the reasoning behind specific name changes.
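The general idea behind an LCA assignment can be illustrated with a short Python sketch. It is a simplified stand-in, not the MMseqs2 implementation: it assumes every hit is given as a lineage listed from the highest rank downwards, and it simply keeps the ranks on which all hits agree. The function name `common_ancestor` and the shortened three-rank lineages are hypothetical.

```python
def common_ancestor(lineages):
    """Return the rank-by-rank prefix shared by every lineage in the list."""
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ranks missing in any hit are dropped.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # first rank where the hits disagree
        shared.append(ranks[0])
    return shared


# Toy hits for one gene variant (shortened lineages, illustration only).
hits = [
    ["Bacteria", "Alphaproteobacteria", "Rhodobacteraceae"],
    ["Bacteria", "Alphaproteobacteria", "Pelagibacteraceae"],
]
assignment = common_ancestor(hits)
```

Here the two hits disagree at the family level, so the variant would only be assigned down to the class.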

Statistics

We performed all the analyses in R v3.5 (R Core Team, 2014). We used tidyverse v1.3 to process the data and ggplot2 v3.2 for all visualizations. For the analysis of seasonality, the gene variant read counts were transformed to ratios. Sample-wise, the gene read count was divided by the geometric mean of the read counts of a set of eight single-copy genes (GTP1, pheS, argS, serS, cysS, tsaD, ffh, ftsY), obtaining a ratio instead of a relative abundance. Mathematically, working with ratios instead of relative abundances avoids the proportion constraint: if the samples are transformed to proportions summing to 100%, a substantial increase in one gene necessarily forces the others to decrease (Gloor et al., 2017). Single-copy gene abundances are used as a denominator to obtain a common scale. The single-copy gene read count was correlated with total sample sequencing depth (see Figure S7), so dividing by it removes the influence of sequencing depth. To test whether each of the genes displayed seasonality (that is, recurrent changes over time), we used the Lomb–Scargle periodogram (LSP) as implemented in the lomb package v1.2 (Ruf, 1999). Briefly, the LSP determines the spectrum of frequencies (the different sine waves with periods of, e.g., half a year or 1 year) composing the dataset. Afterwards, through randomization, it tests whether the observed periods could occur by chance (q ≤ 0.01, FDR correction). Through the peak normalized (PN) score, we determine how strong the recurrence of an analysed gene is. We considered the results as seasonal only if PN was above 8 and q ≤ 0.01. In a previous study, we used a threshold of PN ≥ 10, but we lowered it because it proved too stringent in an analysis of the same dataset with an alternative methodology called the recurrence index (Giner et al., 2019). We found that PN = 8 presented better concordance between methods (Supplementary Information S1).
The seasonal test was only applied to gene variants present in at least eight samples (which is 10% of the samples). Finally, we wanted to disentangle whether the gene variants clustered by season. We performed an ordination analysis of all the seasonal gene variants using the Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2020). UMAP is a novel dimension reduction technique used when the datasets are complex and large. Given that we have ~7000 seasonal gene variants, this approach is faster and more comprehensive than common ordinations such as the non-metric multidimensional scaling (NMDS). To display the temporal patterns of the gene variants, we used a generalized additive model fitting the abundance values along the day of the year, allowing a smoothing parameter with 12 knots and with a cyclic cubic regression splines option to force the end and start of the year to match.
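The periodogram step described above can be sketched in Python. This is a minimal illustration of the classical variance-normalized Lomb–Scargle power evaluated at a few trial periods; it is not the lomb R package, its scaling may differ from the PN score reported here, and it omits the randomization test and FDR correction. The function name `lomb_scargle_power` and the synthetic monthly series are assumptions made for the example.

```python
import math


def lomb_scargle_power(times, values, period):
    """Variance-normalized Lomb-Scargle power at one trial period."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi / period
    # The offset tau makes the sine and cosine terms orthogonal
    # for unevenly spaced sampling times.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
    ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return (yc * yc / cc + ys * ys / ss) / (2.0 * var)


# Synthetic example: roughly monthly sampling over 7 years (days), with a
# purely annual signal. Real data would be the gene abundance ratios.
times = [30 * i + (i % 3) for i in range(84)]
values = [math.sin(2.0 * math.pi * t / 365.0) for t in times]
periods = [182.5, 365.0, 730.0]
powers = {p: lomb_scargle_power(times, values, p) for p in periods}
best_period = max(powers, key=powers.get)
```

A full analysis would scan a dense grid of periods and compare the peak against a null distribution obtained by shuffling the values, which is what the randomization step in the text refers to.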

RESULTS AND DISCUSSION

Environmental setting

The BBMO is a well-studied temperate shallow coastal site in the NW Mediterranean subjected to strong seasonal forcing. Its environmental characteristics have been studied for more than 25 years, providing a rather complete understanding of the main biotic and abiotic processes determining its ecosystem's ecology (Gasol et al., 2016). The environmental seasonality is typical for a temperate coastal system (Figure S1). The summer presents low dissolved inorganic nutrients (mean of 0.6 and 0.08 μM for NO₃⁻ and PO₄³⁻, respectively) and the microbial community is strongly limited by phosphorus (Pinhassi et al., 2006; Sebastián et al., 2016). With the start of autumn, increased precipitation, changes in wind regimes and water column mixing in the nearby open sea facilitate the arrival of inorganic nutrients that increase bacterial richness (Auladell et al., 2022; Mestre et al., 2020). In late winter, the ecosystem reaches the highest phytoplankton biomass (chlorophyll a, average 0.88 μg L⁻¹), dominated by photosynthetic nanoflagellates (mainly haptophytes) and diatom blooms (Nunes et al., 2018). During spring, the growth of phytoplankton and heterotrophic bacteria (~9 × 10⁵ cells mL⁻¹) depletes most of the dissolved nutrients. Day length is maximal by the end of the spring (15 h), preceding the start of summer and closing the seasonal cycle. These trends strongly influence bacterial community composition, which presents strong seasonality (Alonso-Sáez et al., 2007; Auladell et al., 2022; Krabberød et al., 2022; Mestre et al., 2020).

Marker gene richness follows whole community patterns

The marker genes chosen in this study belong to various functional categories and biogeochemical cycles, outlined in Figure 1. For the carbon cycle, we selected phototrophic processes (PR, pufM, pufL and psbA), carbon fixation (rbcL), oxidation of inorganic compounds such as carbon monoxide (coxL), and transport of taurine (tauA), which is also part of the sulfur cycle; for the nitrogen cycle, we chose nitrate reductases (narB, nasA), the cleavage of urea (ureC) and ammonia oxidation (amoA); for phosphorus biogeochemistry, we selected the phnD, phnM, pstS, phoD, phoX, ppx, ppk1, and plcP genes, involved in multiple processes to overcome phosphorus starvation (Figure 1); for sulfur biogeochemistry, we analysed the dmdA gene, involved in the demethylation of dimethylsulfoniopropionate (DMSP); and for the iron cycle, we selected fecA, encoding a ferric iron transmembrane transporter. A total of 93,550 gene variants of this gene set (that is, sequences differentiated at 95% identity, sometimes from the same species and therefore not true orthologs) were observed (Table 1). We calculated the total abundance of each gene and the seasonal changes in richness (Figure 2). The genes with the highest number of variants were those related to phosphorus metabolism (min 2730, max 14,683). Other genes such as tauA and fecA also presented a high number of variants (1392 and 10,926, respectively). The high variability of fecA has been discussed in a recent study (Beier et al., 2020) and has been linked to the process in which phages incorporate iron atoms into their tails to infect microbes (a theory known as the ‘Ferrojan Horse Hypothesis’, Bonnain et al., 2016). In contrast, the amoA and narB genes presented the lowest richness values (25 and 23 variants), suggesting phylogenetic constraints. Regarding proteorhodopsins, the blue-absorbing type was the most diverse (727 variants), while the two green-absorbing types presented around 300 variants each (Figure 2).
The richness of most of these genes was highest during autumn and winter, with minimum values in summer, following the same pattern observed for the whole community species richness (Spearman correlation = 0.37, p = 0.004, N = 56; 16S rRNA gene data from Auladell et al., 2022), and in agreement with the observations in a nearby microbial observatory (~95 km away) showing that taxonomic and functional richness are linked (Galand et al., 2018). Nonetheless, a few genes differed from this general trend; the richness of dmdA variants reached maximum values during late winter, with a median of 70 variants (Figure S2). Additionally, there was some variability in the general trend and in the number of variants from year to year (Figure S2). The gene encoding photosystem II (psbA) presented bimodality in its richness distribution during the 7 years; we found up to 40 variants in some samples whereas others had only 20 (Figure S2). This pattern was the result of an increase of variants from multiple cyanobacterial groups during spring and summer, the seasons when Synechococcus abundance is highest (Figure S2B). The presence of multiple Cyanobiaceae psbA variants could indicate multiple coexisting Synechococcus ecotypes during these seasons, as shown elsewhere for Prochlorococcus (Kashtan et al., 2014). At a global scale, the maximal abundance of Synechococcus is associated with low nutrient concentrations and high temperatures (Flombaum et al., 2013; Hunter-Cevera et al., 2016). Here, we observed that there are more variants during spring and summer, coinciding with the maximal abundances. Whether this is due to different populations contributing to the spring–summer blooms or to intragenomic psbA variation remains unknown. Finally, both narB and nasA, which encode nitrate reductases, followed a similar pattern with the highest diversity during summer, although this pattern was based on a small number of variants and could therefore be uncertain.

TABLE 1. Numerical summary of the selected functional genes.
Name | Total variants | Total evaluated | Seasonal variants | Percentage seasonal (%) | Median richness | Q25 richness | Q75 richness
Carbon cycle
PR blue | 727 | 313 | 143 | 45.7 | 114.5 | 86.8 | 132
PR green (M105) | 329 | 156 | 60 | 38.5 | 55 | 46 | 62
PR green (L105) | 250 | 79 | 32 | 40.5 | 26 | 22.8 | 30
pufM | 119 | 33 | 8 | 24.2 | 10 | 7.8 | 12
pufL | 117 | 40 | 11 | 27.5 | 14.5 | 12 | 17
rbcL I | 228 | 53 | 20 | 37.7 | 19 | 17 | 22
psbA | 545 | 61 | 11 | 18.0 | 17.5 | 13 | 41
coxL | 43 | 19 | 11 | 57.9 | 7 | 6 | 8
tauA | 1392 | 396 | 139 | 35.1 | 136 | 111.8 | 155.2
Nitrogen cycle
narB | 23 | 12 | 7 | 58.3 | 4.5 | 3 | 6
nasA | 72 | 18 | 5 | 27.8 | 5 | 3.5 | 7
amoA | 25 | 11 | 5 | 45.5 | 3 | 2 | 4.5
ureC | 283 | 69 | 22 | 31.9 | 19 | 16 | 28
Phosphorus cycle
phnD | 6044 | 1492 | 476 | 31.9 | 497.5 | 377.2 | 602.8
phnM | 2730 | 710 | 175 | 24.6 | 236 | 201.8 | 262.5
pstS | 10,340 | 2393 | 534 | 22.3 | 785 | 668.2 | 900.8
phoD | 12,417 | 2973 | 853 | 28.7 | 945.5 | 746.5 | 1190
phoX | 6964 | 1656 | 504 | 30.4 | 524.5 | 387.2 | 706.8
ppx | 14,683 | 3450 | 728 | 21.1 | 1109 | 901.8 | 1263.2
ppk1 | 14,512 | 2973 | 480 | 16.1 | 974.5 | 847.2 | 1082
plcP | 10,466 | 2796 | 735 | 26.3 | 906.5 | 797 | 1026.2
Other
dmdA | 315 | 168 | 74 | 44.0 | 66 | 52 | 73
fecA | 10,926 | 3621 | 1360 | 37.6 | 1335 | 1080.8 | 1506.2
Total | 93,550 | 23,492 | 6393 | 27.2 | | |
Note: The genes are grouped by the biogeochemical cycles to which they are related. ‘Annotation’ specifies the database used for annotation; ‘Total Evaluated’ indicates the number of variants present in at least eight samples; ‘Seasonal variants’ are those which are found to be seasonal according to the Lomb–Scargle test (q ≤ 0.05, PN ≥ 8); ‘Percentage seasonal’ is the % of seasonal variants with respect to the number of variants evaluated; Median, Q25 and Q75 are the median, first and third quartiles of the distribution of the number of variants of the specific gene.
FIGURE 2. (A) Total number of variants detected for each studied functional gene and (B) total number of variants of the proteorhodopsin types. The X-axes indicate the number of variants on a logarithmic scale. The main biogeochemical cycles with which the genes are associated are presented in different panels. The colours are specific for each gene. (C) Gene richness seasonal trends. The X-axis presents the month of the year and the Y-axis the richness, scaled to the mean. Each gene presents a generalized additive model fitted to the data, coloured following the colour code presented in panel A.

Most genes present seasonal changes of abundance

To test whether the studied genes presented seasonal abundance changes, we used the ratio between the read counts of a particular gene and the geometric mean of the counts of eight single-copy genes (see Experimental Procedures). The values are thus abundance ratios (hereafter ‘abundance’), indicating the relative abundance of these genes in the bacterial genomes (Figure 3). Gene abundances can present interannual variations, and to test for recurrence we used the LSP. A total of 12 out of the 21 tested genes presented a significant seasonal pattern (q ≤ 0.05, PN ≥ 8; Figure 3). Among the genes that were not statistically seasonal, we could differentiate genes that presented a random pattern (e.g. fecA) from genes presenting temporal variations that were not strong enough to be detected by the Lomb–Scargle method (e.g. psbA). As an example, tauA displayed a high monthly variation but, overall, its relative abundance was higher during spring and summer than during winter (Figure 3). Along with the changes in abundance ratios among seasons, we also determined the changes in taxonomic composition of each of the target genes to explore which groups encoded the different biogeochemical functions (Figure 4, Figure S3).
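The abundance ratio used here can be written out as a small Python sketch. It only illustrates the arithmetic described in the text (gene reads divided by the geometric mean of the single-copy gene reads in the same sample); the function name `abundance_ratio` and the read counts are made up for the example, and all single-copy counts are assumed to be positive.

```python
from statistics import geometric_mean


def abundance_ratio(gene_reads, single_copy_reads):
    """Gene read count divided by the geometric mean of single-copy gene reads."""
    return gene_reads / geometric_mean(single_copy_reads)


# Hypothetical read counts for one sample: eight single-copy genes
# (GTP1, pheS, argS, serS, cysS, tsaD, ffh, ftsY) and one target gene.
single_copy = [200, 200, 200, 200, 200, 200, 200, 200]
ratio = abundance_ratio(100, single_copy)
```

With these toy numbers the target gene is recovered at about half the depth of a single-copy gene, which would be read as the gene being present in roughly half of the genomes in the sample.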

(A) Temporal distribution of the number of reads of each gene expressed as ‘read count ratio’ (reads gene/reads of single-copy genes). The X-axis indicates the day of the year (labelled by the month initials) and the Y-axis is the ratio between the reads of the gene divided by the geometric mean of a selection of eight single-copy genes (see Experimental Procedures for details). A generalized additive model is fitted to the data, coloured based on the peak-normalized value to show how strong the seasonal signal is (the PN value, based on the Lomb-Scargle test, q ≤ 0.05). (B) Temporal distribution of the ‘read count ratio’ of a selection of proteorhodopsin variants.
(A) Relative distribution of each functional gene at the family level over time. The Y-axis corresponds to the relative abundance and the X-axis to the month. The colours differentiate the main family groups as explained in the legend. (B) Taxonomic distribution of a selection of proteorhodopsin variants over time. Other Alpha: other Alphaproteobacteria. Other Gamma: other Gammaproteobacteria.

The genes related to phototrophic processes (pufL and pufM, together labelled as pufLM; PR; and psbA) had diverse patterns of abundance, with proteorhodopsin presenting the highest abundance (median ratio 0.5), followed by psbA (0.17) and pufLM (0.05) (Figure 3). The abundance order (PR > psbA > pufLM) is in agreement with a previous assessment of their global distribution (Finkel et al., 2013) and with the proportions observed through direct pigment estimation in the Mediterranean Sea (Gómez-Consarnau et al., 2019). Both PR and pufLM presented a seasonal distribution (recurrence strength = 6.65 and 22.1, respectively), whereas psbA was not recurrent (recurrence strength = 1.6): it showed high variability within each month and two peaks, one in spring and one in late summer. The genes involved in the biosynthesis of the photosynthetic reaction centre of AAPs (pufLM) peaked in spring, with the highest abundances associated with the Rhodobacteraceae (Alphaproteobacteria) and Haliaceae (Gammaproteobacteria) (Figure 4). The genes related to oxygenic photosynthesis and carbon fixation (psbA and rbcL) mimicked the known recurrences of the main photosynthetic populations, with eukaryotic groups dominating during winter (Giner et al., 2019; Nunes et al., 2018), Synechococcus blooming in spring and summer, and Prochlorococcus during autumn (Auladell et al., 2022; Gasol et al., 2016). Notably, a single variant with unknown taxonomy dominated psbA abundances during late spring/early summer, appearing after the spring Synechococcus bloom (dark grey in Figure 4). This psbA variant did not match any known bacterial or eukaryotic sequence, yet it had multiple matches to cyanophages (details not shown). The appearance of this variant, coupled with the decline of the spring Synechococcus bloom, could indicate a key role of this cyanophage in the bloom demise.
In fact, recent studies have shown that cyanophage psbA variants can outnumber the photosynthetic host gene copies (Sieradzki et al., 2019). Our observations point to similar conclusions, but further analyses would be beyond the scope of this study.

The distribution of rbcL followed that of photosynthetic bacteria and eukaryotes (Figure 4), and was also present in some heterotrophic groups, such as the Rhodobacteraceae and different families of Gammaproteobacteria. These findings support their potential to fix inorganic carbon (Badger & Bek, 2008). During summer, one of the most abundant groups harbouring rbcL was the genus Litoricola (Gammaproteobacteria). The potential to fix inorganic carbon in this group has recently been confirmed by using single amplified genomes (SAGs; Pachiadaki et al., 2019). Further, the various PR types presented divergent seasonal patterns (Figures 3 and 4). We observed a dominance of the blue type (abundance median = 0.32, Figure 3B), in contrast with previous results, which showed the green types to be more typical of coastal waters, whereas the blue type dominated in open waters (Pinhassi et al., 2016). Both the average chlorophyll a level of Blanes Bay (0.64 mg m−3) and the average water transparency (14 m, Gasol et al., 2016) are characteristic of an oligotrophic coastal site, which could partially explain these results. While both blue and green L105 PRs were seasonal, appearing during summer and decreasing in winter (recurrence strength = 6.6), the green M105 PR did not present a clear seasonal pattern (recurrence strength = 3.3, p = 0.21). The blue PR type was mostly found in Pelagibacteraceae, SAR86 and other Alpha- and Gammaproteobacteria (Figure 4B). In contrast, the green L105 PR was more present in SAR86 in winter and in other Gammaproteobacteria groups such as Thioglobaceae, while during summer the gene was harboured by Puniceispirillaceae and HIMB59. These two types of PRs are known to be characteristic of the typically ‘oligotrophic’ bacteria, that is, those with small sizes and small genomes (Pachiadaki et al., 2019; Spietz et al., 2019). The M105 green PR on the other hand was present mainly in Flavobacteriaceae and dominated almost entirely by Flavobacteriales. 
In previous studies, Flavobacteria were highly seasonal (Teeling et al., 2016) but in this study, the diverse seasonal patterns of the different genera likely mask a unified single seasonal pattern of the green M105 PR subtype (Figure S4).

We also inspected the coxL and tauA genes, both involved in the carbon cycle. The former encodes the carbon monoxide dehydrogenase that oxidizes carbon monoxide (CO) to CO2 as a supplemental energy source to survive carbon limitation, a process that has been suggested to be relevant in the coastal ocean (Moran & Miller, 2007). The gene tauA encodes a transporter that incorporates taurine (an amino acid-like compound) into cells; taurine is one of the main sources of carbon and energy for bacteria in epipelagic waters (Clifford et al., 2019). The seasonal pattern of coxL had its maximum in late spring, reaching a median abundance ratio of 0.3 and nearly disappearing during winter (Figure 3). The higher values were linked to the Rhodobacteraceae, particularly to a single gene variant matching an uncultured genus, named LFER01, that was incorporated into GTDB in a recent study of the Caspian Sea (Mehrshad et al., 2016) and that belongs to the Roseobacter clade (Cunliffe, 2011; Luo & Moran, 2014). During summer, Puniceispirillaceae (Alphaproteobacteria, SAR116 clade) and Litoricolaceae (Gammaproteobacteria) were also the main groups containing coxL. The highest abundance was observed during July, coinciding with the maximum coxL transcript abundance of Rhodobacterales in a coastal system in the Cantabrian Sea (Alonso-Sáez et al., 2020). With a similar abundance pattern, tauA reached maximal values during spring, albeit with large variability (recurrence index = 6.1). Taxonomically, tauA was dominated by Pelagibacteraceae all year round, in agreement with a previous study showing that the SAR11 clade accounted for a large fraction of cells taking up taurine in surface waters (Clifford et al., 2019).

Focusing on the nitrogen cycle, narB, a gene encoding a subunit of the nitrate reductase known in Cyanobiaceae (Martiny et al., 2009), presented two abundance peaks matching the recurrent Synechococcus blooms of spring and summer (see Auladell et al., 2022). Taxonomically, the spring bloom presented two main Synechococcus variants, whereas the summer bloom was formed by a single 16S rRNA gene variant (Auladell et al., 2022). During summer, the Flavobacteriaceae also contained this gene, although information regarding assimilatory nitrate reductase activity of this group in seawater is lacking. A KEGG search against the Genome Taxonomy Database (GTDB) showed that narB presents 314 hits in Flavobacteriaceae, a number similar to that found for Cyanobiaceae. The groups harbouring nasA (nitrate reductase) were not very abundant (median ratio = 0.02) and appeared mostly during summer and autumn (Figure 3), with a single Gammaproteobacteria variant, related to Pseudohongiella nitratireducens, dominating from April to November (Figure 4). During summer, NO3 concentration reached its lowest levels at Blanes Bay (Supplementary Figure 1), and thus, both Pseudohongiella and Cyanobiaceae could be involved in the NO3 decrease alongside eukaryotes (Spearman correlation of the narB and nasA seasonality patterns vs. median NO3 = −0.74 and −0.92, respectively; p < 0.005). Following the opposite seasonal pattern, the amoA gene, encoding an ammonia monooxygenase, was present during winter and disappeared during summer (mean abundance ratio = 0.002). Previous studies using qPCR for both 16S rRNA and amoA genes in Blanes Bay found that the patterns could be linked to Crenarchaeota Group I (Galand et al., 2010). Our study identifies Candidatus Nitrosopelagicus, a recently described archaeal group within the group previously named Thaumarchaeota, as the main contributor to amoA (Rinke et al., 2021; Santoro et al., 2015).
Finally, ureC —encoding a urease degrading urea to ammonium— presented a seasonal pattern with two states: high abundance during spring and summer (mean abundance ratio = 0.27) and lower values in autumn and winter (0.14). During winter, Nitrosopumilaceae and Synechococcus were the most common groups containing this gene, whereas Puniceispirillaceae, Rhodobacteraceae, and Litoricola dominated during summer. To our knowledge, the presence of the ureC gene in Puniceispirillaceae and Litoricola has only been observed in a recent SAGs sequencing study (Pachiadaki et al., 2019).
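The rank correlations reported above between gene seasonality patterns and median nutrient concentrations can be sketched as follows. This is a plain implementation of Spearman's rho (Pearson correlation of average ranks), not the authors' code; the monthly values are invented for illustration, and no p-value is computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical monthly medians (Jan..Dec), not data from the study.
gene_abundance = [0.01, 0.01, 0.02, 0.03, 0.05, 0.06, 0.07, 0.07, 0.05, 0.04, 0.02, 0.01]
nitrate = [1.9, 2.1, 1.5, 0.9, 0.5, 0.3, 0.2, 0.2, 0.4, 0.8, 1.2, 1.7]
print(spearman_rho(gene_abundance, nitrate))
```

A strongly negative value, as in the text, means that months with higher gene abundance tend to be the months with lower nutrient concentration; it does not by itself show which organisms drive the drawdown.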

Regarding the phosphorus cycle, our results indicate a synchronized pattern for some genes and multiple different responses for others (Figure 3). The genes encoding functions related to polyphosphate synthesis (ppk1, a polyphosphate kinase) and degradation (ppx, an exopolyphosphatase), the enzyme responsible for membrane phospholipid remodelling (plcP), and a hypothetical alkaline phosphatase (phoD) presented some of the highest abundance ratios (median ratio = 1) and a seasonal pattern with the highest values at the end of summer (Figure 3). The abundance of these genes was negatively correlated with the median phosphate concentration (Spearman correlation between −0.81 and −0.9, p < 0.01). The particularly high abundances at the end of summer likely reflect the phosphorus limitation that typically occurs at this coastal site and is more pronounced at this time of the year (Pinhassi et al., 2006; Tanaka et al., 2009). The phoX gene did not follow this trend but presented maximal abundances in spring and autumn. The phoD alkaline phosphatase was more abundant than phoX in our system (mean abundance ratio 0.92 vs. 0.48), as observed in the Sargasso Sea using data from the Global Ocean Sampling expedition (Luo et al., 2009). Yet, a large fraction of PhoD are cytoplasmic proteins (Luo et al., 2009), so it is possible that some of the phoD variants detected are not involved in the acquisition of phosphorus from dissolved phosphoesters, which are too large to cross the cell membrane. In our study, the taxonomic distribution of phoX and phoD was also different, with phoD being mostly associated with SAR86, Haliaceae, and Flavobacteriaceae, whereas phoX was more widespread. The other studied genes (ppx, ppk1, plcP) were widely distributed, with Pelagibacteraceae dominating ppk1 taxonomy and Flavobacteriaceae dominating plcP taxonomy.
The phn genes, which were initially described in a single operon, are currently known to show multiple different syntenies (Martínez et al., 2012). This could explain the differences in abundance between phnD and phnM (Figure 3). In our analysis, phnD, coding for the phosphonate ABC transporter, was non-seasonal, whereas phnM, encoding the alkylphosphonate utilization protein, presented maxima in spring and summer. Previous results have shown that in some taxa phnD is not expressed under phosphorus limitation (Martínez et al., 2012), while in others it is (Dyhrman et al., 2006; Ilikchyan et al., 2009), pointing to a possible explanation for the lack of an overall seasonal pattern. Taxonomically, phnM was assigned to the Rhodobacteraceae and other alphaproteobacterial groups, whereas phnD was more widely distributed, a fact also reflected in the number of variants detected for each gene (2730 vs. 6044, respectively). The pstS gene, encoding a phosphate transporter, did not show seasonality and was present mainly in alphaproteobacterial groups, Nitrosopumilaceae, Gammaproteobacteria, and Cyanobiaceae. It has been hypothesized that the PstS protein may fulfil the role of P uptake under both phosphorus-replete and phosphorus-depleted conditions (Orchard et al., 2009), which could explain the lack of seasonality. Overall, five out of eight of the analysed phosphorus-related genes peaked in the summer.

Lastly, we analysed a gene related to the sulfur cycle (dmdA) and an iron transporter (fecA). The dmdA gene encodes a dimethylsulfoniopropionate (DMSP) demethylase, which converts DMSP to methyl-mercaptopropionate (MMPA) so that it can be incorporated as a source of reduced sulfur and carbon. This gene presented the highest relative abundances during spring and summer and was dominated by Pelagibacteraceae (Figures 3 and 4). During these seasons, DMSP assimilation ratios are the highest in our system (Simó et al., 2009). Other relevant groups presenting the dmdA gene were Puniceispirillaceae and the HIMB59 family (previously considered part of SAR11 clade V). The fecA gene, encoding a dicitrate siderophore transporter (Schauer et al., 2008), presented the highest abundance (mean abundance ratio = 2) of the tested genes and did not display seasonality. Its high abundance could be linked to its presence in multiple copies per genome with variable affinities for different siderophores (Tang et al., 2012). Taxonomically, the main groups harbouring fecA were Gammaproteobacteria and Flavobacteria. The low abundance of alphaproteobacterial groups could be explained by the different strategies they use for incorporating iron, since Rhodobacteraceae and Pelagibacteraceae are specialized in obtaining inorganic iron through transporters (Debeljak, 2019; Tang et al., 2012). The lack of a specific seasonal pattern in this gene is in agreement with the fact that iron is not typically a limiting factor for microorganisms in the Mediterranean Sea (Sherrell & Boyle, 1988).

Our results show that most of the functional genes present variations in abundance at a yearly scale (57% with Lomb–Scargle PN ≥ 8; 71% considering all significant values), while prokaryote abundance varies little seasonally (Figure S1). These results describe for the first time the seasonal trends of multiple genes and reinforce previously reported patterns for some of them (Auladell et al., 2019; Galand et al., 2010, 2018). Several of the genes had seasonal patterns that could be linked to the abiotic environmental conditions. For other genes, such as those involved in the oxidation of carbon monoxide, we did not have any biogeochemical measurements to compare with the observed patterns. Additionally, our taxonomic analysis sheds light on the key players in each biogeochemical process, unveiling relevant groups that had not been previously considered. Examples of these are the dominance of Flavobacteria and Gammaproteobacteria in functional genes such as phoD, nasA, and fecA, or the dominance of Rhodobacteraceae, Puniceispirillaceae, and Litoricolaceae in rbcL, ureC, and coxL during summer. The importance of Puniceispirillaceae and Litoricolaceae in these biogeochemical processes was so far unknown. Focusing the analysis on specific functions has thus proven useful to unveil the most likely relevant players in prokaryote-driven biogeochemical processes. Moreover, our results suggest that the enrichment and selection of genes/functions is largely influenced by community turnover. To further investigate the actual relevance of these genes and the encoded functions, the relative importance of seasonally varying gene expression versus community turnover would still need to be characterized.
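As a rough illustration of the recurrence criterion used throughout this section, the sketch below computes the classic normalized Lomb–Scargle power of an unevenly or evenly sampled series at a single trial period. It assumes that PN refers to this normalized peak power; that reading, the one-year trial period, and the synthetic monthly series are assumptions, and the significance step (the q-values) is not reproduced here.

```python
import math


def lomb_scargle_power(times, values, period):
    """Normalized Lomb-Scargle power at one trial period
    (classic Scargle form, scaled by the sample variance)."""
    n = len(times)
    if n != len(values) or n < 3:
        raise ValueError("need matching times and values, at least 3 points")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0:
        raise ValueError("power is undefined for a constant series")
    omega = 2 * math.pi / period
    # Time offset tau that makes the sine and cosine terms orthogonal.
    s2 = sum(math.sin(2 * omega * t) for t in times)
    c2 = sum(math.cos(2 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2 * omega)
    cos_t = [math.cos(omega * (t - tau)) for t in times]
    sin_t = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((v - mean) * c for v, c in zip(values, cos_t))
    ys = sum((v - mean) * s for v, s in zip(values, sin_t))
    cc = sum(c * c for c in cos_t)
    ss = sum(s * s for s in sin_t)
    power = 0.0
    if cc > 0:
        power += yc * yc / cc
    if ss > 0:
        power += ys * ys / ss
    return power / (2 * var)


# Synthetic example: 7 years of monthly samples with a pure annual cycle.
days = [i * 365.25 / 12 for i in range(84)]
ratio = [1.0 + 0.5 * math.sin(2 * math.pi * d / 365.25) for d in days]
print(lomb_scargle_power(days, ratio, 365.25))  # annual period
print(lomb_scargle_power(days, ratio, 100.0))   # unrelated period
```

With this normalization, a series dominated by a one-year cycle yields a large power at the annual period and a small one elsewhere, which is the behaviour a threshold such as PN ≥ 8 is meant to capture.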

Seasonality of individual gene variants

Having obtained an overview of the aggregated seasonal pattern for each gene and the assignment of these genes to broad taxonomic groups, we next examined the seasonality of each particular gene variant within the selected functional genes. We first tested whether each variant presented seasonality and, for those that were seasonal, we then identified the yearly seasonal maxima (e.g. ‘for 5 of the 7 years the maximum is in winter’). Finally, we performed a Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) ordination of the seasonal variants to visualize groups of variants that clustered together (Figure 5). For the analysed genes, we found that 6359 out of 23,503 variants were seasonal (Lomb–Scargle test, q ≤ 0.05, PN ≥ 8). Most of these variants reached their maximum values only in a particular season (4862, 76%), while some presented maxima in two seasons (1523, 23%). Specifically, most of the seasonal variants peaked in winter (2430 variants, 38%), followed by autumn (1945, 30%), summer (1282, 20%), and spring (728, 12%). Multiple variants assigned to autumn and spring were grouped within the summer and winter clusters in the UMAP ordination (Figure 5A), corresponding to the natural transition between seasons. Specifically, we observed a ‘tail’ connecting the winter and spring clusters, corresponding to the variants presenting an abundance maximum during April, and another cluster, composed of October variants, that included the autumn data and was connected to the left side of the summer cluster (Figure 5A, top left). In conclusion, the four seasons presented distinguishable clusters of variants, albeit with summer and winter being clearer, and the spring and autumn clusters being more variable. The variability of these two seasons can probably be attributed to the heterogeneous meteorological conditions, which do not always coincide with the equinox and solstice dates used here to define the seasons.
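The identification of yearly seasonal maxima described above (e.g. ‘for 5 of the 7 years the maximum is in winter’) can be sketched as follows. The season boundaries are fixed approximations of the astronomical equinox and solstice dates for the Northern Hemisphere, grouping by calendar year is a simplification (a winter spans two calendar years), and the years and abundances are hypothetical.

```python
from collections import Counter
from datetime import date


def astronomical_season(d):
    """Season of a date using approximate, fixed equinox/solstice days
    (Northern Hemisphere); real dates shift slightly between years."""
    md = (d.month, d.day)
    if (3, 20) <= md < (6, 21):
        return "spring"
    if (6, 21) <= md < (9, 22):
        return "summer"
    if (9, 22) <= md < (12, 21):
        return "autumn"
    return "winter"


def dominant_peak_season(samples):
    """samples: iterable of (date, abundance) pairs for one gene variant.
    Returns (season, years peaking in that season, total years).
    Ties between seasons resolve to the one encountered first."""
    best_per_year = {}
    for d, abundance in samples:
        current = best_per_year.get(d.year)
        if current is None or abundance > current[1]:
            best_per_year[d.year] = (d, abundance)
    counts = Counter(astronomical_season(d) for d, _ in best_per_year.values())
    season, n_years = counts.most_common(1)[0]
    return season, n_years, len(best_per_year)


# Hypothetical variant: January peak in five years, July peak in two.
samples = []
for year in range(2001, 2008):
    peak_month = 1 if year < 2006 else 7
    for month in range(1, 13):
        samples.append((date(year, month, 15), 10.0 if month == peak_month else 1.0))
print(dominant_peak_season(samples))
```

A variant whose dominant season accounts for only a small share of the years, or whose top two seasons are close, would correspond to the two-season cases mentioned in the text.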

(A) Annual patterns of seasonal gene variants. Uniform Manifold Approximation and Projection (UMAP) ordination of all the gene variants presenting seasonality. The colours differentiate the seasons according to the day of the year (following the astronomical definition). (B) Count of seasonal variants. Number of seasonal variants for each family (Y-axis) and season (X-axis). The coloured values indicate the season in which each specific group had its maximum. (C) Total relative abundance of seasonal variants. Distribution of the seasonal gene variants. The X-axis represents the total relative abundance of the specific gene for each season and the Y-axis indicates the seasons. The colours differentiate the main family groups as shown in the figure legend. Other Alpha: other Alphaproteobacteria. Other Gamma: other Gammaproteobacteria.

We observed that taxonomy, rather than the specific function, explained the variants’ seasonal patterns (Figure 5B). For most metabolic functions, the season presenting the most seasonal variants was winter (Figure 5B). During this season, the system reached its highest taxonomic richness, possibly due to the mixing of waters carrying more nutrients and the increased resuspension that likely triggered the growth of multiple ‘rare’ bacteria (Auladell et al., 2022). Additionally, the seasonal genes peaking in spring were from specific taxonomic groups such as Haliaceae, Rhodobacteraceae, and Synechococcus, whereas autumn was dominated by other groups, such as Prochlorococcus and HTCC2089 (Pseudomonadales). On the other hand, some genes involved in phosphorus uptake presented seasonal patterns linked to specific abiotic conditions rather than to taxonomic composition. Genes such as phnM, phoD, ppx, ppk1, and plcP presented the most abundant seasonal variants during summer, matching the aggregated seasonal pattern discussed above (Figures 3 and 5C).

Contrasted gene repertoire in the Pelagibacteraceae and SAR86 families

Given this different behaviour of the phosphorus genes, we further tested whether there were cases in which the phosphorus gene pattern deviated from the seasonal pattern of the taxonomic family, and thus whether there was niche differentiation at lower taxonomic levels. At the family level, only 91 families presented at least 10 seasonal gene variants. Of these, 25 families (27%) presented a higher proportion of genes related to phosphorus during the summer season. To test for differences between the phosphorus functional pattern and the taxon abundance pattern, we compared the 16S rRNA gene data from our previous study (Auladell et al., 2022), aggregated at the family level, with the seasonal variant patterns of the phosphorus genes (Figure S5). Families such as Puniceispirillaceae and Litoricolaceae presented the same abundance maxima for the 16S rRNA gene as for the phosphorus gene repertoire (Figure S5). These taxa appeared to be adapted to oligotrophic summer conditions and possibly presented a genomic repertoire that helped them cope with phosphorus limitation. In contrast, based on the phosphorus genes, Pelagibacteraceae and SAR86 presented seasonal deviations from the 16S rRNA gene abundance patterns (Figure 6). These results indicate that at lower taxonomic levels, specific ecotypes could present a differentiated genomic repertoire adapted to the varying seasonal conditions.

(A) Seasonal patterns of the Pelagibacteraceae and D2472 (SAR86) families defined using the 16S rRNA gene. The X-axis indicates the month and the Y-axis the relative abundance of the family 16S rRNA counts (obtained from Auladell et al., 2022). A generalized additive model smooth is fitted to the data points. (B) Seasonal distribution of the phosphorus gene variants for the selected families. The X-axis indicates the season and the Y-axis the total relative abundance of each gene in that season. Each panel groups the gene variants by their seasonal maxima. The colours differentiate the various genes for phosphorus uptake.

Given the observed differences at the family level for genes linked to phosphorus uptake, we compared the studied marker genes of Pelagibacteraceae and SAR86 at the genus level (Figure S6). The most abundant seasonal genera within Pelagibacteraceae were Pelagibacter (SAR11 clade I), MED-G40 (SAR11 subclade IIa), Pelagibacter_A (SAR11 clade II), HIMB114 (SAR11 clade III), and AG-414-E02 (SAR11 subclade Ic). Within the Pelagibacter genus, we also identified one Mediterranean species (known as gMED; Haro-Moreno et al., 2020) through a BLAST search against SAGs reconstructed from our long-term station in the same study. Some variants, corresponding to the gMED species within Pelagibacter, the MED-G40 genus, and Pelagibacter_A, presented an abundance pattern with a maximum during summer, whereas other variants peaked in winter (Figure S6). For MED-G40, the variants of phoD, pstS, and plcP presented a higher total relative abundance during summer, as did those of phoX, ppk1, and ppx in Pelagibacter_A. In the abovementioned study about the ecogenomics of the SAR11 clade, Haro-Moreno et al. (2020) showed an increase in the genes for phosphorus processes in the gMED species as compared to SAR11 genomes from other latitudes. Our results extend these findings to other Pelagibacteraceae genera such as HIMB114 and Pelagibacter_A, in agreement with the view that P exerts a strong selective pressure on members of the SAR11 clade (Coleman & Chisholm, 2010).

In some cases, the seasonal variants within a genus differed greatly in abundance from one another. For the Pelagibacter gMED species, the dmdA gene variant was more abundant than the variants of the other genes, such as plcP and ppk1. A BLAST search of the dmdA variant against all the SAGs from Haro-Moreno et al. (2020) showed that this variant was conserved among multiple SAGs. The differences in abundance within the same genus could therefore be explained by the 95% identity clustering used in this study, which probably aggregates multiple variants. As in most gene catalogues, the clustering step can influence the structure of the operational unit, and the taxonomic resolution should be interpreted with care at the lowest taxonomic levels (Commichaux et al., 2021).

The D2472 family (SAR86 clade) also presented a higher contribution of phosphorus genes during summer, albeit without the pronounced change seen in the SAR11 clade. The genera presenting seasonal variants were D2472, MED-G78, SAR86A, and SCGC-AAA076-P13. Unfortunately, the SAR86 clade lacks detailed phylogenies and genome descriptions comparable to those available for the SAR11 clade. A recent study found five differentiated clusters of the clade but without much discussion of the genomic repertoire of each cluster (Hoarfrost et al., 2020). In Blanes Bay, both the SAR86A and SCGC-AAA076-P13 groups presented summer ecotypes containing a high proportion of genes linked to phosphorus uptake (Figure S6). SCGC-AAA076-P13 was the only genus containing pstS and phnM. A pangenomic analysis comparing the genomic repertoires of Pelagibacterales and SAR86 using seasonal time-series data would help disentangle the complete genomic repertoire differentiation beyond their phosphorus gene differences. Overall, these results show that adaptation to nutrient limitation has occurred at multiple taxonomic levels, with groups such as Puniceispirillaceae presenting adaptations at the family level, whereas groups such as Pelagibacteraceae and D2472 contain specific genera adapted to the oligotrophic summer conditions. Furthermore, our results indicate that trait plasticity linked to the nutrient stress observed on a biogeographical dimension (Ustick et al., 2021) can also be detected at the seasonal scale.

CONCLUSIONS

We explored the seasonal patterns of 21 key biogeochemical marker genes using a 7-year metagenomic time series from a coastal site. Our data show that the marker genes presenting the highest richness were related to phosphorus starvation and Fe3+ dicitrate transport and that, generally, the patterns of gene richness followed the species richness of the whole community. Most of the studied genes presented recurrent seasonal dynamics with succession between the different taxonomic groups. Genes such as pufLM, coxL, ureC, and tauA were predominant during spring, phosphorus cycling genes were enriched during summer, and amoA presented its maximum during autumn and winter. We also identified the main taxonomic groups containing these functions, including groups that had not previously been considered relevant for certain functions, such as Litoricola for carbon fixation, CO oxidation, and urea degradation. Finally, fine-grained seasonal patterns (i.e. individual gene variants) showed that the patterns of abundance presented by the phosphorus marker genes within the Pelagibacteraceae and D2472 (SAR86) families differed from those at the family level. Our data provide a framework to understand the seasonality of key biogeochemical processes in the coastal ocean and to generate new hypotheses about the relevance of specific organisms in each of these processes.

AUTHOR CONTRIBUTIONS

Adrià Auladell Martín: Visualization (equal); writing – review and editing (equal). Isabel Ferrera: Visualization (equal); writing – review and editing (equal). Lidia Montiel Fontanet: Data curation (equal); methodology (equal). Célio Dias Santos Júnior: Data curation (equal); methodology (equal). Marta Sebastián: Writing – review and editing (equal). Ramiro Logares: Writing – review and editing (equal). Josep M. Gasol: Writing – review and editing (equal).

ACKNOWLEDGEMENTS

The authors thank all the people involved in operating the BBMO, especially Clara Cardelús and Captain Anselm for facilitating sampling, and Vanessa Balagué for laboratory procedures, including DNA extraction. The authors also thank the MarBits Bioinformatics platform of the Institut de Ciències del Mar, in particular Pablo Sánchez, for computing support. This research was funded by grants REMEI (CTM2015-70340-R), MIAU (RTI2018-101025-B-I00), and ECLIPSE (PID2019-110128RB-100) from the Spanish Ministry of Science and Innovation, and received support from the Spanish government through the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000924-S). Adrià Auladell was supported by a Spanish FPI grant from the Ministry of Science and Innovation. Isabel Ferrera had the support of a ‘2019 Leonardo Grant for Researchers and Cultural Creators’ from the BBVA Foundation. The foundation takes no responsibility for the contents of this publication, which are entirely the responsibility of its authors.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT

All the code used for the analyses can be found in the following repository: https://github.com/adriaaulaICM/key_biogem_genes. The data to reproduce the analyses can be found in the same repository. The reader can find in the data folder a FASTA file for all the gene variants, the abundance count table, the taxonomic assignation (both with GTDB and Uniref databases), the functional annotation for each variant, and the seasonality test.