A Quick Look at Notable Bioinformatics Preprints on bioRxiv, October 2020

1. [Assembly] Researchers at the Max Planck Institute for Plant Breeding (Germany) develop a gap-free chromosome assembly method for long reads

GALA: gap-free chromosome-scale assembly with long reads

High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows of long-read platforms. Here we propose a chromosome-by-chromosome assembly strategy implemented through the multiple-layer computer graph which identifies mis-assemblies within preliminary assemblies or chimeric raw reads and partitions the data into chromosome-scale linkage groups. The subsequent independent assembly of each linkage group generates gap-free assembly free from the mis-assembly errors which usually plague existing workflows. This flexible framework also allows us to integrate data from various technologies, such as PacBio, Nanopore, Hi-C, and the genetic map, to generate gap-free chromosome-scale assembly. We de novo assembled C. elegans and A. thaliana genomes using GALA with combined PacBio and Nanopore sequencing data from publicly available datasets. We also demonstrated its applicability with a gap-free assembly of two chromosomes in the human genome. In addition, GALA showed promising performance for PacBio high-fidelity long reads. Our method enables straightforward assembly of genomes with multiple data sources and multiple computational tools, overcoming barriers that at present restrict the application of de novo genome assembly technology.

2. [Prehistoric] Ancient DNA sequencing reveals the genomic secrets of an extinct giant lemur

Evolutionary and phylogenetic insights from a nuclear genome sequence of the extinct, giant ‘subfossil’ koala lemur Megaladapis edwardsi

No endemic Madagascar animal with body mass >10 kg survived a relatively recent wave of extinction on the island. From morphological and isotopic analyses of skeletal ‘subfossil’ remains we can reconstruct some of the biology and behavioral ecology of giant lemurs (primates; up to ~160 kg), elephant birds (up to ~860 kg), and other extraordinary Malagasy megafauna that survived well into the past millennium. Yet much about the evolutionary biology of these now extinct species remains unknown, along with persistent phylogenetic uncertainty in some cases. Thankfully, despite the challenges of DNA preservation in tropical and sub-tropical environments, technical advances have enabled the recovery of ancient DNA from some Malagasy subfossil specimens. Here we present a nuclear genome sequence (~2X coverage) for one of the largest extinct lemurs, the koala lemur Megaladapis edwardsi (~85kg). To support the testing of key phylogenetic and evolutionary hypotheses we also generated new high-coverage complete nuclear genomes for two extant lemur species, Eulemur rufifrons and Lepilemur mustelinus, and we aligned these sequences with previously published genomes for three other extant lemur species and 47 non-lemur vertebrates. Our phylogenetic results confirm that Megaladapis is most closely related to the extant Lemuridae (typified in our analysis by E. rufifrons) to the exclusion of L. mustelinus, which contradicts morphology-based phylogenies. Our evolutionary analyses identified significant convergent evolution between M. edwardsi and extant folivorous primates (colobine monkeys) and ungulate herbivores (horses) in genes encoding protein products that function in the biodegradation of plant toxins and nutrient absorption. These results suggest that koala lemurs were highly adapted to a leaf-based diet, which may also explain their convergent craniodental morphology with the small-bodied folivore Lepilemur.

3. [Scoring] SoftWipe, a tool for assessing the quality of scientific software

SoftWipe – a tool and benchmark to assess scientific software quality

Scientific software from all areas of scientific research is pivotal to obtaining novel insights. Yet the quality of scientific software is rarely assessed, even though it might lead to incorrect scientific results in the worst case. Therefore, we have developed an open-source tool and benchmark called SoftWipe, which provides a relative software quality ranking of 51 computational tools from diverse research areas. SoftWipe can be used in the review process of software papers and to inform the scientific software selection process.

4. [Melbourne x1] Researchers at The University of Melbourne: a multivariate method for correcting batch effects in microbiome data

A multivariate method to correct for batch effects in microbiome data

Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to, and obscure, any factors of interest. Existing batch correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce a new multivariate and non-parametric batch correction method based on Partial Least Squares Discriminant Analysis. PLSDA-batch first estimates treatment and batch variation with latent components to then subtract batch variation from the data. The resulting batch effect corrected data can then be input in any downstream statistical analysis. Two variants are also proposed to handle unbalanced batch x treatment designs and to include variable selection during component estimation. We compare our approaches with existing batch correction methods removeBatchEffect and ComBat on simulated data and three case studies. We show that our three methods lead to competitive performance in removing batch variation while preserving treatment variation, especially when batch effects have high variability. Reproducible code and vignettes are available on GitHub.
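The "estimate a latent batch component, then subtract it" idea in this abstract can be illustrated with a small sketch. This is not the PLSDA-batch implementation. It assumes a single latent component, a two-level batch label coded 0/1, and it skips the step in which treatment components are estimated first so that treatment variation is preserved. The names `remove_batch_component` and `batch_gap` and the simulated data are hypothetical.

```python
import random

def remove_batch_component(X, batch):
    """Subtract one batch-associated latent component from X (rows = samples).

    Simplified single-component PLS deflation, for illustration only.
    """
    n, m = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - col_mean[j] for j in range(m)] for row in X]  # centred data
    y_mean = sum(batch) / n
    yc = [b - y_mean for b in batch]                              # centred batch label
    # Loading weights: covariance of each variable with the batch label.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        return [row[:] for row in X]  # no batch-associated variation found
    w = [v / norm for v in w]
    # Scores: the latent batch component for each sample.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t)
    # Loadings: regression of each variable on the scores.
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
    # Deflation: remove the part of X explained by the batch component.
    return [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]

def batch_gap(X, batch):
    """Euclidean distance between the mean vectors of batch 0 and batch 1."""
    g0 = [row for row, b in zip(X, batch) if b == 0]
    g1 = [row for row, b in zip(X, batch) if b == 1]
    diff = [sum(r[j] for r in g1) / len(g1) - sum(r[j] for r in g0) / len(g0)
            for j in range(len(X[0]))]
    return sum(d * d for d in diff) ** 0.5

# Simulated data (invented): 40 samples, 5 variables, additive shift in batch 1.
random.seed(0)
batch = [0] * 20 + [1] * 20
X = [[random.gauss(0, 1) + 3.0 * b for _ in range(5)] for b in batch]
X_corrected = remove_batch_component(X, batch)
```

Comparing `batch_gap(X, batch)` with `batch_gap(X_corrected, batch)` shows how much of the between-batch difference the deflation removes. A single component reduces the gap but does not generally remove it exactly.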

5. [Melbourne x2] Researchers at Monash University: regional heterogeneity in the brain's transcriptional landscape

Dynamical consequences of regional heterogeneity in the brain’s transcriptional landscape

Brain regions vary in their molecular and cellular composition, but how this heterogeneity shapes neuronal dynamics is unclear. Here, we investigate the dynamical consequences of regional heterogeneity using a biophysical model of whole-brain functional magnetic resonance imaging (MRI) dynamics in humans. We show that models in which transcriptional variations in excitatory and inhibitory receptor (E:I) gene expression constrain regional heterogeneity more accurately reproduce the spatiotemporal structure of empirical functional connectivity estimates than do models constrained by global gene expression profiles and MRI-derived estimates of myeloarchitecture. We further show that regional heterogeneity is essential for yielding both ignition-like dynamics, which are thought to support conscious processing, and a wide variance of regional activity timescales, which supports a broad dynamical range. We thus identify a key role for E:I heterogeneity in generating complex neuronal dynamics and demonstrate the viability of using transcriptional data to constrain models of large-scale brain function.

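6. [Archaea] Li Meng's group at Shenzhen University: 75 new Asgard archaeal genomes suggest there is more to the origin of eukaryotes than previously thought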

Expanding diversity of Asgard archaea and the elusive ancestry of eukaryotes

Comparative analysis of 162 (nearly) complete genomes of Asgard archaea, including 75 not reported previously, substantially expands the phylogenetic and metabolic diversity of the Asgard superphylum, with six additional phyla proposed. Phylogenetic analysis does not strongly support origin of eukaryotes from within Asgard, leaning instead towards a three-domain topology, with eukaryotes branching outside archaea. Comprehensive protein domain analysis in the 162 Asgard genomes results in a major expansion of the set of eukaryote signature proteins (ESPs). The Asgard ESPs show variable phyletic distributions and domain architectures, suggestive of dynamic evolution via horizontal gene transfer (HGT), gene loss, gene duplication and domain shuffling. The results appear best compatible with the origin of the conserved core of eukaryote genes from an unknown ancestral lineage deep within or outside the extant archaeal diversity. Such hypothetical ancestors would accumulate components of the mobile archaeal ‘eukaryome’ via extensive HGT, eventually giving rise to eukaryote-like cells.

7. [White House] Investigating the White House SARS-CoV-2 outbreak (medRxiv)

Viral genome sequencing places White House COVID-19 outbreak into phylogenetic context

In October 2020, an outbreak of at least 50 COVID-19 cases was reported surrounding individuals employed at or visiting the White House. Here, we applied genomic epidemiology to investigate the origins of this outbreak. We enrolled two individuals with exposures linked to the White House COVID-19 outbreak into an IRB-approved research study and sequenced their SARS-CoV-2 infections. We find these viral sequences are highly genetically similar to each other, but are distinct from over 160,000 publicly available SARS-CoV-2 genomes, possessing 5 nucleotide mutations that differentiate this lineage from all other circulating lineages sequenced to date. We estimate this lineage has a common ancestor in the USA in April or May 2020, but its whereabouts for the past 5 to 6 months are not clear. Looking forwards, sequencing of additional community SARS-CoV-2 infections collected in the USA prior to October 2020 may reveal linked infections and shed light on its geographic ancestry. In sequencing of SARS-CoV-2 infections collected after October 2020, the relative rarity of this constellation of mutations may make it possible to identify infections that likely descend from the White House COVID-19 outbreak.
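The most elementary step behind a statement such as "5 nucleotide mutations that differentiate this lineage" is listing the sites at which aligned sequences differ. The sketch below shows only that step; the study itself placed the genomes in a phylogeny against more than 160,000 public sequences, which this code does not attempt. The function name `differing_sites` and the toy sequences are invented for illustration and are not real SARS-CoV-2 data.

```python
def differing_sites(reference, query, ignore="N-"):
    """Return (position, ref_base, query_base) for each differing site.

    Positions are 1-based. Both sequences must already be aligned to the
    same length. Ambiguous bases (N) and gaps (-) are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos, (r, q) in enumerate(zip(reference.upper(), query.upper()), start=1):
        if r in ignore or q in ignore:
            continue  # uninformative site: ambiguous call or alignment gap
        if r != q:
            sites.append((pos, r, q))
    return sites

# Toy aligned sequences (invented).
reference = "ATGGCTAGCTAGGCTA"
sample_a = "ATGACTAGCTNGGTTA"
sample_b = "ATGACTAGCTAGGTTA"

vs_reference = differing_sites(reference, sample_a)
between_samples = differing_sites(sample_a, sample_b)
```

Here the two samples share the same differences from the reference and show none between themselves, which mirrors in miniature the "highly similar to each other, distinct from everything else" pattern the abstract describes.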

8. [Deletion] Savage lab at the University of California, Berkeley: deletions from a single codon up to the scale of a whole gene (making every possible deletion across a gene)

Comprehensive deletion landscape of CRISPR-Cas9 identifies minimal RNA-guided DNA-binding modules

Proteins evolve through the modular rearrangement of elements known as domains. It is hypothesized that extant, multidomain proteins are the result of domain accretion, but there has been limited experimental validation of this idea. Here, we introduce a technique for genetic minimization by iterative size-exclusion and recombination (MISER) that comprehensively assays all possible deletions of a protein. Using MISER, we generated a deletion landscape for the CRISPR protein Cas9. We found that Cas9 can tolerate large single deletions to the REC2, REC3, HNH, and RuvC domains, while still functioning in vitro and in vivo, and that these deletions can be stacked together to engineer minimal, DNA-binding effector proteins. In total, our results demonstrate that extant proteins retain significant modularity from the accretion process and, as genetic size is a major limitation for viral delivery systems, establish a general technique to improve genome editing and gene therapy-based therapeutics.
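To give a sense of what "all possible deletions" means combinatorially, the sketch below enumerates every single contiguous deletion of a sequence. It assumes that a deletion removes one uninterrupted stretch of residues and that removing the entire sequence does not count. MISER itself is an experimental library-construction and selection technique, not this code, and the function names and toy peptide are hypothetical.

```python
def all_deletions(sequence):
    """Yield (start, end, variant) for every contiguous deletion.

    The half-open interval [start, end) is removed from the sequence.
    Deleting the whole sequence is excluded.
    """
    n = len(sequence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == n:
                continue  # skip the degenerate "delete everything" case
            yield start, end, sequence[:start] + sequence[end:]

def count_deletions(n):
    """Number of contiguous deletions of a length-n sequence: n(n+1)/2 - 1."""
    return n * (n + 1) // 2 - 1

toy_peptide = "MKTAYIAK"  # invented 8-residue example
variants = list(all_deletions(toy_peptide))
```

The count grows quadratically with sequence length, which is why a comprehensive deletion landscape for a protein well over a thousand residues long requires a pooled library approach rather than variant-by-variant construction.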

9. [Surprise] Italian researchers: intercellular telomere transfer extends T cell lifespan

Intercellular telomere transfer extends T cell lifespan

The common view is that T-lymphocytes activate telomerase, a DNA polymerase that extends telomeres at chromosome ends, to delay senescence. We show that independently of telomerase, T cells elongate telomeres by acquiring telomere vesicles from antigen-presenting cells (APCs). Upon contact with T cells, APCs degraded shelterin to donate telomeres, which were cleaved by TZAP, and then transferred in extracellular vesicles (EVs) at the immunological synapse. Telomere vesicles retained the Rad51 recombination factor that enabled them to fuse with T cell chromosomal ends causing an average lengthening of ∼3000 base pairs. Thus, we identify a previously unknown telomere transfer program that supports T cell lifespan.

10. [Call to Action] New standards for figures in scientific papers

Creating Clear and Informative Image-based Figures for Scientific Publications

Scientists routinely use images to display data. Readers often examine figures first; therefore, it is important that figures are accessible to a broad audience. Many resources discuss fraudulent image manipulation and technical specifications for image acquisition; however, data on the legibility and interpretability of images are scarce. We systematically examined these factors in non-blot images published in the top 15 journals in three fields: plant sciences, cell biology and physiology. Common problems included missing scale bars, misplaced or poorly marked insets, images or labels that were not accessible to colorblind readers, and insufficient explanations of colors, labels, annotations, or the species and tissue or object depicted in the image. Papers that met all good practice criteria examined for all image-based figures were uncommon (physiology 16%, cell biology 12%, plant sciences 2%). We present detailed descriptions and visual examples to help scientists avoid common pitfalls when publishing images. Our recommendations address image magnification, scale information, insets, annotation, and color and may encourage discussion about quality standards for bioimage publishing.

科研星球