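A stash of some half-baked literature notes

Why add this page?

Because I read a lot of papers but truly understand and digest only a few of them. Turning the notes from every paper I read into a full post would not be quite appropriate, since there may be parts I don't fully understand myself. It would also risk cluttering the blog; I have always felt that posts should be about quality rather than quantity. So I set up this page specifically to record the good papers I have read, with the emphasis on sharing rather than explaining.

How often will it be updated?

I'll stick to the habit I mentioned before: one update a week ~~(if I can manage it)~~.

What kind of papers will mainly appear here?

Papers related to my own research area, which from now on should mostly be genomics.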

The origin of human mutation in light of genomic data

Original paper: https://www.nature.com/articles/s41576-021-00376-2

Published: 23 June 2021

Main Points:

Mutations arise from two major mechanisms: replication errors and accumulated DNA damage that is not properly repaired. The relative contributions of these two mechanisms to human mutation are unknown.
Replication errors leave detectable footprints in genomic data such as dependency of mutation rate on paternal age, asymmetry between leading and lagging strands, and variability along the genome related to replication timing and activity of co-replicative repair like MMR.
Unrepaired DNA damage also leaves statistical patterns beyond just correlating mutation rate with mutagen exposure. Mechanisms include inaccurate replication over lesions, erroneous repair, and recruitment of error-prone polymerases at sites of damage. Evidence suggests most damage is converted to mutation during replication.
Deficiencies in co-replicative repair like loss of polymerase proofreading greatly increase mutation rate and cause specific asymmetric mutational patterns between strands. Other repair pathways like BER and NER also shape mutation patterns.
Mutation rate varies along the genome at multiple scales, related to replication timing, recombination, nucleosome positioning, TF binding, repetitive elements, etc. These patterns may reveal mechanisms but correlations alone are not definitive.
Mutations show asymmetries related to transcription (T-asymmetry) and replication (R-asymmetry) directions. This implies contributions of transcription-coupled repair and replicative processes.
Parental age and sex influence mutation rate and spectra. Most maternal mutations are damage-related and show localization in the genome. Paternal mutations likely arise from replication errors and linearly increase with age.
Some patterns like CpG hypermutability were thought to only arise from damage, but genome data suggests a replicative component also contributes.
Overall, sequencing data have revealed new insights into human mutation, but patterns must be interpreted in the context of known biochemistry to infer mechanisms. Integrating experiments, sequencing analysis, and modeling will be key to elucidating the processes that generate new mutations.
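
To make the idea of T-asymmetry and R-asymmetry a little more concrete, here is a minimal sketch of how one might compare a mutation type with its reverse-complement counterpart once every mutation is oriented relative to a feature (a gene for transcription, a replication direction for replication). The function name, the tuple layout, and the example data are my own assumptions for illustration; this is not the procedure used in the review.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_asymmetry(mutations, ref, alt):
    """Ratio of ref>alt mutations to their complementary counterparts.

    `mutations` is an iterable of (ref_base, alt_base, strand) tuples given
    on the reference strand, where strand is "+" if the feature of interest
    lies on the reference strand and "-" otherwise (hypothetical layout).
    Every mutation is first re-expressed on the feature's own strand.
    """
    counts = Counter()
    for r, a, strand in mutations:
        if strand == "-":
            # Flip to the feature's strand by complementing both bases.
            r, a = COMPLEMENT[r], COMPLEMENT[a]
        counts[(r, a)] += 1
    forward = counts[(ref, alt)]
    reverse = counts[(COMPLEMENT[ref], COMPLEMENT[alt])]
    if reverse == 0:
        return float("inf") if forward else float("nan")
    return forward / reverse


# Toy data only: three A>G events and one T>C event after orientation.
toy = [("A", "G", "+"), ("A", "G", "+"), ("T", "C", "+"), ("T", "C", "-")]
ratio = strand_asymmetry(toy, "A", "G")
```

A ratio far from 1 would hint at a strand-specific process, but as the notes above say, such a pattern alone does not identify the mechanism.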

Wild pedigrees inform mutation rates and historic abundance in baleen whales

Original paper: https://www.science.org/doi/10.1126/science.adf2160

Published: 31 Aug 2023

The paper estimated the germline mutation rate (μ) directly from pedigrees in four baleen whale species: blue, fin, bowhead, and humpback whales. This is the first time μ has been directly estimated from wild populations rather than inferred from phylogenies.
They identified trios (offspring and both parents) among samples of unknown ancestry using genetic analysis. Whole genome sequencing of 21 genomes in 8 trios was done to identify de novo mutations and estimate μ.
The estimated nuclear μ was around 1.1 × 10^-8 per site per generation, similar to estimates in primates and toothed whales with similar generation times. This contradicts the notion that whales have a lower μ due to their gigantic body size and lower metabolism.
For the mitochondrial genome, they detected heteroplasmy and changes in heteroplasmy ratios across generations in humpback whales to estimate μ. The mtDNA μ was around 4.3 × 10^-6, similar to humans.
Using their estimated mtDNA μ reduces previous estimates of pre-exploitation humpback whale abundance that were based on lower phylogenetic μ, making them more consistent with non-genetic estimates.
The new nuclear and mtDNA μ estimates negate a major effect of gigantism on mutation rate in whales. Direct pedigree-based estimation of μ is feasible in wild populations and can improve evolutionary inference.

Overall, the study seems well designed, but expanding the analyses to more species could strengthen the conclusions. The new rate estimates have important implications for various evolutionary inferences in whales.
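
For my own reference, the arithmetic behind a pedigree-based estimate can be sketched roughly as follows. This assumes the simplest textbook form, μ = (number of de novo mutations) / (2 × callable sites), summed over trios, and a simple inverse scaling of abundance with μ when genetic diversity is held fixed. The function names and all input numbers are made up for illustration; the paper's actual pipeline includes filtering and corrections that are not reproduced here.

```python
def pedigree_mutation_rate(n_de_novo, callable_sites_per_trio):
    """Per-site, per-generation rate from trio data.

    n_de_novo: total de novo mutations found across all trios.
    callable_sites_per_trio: haploid callable genome size for each trio.
    The factor 2 accounts for the offspring's two transmitted genome copies.
    """
    total_sites = sum(callable_sites_per_trio)
    return n_de_novo / (2 * total_sites)


def rescale_abundance(n_old, mu_old, mu_new):
    """Rescale an abundance estimate when only the assumed rate changes.

    Assumes the estimate is proportional to diversity / mu, so a higher
    mu gives a proportionally lower abundance.
    """
    return n_old * mu_old / mu_new


# Toy numbers, not taken from the paper.
mu = pedigree_mutation_rate(60, [1e9, 1e9, 1e9])
smaller_n = rescale_abundance(1000, 1e-6, 4e-6)
```

The second function is only meant to show the direction of the effect described above: plugging in a higher mtDNA μ pulls the genetic abundance estimate down.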

A generalizable deep learning framework for inferring fine-scale germline mutation rate maps

Original paper: https://www.nature.com/articles/s42256-022-00574-5

Published: 08 December 2022

The paper presents MuRaL, a deep learning framework to predict mutation rates at the nucleotide level across the genome using only DNA sequence as input. Accurate estimation of fine-scale mutation rates is important for evolutionary and genetic analyses.
MuRaL has two main modules: a ‘local’ module that learns signals from a small region around the focal nucleotide, and an ‘expanded’ module that learns from a larger 1 kb region. This design aims to capture both local and distal effects on mutation rate.
Comprehensive assessments using human mutation data showed MuRaL achieved higher accuracy in predicting mutation rates compared to previous state-of-the-art methods, especially at smaller genomic scales.
MuRaL required far fewer training mutations and genomes than previous methods. It could build effective models using rare variants from just 100 human genomes. This makes it more readily applicable to other species lacking large mutation datasets.
The authors demonstrated successful application of MuRaL to generate mutation maps for rhesus macaque, fruit fly, and Arabidopsis genomes. Transfer learning further reduced data needs.
Analyses showed that developmental genes tend to have elevated mutation rates, suggesting a high mutational burden. Regions with the lowest depletion ranks were enriched for pathogenic variants. This demonstrates the usefulness of the mutation rate maps.

The method seems quite promising. It will be interesting to see broader applications of the generated mutation maps in future evolutionary and medical studies.
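
To picture the two-module input described above, here is a small sketch of how a short 'local' window and a roughly 1 kb 'expanded' window could be cut around the same focal nucleotide and one-hot encoded. This is not MuRaL's code: the function names, the padding with `N`, and the default radii (`local_radius=7`, `expanded_radius=500`) are my own assumptions for illustration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Any other character, such as N, becomes an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def focal_windows(seq, pos, local_radius=7, expanded_radius=500):
    """Return (local, expanded) one-hot windows centred on seq[pos].

    Each window has length 2 * radius + 1; positions that fall outside
    the sequence are padded with N so the focal base stays in the centre.
    """
    def window(radius):
        left = max(0, pos - radius)
        right = min(len(seq), pos + radius + 1)
        pad_left = "N" * (radius - (pos - left))
        pad_right = "N" * (radius - (right - 1 - pos))
        return one_hot(pad_left + seq[left:right] + pad_right)

    return window(local_radius), window(expanded_radius)


# Toy sequence; the focal base is seq[2] == "G".
local, expanded = focal_windows("ACGT" * 10, 2, local_radius=3, expanded_radius=10)
```

In a real model the two windows would feed two separate network branches whose outputs are combined; the sketch only covers the input side.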

Transposable elements in mammalian chromatin organization

Original paper: https://www.nature.com/articles/s41576-023-00609-6

Published: 07 June 2023

Main Points:

Transposable elements (TEs) comprise nearly 50% of mammalian genomes. They are mobile DNA elements that can replicate and integrate into new positions in the genome.
TEs contribute regulatory sequences such as enhancers, promoters, and transcription factor binding sites that impact gene expression. Recent advances show TEs also affect 3D genome architecture.
TEs provide CCCTC-binding factor (CTCF) motifs that establish topologically associating domain (TAD) boundaries. TADs are megabase-scale chromatin domains that facilitate local interactions.
TE-derived CTCF motifs maintain TAD boundaries over evolution by providing redundant binding sites. This buffers against mutations disrupting ancestral sites.
Some TEs reshape chromatin architecture in a lineage-specific manner, forming novel loops that alter gene regulation and may underlie phenotypic differences.
Within TADs, TEs contribute regulatory sequences involved in short-range enhancer-promoter interactions and long-range interactions between TADs.
TEs are abundant in heterochromatin and provide docking sites for repressive histone modifications that maintain chromatin structure. Loss of heterochromatin is linked to aging.
Most evidence is correlative; more functional studies are needed. Long-read sequencing will expand analysis of difficult-to-assemble TE-rich regions.
Understanding TE roles in chromatin organization will provide insights into genome evolution, gene regulation, and disease mechanisms. Targeting TEs may have therapeutic potential.

In summary, this comprehensive review covers recent advances showing TEs shape mammalian genome architecture at multiple scales through both conserved and lineage-specific mechanisms. TEs contribute substantially to chromatin domain boundaries, short and long-range chromatin interactions, and heterochromatin integrity. Elucidating specific functions of TEs in chromatin organization remains an important area for future research.

Variation in the mutation rate across mammalian genomes

Original paper: https://www.nature.com/articles/nrg3098

Published: 04 October 2011

Key Points:

The germline mutation rate varies across the mammalian genome at several scales: between adjacent nucleotides; over hundreds of nucleotides; over hundreds of thousands to millions of nucleotides; and between whole chromosomes.
The strongest patterns are observed at the smallest scales.
Variation between adjacent nucleotides can either be dependent or independent of context.
Large-scale variation in the mutation rate is underestimated by between-species comparisons.
Variation between chromosomes is most conspicuous for the sex chromosomes.
There is variation in the somatic mutation rate across the genome. The variation has similarities and differences to that observed in the germline.

The mutation rate varies across the mammalian genome at many different scales within the germline. The strongest variation is at the smallest scale, at which sites can differ by more than tenfold in their mutability. Nevertheless, there is substantial variation at the subchromosomal level, the extent of which has probably been underestimated by comparative genomics; our best estimate is that most genomic regions have mutation rates within twofold of each other. Variation at the chromosomal level is even less, particularly among the autosomes. Similar patterns are observed in somatic tissue, although these are not as well characterized. The variation in the mutation rate has important consequences for our understanding of the evolutionary process. The role that this variation has in determining which genes become involved in somatic and inherited genetic disease is a question for the future.
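
As a rough illustration of what "regions within twofold of each other" means in practice, the sketch below bins mutation positions into fixed windows and reports the fold difference between the most and least mutable windows. The function names, the 0-based coordinates, and the toy data are my own assumptions; real analyses would also need to correct for callable sites, sequence context, and sampling noise.

```python
def window_rates(positions, genome_length, window_size):
    """Mutations per site in consecutive, non-overlapping windows.

    positions: 0-based mutation coordinates, each < genome_length.
    The last window may be shorter than window_size.
    """
    n_windows = -(-genome_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    rates = []
    for i, count in enumerate(counts):
        start = i * window_size
        length = min(window_size, genome_length - start)
        rates.append(count / length)
    return rates


def fold_range(rates):
    """Fold difference between the highest and lowest non-zero rates."""
    nonzero = [r for r in rates if r > 0]
    if not nonzero:
        raise ValueError("no window contains a mutation")
    return max(nonzero) / min(nonzero)


# Toy data: window counts are 2, 4 and 1 over three 10-site windows.
rates = window_rates([1, 5, 12, 13, 14, 15, 25], genome_length=30, window_size=10)
spread = fold_range(rates)
```

Rerunning the same calculation with different window sizes is one simple way to see how the apparent variation depends on scale.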

Overcoming challenges and dogmas to understand the functions of pseudogenes

Original paper: https://www.nature.com/articles/s41576-019-0196-1

Published: December 2019

This paper discusses the evolving understanding of pseudogenes in the genome, traditionally considered to be non-functional or “junk” DNA. The authors argue that this view is outdated and that pseudogenes, which are found in nearly all forms of life and in similar numbers to recognized protein-coding genes in mammalian genomes, have been overlooked in terms of their potential biological roles due to historical biases and terminological challenges.

Main Points:

Pseudogenes are defined as defective copies of genes, but growing evidence suggests they play significant biological roles.
The term “pseudogene” itself may contribute to biases against considering these genomic regions as functionally relevant, reminiscent of historical misconceptions in biology, such as the “prokaryote” grouping.
Pseudogenes arise from processes like gene duplication and retrotransposition, important for evolution and potentially for creating new biological functions.
Despite the historical view that pseudogenes lack function, evidence shows that they are involved in a wide range of cellular processes, including gene regulation, innovation of new genes, and contributing to genetic diversity and disease.
The review criticizes the binary distinction between genes and pseudogenes as overly simplistic and calls for a reassessment of pseudogenes’ roles using modern genomic technologies like long-read sequencing and CRISPR-Cas9.
The authors suggest adopting more neutral terminology, such as “retrocopy” for retrotransposed elements and “gene copy” or “paralogue” for duplications, to encourage unbiased investigation into these genomic elements’ functions.
Advances in genomic analysis tools are beginning to reveal the diverse roles of pseudogenes, challenging the notion that they are mere genomic relics without biological function.

In summary, the review advocates for a paradigm shift in how pseudogenes are perceived and studied, highlighting their potential significance in understanding genome function, evolution, and disease. The authors argue that the historical classification of pseudogenes as non-functional elements is a simplification that neglects the complexity and potential utility of these genomic regions.