Funded Grants

As of November 19, 2019
ASC1 (NIMH, 2013-2018)

Specific Aims

Aim 1. Maintain the infrastructure to support the ASC objectives
This Aim will provide infrastructure to support all ASC objectives, including supporting all committees, facilitating the communication between sites, coordinating sample and data transfer amongst sites, etc. In addition, this Aim provides an ASC bioinformatic hub site for hosting cleaned and called data and for analyses.

Aim 2. Deploy pipelines for data cleaning and harmonization and variant calling
This Aim provides resources for developing and implementing improved methods for variant calling and for cleaning and harmonizing datasets, and the subsequent deployment of the called data for analysis, relying on methods developed in the 1000 Genomes Project and other large-scale initiatives.

Aim 3. Implement novel statistical methods for identifying ASD-associated genes
This Aim will take advantage of the diverse expertise of the ASC to develop and apply novel methods to identify likely ASD-associated genes. This will include studying de novo and inherited variation (including X-linked and recessive loci), looking within recurrent CNV loci, and using classifiers, systems genetics, and datasets from other disorders that share risk with ASD (e.g., schizophrenia, ID, epilepsy).

Aim 4. Carry out whole-exome sequencing of 3,000 ASD subjects and parents
This Aim relies on sequencing at Yale and on contributions from the NHGRI Genome Sequencing Program to expand the number of samples sequenced at the whole-exome level to include 3,000 additional ASD trios (selected from the NIH repository). This will provide a total sample of 9,000 ASD subjects and nearly 30,000 samples for detailed gene discovery.

ASC1 Supplement (NIMH, 2015)
Specific Aims

Aim 1. To collect, organize and integrate clinical data. The large sample of the ASC, which keeps expanding through recruitment at existing sites and participation of new sites, provides a unique opportunity for genotype-phenotype analyses. We will create a database of the available clinical data that will include, as minimally required parameters, ADI-R, ADOS-G, standardized IQ measures, presence of seizures, additional diagnoses in the proband, and family psychiatric history. Ed Cook, Mike Gill, and Catalina Betancur, co-heads of the Samples and Phenotypes Committee of the ASC, have worked on harmonizing and integrating such data in the AGP, SSC and PGC. Many of the samples in the ASC are from well-organized cohorts [e.g., AGP (Autism Genome Project et al., 2007), TASC (Buxbaum et al., 2014), SSC (Fischbach and Lord, 2010), or PAGES (Gaugler et al., 2014)], and this will facilitate access to data and integration into ongoing ASC analyses. Certain aspects of the proposed codebook are found in the Tables below.

Codebook fields: ASC FID; ASC IID; ADI AGE; ADI DX (1); ADI verbal/non verbal (2); ADOS DX; IQ; ASD DX_VER1 (3)
ADI AGE: in months
ADI DX (1): 0 – Not Autism/ASD; 1 – Autism; 2 – ASD
ADI verbal/non verbal (2): 0 – daily functional use of 3+ word phrases; 1 – daily speech with use of at least 5 words in past month; 2 – fewer than five words total or speech not used on a daily basis
ADOS DX: 0 – Not Autism, not spectrum; 1 – Spectrum; 2 – Autism
IQ: 1 – 1-34; 2 – 35-69; 3 – 70-90; 4 – > 90
ASD DX_VER1 (3): 1 – unaff; 2 – aff; 999 – missing ADI, ADOS
Notes: ASC FID is the ASC family ID, while ASC IID is the ASC individual ID. (1) ADI DX = 2: use the ASD criteria from Risi et al. (PMID: 16926617), ASD1 or ASD2; (2) Fields for different ADI versions, including ADI30 – ADI WPS, ADI19 – ADI 95 long, ADI14 – ADI 95 short; (3) The algorithm for aff/unaff/missing is: ADI OR ADOS > 0 = 2 (aff), ELSE = 1 (unaff), UNLESS ADI AND ADOS NULL = 999 (missing).
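
The derivation in note (3) and the IQ bands of the codebook can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ASC's actual implementation: the function names are hypothetical, a missing value is represented as None, a single missing instrument is assumed to contribute no evidence of affection, and IQ scores are assumed to be whole numbers.

```python
from typing import Optional

MISSING = 999  # codebook value when both ADI and ADOS are missing


def asd_dx_ver1(adi_dx: Optional[int], ados_dx: Optional[int]) -> int:
    """Derive ASD DX_VER1 from ADI DX and ADOS DX, following note (3).

    Returns 2 (aff), 1 (unaff), or 999 (missing).
    """
    if adi_dx is None and ados_dx is None:
        return MISSING
    # Assumption: a single missing instrument counts as 0 (no evidence).
    if (adi_dx or 0) > 0 or (ados_dx or 0) > 0:
        return 2  # aff
    return 1  # unaff


def iq_band(iq: Optional[int]) -> Optional[int]:
    """Map a whole-number IQ score to the codebook bands 1-4.

    Returns None for missing or non-positive scores (an assumption; the
    codebook does not define a code for these).
    """
    if iq is None or iq < 1:
        return None
    if iq <= 34:
        return 1
    if iq <= 69:
        return 2
    if iq <= 90:
        return 3
    return 4
```

For example, a proband with ADI DX = 0 and ADOS DX = 1 would be coded 2 (aff), while a proband with neither instrument available would be coded 999.
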
Has the patient had a seizure or been diagnosed with epilepsy? 0 – unknown or none; 1 – probable; 2 – definite
If yes, what is the seizure type? focal – simple partial; focal – complex partial; focal – secondary generalized; generalized – absence; generalized – myoclonic; generalized – tonic-clonic; epileptic encephalopathy – infantile spasms; epileptic encephalopathy – Lennox-Gastaut syndrome; other epileptic encephalopathy; febrile seizure; other type of seizure – describe
Are the seizures controlled on medication? Seizures stopped without medication
If an EEG was performed: focal or bilateral/generalized; Epileptiform activity – continuous spike-wave of slow wave sleep; Other – describe
If brain MRI performed:

Figure 1. Proband IQ and variation in the genetic architecture of ASD. LoF, loss-of-function; Hx, history; SCZ, schizophrenia; BPD, bipolar disorder; MDD, major depressive disorder. Modified from Robinson et al. 2014.

We recently consulted with experts in epilepsy (from the Epi4K and EpiPhenome Consortia) to get direction on epilepsy measures. Dr Orin Devinsky provided the above approach for epilepsy, currently in use in large-scale studies.
In addition to these measures, we will also ask about other diagnoses in the proband and family history of major psychiatric disorders and, where available, more details about ADI, ADOS, IQ measures, age of parents at birth of child, head circumference and additional body metrics.
One example of integration of genetic data (de novo LoF mutations), a phenotypic variable (cognitive function as assessed by full-scale IQ) and family history of specific psychiatric disorders is shown in Fig. 1, which highlights how higher IQ in probands with ASD is associated with a lower rate of de novo LoF variants and with a higher rate of family history of major adult psychiatric conditions. It is of immediate interest to carry out similar analyses in the ASC, where the IQ distribution is broader than that of the SSC.

Aim 2. To facilitate and promote patient recontacting. A key aspect of the ASC that will simplify this process is that most sites have approval to recontact as part of independent funding. This permits the ASC sites to follow up with patients showing relevant genetic findings, obtain additional clinical information for the affected and family members, and even collect additional biomaterials. As a long-term commitment, the ASC will identify patients to follow up on the basis of mutations in known ASD genes (a list curated and constantly updated by Catalina Betancur), novel risk genes identified by large WES studies to date in ASD (De Rubeis et al., 2014; Iossifov et al., 2014) and in other neurodevelopmental disorders (Deciphering Developmental Disorders, 2015; Euro Epinomics-RES et al., 2014), and from ongoing ASC analyses of genetic variation at all scales (e.g., recessive variants, X-linked variants, CNVs). For the purpose of this supplement, to ensure the completion within 1 year, we will give priority to the top ~20 ASD genes identified in the joint analyses of the ASC and SSC data (De Rubeis et al., 2014; Iossifov et al., 2014), excluding any with detailed existing genotype-phenotype information. Based on this evaluation, the ASC will solicit the original sites to recontact the patient, relying on their independent IRB approvals, supervise the collection of clinical data, as per the recommendation of the Samples and Phenotypes Committee, and share data (in anonymous form) with the Committee in order to develop a deep analysis of genotype-phenotype correlations.

Aim 3. To increase discoveries in high-risk loci. We intend to complement WES efforts with cost-effective targeted sequencing of candidate genes that are poorly covered by WES and of genes left unexplored because of PCR and sequencing biases. Targeted sequencing studies have proven powerful in detecting risk-conferring variants that accrue on the mutational landscape revealed by WES (O’Roak et al., 2012). We will focus on genes containing GC-rich exons, which are likely a source of risk because of their hypermutability (CpG dinucleotides show hypermutability to TpG or CpA, as compared to mutation of TpG or CpA to CpG) but are not adequately amplified during library preparation. On the basis of WES coverage information available in the ExAC database, we will compile a list of 300 GC-rich genes that include a) plausible and known ASD risk genes implicated by evidence other than WES, including previous targeted sequencing screenings, CNV studies, and the DAWN algorithm (detecting association with networks) (Liu et al., 2014); and b) genes not yet implicated in ASD, selected on the basis of neuronal expression, co-expression and functional relations to ASD genes. We will enrich and sequence the genes containing GC-rich exons using a two-step strategy. We will first interrogate the panel of 300 genes with the Ion Torrent technology, which combines cost-effectiveness, speed and improved performance on GC-rich exons. We expect 80-90% of the panel to have adequate coverage after this first step. The remaining 10-20% of genes, which fail during this first step, will be resequenced with the PCR-free SMRT technology from PacBio. After initial setup of the method, we will pilot the sequencing of these genes on 300 trios analyzed by WES but not showing significant findings.
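
As a rough illustration of how GC-rich, poorly covered exons might be flagged when compiling such a panel, the sketch below computes GC fraction and CpG count per exon and selects genes that pass both filters. Everything specific here is an assumption made for the example: the Exon record, the 0.65 GC threshold, the 20x mean-coverage cutoff and the toy sequences are hypothetical, and the code does not reflect the ExAC data format or the selection criteria actually used by the ASC.

```python
from dataclasses import dataclass
from typing import List

GC_THRESHOLD = 0.65        # hypothetical cutoff for calling an exon GC-rich
MIN_MEAN_COVERAGE = 20.0   # hypothetical cutoff for adequate WES coverage


@dataclass
class Exon:
    gene: str             # gene symbol (placeholder names in the example)
    sequence: str         # exon sequence, 5' to 3'
    mean_coverage: float  # mean WES read depth across the exon


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return seq.upper().count("CG")


def poorly_covered_gc_rich_genes(exons: List[Exon]) -> List[str]:
    """Genes with at least one exon that is GC-rich and poorly covered."""
    genes = {
        e.gene
        for e in exons
        if gc_fraction(e.sequence) >= GC_THRESHOLD
        and e.mean_coverage < MIN_MEAN_COVERAGE
    }
    return sorted(genes)


if __name__ == "__main__":
    toy_exons = [
        Exon("GENE_A", "GCGCGCGGCC", 5.0),   # GC-rich, poorly covered
        Exon("GENE_B", "ATATATGCAT", 40.0),  # AT-rich, well covered
        Exon("GENE_C", "GGGCCCGCGC", 35.0),  # GC-rich, well covered
    ]
    print(poorly_covered_gc_rich_genes(toy_exons))
```

In this toy input only GENE_A is both GC-rich and below the coverage cutoff, so it is the only gene selected for targeted sequencing.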

Figure 2. Relation between IQ and de novo LoF variants detected or not in ExAC.

Aim 4. To generate a public database of ASD risk genes. We will create a publicly accessible database that contains the statistical evidence for each gene. Disease association will be assessed and systematically updated on the basis of analyses using the robust and reliable statistical framework of TADA (He et al., 2013) and the >60,000 control exomes in ExAC. Ongoing analyses show that the genetic variation catalogued in ExAC can be used reliably to infer variant deleteriousness. As shown in Fig. 2, de novo LoF variants detected in ASD probands in the SSC cohort that are also present in ExAC are not associated with phenotypic severity, suggesting those variants are less likely to be deleterious (Robinson et al., unpublished). The database will be updated every 6 months to accommodate novel data that will emerge from joint analyses of existing and newly published cohorts, explorations of novel types of variation (X-linked, mosaicisms, recessive), novel analytical tools and the expanding ExAC sample, and will be hosted on our servers for at least 5 years.

ASC2 (NIMH, 2017-2022)
Specific Aims

Aim 1. Produce and/or analyze WES of 30,000 new ASD subjects, parents and other controls. These samples will be added to the current ASC collection of 22,000 WES samples, for a total of more than 50,000 samples. New samples will be provided by ASC collaborators, including roughly 3,000 trios already identified and a large number of Danish case-control samples. The budget of this proposal is only for analyses of these data because WES data production is covered by pre-existing independent funding arrangements.

Aim 2. Looking below the surface: using and developing approaches to find “hidden” risk variants. We will apply novel methods to the ASC exome data to identify “hidden” subsets of risk variants that confer high ASD risk in categories of genetic variants that show weak ASD risk en masse (e.g. de novo missense variation, inherited loss of function mutation). We will leverage multiple novel approaches, often garnering strength from the data in and results from the ExAC resource.

Aim 3. Use results from common and rare variant studies to describe the interplay of such variation in ASD risk. We will integrate WES variants with data from whole-genome sequence (WGS) and GWAS to produce a complete picture of the genetic architecture of ASD, to improve gene discovery, and to refine clinical interpretation. WGS and GWAS data will be derived from other ongoing studies.