A feasible roadmap for unsupervised deconvolution of two-source mixed gene expressions

Gene expression

Niya Wang, Eric P. Hoffman, Robert Clarke, Zhen Zhang, David M. Herrington 5, Ie-Ming Shih, Douglas A. Levine 6, Guoqiang Yu, Jianhua Xuan and Yue Wang

Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, USA; Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA; Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 57, USA; Department of Pathology, Johns Hopkins University, Baltimore, MD, USA; 5 Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 757, USA; 6 Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY, USA

Received on XXXXX; revised on XXXXX; accepted on XXXXX
Associate Editor: XXXXXXX
Contact: yuewang@vt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Tissue heterogeneity is a major confounding factor in studying individual populations that cannot be resolved directly by global profiling (Hoffman, et al., ). Experimental solutions to mitigate tissue heterogeneity are expensive, time consuming, inapplicable to existing data, and may alter the original gene expression patterns (Kuhn, et al., ; Shen-Orr, et al., ). Alternatively, various in silico methods perform essentially supervised deconvolution based on either externally-obtained constituent proportions (Shen-Orr, et al., ; Stuart, et al., ) or previously-acquired cell-specific signatures (Kuhn, et al., ; Lu, et al., ).

In the earlier issues of this journal, a few articles have reported semi-supervised methods that were specifically focused on dissecting two-source mixed gene expressions. Gosink et al. used (known) expression data from a single cell type to determine the proportion (and subsequently the expression profile) of each cell type in a heterogeneous sample (Gosink, et al., 7). This method detects the minimum of proportion that provides a good estimate in noiseless or simulation data. Building upon this work, Clarke et al. developed a geometry-based method that provides a more accurate estimate of this minimum in noisy real data and can be applied in situations where one or multiple heterologous samples are available (Clarke, et al., ). These methods assume a linear mixture of log-transformed expression levels, an assumption that has recently been shown to be invalid (Zhong and Liu, ). Ahn et al. proposed a statistical approach for deconvolving linearly mixed cancer transcriptomes in individual samples under various raw measured data scenarios (Ahn, et al., ). This practical (most likely) solution again requires prior knowledge of gene expression of one tissue type from multiple similar samples and applies various heuristics.

Here we ask whether it is possible to deconvolute two-source mixed expressions (estimating both proportions and cell-specific profiles) from two or more heterogeneous samples without requiring any of the aforementioned prior knowledge (Wang, ). Supported by a well-grounded mathematical framework, we argue that both constituent proportions and cell-specific expressions can be estimated in a completely unsupervised mode when cell-specific marker genes, which do not have to be known a priori, exist for each of the constituent cell types. Fundamental to the success of our approach is the geometric exploitation of cell/condition-specific marker genes and expression non-negativity. Specifically, we show that (1) the scatter plot of mixed expressions is a compressed version of cell-specific expressions; (2) the resident genes on the two radii of the scatter sector are the cell-specific marker genes; (3) the radius vectors defined by the cell-specific marker genes are the column vectors of the mixing matrix; and (4) the rank of between-tissue differentially expressed genes is mixing-invariant. We demonstrate the performance of unsupervised deconvolution on both simulation and real gene expression data, together with perspective discussions.

2 THEORY AND METHOD
2.1 Roadmap of unsupervised deconvolution

We adopt the linear latent variable model of raw measured expression data (Zhong and Liu, ), given by (bold font indicates column vectors)

$$\mathbf{x}(i) = \mathbf{A}\,\mathbf{s}(i), \quad \text{or} \quad \begin{cases} x_{\text{sample1}}(i) = a_{11}\, s_{\text{tissue1}}(i) + a_{12}\, s_{\text{tissue2}}(i) \\ x_{\text{sample2}}(i) = a_{21}\, s_{\text{tissue1}}(i) + a_{22}\, s_{\text{tissue2}}(i) \end{cases} \tag{1}$$

where $s_{\text{tissue1}}(i)$ and $s_{\text{tissue2}}(i)$ are the gene expression values for pure tissues 1-2, and $x_{\text{sample1}}(i)$ and $x_{\text{sample2}}(i)$ are the gene expression values for heterogeneous samples 1-2, for genes $i = 1, \ldots, n$, respectively; and $a_{jk}$ are the mixing proportions with $a_{11} + a_{12} = a_{21} + a_{22} = 1$ (after signal normalization). We further adopt the concept of cell-specific marker genes (MG) (Gosink, et al., 7; Kuhn, et al., ; Wang, ), i.e., genes whose expression is highly and exclusively enriched in a particular cell population in the context of interest, or mathematically $\mathbf{s}(i_{\text{MG1}}) = [\alpha_i \;\; 0]^T$ and $\mathbf{s}(i_{\text{MG2}}) = [0 \;\; \beta_i]^T$. Since raw measured gene expression values $\mathbf{s}$ are non-negative, when cell-specific marker genes exist for each cell type, the linear latent variable model (1) is identifiable using two or more mixed expressions, as we will elaborate via the following theorems and their formal proofs (see Fig. 1 for geometric illustration).

Theorem 1 (Scatter compression). Suppose that pure tissue expressions are non-negative and $\mathbf{x}(i) = \mathbf{a}_1 s_{\text{tissue1}}(i) + \mathbf{a}_2 s_{\text{tissue2}}(i)$, where $\mathbf{a}_1 = [a_{11} \;\; a_{21}]^T$ and $\mathbf{a}_2 = [a_{12} \;\; a_{22}]^T$ are linearly independent. Then the scatter plot of mixed expressions is compressed into a scatter sector whose two radii coincide with $\mathbf{a}_1$ and $\mathbf{a}_2$.

Proof of Theorem 1. Since $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent, without loss of generality we assume that $a_{11}/a_{21} > a_{12}/a_{22}$, i.e., $a_{11} a_{22} > a_{12} a_{21}$. Multiplying both sides by $s_{\text{tissue2}}(i)$ and adding $a_{11} a_{21} s_{\text{tissue1}}(i)$ to both sides, since raw measured expressions are non-negative, we have

$$a_{11} a_{21} s_{\text{tissue1}}(i) + a_{11} a_{22} s_{\text{tissue2}}(i) \ge a_{11} a_{21} s_{\text{tissue1}}(i) + a_{12} a_{21} s_{\text{tissue2}}(i).$$

Simple mathematical reorganizations lead to

$$a_{11}\left(a_{21} s_{\text{tissue1}}(i) + a_{22} s_{\text{tissue2}}(i)\right) \ge a_{21}\left(a_{11} s_{\text{tissue1}}(i) + a_{12} s_{\text{tissue2}}(i)\right), \quad \text{i.e.,} \quad \frac{a_{11} s_{\text{tissue1}}(i) + a_{12} s_{\text{tissue2}}(i)}{a_{21} s_{\text{tissue1}}(i) + a_{22} s_{\text{tissue2}}(i)} \le \frac{a_{11}}{a_{21}}.$$

Since $\mathbf{x}(i) = \mathbf{a}_1 s_{\text{tissue1}}(i) + \mathbf{a}_2 s_{\text{tissue2}}(i)$, we have

$$\frac{x_{\text{sample1}}(i)}{x_{\text{sample2}}(i)} \le \frac{a_{11}}{a_{21}}.$$

Using a similar strategy, we can show

$$\frac{a_{12}}{a_{22}} \le \frac{x_{\text{sample1}}(i)}{x_{\text{sample2}}(i)},$$

which readily completes the proof.

Theorem 2 (Unsupervised identifiability). Suppose that pure tissue expressions are non-negative, that cell-specific marker genes exist for each constituting tissue type, and that $\mathbf{x}(i) = \mathbf{a}_1 s_{\text{tissue1}}(i) + \mathbf{a}_2 s_{\text{tissue2}}(i)$ where $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent. Then the two radii of the scatter sector of mixed expressions coincide with $\mathbf{a}_1$ and $\mathbf{a}_2$, which can be readily estimated from marker gene expression values with appropriate rescaling.

Proof of Theorem 2. Based on the definition of cell-specific marker genes, i.e., $\mathbf{s}(i_{\text{MG1}}) = [\alpha_i \;\; 0]^T$ and $\mathbf{s}(i_{\text{MG2}}) = [0 \;\; \beta_i]^T$, and the existence of cell-specific marker genes for all constituting tissue types, we have

$$\mathbf{x}(i_{\text{MG1}}) = \alpha_i \mathbf{a}_1, \qquad \mathbf{x}(i_{\text{MG2}}) = \beta_i \mathbf{a}_2.$$

By the conclusion of Theorem 1, we complete the proof.

Figure 1. Geometric and mathematical description of the mixing process (panels: all genes and marker genes, before and after mixing).

Corollary 1 (Invariance of differential expression). Suppose that pure tissue expressions are non-negative and $\mathbf{x}(i) = \mathbf{a}_1 s_{\text{tissue1}}(i) + \mathbf{a}_2 s_{\text{tissue2}}(i)$ where $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent. Then the rank of between-tissue differentially expressed genes is mixing-invariant.

Proof of Corollary 1. Without loss of generality, we assume that $a_{11}/a_{21} > a_{12}/a_{22}$, i.e., $a_{11} a_{22} - a_{12} a_{21} > 0$, and

$$\frac{s_{\text{tissue1}}(j)}{s_{\text{tissue2}}(j)} \ge \frac{s_{\text{tissue1}}(i)}{s_{\text{tissue2}}(i)}, \quad \text{i.e.,} \quad s_{\text{tissue1}}(j)\, s_{\text{tissue2}}(i) \ge s_{\text{tissue1}}(i)\, s_{\text{tissue2}}(j).$$

Since $\mathbf{a}_1$ and $\mathbf{a}_2$ are linearly independent and $a_{11} a_{22} - a_{12} a_{21} > 0$, multiplying both sides by $(a_{11} a_{22} - a_{12} a_{21})$ and adding $a_{11} a_{21} s_{\text{tissue1}}(i) s_{\text{tissue1}}(j)$ and $a_{12} a_{22} s_{\text{tissue2}}(i) s_{\text{tissue2}}(j)$ to both sides, we have

$$\frac{a_{11} s_{\text{tissue1}}(j) + a_{12} s_{\text{tissue2}}(j)}{a_{21} s_{\text{tissue1}}(j) + a_{22} s_{\text{tissue2}}(j)} \ge \frac{a_{11} s_{\text{tissue1}}(i) + a_{12} s_{\text{tissue2}}(i)}{a_{21} s_{\text{tissue1}}(i) + a_{22} s_{\text{tissue2}}(i)},$$

i.e., $x_{\text{sample1}}(j)/x_{\text{sample2}}(j) \ge x_{\text{sample1}}(i)/x_{\text{sample2}}(i)$, so the ordering of genes by between-tissue expression ratio is preserved in the mixed samples.

From Theorem 2, there exists a mathematical solution uniquely identifying the linear latent variable model (1) based on two or more mixed expressions: under a noise-free scenario, we can (in principle) directly estimate $\mathbf{a}_1$ and $\mathbf{a}_2$ by locating the two radii that most tightly enclose the scatter sector of mixed expressions. Moreover, Corollary 1 allows for between-tissue differential analysis from mixed expressions without requiring deconvolution.

2.2 Algorithm and evaluation criteria

So far, we have described the theoretical roadmap for unsupervised deconvolution of two-source mixed expressions. We now complete the description of our algorithm by considering the identification of marker genes or the mixing matrix, and its application to data deconvolution. Although the mixing matrix can be estimated using only one marker gene per tissue type, a more accurate solution, with practical applicability, is to estimate $\mathbf{a}_1$ and $\mathbf{a}_2$ using multiple marker genes.

Our unsupervised deconvolution begins by detecting the cell-specific marker genes directly from mixed expressions, in which the differential analysis of gene expressions is performed on all genes. Mathematically, MG is defined as an index set

$$\text{MG}_1 = \left\{ i \;\middle|\; \frac{x_{\text{sample1}}(i)}{x_{\text{sample2}}(i)} \ge k_{\max} - \varepsilon \right\}, \qquad \text{MG}_2 = \left\{ i \;\middle|\; \frac{x_{\text{sample1}}(i)}{x_{\text{sample2}}(i)} \le k_{\min} + \varepsilon \right\}, \tag{2}$$

where $k_{\max}$ and $k_{\min}$ are the maximum and minimum ratios of $x_{\text{sample1}}(i)$ over $x_{\text{sample2}}(i)$ across all $i$, respectively; and $\varepsilon$ is a pre-fixed positive small real number.
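The marker-gene index sets defined above can be illustrated with a minimal sketch. It assumes one plausible reading of the thresholding rule (a gene is a marker when its sample-1 to sample-2 ratio lies within ε of the largest or smallest ratio); the function name `detect_marker_genes` and the toy values are hypothetical, and pre-processing is assumed to have been done already.

```python
def detect_marker_genes(x1, x2, eps):
    """Return (MG1, MG2) index lists from two mixed expression profiles.

    x1, x2: positive expression values of the same genes in samples 1 and 2.
    eps: small positive tolerance; its value is a user choice (assumption).
    """
    ratios = [a / b for a, b in zip(x1, x2)]
    k_max, k_min = max(ratios), min(ratios)
    # Genes lying on the two radii of the scatter sector.
    mg1 = [i for i, r in enumerate(ratios) if r >= k_max - eps]
    mg2 = [i for i, r in enumerate(ratios) if r <= k_min + eps]
    return mg1, mg2


# Toy mixed expressions for four genes (hypothetical values).
x_sample1 = [8.0, 1.0, 4.0, 5.2]
x_sample2 = [3.0, 3.5, 4.0, 3.2]
mg1, mg2 = detect_marker_genes(x_sample1, x_sample2, eps=0.05)
```

In this toy case gene 0 has the largest ratio and gene 1 the smallest, so they are returned as the marker genes of the two tissues; on real data the choice of ε and the preceding filtering strongly affect which genes are selected.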
To obtain a reliable set of marker gene indices, some pre-processing steps are required, including mode/mean-based normalization and removal of minimally-expressed and outlier genes. On the basis of the expression values of detected cell-specific marker genes, the mixing matrix is estimated using standardized sample averages,

$$\hat{\mathbf{a}}_1 = \frac{1}{n_{\text{MG1}}} \sum_{i \in \text{MG}_1} \frac{\mathbf{x}(i)}{\|\mathbf{x}(i)\|}, \qquad \hat{\mathbf{a}}_2 = \frac{1}{n_{\text{MG2}}} \sum_{i \in \text{MG}_2} \frac{\mathbf{x}(i)}{\|\mathbf{x}(i)\|}, \tag{3}$$

where $\text{MG}_1$ and $\text{MG}_2$ are the index sets of marker genes for tissue types 1 and 2, respectively; $n_{\text{MG1}}$ and $n_{\text{MG2}}$ are the numbers of marker genes for tissue types 1 and 2, respectively; and $\|\cdot\|$ denotes the vector norm ($L_1$ or $L_2$). The resulting $\hat{\mathbf{a}}_1$ and $\hat{\mathbf{a}}_2$ are then used to deconvolute the mixed expressions into cell-specific profiles via matrix inversion techniques.

Unsupervised deconvolution algorithm:
1) Normalize gene expression profile using global mean/mode;
2) Remove minimally-expressed genes whose norm is less than a pre-fixed positive small real number δ, and outlier genes whose norm is bigger than a pre-fixed positive large real number γ;
3) Detect the indices of cell-specific marker genes, for each of the constituting tissue types, according to (2);
4) Estimate the mixing matrix according to (3), normalized to proportions;
5) Estimate cell-specific expression profiles using mixed expressions and matrix inversion technique(s).

We use four complementary evaluation criteria and known ground truth to assess the performance of the proposed unsupervised deconvolution method. To assess the accuracy of tissue proportion estimates, in addition to the classic correlation coefficient, we adopt the E criterion given by

$$E = \sum_{i} \left( \sum_{j} \frac{|p_{ij}|}{\max_k |p_{ik}|} - 1 \right) + \sum_{j} \left( \sum_{i} \frac{|p_{ij}|}{\max_k |p_{kj}|} - 1 \right), \tag{4}$$

where $p_{ij}$ is the $ij$th element of the matrix $[\hat{\mathbf{a}}_1 \; \hat{\mathbf{a}}_2]^{-1} [\mathbf{a}_1 \; \mathbf{a}_2]$, with $\hat{\mathbf{a}}_1$ and $\hat{\mathbf{a}}_2$ being the estimated column vectors of the mixing matrix. Note that E is invariant to permutation or scaling and E = 0 when the estimation is perfect. To assess the accuracy of estimated cell-specific expression patterns, we calculate the correlation coefficient between the estimated expression profile and ground truth over marker genes and all genes, respectively. Moreover, to assess the membership (and rank) match (and mismatch) between the marker genes detected from pure versus mixed expressions, we utilize Venn diagrams, together with Spearman's rank correlation coefficient. More details on the algorithm, parameter settings, and alternative schemes are included in the supplementary information.

3 EXPERIMENTAL RESULTS

3.1 Validation on cell line expression data

We first considered numerical mixtures of two human cell line expressions, a situation in which all factors are known and the linear mixture model is ideal. We reconstituted mixed expressions by multiplying the measured cell line expressions by the proportion of the tissue subset in a given heterogeneous sample (Fig. 1). We detected the cell-specific marker genes solely based on the reconstituted expression mixtures and accordingly obtained a highly accurate estimate of the mixing matrix with E = .95 (and correlation coefficient of .99) (Table 1).
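A numerical-mixture check of this kind can be mimicked on noise-free toy data. The sketch below is not the authors' implementation: it assumes the marker index sets are already known, uses the L1 norm for the standardized averages, interprets "normalized to proportions" as rescaling the two column directions so that each row of the mixing matrix sums to one, and restricts everything to the 2x2 case. All function names and values are hypothetical.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(m, n):
    """Multiply two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mean_direction(X, marker_idx):
    """Average of L1-standardized mixed-expression vectors over marker genes."""
    unit = [(X[i][0] / sum(X[i]), X[i][1] / sum(X[i])) for i in marker_idx]
    return [sum(u[r] for u in unit) / len(unit) for r in range(2)]


def estimate_mixing_matrix(X, mg1, mg2):
    """Column directions from markers, rescaled so that each row sums to one."""
    d1, d2 = mean_direction(X, mg1), mean_direction(X, mg2)
    # Solve c1*d1 + c2*d2 = [1, 1] for the column scales c1 and c2.
    d_inv = inv2([[d1[0], d2[0]], [d1[1], d2[1]]])
    c1, c2 = sum(d_inv[0]), sum(d_inv[1])
    return [[c1 * d1[0], c2 * d2[0]], [c1 * d1[1], c2 * d2[1]]]


def e_criterion(a_hat, a_true):
    """E index of P = inv(a_hat) * a_true; zero for a perfect estimate."""
    p = [[abs(v) for v in row] for row in matmul2(inv2(a_hat), a_true)]
    rows = sum(sum(r) / max(r) - 1 for r in p)
    cols = sum(sum(c) / max(c) - 1 for c in zip(*p))
    return rows + cols


# Toy ground truth: genes 0-1 are tissue-1 markers, genes 2-3 tissue-2 markers.
A = [[0.8, 0.2], [0.3, 0.7]]
S = [(10, 0), (3, 0), (0, 5), (0, 7), (4, 4), (6, 2)]
X = [[A[0][0] * s1 + A[0][1] * s2, A[1][0] * s1 + A[1][1] * s2] for s1, s2 in S]

A_hat = estimate_mixing_matrix(X, mg1=[0, 1], mg2=[2, 3])
A_hat_inv = inv2(A_hat)
# Deconvolution by matrix inversion: s_hat(i) = inv(A_hat) * x(i).
S_hat = [[A_hat_inv[r][0] * x1 + A_hat_inv[r][1] * x2 for r in range(2)]
         for x1, x2 in X]
E = e_criterion(A_hat, A)
```

On this noise-free example the estimated matrix and profiles match the ground truth up to floating-point error and E is numerically zero; with noisy data the marker averages, and hence the estimates, would only be approximate.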
For each cell line, comparison of the estimated expression profile of each type to the measured expression pattern in the pure cell line showed an almost perfect correlation with an average correlation coefficient of .99, indicating that we could accurately deconvolute the mixed expressions into constituent expression patterns in a completely unsupervised way.

Table 1. Identification of linear mixture model using numerical mixtures of two breast cancer cell line expressions.

Sample/Tissue   MCF7 (breast cancer)    HS7 (fibroblasts)
                (assigned/estimated)    (assigned/estimated)
Sample 1        .8/.77                  86/9
Sample 2        .8/.86                  68/.66

Next, we tested our method on biologically mixed expressions from two breast cancer cell lines. The mRNA extracted from the individual cell lines was mixed with pre-specified proportions before subsequent procedures including amplification and microarray experiment (Table 2). Such mixtures mimic the actual biological samples with varying relative abundances of the constituent subsets from one another (Fig. 2) (Kuhn, et al., ; Shen-Orr, et al., ).

Table 2. Identification of linear mixture model using biological mixtures of two breast cancer cell line expressions.

Sample/Tissue   MCF7 (breast cancer)    HS7 (fibroblasts)
                (assigned/estimated)    (assigned/estimated)
Sample 1        .75/.76                 /.78
Sample 2        /.                      .75/.7576

The proposed method again accurately estimated the mixing proportions with E = .778 (and correlation coefficient of .99) (Table 2 and Fig. 2), and cell-specific expression patterns with an average correlation coefficient of .99 between the estimated expression profile of each type and the measured expression pattern in the pure cell line (Fig. 3). The high correlation that we achieved between the estimated proportions/tissue-expressions and ground truth suggests that unsupervised deconvolution of tissue-specific expression profiles from two-source heterogeneous samples using a linear model should yield accurate expression estimates for most genes.

Figure 2. Scatter plot of biological mixing and blind model identification.

Figure 3. Highly correlated scatter plots between the estimated and measured pure cell expression profiles (over marker and all genes).

The theoretical roadmap also enables the extended detection of differentially-expressed genes beyond marker genes, aimed at maximizing the information obtainable from mixed expressions. To assess the specificity and sensitivity of detecting differentially expressed genes without deconvolution, we compared the ranked index subsets of differentially expressed genes between samples to a gold standard set of differentially expressed genes identified from the pure cell line measurements, on both numerical and biological mixtures of two breast cancer cell line expressions. In addition to the Venn diagram and Spearman's rank correlation coefficient (r_rank = .9), receiver operating characteristics curve analysis showed the detection of differentially expressed genes based on mixed expressions (Corollary 1) to be both highly specific and sensitive, with an area under the curve of .85 (supplementary information).

3.2 Analysis of benchmark expression data

As an example for the purpose of comparison, we also analyzed the same public benchmark gene expression dataset (AFFY) used by Ahn et al. (Ahn, et al., ). This dataset consists of biologically mixed heterogeneous samples with varying proportions of human brain and heart tissues. We selected samples with brain/heart proportion ratios of 0%/100% ( samples), 25%/75% ( samples), 75%/25% ( samples) and 100%/0% ( samples). In contrast to the semi-supervised methods that all require prior knowledge of gene expression of one tissue type (Ahn, et al., ; Clarke, et al., ; Gosink, et al., 7), the 6 pure tissue samples were not used in any step of our proposed algorithm but simply served as the truth for assessment. For each proportion ratio, consistent with routine practice, we take the average of the replicates (with the same proportion ratio) as the mixed/observed expression profile to be analyzed by the algorithm. As aforementioned, in addition to accurate signal normalization (Wang, et al., ), to maintain data quality and computational efficiency, we selected a subset of genes in the subsequent analyses by excluding minimally-expressed and outlier genes (supplementary information) (Ahn, et al., ). This provided us with about probe sets across all samples.

Table 3. Unsupervised estimation of unknown tissue proportions on AFFY brain-heart mixture dataset.

Sample/Tissue   Brain                   Heart
                (assigned/estimated)    (assigned/estimated)
Sample 1        /.                      .75/.7658
Sample 2        .75/.76                 /.78

With pre-processed raw measured data, we first evaluated how well the proposed method estimated tissue proportions in this dataset (Table 3). Without using any knowledge of either tissue-specific expression or constituent proportions, which other methods require (Ahn, et al., ; Clarke, et al., ; Gosink, et al., 7; Kuhn, et al., ), our algorithm accurately estimated the unknown tissue proportions with a correlation coefficient of .99 (E = .7), as compared with a correlation coefficient of .98 produced by the semi-supervised method on the same dataset (Fig. 4) (Ahn, et al., ).

Figure 4. Scatter plot of brain-heart mixtures and proportion estimates.

Next, we examined how well the proposed method estimated tissue-specific expression patterns in this dataset. As shown in Fig. 5, the proposed method accurately and blindly estimated the gene expressions of pure brain and pure heart tissues, with a correlation coefficient of .96-.99 between the estimated mean tissue expression levels and measured mean pure tissue expression levels, as compared to a correlation coefficient of .88-.95 produced by the semi-supervised method on the same dataset. These results suggest that this unsupervised deconvolution method is able to accurately deconvolute two-source mixed expressions (estimating both proportions and cell-specific profiles) from two or more heterogeneous samples. Detailed information on additional experimental results (tables, figures, datasets) is included in the supplementary information.

Figure 5. Estimation of tissue-specific gene expressions from AFFY: scatter plots comparing deconvolved mean brain/heart tissue expression values with measured mean pure brain/heart tissue expression values.

4 DISCUSSION

In this letter, we presented a feasible roadmap for unsupervised deconvolution of two-source mixed expressions, supported by the newly proved theorems under realistic conditions and experimental tests on real gene expression data. One important advantage of this unsupervised deconvolution approach lies in its unique and proven ability to detect cell-specific marker genes and estimate constituent proportions directly from mixed expressions when the relevant prior knowledge is either unreliable or unavailable. This is significant, in relation to semi-supervised methods, since it is well-known that (1) cell-specific marker genes (membership and expression) are condition-specific and (2) the total amount of mRNA from the same volume of cancer cells is much higher than that of normal cells (due to unknown tumor ploidy) (Clarke, et al., ).

We foresee a variety of extensions to the concepts and strategies in the proposed method. For example, with further development, intratumor heterogeneity can be revealed in terms of hidden subclonal marker genes and subclonal repopulation dynamics.
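There is also a possible way to estimate the marker expression profiles for individual samples (Ahn, et al., ). Rewrite (1) as

$$\begin{cases} x_{\text{sample1}}(i) = a_{11}\left(s_{\text{tissue1}}(i) + \Delta s_{\text{tissue1,sample1}}(i)\right) + a_{12}\left(s_{\text{tissue2}}(i) + \Delta s_{\text{tissue2,sample1}}(i)\right) \\ x_{\text{sample2}}(i) = a_{21}\left(s_{\text{tissue1}}(i) + \Delta s_{\text{tissue1,sample2}}(i)\right) + a_{22}\left(s_{\text{tissue2}}(i) + \Delta s_{\text{tissue2,sample2}}(i)\right) \end{cases} \tag{5}$$

where $\Delta s_{\text{tissue1,sample}}(i)$ and $\Delta s_{\text{tissue2,sample}}(i)$ are the sample-specific variations in pure tissues. Then, for marker genes, we have

$$\begin{cases} x_{\text{sample1}}(i_{\text{MG}j}) = a_{1j}\left(s_{\text{tissue}j}(i_{\text{MG}j}) + \Delta s_{\text{tissue}j\text{,sample1}}(i_{\text{MG}j})\right) \\ x_{\text{sample2}}(i_{\text{MG}j}) = a_{2j}\left(s_{\text{tissue}j}(i_{\text{MG}j}) + \Delta s_{\text{tissue}j\text{,sample2}}(i_{\text{MG}j})\right) \end{cases} \tag{6}$$

where $j$ is the tissue type index. According to (3), $\hat{a}_{1j}$ and $\hat{a}_{2j}$ are obtained via some form of averaging over tissue-specific marker genes, where for each sample we may reasonably assume

$$\frac{1}{n_{\text{MG}j}} \sum_{i \in \text{MG}_j} \Delta s_{\text{tissue}j\text{,sample}}(i) \approx 0. \tag{7}$$

Denoting $s_{\text{tissue}j\text{,sample}k}(i_{\text{MG}j}) = s_{\text{tissue}j}(i_{\text{MG}j}) + \Delta s_{\text{tissue}j\text{,sample}k}(i_{\text{MG}j})$, we have

$$s_{\text{tissue}j\text{,sample}k}(i_{\text{MG}j}) = \frac{x_{\text{sample}k}(i_{\text{MG}j})}{a_{kj}}, \tag{8}$$

for each of $k$ and $j$, where $k$ is the sample index.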
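The last relation can be sketched in a few lines, as a hedged illustration only. It assumes the mixing proportions have already been estimated, that the marker index sets are known, and that indices are 0-based in the code (so sample k and tissue j in the text correspond to k-1 and j-1 here); the function name and the toy values are hypothetical.

```python
def per_sample_marker_expression(x_samples, a_hat, marker_sets):
    """Divide each marker gene's mixed expression by the matching proportion.

    x_samples[k][i]: mixed expression of gene i in sample k.
    a_hat[k][j]: estimated proportion of tissue j in sample k.
    marker_sets[j]: indices of the marker genes of tissue j.
    Returns {(k, j): {gene index: estimated sample-specific expression}}.
    """
    return {
        (k, j): {i: x_samples[k][i] / a_hat[k][j] for i in markers}
        for k in range(len(x_samples))
        for j, markers in enumerate(marker_sets)
    }


# Toy values: gene 0 is a tissue-1 marker, gene 1 a tissue-2 marker.
x_samples = [[8.0, 1.0], [3.0, 3.5]]
a_hat = [[0.8, 0.2], [0.3, 0.7]]
profiles = per_sample_marker_expression(x_samples, a_hat, [[0], [1]])
```

Because the toy data contain no sample-specific variation, both samples yield the same marker expression for each tissue; with real data the per-sample values would differ by the variation term.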

Funding: National Institutes of Health, under Grants NS955, CA97, CA66, HL6, in part.

Conflict of Interest: none declared.

REFERENCES

Ahn, J., et al. () DeMix: deconvolution for mixed cancer transcriptomes using raw measured data, Bioinformatics, 9, 865-87.
Clarke, J., Seo, P. and Clarke, B. () Statistical expression deconvolution from mixed tissue samples, Bioinformatics, 6, -9.
Gosink, M.M., Petrie, H.T. and Tsinoremas, N.F. (7) Electronically subtracting expression patterns from a mixed cell population, Bioinformatics, , 8-.
Hoffman, E.P., et al. () Expression profiling-best practices for data generation and interpretation in clinical trials, Nat. Rev. Genet., 5, 9-7.
Kuhn, A., et al. () Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain, Nat Methods, 8, 95-97.
Lu, P., Nakorchevskiy, A. and Marcotte, E.M. () Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations, Proceedings of the National Academy of Sciences of the United States of America, , 7-75.
Shen-Orr, S.S., et al. () Cell type-specific gene expression differences in complex tissues, Nat Methods, 7, 87-89.
Stuart, R.O., et al. () In silico dissection of cell-type-associated patterns of gene expression in prostate cancer, Proc. Natl. Acad. Sci., , 65-6.
Wang, Y. () Independent Component Imaging. US Patent 6,78,96.
Wang, Y., et al. () Iterative normalization of cDNA microarray data, IEEE Trans Info. Tech. Biomed, 6, 9-7.
Zhong, Y. and Liu, Z. () Gene expression deconvolution in linear space, Nat Methods, 9, 8-9; author reply 9.