Marginalized kernels for biological sequences

Koji Tsuda, Taishin Kin and Kiyoshi Asai
AIST, 2-41-6 Aomi Koto-ku, Tokyo, Japan
Presented by Shihai Zhao, May 8, 2014

Overview
1. Introduction: kernel functions; hidden Markov model
2. Methods: new kernel; connections to the Fisher kernel
3. Results and Conclusion

Kernel functions
In kernel methods such as support vector machines (SVMs), a kernel function must be chosen a priori.
- Supervised learning: the objective function is clear, and kernels are designed to optimize it.
- Unsupervised learning: the choice of kernel is subjective; it is chosen to reflect the user's notion of similarity.

Kernel functions for sequences
- Texts: count features, which represent the number of occurrences of each symbol in a sequence.
- Biological sequences: count features do not work out of the box, primarily because of frequent context changes.
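As a rough illustration of count features, the sketch below builds a count kernel as the inner product of per-symbol counts. The function names, the DNA alphabet default, and the normalization by sequence length are assumptions made for this example, not details given on the slide.

```python
from collections import Counter


def count_features(seq, alphabet):
    """Return the length-normalized count of each alphabet symbol in seq.

    Normalizing by len(seq) is an assumption of this sketch; seq must be non-empty.
    """
    counts = Counter(seq)
    n = len(seq)
    return [counts[symbol] / n for symbol in alphabet]


def count_kernel(x, y, alphabet="ACGT"):
    """Inner product of the count features of two sequences."""
    fx = count_features(x, alphabet)
    fy = count_features(y, alphabet)
    return sum(a * b for a, b in zip(fx, fy))
```

Because only symbol frequencies are compared, two sequences with the same composition but different ordering or context get the same kernel value, which is the limitation the slide points at for biological sequences.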

[Figure: a DNA sequence with hidden context information.]
Suppose the hidden variable $h$ indicates coding/noncoding regions.

New way to design a kernel
[Diagram labels: Visible, Hidden, HMM, Joint & Marginalized]

HMM
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
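To make the role of the unobserved states concrete, here is a minimal sketch of the standard forward-backward recursion, which computes the posterior probability of each hidden state at each position given the observed sequence. The function name and the parameter layout (`init`, `trans`, `emit`) are hypothetical choices for this example.

```python
def state_posteriors(obs, init, trans, emit):
    """Return gamma[t][i] = p(h_t = i | obs) for a discrete HMM.

    init[i]     : probability that the first hidden state is i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o (dict per state)
    """
    n, length = len(init), len(obs)

    # Forward pass, normalized at every step to avoid numerical underflow.
    alpha = []
    for t in range(length):
        if t == 0:
            row = [init[j] * emit[j][obs[0]] for j in range(n)]
        else:
            row = [
                sum(alpha[-1][i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        total = sum(row)
        alpha.append([v / total for v in row])

    # Backward pass, also rescaled (only ratios matter for the posterior).
    beta = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        row = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
        total = sum(row)
        beta[t] = [v / total for v in row]

    # Combine both passes and normalize per position.
    gamma = []
    for t in range(length):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return gamma
```

For a two-state model in which the states stand for coding and noncoding regions, `gamma[t]` gives the estimated probability that position `t` lies in each kind of region.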

Example of HMM
Only a limited number of sequences with known structures are available. We want to train four HMMs, one for each secondary structure class, to make predictions:
- Helix
- Sheet
- Turn
- Other

[Figure: block diagram]

[Figure: HMMs of secondary structures, and the combined HMM for prediction]

Marginalized kernel
Let $x, x' \in X$ and $h, h' \in H$, where $H$ is a finite set, and write $z = (x, h)$, $z' = (x', h')$. Given a joint kernel $K_z(z, z')$, the marginalized kernel is
$$K(x, x') = \sum_{h \in H} \sum_{h' \in H} p(h \mid x)\, p(h' \mid x')\, K_z(z, z')$$
The posterior $p(h \mid x)$ has to be estimated from the data. When the cardinality of $H$ is too large, the calculation can be intractable.
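The double sum over hidden variables can be sketched directly when $H$ is small enough to enumerate. In the example below, `marginalized_kernel`, `toy_posterior` and `toy_joint_kernel` are hypothetical names, and the toy posterior and joint kernel are made up purely to exercise the formula.

```python
def marginalized_kernel(x, y, hidden_values, posterior, joint_kernel):
    """Sum p(h | x) p(h' | y) K_z((x, h), (y, h')) over all pairs (h, h').

    hidden_values : iterable enumerating the finite set H
    posterior     : function (h, x) -> p(h | x)
    joint_kernel  : function (z, z') -> K_z(z, z'), with z = (x, h)
    """
    hidden = list(hidden_values)
    return sum(
        posterior(h, x) * posterior(g, y) * joint_kernel((x, h), (y, g))
        for h in hidden
        for g in hidden
    )


def toy_posterior(h, x):
    """Toy assumption: h is 1 exactly when x is positive, with certainty."""
    return 1.0 if h == int(x > 0) else 0.0


def toy_joint_kernel(z, z_prime):
    """Toy assumption: product of the visible parts if the hidden parts agree."""
    (x, h), (y, g) = z, z_prime
    return x * y if h == g else 0.0
```

The cost is quadratic in the size of $H$, which illustrates why direct enumeration becomes intractable when the hidden variable is, for instance, a whole state path of an HMM.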

Marginalized kernel from Gaussian mixture
$$K(x, x') = \sum_{h \in H} p(h \mid x)\, p(h \mid x')\, x^T A_h x'$$
where $A_h$ is the inverse of the covariance matrix of mixture component $h$.
Distance in feature space:
$$D(x, x') = \sqrt{K(x, x) + K(x', x') - 2K(x, x')}$$
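A minimal NumPy sketch of the Gaussian mixture case follows. It assumes the mixture parameters (`weights`, `means`, `covs`) are already fitted, and it assumes the feature-space distance is the square root of $K(x,x) + K(x',x') - 2K(x,x')$; all function names are hypothetical.

```python
import numpy as np


def responsibilities(x, weights, means, covs):
    """Posterior p(h | x) of each mixture component h for the point x."""
    d = len(x)
    dens = []
    for w, m, c in zip(weights, means, covs):
        diff = x - m
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(c))
        dens.append(w * np.exp(-0.5 * diff @ np.linalg.solve(c, diff)) / norm)
    dens = np.array(dens)
    return dens / dens.sum()


def gmm_kernel(x, y, weights, means, covs):
    """Sum over components of p(h | x) p(h | y) x^T A_h y, with A_h = inv(cov_h)."""
    px = responsibilities(x, weights, means, covs)
    py = responsibilities(y, weights, means, covs)
    return float(sum(
        px[h] * py[h] * (x @ np.linalg.solve(covs[h], y))
        for h in range(len(weights))
    ))


def feature_distance(x, y, weights, means, covs):
    """Distance induced by the kernel (square root form assumed)."""
    sq = (gmm_kernel(x, x, weights, means, covs)
          + gmm_kernel(y, y, weights, means, covs)
          - 2 * gmm_kernel(x, y, weights, means, covs))
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative rounding errors
```

With a single component and identity covariance the kernel reduces to the ordinary dot product, so the distance reduces to the Euclidean distance; with several components, points assigned to different clusters get a small kernel value.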

Marginalized count kernel

Second-order marginalized count kernel

Definition of the Fisher kernel
Assume a probabilistic model $p(x \mid \theta)$ is defined on $X$, where $\theta$ is a parameter vector. Let $\hat{\theta}$ denote the parameter values obtained by some learning algorithm. Then the Fisher kernel between two objects is defined as
$$K_f(x, x') = s(x, \hat{\theta})^T Z^{-1}(\hat{\theta})\, s(x', \hat{\theta})$$
where $s$ is the Fisher score
$$s(x, \hat{\theta}) := \nabla_\theta \log p(x \mid \hat{\theta})$$
and $Z$ is the Fisher information matrix
$$Z(\hat{\theta}) = \sum_{x \in X} p(x \mid \hat{\theta})\, s(x, \hat{\theta})\, s(x, \hat{\theta})^T$$
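The definition can be checked on a model small enough to enumerate $X$. The sketch below assumes a product of independent Bernoulli variables, so $X = \{0, 1\}^d$ and $\theta$ holds the success probabilities; this model choice and all function names are illustrative assumptions.

```python
import itertools

import numpy as np


def log_prob(x, theta):
    """log p(x | theta) for independent Bernoulli variables."""
    return float(np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta)))


def fisher_score(x, theta):
    """Gradient of log p(x | theta) with respect to theta."""
    return x / theta - (1 - x) / (1 - theta)


def fisher_information(theta):
    """Z = sum over all x in {0,1}^d of p(x | theta) s(x) s(x)^T."""
    d = len(theta)
    z = np.zeros((d, d))
    for bits in itertools.product([0.0, 1.0], repeat=d):
        x = np.array(bits)
        s = fisher_score(x, theta)
        z += np.exp(log_prob(x, theta)) * np.outer(s, s)
    return z


def fisher_kernel(x, y, theta):
    """K_f(x, y) = s(x)^T Z^{-1} s(y), evaluated at the given theta."""
    z = fisher_information(theta)
    return float(fisher_score(x, theta) @ np.linalg.solve(z, fisher_score(y, theta)))
```

For this model the enumerated matrix comes out diagonal with entries $1 / (\theta_i (1 - \theta_i))$, which is a convenient sanity check. The enumeration is exponential in $d$, so this is only practical for very small examples.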

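Fisher kernel from latent variable models
For a latent variable model, the Fisher score can be written as
$$\begin{aligned}
\nabla_\theta \log p(x \mid \hat{\theta})
&= \sum_{h \in H} \frac{\nabla_\theta\, p(x, h \mid \hat{\theta})}{p(x \mid \hat{\theta})} \\
&= \sum_{h \in H} \frac{p(x, h \mid \hat{\theta})}{p(x \mid \hat{\theta})} \cdot \frac{\nabla_\theta\, p(x, h \mid \hat{\theta})}{p(x, h \mid \hat{\theta})} \\
&= \sum_{h \in H} p(h \mid x, \hat{\theta})\, \nabla_\theta \log p(x, h \mid \hat{\theta})
\end{aligned}$$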

The Fisher kernel as a marginalized kernel
$$K_f(x, x') = \nabla_\theta \log p(x \mid \hat{\theta})^T Z(\hat{\theta})^{-1} \nabla_\theta \log p(x' \mid \hat{\theta}) = \sum_{h \in H} \sum_{h' \in H} p(h \mid x, \hat{\theta})\, p(h' \mid x', \hat{\theta})\, K_z(z, z')$$
where the joint kernel is
$$K_z(z, z') = \nabla_\theta \log p(x, h \mid \hat{\theta})^T Z(\hat{\theta})^{-1} \nabla_\theta \log p(x', h' \mid \hat{\theta})$$

Experiment settings
- 84 amino acid sequences from 5 genera in Actinobacteria.
- The numbers of sequences in the five genera are 9, 32, 15, 14 and 14.
- Pairwise identity is 62%-99%.
- BLAST scores cannot be converted directly to kernels.

Two kinds of experiments, clustering and supervised classification, are performed on the following kernels:
- CK1: count kernel
- CK2: second-order count kernel
- FK: Fisher kernel
- MCK1: marginalized count kernel
- MCK2: second-order marginalized count kernel

[Figure: clustering result]

Classification result
Genera 1 and 2 are not used because they can be separated easily by all kernels. One-vs-one classification is performed on the remaining three genera.

[Figure: effect of HMM states]

Conclusion
- The Fisher kernel is a special case of the marginalized kernel.
- Second-order kernels perform better than first-order kernels.
- The number of HMM states affects the results.