Chart-Based Decoding

Chart-Based Decoding. Kenneth Heafield, University of Edinburgh, 6 September 2012. Most slides courtesy of Philipp Koehn.

Overview of Syntactic Decoding. [Figure: pipeline from the input sentence through SCFG parsing and decoding search to the output sentence; a translation model built from a parallel corpus and a language model built from a monolingual corpus inform decoding.]

Syntactic Decoding. Inspired by monolingual syntactic chart parsing: during decoding of the source sentence, a chart with translations for the O(n^2) spans has to be filled. [Figure: German input "Sie will eine Tasse Kaffee trinken" with tags PPER, VAFIN, ART, VVINF and constituents NP, VP, S.]

Syntax Decoding. German input sentence with tree. [Figure: the tagged and parsed input, with the translation VB: drink above trinken.]

Syntax Decoding. Purely lexical rule: filling a span with a translation (a constituent). [Figure: PRO: she is added above Sie.]

Syntax Decoding. Purely lexical rule: filling a span with a translation (a constituent). [Figure: coffee is added above Kaffee.]

Syntax Decoding. Complex rule: matching underlying constituent spans, and covering words. [Figure: NP: a cup of coffee is built as NP(DET a, cup) PP(IN of, coffee) over "eine Tasse Kaffee", reusing the existing coffee constituent.]

Syntax Decoding. Complex rule with reordering. [Figure: a VP for "wants to drink a cup of coffee" is built from VBZ: wants and VP(TO to, VP(VB drink, NP)) over "will eine Tasse Kaffee trinken"; the verb, final in German, precedes its object in English.]
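A synchronous rule pairs a source side with a target side whose non-terminals are linked by index. Reordering is simply a matter of where each linked non-terminal appears on each side. The sketch below uses a hypothetical `SyncRule` representation in which non-terminals are written as `(label, index)` tuples. It only substitutes target strings and does no scoring.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A symbol is either a terminal word (str) or a linked non-terminal (label, index).
Symbol = Union[str, Tuple[str, int]]


@dataclass
class SyncRule:
    lhs: str
    source: List[Symbol]
    target: List[Symbol]


def apply_rule(rule: SyncRule, children: Dict[int, str]) -> str:
    """Build the target string, replacing each linked non-terminal with the
    translation of the child constituent that carries the same index."""
    out = []
    for sym in rule.target:
        if isinstance(sym, tuple):
            _label, index = sym
            out.append(children[index])
        else:
            out.append(sym)
    return " ".join(out)


# Verb-final German "NP trinken" becomes verb-object English "drink NP".
vp_rule = SyncRule("VP", [("NP", 1), "trinken"], ["drink", ("NP", 1)])
print(apply_rule(vp_rule, {1: "a cup of coffee"}))  # drink a cup of coffee
```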

Syntax Decoding. [Figure: the completed English tree S → PRO VP for "she wants to drink a cup of coffee", shown with the German tree.]

Bottom-Up Decoding. For each span, a stack of (partial) translations is maintained. Bottom-up: a higher stack is filled once the underlying stacks are complete.

Chart Organization. [Figure: chart over "Sie will eine Tasse Kaffee trinken".] The chart consists of cells that cover contiguous spans over the input sentence. Each cell contains a set of hypotheses. Hypothesis = translation of a span with a target-side constituent.
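The sketch below shows one way to lay out the chart and visit its cells bottom-up. Spans are `(start, end)` pairs with an exclusive end and are ordered by length, so every sub-span's cell is complete before any larger cell is filled. The `Hypothesis` fields and the score shown are illustrative assumptions, not values from the slides.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    label: str          # target-side constituent label, e.g. "NN"
    translation: str    # target words produced for the span
    score: float        # log-probability style score (higher is better)


def spans_bottom_up(n: int) -> List[Tuple[int, int]]:
    """All (start, end) spans of an n-word sentence, end exclusive, shortest
    first, so each span comes after all of its proper sub-spans."""
    return [(start, start + length)
            for length in range(1, n + 1)
            for start in range(n - length + 1)]


# The chart maps each span to its cell, a list of hypotheses.
Chart = Dict[Tuple[int, int], List[Hypothesis]]

words = "Sie will eine Tasse Kaffee trinken".split()
order = spans_bottom_up(len(words))
chart: Chart = defaultdict(list)
chart[(4, 5)].append(Hypothesis("NN", "coffee", -1.2))  # illustrative score
print(len(order))   # 21 = 6 * 7 / 2 cells, i.e. O(n^2)
print(order[:3])    # [(0, 1), (1, 2), (2, 3)]
```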

Dynamic Programming. Applying a rule creates a new hypothesis. [Figure: over "eine Tasse Kaffee", the rule NP → NP+P Kaffee ; NP+P coffee is applied to the hypothesis NP+P: a cup of, creating NP: a cup of coffee; the cell over Kaffee also holds NP: coffee.]

Dynamic Programming. Another hypothesis: applying the rule NP → eine Tasse NP ; a cup of NP to NP: coffee creates a second NP: a cup of coffee over the same span. Both hypotheses are indistinguishable in future search, so they can be recombined.

Recombinable States. Recombinable? NP: a cup of coffee; NP: a cup of coffee; NP: a mug of coffee. Yes, if at most a 2-gram language model is used.

Recombinability. Hypotheses have to match in: the span of input words covered; the output constituent label; the first n-1 output words, which are not properly scored yet because they lack context; and the last n-1 output words, which still affect the scoring of subsequently added words, just like in phrase-based decoding. (n is the order of the n-gram language model.)
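The four matching conditions can be read as a dictionary key: hypotheses with equal keys are recombined, and only the best one is kept. This sketch uses a hypothetical `Hyp` tuple and made-up scores. It reproduces the recombination example above, where all three NPs collapse under a 2-gram model but not under a 3-gram model.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hyp(NamedTuple):
    span: Tuple[int, int]    # input words covered (start, end), end exclusive
    label: str               # output constituent label
    words: Tuple[str, ...]   # output words
    score: float             # higher is better (illustrative values below)


def state_key(h: Hyp, n: int) -> tuple:
    """Everything that can still influence future scoring with an n-gram LM."""
    k = n - 1
    first = h.words[:k]
    last = h.words[-k:] if k else ()
    return (h.span, h.label, first, last)


def recombine(hyps: List[Hyp], n: int) -> List[Hyp]:
    """Keep only the best-scoring hypothesis for each recombination state."""
    best: Dict[tuple, Hyp] = {}
    for h in hyps:
        key = state_key(h, n)
        if key not in best or h.score > best[key].score:
            best[key] = h
    return list(best.values())


hyps = [
    Hyp((2, 5), "NP", ("a", "cup", "of", "coffee"), -4.0),
    Hyp((2, 5), "NP", ("a", "cup", "of", "coffee"), -4.5),
    Hyp((2, 5), "NP", ("a", "mug", "of", "coffee"), -5.0),
]
print(len(recombine(hyps, n=2)))  # 1: all share first word "a" and last word "coffee"
print(len(recombine(hyps, n=3)))  # 2: "a cup" and "a mug" differ
```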

Language Model Contexts. When merging hypotheses, internal language model contexts are absorbed. [Figure: NP (minister) "the foreign ... of Germany" and VP (Condoleezza Rice) "met with ... in Frankfurt" combine into S (minister of Germany met with Condoleezza Rice) "the foreign ... in Frankfurt". At the boundary, the relevant history gives p_LM(met | of Germany) and p_LM(with | Germany met); the initial words of S remain un-scored.]
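Below is a minimal sketch of the boundary computation, assuming each hypothesis keeps only its first and last n-1 words as language model state and has at least n-1 output words (shorter hypotheses need extra care that is omitted here). For a trigram model, combining the NP and the VP from the figure yields exactly the two n-grams p_LM(met | of Germany) and p_LM(with | Germany met). The merged state forgets all internal words.

```python
from typing import List, Tuple

Words = Tuple[str, ...]
State = Tuple[Words, Words]  # (first n-1 output words, last n-1 output words)


def combine(left: State, right: State, n: int) -> Tuple[List[Words], State]:
    """Join two adjacent hypotheses' LM states.

    Returns the n-grams that can now be scored (the right child's initial
    words, which finally have left context) and the merged state."""
    left_prefix, left_suffix = left
    right_prefix, right_suffix = right
    # Sketch assumption: every hypothesis has at least n-1 output words.
    joined = left_suffix + right_prefix
    newly_scored = [joined[i:i + n] for i in range(len(joined) - n + 1)]
    return newly_scored, (left_prefix, right_suffix)


np_state = (("the", "foreign"), ("of", "Germany"))
vp_state = (("met", "with"), ("in", "Frankfurt"))
scored, s_state = combine(np_state, vp_state, n=3)
print(scored)   # [('of', 'Germany', 'met'), ('Germany', 'met', 'with')]
print(s_state)  # (('the', 'foreign'), ('in', 'Frankfurt'))
```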

Stack Pruning. The number of hypotheses in each chart cell explodes, so we need to discard bad hypotheses, e.g., keep only the 100 best. Different stacks for different output constituent labels? Cost estimates: the translation model cost is known; the language model cost for internal words is known; the initial words need estimates; an outside cost estimate? (How useful will an NP covering input words 3 to 5 be later on?)
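A sketch of histogram pruning for one cell, assuming costs where lower is better and an estimate made of the three parts listed above. Outside cost is left out, since the slide leaves it as an open question. The `per_label` switch keeps one stack per output constituent label, so a costly VP is not crowded out by cheap NPs. All names and numbers are illustrative.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Hyp(NamedTuple):
    label: str
    translation: str
    cost_tm: float       # translation model cost (known)
    cost_lm: float       # LM cost of internal words (known)
    cost_lm_est: float   # estimated LM cost of the initial, unscored words

    @property
    def estimate(self) -> float:
        # No outside cost estimate is included in this sketch.
        return self.cost_tm + self.cost_lm + self.cost_lm_est


def prune_cell(cell: List[Hyp], beam: int = 100, per_label: bool = True) -> List[Hyp]:
    """Keep the `beam` cheapest hypotheses, either per label or for the whole cell."""
    if not per_label:
        return heapq.nsmallest(beam, cell, key=lambda h: h.estimate)
    stacks: Dict[str, List[Hyp]] = defaultdict(list)
    for h in cell:
        stacks[h.label].append(h)
    kept: List[Hyp] = []
    for label_hyps in stacks.values():
        kept.extend(heapq.nsmallest(beam, label_hyps, key=lambda h: h.estimate))
    return kept


cell = [Hyp("NP", "a", 1.0, 1.0, 1.0), Hyp("NP", "b", 0.0, 0.0, 1.0),
        Hyp("NP", "c", 5.0, 0.0, 0.0), Hyp("VP", "d", 9.0, 9.0, 9.0)]
print([h.translation for h in prune_cell(cell, beam=1)])                   # ['b', 'd']
print([h.translation for h in prune_cell(cell, beam=2, per_label=False)])  # ['b', 'a']
```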

Naive Algorithm: Blow-ups. Many subspan sequences: for all sequences s of hypotheses and words in span [start, end]. Many rules: for all rules r. Checking whether a rule applies is not trivial: does rule r apply to chart sequence s? Unworkable.
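To see the first blow-up, count only the ways to cut a span into contiguous sub-spans. A span of L words has L-1 inner cut points, which gives 2^(L-1) sequences. Each segment could in turn be matched by a word or by any hypothesis in its cell, and the naive algorithm would then test every rule against every such sequence. The helper name below is illustrative.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def subspan_sequences(start: int, end: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield every way to cut [start, end) into contiguous, non-empty sub-spans."""
    inner = range(start + 1, end)
    for k in range(len(inner) + 1):
        for cuts in combinations(inner, k):
            bounds = [start, *cuts, end]
            yield list(zip(bounds, bounds[1:]))


print(list(subspan_sequences(0, 2)))             # [[(0, 2)], [(0, 1), (1, 2)]]
print(sum(1 for _ in subspan_sequences(0, 6)))   # 32 = 2 ** (6 - 1)
```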

Solution. Prefix tree data structure for rules; dotted rules; cube pruning.

Storing Rules. First concern: do they apply to the span? They have to match the available hypotheses and input words. Example rule: NP → X1 des X2 | NP1 of the NN2. Check for applicability: is there an initial sub-span with a hypothesis with constituent label NP? Is it followed by a sub-span over the word des? Is it followed by a final sub-span with a hypothesis with label NN? Sequence of relevant information: NP des NN | NP1 of the NN2.

Rule Applicability Check. Trying to cover a span of six words, "das Haus des Architekten Frank Gehry", with the given rule NP des NN → NP: NP of the NN.

Rule Applicability Check. First: check for hypotheses with output constituent label NP.

Rule Applicability Check. Found an NP hypothesis in a cell, matched the first symbol of the rule.

Rule Applicability Check. Matched the word des, matched the second symbol of the rule.

Rule Applicability Check. Found an NN hypothesis in a cell, matched the last symbol of the rule.

Rule Applicability Check. Matched the entire rule: apply it to create an NP hypothesis.

Rule Applicability Check. Look up output words to create the new hypothesis (note: there may be many matching underlying NP and NN hypotheses). [Figure: NP: the house and NN: architect Frank Gehry combine into NP: the house of the architect Frank Gehry.]
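The check above can be sketched as a small recursive matcher. Words must match literally, and each non-terminal must match a sub-span whose cell holds a hypothesis with that label. The bracket convention for non-terminals (`"[NP]"`) and the chart contents are assumptions made for illustration. The function returns every way the rule covers the span, one sub-span per rule symbol.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]


def is_nonterminal(symbol: str) -> bool:
    # Sketch convention: non-terminals are written in brackets, e.g. "[NP]".
    return symbol.startswith("[") and symbol.endswith("]")


def match_rule(source: List[str], words: List[str], labels: Dict[Span, Set[str]],
               start: int, end: int) -> List[List[Span]]:
    """All ways the rule's source side can cover [start, end) exactly."""
    if not source:
        return [[]] if start == end else []
    head, rest = source[0], source[1:]
    results = []
    if is_nonterminal(head):
        label = head[1:-1]
        for mid in range(start + 1, end + 1):
            if label in labels.get((start, mid), set()):
                for tail in match_rule(rest, words, labels, mid, end):
                    results.append([(start, mid)] + tail)
    elif start < end and words[start] == head:
        for tail in match_rule(rest, words, labels, start + 1, end):
            results.append([(start, start + 1)] + tail)
    return results


words = "das Haus des Architekten Frank Gehry".split()
labels = {(0, 2): {"NP"}, (3, 4): {"NN"}, (3, 6): {"NN"}}  # illustrative chart
print(match_rule(["[NP]", "des", "[NN]"], words, labels, 0, 6))
# [[(0, 2), (2, 3), (3, 6)]]
```

The NN over "Architekten" alone is tried as well, but it is rejected because it leaves "Frank Gehry" uncovered.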

Checking Rules vs. Finding Rules. What we showed: given a rule, check if and how it can be applied. But there are too many rules (millions) to check them all. Instead: given the underlying chart cells and input words, find which rules apply.

Prefix Tree for Rules. [Figure: rule source sides stored as paths in a prefix tree; each node lists the target sides of the rules whose source side ends there.] Highlighted rules: NP → NP1 DET2 NN3 | NP1 IN2 NN3; NP → NP1 | NP1; NP → NP1 des NN2 | NP1 of the NN2; NP → NP1 des NN2 | NP2 NP1; NP → DET1 NN2 | DET1 NN2; NP → das Haus | the house.
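A prefix tree (trie) over source sides lets rules that share a prefix share nodes. A lookup then walks one symbol at a time instead of scanning millions of rules. This sketch stores the highlighted rules as given on the slide, with non-terminals written as `"[NP]"` etc. (an illustrative convention). A node reached by a prefix that ends no rule, such as `[NP] des`, has no translations yet but can still be extended.

```python
from typing import Dict, List, Tuple


class PrefixTreeNode:
    def __init__(self) -> None:
        self.children: Dict[str, "PrefixTreeNode"] = {}
        # Rules whose source side ends exactly at this node: (lhs, target side).
        self.rules: List[Tuple[str, str]] = []


def build_prefix_tree(rules: List[Tuple[str, List[str], str]]) -> PrefixTreeNode:
    root = PrefixTreeNode()
    for lhs, source, target in rules:
        node = root
        for symbol in source:
            node = node.children.setdefault(symbol, PrefixTreeNode())
        node.rules.append((lhs, target))
    return root


def lookup(root: PrefixTreeNode, source: List[str]) -> List[Tuple[str, str]]:
    node = root
    for symbol in source:
        node = node.children.get(symbol)
        if node is None:
            return []
    return node.rules


rules = [
    ("NP", ["[NP]", "[DET]", "[NN]"], "NP1 IN2 NN3"),
    ("NP", ["[NP]"], "NP1"),
    ("NP", ["[NP]", "des", "[NN]"], "NP1 of the NN2"),
    ("NP", ["[NP]", "des", "[NN]"], "NP2 NP1"),
    ("NP", ["[DET]", "[NN]"], "DET1 NN2"),
    ("NP", ["das", "Haus"], "the house"),
]
root = build_prefix_tree(rules)
print(sorted(root.children))                  # ['[DET]', '[NP]', 'das']
print(lookup(root, ["[NP]", "des", "[NN]"]))  # [('NP', 'NP1 of the NN2'), ('NP', 'NP2 NP1')]
```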

Dotted Rules: Key Insight. If we can apply a rule like p → A B C | x to a span, then we could have applied a rule like q → A B | y to a sub-span with the same starting word. We can re-use the rule lookup by storing A B • (a dotted rule).

Finding Applicable Rules in Prefix Tree. [Figure: input "das Haus des Architekten Frank Gehry".]

Covering the First Cell.

Looking up Rules in the Prefix Tree.

Taking Note of the Dotted Rule.

Checking if Dotted Rule has Translations. [Figure: DET: the, DET: that.]

Applying the Translation Rules. [Figure: the cell over das now holds DET: that and DET: the.]

Looking up Constituent Label in Prefix Tree.

Add to Span's List of Dotted Rules.

Moving on to the Next Cell.

Looking up Rules in the Prefix Tree. [Figure: Haus is found in the prefix tree as node ❸.]

Taking Note of the Dotted Rule. [Figure: ❸ is noted for the cell over Haus.]

Checking if Dotted Rule has Translations. [Figure: node ❸ has the translations NN: house and NP: house.]

Applying the Translation Rules. [Figure: the cell over Haus now holds NP: house and NN: house.]

Looking up Constituent Label in Prefix Tree. [Figure: the labels NN and NP are found as nodes ❹ and ❺.]

Add to Span's List of Dotted Rules. [Figure: the dotted rule list of Haus is now ❸, ❹ NN, ❺ NP.]

More of the Same. [Figure: the remaining single-word cells: des holds IN: of and DET: the; Architekten holds NP: architect and NN: architect (dotted rule ❹); Frank holds NNP: Frank; Gehry holds NNP: Gehry.]

Moving on to the Next Cell.

Covering a Longer Span. We cannot consume multiple words at once: all rules are extensions of existing dotted rules. Here, only extensions of the span over das are possible.

Extensions of Span over das. [Figure: each dotted rule of das (the word das and the label DET) may be extended with NN, NP or the word Haus from the next cell.]

Looking up Rules in the Prefix Tree. [Figure: four extensions exist in the prefix tree: ❻, ❼, ❽ and ❾.]

Taking Note of the Dotted Rule. [Figure: dotted rules ❻ das Haus, ❼ das NN, ❽ DET Haus and ❾ DET NN are noted for the span das Haus.]

Checking if Dotted Rules have Translations. [Figure: ❻ → NP: the house; ❼ → NP: the NN; ❽ → NP: DET house; ❾ → NP: DET NN.]

Applying the Translation Rules. [Figure: the cell over das Haus now holds NP: that house and NP: the house.]

Looking up Constituent Label in Prefix Tree. [Figure: the label NP is found as node ❺.]

Add to Span's List of Dotted Rules. [Figure: ❺ NP is added to the dotted rule list of das Haus, next to ❻ to ❾.]

Even Larger Spans. Extend lists of dotted rules with cell constituent labels: a span's dotted rule list (with the same start) plus the neighboring span's constituent labels of hypotheses (with the same end).
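The whole walkthrough can be condensed into a small recognizer, assuming the toy grammar below. It is loosely based on the slides' example; the rule NN → NN NNP NNP is hypothetical. For each span, bottom-up, the dotted rules of [start, mid) are extended by the single word at mid or by a constituent label of the neighboring cell [mid, end). Prefix-tree nodes that carry rules are applied, and the new labels are then looked up from the root to seed the span's own dotted rule list. This sketch tracks only labels and applied rules: it builds no translations and does no scoring.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class Node:
    """Prefix-tree node; reaching a node by a partial match is a dotted rule."""
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.rules: List[Tuple[str, str]] = []  # (lhs label, target side)


def build_prefix_tree(rules) -> Node:
    root = Node()
    for lhs, source, target in rules:
        node = root
        for symbol in source:
            node = node.children.setdefault(symbol, Node())
        node.rules.append((lhs, target))
    return root


def find_rules(words: List[str], root: Node):
    n = len(words)
    dotted: Dict[Tuple[int, int], List[Node]] = defaultdict(list)
    labels: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    applied: Dict[Tuple[int, int], List[Tuple[str, str]]] = defaultdict(list)
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            end = start + length
            found: List[Node] = []
            if length == 1 and words[start] in root.children:
                found.append(root.children[words[start]])  # look up a single word
            for mid in range(start + 1, end):
                for node in dotted[(start, mid)]:
                    # Extend with the next input word ...
                    if end - mid == 1 and words[mid] in node.children:
                        found.append(node.children[words[mid]])
                    # ... or with a constituent label of the neighboring cell.
                    for label in labels[(mid, end)]:
                        child = node.children.get(f"[{label}]")
                        if child is not None:
                            found.append(child)
            # Dotted rules that have translations: apply them.
            for node in found:
                for lhs, target in node.rules:
                    applied[(start, end)].append((lhs, target))
                    labels[(start, end)].add(lhs)
            # Look up the cell's constituent labels in the prefix tree.
            for label in labels[(start, end)]:
                child = root.children.get(f"[{label}]")
                if child is not None:
                    found.append(child)
            dotted[(start, end)] = found
    return labels, applied


rules = [
    ("DET", ["das"], "the"), ("DET", ["das"], "that"),
    ("NN", ["Haus"], "house"), ("NP", ["Haus"], "house"),
    ("IN", ["des"], "of"), ("DET", ["des"], "the"),
    ("NN", ["Architekten"], "architect"), ("NP", ["Architekten"], "architect"),
    ("NNP", ["Frank"], "Frank"), ("NNP", ["Gehry"], "Gehry"),
    ("NP", ["das", "Haus"], "the house"),
    ("NP", ["[DET]", "[NN]"], "DET1 NN2"),
    ("NN", ["[NN]", "[NNP]", "[NNP]"], "NN1 NNP2 NNP3"),  # hypothetical rule
    ("NP", ["[NP]", "des", "[NN]"], "NP1 of the NN2"),
]
words = "das Haus des Architekten Frank Gehry".split()
labels, applied = find_rules(words, build_prefix_tree(rules))
print(sorted(labels[(0, 6)]))  # ['NP']
```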

Reflections. Complexity O(r n^3), with sentence length n and size of the dotted rule list r. We may introduce a maximum size for spans that do not start at the beginning, and may limit the size of the dotted rule list (very arbitrary). Does the list of dotted rules explode? Yes, if there are many rules with neighboring target-side non-terminals: such rules apply in many places, while rules with words are much more restricted.

Difficult Rules. Some rules may apply in too many ways. Neighboring input non-terminals, e.g. VP → gibt X1 X2 | gives NP2 to NP1: the non-terminals may match many different pairs of spans; this is especially a problem for hierarchical models (no constituent label restrictions) and may be okay for syntax models. Three neighboring input non-terminals, e.g. VP → trifft X1 X2 X3 heute | meets NP1 today PP2 PP3: this will get out of hand even for syntax models.
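A quick count shows why. The rule's words are fixed at the span edges, and k adjacent non-terminals must share the remaining `free` words, each taking at least one. That leaves C(free-1, k-1) placements. With two non-terminals the count grows linearly with span length; with three it grows quadratically. A hierarchical model must consider every placement, while a syntax model keeps only those whose sub-spans carry matching labels. The function name is illustrative.

```python
from math import comb


def placements(span_len: int, words: int, nonterminals: int) -> int:
    """Ways to split the words not covered by the rule's terminals among
    adjacent non-terminals, each covering at least one input word."""
    free = span_len - words
    if free < nonterminals:
        return 0
    return comb(free - 1, nonterminals - 1)


for length in (5, 10, 20):
    # "gibt X1 X2": 1 word, 2 non-terminals; "trifft X1 X2 X3 heute": 2 words, 3.
    print(length, placements(length, 1, 2), placements(length, 2, 3))
# 5 3 1
# 10 8 21
# 20 18 136
```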

Where are we now? We know which rules apply. We know where they apply (each non-terminal is tied to a span). But there are still many choices: there are many possible translations, and each non-terminal may match multiple hypotheses, so the number of choices is exponential in the number of non-terminals.

Rules with One Non-Terminal. Found applicable rules PP → des X | ... NP .... [Figure: target sides PP → of NP, PP → by NP, PP → in NP, PP → on to NP; matching NP hypotheses: the architect ..., architect Frank ..., the famous ..., Frank Gehry.] The non-terminal can be filled by any of h underlying matching hypotheses, and there is a choice of t lexical translations. Complexity O(ht). (Note: we may not group rules by target constituent label, so a rule NP → des X | the NP would also be considered here as well.)

Rules with Two Non-Terminals. Found applicable rule NP → X1 des X2 | NP1 ... NP2. [Figure: first NP hypotheses: a house, a building, the building, a new house; target sides NP of NP, NP by NP, NP in NP, NP on to NP; second NP hypotheses: the architect, architect Frank ..., the famous ..., Frank Gehry.] Each of the two non-terminals can be filled by any of h underlying matching hypotheses, and there is a choice of t lexical translations. Complexity O(h^2 t): a three-dimensional cube of choices. (Note: rules may also reorder differently.)

Filling a Constituent. Rule X:VP → X:V X:NP over the French input "a vu l'homme". Hypotheses for "a vu" (score): seen 3.8, saw 4.0, view 4.0. Hypotheses for "l'homme" (score): man 3.6, the man 4.3, some men 6.3.

Beam Search. Every combination is built and scored. Rows: seen -3.8, saw -4.0, view -4.0. Columns: man -3.6, the man -4.3, some men -6.3. Combined scores:
seen man -8.8; seen the man -7.6; seen some men -9.5
saw man -8.3; saw the man -6.9; saw some men -8.5
view man -8.5; view the man -8.9; view some men -10.8
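In plain beam search, the beam for this constituent is filled by building every cell of the grid, sorting, and keeping the top k. The sketch hard-codes the combined scores from the slide. They are not simply the sums of the row and column scores, because the language model also scores each combination. The cost is that all h x h x t combinations must be scored, even though most are discarded.

```python
from itertools import product

verbs = [("seen", -3.8), ("saw", -4.0), ("view", -4.0)]
objects = [("man", -3.6), ("the man", -4.3), ("some men", -6.3)]

# Combined scores as given on the slide (they include language model effects).
combined = {
    ("seen", "man"): -8.8, ("seen", "the man"): -7.6, ("seen", "some men"): -9.5,
    ("saw", "man"): -8.3, ("saw", "the man"): -6.9, ("saw", "some men"): -8.5,
    ("view", "man"): -8.5, ("view", "the man"): -8.9, ("view", "some men"): -10.8,
}


def beam_fill(beam_size: int):
    """Score every combination, then keep the best `beam_size` of them."""
    candidates = [(combined[(v, o)], f"{v} {o}")
                  for (v, _), (o, _) in product(verbs, objects)]
    candidates.sort(reverse=True)  # higher score is better
    return candidates[:beam_size]


print(beam_fill(3))
# [(-6.9, 'saw the man'), (-7.6, 'seen the man'), (-8.3, 'saw man')]
```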

Cube Pruning [Chiang, 2007]. Rows and columns are sorted best first (seen -3.8, saw -4.0, view -4.0; man -3.6, the man -4.3, some men -6.3). The queue starts with the top-left corner: seen man, with sum -3.8 + -3.6 = -7.4.

Cube Pruning [Chiang, 2007]. Popping seen man gives its full score, -8.8. Its neighbors enter the queue: saw man (-4.0 + -3.6 = -7.6) and seen the man (-3.8 + -4.3 = -8.1).

Cube Pruning [Chiang, 2007]. Popping saw man gives -8.3. Its neighbors enter the queue, which now holds view man (-4.0 + -3.6 = -7.6), seen the man (-3.8 + -4.3 = -8.1) and saw the man (-4.0 + -4.3 = -8.3).
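The steps above can be sketched with a priority queue over grid coordinates. Following the slides, the queue is ordered by the sum of the row and column scores, and the LM-inclusive combined score (hard-coded from the beam search slide) is looked up when a hypothesis is popped. Implementations may instead order the queue by the full score. Each pop pushes the cell's right and down neighbors, once each. With three pops this reproduces the slides' sequence (seen man, saw man, view man) and never reaches saw the man (-6.9), the best combination. That is the kind of search error cube pruning accepts in exchange for speed.

```python
import heapq

verbs = [("seen", -3.8), ("saw", -4.0), ("view", -4.0)]           # sorted best first
objects = [("man", -3.6), ("the man", -4.3), ("some men", -6.3)]   # sorted best first

# Combined scores from the beam search slide (including the language model).
combined = {
    ("seen", "man"): -8.8, ("seen", "the man"): -7.6, ("seen", "some men"): -9.5,
    ("saw", "man"): -8.3, ("saw", "the man"): -6.9, ("saw", "some men"): -8.5,
    ("view", "man"): -8.5, ("view", "the man"): -8.9, ("view", "some men"): -10.8,
}


def cube_prune(pops: int):
    """Pop up to `pops` hypotheses, exploring the grid from its best corner."""
    def priority(i: int, j: int) -> float:
        return -(verbs[i][1] + objects[j][1])  # negated: heapq is a min-heap

    queue = [(priority(0, 0), 0, 0)]
    seen = {(0, 0)}
    popped = []
    while queue and len(popped) < pops:
        _, i, j = heapq.heappop(queue)
        v, o = verbs[i][0], objects[j][0]
        popped.append((f"{v} {o}", combined[(v, o)]))
        # Push the neighbors: next row (verb) or next column (object).
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(verbs) and nj < len(objects) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(queue, (priority(ni, nj), ni, nj))
    return popped


print(cube_prune(3))
# [('seen man', -8.8), ('saw man', -8.3), ('view man', -8.5)]
```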

Cube Pruning versus Beam Search. Same: bottom-up with fixed-size beams. Different: the beam-filling algorithm.

Queue of Cubes. Several groups of rules will apply to a given span, and each of them will have a cube. We can create a queue of cubes and always pop off the most promising hypothesis, regardless of cube. We may have separate queues for different target constituent labels.

Bottom-Up Chart Decoding Algorithm
1: for all spans (bottom up) do
2:   extend dotted rules
3:   for all dotted rules do
4:     find group of applicable rules
5:     create a cube for it
6:     create first hypothesis in cube
7:     place cube in queue
8:   end for
9:   for specified number of pops do
10:    pop off best hypothesis of any cube in queue
11:    add it to the chart cell
12:    create its neighbors
13:  end for
14:  extend dotted rules over constituent labels
15: end for
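Lines 3 to 13 can be sketched for a single span, assuming the groups of applicable rules are already known. In this sketch each cube is a list of dimensions: the sorted scores of the rule group's translations, followed by the sorted scores of the hypotheses for each non-terminal. One heap is shared by all cubes. Scores here are plain sums with no language model, and all numbers are illustrative. A real decoder would also track the hypotheses themselves, not just their indices.

```python
import heapq
from typing import List, Tuple

# A cube: one sorted (best-first) score list per dimension.
Cube = List[List[float]]


def fill_cell(cubes: List[Cube], pops: int) -> List[Tuple[float, int, Tuple[int, ...]]]:
    """Pop the best hypotheses of any cube from one shared queue."""
    def score(c: int, idx: Tuple[int, ...]) -> float:
        return sum(cubes[c][d][k] for d, k in enumerate(idx))

    queue, seen = [], set()
    for c, cube in enumerate(cubes):
        first = (0,) * len(cube)                      # create first hypothesis in cube
        heapq.heappush(queue, (-score(c, first), c, first))  # place cube in queue
        seen.add((c, first))
    cell = []
    while queue and len(cell) < pops:
        neg, c, idx = heapq.heappop(queue)            # best hypothesis of any cube
        cell.append((-neg, c, idx))                   # add it to the chart cell
        for d in range(len(idx)):                     # create its neighbors
            nxt = idx[:d] + (idx[d] + 1,) + idx[d + 1:]
            if nxt[d] < len(cubes[c][d]) and (c, nxt) not in seen:
                seen.add((c, nxt))
                heapq.heappush(queue, (-score(c, nxt), c, nxt))
    return cell


cubes = [
    [[-1.0, -1.5], [-3.8, -4.0, -4.0], [-3.6, -4.3, -6.3]],  # rule group with two NTs
    [[-2.0], [-2.5, -5.0]],                                 # rule group with one NT
]
print([(c, idx) for _, c, idx in fill_cell(cubes, 3)])
# [(1, (0, 0)), (1, (0, 1)), (0, (0, 0, 0))]
```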

Two-Stage Decoding. First stage: decoding without a language model (-LM decoding); it may be done exhaustively, eliminates dead ends, and can optionally prune out low-scoring hypotheses. Second stage: add the language model, limited to the packed chart obtained in the first stage. Note: essentially, we do two-stage decoding for one span at a time.

Coarse-to-Fine. Decode with increasingly complex models. Examples: reduced language model [Zhang and Gildea, 2008]; reduced set of non-terminals [DeNero et al., 2009]; language model on clustered word classes [Petrov et al., 2008].

Outside Cost Estimation. Which spans should be more emphasized in search? The initial decoding stage can provide outside cost estimates. [Figure: an NP span within "Sie will eine Tasse Kaffee trinken".] Use min/max language model costs to obtain an admissible heuristic (or at least something that will guide search better).

Open Questions. Where does the best translation fall out of the beam? Are particular types of rules discarded too quickly? Are there systemic problems with cube pruning?

Summary. Synchronous context-free grammars; extracting rules from a syntactically parsed parallel corpus; bottom-up decoding; chart organization: dynamic programming, stacks, pruning; prefix tree for rules; dotted rules; cube pruning.