The Datacycle Architecture for Very High Throughput Database Systems Gary Herman, Gtta Gopal, K C Lee, and Abel Wemnb Bell Commumcanons Research, Inc 435 South Street, Momstown, NJ 07960-1961
ABSTRACT
The evolunonary trend toward a database-drtven pubhc commumcanons network has mottvated research mto database architectures capable of executing thousands of transacnonsper second In th paper we mtroa!uce the Datacycle archUecture, an attempt to expht the enormous transmtsswn bandwuith of optical systems to penmt the unplementatwn of lugh throughput m&processor database systems The architecture has the potent&for unlmuted query throughput, sunplzfied data management, rapid execution of complex quenes, and t$ictent concurrency control We a&r&e the logtcal operatron of the architecture and discuss vnplementatton tssues tn the context of a prototype system currently under constructaon
1 INTRODUCTION
In thts paper, we mtroduce the Datacycle architecture, a novel system concept which we beheve can pemnt the Implementaaon of very high transaction throughput database systems Our research mto high throughput systems is monvated by the srchltectural evolutton of the Umted States pubhc telephone network toward a fully database-drwen network In such a network, a smgle, log~ally mtegrated database would be. consulted dunng call processmg to estabhsh caller percussions, perform name-address lranslatlons, execute feature logX, establish routes, etc In today’s network, centralized databases are consulted only for a small fraction of calls, for example, to seek authorzanon to complete the call by vahdatmg a cm& card number or to obtam routmg mstruchons spectic to a g~ven call, as m an “800” call attempt In the future, we foresee a network where access to a global network database 1s the norm m call processmg, rather than the exception The database technology challenge posed by thts view of the architecture of the public network 1s one of enormous transacnon volume To promde 800 SeMce, each database node m today’s pubhc network [She1821 receives approxnnately 100 sunple queues per second on its copy of a database of a few hundred thousand records Each record 1s accessed by a PermIssIon to copy wlthout fee all or part of this material IS granted provided that the copies are not made or dlstrlbuted for direct commercial advantage, the ACM copyrlght notlce and the title of the pubhcatlon and its date appear, and notlce IS given that copymg IS by permlsslon of the Assoclatlon for Computmg Machmery To copy otherwlse, or to repubhsh, reqmres a fee and/or specfic permission @ 1987 ACM 0-89791-236-5/S7/~5/~g7 75~ smgle key, updates are batched and executed pencd~ally In OUT mew of the future network archlttecture, processors m the network execute tens of thousands of transactlOns per second, some quite complex, accessmg databases contammg tens of rmlhons of records, and accommodatmg unpmitctable quenes and random updates while prov&ng consistent mews to all users With the current throughput of general purpose database systems hmtted to below 1000 transacnons per second [Gray85], database systems with the performance charactermcs reqmred by dus view of network a~httectme do not appear achievable by convenhonal architectures Our approach to actievmg the reqmred throughput seeks to exploit opportumues created by advances m pmcessmg and commurucattons technologes Other recent proposals for new database machmes have taken a smnlar approach, e g , through dense, mexpensive sermconductor memory m the Massive Memory Machme [Garc84] or through massive parallelism m VLSI m NON-VON [Hill861 Whtle the Datacycle architecture also depends on VLSI technolo@es, the pnmary motwatton for our approach is the abundance of inexpensive commumcattons bandwtdth wluch has become avadable through optuzal communications systems The Datacycle architecture is commumctions mtensive, shmulated by the hypothesis that If costs for commumcauons bandmdth become small, commumca~ons, pmcessmg and storage architectures for database systems should be reassessed Bandmdth abundance pemuts xepemve broadcast of the enare contents of the database to many pmcessmg elements, avo&ng I/O bottlenecks, sunphfymg data admuustration, and achlevmg efficient concurrency control
2 OVERVIEW OF THE DATACYCLE
ARCHITECTURE
As m&cated, the strategy m the Datacycle arch&cture 1s to broadcast database contents over a very high bandw&h me&- urn to a mulmude of “hstemng” processors that observe the stream m parallel and use haxdware filters to extract the mformanon relevant to the transactions pendtng locally As flustrated conceptually m Figure 1, the entne contents of the database arc read sequentially from the storage pump, broadcast repeatedly over the broadcast transmission channel, and observed passively and mdependently by many record access managers, each of which may copy records of local mtetest Each access manager is associated wtth a general purpose computmg system, or host, on which apphcanons reside Transactions on a host request database operahens through a well-defined mterface ~nth an associated access manager The record access manager decomposes each requested operation mto a set of speclficanons to be used to identify relevant records when they appear on the broadcast 97 u Broadcast Trmsmlsslon Chantwl Rword storage b Pump I I Host Host . . . Host Flgure 1. The Datacycle Architecture I channel, processes the records to complete the database operahon, and returns the results to the host. To complete a transactton and update the database, the host subnuts an update/ comnut request to the access manager Using the upstream network, the access manager sends the update request to the record update manager, which executes a nonconfhctmg subset of the update requests received m a cycle The access manager subsequently momtors the database to determme If the update attempt was successful and noties the host. In addmon, the access manager can utihze the hardware IDONtOMg capatity to decentrahze much of the conflict detection overhead 111 concurrency control For example, urlth “optnms~c” concurrency control, the assoclated access manager can momtor the transacQon’s read set to detect potent& contkts due to transactions comuuttmg elsewhere m the system The Datacycle archtectu~ has smnlanties with mformation &stnbutton architectures hke teletext [Slge80] and the MlT Community Information System [Gtif85], which employ broadcast of small databases or database updates to many receivmg stations However, these systems are text-onented, typically &smbute relatively small amounts of data, and have relatively large access ms to a gven data item Further, they do not deal urlth the issues of updates mgmatmg at the receivmg stations, transaction management, and query processmg
III the tradmonal database sense
In a database machme context, hardware filtermg and broadcamng of data to allemate the I/O bottleneck and to pemnt parallel access to data have been suggested before For example, N [Le1178], the data stream IS searched “on the fly” as it is read from &sks with parallel read heads In DIRECT @%Wl79] selected pages of the database are bmadcast repetlttvely to mulhple query processors This broadcast feature 1s uhhzed, for example, m a nested-loop ~oln algorithm for query processing In contrast to these systems, the hardware filtermg capabtity and the repehhve broadcast of the entne database 111 the Datacycle archlttecture are mtegral to both query processmg and transaction management functions such as concurrency control
The Datacycle architecture has several strengths No central I/O bottleneck eusts for record remeval, regardless of the number of processmg elements re+nred to handle the transacnon load. Since the enme database can be searched on the fly, a pmn knowledge of the precise location of each record m the database should not be necessary Therefore, the overhead nqnrcd for mamtanung a -tory can be ehmmated, the system 1s potentially fully content-addressable Further, the content-addressab&y and repetmve broadcast charactensucs of the architecture lend themselves to effiaent execution of complex quenes, such as JOIIIS These same archItectural features perrmt efficient concurrency control mechamsms when transactions ullsh to mod& the contents of the database Because record access mana@ read data and attempt wntes (updates) w&out exuhat commumca~ons among themselves, i&cur&c y control bverhead depends only on the volume of transachons and does not depend on the number of access managers (and hosts) provided to handle the transactton volume The Datacycle approach provides an opportu~ty to achieve high throughput, efficient execution of complex quenes, and efficient concurrency contml m a system with many dlsmbuted processors An assessment of avadable commumcanons, storage, and processmg technolo@es mdcates that the Datacycle archltecture 1s techmcally plausible Transnusslon systems employmg electromc mcdulahon of sermconductor lasers at rates m excess of one glgabt per second are avadable commercially today Wavelength &-Ion mulhplexors which can multiplex thi outputs of &teen such laseri onto a smgle optical fiber are also avtiable Thus, with current technoloav. several glgabytes of data can be &ansm~tted over a smgl~&trcal fiber m l&s than a second Transnusslon technol;gy & advancmg ramdlv, and, ~rlth the uronuse of future coherent opt& techm& e&the iforrnatioi capacity of a smgle fiber w-dl be measured in terabits per second Smce storage access to support the broadcast funchon 1s sequenhal and synchronous, smular to the output of a v&o frame buffer, creation of the appropnate read and wnte band- Hrldths to storage should be prunanly an engmeenng problem For example, 111 the case of senuconductor memmes, word Hndth, bank mterleavmg schemes, parallel to senal conversion, and multlplexmg technologes can be chosen to acheve the desmA broadcast data rate With the avadablllty of 16 megabit memory parts, one ggabyte of memory will occupy less than a smgle shelf Finally, at each hstenmg pomt, the contents of the broadcast stream must be exammed to ident@, and copy, data items relevant to local transactions Because 111 the Datacycle archltecture the hardware pattern comparison funcuon is mtegral to both query processmg and concurrency control, Its efficient lmplementanon is cntical to the prachcal apphcatity of the archltecture Studies of VLSI nnnlementauon of selection hardware [Faud85] are prormsmg, &Jectmg that VLSI technology m the 1990’s wdl pernut implementahon of 200 to 500 comparators per mtegrated cncmt, wth each comparator capable of searching for a d&rent 256 byte pattern In the remamder of dus paper, we descnbe the logical operahon of the Datacycle architecture and &scuss unplementauon issues m the context of an archrtecture prototype being built m our laboratory Specifically, Sechon 3 provides an overview of query handhng N the Datacycle archlttecture, with concurrency control and recovery discussed m Sections 4 and 5 Section 6 describes our design for a prototype system. Fmally, we conclude wtth a dlscusslon of the domam of apphcabillty of the arclutectuxe and descnbe ongomg research on the Datacycle approach to high throughput database systems 98
3. QUERY PROCESSING
Quenes are presented to a record access manager m a highlevel database language such as SQL [Date861 W&m the record access manager are many hardware pattern comparators managed by a supervisory subsystem, compnsmg pmcessmg elements, buffers, and an mterconnectlon network The supeMSOry subsystem translates quenes mto selechon pntrntlves that can be processed m the pattern comparator hardware The pattern comparator accepts commands of the form Select (data type) From (segment) For (request ld) Where (expression) The data type can be a record, or some fragment of a record, the segment defines the subset of the broadcast channel (e g , subchannel or a fragment of a subchannel) to be searched, the request Id associates the mstruaon with a specific query, the expression defines the pattern to be matched and comists of snnple boolean expressions relatmg the fields of a record and constants through operations such as equals, greater than, etc Using the select command to retum a field or a set of fields from all records 111 a relahon nnplements a proJect capab&y (without duphcate ehmmatlon) m hardware Duphcates must be detected and removed at a higher level to complete the project operahon To remeve d data items that satisfy some expression, select commands reqmre up to one full cycle tnne of the database to complete The architecture penmts an efficient Join algomhm To perform a Jam, m the first broadcast cycle of the database one pattern comparator performs a hardware proJect over the JOUP mg atmbute on the two relations to be ~olned The proJected values are sent to the supe~sory subsystem where a data structure 1s bmlt to mamtam a count of the number of tunes each value appears m each of the two relations The data structure can be an array If the domam of the JOUI atmbute is known m advance and 1s relatively small Othemse, a hash table can be used. Every value winch has non-zero count for both relanons wfi take part in theJom, the count gves the Size of the result exactly If enough pattern comparators are avadable, the supervmxy subsystem loads pattern comparators wth each value that wdl take part m the Join, and III the next cycle all the desired records are retneved. The actual Join 1s then performed III the supemsory subsystem If fewer pattern comparators are aviulable, more cycles may be reqmred This JOUI operahon resembles the ht array technique m [Babb79] and the hardware-assisted semJom approach 111 [Faud85] However, the hardware proJect capatity, the cychc nature of data presentahon, and the content-addressable feature of the Datacycle architecture penmt a pamcularly efficient nnplementahon The algorithm has the potential to re.meve all the records requred for a JOT in only two broadcast cycles
4. CONCURRENCY CONTROL
An effiiclent concurrency control mechamsm is cnncal to achlevmg high throughput The concurrency control mechamsm m the Datacycle architecture ensures database connstency, pernuts record access managers to read and wnte the database Hrlthout explicit cornmumcations among themselves, and decentrahzes much of the confhct detecuon for update transacnons Concurrency control 1s precllcated on the assumptions that each broadcast cycle of the database has a &stmct begmmng and end and that the contents of the database w&m a smgle cycle are guaranteed to be consistent That is, the record update manager provides atomrc record updates for the writes of a transacnon, no parhally completed updates are perrmtted m a cycle Thus, an access manager can execute a read-only minsacnon Hrlthout any concurrency control overhead at all, so long as It reads all records withm a single cycle However, some scheme must be used to control the concurrent attempts of the various access managers to update the database m the storage pump Most of the techmques to control snnultaneous, mulhuser access to a smgle database fall mto three categones - lockmg, tunestamps and cemfication [Bern811 Each of the three techmques has a domam of apphcab~lty. 1 e , there are transactton enwonments for which each techmque 1s best smted. In envnonments where the number of data items 1s large compared to the number of transactions runnmg concurrently and the probabtity of overlappmg transactions 1s small (network apphcanons may fall mto dus category), the overhead of grantmg locks or mamtammg tunestamps does not seem Jumfiable In these cases cemfication schemes are expected to pexform well The Datacycle arch&cttne pernuts an efficient nnplementatlon of such a scheme, which 1s described here In general, update transactions include a read phase, an execubon phase, and a comnut phase For the present, we consider only transachons which complete their read phase w&m a smgle cycle and, therefore, obtam a consistent read set In the read phase of a transaction, the access manager mdependently extracts remrds from the broadcast stream and mamtams Its own hst of records 111 use, based on the records reqmred locally Durmg the execuaon phase of a transation, the access manager momtors the transachon’s read set For each record on the local hst of records m use, the supemsory system in the access manager mstructs the pattern comparator hardware toscanthebroadcaststreamto&terrmneIftherecordhasbeen changed (updated) smce it was ongmally read for the assocated transaction With dus hardware testmg capahhty and the RpetihVe broadcast of the database, the access manager can detect any mod&ahon to a pendmg lransachon’s read set, nonfy the host, and pass the new record values to the transacbon Based on these new values, the transachon can abort, back up, or contmue The declslon is smctly local Unhke most cemfication schemes, the “cemficahon urlth momtormg” scheme m the Datacycle arclntecture pernuts transactions to abort as soon as a conflict 1s detected, reducmg processmg wasted on already doomed transachons An access manager has respon&hty for detectmg mterference with a transachon’s read set from the begmnmg of the transacaon through the cycle m winch a transac~on reaches its commt phase and wshes to wnte to the database This cycle de fines a local comnut tnnestamp Then, the access manager sends an update request, mcludmg the new record values, the idenhaes of records 111 the read set, and the comrmt tunestamp, to the update manager The update manager performs two types of central cemficahon on the update requests it recewes Fast, it vahdates each request agamst update requests gmnted m the recent past Some latency wdl exist between the tune a locally cemfied transachon generates an update request and the tune the results of the update, If granted, actually appear m the database Consequently, the update manager 1s responsible for detectmg update attempts mvahdated by previously granted updates whose effects appear m the database after the local comt hmestamp, escapmg detecnon at the access manager Second, the update manager also detects update requests that conflict with other requests subnntted wrth the same tnnestamp The update manager accepts a senahzable subset of attempts and mtcoduces the changed records mto pump storage Fa&d attempts are sunply &scarded by the update manager Each access manager subsequently detemnn es whether the tmnsactlons mth subnutted update requests have cormrutted successfully by elthez mommrmg the actual data values or examuung a porhon of the bandwidth used to broadcast a log of commmed transactions Cornnut authorzitlon is then retumed to the host Extensions to this basic scheme for concurrency control are described m [Gopa871 For example, multiversion optmusac techmques for concurrency control can be apphed m the Datacycle archtecture to pemut transactions to read over multiple database cycles and to protie greater overall concurrency, as well As ~nth the snnple optnmsac techmque described here, the mulnversion approach decentrahzes much of the concurrency control overhead and allows read-only transactions to complete locally
5 FAILURE RECOVERY
Graceful and rapid recovery from subsystem fiulure 1s a reqmrement for a practical database machme archtecture The novel charactenstlcs of the Datacycle architecture provide conceptually sunple mechanisms for recovery to a known consistent state If component fdm occurs The update manager and storage pump manage the enme database and must, therefore, be made very xehable Because the costs of these central subsystems are shared over the many record access managers, tradmonal approaches to rehab&ty through redundancy should be econormcal Redundancy can lead to qmck recovery, as well For example, smce the database 1s consistent w&m a cycle, the last complete cycle of the database (essenhally, a full database checkpoint) nught be kept by a redundant “shadow” pump Thus, while the Bctlve pump 1s transrmttmg cycle K+l, the shadow pump retams the database of cycle K If the acnve pump fds dunng cycle K+l. the “shadow” pump becomes actwe, and the system “nzcovers” to the cycle K state If access managers waUto see that the full broadcast cycle is completed before authmg transachons to comnut, fa&ue of the pump or transnussion channel dunng cycle K+l results m the set of pendmg transactions rolhng back to the state exlstmg followmg complehon of cycle K If cycle K+l completes successfully, the shadow copy must then be updated to reflect state K+l Recovery from access manager fa&ue can also be relaavely snnple Record access managers operate completely mdependently of one anotheG even then connaon to the shared opocal broadcast medmm 1s through passive taps Hence, the fatlure of one access manager can have no effect on the others Also, If one of these subsystems fads, requests whch would normally be served by the faded access manager can be r&- rected to other access managers, and the system as a whole wll expenence httle degradation To recover from fadure of an access manager ~rlth transactions m the phase between comrmt request and cormmt authonzahon, the update manager must mamtam a log of comrmtted transactions to be accessed when the access manager recovers or the host 1s rehomed to a new access manager
6 IMPLEMENTING THE DATACYCLE
ARCHITECTURE We are currently constructmg a prototype of the Datacycle archltecture, both to demonstrate technical feaslbdlty and to serve as a vehicle for further research on the architecture Our efforts m the first version of the prototype focus on the record access manager, which we consider to be the crmcal subsystem m the architecture, both m terms of functionahty and cost Our strategy for the prototype emphasm% flexlbtity and ease of Implementation over efficiency Nevertheless, the prototype deslgu 1s mtended to consider the srgnlficant arc~tectural issues to be addressed m Implementing a full scale system To prowde an overview of the structure and scale of the Datacycle prototype, we first bnefly descnbe the design of the transrmssion channels, storage pump, and record update manager We then descnbe our rationale and design for the record access manager Transmission Channels To allow for flexlhhty m the total size of the database, Implementation of the pump and the record access manager must be mdependent of the total transnussion capacity pmwded m the broadcast channel The approach, as dustrated m Figure 2,ls to multiplex together a number of relahvely slow data streams from the pump, carry out the transnussion at the higher aggregate rate, and then, at each access manager, demultiplex the stream mto a number of slower channels, each of which operates at a rate smtable for the pattern comparator technology m the access managers A need for more storage 1s accommodated by addmg a new storage subsystem to the pump and new pattern comparamr subsystems at each of the access managers Besides providmg flexlbtity m database SUR, this approach also reduces the clock rate reqmred ~nthm the pattern comparator ele.ctromcs Record Access Manager Subsystems 3 + Host Communlcatlons Figure 2. The Datacycle Prototype
The database prototype uses frammg and transrms~ion elecrromcs created for an expernnental broadband commumcations system nnplemented m Bellcore [Hayw87] The basic subchannel rate 1s 140 megalxts/second, accessible for readmg or wnhng through a 17 5 megabyte/second byte parallel mterface prowded by an mtegrated cucmt framer Up to sxteen such subchannels feed cascaded 4 1 GaAs multiplexors to create an aggregate 2 24 @gabit/second bit stream that modulates a semconductor laser The outputs of sixteen lasers can be mulnplexed optically onto one single mode ophcal fiber Thus, the transrmssion technology avadable for the prototype could transrmt a database of over four ggabytes m less than one second over a smgle fiber Early versions of the prototype wdl use 4 to 16 140 Mbps subchannels In addmon to the pnmary broadcast channel, an upstream channel is requu?d to transrmt record update requests from the access managers to the record update manager In the snnple version of the architecture presented here, this upstream channel 1s separate from the broadcast mednun and may be implemented by any commumcanons network provldmg the requlred funchonahty and performance In our prototype, upstream commumcahons between access managers and the update manager urlll be through EthemeP EthemeiTM 1s a trademark of Xerox Corporation
Storage Pump. Implementahon of the record storage pump IS driven by system tequnements for teadmg records from the pump and transmmmg them onto the broadcast medmm, and for wnhng new or altered records mto the pump storage to update the database An evaluation of the two most popular storage technologes today, movmg head magnetic &Sk and senuconductor memory, m&cates that seuuconductor memory 1s a more appropnate choice for the storage pump apphcation m terms of cost and design sunphclty The sustamable read rate for most c ommerclal dnves-appmmtely 8 5 megabyte&c , mcludmg seeks, platter rotation, etc , m the case of one high performance dnve [FuJi84+is too low relahve to storage volume That IS, such a dnve can transmtt less than 3% of its capactty m a second, m the Datacycle at&tecture, the extra capacity IS wasted On the other hand, generatmg the appropnate I/O bandwtdths for synchronous, sequenti read access to senuconductor memory 1s a relatively str~ghtforward design exercise The mayor disadvantage of senuconductor memory as pnmary storage for a database apphcation 1s its volatity, which affects strateges for system rehablhty and recovery (c f , [Hagm86]) In the Datacycle prototype, the storage pump wfi conust of a number of 4 megabyte banks of sermconductor memory, with each bank managed by a rmcroprocessor, as shown 111 Figure 3 Using a DMA-l&e address sequencer, each bank w& m turn, create a 17 5 megabytes/secondata stream to feed the framer mterface Dual-ported memory m each bank proties the mechamsm for the update manager to promde new records to pump storage and to mstruct the bank nucroprocessor to delete old records The bank nucroprocessor 1s responsible for mamtauung the memory bank as instructed by the update manager Assummg that four tisy-chamed banks feed a smgle framer, each 140 Mbps transnusslon subchannel broadcasts 16 megabytes repeated every 0 9 seconds At any tune 111 a broadcast cycle, only one fourth of the banks Hrlll be m “broadcast” mode, m the other banks, the bank nucroprocessors will be addmg and delehng records, perfonmng storage housekeeping duties, etc
Figure 3 Prototype Pump and Update Manager
Record Update Manager. The update manager m the Datacycle prototype is a general purpose n~croprocessor which receives update requests from the access managers over a local ma network and provides altered records to the cell rmcroprocessors through shared memory The mmal algonthm for resolvmg update i~nfhcts 1s snnply first-come, fiistserved For each cycle, the update manager mamtams tables for the read sets and wnte sets of granted updates Record IDS of mcommg update requests are checked agamst the tables, If a non-senahzable confhct 1s detected, the update 1s ignored. If the atonucity of the update to storage can be guaranteed, it is executed, and the tables are updated to mclude the new read and wnte set If the update atomcity cannot be guaranteed, due to lack of mne mmauung 111 the cycle, the update 1s lgnored. Update throughput can be unproved, at the cost of delaymg transaction completion, by allowmg one or more adds honal broadcast cycles between update request and confirmanon Thus, an update request subnutted dunng cycle K nught be mtmduced m the database dunng cycle K+n. The update manager has responsltity for &tecMg con&B between each new update attempt and updates which were granted prev~- ously but which were not vlslble to the access manager The cnhcal design issue for the pump and update manager 1s how to prom& the guarantee of atonnc updates for each transacnon w&out constrammg the update throughput to too low a level This is an area of current research that is clearly cnhcal to the successful reahzahon of the archltectuxe Record Access Manager. The access manager provides its host or hosts Hrlth the functlonahty of a typical database machme, decomposmg requests mto database pnmmves, asngnmg the prnmtwes to pattern comparator hardware, and then processmg the returned data items to generate the response, as described m Secnon 3 The access manager slmultaneously supervises the concurrency control process, one mew of which was described m Section 4 The access manager 1s the focus for our mmal research because of its novel use of both the pattern comparator hardware and the repehtwe broadcast of the database contents for database quenes and for concurrency control The atchlttecture 1s cnhcally dependent on the efficiency of the pattern companson operation For example, a selectton may reqmre dedtcatmg pattern comparators for an exhaustive search of every sub channel of the database Momtormg for concurrency control also mcxases the demand for pattern comparators, as do complex quenes such as moms If pattern comparator hardware Hrlth the nzqmred boolean functions 1s capable of hundreds of smmltaneous compansons per IC, as [Faud85] protects, then sunple, exhaustive seamh strategtes are plausible On the other hand, If pattern comparators are expensive, then more sophtshcated and haxdwareefficient query pmcessmg, concurrency control, and content searchmg smateges may be reqnned. For example, the aviulable pattern comparators mtght be more efficiently used by utthzmg the channel structure together ~rlth some mdexmg or hashmg scheme so that only some of the channels need to be searched for a smgle pattern Further, segments defined by saons of tune on a channel can also be used m the mdexmg Our prototype record access manager 1s deslgned to prowde us an environment m which to expenment ~tb altemaave approaches m unplementmg key functions m the Datacycle archltectme In the Datacycle prototype, the access manager consists of dlstmct subsystems, the Query Manager (QM). Data Manager (DM), and Intelhgent Record Retneval Modules
(IRS). as shown m Figure 4 In the model of Section 3, the QM and DM compnse the supe~sory subsystem, and the IRS contam the pattern comparators
su orvlsory Su system ki I Host
Figure 4 Record Access Manager Prototype
The Query Manager receives quenes from a host, decomposes those quenes mto operatmg pnmmves, and subnuts the prnmhves to the DM for execution The Query Manager also manages the updates for comnuttmg transactions, nutzatmg the update request to the update manager and, m conJunction ~th the DMs and IRS, confirmmg the success of the update In our prototype nnplementatton, the QM IS a m -based rmcrocomputer which communicates with the host and the update manager usmg Ethernet As shown m Figure 4, the Data Manager m the prototype conslsts of multiple processing nodes connected to form a network expandable to manage access to many broadcast subchannels The DM maps the lognzal data space (based on relatton names and field values) to the physical data space (based on subchannels and subchannel segments), routes the prmutlves to the appropnate IRS, buffers mtermdate results, and routes the results of the pnmmve database operations to the Query Manager Fmally, each IR consists of a processor managmg four VLSI pattern comparator umts that perform the low level search operanons 111 the archmzchue The pattern comparator ICs are bem 8 designed for lmt~al rmplementation m 2 0 nucron CM S Thev anaapated capatines mclude vanable length pr&z.ates, capacity to store up to 150 (four byte) pticates on UN&m 1s a trademark of AT&T Bell Laboratones chip, a throughput of 20 rmlhon snnple pr&cate tests per second, capabtity to evaluate snnple boolean expressions, and the abfity to surltch between target patterns on database segment boundanes Pattern comparator ICs receive data from the hansnusslon subsystem over the 17 S megabyte per second byte parallel interfaces provided by tier ICs
7. DISCUSSION
As a system concept, the Datacycle architecture has amcnve charactenstlcs, both for our ongmal network apphcatlon and as a general purpose database system St& the architecture has hnutations and weaknesses, and both techmcal feaslbhty and pract~ahty remam unproven In this section, we &scuss our current view of the domam of apphcablllty of the archltecture and Identify key open Issues and areas of current research The archltectmz appears well sated for very high throughput, read-onented apphcations m which transactions have relatively small. predeclared read sets and wish to access a common database through any of several keys The “database-driven network” problem which motivated our research has these charactenstlcs The maxnnum size of a database slutable for the Datacycle approach 1s a SubJect of current mvestiganon, for a fixed broadcast cycle tune, a larger database requires a higher bandwdth channel and, consequently, either more pattern comparators (to search more subchannels) or faster comparators We databases of hundreds of megabytes appear xealzable with sub-second access hmes using current transnusslon technology, the dommant cost of a large database system usmg the Datacycle archlttecture 1s the hardware cost of searchmg Whde the Datacycle approach 1s “read-onented”, It 1s not a “read-only” anhecttne hke teletext. Use of the repetmve broadcast of the database to decentrahze con&t detection and synchroNze the aCtIOnS of the many access managers is a key feature of amh~tecture The ar&tectme should exhlbt excellent update throughput when compared with convenuonsl archltectuxes We the centrahzed update manager remams an ulmte throughput bottleneck m the archlttecture for transachens which write, as well as read, the database, we are mveshgatmg archltecmres which &smbute the update certlfcatlon funChOn to achieve even greater throughput, applymg the approach of [wem87] An issue III the Datacycle architecture 1s support for transactions which have large read sets and/or extended execuhon phases, such transachons may have &fficulty completmg successfully To address dus Issue, we are explonng concurrency control techmques wluch increase the propomon of trans.- achons which complete successfully m a high conflict envlronment [Gopa87] One approach is to employ a multiversion scheme, penmthng consistent reads to occur over multiple cycles and mcreasmg concurrency, at the cost of increased storage and bandHrldth reqmrements Other approaches may exploit the hardware momtormg capabtity at the access managers to penmt finer grsnulanty for local vabdation, pnzhcate-based vahdahon, and recovery from con&t by parhal transactron backup A weakness of the archltectunz 1s the access latency which results from the synchronous, pen&c presentation of data llus read latency adversely affects the system response hme for transacnons that consist of a senes of dependent reads, that 1s. where each database read depends on the result of Its predecessor A transachon with K such reads applied to a database with cycle tune T can take KT or longer to complete Two approaches may reduce this problem first, structure the database and query processmg language to anticipate such quenes, and, second, dnve down the cycle hme T AS before,
reducmg T reqmres a h@er bandwdth channel and more mvestment m search hardware, 1 e , either more, or faster, pattern comparators The cost of the hardware pattern comparison operation 1s crmcal to query processmg, database search strateges, and concurrency control m the Datacycle architecture Through the VLSI design achmty m our prototype, we are mveshgahng altemahve approaches to achlevmg highly parallel selechon operahons m VLSI, technolo@es to pernnt very high speed compansons are also desuable, smce they pemut shortenmg the database cycle tune and reducmg the number of subchannels reqmred for very large databases We are also explormg techxnques exploltmg a pnorr knowledge of the data placement m subchannels and subchannel segments to utilze hardware pattern comparators more efficiently, while retammg most of the snnpllclty of data management prormsed by the content-addressable feature of the archltecture Research on query decomposlhon explores the dependency of decomposmon strategy and access manager architecture on the funchonahty and cost of the hardware seh?chon operahons More funchonahty (e g , tid card capabhty, boolean exprcssions) can reduce the pattern comparatofls speed or capacity for sunultaneous searches On the other hand, less funchonahty results in less efficient fitenng of the data stream, mcreasmg the I/O and processmg burden on the supe~sory subsystem
8 SUMMARY
In this paper, we have described a novel architecture for high throughput database systems based on exploltahon of abundant commumcations bandwdth In this architecture, repetihve broadcast of the entue contents of a database to many pm cessmg elements ehmmates read bottlenecks, slmphfies data management, supports rapid execuhon of complex queries, and pernuts efficient concurrency control The architecture 1s plausible Dven current transrmssion, storage, and VLSI technologes, a system prototype is under Construchon
9 ACKNOWLEDGEMENTS
Many of the concepts described III dus paper draw on the conmbuhons of past and present members of the Datacycle pro- Ject Enc Gausmann, Kalwant Grewal, Dorothy DeLuca, &ug Redmg, and David Wagner Also, we acknowledge wth grahtude the comments and suggestions of our colleagues elsewhere m Bellcore |