A Specialized Architecture For Object Serialization With . PDF Free Download

2y ago

33 Views

1 Downloads

792.79 KB

13 Pages

Report/dmca

Download PDF

Transcription

A Specialized Architecture for Object Serializationwith Applications to Big Data AnalyticsJaeyoung Jang†Sung Jun Jung‡Hoon Shin‡† Sungkyunkwanjaey86@skku.eduSunmin Jeong‡Tae Jun Ham‡University‡ SeoulJun Heo‡Jae W. Lee‡National University{miguel92, sunnyday0208, j.heo, zmqp, taejunham, jaewlee}@snu.ac.kr[41]. According to Google, serialization, along with other lowlevel operations like memory allocation, compression, andnetwork stack processing, consumes 20-25% of total CPUcycles as "datacenter tax" [9].Unfortunately, existing S/D libraries still have relatively lowthroughput for various reasons. For example, the Java built-inserializer (Java S/D) embeds type strings (e.g., class and fieldnames) in a string format to require expensive string matchingoperations for type resolution during deserialization and bloatthe serialized stream. Other S/D libraries improve throughput byadopting more compact representations for types (i.e., integerclass numbering) [34], [41], [51], hand-optimized serializationfunctions [34], [48], compilation-based approach to obviatethe need for extracting field information at runtime [45], directoperations at backing array to reduce the encoding cost [12],and sending a raw object graph while overlapping computationwith communication [41].Although these proposals address some of the inefficiencies,there is still substantial room for improvement. For example,our experiments demonstrate that S/D operations still take about28% of total execution time on average (and up to 83.4%)for six Spark applications even if we use Kryo [34], a highlyoptimized S/D library for Java (details in Section III).A serialization operation requires a recursive traversal ofIndex Terms—Object serialization, Domain-specific architecture, Data analytics, Apache Spark, Hardware-software co-design object graph from the top-level object being serialized asall the referenced child objects should be serialized together.Furthermore, it invokes a large number of function calls toI. I NTRODUCTIONextract individual fields for each object (e.g., reflection [30]).Object serialization and deserialization (S/D) is a fundamen- Existing S/D libraries are inefficient when handling thosetal operation in modern big data analytics for portable, loseless operations, partly due to their limited use of parallelism withincommunication between distributed computing nodes. For the S/D process. Even with highly parallelized S/D algorithms,efficient inter-node data transfers, the sender node first serializes their performance on general-purpose CPUs is bottleneckeda sea of objects into a stream of bytes; the receiver node then by limited microarchitectural resources such as instructionreconstructs the objects from the serialized byte stream. Massive window and load-store queues. Even worse, S/D operationsdata manipulation operations in today’s big data analytics are characterized by frequent indirect loads, random memoryframeworks [3]–[6], [11], [55], such as map/reduce, shuffle, accesses, large-volume memory copies, and little data reuse,which make it difficult to handle them efficiently on CPUs.and join, heavily utilize S/D operations.S/D operations incur substantial performance overhead toTo address these limitations, we propose Cereal, a hardwaremodern data analytics frameworks running on managed runtime accelerator for S/D operations. We carefully co-design theenvironments like Java and Scala. Recent studies report that serialization format with the hardware architecture to effectivelyserialization overhead accounts for some 30% of the total leverage multiple levels of parallelism during the S/D process.execution time in popular big data analytics frameworks [40], This multi-level parallelism harnesses massive memory-levelAbstract—Object serialization and deserialization (S/D) is anessential feature for efficient communication between distributedcomputing nodes with potentially non-uniform execution environments. S/D operations are widely used in big data analytics frameworks for remote procedure calls and massive datatransfers like shuffles. However, frequent S/D operations incursignificant performance and energy overheads as they musttraverse and process a large object graph. Prior approachesimprove S/D throughput by effectively hiding disk or network I/Olatency with computation, increasing compression ratio, and/orapplication-specific customization. However, inherent dependencies in the existing (de)serialization formats and algorithmseventually become the major performance bottleneck. Thus, wepropose Cereal, a specialized hardware accelerator for memoryobject serialization. By co-designing the serialization formatwith hardware architecture, Cereal effectively utilizes abundantparallelism in the S/D process to deliver high throughput. Cerealalso employs an efficient object packing scheme to compressmetadata such as object reference offsets and a space-efficientbitmap representation for the object layout. Our evaluation ofCereal using both a cycle-level simulator and synthesizable ChiselRTL demonstrates that Cereal delivers 43.4 higher averageS/D throughput than 88 other S/D libraries on Java SerializationBenchmark Suite. For six Spark applications Cereal achieves7.97 and 4.81 speedups on average for S/D operations overJava built-in serializer and Kryo, respectively, while saving S/Denergy by 227.75 and 136.28 .

parallelism (MLP) for Cereal to achieve high memory bandwidth utilization, and hence high S/D throughput. To keep thespace overhead for metadata low, we also propose an efficientobject packing scheme to compress object reference offsets bypreserving only significant bits (e.g., discarding leading zeros)and a bitmap representation for the object layout. In summary,this paper makes the following contributions: We propose a new serialization format to better expose coarsegrained parallelism in the S/D process, while maintainingcompact representation of metadata via efficient objectpacking. We architect and implement Cereal, a specialized architecturefor S/D operations, which is co-designed with the newserialization format. We integrate Cereal with Apache Spark on HotSpot JVM tovalidate its functionality and evaluate its performance usingreal-world data analytics workloads. We demonstrate that Cereal achieves significant speedupsand energy savings over state-of-the-art software S/D implementations on Java Serialization Benchmark Suite and Sparkapplications.II. BACKGROUNDObject Serialization. Serialization is a process of convertingthe objects into a stream of bytes and deserialization is a reverseprocess of reconstructing the original objects from the streamof bytes. S/D is a fundamental operation in data analyticsframeworks for massive data communication and manipulation(e.g., for join and shuffle), I/O management in a distributed filesystem, and messaging for remote procedure calls [5], [18], [55].There are a variety of S/D implementations depending on usecase, data types, and necessity for supporting references (pointerobjects) [34], [45]. The serialization process becomes morecomplicated with references as a recursive object graph traversalis required to identify all child objects transitively pointed to bythe top-level object. Indirect loads to fetch information aboutthe layout of an object make this process even slower. Recently,S/D has been identified as a major performance bottleneck indata analytics frameworks [39], [41], [52]. Thus, this paperfocuses on optimizing S/D operations in Java as many suchframeworks build on JVM-based execution environments.Java Object Layout. Figure 1(a) illustrates the memorylayout of Java objects in HotSpot [27], the most widely usedproduction JVM, using an example code snippet. An object hasa fixed-length header, followed by fields that hold the valuesand references of the object. The header is 16B in lengthand composed of a mark word (8B) and a klass pointer (8B).The mark word includes an identity hash code (31 bits), asynchronization state (3 bits), GC state bits (6 bits), and 25unused bits. A klass pointer points to type metadata, calledtype descriptor, which contains the object layout as well asthe total object size. The object layout includes the offsets ofall the references in the object, which a serializer utilizes tolocate them.Java built-in Serializer. Java built-in serializer [28] (Java S/D)is the baseline serializer provided by Java. When an objectCodesnippetClass ObjA implements Serializable {Int valA 10;ObjB objB new ObjB();}Class ObjB implements Serializable {Long valB 200L;}Reference offsets(a)MemorylayoutTypedescriptor1 Loadreference offsetsoffset value forobjB field (24)object objAmarkklassHeader (16B)2 Find the location ofthe reference10valA (8B)object objBmarkklassField informationClass information(b)JavaS/DSer format200Length of Class # of Field Length of FieldValue or class name name fields type field name namereferenceobjA4 ObjA 2 I 4 valA S 4 ObjB104 ObjB 1 L 4 valB200objB(c)KryoType Registry ObjA11ObjB12Ser format ClassID Null check Value ReferenceobjA11 1 1012 1200objB(d)SkywayType Registry ObjA11ObjB12markSer format mark ClassIDaddr 1001110objA40Value orreferenceaddr 140mark12200objBFig. 1. Serialization format of three Java serializersis serialized, the serialized stream contains the field data ofthe object and metadata for its class and the classes of allthe objects it recursively references to. Figure 1(b) shows anexample of a serialized stream of Java S/D. Java S/D stores thenames of all classes and fields as string types. In addition, JavaS/D stores the length of the name, the metadata of the fields(number of fields, field types), and other metadata associatedwith serialization. As a result, the serialized stream containsa large amount of metadata. For reference fields, Java S/Duses Package java.lang.reflect [30] to get the contentof each reference field and insert it to the serialized stream.For example, Field getField (String name) in the packagereturns a Field object of the class specified by name and voidset (Object obj, Object value) sets the target field withthe input value. Generally, the methods in this package are awell-known source of computational overhead in Java S/D asthey perform string lookups with no given type information.Kryo Serializer. Kryo [34] is one of the most popular thirdparty serialization libraries for Java. Kryo addresses thelimitations of Java S/D with manual registration of classes.Figure 1(c) shows a serialized stream in Kryo. Unlike JavaS/D, Kryo represents all classes by just using a 4B ClassID(integer class numbering). All field types and primitive types(e.g., Int, Long) are also registered to get assigned distinctClassIDs. By registering classes and types (type registration),Kryo reduces the overhead of storing type names as strings(plus additional metadata) in Java S/D. And, there is a Nullcheck (1B) field to mark an object with no fields. Overall,

SERDESSER I/O DES I/OGCComputeuse cases are common in other large-scale data analyticsframeworks. We select six S/D-intensive applications fromBayesIntel HiBench [24] and execute them using two Spark executorLRinstances on Intel i7-7820X Processors [25]. More detailedTerasortexperimental setup is available in Section VI-A. We breakdownALSthe total application time into 1) computation time, 2) GC time,0%10%20%30%40%50%60%70%80%90% 100%3) I/O time, and 4) S/D time.(a) Java S/DFigure 2 shows a runtime breakdown of the six applicationsSERDESSER I/ODES I/OGCComputeusing Java S/D (Figure 2(a)) and Kryo (Figure 2(b)), which areNweightSVMused by Spark. Our analysis shows that the overhead of S/DBayesoperations is substantial, accounting for an average of 39.5%LRof total execution time for Java S/D (and up to 90.9% withTerasortALSSVM), and 28.3% for Kryo (and up to 83.4% with SVM). As0%10%20%30%40%50%60%70%80%90% 100%discussed in Section II, Java S/D embeds a string for every class(b) Kryoand field into the serialized stream, which leads to substantialI/O overhead (up to 13.9% for NWeight). Even worse, theFig. 2. S/D overhead in Spark applications with different serializer librariesexcessive use of reflection functions for type resolution, whichthere is much less metadata than Java S/D as Kryo only stores involve expensive string match operations, makes S/D processthe actual data. Kryo also uses an optimized library [49] to very slow. In contrast, Kryo employs integer class numbering,get class/field information (serialization) or set field values minimizing type metadata overhead. Also, Kryo does not need(deserialization), instead of the original reflect package in Java to use reflection for type resolution but utilizes a customS/D. These optimizations reduce the time and space overhead reflection library for (re)storing values, to yield substantiallydrastically, but it requires users to manually register the class lower runtime overhead than Java S/D. However, even withof every single object being serialized. The same type registry Kryo, the S/D overhead is still substantial, taking nearly 30%must be used for deserialization. Therefore, Kryo requires of total execution time on average.To understand the S/D performance better, we use three dataadditional user effort for performance.Skyway Serializer. Skyway [41] is a state-of-the-art serializer structure benchmarks commonly used in many applications,specialized for data shuffling between nodes in a distributed Tree, List, and Graph, and measure the IPC, cache miss rate, andsystem. Skyway proposes a way to transfer an object by a bandwidth utilization of S/D process using Linux Perf tool [14].simple memory copy to reduce the computational overhead For each benchmark, we scale the number of children nodesof disassembling and reassembling it. This eliminates the cost and depth (Tree), the size (List), and the number of connectedof accessing any field or type imposed on existing serializers. edges (Graph). More details of these benchmarks are coveredSkyway also uses an integer type ID (like Kryo) for identifying in Section VI-A.There are several observations with the results. First, Figthe class of the object instead of type string. Skyway also keepsa global type registry, which maps every type string to its unique ure 3(a) shows relatively low average IPCs for both Java S/Dtype ID (like Kryo). Unlike Kryo, however, Skyway can reduce (1.01) and Kryo (0.96). Second, Figure 3(b) demonstrates thatmanual type registration effort by utilizing an automatic type the degree of (temporal) locality is pretty low (i.e., cache missregistration system. During serialization, Skyway converts the rate is high). Finally, even with a high cache miss rate, both failabsolute address of the object to a relative address, eliminating to fully utilize memory bandwidth due to limited parallelismthe overhead of invoking methods in the reflect package for in the S/D process (Figure 3(c)).adjusting references. Skyway reduces the user effort as well as Motivation for Specialized Hardware. In an abstract form,computational overhead, to report a 16% speedup over Kryo the key operations in the S/D process include (i) object graphon average [34]. However, the sequential reference adjustment traversal for serialization and (ii) copying the contents of theof objects at the receiver is still inefficient. Also, the size of visited object into another space. These operations are handledthe serialized byte is larger than Kryo’s size as the object is poorly in general-purpose CPUs for the following reasons.First, object graph traversal requires many indirect loads toserialized as is including reference fields and headers.visit its child objects. Such indirect loads inherently generateIII. A NALYSIS AND M OTIVATIONlots of random memory accesses, which significantly degradeS/D Overhead. To quantify the S/D overhead in data analytics the cache performance. A copying operation, which touchesapplications, we analyze the performance of six data analytics large memory regions, evicts cache lines to increase cacheapplications on Apache Spark running on HotSpot JVM [27]. miss rate. Moreover, the object graph has a low degree ofSpark extensively utilizes S/D operations for 1) input/output parallelism without exploiting intra-object parallelism, andmanagement in a distributed file system (e.g., HDFS [50]), 2) CPUs cannot fully utilize memory-level parallelism (MLP) duecommunication between the master and worker nodes, 3) data to limited instruction window and load/store queue size, hencemanipulation across the nodes such as Shuffle, and 4) software yielding low memory bandwidth utilization (Figure 3(c)). Ascaching and data spill for efficient memory management. These a result, Kryo improves the Java S/D by removing severalNweightSVM

Java S/D3Kryo0obj AIPC20010001VBVVVC100(a)Memorylayout0Narrow WideTreeSmallLarge Sparse Dense Narrow WideListGraphTreeSmallLarge Sparse DenseListSERGraph0obj B0Header20000obj DDES0DVV10AVBitmap1072Address 100010obj C204800Header00Header00VV30003040A Reference field (A’s address)V Non-Reference field (Value)4000(a)LLC miss rate (%)0Header1004040Value arrayClassmeta mark ID75VVVVHObj A50(b)serializedformat250Narrow WideTreeSmallLarge Sparse Dense Narrow WideListGraphTreeSmallLarge Sparse DenseListSERGraphDES(b)VVHVVObj CObj BReference arrayLayout bitmap72 120160 0200 000010001 000100 00000 00010A’s relative addressD’s bitmapD’s relative addressC’s bitmapC’s relative addressB’s bitmapB’s relative addressA’s bitmapVObject graph size10Bandwidth (GB/s)HObj DNew object A’s address 80005(c)deserializedformat0Narrow WideTreeSmallLarge Sparse Dense Narrow WideListSER10GraphTreeSmallLarge Sparse DenseListGraphDES(c)42.944.842.944.8112.4H8000 72VVV V8000 120HVVHVV8000 160HV8000 0Fig. 4. Illustration of Cereal’s baseline serialization format: (a) originalmemory layout (b) serialized format (c) deserialized format39.0reduce the size of the serialized stream. Figure 4(c) shows howthe four objects are reconstructed at the base address 80004from the serialized stream.20A notable difference of Cereal from Skyway is the wayNarrow Wide Small Large Sparse Dense Narrow Wide Small Large Sparse DenseTreeListGraphTreeListGraphto store references and values and the aforementioned objectSERDESpacking scheme. During deserialization, Skyway needs to adjust(d)the references sequentially using relative addresses. However,Fig. 3. S/D process analysis with microbenchmark (a) IPC (b) LLC miss ratebecause Cereal stores values and references separately, it can(c) bandwidth (d) speeduphandle value copying and reference adjustment in parallel.Since the layout bitmap contains the location of each reference,inefficiencies like size reduction and reducing the use ofthe operation on the value and the reference can be performedexpensive reflective functions, but the performance gain isindependently. This enables Cereal to exploit a greater demarginal (Figure 3(d)). Moreover, many big data applicationsgree of parallelism, hence achieving higher throughput usingare compute-intensive [41], [44], [54] for user computationspecialized hardware.to contend with S/D operations for CPU cycles. Thus, aspecialized architecture designed to extract parallelism within Space Overhead Analysis of the Baseline Format. Thethe S/D process can substantially improve the S/D performance overhead of the layout bitmap itself is proportional to theobject size. Since one bit of the layout bitmap is responsibleand reduce the CPU load.for 8B, the size of the layout bitmap is 1.56% of the objectsize. Also, when the layout bitmap is stored as a byte stream, itIV. S ERIALIZATION F ORMAT FOR C EREALis necessary to tell the boundary between two adjacent objectsA. Serialization Stream Layoutin the layout bitmap. The easiest way to distinguish the layoutSerilization Format. Figure 4 presents the baseline serializa- bitmap boundaries would be to store the layout bitmap length.tion format of Cereal (before object packing) with an illustrative However, since the space overhead of layout bitmap lengthexample. Figure 4(a) shows an object graph with four objects is 8B per each object, if the object size is small, the layoutwith objA as the root object. It also shows how the layout bitmap length can be much larger than the data size stored.bitmap marks the location of each reference field. Since all Alternatively, a layout bitmap could be stored with paddingfields in a JVM object are 8B aligned, one bit of the layout using fixed bucket sizes (4B, 8B, and so on), but, depending onbitmap corresponds to an 8B in the heap. If the object contains the range of layout bitmap values, the space overhead due toa reference, the bit at that location is set to 1, and the size of padding can be significant. For references, the value (relativethe object can be obtained by multiplying the layout bitmap address) is the difference between the top-level object addresssize in bits by 8B. Figure 4(b) illustrates the serialized format and the referenced object address in the deserialized stream.of Cereal composed of three structures: value array, reference Typically, it takes much fewer bits than 8B (long type) toarray, and layout bitmaps (including object graph size). These represent the relative address. Finally, there is a 4B overheadstructures are fed into the object packer (Section IV-B) to representing the sum of object sizes (before serialization) thatSpeedup86

(a)Object packingexample(referencearray)A2 Packing1 Extracting significant bits& Adding End bitLong type (8B)100 0001B 72 0 01001000CerealPacked value201000000100100011001000111110001D 160 0 0101000001010000011111000110100000 10000000End bitBucket (1B) paddingHostCPUDU poolDU 0DU 1 Request SchedulerClassIDTable(SRAM)ClassVVVVHObj A(b)OptimizedserializedformatSU poolSU 0sID0SU 1 sID1MAIValue arraymeta mark IDKlassPointerTable(CAM)GlobalsID C 120 0 01111000padding significant bitsCMD QueueVVHVObj CObj BVHVI/O devicesObj DMemory ControllerPacked reference array01000000 10010001 11110001 10100000 100000001110DRAM1End mapPacked layout bitmapFig. 6. Cereal architecture overview00001000 11000000 00010010 00000100 0001010001111(MAI), which performs basic functionalities such as requestingcoalescing (as in conventional MSHRs).Software Interface. Cereal shares the similar serialization,deserialization interfaces with other popular serializers suchare included in this serialized format. Using the serializationas Kryo [34] and Skyway [41], which essentially makesformat in this section as is can result in large space overheadreplacing a conventional serializer/deserializer to a Cerealthan the existing schemes. To compensate for this overhead,serializer/deserializer a trivial work. Initialize is simplywe propose an object packing scheme to compress both thecalled at the beginning of the application to secure a cerlayout bitmap and references in the following section.tain amount of memory region for Cereal’s S/D process.RegisterClass(Class Type) registers the specified class soB. Object Packing Schemethat the class can be serialized and deserialized with Cereal.Cereal uses an optimized packing scheme to eliminate This function should be called once for every type that needsthe redundant metadata (layout bitmap length and excessive to be serialized or deserialized. Note that this function ispadding) and reduce the overall size of serialized bytes. Figure 5 effectively identical to the Kryo serializer’s function havingshows the optimized serialization format of Cereal with object the same name. To serialize an object obj, the user needs topacking. Figure 5(a) illustrates a step-by-step process of use WriteObject(ObjectOutputStream oos, Object obj).packing four references. In Step 1, the object packer extracts Here, the ObjectOutputStream oos is often connected to theonly significant bits from the reference bits (i.e., dropping FileStream for the output file. To deserialize an object, the userleading zeros) and add an end bit (1 in bold). In Step 2, it should use ReadObject(ObjectInputStream ois) which reputs the bit string (with optional padding zeros) to 1B buckets turns the deserialized object. Here, ObjectInputStreamto make it byte-aligned. Using the end map to indicate the ois provides the raw byte sequences from the associatedboundaries of the packed references (shown in Figure 5(b)) FileStream.occupies much less space than storing the layout bitmap length Memory Access Interface (MAI). MAI is a memory accessfor each object or using a static bucket size. When deserializing interface for our accelerator. The main role of the MAI is tothe serialized stream Cereal can process multiple values and correctly return the response to requests from various modulesreferences by inspecting the end map. Thus, we apply this of Cereal. The MAI contains an associative memory with 64object packing scheme to both the layout bitmap and references. entries. Each entry has the memory address for this outstandingIn contrast, the value array (with headers and values) is stored request, the list of modules that it should send the responsesas in the baseline format (Section IV-A) without applying this to, and the associated metadata related to this memory request.scheme.In addition, the MAI also contains multiple reorder bufferwhich reorders the unordered memory responses so that theV. C EREAL A RCHITECTURErequester can receive these responses in order of the originalrequests. Finally, the MAI also supports an atomic read-modifyA. Overviewwrite (within an accelerator) so that a module within thisFigure 6 shows an overview of Cereal architecture. The host accelerator can safely update a value without a race. Forissues a serialization or deserialization request through the this atomic operation, the MAI contains another associativesimple software interface (explained in the paragraph below). memory structure, which buffers outstanding read-modify-writeThis request is then buffered in Cereal’s command queue. The operations.request scheduler inspects the request at the head of the queueand finds the available serialization unit (SU) or deserialization B. Serialization Unit (SU)unit (DU). If found, it forwards a request to the corresponding Serialization Flow. Figure 7 shows the block diagram of ourunit, which then starts the execution. Our Cereal interacts with serialization unit (SU). At a high level, this unit consists ofthe memory system directly without going through the cache four components: header manager, object metadata manager,hierarchy. Instead, Cereal has its own memory access interface object handler, and reference array writer. Each of theseFig. 5. The optimized serialized format in Cereal (a) object packing examplefor the reference array (b) the optimized serialized format

6 Object addressvisited or not. If it is not, it means that this object needs tobe serialized. In this case, the loaded header is passed to theInputobject2 Object/klassobject metadata manager, which will then utilize this headeraddressaddress4 MetadataObject metadataObjectHeadermanagerhandlermanagerto fetch various object metadata. Another job of the header3 Metadata1 Headermanager is to update the object header. First, it needs to mark2 Relative addressthat this object is visited if this is the first time to trace thisReferencearray writerobject. Second, the header manager needs to record the relative4 Layout5 Object6 Value2 Update3 Referencebitmaparrayarrayheaderaddress of the current object to the header (see Figure 4) . ThisMemory controllervalue equals the total serialized object size so far and indicatesthe relative address of the currently serialized objects in theFig. 7. The data flow of Serialization Unitdeserialized format. When the object metadata manager returnscomponents performs a different functionality. Specifically, the the object size information, the header updates a counter thatheader manager is responsible for inspecting object headers and tracks this value. The last responsibility of the header managerupdating the headers. The object metadata manager receives the is to pass the relative addresses of the current object (i.e., theobject header from the header manager, fetches all the necessary relative address in the deserialized format) to the reference arrayobject metadata from memory, and then generates a packed writer. As explained, if the object was not visited previously,layout bitmap in the Cereal serialization format (Figure 4 and this address is equal to the total serialized objects size. In case5). The object handler receives the object layout information the object was previously visited, the relative address is alreadyfrom the object metadata manager and fetches the object from recorded in the header and thus can be extracted from it. Notememory. Using this layout information, the object handler that the header manager cannot process another object until itcan distinguish the references and values within this object. receives the object size from the object metadata manager andThen, it sends the references (i.e., addresses of the referenced update its counter.objects) to the header manager, and updates the value array in Object Metadata Manager. Object metadata manager rethe memory. Lastly, the reference array writer receives object trieves the object header from the header manager and fetchesrelative addresses from the header manager, performs packing relevant object metadata (e.g., object layout, object size) from(Section IV-B), and outputs the packed reference array.memory. Then, it passes the object layout fetched from memory4 MetadataSerialization starts when the object enters the header manager.1 Every time an object in the object graph is visited, the headermanager needs to check its header to determine if it is visited.2 If the object has not been visited, the header managersets the object’s relative address as the sum of the alreadyserialized object sizes, sends it to the refe

A Specialized Architecture for Object Serialization with Applications to Big Data Analytics Jaeyoung Jangy Sung Jun Jung zSunmin Jeong Jun Heo Hoon Shin zTae Jun Ham Jae W. Lee ySungkyunkwan University zSeoul National University jaey86@skku.edu {miguel92,