Introduction - National Archives

 National Archives Catalog (NAC)1673225-390524andDescription and Authority Services (DAS)Software Design Document (SDD)For the DAS and NAC ModernizationFinal Version Version 2.0November 15, 2017Prepared for:National Archives and Records Administration (NARA)Catalog Task NAMA-16-G-005, DME 16 Task 005 Prepared by:1760 Old Meadow RoadMcLean, VA 22102-38099375920This document contains proprietary information provided by PPC – a DSA company.Handle in accordance with proprietary and confidential restrictions.-38099375920Acknowledgements:PPC would like to acknowledge the significant technical contributions made by the following staff:Dr. Urmi MajumderJohn HensonEdwin PiedmontMatthew MarianoFawad ShaikhAurora Rischak Table of Contents TOC \h \u \z 1 PAGEREF _30j0zll \h Introduction11.1 PAGEREF _1fob9te \h Background11.1.1 PAGEREF _3znysh7 \h Description and Authority Services11.1.2 PAGEREF _2et92p0 \h National Archives Catalog11.2 PAGEREF _tyjcwt \h Purpose of this Document21.3 PAGEREF _3dy6vkm \h Scope and Organization of this Document31.4 PAGEREF _1t3h5sf \h References41.5 PAGEREF _4d34og8 \h Assumptions52 PAGEREF _2s8eyo1 \h DAS REST API Framework Design62.1 PAGEREF _17dp8vu \h Overview62.2 PAGEREF _3rdcrjn \h Resource Overview62.3 PAGEREF _26in1rg \h Collection Resource72.3.1 PAGEREF _35nkun2 \h Response Format for Collection Resources82.4 PAGEREF _1ksv4uv \h Single Resource102.4.1 PAGEREF _44sinio \h Read Operation102.4.2 PAGEREF _2jxsxqh \h Update Operation112.4.3 PAGEREF _z337ya \h Delete Operation112.4.4 PAGEREF _3j2qqm3 \h Create Operation112.5 PAGEREF _1y810tw \h Sub-Resource Collection122.6 PAGEREF _4i7ojhp \h Modeling Ingestion Operations132.6.1 PAGEREF _2xcytpi \h Description132.6.2 PAGEREF _1ci93xb \h Authority162.7 PAGEREF _3whwml4 \h Modeling Search Operations162.7.1 PAGEREF _2bn6wsx \h Description162.7.2 PAGEREF _3as4poj \h Authority222.8 PAGEREF _49x2ik5 \h Modeling Workflow Operations262.8.1 PAGEREF _2p2csry \h Description262.8.2 PAGEREF _147n2zr \h Authority262.9 PAGEREF _3o7alnk \h Modeling Reporting Operations272.9.1 PAGEREF _23ckvvd \h Description272.9.2 PAGEREF _ihv636 \h Authority272.10 PAGEREF _32hioqz \h Modeling Export Operations272.10.1 PAGEREF _1hmsyys \h Description272.10.2 PAGEREF _41mghml \h Authority272.11 PAGEREF _2grqrue \h Modeling User Operations272.12 PAGEREF _vx1227 \h REST Controller Design282.12.1 PAGEREF _3fwokq0 \h Description Controller Design292.12.2 PAGEREF _1v1yuxt \h Ingest Controller Design332.12.3 PAGEREF _4f1mdlm \h Search Controller Design343 PAGEREF _2u6wntf \h Description Ingest Module Design363.1 PAGEREF _19c6y18 \h Overview363.2 PAGEREF _nmf14n \h Detailed Design383.2.1 PAGEREF _37m2jsg \h Database Support383.2.2 PAGEREF _1mrcu09 \h Software Components393.2.3 PAGEREF _46r0co2 \h Logical Flow403.2.4 PAGEREF _2lwamvv \h Exception Handling413.3 PAGEREF _3l18frh \h Ingest Services Low-Level Design423.3.1 PAGEREF _206ipza \h Chunking Service Design423.3.2 PAGEREF _4k668n3 \h Processing Service Design433.3.3 PAGEREF _2zbgiuw \h Job Management Service Design434 PAGEREF _1egqt2p \h Search Engine Design444.1 PAGEREF _3ygebqi \h Overview444.2 PAGEREF _2dlolyb \h Search Index Design454.3 PAGEREF _2r0uhxc \h Search Schema Design484.3.1 PAGEREF _1664s55 \h Approved Description Index484.3.2 PAGEREF _3q5sasy \h NAC Annotations Index514.4 PAGEREF _25b2l0r \h Query Builder Design544.5 PAGEREF _kgcv8k \h Data Migration564.5.1 PAGEREF _1jlao46 \h Extract, Transform, and Load Design574.6 PAGEREF _43ky6rz \h Search Access Layer Design575 PAGEREF _2iq8gzs \h Database Design625.1 PAGEREF _xvir7l \h Overview625.2 PAGEREF _1x0gk37 \h Conceptual Model645.2.1 PAGEREF _4h042r0 \h Authority Model645.2.2 PAGEREF _1baon6m \h Description Model665.3 PAGEREF _2afmg28 \h Physical Model685.3.1 PAGEREF _pkwqa1 \h Authority Model685.3.2 PAGEREF _2nusc19 \h Description Model725.4 PAGEREF _haapch \h Data Migration from Oracle on EC2 to Amazon PostgreSQL Aurora RDS795.4.1 PAGEREF _1gf8i83 \h ETL Design805.5 PAGEREF _2fk6b3p \h Database Access Layer Design84A. PAGEREF _upglbi \h Appendix: DAS Data Model86A.1 PAGEREF _3ep43zb \h Description Ingest Database Table86A.2 PAGEREF _4du1wux \h Description and Authority Data Dictionary88A.3 PAGEREF _184mhaj \h ETL Script for Indexing NAC Annotations in Elasticsearch99A.4 PAGEREF _3s49zyc \h ETL Script for Authority List Migration from Oracle to Aurora100List of Figures TOC \h \u \z Figure 1: Current DAS Ingestion Module Components36Figure 2: Future DAS Ingestion Module Components and their Integration38Figure 3: DAS NAC Combined Search Index45Figure 4: DAS Description Keyword and Advanced Search46Figure 5: DAS Authority Keyword and Advanced Search47Figure 6: NAC Keyword Search48Figure 7: AWS Elasticsearch Resource Domain Architecture57Figure 8: As-Is Hybrid Data Model63Figure 9: DAS To-Be Fully Normalized Data Model for Authority66Figure 10: DAS To-Be Fully Normalized Data Model for Description68Figure 11: Primary Authority Physical Data Model72Figure 12: Relationship among Description and Physical Occurrence Tables74Figure 13: Amazon RDS Aurora DB Cluster80List of Tables TOC \h \u \z Table 1: CRUD Functionality and HTTP Verb Correspondence7Table 2: authority_list Table Columns68Table 3: primary_authority Table Columns69Table 4: Child Tables73Table 5: Description Ingest Database Table86Table 6: Description and Authority Data Dictionary88IntroductionBackgroundDescription and Authority ServicesThe National Archives and Records Association (NARA) Description and Authority Services (DAS) is an archival repository built by Project Performance Corporation (PPC) that replaces NARA’s Archival Research Catalog (ARC). DAS provides data-entry capabilities to NARA’s staff, and this data is made available to the public via the National Archives Catalog (NAC) system. DAS allows NARA’s staff to describe and control archival holdings at appropriate descriptive levels for all media, as well as to create and maintain Authority records that provide access points to the Description data. DAS has about 600 users (as of September 2017) throughout the United States. As DAS is responsible for the collection of metadata related to the publicly accessible information about NARA’s holdings, this data must go through a rigorous workflow process that ensures data integrity prior to becoming publicly available. The application ensures this integrity through the implementation of business rules that must be met before Descriptions and Authorities can be published into the approved domain. Additionally, DAS supports a comprehensive role- and group-based security model and offers users sophisticated search capabilities, workflow, record editing and global operations. From an implementation perspective, DAS is a three-tier application, hosted in the Amazon Web Services (AWS) Cloud, with a .NET front-end, Java middle tier and an Oracle data store.National Archives CatalogNAC, originally called Online Public Access (OPA), is the online public portal to NARA’s records, and is key to maintaining NARA’s commitment to open government through public access to high-value records. NARA has stated “We will reach beyond the traditional role of making records available for others to discover and will make access happen by providing flexible tools and accessible resources that promote public participation.” NAC is the end-product of this goal to make all records available to the public in digital form to ensure that anyone can explore, discover, and learn from NARA holdings.The catalog currently provides access to records created or received by organizations of the executive, legislative and judicial branches of the Federal Government. Most of the records are textual, coming from DAS, but NAC also provides access to other forms of media such as microfilm, maps and charts, architectural and engineering plans, aerial photographs, motion picture reels, video recordings, sound recordings, computer data sets, still pictures and others. Besides content from DAS, NAC searches all web pages on , , , , and many others and presents those pages in the search results along with any catalog records, thus enabling researchers to work more efficiently.Apart from searching the content via their website, NAC allows the public to download this content, as well as access the same programmatically via application programming interfaces (APIs). Furthermore, NAC allows citizen archivists to transcribe some of the millions of digitized pages of records, thus improving search relevancy and accessibility to historical federal records. Additionally, researchers and the public alike can tag and comment on records, thus allowing NARA to add contributions from external sources to its own repository. The transformation of NAC into a social catalog enables ongoing contributions and scalability of crowdsourcing to all records within the catalog.Purpose of this DocumentAs of September 2017, almost 19 million Descriptions are stored in DAS as structured Extensible Markup Language (XML) and 37 million links to digitized materials, such as:American Standard Code for Information Interchange (ASCII) textImage formats, e.g. JPEG, Tagged Image file (TIF), Graphics Interchange Format (GIF), Portable Network Graphics (PNG), bitmap (BMP)Audio/visual formats, e.g., Audio Video Interleave (AVI), MOV, Moving Pictures Export Group (MPEG)-4 (MP4), and Windows Media Video (WMV)Microsoft (MS) Office formats such as Excel, PowerPoint, and WordSound formats including MP3 and Waveform Audio File Format (WAV)Adobe Portable Document Format (PDF) formatHypertext Mark-up Language (HTML) pagesNARA is looking to ingest an additional 6 billion Descriptions and 25 billion digital objects by the end of this decade. Based on PPC’s long experience supporting NARA and working with DAS and NAC systems, PPC knows that these systems cannot accommodate the significant growth anticipated in digital objects and corresponding Descriptions, and meet NARA’s strategic goal of Make Access Happen. The current system design for DAS using Oracle 12c running on single-node first-generation AWS instances as a data store cannot support the anticipated growth in digital images and corresponding descriptions. While current NAC search servers can scale horizontally to support such growth, it is dependent on a weekly export of Description and Authority data as structured XML from DAS to be ingested by its Content Processing module and indexed in its Apache Solr based search engine before the public can discover information about the new and updated NARA’s holdings through keyword, advanced searches, by provenance, and other access points controlled by the DAS Authority files. The primary business driver for this modernization effort is for PPC to build a system that can keep up with the ever-increasing volume of archival Descriptions that are needed to describe the billions of records generated by the federal government each year. To that effect, PPC will redesign DAS as a scalable application that will allow NARA to not only meet but exceed their existing production quotas for entering records into DAS, and establish tighter integration between DAS and NAC so that records are available for the public to view in NAC as soon as they are approved in DAS. PPC will design a new, horizontally-scalable data store that is shared between DAS and NAC, and a new search cluster that can service both NAC and DAS. The shared data store and search engine approach eliminates the need for the weekly DAS export and the added delays that stem from reprocessing the Descriptions and Authorities in the NAC ingestion server. Instead, the indexing of the data can happen directly in DAS, and the NAC APIs can be modified to present only a subset of DAS data to public users. The combined data store and search cluster for NAC and DAS is expected to provide substantial cost savings from the current cost of Oracle licenses, and the number and type of AWS instances that currently host Oracle (DAS) and Apache Solr (NAC), and perform NAC ingestion. Costs are also reduced by using AWS managed services for both relational database (RDBMS) and search engine (Elasticsearch) instead of individual data store and search engine for each application in a high-availability (HA) configuration. For instance, the Oracle database for DAS replicates in near real-time to a standby Oracle instance using Oracle Data Guard while the NAC search engine is set up in a cluster configuration to increase query capacity and provide failover.The team will apply their in-depth knowledge of the DAS and NAC systems to begin the modernization process, and to define a new system architecture and software design. See Section 1.2.3 of the DAS/NAC Modernization System Architecture Document: Preliminary Version for details on the different modules that currently make up DAS and NAC, Section 1.2.4 of the same document for details of the problem with the existing setup, and Section 1.3 for how PPC envisions the architecture of the modernized applications to alleviate these problems. Thus, the System Architecture Document describes the different DAS and NAC system components, and how the new system architecture will look upon completion. The Software Design Document, on the other hand, describes the new data model and the search schema that will be used to implement the new architecture. It also highlights the portions of the existing software (such as Description Bulk Processing, DAS Web Services) that need to be changed. This design effort will result in a draft re-design of DAS and NAC, and will provide actionable guidance for follow-on DAS and NAC modernization efforts to build a system that can keep up with the ever-increasing backlog of federal records that need to be described in DAS and scale reliably (without affecting performance) to support billions of archival descriptions in the system by the end of this decade. The team’s in-depth experience developing and enhancing both systems since 2010, particularly the enhancement to the architecture and improved performance of interaction between the DAS and NAC systems over the past two task orders, will be used to put forward a design for a horizontally scalable combined data store and search engine for both applications in this software design document.Scope and Organization of this DocumentThis software design document is a preliminary document that represents the current plan for the software design, as agreed upon by PPC engineers and NARA systems engineering leadership at the time of writing. Certain sections are left incomplete, and in those cases PPC has provided as much information as possible with the expectation that those topics will be further developed in later releases of this document. However, as the purpose of this document is to capture the low-level design of the database model, search schema and updated software modules, PPC’s goal was to make this design document as complete and detailed as possible given the information provided. The overall approach taken was to pull in, assemble, and harmonize design elements from the following sources:Existing design documents for DAS and NACInformation gleaned from own creation of an initial functioning prototype PPC’s understanding of how the current systems are designed and operateNARA feedback in the daily stand-ups and weekly technical meetings This document is organized into five main sections: IntroductionRepresentational State Transfer (REST) API Framework Design for DASBatch Processing Module DesignSearch Engine DesignDatabase DesignEach section contains several subsections that have been developed to cover each section in more details. For instance, the introductory section provides a functional overview of the current applications (Section 1.1) and describes the purpose (Section 1.2) and scope (Section 1.3) of this document as well as lists references (Section 1.4) and assumptions (Section 1.5). Section 2 addresses the external interface designs, including design of the primary resources and the associated end points in DAS using REST principles. Section 3 presents the design of the different software modules that make up the batch processing system. Section 4 discusses the logical partitioning of the search engine index (Section 4.1), schema (Section 4.2), the query builder design (Section 4.3), and the data migration plan from Apache Solr to Elasticsearch (Section 4.4). Finally, Section 5 provides a detailed database design, including both conceptual (Section 5.2) and physical (Section 5.3) database models for both Authorities and Descriptions, and a data migration plan for the existing Authorities and Descriptions from Oracle to PostgreSQL compatible Aurora Relational Database Service (RDS) (Section 5.5).ReferencesThe following documents were referenced in the creation of this design document:Development of the NARA Description and Authority Services, Detailed Design Document, Version 2.0NARA Catalog Architecture Design – Catalog Perspective, Version 1.6AssumptionsThis section lists the set of assumptions identified prior to and during Task Order 5 performance:DAS API Design: This document assumes that DAS Web Services will be designed using REST API principles.Search Engine Design: This document assumes AWS Elasticsearch service can be used to implement the physical search engine for both DAS and NAC.Database Design: This document assumes we can design a relational database for the DAS data store and use Amazon Aurora with PostgreSQL compatibility as the physical data store.DAS REST API Framework DesignOverviewREST, introduced and defined in 2000 by Roy Fielding, is an architectural style for designing distributed systems. It is not a standard but a set of useful constraints, such as being stateless, having a client server relationship, and a uniform interface. REST is not strictly related to Hypertext Transfer Protocol (HTTP), but it is most commonly associated with it. For the full modernization efforts, we will develop a set of REST APIs to allow both DAS user interface (UI) and external consumer applications to authenticate, query and fetch Description and Authority records, as well as create new records based on their roles. Specifically, the application tier in DAS will host a set of REST web services which, when combined, will implement the DAS System methods and expose data. The services will be designed to stand alone without reference to the Web UI. The UI client as well as the batch processing module and partner agencies will communicate with the application tier via a set of uniform HTTP methods such as GET, POST, PUT, and DELETE on resources such as Descriptions, Authorities, Users and so on using JavaScript Object Notation (JSON) primarily to represent the data objects and their attributes. These interactions will be completely stateless and no client context will be stored on the application server between requests. Instead, the API consumer will hold the session state.DAS APIs will be designed to support the many functionalities of DAS such as creating, accessing and searching Descriptions and Authorities or creating, accessing and searching user related resources such as work trays, roles and groups in DAS. They will be parameter-based and will support the GET, POST, PUT, and DELETE methods in the following manner:GET for performing searches and retrieving Description and Authority metadataPOST when creating new records (e.g. creating a File Unit Description)PUT when updating existing records (e.g. updating person name in Person Authority)DELETE when removing records (e.g. deleting an Organization Name Authority)All accesses to the API will use Hypertext Transfer Protocol Secure (HTTPS).Resource OverviewThe fundamental concept in any RESTful API is the resource. A resource is an object with a type, associated data, relationships to other resources, and a set of methods that operate on it (corresponding to the standard HTTP GET, POST, PUT and DELETE methods). Examples of resources in DAS include User, Workflow, Ingest, Search, Authority, and Description. Each resource is identified by one or more Uniform Resource Identifiers (URIs). A URI has three primary components:Version: The URI should include /V{version} with the major version {version} as a prefix. Uniform Resource Locator (URL)-based versioning will be utilized for its simplicity of use for API consumers, versus the more complex header-based approach. For instance, for the reference application that will be designed and implemented as part of this task order, all endpoint URLs will begin with /v1/.Namespace: In any URI, the first noun is considered a “namespace.” Namespaces generally reflect the end-user's perspective on how the application works. For the full modernization efforts, the following namespaces will be used:EntityIngestSearchWorkflowUser-profileReportExportGlobal-operationsThe combined URI template with version and namespace information will be /v{version}/{namespace}/. Using this convention, Description Ingest endpoint will start with /v1/ingest/.Resource: Resources can be considered either a collection resource or a single resource. They are described in detail in the following sections.Collection ResourceCollection resources are data resources that typically support Create, Read, Update, and Delete (CRUD) functionality. As such, CRUD resources should be implemented with adherence to POST/GET/PUT/DELETE HTTP verbs. CRUD functionality and HTTP verb correspondence is listed below:Table 1: CRUD Functionality and HTTP Verb CorrespondenceCRUD OperationHTTP VerbCreatePOSTReadGETUpdatePUTDeleteDELETECollection resource names should be plural nouns. This helps visually disambiguate collections from singletons. For instance, for the reference application and beyond, we will define collection resources such as: DescriptionsAuthoritiesDescriptionJobsAuthorityJobsUsersGroupsRolesAuthoritySearchesDescriptionSearchesNote that this is not a complete list.Hence, the combined URI template with version, namespace and resource name will be/v{version}/{namespace}/{resource}. Using this convention, the endpoint for retrieving all Descriptions will be /v1/entity/descriptions.Response Format for Collection ResourcesFor a GET request on a collection resource, an array of resources will be listed in the items field. Fields like total_items and total_pages will be used to provide context to paged results. The names of collection resource fields will be consistent across various resources to allow API clients to create generic handling for using the provided data across various resource collections.PagingPages of results will be referred to consistently by the query parameters page and page_size, where page_size refers to the amount of results per request, and page refers to the requested page. Additionally, responses will include total_items and total_pages whenever possible, where total_items indicates the total items in the requested collection, and total_pages is the number of pages (calculated from total_items/page_size).Sortingsort_by and sort_order will be implemented to allow for collection results to be sorted. sort_by will be a field in the individual resources, and sort_order should be asc or desc. An example request for retrieving all Descriptions in DAS is GET /v1/entity/descriptions appears below.Example response is{ "total_items": 1, "total_pages": 1, "items": [ { "item": { "dataControlGroup": { "groups": "RDEP", "groupId": "ou=NWME,ou=groups" }, "digitalObjectCount": 0, "isUnderEdit": false, "naId": 62600935, "physicalOccurrenceArray": [ { "copyStatus": { "termName": "Preservation" }, "locationArray": [ { "facility": { "termName": "Electronic Records Archives (ERA)" } } ], "mediaOccurrenceArray": [ { "containerId": "0", "generalMediaTypeArray": [ { "termName": "Artifacts" } ], "specificMediaType": { "termName": "Networked Storage" } } ], "referenceUnitArray": [ { "termName": "National Archives at College Park - Electronic Records" } ] } ], "recordHistory": { "created": { "dateTime": "2017-08-25T10:15:25", "userDisplayName": "describer1", "userDn": "uid=RDTP1,ou=people,ou=dasUI" }, }, "title": " Test Data", "Description Type": Item, } } ]}HTTP StatusIf the collection is empty (i.e., 0 items in response), instead of returning 404 Not Found, the API will return an empty array and collection metadata fields such as “total_count”: 0. Invalid query parameter values, however, will return 400 Bad Request. Otherwise, 200 OK will be utilized for a successful response.Single ResourceA single resource is typically derived from the parent collection of resources. For instance, the National Archives Identifier (NAID) will be used to identify one Description or an authority in the collection of all Descriptions and Authorities that exist in DAS today. Hence, the combined URI template with version, namespace, resource name and resource identifier will be /{version}/{namespace}/{resource}/{resource-identifier}. Using this convention, the endpoint for retrieving a Description with NAID 330 will be /v1/entity/descriptions/330.Read OperationExample request for retrieving Description with NAID 330 will be GET/v1/entity/descriptions/330/.HTTP Status: If the provided resource identifier is not found, DAS API server will return a 404 Not Found HTTP status. Otherwise, 200 OK HTTP status will be returned when the record is found.Update OperationThe URI template for updating a single resource will be PUT /v{version}/{namespace}/{resource}/{resource-identifier}. A sample PUT request for updating Description with NAID 330 will be PUT /v1/entity/descriptions/330/. System-calculated fields, such as sub-fields of record history element, need not be included, as either term names or NAIDs can be provided for all Authority and parent links.HTTP Status: Any failed request validation on a PUT request will be responded to with a?400 Bad Request HTTP status. If an update cannot happen because of some business rule, custom error code and message (in addition to the 400 HTTP status code) will be returned to the consumer, and these application-specific error codes and messages will be implemented as part of the full modernization when all the business rules for updating any resource will be implemented. After a successful update, however, PUT operations will respond with 204 No Content status, with no response body. We will not be considering partial updates of resource now.Delete OperationThe URI template for deleting a single resource will be DELETE /v{version}/{namespace}/{resource}/{resource-identifier}. Example request for deleting Description with NAID 330 will be DELETE /v1/entity/descriptions/330/.HTTP Status: DELETE is treated as idempotent, so it will always respond with a?204 No Content?HTTP status.?404 Not Found?HTTP status will not be utilized here. Instead, GET can be utilized to verify the affected resource exists prior to DELETE. Create OperationTo create a new resource, the following URI template will be used: POST /v{version}/{namespace}/{collection_resource}/.For instance, to create a Description, the consumer will send the following request:POST /v1/entity/descriptions with a request body. It will be like that of single resource update, except that the request body will not have a record history element or any identifier for the Description resource to be created. Instead, it will only specify either term names or NAIDs for Authority and parent links. Subsequently, the DAS API server will produce an identifier for the resource. Once the post has successfully completed, a new resource will be created. Hypermedia links will be included in the response body using rel: self to provide an easy way to get the URL of the newly created resource as shown below:201 Created{ "naId": 1223345, "links": [ { "href": ";, "rel": "self", "method": "GET" }, { "href": " ;, "rel": "delete", "method": "DELETE" } ] "accessRestriction": { "status": { "termName": "Unrestricted" } }, "dataControlGroup": { "groups": "RDEP", "groupId": "ou=NWME,ou=groups" }, "title": "Test Data", }, "DescriptionType": Item, }Sub-Resource CollectionWhen multiple identifiers are required to identify a given resource, all behaviors of a collection resource are generally implemented as a subordinate of another resource. With respect to DAS, only digital objects can be modeled as a sub resource of the Description resource. The URI template for listing all the sub resources for a specific resource will be GET /v{version}/{namespace}/{resource}/{resource-identifier}/{sub-resource}/, while the URI template for retrieving information about a specific sub-resource will be GET/{version}/{namespace}/{resource}/{resource-identifier}/{sub-resource}/{sub-resource-identifier}.Hence, an example request for retrieving all the objects for Description with NAID 330 will be GET /v1/entity/descriptions/330/objects/, while the request for retrieving object with object identifier 1111 for Description with NAID 330 will be GET /v1/entity/descriptions/330/objects/1111.To add a new digital object to a specific Description resource, the following request will be used: POST /v1/entity/descriptions/330/objects/, while to update a specific object in a certain Description, the following request will be used:PUT /v1/entity/descriptions/330/objects/111.Modeling Ingestion OperationsDescription Initial ingestion request with a hyperlink containing information about the input file in the request body will be as follows: POST /v1/ingest/descriptionJobs/.The input file will contain descriptions to ingest in the following format:<import xmlns="; <termName>101</termName></beginCongress><endCongress><termName>109</termName></endCongress><dataControlGroup><groupCd>LPBHO</groupCd><groupId>ou=LPBHO,ou=groups</groupId><groupName>Barack Obama Library</groupName></dataControlGroup><guid>f2790990-35b1-4eda-aa68-a973e218a569</guid><title>Full Record Group 1</title><isUnderEdit>false</isUnderEdit><recordGroupNumber>10101</recordGroupNumber><dateNote>Chinatown is a 1974 American neo-noir mystery film</dateNote><scopeAndContentNote>The film was inspired by the California Water Wars, a series of disputes over southern California water at the beginning of the 20th century</scopeAndContentNote><specialProjectArray><specialProject><termName>Nixon Tapes Portal</termName></specialProject></specialProjectArray></recordGroup><recordGroup><beginCongress> <termName>101</termName></beginCongress><endCongress><termName>109</termName></endCongress><dataControlGroup><groupCd>LPBHO</groupCd><groupId>ou=LPBHO,ou=groups</groupId><groupName>Barack Obama Library</groupName></dataControlGroup><guid>5b80dcc3-20d7-4386-b081-1c906ef9fab2</guid><title>Full Record Group 2</title><isUnderEdit>false</isUnderEdit><recordGroupNumber>10102</recordGroupNumber><dateNote>Chinatown is a 1974 American neo-noir mystery film</dateNote><scopeAndContentNote>The film was inspired by the California Water Wars, a series of disputes over southern California water at the beginning of the 20th century</scopeAndContentNote><specialProjectArray><specialProject><termName>Nixon Tapes Portal</termName></specialProject><specialProject><termName>Military Biographies Project</termName></specialProject></specialProjectArray></recordGroup><recordGroup><dataControlGroup><groupCd>LPBHO</groupCd><groupId>ou=LPBHO,ou=groups</groupId><groupName>Barack Obama Library</groupName></dataControlGroup><guid>6ee39332-55eb-4e65-9564-ae4e9b700c9e</guid><title>Full Record Group 3</title><isUnderEdit>false</isUnderEdit><recordGroupNumber>10103</recordGroupNumber><dateNote>Chinatown is a 1974 American neo-noir mystery film</dateNote><scopeAndContentNote>The film was inspired by the California Water Wars, a series of disputes over southern California water at the beginning of the 20th century</scopeAndContentNote><specialProjectArray><specialProject><termName>Nixon Tapes Portal</termName></specialProject><specialProject><termName>Military Biographies Project</termName></specialProject><specialProject><termName>Vietnam Military Operations Project</termName></specialProject></specialProjectArray></recordGroup></recordGroupArray></import>This request will create a Description Ingest Job and return with a 201 Created and a job identifier that can be used to query the status of the Job.Note that each Description body in the input file contains a Globally Unique Identifier (GUID) field. When the Description is created in DAS and a NAID is generated for the same, upon completion of the ingestion job, the NAIDs of the successfully created Descriptions will be returned against their GUIDs and the errors for the Description XML that could not be processed and/or validated will be returned against the corresponding GUIDs as well.The consumer can query status of the Description Ingest Job as follows:GET /v1/ingest/descriptionJobs/{job identifier}/. When the job is completed, the HTTP response code will be 200 OK and the response body will contain a list of NAIDs and their GUIDs from the input file for Descriptions that were successfully created, as well a list of GUIDs and corresponding errors that failed. While the job is being processed, the API server will return a HTTP response code 202 Accepted for Processing, but the processing is not complete for the same API request. The response body will contain the status of the job to indicate whether it is submitted, in chunking mode, chunking is complete or processing is in progress.AuthorityAuthority Ingestion will be implemented for the full modernization work, but it will follow Description Ingestion closely in terms of API endpoints. In other words, the initial ingestion request with a hyperlink containing information about the input file will be:POST /v1/ingest/authorityJobs/, and a job identifier will be returned in response. This job identifier can be later used to query the status of the job using the following API endpoint:GET /v1/ingest/authorityJobs/{job Identifier}/. When the job is completed, the same request will return a list of NAIDs for authorities against the corresponding GUIDs in the input ingest file that were successfully created and errors for the ones the application failed to create new records.Modeling Search OperationsIn DAS, a user can not only execute an advanced or keyword search, but also save a search and filter criteria object. A NARA describer can even edit the saved search later. Hence, a search operation needs to be modeled not only as a resource to perform CRUD operations, but also to perform search and return Description or Authority results based on the entity the search request is based on. Currently in DAS, users are only allowed to view and edit the searches they have saved. Furthermore, searches can be persisted and queried at the Data Control Group level. However, for this task order, only anonymous search requests will be considered. DescriptionThe following search operations will be modeled for Description search:CRUD operations on the Description Search resourcePerform keyword searchPerform advanced searchKeyword SearchFor keyword search we will use a POST request and use the request body to specify the type of search as well as name the keyword and filter fields as shown below:POST v1/search/descriptionSearch/{ “SearchType”: “Keyword”, "SearchKeyword": "Department of Agriculture", "FilterFields": ["naId", "title"]}This request will return with a 200 OK and the example response body:{ "total_items": 2, "total_pages": 1, "items": [ "item": { "title": “Department of Agriculture 1”, "naId": 62600935 }, "item": { "title": “Department of Agriculture 2”, "naId": 62600934 } ]}A keyword search cannot be saved. Hence, no CRUD operation can be performed on the keyword search criteria.Advanced SearchThe primary difference between keyword and advanced search is in the POST request body; the end points and the HTTP methods used for each operation are otherwise identical. POST request body for an advanced search is as follows:POST v1/search/descriptionSearch/{ “SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "title", "SearchOperator": "Contains", "SearchValue1": "Department of Justice" }, { "FieldName": "group_name", "SearchOperator": "Equals", "SearchValue1": "RDTP1" } ] }, "FilterFields": ["naId", "title”]}Furthermore, an advanced search can be saved in DAS application. To save a search request, the optional SaveSearch parameter is set as below:POST v1/search/descriptionSearch/{“SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "title", "SearchOperator": "Contains", "SearchValue1": "Department of Justice" }, { "FieldName": "group_name", "SearchOperator": "Equals", "SearchValue1": "RDTP1" } ] }, "FilterFields": ["naId", "title"], “SaveSearch”:”Y”}This request will return with a 201 Created and the example response body that includes the search criteria object identifier:{ “searchID”: 1, "total_items": 2, "total_pages": 1, "items": [ "item": { "title": “Department of Justice 1”, "naId": 62600935 }, "item": { "title": “Department of Justice 2”, "naId": 62600934 } ]}The search criteria object identifier can be later used to retrieve the saved search as below:GET v1/search/descriptionSearch/{searchID}/This would return a 200 OK and the following example response body:{“SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "title", "SearchOperator": "Contains", "SearchValue1": "Department of Justice" }, { "FieldName": "group_name", "SearchOperator": "Equals", "SearchValue1": "RDTP1" } ] }, "FilterFields": ["naId", "title"]}This search request can be modified and executed using a PUT request as below:PUT v1/search/descriptionSearch/{searchID}/{ “SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "title", "SearchOperator": "Contains", "SearchValue1": " Department of Agriculture" }, { "FieldName": "group_name", "SearchOperator": "Equals", "SearchValue1": " NWME" } ] }, "FilterFields": ["naId", "title", “dataControlGroup”]}This request will return with a 200 OK and will not update the original search criteria object and the following sample response body:{ "total_items": 2, "total_pages": 1, "items": [ "item": { "title": “Department of Agriculture 1”, "naId": 62600935, "dataControlGroup": { "groupCd": "RDEP", "groupId": "ou=NWME,ou=groups" } }, "item": { "title": “Department of Agriculture 2”, "naId": 62600934, "dataControlGroup": { "groupCd": "RDEP", "groupId": "ou=NWME,ou=groups" } } ]}If, however, the PUT request includes the saveSearch flag, the original search criteria object will be updated and the request will return with a 200 OK and the same response body as above. Any failed request validation on a PUT request, however, will be responded to with a 400 Bad Request HTTP status. The search criteria object can be deleted as well as below:DELETE v1/search/descriptionSearch/{searchID}/As DELETE is treated as an idempotent operation, it will always respond with a 204 No Content HTTP status irrespective of whether an object corresponding to the specified identifier exists or not.AuthorityThe following search operations will be modeled for Authority search:CRUD operations on the Authority Search resourcePerform keyword searchPerform advanced searchKeyword SearchThe Authority searches follow the same design pattern as the Description searches. To create and execute an authority keyword search, the following will be used:POST v1/search/authoritySearch/{ “SearchType”: “Keyword” "SearchKeyword": "Department of Agriculture", "FilterFields": ["naId", "termName", “authType”]}This request will return with a 201 Created and the example response body as follows:{ "total_items": 2, "total_pages": 1, "items": [ "item": { “authType”: “OrganizationName”, "termName": “Department of Agriculture 1”, "naId": 62600931 }, "item": { “authType”: “OrganizationName”, "termName": “Department of Agriculture 2”, "naId": 62600932 } ]}Like description keyword search, authority keyword search cannot be saved either and, hence no CRUD operations will be supported with authority keyword search.Advanced SearchLike advanced Description search, advanced authority searches use the same endpoint URLs and HTTP response codes as keyword authority searches. Hence, POST request body for an advanced authority search is as follows:POST v1/search/authoritySearch/{ “SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "termName", "SearchOperator": "Contains", "SearchValue1": "Department of Justice" }, { "FieldName": "authType", "SearchOperator": "Equals", "SearchValue1": "OrganizationName" } ] }, "FilterFields": ["naId", "termName"], “SaveSearch”: “Y”}This request will return with a 201 Created and the example response body that includes the search criteria object identifier:{ “searchID”: 2, "total_items": 2, "total_pages": 1, "items": [ "item": { “authType”: “OrganizationName”, "termName": “Department of Justice 1”, "naId": 62600931 }, "item": { “authType”: “OrganizationName”, "termName": “Department of Justice 2”, "naId": 62600932 } ]}The search criteria object identifier can be later used to retrieve the saved search as below:GET v1/search/authoritySearch/{searchID}/This would return a 200 OK and the following example response body:{ “SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "termName", "SearchOperator": "Contains", "SearchValue1": "Department of Justice" }, { "FieldName": "authType", "SearchOperator": "Equals", "SearchValue1": "OrganizationName" } ] }, "FilterFields": ["naId", "termName"]}This search request can be modified and executed using a PUT request as below:PUT v1/search/authoritySearch/{searchID}/{ “SearchType”: “Advanced”, "Search": { "BooleanOperator": "And", "Criterias": [ { "FieldName": "termName", "SearchOperator": "Contains", "SearchValue1": " Department of Agriculture" }, { "FieldName": "authType", "SearchOperator": "Equals", "SearchValue1": "OrganizationName" } ] }, "FilterFields": ["naId", "termName"], “SaveSearch”:”Y”}This request will update the original search criteria object and will return with a 200 OK HTTP status code and the following sample response body:{ "total_items": 2, "total_pages": 1, "items": [ "item": { "termName": “Department of Agriculture 1”, "naId": 62600931 }, "item": { "termName": “Department of Agriculture 2”, "naId": 62600932 } ]}The search criteria object can be deleted as well as below:DELETE v1/search/authoritySearch/{searchID}/Modeling Workflow OperationsDescriptionUnder-edit Description-related complex operations that would be designed as part of subsequent task orders are as follows:create Taskadvance Taskcreate Batchadvance Batchapproveget User Work Trayassign Taskreassign Taskassign Batchreassign BatchAuthorityAuthority proposal-related complex operations to be designed as part of the full modernization efforts:create Proposalreject Proposalresubmit Proposalapprove Authorityclose Proposal Workflowget User Work TrayModeling Reporting OperationsDescriptionDescription report-related complex operations to be designed later:Adhoc CannedAuthorityAuthority report-related complex operations to be designed later:CannedModeling Export OperationsDescriptionDescription Export-related complex operations to be designed later:XML ExportDelta ExportBulk Search Export (Keyword and Advanced)AuthorityAuthority Export-related complex operations to be designed later:XML ExportDelta ExportBulk Search Export (Keyword and Advanced)Modeling User OperationsThe modernized application will be designed to serve only authenticated API requests. For instance, if a NARA staff member wants to create a new Description in DAS, he would need to first authenticate against DAS using a POST request with a request body containing his username and the password as below:{“username”: “describer1”,“password”: “$$$$$$”}A successful login response will contain a unique string token identifying the user for subsequent requests. This token will be valid for 4 hours with the token validity reset to 4 hours after each subsequent request to the API. Requests with invalid or missing tokens will result in a HTTP 401 – Unauthorized status response from the API. The format of the Request Header key/value pair representing the token will be:Header (key): AuthorizationValue: “token <token>” where <token> represents the response of a successful Login.While the login web method may be one of the first ones to be implemented as part of the full modernization effort, all the user, group and roles resources will be designed and implemented as part of the full modernization efforts and will support CRUD operations for each resource as well as the complex operations listed below:Associate User RoleAssociate User Role in GroupDisassociate User RoleDisassociate User Role in Group Get Role UsersREST Controller Design Spring Boot is used to implement the REST APIs discussed in the previous sections. Spring Boot includes all the features of the traditional Spring Model-View-Controller (MVC) framework. Consequently, Spring annotations have been used for dependency injection and for request mappings. All the REST API endpoints use template Spring ResponseEntity type as the return value from the controller methods where the response types are simple Plain Old Java Objects (POJOs). Wherever the business logic calls other REST APIs, HTTP client access uses Spring RestTemplate. Spring Boot comes with Jackson library out of-the-box which can parse JSON request body to Java objects. For instance, for any POST request on a simple resource such as an Ingest Job or Search object, it suffices to use the @RequestBody Spring MVC annotation to unmarshall the incoming JSON string into Java object and annotate the entity member elements with @JsonProperty with corresponding JSON field names. For CRUD requests on Descriptions and Authorities, if the POST request body is in XML format, Java Architecture for XML Binding (JAXB) is used to convert the POST request body into POJOs that can be used to validate against business rules before persisting into the database. The following three controllers have been implemented in the reference application for this task order: Description Controller to handle CRUD operations and querying on DescriptionsIngest Controller to handle batch processing of DescriptionsSearch Controller to handle keyword and advanced Description searchDescription Controller DesignThe Description controller uses @RequestMapping annotation for create and get-by-naid REST endpoints. Thus, the create endpoint is annotated as below:@RequestMapping (value = "/v1/entity/description", method = RequestMethod.POST) A sample POST request to this endpoint to create a Series description will have the following body: <series><title> Full Series Description</title><dataControlGroup><groupCd>RDTP2</groupCd><groupId>ou=RDTP2,ou=groups</groupId></dataControlGroup><parentRecordGroup><naId>643</naId><recordGroupNumber/></parentRecordGroup><variantControlNumberArray><variantControlNumber><number>UD-06W 671</number><type><termName>HMS/MLR Entry Number</termName></type></variantControlNumber></variantControlNumberArray><arrangement>Unarranged.</arrangement><findingAidArray><findingAid><type><termName>Inventory</termName></type><url><termName> Inventory</note></url></findingAid></findingAidArray><inclusiveDates><inclusiveStartDate><year>1938</year></inclusiveStartDate><inclusiveEndDate><year>1999</year></inclusiveEndDate></inclusiveDates><coverageDates><coverageStartDate><year>1939</year></coverageStartDate><coverageEndDate><year>1990</year></coverageEndDate></coverageDates><beginCongress><termName>86</termName></beginCongress><partyDesignation><termName>Democrat</termName></partyDesignation><endCongress><termName>100</termName></endCongress><staffOnlyNote>ACCDESC</staffOnlyNote><scopeAndContentNote>ACCDESC</scopeAndContentNote><functionAndUse>ACCDESC</functionAndUse><localIdentifier>ACCDESC</localIdentifier><numberingNote>ACCDESC</numberingNote><transferNote>ACCDESC</transferNote><custodialHistoryNote>ACCDESC</custodialHistoryNote><scaleNote>ACCDESC</scaleNote><languageArray><language><termName>Hindi</termName></language></languageArray><specialProjectArray><specialProject><termName>Nixon Tapes Portal</termName></specialProject></specialProjectArray><accessRestriction><status><termName>Unrestricted</termName></status></accessRestriction><useRestriction><status><termName>Unrestricted</termName></status></useRestriction><generalRecordsTypeArray><generalRecordsType><termName>Textual Records</termName></generalRecordsType></generalRecordsTypeArray><digitalObjectArray><digitalObject><objectType><termName>Image (JPG)</termName></objectType><labelFlag>4392606-roll-8-California-00001.jpg</labelFlag><objectDescription>California</objectDescription><accessFilename>, the create request above receives an XML payload and unmarshalls to JAXB classes. The business layer takes the JAXB objects and validates the JSON fields. The description is then inserted in Aurora RDS. The Description’s NAID is returned with the response to the client.? The job status metrics are also updated if the Description request is coming from Ingest. Finally,?a call to AWS Elasticsearch service is made to persist the data in the proper indexes. Similarly, the get request is annotated as below:@RequestMapping (value = "/v1/entity/description/{naid}", method = RequestMethod.GET)And it simply returns the JSON representation of the Description whose NAID was part of the API request as below:{ "broadcastDate": "{}", "coverageStartDate": "", "dataCtlGp": { "groupName": "Barack Obama Library", "groupId": "ou=LPBHO,ou=groups", "groupCd": "LPBHO" }, "dateNote": "Chinatown is a 1974 American neo-noir mystery film", "underEdit": false, "descType": "RecordGroup", "title": "Full Record Group 1", "createdDate": 1510618184770, "recordStatus": false, "descTitle": "Full Record Group 1", "descNaId": 2469921}Ingest Controller DesignThe Ingest controller handles creation and status queries for ingestion jobs.?Similar to Description controller, Ingest controller uses @RequestMapping for the create and query end points as below:@RequestMapping (value = "/v1/ingest/descriptionJob", method = RequestMethod.POST)?@RequestMapping (value = "/v1/ingest/descriptionJob/{jobId}", method = RequestMethod.GET)?The create method simply creates a new record in the ingest job table, then creates a Simple Queue Service (SQS) message with the ingest file URL that was sent as part of the POST request body and enqueuers the same in the job queue and returns a job identifier and the status of the job to the API consumer as below:{ "jobId": 180951, "error": "", "status": "CREATED"}The status query method returns the full job history of the ingest job whose job_id was part of the API request as below. It also includes the status of individual SQS message corresponding to each description XML in the ingest file:{ "totalDescriptionCount": 10000, "lastUpdateDate": "2017-11-10~20:47:04", "failed": false, "jobCreatedUser": "ingestjob", "jobActionHistory": [ { "actionTime": "2017-11-10~20:45:08", "action": "chunking started" }, { "actionTime": "2017-11-10~20:47:02", "action": "chunking completed" } ], "jobId": 180950, "successCount": 10000, "messageStatuses": [ { "httpTimeout": false, "insertFailed": false, "insertTimestamp": 1510346712139, "insertSuccess": true, "messageIndex": 23, "sqsMessageId": "90ce5940-429b-422e-9eea-f644abe92226", "guid": "f2790990-35b1-4eda-aa68-a973e218a569", "naid": 2449920 }, { "httpTimeout": false, "insertFailed": false, "insertTimestamp": 1510346712142, "insertSuccess": true, "messageIndex": 32, "sqsMessageId": "b9f9ac67-fdce-4239-b5c0-2114072466c6", "guid": "f2790990-35b1-4eda-aa68-a973e218a569", "naid": 2449918 }……]}Search Controller DesignThe Search Controller uses the same design pattern as Description and Ingest Controllers to execute and/or save a Description search or look up a persisted criterion by its search Identifier. Search execution endpoint is annotated as follows:@RequestMapping (value = "/v1/search/descriptionSearch", method = RequestMethod.POST) A sample request to this endpoint may have the following POST request body:{"metaData": {"user": "urmi","saveSearch": true},"stored_fields": [ "naId"], "query": { "match": {"parentRecordGroup": 92388798} } }Essentially, for Task Order 5 prototype, we are simplifying the search request and sending the search criteria in a format that Elasticsearch engine can parse and respond to. In addition, as part of the POST request body we are sending some user related information and a flag to indicate whether to save the search or simply execute it. The search controller parses the incoming request, strips the user related metadata and sends the Elasticsearch query part of the request to the search engine. As we only store NAIDs in the search index, AWS Elasticsearch service responds with a list of NAIDs that matches the search criteria. The search controller uses the same list to make a GET request to the Description API for each NAID to get back the corresponding description XML and ultimately all these records are returned to the API consumer as a collection.Because we are saving each search request, the search criteria can be queried later. The GET endpoint for the Search Controller is annotated as below:@RequestMapping (value = "/v1/search/descriptionSearch/{id}", method = RequestMethod.GET)This request returns the persisted search criteria as well as all the user-related metadata information.Description Ingest Module DesignOverviewToday, DAS supports bulk upload of XML files to create archival Description and to attach digital objects to existing Descriptions in batch mode. This is accomplished through the DAS Ingest module, which provides a web-based user interface that enables any DAS user with the DAS Import Manager role to upload such XML files for processing. XML records can be imported directly into the Authorized domain or the Import Manager can specify that the records be imported into the Under-edit domain. Once uploaded, XML records are then processed and validated against the DAS business rules by Windows scheduler services that run in the background. Records that meet the business rules are flagged for import into the Authorized domain. Records that fail to meet business rules are flagged for import into the Under-edit domain. If a file contains records that are missing Data Control Group, Title, or Sequence number, those records are flagged as failed. However, the ingest module runs parallel to DAS UI communicating with the same data store as is indicated in Figure 1.Figure 1: Current DAS Ingestion Module Components One of the primary goals of the DAS/NAC modernization efforts is to unify the external interfaces of the DAS application. In other words, as indicated in section 2, the same set of REST web services will be used by the UI client and the batch processing module.The new Description Batch Processing module is anticipated to operate in the following manner (see Figure 2 for the full process flow):User posts to Description Ingest API endpoint with the Simple Storage Service (S3) URL of a file containing, for example, 100,000 Descriptions (Step 1 in Figure 2).The Ingestion web service processes the incoming request and creates an SQS message for this request, and enqueues the same into the Job Queue for Description Bulk Ingestion (Step 2 in Figure 2).A Chunking service (scheduler task) monitors the job queue and, once it sees the new message in this queue, dequeues and parses the message. This service downloads the file from S3, uses a XML parser to chunk the downloaded file into 100,000 SQS messages and enqueues each message into the Single Description queue meant for Description Bulk Ingestion (Steps 3, 4, 5 in Figure 2). A Processing service (scheduler task) dequeues one message at a time from the Description queue, and creates a POST request for the Description API endpoint in the application server so that a Description can be created either in the Approved or Under-edit domain based on the result of the validation in the processing service (Steps 6 and 7 in Figure 2).The Description web service saves the Description to Amazon Aurora and makes a PUT request to the AWS Elasticsearch Index API endpoint for Description search index to add the newly created Description so that it is available for search and access by end-users (Steps 8 and 9 in Figure 2).Throughout the lifecycle of an ingest job, the end-user can query the status of the job he submitted using the job identifier he received when he sent the initial request. The API server will respond with the status of his job and, if the job is completed, the response will include the list of NAIDs for the Descriptions against the GUIDs in the input ingest file that were successfully created and a list of errors for the ones that could not be created in the application.Figure 2: Future DAS Ingestion Module Components and their IntegrationDetailed DesignDatabase SupportAs the Ingest module is distinct from the primary data model in DAS, the database tables used to track the state of an Ingest job and maintain relevant information throughout its lifecycle will be created in its own schema in the Aurora instance. The tables that will be used to support Description Ingest are as follows:Ingest_job: Primary job history table to track current job status, total number of incoming Descriptions, number of Descriptions that were successfully created, number of Descriptions that failed to create and the usual user related audit information for each job identifier.Job_action_types: Reference table that contains the name of all possible states an Ingest job can be such as:SubmittedChunking in progressChunking complete Processing completeJob_action_history: This table maintains the audit history for each state of an Ingest Job.Message_status: The most crucial of all Ingest-related tables to track the status of each SQS message that is enqueued into the Single Description Queue. It tracks the message ID of the SQS message, GUID, and the title of the Description to which the message corresponds. This way, if the Description failed to be created and no NAID (which is also tracked in the same table) can be generated for this title, the end user can easily find out which Description XML from the input collection he needs to correct and send back to the ingest module as a new job. It also tracks the HTTP response code for the POST request that the Processing service makes for each SQS message it dequeues. Once processing is complete, these codes are used to calculate summary totals for successfully created Descriptions and error counts from the ingest_job table.Failure_types: Reference table for input file and chunking related errors.Validation_failure_types: Reference table for various types of validation errors that may occur in the API server while trying to validate an incoming Description request.Message_status_types: Reference table for the various states of an SQS message in the Single Description Queue.NOTE: Full table definition for the ingest schema has been included in Appendix A.1 of this document.Software ComponentsThe entire Batch Processing module is highly decoupled and can be thought of being composed of REST API Web Services, three distinct background services and two queues.The REST API service endpoints were discussed in Section 2.6.1.The two queues are:Job Queue: The API server and the chunking services interface with this queue. Single Description Queue: A first-in, first-out (FIFO) queue. Both chunking and processing services interface with this queue. It is a FIFO queue to ensure that messages associated with a specific job are mostly processed either sequentially or in parallel and there is minimum interleaving of processing individual Description across multiple ingest jobs. Because the default queue does not guarantee processing in order, there is a likelihood of throttling messages from requests that came in first.The three background services are:Chunking Service: Converts input XML file into single Description XMLs.Processing Service: Takes the single Description XML as input and creates a POST request for Description Web Services. Job Management Service: Responsible for managing the status and overall metrics for a job so that, once an ingest request is submitted, the API server can correctly respond to all GET requests on the job resource. Logical Flow The steps below detail the logical flow for ingesting a batch file containing one or more Descriptions into DAS.API receives POST request to /v1/ingest/descriptionJobs/.a. API server enqueues incoming request to Job Queue as an SQS message.b. API server creates the job status record in ingest_job table setting current_action_type_id=SUBMITTED (SUBMITTED is the ID for the record in the job_action_types with action_name=Submitted). It also creates a record in job_action_history table with the action_type_id=SUBMITTED.c. It returns HTTP 201 response code and job identifier (referred to as job_id in the steps below) to the client.Chunking Service polls job queue and sees a message.a. Chunking Service reads a message and immediately deletes it.b. Chunking Service downloads the message data from S3.c. It updates the job record with current_action_id=CHUNKING_IN_PROGESS.d. It inserts a record into the job_action_history with the same action_id.e. It creates a new message using all the data from the message and adds the Job ID, SQS message ID and other data.Chunking Service finishes processing all the messages for the job.a. It then updates total_description_count column in the job record with the count of the messages for the S3 job file. It also updates the current_job_action_id to CHUNKING_COMPLETE.b. The Chunking Service inserts a record to job_action_history with CHUNKING_COMPLETE as the action_type_id.Processing Service polls the Single Description Queue and sees a message.a. The service reads a message and immediately deletes it.b. It creates a payload for a POST request using the message data.c. It adds the SQS message ID or other unique identifier for the message to the POST request.d. The service sends the POST request to /v1/entity/descriptions/.Processing Service has sent a POST request and received a response.a. It upserts a record into the message_status table with the job_id, sqs_message_id, http_post_request_time, http_response_time, http_response_code.The API receives a POST request to /v1/entity/descriptions/.a. API validates the message.b. If validation passes, it inserts the Description into Aurora RDS and indexes the same in AWS Elasticsearch Service.c. If the unique message identifier was supplied in the POST request (true when the consumer is Ingest Processing service and not DAS Web Client), it is used to insert new record into the message_status table, setting insert_success column=true.If the API received a POST request to /v1/entity/descriptions/ and validation fails:a. If a unique message identifier was supplied in the POST request, it is used to insert new record into the message_status table, setting insert_success column=false, insert_failed=true. It also sets validation_failure_id to the appropriate id corresponding to the correct record in validation_failure_types.The Job Management Service periodically reads the job_status table and finds a record where the status=CHUNKING_COMPLETE, failed=false and (current time - job_creation_time) > scan_threshold (amount of time after the job was created)a. It reads all the message_status records for the job ID.b. It counts all the insert_success and insert_fail in the message_status table for the job.c. It updates the counts in the ingest_job table.d. If the count of success and fail equals the total_description_count, the ingest_job record is updated and the current_status_type_id is set to COMPLETED. e. If the count of success and fail equals the total_description_count, a record is inserted into job_action_history table with action_type_id=COMPLETED.f. If the (current time - job_creation_time) > dead job threshold (amount of time after job creation at which the Job Management Service considers the job to be completed), the job status record is updated with failed=true.Exception HandlingThe following only describes exception handling in the scheduler services. Validation failure is handled in the application server and, hence, briefly covered in the Logical Flow of Description Ingestion section:Chunking reads message and S3 file not found. Job status record in ingest_job table is updated with failed=true, reason_failed_id=FILE_NOT_FOUND.Chunking reads message, downloads its S3 file and XML can't be parsed. Job status record in ingest_job table is updated with failed=true, reason_failed_id=INVALID_XML_FILE.Chunking reads message and encounters issue with XML, like bad tags or bad character data. Job status record in ingest_job table is updated with failed=true, reason_failed_id=INVALID_XML_DURING_CHUNKING.Chunking successfully reads message, acquires the job id and then encounters an exception.Chunking service updates the job record in ingest_job table with failed=true, failure_type_id=CHUNKING_UNKNOWN_EXCEPTION.Processing encounters timeout during POST to API.Upserts a record into the message_status table with the job_id, sqs_message_id, http_post_request_time, http_response_time, http_response_code and http_timeout=true.Ingest Services Low-Level Design Chunking Service DesignIn the reference application for Task Order 5, Chunking Service has been implemented as a Spring application that uses Streaming API for XML (StAX) to parse ingest requests and put individual Description messages onto the single Description queue. Below is the algorithm for Chunking Service:For each new message found in the job queue, the message data is retrieved and the message is deleted. The message data is processed in its own thread. The process downloads the S3 file and saves to a temporary file. It then creates a StAX parser and passes the temporary file location to the parser. The StAX parser reads and parses the message.When a new item of type record Group or series is found, an output buffer is created to accumulate new messages. The StAX parser writes the contents of each type to the buffer. It also writes the Job ID, a unique Chunking ID, and the index of the current type within its array. This metadata is presently used for tracking. When the end tag for the current item is written, the message is put onto the queue using an asynchronous SQS client.Processing Service DesignIn the reference application for Task Order 5, Processing Service has been implemented as a spring application for processing the single Description queue and calling the DAS API to create new Descriptions. Below is the algorithm for Processing Service:On startup, it launches many worker threads to monitor and process the queue single Description queue. For each message encountered, a new concurrent process is created to process the payload as below:When processing the payload, the API endpoint calls an HTTP service class, which uses RestTemplate and returns ResponseEntity<CreateDecriptionResponse>?type. If the insertion was successful, then the CreateDecriptionResponse POJO has the insertion status. This is used to insert a new record into the Message Status table. Later, a query of the job status will use this record to return the naid against its GUID that was part of the input ingest file and its insertion status as MessageStatusesLongFormatResponse. Job Management Service DesignIn the reference application for Task Order 5, Job Management Service has been implemented as a spring application to monitor in process jobs and update job status. Below is the algorithm for the Job Management Service:Get a list of in-process jobs.For each in-process job, open the worker thread to check the job status using information from the message_status table.Search Engine DesignOverviewSearch is central to the Catalog, as it is essentially a public search interface for browsing both catalog information and online content. The current architecture of NAC is scalable, wherein scalability is achieved by splitting the content into multiple pieces called partitions or shards and resiliency is achieved through replica nodes. However, the current architecture uses proprietary libraries such as the Search Technologies Aspire Content Processing System to ingest records from DAS, and a proprietary Solr plugin library called Query Processing Language (QPL) to make the records searchable and accessible by the public. Additionally, there is significant overhead with managing the Solr cluster deployed on Amazon EC2 instances, especially with the use of Apache Zookeeper to provide centralized management for all search servers in the cluster. In the current architecture, there is one search index that stores not only full Description and Authority metadata records from DAS, but also NAC specific annotations such as tags, comments and transcriptions.Search is also one of the primary capabilities in DAS. It is essential for NARA describers to search existing content so that they can create new Descriptions and Authorities that are in some way using the existing content in the form of Authority and parent Description links. For instance, the Description service allows any user that has been provisioned within DAS to perform primary searches on approved Descriptions, under-edit Descriptions or both. The default search is a basic keyword search. An advanced search, on the other hand, includes a user-entered search clause with operators such as wildcards, Boolean operators, proximity operators, exact phrase, or nesting. Additionally, in the advanced search mode, the query-builder option allows a user to stack clauses searching for specific terms across one or more fields using Boolean operators AND, OR, or NOT. Unfortunately, in the current implementation all text searches on Description data in both Approved and Under-edit space as well as proximity and stem searches on Description and Authority data are performed by the Web Services by directly executing Structured Query Language (SQL) queries against the Oracle XML database. This does not scale very well. In fact, current DAS business owners must conduct new user training for the internal NARA staff responsible for creating content in DAS to limit the extent of keyword or text searches they perform to create new Descriptions and Authorities.One of the primary goals of the DAS/NAC modernization efforts is to unify the search engine for DAS and NAC and eliminate the need for weekly exports from DAS to be ingested into NAC. We envision achieving this goal by using separate indexes for DAS specific entities such as Description, Authorities, User Workflows and NAC specific entities such as Tags, Transcriptions and Comments. Furthermore, Description and Authority metadata will only be stored in DAS instead of being duplicated in the DAS database and NAC search engine. NAC will use the DAS Description and Authority Web Services to access records by their NAID.Search Index DesignThe combined search cluster is anticipated to contain the following separate indexes (see Figure 3):Under-edit Description Approved DescriptionUnder-edit AuthorityApproved AuthorityDAS Workflowdigital objectNAC AnnotationWeb PagesFigure 3: DAS NAC Combined Search IndexEach application will use a subset of these indexes to search and access relevant content. For instance, if the end user is performing an advanced search for just Descriptions using filters such as created date and Data Control Group and so on, only Approved Description, Under-edit Description and DAS workflow search indexes will be used. If, on the other hand, he is executing a basic Description keyword search, Approved Description, Under-edit Description, Approved Authority, Digital Object, and DAS Workflow search indexes will be used as shown in Figure 4.Figure 4: DAS Description Keyword and Advanced SearchSimilarly, a basic Authority keyword search will use only the Approved Authority, the Under-edit Authority and the DAS Workflow search indexes as shown in Figure 5.Figure 5: DAS Authority Keyword and Advanced SearchOn the contrary, any text searches from NAC will use all indexes except for the Under-edit Description, the Under-edit Authority and the DAS Workflow search indexes as shown in Figure 6.Figure 6: NAC Keyword SearchSearch Schema DesignFor this task order, only the descriptions_approved and nac_annotaions indexes will be created and used. Approved Description IndexFor this task order, description_approved will have the following fields:NAIDTitledataControlGroupparentRecordGroupaccessRestrictionLevel (to indicate the type of Description)indexTime (can be an International Organization of Standardization [ISO] standard date or epoch milliseconds)Description (the full Description record)Hence, to create the description_approved index, the following PUT request will be used:PUT <elasticSearch_URL>/descriptions_approved/{? "settings" : {??????? "index" : {??????????? "number_of_shards" : 3, ????????????"number_of_replicas" : 2 ????????}??? },? "mappings": {??? "description": {????? "_all": {??????? "enabled": true????? },????? "_source": {??????? "enabled": false????? },????? "properties": {??????? "naId": {????????? "type": "long",????????? "store": true??????? },??????? "title": {????????? "type": "text",????????? "fields": {??????????? "sort": {????????????? "type": "keyword"??????????? }????????? }??????? },??????? "dataControlGroup": {????????? "type": "keyword"??????? },??????? "parentRecordGroup": {????????? "type": "long"??????? },??????? "accessRestriction": {????????? "type": "keyword"??????? },??????? "level": {????????? "type": "keyword"??????? },??????? "description": {????????? "type": "nested"??????? },??????? "indexTime": {????????? "type": "date",????????? "format": "strict_date_optional_time||epoch_millis"??????? }????? }??? }? }}Using this schema, a Description can be indexed as follows:PUT <elasticsearch_URL>/descriptions_approved/description/<NAID>{??? "naId" : 2580659,??? "title" : "Made Up Title",??? "dataControlGroup" : “RDEP”,??? "parentRecordGroup" : 82,??? "accessRestriction" : “Unrestricted”,??? "level" : "FileUnit",??? "description" : {????? "naId": "2580659",????? "accessRestriction": {??????? "status": {????????? "naId": "10031403",????????? "termName": "Unrestricted"??????? }????? },……..????? "accessionNumberArray": {??????? "accessionNumber": {????????? "naId": "10015375",????????? "termName": "NN3-082-07-001"??????? }????? },????? "dataControlGroup": {??????? "groupCd": "RDEP",??????? "groupId": "ou=NWME,ou=groups"????? } }}Notice that the ID in the URL above matches the NAID value of the Description record.? Also, notice that the Description field takes the content of the Description in raw JSON format below the root element.? This allows searching across record types with one query, and filtering by record by using the Level field.NAC Annotations IndexFor this task order, only tags and transcriptions will be considered for indexing in the Elasticsearch engine. The following describes mapping for tags and transcriptions once the nac_annotations index is created in a similar fashion as the descriptions_approved index.Transcription Mapping:{ "nac_annotations": { "mappings": { "transcription": { "_all": { "enabled": true }, "_source": { "enabled": false }, "properties": { "naId": { "type": "long", "store": true }, "object_id": { "type": "long", "store": true, "ignore_malformed": true }, "page_num": { "type": "integer", "store": true, "ignore_malformed": true }, "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }, "transcription": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "username": { "type": "keyword" } } } } }}Tag Mapping:{ "nac_annotations": { "mappings": { "tag": { "_all": { "enabled": true }, "_source": { "enabled": false }, "properties": { "naId": { "type": "long", "store": true }, "object_id": { "type": "long", "store": true, "ignore_malformed": true }, "tag": { "type": "keyword" }, "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }, "username": { "type": "keyword" } } } } }}Query Builder DesignA sample search on the nested Description field will be as follows:POST <Elasticsearch_URL>/descriptions_approved/search{ "query": { "bool": { "must": [ { "nested": { "path": "description", "query": { "bool": { "must": [ { "match": { "description.accessRestriction.status.naId": "10031403" } } ] } } } } ]}}}A sample search request body to Elasticsearch engine using one of the top-level fields such as data control group is as below:{ "stored_fields": [ "naId"], "query": { "match": {"dataControlGroup": "LPBHO"} } }This request returns the following sample response from Elasticsearch:{ "took": 0, "timed_out": false, "_shards": { "total": 3, "successful": 3, "failed": 0 }, "hits": { "total": 10025, "max_score": 0.0013279235, "hits": [ { "_index": "descriptions_approved", "_type": "description", "_id": "2632805", "_score": 0.0013279235, "fields": { "naId": [ 2632805 ] } }, { "_index": "descriptions_approved", "_type": "description", "_id": "2632804", "_score": 0.0013279235, "fields": { "naId": [ 2632804 ] } }……] }}Data MigrationWe are using Amazon Elasticsearch service for the physical implementation of the NAC/DAS combined search engine. Because it is a managed service, Amazon Elasticsearch Service automatically detects and replaces failed Elasticsearch nodes, reducing the overhead associated with self-managed infrastructure that we have today with Apache Solr Cloud in NAC. Amazon Elasticsearch Service will also allow us to easily scale our cluster via a single API call or a few clicks in the console, unlike having to provision new EC2 instances for new shards and replicas. Furthermore, Amazon Elasticsearch Service has tight integration with Amazon Identity and Access Management (IAM) for security, Amazon CloudWatch for monitoring, and Amazon CloudTrail for auditing, as shown in Figure 7.Figure 7: AWS Elasticsearch Resource Domain ArchitectureExtract, Transform, and Load DesignAll record groups and their series children were exported with objects from DAS UAT, and used Python scripts to construct the JSON request body for Elasticsearch PUT request. Thus, fields such as NAID, Title, parentRecordGroup, accessRestriction, and Description Type or Level are extracted from the XML export from DAS and added to the JSON request body for Elasticsearch. Description field is populated with the subtree of the parent level element (e.g., Series, Record Group etc). To populate NAC_annotations index, all related tags and transcriptions were exported from the NAC Annotations_DB in comma separated variable (CSV) format and used standard Python JSON and CSV libraries to create the request body for Elasticsearch PUT request. See sample Extract, Transform, and Load (ETL) script in Appendix A.3.Search Access Layer DesignFor the Task Order 5 reference application, we have simplified search access layer design by accepting search request body in a format that Elasticsearch engine will understand. If the request body is malformed, an error code will be returned from AWS Elasticsearch service. The API will simply pass the error code returned from the search engine back to the consumer. If, on the other hand, the search query is valid and there is one or more Description documents that satisfy the search criteria, the corresponding NAIDs will be returned to the search controller, which will then use the Get By Key method in the Description controller to retrieve the full Description record from Amazon Aurora and return them as a collection to the API consumer. Thus a sample search query coming to search access layer for top-level field looks like as follows:"query": { "match": {"dataControlGroup": "LPBHO"} }While a sample search query for a nested field looks like as follows:"query": { "bool": { "must": [ { "nested": { "path": "description", "query": { "bool": { "must": [ { "match": { "description.endCongressNaId": 12015862 } } ] } } } } ]}}While a sample search query for a wildcard search looks like as follows: "query": { "match_all": {} }Thus, by the time, a search query arrives at the search access layer, it is already in a format that Elasticsearch engine will understand. Hence, the search access layer simply passes on the query to the search engine and the search engine returns with the matching NAIDs that in turn, are returned to the Search Controller. The Search Controller then passes the same NAIDs to the Data Access Layer to retrieve the corresponding description XML. A sample response that is eventually returned from the Search Controller to the API consumer is shown below:{ "metaData": { "inserted": true, "user": "urmi", "updated": false, "saveSearch": true }, "esResponse": { "_shards": { "total": 3, "successful": 3 }, "hits": { "hits": [ { "_index": "descriptions_approved", "_type": "description", "_id": "2725531", "_score": 8.562499E-5, "fields": {"naId": [2725531]} },…. { "_index": "descriptions_approved", "_type": "description", "_id": "2725559", "_score": 8.562499E-5, "fields": {"naId": [2725559]} } ], "total": 576453, "max_score": 8.562499E-5 }, "took": 8, "timed_out": false }, "results": [ { "broadcastDate": {}, "messageIndex": 773, "dataCtlGp": { "groupName": "Barack Obama Library", "groupId": "ou=LPBHO,ou=groups", "groupCd": "LPBHO" }, "dateNote": "Chinatown is a 1974 American neo-noir mystery film", "changeHistory": "[]", "broughtUnderEditHistory": "[]", "jobId": 181104, "underEdit": false, "recordGpNo": 10101, "descType": "RecordGroup", "title": "title:773", "chunkingId": "1510687278334-773-1d3d9cbe-27d7-490f-a7de-5dffa84f5cb7", "coverageEndDate": {}, "productionDate": {}, "descriptionAuthor": "[]", "inclusiveStartDate": {}, "releaseDate": "{}", "inclusiveEndDate": {}, "createdDate": 1510687278617, "recordStatus": false, "sqsMessageId": "8ecc47fe-aefe-405f-aff1-56b51005b6d1", "descTitle": "title:773", "ptrObjAvailabilityDate": "{}", "copyrightDate": "{}", "descNaId": 2725531 },…. { "broadcastDate": {}, "messageIndex": 553, "dataCtlGp": { "groupName": "Barack Obama Library", "groupId": "ou=LPBHO,ou=groups", "groupCd": "LPBHO" }, "dateNote": "Chinatown is a 1974 American neo-noir mystery film", "changeHistory": "[]", "broughtUnderEditHistory": "[]", "jobId": 181104, "underEdit": false, "recordGpNo": 10101, "descType": "RecordGroup", "title": "title:553", "chunkingId": "1510687275494-553-05b27387-3ff9-4b2f-b9a9-a856acf9cf58", "coverageEndDate": {}, "productionDate": {}, "descriptionAuthor": "[]", "inclusiveStartDate": {}, "releaseDate": "{}", "inclusiveEndDate": {}, "createdDate": 1510687278821, "recordStatus": false, "sqsMessageId": "3834be58-bb65-4007-8dfa-cd9e8dab2c8a", "descTitle": "title:553", "ptrObjAvailabilityDate": "{}", "copyrightDate": "{}", "descNaId": 2725559 } ]}Database DesignOverviewThe basic utilization of the DAS system pertains to querying and obtaining information to create new information. These capabilities depend on the organization of the data in the database. The current DAS database design ensures the following:Flexibility in the design to adapt to the needs of different usersControlled and standardized approach to input data and update dataSystem of validation checks to maintain data integrity and consistencyData related to both Description and Authorities is stored in the Oracle 12c XML database. Tables in the DAS database are grouped together in separate schemas depending on their functionality, such as bulk ingestion, workflow, auditing and so on. The primary schema is called das_desc schema. This schema holds all the Under-edit and Approved Authorities and Descriptions, along with the other stored procedures and indexes required to support the Description and Authority Web Services. It uses Oracle XML Database (XMLDB) and persists Description and Authority data in native XML format. Description and Authority data are stored across multiple XML Type tables with selected attributes like NAID, Description Type (Authority Type for Authorities), Title (Term Name for Authorities), Created Date, Created User Name and so on extracted as separate columns during the pre-processing stage to facilitate faster processing while serving end-user queries. Oracle XML DB has all the advantages of relational database technology plus the advantages of XML. It provides high-performance, native XML storage and retrieval technology. It has standard access methods for navigating and querying XML. Oracle 12c XML Database offers a complete and efficient content management solution that is a viable platform for meeting DAS storage and search requirements up to a certain data volume. Traditional relational databases seek to achieve a design pattern called “normalization.” This pattern breaks up database content and arranges it within the database structure in such a way that the same content is never repeated. On the other hand, XML databases store data in XML as single entities within the database. With XML databases, the concern is not so much eliminating repetition of content, but rather the emphasis is on keeping the content together. This method of storage allows for the easy implementation of some search scenarios, like proximity, and full text searches. Primary tables with such a hybrid model are listed below and their relationship is shown in Figure 8:Description: Contains metadata for approved Descriptions and partitioned based on NAID range for efficient record retrieval.?Desc_Assc: Contains Authority and parent references for approved Descriptions and partitioned based on NAID range for efficient record retrieval.Authority: Contains metadata for Approved Authorities; partitioned based on NAID range for efficient record retrieval.Figure 8: As-Is Hybrid Data ModelWith a flexible schema, it is easy to add new fields or change the structure of the XMLType column in the Description table. For instance, when Begin Congress and End Congress were promoted to the status of Authority from being text fields in the XMLType column in the Description table, we only had to update existing records in the Description table and new records in the Authority and Desc_Assc table, without having to update any table definitions. In the fully-normalized model, however, if an existing field is promoted to an Authority depending on where in the Description XML hierarchy it resides, it will certainly affect more than three tables and it would affect the table definition of one or more participating tables. For instance, if a column in Description table turn into an Authority list value, that column would now need to be dropped from Description table, thus resulting in a table definition change.However, as indicated in Section 1, with the sharp increase in the size of the database, the performance degradation, especially for full text searches and the regular maintenance overhead required to keep the read and write response times at a user-agreeable level, far surpasses the benefits of having an XML database with a flexible schema definition and its ability to perform context based searches. Hence, as part of the modernization efforts, the goal is to move to a fully relational data model with a fixed schema. Since DAS went into production with Oracle XML database in May 2014, schema changes to core Description and Authority tables have been few and far between. On the other hand, repartitioning required frequent table index changes. This was required to maintain the search ability as data grew. Conceptual ModelThe anticipated new DAS data model will be comprised of three top-level tables: Description, Primary_Authority and Authority_List.Authority ModelAuthority_List table will contain all the information pertaining to Authority List values, such as Color, Base, Access Restriction, Finding Aid URL and so on. Primary_Authority, on the other hand, will contain all the information pertaining to six primary types of Authority: Person, Organization, Geographic Place Name, Topical Subject, Program Area and Specific Records Type. Because Primary_Authority will contain all the information that relates to the 6 primary Authorities, not all columns in this table will be applicable to all. For instance, it will contain name_heading column that only applies to Person Authorities and will be null when trying to save an Organization in the same table. It will also have admin_history_note column that will be populated for Organization Authority but not for Person Authority. Figure 9 demonstrates the relationship between Primary_Authority and Authority_List tables. Each of the above-mentioned Primary Authority references a Record Source and a Reference Unit Authorities. However, no crosswalk tables are needed for Authority List types Reference Unit or Record Source, because any instance of a Primary Authority can have only one reference of record source or reference unit and can directly reference Authority_List table for those records. Similarly, each of these Authorities can be linked to one or more special project Authorities. Consequently, we need to use a crosswalk table for special project linking primary Authorities with Authority List of type special project. The conceptual model below also has a number of crosswalk tables to describe the relationship among primary Authorities. For instance, it uses:The See Also table to link a Person Authority to a collection of Person AuthoritiesThe Jurisdiction table to link an Organization to a collection of Geographic Place NameThe Organization Name Reference table to link an Organization to its Organization Name childrenThe Organizational Reference table to link a Person, Geographic Place Name, Program Area, Topical Subject or Specific Records Type to a collection of Organization AuthoritiesThe Predecessor Successor table to establish links between a specific organization name and a collection of organization names that will either serve as a predecessor or a successorThe Personal Reference table to link an Organization to a collection of Person AuthoritiesThe Program Area Reference table to link an Organization to a collection of Program Area AuthoritiesThe Broader Narrower Related Term table to link a Geographical Place Name, Program Area, Specific Records Type or a Topical Subject to a collection of Primary Authorities of the same type that will either serve as a Broader Term, Narrower Term or a Related Term. The Use For table to link preferred Geographic Place Name, Program Area, Specific Records Type or a Topical Subject Authority with a collection of its non-preferred variant. To keep the conceptual model (Figure 9) relatively uncluttered, only references, be it with the authority_list table or with itself, have been included in the primary_authority table. For instance, Variant Person Name NAID column, only populated for Person Authorities, refers to a self-join (points to another non-preferred person record in the same table). The Primary_Authority table has more than one date qualifier references to Authority List table (birthdate_qualifier_naid, deathdate_qualifier_naid, establishdate_qualifier_naid, abolishdate_qualifier_naid and so on). Reference unit NAID and record source NAID are columns in the Primary_Authority table that point to the auth_list_naid in Authority_List table. For further details regarding the design, please see the physical data model and table definition for primary_authority and authority_list in Appendix A.2.In general, it is important to note that no part of the logical data model is changing. All the hierarchical information for both Descriptions and Authorities are preserved. A Description or an Authority export using the new data model will look identical to the existing export from DAS. Since one table represents all Primary Authorities, all their combined dependencies have to exist in the data tier. The Application tier would be responsible for managing all the business rules. For instance, when saving an organization, records will be inserted only into Primary Authority, Organization Name Reference, Predecessor Successor, Personal Reference, Jurisdiction and Program Area Reference. On the other hand, when saving a Person Authority, records will be inserted into Primary Authority, See Also, Organizational Reference and Special Project. Figure 9: DAS To-Be Fully Normalized Data Model for AuthorityDescription ModelThe Description table sits at the center of the DAS Description data model. The Description table will contain all the information pertaining to all seven types of Descriptions: Collection, Record Group, Series, File Unit, Item, Item AV and Preliminary Description. Figure 10 demonstrates the relationship among the Description, Primary_Authority and Authority_List tables using several crosswalk tables. These tables contain not only the Description and the Authority references, but also some Description-specific metadata. For instance, Access Restriction crosswalk table stores Access Restriction note information, in addition to storing Description and Authority NAIDs. The Description table is the central table in this model and it not only contains attributes that are common across all seven types of Descriptions such as NAID, Title, Data Control Group, Staff-only note, Record History but also Description type specific attributes such as Record Group Number (applicable to only Record Group Descriptions), Production Date (applicable to only Item and Item AV Descriptions), Collection Identifier (applicable to only Collection Descriptions). Additionally, it stores references to Authority List types such as Begin Congress, End Congress as well as all the Date Qualifiers associated with Inclusive and Coverage Dates. With respect to the crosswalk tables shown in Figure 10, not all are applicable to all types of Description. For instance, Finding Aid applies to all types of Description and it stores all Finding Aid related information that a Description may contain. Note that Finding Aid table has three Authority List references viz. Finding Aid Type, Finding aid URL and Object Type. Series Former Parent table, on the other hand, is only relevant for Series Description. While it is not necessary to track Former Parents for any Description below the Series level, there is a business requirement to track Former Parents for Series Descriptions and, hence, Series Former Parent is used to manage the references to Former Collection and Record Group Parents. Similarly, Creating Organization and Individual Table is relevant only for Series Descriptions. On the contrary, Variant Control Number, Digital Object, Access Restriction, Use Restriction, General Records Type, Organizational and Personal Contributor, Special Project, Microform Publication, Physical Occurrence, Primary Authority Reference and Donor and Language cross walk tables are relevant only for Series, File Unit, Item and Item AV Descriptions. Some of these tables such as Primary Authority Reference and Donor as well as Organizational and Personal Contributor contain multiple types of references and have a discriminator column to indicate the type of reference. Physical Occurrence has its own set of child tables such as Media Occurrence, Holdings Measurement, Reference Unit and Location. Media Occurrence, in turn, uses crosswalk tables such as General Media Type and Other Preservation Characteristics to represent a collection of General Media Type Authority List references or Other Preservation Characteristic references. A Description may have in an instance of Media Occurrence.Note: In Figure 10, even though Authority List appears thrice to minimize crisscrossing one-to-one, many-to-one, one-to-many relationship arrows, there is only one instance of Authority List table being referenced in the Description data model.Figure 10: DAS To-Be Fully Normalized Data Model for DescriptionPhysical ModelAuthority ModelSee Table 2 for a list of the columns that will be included in the PostgreSQL implementation of the Authority_List table.Table 2: authority_list Table ColumnsColumn NameColumn Typeauth_list_naid bigintterm_name textauth_type textauth_detailsjsonbcreated_date timestampbrought_under_edit_history jsonchanged_historyjsonapproval_history jsonlast_brought_under_edit timestamp last_approved_date timestamp last_changed_date timestamp imported_date timestamp See Table 3 for a list of the columns that will be included in the PostgreSQL implementation of the Primary_Authority table.Table 3: primary_authority Table ColumnsColumn NameColumn Typeprim_auth_naidbigintname_headingtextfull_nametextNumeratortextpersonal_titletextbirth_date_qualifier_naidbigintdeath_date_qualifier_naidbigintbirth_datejsonbbiographical_notetextsource_notetext []proposer_nametextproposal_datetimestampreference_unit_naidbigintrecord_source_naidbigintnaco_submittedBooleanimport_rec_ctl_notextNotestextscope_notetextsaco_submittedBooleanlat_longrealadmin_hist_notetextestablish_datejsonbabolish_datejsonbestablish_date_qualifier_naidbigintabolish_date_qualifier_naidbigintdeath_date jsonjsonbterm_nametextauth_typetextauth_detailsjsonbcreated_datetimestampImported_datetimestampbrought_under_edit_historyjsonchanged_historyjsonapproval_historyjsonlast_brought_under_edittimestamplast_approved_datetimestamplast_changed_datetimestampAs explained in the conceptual model section, a Primary Authority can be linked to one or more Primary Authorities, and its corresponding physical model is represented in Figure 11. Figure 11: Primary Authority Physical Data ModelDescription ModelThe physical model for Description is a bit more involved than that for Authority. The primary table is called Description and the child tables with an identifying relationship with Description are listed in Table 4.Table 4: Child TablesChild Tablesaccess_restrictionorg_personal_contributorcreating_org_indvother_preservation_characteristicDigital_objectpers_org_topsub_geo_sppr_reffinding_aidphysical_occurencegeneral_media_typereference_unitgeneral_record_typeseries_former_parentholding_measurmentspecial_pjt_for_desclanguagespecific_access_restrictionlocationspecific_use_restrictionmedia_occurenceuse_restrictionmicroform_pub variant_control_numberAs the full physical model is too large to include as a graphic in this document, Figure 12 describes only the relationship between the Description table and Physical Occurrence related child tables.Figure 12: Relationship among Description and Physical Occurrence TablesA full physical model for the Description table can be found as part of the Data Dictionary in Appendix A.2. Note that not all tables are used for all types of Descriptions. For instance, a Record Group cannot have any digital object. Hence, a set of insert SQL statements for Record Group persistence will not include any inserts to the Digital_Object table. A sample select query for a Record Group Description using this physical model is described below:Select d.desc_naid, desc_type, data_ctl_gp, record_gp_no, other_title, date_note, scope_content_note, title, d.is_under_edit,begin_congress_naid, bcAL.term_name,end_congress_naid, ecAL.term_name,coverage_start_date, coverage_start_date_qualifier_naid, covSDAL.term_name,coverage_end_date, coverage_end_date_qualifier_naid, covEDAL.term_name,FA.finding_aid_Note, FA.finding_aid_source, FA.finding_aid_type_naid, fatAL.term_name, FA.finding_aid_url_naid, fauAL.term_name, FA.object_type_naid, faotAL.term_name,inclusive_start_date, inclusive_start_date_qualifier_naid, inclSDAL.term_name,inclusive_end_date, inclusive_end_date_qualifier_naid, inclEDAL.term_name,SP_desc.auth_list_naid, spAL.term_namefrom description dLEFT OUTER JOIN authoritylist bcAL ON (d.begin_congress_naid = bcAL.auth_list_naid)LEFT OUTER JOIN authoritylist ecAL ON (d.end_congress_naid = ecAL.auth_list_naid)LEFT OUTER JOIN authoritylist covSDAL ON (d.coverage_start_date_qualifier_naid = covSDAL.auth_list_naid)LEFT OUTER JOIN authoritylist covEDAL ON (d.coverage_end_date_qualifier_naid = covEDAL.auth_list_naid)LEFT OUTER JOIN finding_Aid FA ON (d.desc_naid = FA.desc_naid)LEFT OUTER JOIN authoritylist fatAL ON (FA.finding_aid_type_naid = fatAL.auth_list_naid)LEFT OUTER JOIN authoritylist fauAL ON (FA.finding_aid_url_naid = fauAL.auth_list_naid)LEFT OUTER JOIN authoritylist faotAL ON (FA.object_type_naid = faotAL.auth_list_naid)LEFT OUTER JOIN authoritylist inclSDAL ON (d.inclusive_start_date_qualifier_naid = inclSDAL.auth_list_naid)LEFT OUTER JOIN authoritylist inclEDAL ON (d.inclusive_end_date_qualifier_naid = inclEDAL.auth_list_naid)LEFT OUTER JOIN special_pjt_for_desc SP_desc ON (d.desc_naid = SP_desc.desc_naid)LEFT OUTER JOIN authoritylist spAL ON (SP_desc.auth_list_naid = spAL.auth_list_naid)A sample select query for a Series Description using this physical model is described below:Select d.desc_naid, d.desc_type, d.data_ctl_gp, d.other_title, d.date_note, d.scope_content_note, d.title, d.is_under_edit, d.staff_only_note, d.record_status, d.ptr_obj_availability_date,d.is_av, d.total_footage, d.description_author, d.arrangement, d.scale_note,d.custodial_history_note, d.xfer_note, d.numbering_note, d.local_identifier, d.general_note, d.function_user, --description specificd.parent_naid, pD.title, --description parentsfp.former_parent_naid, sfpD.title former_parent_title, sfpD.desc_type former_parent_desc_type, --former parent detailsd.party_designation_naid, pdAL.term_name, --party designationd.begin_congress_naid, bcAL.term_name, --begin congressd.end_congress_naid, ecAL.term_name, --end congressd.sound_type_naid, stAL.term_name sound_type_value, -- sound typed.edit_status_naid, esAL.term_name edit_status_value, -- edit statusd.online_resource_naid, orAL.term_name online_resource_value, -- online resourced.accession_number_naid, anAL.term_name accession_number, -- accession_numberd.record_ctr_xfer_number_naid, rctnAL.term_name record_center_trasnfer_number, -- record center transfer numberd.disposition_auth_number_naid, danAL.term_name disposition_authority_number, -- disposition authority numberd.internal_xfer_number_naid, itnAL.term_name internal_transfer_number, -- internal transfer numberd.coverage_start_date, d.coverage_start_date_qualifier_naid, covSDAL.term_name, --coverage start dated.coverage_end_date, d.coverage_end_date_qualifier_naid, covEDAL.term_name, --coverage end dateFA.finding_aid_Note, FA.finding_aid_source, FA.finding_aid_type_naid, fatAL.term_name, FA.finding_aid_url_naid, fauAL.term_name, FA.object_type_naid, faotAL.term_name, --finding aidVCN.auth_list_naid variant_control_number_type_naid, vcnAL.term_name variant_control_number_type, vcn.variant_ctl_no, vcn.variant_ctl_no_note, --variant control numberd.inclusive_start_date, d.inclusive_start_date_qualifier_naid, inclSDAL.term_name, --inclusive start dated.inclusive_end_date, d.inclusive_end_date_qualifier_naid, inclEDAL.term_name, --inclusive end datemicroPub.auth_list_naid, microPubAL.term_name microform_publication_title, microPub.microform_pub_note, --microform publicationSP_desc.auth_list_naid, spAL.term_name, --special projectdo1.*, otAL.term_name digital_object_type_value, --digital objectlAL.auth_list_naid language_naid, lAL.term_name language_value, --languagegrtAL.auth_list_naid general_record_type_naid, grtAL.term_name general_record_type_value, --general records typeurAL.auth_list_naid use_restriction_naid, urAL.term_name use_restriction_value, ur.note, surAL.auth_list_naid specific_use_restriction_naid, surAL.term_name specific_use_restriction_value, --use restrictionarAL.auth_list_naid access_restriction_naid, arAL.term_name access_restriction_value, ar.access_restriction_note, sarAL.auth_list_naid specific_access_restriction_naid, sarAL.term_name specific_access_restriction_value, scAL.auth_list_naid security_classification_naid, scAL.term_name security_classification_value, --access restrictioncontributor.personal_or_organizational_contributor, contributor.auth_list_naid contributor_type_naid, contributorTypeAL.term_name contributor_type, contributor.prim_auth_naid contributor_naid, contributorPA.term_name contributor_value, --contributor type and person/org,creatingOrgInd.creating_ind_creating_org, creatingOrgInd.auth_list_naid creator_type_naid, creatingOrgIndAL.term_name creator_type_value, creatingOrgInd.prim_auth_naid creating_org_ind_naid, creatingOrgIndPA.term_name creating_org_ind_name, --creating org and individual,paReference.auth_reference_type_donor, paReference.prim_auth_naid, paReferencePA.term_name, --primary authority reference and donorpo.auth_list_naid copystatus_naid, csAL.term_name copystatus_value, po.extent, po.physical_occurance_note, po.gpra_indicators, po.container_list,--physical occurrence attributesmo.*, smtAL.term_name specific_media_type_value, dimAL.term_name dimension_value, colorAL.term_name color_value, processAL.term_name process_value, --media occurrence attributesgmtAL.auth_list_naid general_media_type_naid, gmtAL.term_name general_media_type_value, --general media typehmAL.auth_list_naid holding_measurement_type_naid, hmAL.term_name holding_measurement_type_value, --holding measurement typerefUnitAL.auth_list_naid reference_unit_naid, refUnitAL.term_name reference_unit_value, --reference unitlocationAL.auth_list_naid location_naid, locationAL.term_name location_value, lo.note --locationfrom description dLEFT OUTER JOIN description pD ON (d.parent_naid = pD.desc_naid) --parentLEFT OUTER JOIN series_former_parent sfp ON (sfp.desc_naid = d.desc_naid) --former parentLEFT OUTER JOIN description sfpD ON (sfp.former_parent_naid = sfpD.desc_naid) --former parentLEFT OUTER JOIN authority_list pdAL ON (d.party_designation_naid = pdAL.auth_list_naid) --party designationLEFT OUTER JOIN authority_list bcAL ON (d.begin_congress_naid = bcAL.auth_list_naid) --begin congressLEFT OUTER JOIN authority_list ecAL ON (d.end_congress_naid = ecAL.auth_list_naid) --end congressLEFT OUTER JOIN authority_list stAL ON (d.sound_type_naid = stAL.auth_list_naid) --soiund typeLEFT OUTER JOIN authority_list esAL ON (d.edit_status_naid = esAL.auth_list_naid) --edit statusLEFT OUTER JOIN authority_list orAL ON (d.online_resource_naid = orAL.auth_list_naid) --online resourceLEFT OUTER JOIN authority_list anAL ON (d.accession_number_naid = anAL.auth_list_naid) --accession_numberLEFT OUTER JOIN authority_list rctnAL ON (d.record_ctr_xfer_number_naid = anAL.auth_list_naid) --record_center_xfer_numberLEFT OUTER JOIN authority_list itnAL ON (d.internal_xfer_number_naid = itnAL.auth_list_naid) --internal_xfer_numberLEFT OUTER JOIN authority_list danAL ON (d.disposition_auth_number_naid = danAL.auth_list_naid) --disposition_auth_numberLEFT OUTER JOIN authority_list covSDAL ON (d.coverage_start_date_qualifier_naid = covSDAL.auth_list_naid) --date qualifier for coverage start dateLEFT OUTER JOIN authority_list covEDAL ON (d.coverage_end_date_qualifier_naid = covEDAL.auth_list_naid) --date qualifier for coverage end dateLEFT OUTER JOIN finding_Aid FA ON (d.desc_naid = FA.desc_naid) --finding aid collection for descriptionLEFT OUTER JOIN authority_list fatAL ON (FA.finding_aid_type_naid = fatAL.auth_list_naid) --finding aid typeLEFT OUTER JOIN authority_list fauAL ON (FA.finding_aid_url_naid = fauAL.auth_list_naid) --finding aid URLLEFT OUTER JOIN authority_list faotAL ON (FA.object_type_naid = faotAL.auth_list_naid) --finding aid file typeLEFT OUTER JOIN variant_control_number vcn ON (d.desc_naid = vcn.desc_naid) --variant control number collection for descriptionLEFT OUTER JOIN authority_list vcnAL ON (vcn.auth_list_naid = vcnAL.auth_list_naid) --variant control numberLEFT OUTER JOIN authority_list inclSDAL ON (d.inclusive_start_date_qualifier_naid = inclSDAL.auth_list_naid) --date qualifier for inclusive start dateLEFT OUTER JOIN authority_list inclEDAL ON (d.inclusive_end_date_qualifier_naid = inclEDAL.auth_list_naid) --date qualifier for inclusive end dateLEFT OUTER JOIN microform_pub microPub ON (d.desc_naid = microPub.desc_naid) --microform publication collection for descriptionLEFT OUTER JOIN authority_list microPubAL ON (microPub.auth_list_naid = microPubAL.auth_list_naid) --microform publication titleLEFT OUTER JOIN special_pjt_for_desc SP_desc ON (d.desc_naid = SP_desc.desc_naid) --special project collection for descriptionLEFT OUTER JOIN authority_list spAL ON (SP_desc.auth_list_naid = spAL.auth_list_naid) --special projectLEFT OUTER JOIN digital_object do1 ON (d.desc_naid = do1.desc_naid) --digital object collection for descriptionLEFT OUTER JOIN authority_list otAL ON (do1.auth_list_naid = otAL.auth_list_naid) -- object typeLEFT OUTER JOIN language lang ON (d.desc_naid = lang.desc_naid) --language collection for descriptionLEFT OUTER JOIN authority_list lAL ON (lang.language_naid = lAL.auth_list_naid) --languageLEFT OUTER JOIN general_record_type grt ON (d.desc_naid = grt.desc_naid) --general record type collection for descriptionLEFT OUTER JOIN authority_list grtAL ON (grt.auth_list_naid = grtAL.auth_list_naid) --general record type LEFT OUTER JOIN use_restriction ur ON (d.desc_naid = ur.desc_naid) --use restriction collection for descriptionLEFT OUTER JOIN authority_list urAL ON (ur.auth_list_naid = urAL.auth_list_naid) --use restrictionLEFT OUTER JOIN specific_use_restriction sur ON (ur.use_restrictionid = sur.user_restrictionid) --specific use restriction collection for descriptionLEFT OUTER JOIN authority_list surAL ON (sur.auth_list_naid = surAL.auth_list_naid) --specific use restrictionLEFT OUTER JOIN access_restriction ar ON (d.desc_naid = ar.desc_naid) --access restriction collection for descriptionLEFT OUTER JOIN authority_list arAL ON (ar.auth_list_naid = arAL.auth_list_naid) --access restrictionLEFT OUTER JOIN specific_access_restriction sar ON (ar.access_restrictionid = sar.access_restrictionid) --specific access restriction collection for descriptionLEFT OUTER JOIN authority_list sarAL ON (sar.specific_access_restriction_naid = sarAL.auth_list_naid) --specific access restrictionLEFT OUTER JOIN authority_list scAL ON (sar.security_classification_naid = scAL.auth_list_naid) --security classificationLEFT OUTER JOIN org_personal_contributor contributor ON (d.desc_naid = contributor.desc_naid) --contributor collection for descriptionLEFT OUTER JOIN authority_list contributorTypeAL ON (contributor.auth_list_naid = contributorTypeAL.auth_list_naid) --contributor typeLEFT OUTER JOIN primary_authority contributorPA ON (contributor.prim_auth_naid = contributorPA.prim_auth_naid) --contributor (org or person naid)LEFT OUTER JOIN creating_org_indv creatingOrgInd ON (d.desc_naid = creatingOrgInd.desc_naid) --creating org or individual collection for descriptionLEFT OUTER JOIN authority_list creatingOrgIndAL ON (creatingOrgInd.auth_list_naid = creatingOrgIndAL.auth_list_naid) --creating org or individual typeLEFT OUTER JOIN primary_authority creatingOrgIndPA ON (creatingOrgInd.prim_auth_naid = creatingOrgIndPA.prim_auth_naid) --creating org or individual (org or person naid)LEFT OUTER JOIN pers_org_topsub_geo_sppr_ref paReference ON (d.desc_naid = paReference.desc_naid) --primary authority reference collection for descriptionLEFT OUTER JOIN primary_authority paReferencePA ON (paReference.prim_auth_naid = paReferencePA.prim_auth_naid) --org, top sub, geo place name or person naidLEFT OUTER JOIN physical_occurence po ON (d.desc_naid = po.desc_naid) --physical occurrence collection for descriptionLEFT OUTER JOIN authority_list csAL ON (po.auth_list_naid = csAL.auth_list_naid) --copy statusLEFT OUTER JOIN media_occurrence mo ON (po.physical_occurenceid = mo.physical_occurenceid) --media occurrence collection for descriptionLEFT OUTER JOIN authority_list smtAL ON (mo.specific_media_type_naid = smtAL.auth_list_naid) --specific media typeLEFT OUTER JOIN authority_list dimAL ON (mo.dimension_naid = dimAL.auth_list_naid) --dimensionLEFT OUTER JOIN authority_list colorAL ON (mo.color_naid = smtAL.auth_list_naid) --colorLEFT OUTER JOIN authority_list processAL ON (mo.process_naid = smtAL.auth_list_naid) --processLEFT OUTER JOIN general_media_type gmt ON (mo.media_occurrenceid = gmt.media_occurrenceid) --general media type collection for descriptionLEFT OUTER JOIN authority_list gmtAL ON (gmt.auth_list_naid = gmtAL.auth_list_naid) --general media typeLEFT OUTER JOIN holding_measurment hm ON (po.physical_occurenceid = hm.physical_occurenceid) --holding measurement collection for descriptionLEFT OUTER JOIN authority_list hmAL ON (hm.auth_list_naid = hmAL.auth_list_naid) --holding measurement typeLEFT OUTER JOIN reference_unit ru ON (po.physical_occurenceid = ru.physical_occurrenceid) --reference unit collection for descriptionLEFT OUTER JOIN authority_list refUnitAL ON (ru.auth_list_naid = refUnitAL.auth_list_naid) --reference unitLEFT OUTER JOIN location lo ON (po.physical_occurenceid = lo.physical_occurrenceid) --location collection for descriptionLEFT OUTER JOIN authority_list locationAL ON (lo.auth_list_naid = locationAL.auth_list_naid) --locationData Migration from Oracle on EC2 to Amazon PostgreSQL Aurora RDSWe envision using PostgreSQL-compatible Amazon Aurora database service as the physical data store for DAS. Like Amazon Elasticsearch service, Amazon RDS is a fully managed database service in the Amazon Cloud that is easy to setup, operate and scale. It automates common database administration tasks such as hardware provisioning, database setup, patching and backups. Amazon RDS is available on several database instance types – optimized for memory, performance or input/output (I/O) as well as six distinct database engines. This modernization effort will use PostgreSQL compatible Amazon Aurora DB engine, which combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of the open source PostgreSQL database. It is built on a fully distributed and self-healing storage system and provides enterprise-level capabilities, including database monitoring, database cloning, cross-region copying and replication, and integration with common AWS services such as Identity and Access Management, CloudWatch and so on. In other words, an Aurora instance corresponds to a DB cluster that consists of one or more instances, and a cluster volume that manages the data for those instances.?An Aurora?“cluster volume”?is a virtual database storage volume that spans multiple availability zones, with each availability zone having a copy of the cluster data as shown in Figure 13.Figure 13: Amazon RDS Aurora DB ClusterETL DesignAuthority Lists and Primary Authorities must be migrated prior to Descriptions via reference application. The goal for this task order is to only migrate Record Groups and Series Descriptions from Oracle to Aurora instance for projecting storage usage in the target database.Authority List MigrationFor creating Record Groups, the following Authorities need to be migrated:Begin CongressEnd CongressDate QualifierFinding Aid TypeObject TypeFinding Aid URLSpecial ProjectFor creating Series, the following additional Authority types need to be migrated:Internal Transfer NumberHoldings Measurement TypeGeneral Records TypeParty DesignationLocation FacilityMicroform Publication TitleSound TypeDimensionGeneral Media TypeAccess Restriction StatusVariant Control Number TypeSpecific Access Restriction TypeSpecific Media TypeLanguageCopy StatusReference UnitColourProcessUse Restriction StatusSpecific Use RestrictionDisposition Authority NumberRecords Centre Transfer NumberEdit StatusOnline ResourceAccession NumberThe above set of Authorities are Authority List values. Authority_List table in Aurora is a parent table with no dependency on either Primary Authority or Description tables and, hence, can be migrated independent of Primary_Authority and Description tables. Authority_List uses a PostgreSQL specific data type called jsonb for the auth_details column to store all the specific information about that Authority type in a key-value format. For instance, when the Authority type is Begin Congress, the auth_details column will have the following keys and sample values:{?"beginCongressValue":"114",???"startDate":{????????"day":6,??????"month":1,??????"year":2015,??????"logicalDate":"06-JAN-15"???},???"endDate":{????????"day":3,??????"month":1,??????"year":2017,??????"logicalDate":"03-JAN-17"???}}While when the Authority type is End Congress, the auth_details column will have the following keys and sample values:{?"endCongressValue":"114",???"startDate":{????????"day":6,??????"month":1,??????"year":2015,??????"logicalDate":"06-JAN-15"???},???"endDate":{????????"day":3,??????"month":1,??????"year":2017,??????"logicalDate":"03-JAN-17"???}}And when the Authority type is Accession Number, the auth_details column will have the following keys and sample values:{?"number":"114"}Oracle XQuery will be used to extract such fields from the xml_data column in the Authority table in Oracle and then use PL-JSON, an open source generic JSON object written in Procedural Language (PL)/SQL to populate the auth_details column in Authority_List table in Aurora.See sample ETL script for Date Qualifier in Appendix A.4.Record Group MigrationETL script like the ones for Authority List will be used to migrate Record Group Descriptions from Oracle to Aurora. The following columns will be populated in the Aurora Description table for Record Group Descriptions:desc_naidtitledesc_typerecord_gp_nois_under_editinclusive_start_date_qualifier_naidinclusive_end_date_qualifier_naid coverage_start_date_qualifier_naidcoverage_end_date_qualifier_naidbegin_congress_naidend_congress_naiddate_notescope_content_notestaff_note_onlycreated_dateimported_datelast_changed_datelast_approved_datelast_brought_under_edit_datecreated_userimported_userlast_changed_userlast_approved_userlast_brought_under_edit_userinclusive_start_dateinclusive_end_datecoverage_start_datecoverage_end_datedata_ctl_gpchange_historyapproval_historybrought_under_edit_historyIn addition, the following crosswalk tables will be populated to support collection metadata in Record Group Descriptions:Finding_aidSpecial_pjt_for_descSeries MigrationOnly the Series with Record Group parents and having digital objects will be migrated to Aurora. The following columns will be populated in the Aurora Description table for such Series Descriptions:desc_naidtitledesc_typeis_under_editinclusive_start_date_qualifier_naidinclusive_end_date_qualifier_naid coverage_start_date_qualifier_naidcoverage_end_date_qualifier_naidbegin_congress_naidend_congress_naidedit_status_naidsound_type_naidparty_designation_naidparent_naid is_avarrangement scale_notecustodial_history_note xfer_note numbering_note local_identifier function_user date_note scope_content_note staff_note_only created_dateimported_date last_changed_date last_approved_date last_brought_under_edit_date created_userimported_user last_changed_user last_approved_user last_brought_under_edit_userinclusive_start_date inclusive_end_date coverage_start_date coverage_end_date data_ctl_gp change_history approval_history brought_under_edit_historyIn addition, the following crosswalk tables will be populated to support collection metadata in Series Descriptions:finding_aidspecial_pjt_for_descvariant_control_numberlanguagedigital_objectuse_restrictionspecific_use_restrictiononline_resourcelifecycle_numberaccess_restrictionspecific_access_restrictionphysical_occurrenceholding_measurmentreference_unitlocationmedia_occurrenceDatabase Access Layer Design Spring provides a simplification in handling database access with the Spring JDBC template. The Spring JDBC Template has the following advantages compared with standard JDBC.The Spring JDBC template allows to clean-up the resources automatically, e.g., release the database connections.The Spring JDBC template converts the standard JDBC SQLExceptions into RuntimeExceptions. This allows the programmer to react more flexibly to the errors. The Spring JDBC template converts also the vendor specific error messages into more understandable error messages.The Spring JDBC template offers several ways to query the database. For instance, queryForList() returns a list of HashMaps where the hash key is the name of a table column and the value is the actual column dataSpring JDBC provides another very useful helper class called RowMapper which allows to translate the SQL result direct into a list of objects (RowMapper).For the reference application, we have defined entity classes that closely resemble our data model. For instance, we have a Description entity class that has a property for each column in Description table in the database. Similarly, we have entity classes for FindingAid and SpecialProjectForDescription corresponding to findingAid and specialProjForDesc tables in the database. We have defined a Data Access Object (DAO) for each entity class. The DAO pattern provides an abstraction layer between the business object and the data source. In other words, business objects access data sources via data access objects. DAS business objects use the interface exposed by the DAO for its clients. Because the interface exposed by the DAO to clients does not change when the underlying data source implementation changes, this pattern allows the DAO to adapt to different storage schemes without affecting its clients or business components. For now, each DAO in our reference application contains a create method and a retrieval by primary key method. Finally, we have implemented the mapRow method in RowMapper interface for each entity class. This interface allows us to map a row of the relations with the instance of user-defined class. It iterates JDBC ResultSet internally and adds it into the collection. Consequently, we need not write a lot of boiler-plate code to fetch the records individually.Appendix: DAS Data ModelDescription Ingest Database TableDescription ingest database table details are listed in Table 5.Table 5: Description Ingest Database TableTableColumn NameTypeCommentIngest_jobjob_idintegertotal_description_countintegerSet by Chunking ServicecompletedboolSet by JobProcessManager Servicesuccess_countintegerSet by JobProcessManager Servicefailure_countintegerSet by JobProcessManager ServicefailedboolSet by JobProcessManager to true when current time minus job_creation_time > job obsolete thresholdfailure_type_idintegerfk to failure_typeschunking_error_idtextThe GUID generated and logged when an error occurs in Chunkinglast_chunked_guidUIUDThe GUID of the last message chunked. This is set only if an XML error occurs while chunking messages.job_creation_timetimestampjob_created_usertextcurrent_action_type_idintegerCorresponds to action_type_id in the job_actions tablelast_update_datetimestampjob_action_typesaction_type_idintegerunique idaction_typetextOne of submitted, chunking in progress, chunking completed, processing completedjob_action_historyidbigserialjob_idintegeraction_type_idintegerForeign key to job_action_types action_timetimestampfailure_typesfailure_type_idintegerUnique IDfailure_typetextOne of s3 file not found, invalid XML file for job, invalid XML while chunking, unknown exception in chunkingValidation_failure_typesfailure_type_idintegerUnique IDfailure_typetextTBDMessage_status_typesMessage_status_type_idintegerUnique IDMessage_status_typetextTBDmessage_statussqs_message_idGUIDtitletextjob_idintegermessage_status_type_idintegerForeign key to message_status_types insert_successboolSet by API to true on successinsert_failedboolSet by API to true if insert failsvalidation_failure_idintegerForeign key to validation_failure_types. This is set by API.http_post_request_timetimestampThe time the POST request was sent to API. This is set by Processing Service.http_response_timetimestampThe time the API responded to the insertPOST request. This is set by the Processing Service.http_response_codeThe HTTP response code is received by Processing Service, which sets it here.http_timeoutboolSet to true by Processing Service if HTTP timeout occursDescription and Authority Data DictionaryClick on the following embedded image to see the fully-relational DAS data model: All supporting table definition for Authorities and Descriptions for the full physical model has been listed in the following table:Table 6: Description and Authority Data DictionaryTable NameColumn NameColumn TypePrecisionaccess_restrictionaccess_restrictionidbigserial19access_restrictiondesc_naidint819access_restrictionauth_list_naidint819access_restrictionaccess_restriction_notetext2147483647authoritylistauth_list_naidint819authoritylistterm_nametext2147483647authoritylistauth_detailsjsonb2147483647authoritylistauth_typetext2147483647authoritylistcreated_datetimestamp29authoritylistimported_datetimestamp29authoritylistlast_changed_datetimestamp29authoritylistlast_approved_datetimestamp29authoritylistlast_brought_under_edittimestamptz35authoritylistapproval_historyjson2147483647authoritylistchanged_historyjson2147483647authoritylistbrought_under_edit_historyjson2147483647authoritylistis_under_editbool1broader_narrower_related_termbroader_narrower_related_termidbigserial19broader_narrower_related_termprim_auth_naidint819broader_narrower_related_termbnr_term_naidint819broader_narrower_related_termterm_typetext2147483647creating_org_indvcreating_org_indvidbigserial19creating_org_indvdesc_naidint819creating_org_indvprim_auth_naidint819creating_org_indvauth_list_naidint819creating_org_indvcreating_ind_creating_orgbool1descriptiondesc_naidint819descriptiondesc_typetext2147483647descriptiondata_ctl_gpjsonb2147483647descriptionrecord_gp_notext2147483647descriptioncollection_identifiertext2147483647descriptionparent_naidint819descriptionother_titletext2147483647descriptioninclusive_start_date_qualifier_naidint819descriptioninclusive_end_date_qualifier_naidint819descriptiontitletext2147483647descriptioninclusive_start_datejsonb2147483647descriptioncoverage_start_date_qualifier_naidint819descriptioncoverage_end_date_qualifier_naidint819descriptioninclusive_end_datejsonb2147483647descriptioncoverage_start_datejsonb2147483647descriptioncoverage_end_datejsonb2147483647descriptiondate_notetext2147483647descriptionscope_content_notetext2147483647descriptionstaff_only_notetext2147483647descriptionrecord_statusbool1descriptionparty_designation_naidint819descriptiondescription_authorjsonb2147483647descriptionbegin_congress_naidint819descriptionend_congress_naidint819descriptionarrangementtext2147483647descriptionfunction_usertext2147483647descriptiongeneral_notetext2147483647descriptionlocal_identifiertext2147483647descriptionnumbering_notetext2147483647descriptionaccession_number_naidint819descriptionrecord_ctr_xfer_number_naidint819descriptiondisposition_auth_number_naidint819descriptioninternal_xfer_number_naidint819descriptionxfer_notetext2147483647descriptioncustodial_history_notetext2147483647descriptiononline_resource_naidint819descriptionscale_notetext2147483647descriptionedit_status_naidint819descriptionsound_type_naidint819descriptiontotal_footagefloat48descriptionis_avbool1descriptionptr_obj_availability_datejsonb2147483647descriptionproduction_datejsonb2147483647descriptioncopyright_datejsonb2147483647descriptionsubtitletext2147483647descriptionrelease_datejsonb2147483647descriptionbroadcast_datejsonb2147483647descriptionshortlisttext2147483647descriptionbroadcast_date_qualifier_naidint819descriptionrelease_date_qualifier_naidint819descriptioncopyright_date_qualifier_naidint819descriptionproduction_date_qualifier_naidint819descriptionptr_obj_availability_date_qualifier_naidint819descriptionis_under_editbool1descriptioncreated_date_timestamp29descriptionimported_datetimestamp29descriptionlast_changed_datetimestamp29descriptionlast_approved_datetimestamp29descriptionlast_brought_under_edit_datetimestamp29descriptionapproval_history_json2147483647descriptionchange_history_json2147483647descriptionbrought_under_edit_history_json2147483647descriptioncreated_usertext2147483647descriptionimported_usertext2147483647descriptionlast_changed_usertext2147483647descriptionlast_approved_usertext2147483647descriptionlast_brought_under_edit_usertext2147483647digital_objectdigital_objectidbigserial19digital_objectdesc_naidint819digital_objectauth_list_naidint819digital_objectobject_descriptiontext2147483647digital_objectobject_designatortext2147483647digital_objectlabel_flagtext2147483647digital_objectlocate_bytext2147483647digital_objectaccess_filenametext2147483647digital_objectaccess_file_sizetext2147483647digital_objectthumbnail_filenametext2147483647digital_objectthumbnail_file_sizenumeric10digital_objectprojectIDtext2147483647digital_objectimporteddate13digital_objectstatustext2147483647digital_objectdisplaytext2147483647digital_objectin_databasetext2147483647digital_objectoriginal_processtext2147483647digital_objectscanning_colortext2147483647digital_objectoriginal_widthnumeric10digital_objectscanning_dimensionsnumeric10digital_objectserver_nametext2147483647digital_objectversionnumeric10digital_objectscanning_processtext2147483647digital_objectscanning_mediumtext2147483647digital_objectscanning_medium_categorytext2147483647digital_objectoriginal_orientationtext2147483647digital_objectmasterfile_sizenumeric10digital_objectmaster_media_backuptext2147483647digital_objectmaster_media_primarytext2147483647digital_objectdigital_object_translationtext2147483647digital_objectmaster_derivation_file_mediatext2147483647digital_objectmaster_filenametext2147483647digital_objectdigital_object_transcripttext2147483647digital_objectoriginal_dimensionnumeric10digital_objectoriginal_heightnumeric10digital_objectoriginal_mediumtext2147483647digital_objectbatch_numbernumeric10digital_objectbatch_datedate13digital_objectoriginal_colortext2147483647finding_aidfinding_aididbigserial19finding_aiddesc_naidint819finding_aidfinding_aid_type_naidint819finding_aidfinding_aid_notetext2147483647finding_aidfinding_aid_sourcetext2147483647finding_aidfinding_aid_url_naidint819finding_aidobject_type_naidint819general_media_typegeneral_media_typeidbigserial19general_media_typeauth_list_naidint819general_media_typemedia_occurrenceidint819general_record_typegeneral_record_typeidbigserial19general_record_typedesc_naidint819general_record_typeauth_list_naidint819holding_measurmentholding_measurmentidbigserial19holding_measurmentphysical_occurenceidint819holding_measurmentauth_list_naidint819holding_measurmentholding_measurment_counttext2147483647jurisdictionjurisdictionidbigserial19jurisdictionprim_auth_naidint819jurisdictiongeo_place_name_naidint819languagelanguageidbigserial19languagedesc_naidint819languagelanguage_naidint819locationlocationidbigserial19locationauth_list_naidint819locationnotetext2147483647locationphysical_occurrenceidint819media_occurrencemedia_occurrenceidbigserial19media_occurrencephysical_occurenceidint819media_occurrencespecific_media_type_naidint819media_occurrencecontaineridtext2147483647media_occurrencemedia_occurence_notetext2147483647media_occurrencephysical_restriction_notetext2147483647media_occurrencetechnical_access_requirement_notetext2147483647media_occurrencepiece_countnumeric10media_occurrencereproduction_countnumeric10media_occurrencedimension_naidint819media_occurrenceheightnumeric10media_occurrencewidthnumeric10media_occurrencedepthnumeric10media_occurrencecolor_naidint819media_occurrenceprocess_naidint819media_occurrencebase_naidint819media_occurrenceemulsion_naidint819microform_pubmicroform_pubidbigserial19microform_pubdesc_naidint819microform_pubauth_list_naidint819microform_pubmicroform_pub_notetext2147483647org_name_reforg_name_refidbigserial19org_name_refprim_auth_naidint819org_name_reforg_name_naidint819org_personal_contributororg_personal_contributoridbigserial19org_personal_contributordesc_naidint819org_personal_contributorprim_auth_naidint819org_personal_contributorauth_list_naidint819org_personal_contributorpersonal_or_organizational_contributorbool1organizational_referenceorganizational_referenceidbigserial19organizational_referenceprim_auth_naidint819organizational_referenceorgref_naidint819other_preservation_characteristicother_preservation_characteristicidbigserial19other_preservation_characteristicauth_list_naidint819other_preservation_characteristicmedia_occurrenceidint819pers_org_topsub_geo_sppr_refpers_org_topsub_geo_sppr_refidbigserial19pers_org_topsub_geo_sppr_refdesc_naidint819pers_org_topsub_geo_sppr_refprim_auth_naidint819pers_org_topsub_geo_sppr_refauth_reference_type_donortext2147483647personal_referencepersonal_referenceidbigserial19personal_referenceprim_auth_naidint819personal_referenceperson_naidint819physical_occurencephysical_occurenceidbigserial19physical_occurencedesc_naidint819physical_occurenceauth_list_naidint819physical_occurenceextenttext2147483647physical_occurencephysical_occurance_notetext2147483647physical_occurencegpra_indicatorsbool1physical_occurencecontainer_listtext2147483647predecessor_successorpredecessor_successoridbigserial19predecessor_successorprim_auth_naidint819predecessor_successorpredecessor_successor_naidint819predecessor_successorpredecessor_successortext2147483647primauthprim_auth_naidint819primauthterm_nametext2147483647primauthauth_typetext2147483647primauthname_headingtext2147483647primauthfull_nametext2147483647primauthnumeratortext2147483647primauthpersonal_titletext2147483647primauthbirthdate_qualifier_naidint819primauthdeathdate_qualifier_naidint819primauthbirthdatejsonb2147483647primauthdeathdatejsonb2147483647primauthbiographical_notetext2147483647primauthsource_notetext2147483647primauthproposer_nametext2147483647primauthproposal_datedate13primauthreference_unit_naidint819primauthrecord_source_naidint819primauthnaco_submittedbool1primauthimport_rec_ctl_notext2147483647primauthnotestext2147483647primauthscope_notetext2147483647primauthsaco_submittedbool1primauthlat_longnumeric10primauthadmin_hist_notetext2147483647primauthestablish_datejsonb2147483647primauthabolish_datejsonb2147483647primauthestablish_date_qualifier_naidint819primauthabolish_date_qualifier_naidint819primauthlast_changed_datetimestamp29primauthlast_approved_datetimestamp29primauthlast_brought_under_edittimestamp29primauthapproval_historyjson2147483647primauthchangehistoryjson2147483647primauthbrought_under_edit_historyjson2147483647primauthvariant_person_naidint819primauthispreferredbool1primauthvariant_org_naidint819primauthis_under_editbool1program_area_referenceprogram_area_referenceidbigserial19program_area_referenceprim_auth_naidint819program_area_referenceprogram_area_naidint819reference_unitreference_unitidbigserial19reference_unitauth_list_naidint819reference_unitphysical_occurrenceidint819seealsoseealsoidbigserial19seealsoprim_auth_naidint819seealsoseealso_person_naidint819series_former_parentseries_former_parentidbigserial19series_former_parentdesc_naidint819series_former_parentformer_parent_naidint819special_pjt_for_authspecial_pjt_for_authidbigserial19special_pjt_for_authprim_auth_naidint819special_pjt_for_authauth_list_naidint819special_pjt_for_descspecial_pjt_for_descidbigserial19special_pjt_for_descdesc_naidint819special_pjt_for_descauth_list_naidint819specific_access_restrictionspecific_access_restrictionidbigserial19specific_access_restrictionspecific_access_restriction_naidint819specific_access_restrictionaccess_restrictionidint819specific_access_restrictionsecurity_classification_naidint819specific_use_restrictionspecific_use_restrictionidbigserial19specific_use_restrictionuser_restrictionidint819specific_use_restrictionauth_list_naidint819use_restrictionuse_restrictionidbigserial19use_restrictiondesc_naidint819use_restrictionauth_list_naidint819use_restrictionnotetext2147483647useforuseforidbigserial19useforprim_auth_naidint819useforusefor_naidint819variant_control_numbervariant_control_numberidbigserial19variant_control_numberdesc_naidint819variant_control_numberauth_list_naidint819variant_control_numbervariant_ctl_notext2147483647variant_control_numbervariant_ctl_no_notetext2147483647ETL Script for Indexing NAC Annotations in ElasticsearchThe ETL script for indexing NAC annotations in Elasticsearch is as follows:import csvimport jsonimport syscsv.field_size_limit(sys.maxsize)csvfile = open('transcriptions.csv', 'r')jsonfile = open('transcriptions.json', 'w')fieldnames = ("naId","object_id","page_num","username","timestamp","transcription")reader = csv.DictReader( csvfile, fieldnames)for row in reader: print(row) json.dump(row, jsonfile) jsonfile.write('\n')ETL Script for Authority List Migration from Oracle to AuroraThe ETL script for Authority List migration from Oracle to Aurora is as follows:Create PostgreSQL insert Statement for Date QualifierSET serveroutput ONDECLARE p_File utl_file.file_type; v_FileName VARCHAR2(100) := 'DateQualifierUAT' || '-' || TO_CHAR(sysdate, 'DDMONYYHHMISS') || '.txt'; dateQualifierCLOB CLOB; l_auth_details json; l_start_date json; l_end_date json; l_approval_history_element json; l_approval_history json_list; l_bue_history_element json; l_bue_history json_list; l_changed_history_element json; l_changed_history json_list; v_changed_date VARCHAR2(20); v_changed_userDisplayName VARCHAR2(20); v_changed_userDn VARCHAR2(50); v_changed_history json; v_count_changed INTEGER; v_index NUMBER; v_query_dateTime VARCHAR2(1000); v_query_userDisplayName VARCHAR2(1000); v_query_userDn VARCHAR2(1000); v_changed_array VARCHAR2(1000); v_approved_date VARCHAR2(20); v_approved_userDisplayName VARCHAR2(20); v_approved_userDn VARCHAR2(50); v_approved_history json; v_count_approved INTEGER; v_approved_array VARCHAR2(1000); v_brought_under_edit_date VARCHAR2(20); v_bue_userDisplayName VARCHAR2(20); v_brought_under_edit_userDn VARCHAR2(50); v_brought_under_edit_history json; v_count_brought_under_edit INTEGER; v_brought_under_edit_array VARCHAR2(1000);BEGIN p_File:= utl_file.fopen('LOGS_DIR',v_FileName,'w'); FOR dateQualifierRecord IN (SELECT xml_data, naid, 'false' AS isUnderEdit, term_name, auth_type, created_date, changed_date, to_date((extract(xml_data,'/*/recordHistory/imported/dateTime/text()', 'xmlns=";').getStringVal()),'YYYY-MM-DD HH24:MI:SS') AS imported_Date, approved_date, (to_date((extract(xml_data,'/*/recordHistory/broughtUnderEdit/modification[last()]/dateTime/text()', 'xmlns=";').getStringVal()),'YYYY-MM-DD HH24:MI:SS')) AS last_Brought_Under_Edit_Date, extract(xml_data,'/dateQualifier/description/text()','xmlns=";').getStringVal() AS dateQualifierDesc, extract(xml_data,'/dateQualifier/scope/text()','xmlns=";').getStringVal() AS dateQualifierScope, extract(xml_data,'/dateQualifier/placement/text()','xmlns=";').getStringVal() AS dateQualifierPlacement FROM authority WHERE auth_type = 'DateQualifier' ORDER BY created_date DESC ) LOOP dateQualifierCLOB := ''; dateQualifierCLOB := dateQualifierCLOB || 'insert into urmi_sandbox.authority_list(auth_list_naid, term_name, auth_type, auth_details, created_date, imported_date, last_changed_date, last_approved_date, last_brought_under_edit, changed_history, approval_history, brought_under_edit_history, is_under_edit) values ('; dateQualifierCLOB := dateQualifierCLOB || dateQualifierRecord.naid || ',' || '''' || dateQualifierRecord.term_name || '''' || ', ''DateQualifier'''; l_auth_details := json(); l_auth_details.put('description', dateQualifierRecord.dateQualifierDesc); IF dateQualifierRecord.dateQualifierScope IS NOT NULL THEN l_auth_details.put('scope', dateQualifierRecord.dateQualifierScope); END IF; IF dateQualifierRecord.dateQualifierPlacement IS NOT NULL THEN l_auth_details.put('placement', dateQualifierRecord.dateQualifierPlacement); END IF; -- l_auth_details.print(false); dateQualifierCLOB := dateQualifierCLOB || ', ''' || l_auth_details.to_char(false) || ''''; IF dateQualifierRecord.created_date IS NOT NULL THEN dateQualifierCLOB := dateQualifierCLOB || ', ''' || TO_CHAR(dateQualifierRecord.created_date, 'YYYY-MM-DD HH24:MI:SS') || ''''; ELSE dateQualifierCLOB := dateQualifierCLOB || ', null' ; END IF; IF dateQualifierRecord.imported_date IS NOT NULL THEN dateQualifierCLOB := dateQualifierCLOB || ', ''' || TO_CHAR(dateQualifierRecord.imported_date, 'YYYY-MM-DD HH24:MI:SS') || ''''; ELSE dateQualifierCLOB := dateQualifierCLOB || ', null' ; END IF; IF dateQualifierRecord.changed_date IS NOT NULL THEN dateQualifierCLOB := dateQualifierCLOB || ', ''' || TO_CHAR(dateQualifierRecord.changed_date, 'YYYY-MM-DD HH24:MI:SS') || ''''; ELSE dateQualifierCLOB := dateQualifierCLOB || ', null' ; END IF; IF dateQualifierRecord.approved_date IS NOT NULL THEN dateQualifierCLOB := dateQualifierCLOB || ', ''' || TO_CHAR(dateQualifierRecord.approved_date, 'YYYY-MM-DD HH24:MI:SS') || '''' ; ELSE dateQualifierCLOB := dateQualifierCLOB || ', null' ; END IF; IF dateQualifierRecord.last_brought_under_edit_date IS NOT NULL THEN dateQualifierCLOB := dateQualifierCLOB || ', ''' || TO_CHAR(dateQualifierRecord.last_brought_under_edit_date, 'YYYY-MM-DD HH24:MI:SS') || ''''; ELSE dateQualifierCLOB := dateQualifierCLOB || ', null' ; END IF; /* changed history end */ IF dateQualifierRecord.changed_date IS NOT NULL THEN v_changed_array := 'ARRAY['; --count the number of changed history node SELECT XMLCast(XMLQuery('declare default element namespace ";; for $i in /*/recordHistory/changed return count($i/modification) ' PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS NUMBER) INTO v_count_changed FROM dual; -- dbms_output.put_line('number of changed nodes: ' || v_count_changed); v_index := 1; WHILE(v_index <= v_count_changed) LOOP v_query_dateTime := 'declare default element namespace ";; for $i in /*/recordHistory/changed/modification[' || v_index || ']/dateTime/text() return $i ' ; v_query_userDisplayName := 'declare default element namespace ";; for $i in /*/recordHistory/changed/modification[' || v_index || ']/userDisplayName/text() return $i ' ; v_query_userDn := 'declare default element namespace ";; for $i in /*/recordHistory/changed/modification[' || v_index || ']/userDn/text() return $i ' ; SELECT XMLCast(XMLQuery( v_query_dateTime PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_changed_date FROM dual; SELECT XMLCast(XMLQuery( v_query_userDisplayName PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_changed_userDisplayName FROM dual; SELECT XMLCast(XMLQuery( v_query_userDn PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_changed_userDn FROM dual; -- dbms_output.put_line('At index ' || v_index); -- dbms_output.put_line('user display name ' || v_changed_userDisplayName); -- dbms_output.put_line('userDN ' || v_changed_userDn); v_changed_date := REPLACE(v_changed_date, 'T', ' '); -- dbms_output.put_line('date time ' || v_changed_date); v_changed_history := json(); IF v_changed_date IS NOT NULL THEN v_changed_history.put('dateTime', v_changed_date); END IF; IF v_changed_userDisplayName IS NOT NULL THEN v_changed_history.put('userDisplayName', v_changed_userDisplayName); END IF; IF v_changed_userDn IS NOT NULL THEN v_changed_history.put('userDn', v_changed_userDn); END IF; --v_changed_history.print; v_changed_array := v_changed_array || '$$' || v_changed_history.to_char(false) || '$$' ; v_index := v_index + 1; IF v_index <= v_count_changed THEN v_changed_array := v_changed_array || ','; -- dbms_output.put_line('v_index: ' || v_index || ' and v_count_changed: ' || v_count_changed); END IF; END LOOP; v_changed_array := v_changed_array || ']::JSON[]'; --dbms_output.put_line('changed array: ' || v_changed_array); dateQualifierCLOB := dateQualifierCLOB || ', ' || v_changed_array; ELSE dateQualifierCLOB := dateQualifierCLOB || ', ' || 'null'; END IF; /* changed history end */ /* approved history end */ IF dateQualifierRecord.approved_date IS NOT NULL THEN v_approved_array := 'ARRAY['; --count the number of approved history node SELECT XMLCast(XMLQuery('declare default element namespace ";; for $i in /*/recordHistory/approved return count($i/modification) ' PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS NUMBER) INTO v_count_approved FROM dual; -- dbms_output.put_line('number of approved nodes: ' || v_count_approved); v_index := 1; WHILE(v_index <= v_count_approved) LOOP v_query_dateTime := 'declare default element namespace ";; for $i in /*/recordHistory/approved/modification[' || v_index || ']/dateTime/text() return $i ' ; v_query_userDisplayName := 'declare default element namespace ";; for $i in /*/recordHistory/approved/modification[' || v_index || ']/userDisplayName/text() return $i ' ; v_query_userDn := 'declare default element namespace ";; for $i in /*/recordHistory/approved/modification[' || v_index || ']/userDn/text() return $i ' ; SELECT XMLCast(XMLQuery( v_query_dateTime PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_approved_date FROM dual; SELECT XMLCast(XMLQuery( v_query_userDisplayName PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_approved_userDisplayName FROM dual; SELECT XMLCast(XMLQuery( v_query_userDn PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_approved_userDn FROM dual; -- dbms_output.put_line('At index ' || v_index); -- dbms_output.put_line('user display name ' || v_approved_userDisplayName); -- dbms_output.put_line('userDN ' || v_approved_userDn); v_approved_date := REPLACE(v_approved_date, 'T', ' '); -- dbms_output.put_line('date time ' || v_approved_date); v_approved_history := json(); IF v_approved_date IS NOT NULL THEN v_approved_history.put('dateTime', v_approved_date); END IF; IF v_approved_userDisplayName IS NOT NULL THEN v_approved_history.put('userDisplayName', v_approved_userDisplayName); END IF; IF v_approved_userDn IS NOT NULL THEN v_approved_history.put('userDn', v_approved_userDn); END IF; --v_approved_history.print; v_approved_array := v_approved_array || '$$' || v_approved_history.to_char(false) || '$$'; v_index := v_index + 1; IF v_index <= v_count_approved THEN v_approved_array := v_approved_array || ','; -- dbms_output.put_line('v_index: ' || v_index || ' and v_count_approved: ' || v_count_approved); END IF; END LOOP; v_approved_array := v_approved_array || ']::JSON[]'; --dbms_output.put_line('approved array: ' || v_approved_array); dateQualifierCLOB := dateQualifierCLOB || ', ' || v_approved_array; ELSE dateQualifierCLOB := dateQualifierCLOB || ', ' || 'null'; END IF; /* approved history end */ /* brought_under_edit history end */ IF dateQualifierRecord.last_brought_under_edit_date IS NOT NULL THEN v_brought_under_edit_array := 'ARRAY['; --count the number of brought_under_edit history node SELECT XMLCast(XMLQuery('declare default element namespace ";; for $i in /*/recordHistory/broughtUnderEdit return count($i/modification) ' PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS NUMBER) INTO v_count_brought_under_edit FROM dual; -- dbms_output.put_line('number of brought_under_edit nodes: ' || v_count_brought_under_edit); v_index := 1; WHILE(v_index <= v_count_brought_under_edit) LOOP v_query_dateTime := 'declare default element namespace ";; for $i in /*/recordHistory/broughtUnderEdit/modification[' || v_index || ']/dateTime/text() return $i ' ; v_query_userDisplayName := 'declare default element namespace ";; for $i in /*/recordHistory/broughtUnderEdit/modification[' || v_index || ']/userDisplayName/text() return $i ' ; v_query_userDn := 'declare default element namespace ";; for $i in /*/recordHistory/broughtUnderEdit/modification[' || v_index || ']/userDn/text() return $i ' ; SELECT XMLCast(XMLQuery( v_query_dateTime PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_brought_under_edit_date FROM dual; SELECT XMLCast(XMLQuery( v_query_userDisplayName PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_bue_userDisplayName FROM dual; SELECT XMLCast(XMLQuery( v_query_userDn PASSING dateQualifierRecord.xml_data RETURNING CONTENT) AS VARCHAR2(20)) INTO v_brought_under_edit_userDn FROM dual; -- dbms_output.put_line('At index ' || v_index); -- dbms_output.put_line('user display name ' || v_brought_under_edit_userDisplayName); -- dbms_output.put_line('userDN ' || v_brought_under_edit_userDn); v_brought_under_edit_date := REPLACE(v_brought_under_edit_date, 'T', ' '); -- dbms_output.put_line('date time ' || v_brought_under_edit); v_brought_under_edit_history := json(); IF v_brought_under_edit_date IS NOT NULL THEN v_brought_under_edit_history.put('dateTime', v_brought_under_edit_date); END IF; IF v_bue_userDisplayName IS NOT NULL THEN v_brought_under_edit_history.put('userDisplayName', v_bue_userDisplayName); END IF; IF v_brought_under_edit_userDn IS NOT NULL THEN v_brought_under_edit_history.put('userDn', v_brought_under_edit_userDn); END IF; --v_brought_under_edit_history.print; v_brought_under_edit_array := v_brought_under_edit_array || '$$' || v_brought_under_edit_history.to_char(false) || '$$'; v_index := v_index + 1; IF v_index <= v_count_brought_under_edit THEN v_brought_under_edit_array := v_brought_under_edit_array || ','; -- dbms_output.put_line('v_index: ' || v_index || ' and v_count_brought_under_edit: ' || v_count_brought_under_edit); END IF; END LOOP; v_brought_under_edit_array := v_brought_under_edit_array || ']::JSON[]'; --dbms_output.put_line('brought_under_edit array: ' || v_brought_under_edit_array); dateQualifierCLOB := dateQualifierCLOB || ', ' || v_brought_under_edit_array; ELSE dateQualifierCLOB := dateQualifierCLOB || ', ' || 'null'; END IF; /* brought_under_edit history end */ dateQualifierCLOB := dateQualifierCLOB || ', ' || dateQualifierRecord.isUnderEdit; dateQualifierCLOB := dateQualifierCLOB || ');'; --dbms_output.put_line(dateQualifierCLOB); utl_file.put_line(p_file,dateQualifierCLOB); END LOOP; utl_file.fclose(p_file);END;/ ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download