Using XML for Tax Transactions - OASIS




Using XML for Tax Transactions

White Paper

Abstract

Tax XML is a sample schema for tax data. The schema is a model to be used by standards organizations, industry partners, and government agencies in their investigation of the use of XML in a tax setting. This schema is further intended to provide thought leadership and the first steps toward creating an open standard for tax transactions.

The Tax XML Schema is integral to this effort because it defines the data that applications and their supporting infrastructure store and process. Collaborating on an open standard will produce a common specification, leaving participants free to compete on implementation.

This paper is directed to those involved in fostering the use of XML who would like to see it applied to the field of taxation. It may also provide insight to those investigating the possibility of using XML in other business applications that are highly complex with a large number of distinct data items.

© 2000 Microsoft Corporation. All rights reserved.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA


Introduction

Overview

Using XML for tax transactions is a strategy whose time has come! Already the United Kingdom’s Inland Revenue uses an XML schema for the filing of individual returns – and has done so since 1999.

Extensible Markup Language (XML) is a technology that promises to free business and tax data from application infrastructure. The data-centric approach of XML allows the communication of data regardless of the platform, operating system, or underlying technology of existing systems.

The keystone of using XML for tax is the Tax XML Schema, which describes the specific tax data items and their relationships to each other. The long-term goal of this schema is to define an international standard with which all tax transactions would comply. The current thinking is that the schema would be composed of namespaces owned and maintained by governmental agencies and merged into a single schema. A standards organization similar to the World Wide Web Consortium (W3C) would be established to provide hosting services, version control, and guidance on the master schema to tax administrations around the world.

Completing the Tax XML Schema is urgent. As mentioned above, the UK is currently using this methodology. Other national governments around the world are planning the creation of e-government systems that use XML. State and local tax administrations within the US also have development projects underway. With no standard in place, however, each of these groups will develop divergent schemas. This will complicate the communication of information between groups, and will hamper implementation efforts by requiring customization of data storage and handling at each site. To avoid such a proliferation of approaches, a standard should be pursued swiftly.

Current Scope of Tax XML

The current scope of Tax XML is to present a small sample of a schema based on US taxes in order to provide a starting point for collaborative work. The work done so far focuses on three areas: Federal individual, Federal corporate, and California individual. For details of the data included in this work, see the Taxonomy section below. This work has been done for modeling purposes, rather than as a specific example of the content to be included.

Also within the scope of this project is the investigation of the best type of schema to use. Several approaches are explored in this paper including:

• The first approach uses a schema composed of distinctly named elements, each named for the data it contains. This approach is exemplified by the UK schema and is consistent with the schema promulgated by the W3C.

• A second approach is demonstrated by the eXtensible Business Reporting Language (XBRL) organization in a schema used for financial reporting. This strategy uses a very limited number of elements and includes the identification of the data through attributes instead.

• Another approach investigated but not displayed focuses on an EDI-related schema that uses separate elements to identify the data.

Background

Consider the process of an individual preparing to file his federal income tax return for the year. Tax information must first be gathered from multiple sources. Employers must provide Form W-2 with wage information to the taxpayer and to the tax agency. Financial institutions must send Form 1099-INT with interest income, again both to the taxpayer and the tax agency. Another form contains mortgage interest expense, and once more is required to be sent to both places. Still other information is required from other payers, local governments, and state agencies.

Then this information must be accurately entered from these documents into the return, so that it matches the data sent by the same source to the tax agency. If the taxpayer plans to file electronically, extra data must be entered to ensure correct matching of these documents. For example, the taxpayer must re-enter his name and address for each Form W-2 exactly as it appears on the paper W-2 – even if it is incorrect! The same requirement also exists for Form 1099-R and Form W-2G.

The taxpayer must then send the tax return to the IRS, either as a printed return or electronically, using the IRS’s proprietary system. Because of the format of the forms, much of the data is included in the return more than once. For example, the total of itemized deductions appears on both Schedule A and Form 1040. Another example, mentioned above, is that the taxpayer’s name and address may be included in the return multiple times. These redundancies increase the data storage requirements at the IRS and make the data vulnerable to inconsistency and error. If the taxpayer resides in a state or city with its own income tax, the information must be sent there as well, often with a copy of the same information sent to the IRS.

Even within a tax agency, data that was received electronically from the taxpayer may be manually entered into a separate application due to incompatibilities of applications and platforms. Once stored, the information is usually difficult to access across various legacy systems of the agency.

Even just modernizing systems within a tax agency will not fully solve these issues. Replacing legacy systems with fully integrated systems may enable effective data transfer within the agency, but it does not address the redundancies within the data itself nor the issues of communicating data among the taxpayer, employer, financial institution, and tax administration.

The Tax XML Schema provides a solution to these issues. Standardization of data so that it can be communicated electronically without ambiguity will change the processes of tax preparation. Employers will be able to provide data electronically in a format that will be readable by taxpayers, income tax software providers, and the IRS. The information will be automatically entered into tax software to eliminate the errors that can occur during manual entry. Matching data in a tax return with the same information sent to the IRS becomes a nearly fail-safe task.

There are other efforts underway that can do some of these same tasks. For instance, Intuit has a process that allows information from participating companies to be downloaded into a tax return being prepared using its products. However, this is a proprietary system designed only for this specific transfer. Tax XML, on the other hand, offers complete sharing of data based on an open standard and XML technology, and it eliminates data redundancy by separating the data from the government forms.

Technical Overview

The technical overview summarizes the work that has been done for Tax XML. This section includes information about the taxonomy, schema, and instance files developed as part of Tax XML.

The Taxonomy

XML focuses on data. Therefore, the first work done for Tax XML was to create a hierarchy, or taxonomy, of the tax data to be included in the schema. The goal of the taxonomy is to streamline and reorganize tax data so that it follows a more logical order, is independent of its presentation on the existing government tax forms, and contains no redundancy.

In order to accomplish this, a review was done of the data contained on forms and in the electronic filing record layouts, where they were available for a form. The review focused on the following tasks:

Arrange data into logical groups. Data was moved from its location on the form and included where it would allow the data to flow functionally from a computational perspective.

Exclude computed amounts. Data was included if it was an entered field, but usually not if it was a computed amount. Computed amounts were sometimes included if they were key check totals or placeholders for parts of the taxonomy not yet completed.

Include data only once. Duplication increases the possibility of error and requires additional storage. If data appeared in more than one place, the data was captured in the hierarchy in the place that represented its source entry point.

Tax Forms Included in Tax XML

It is somewhat misleading to speak in terms of forms, but it is a helpful reference for identifying the scope of the work done to date. Tax XML includes the information from the following income tax forms and schedules, although the data is reorganized and computed amounts are not included.

Although Tax XML is intended to be a single schema, the work done has been grouped into individual income tax and corporate income tax for ease of analysis and is presented in those parts throughout this document.

Individual Forms

|Form 1040 |Form 1040A |Form 8815 |

| Schedule A | Schedules 1, 2, and 3 |Form 8828 |

| Schedule B |Form 1040EZ |Form 8829 |

| Schedule C |Form 2210-F |Form 8839 |

| Schedule E (pg.1) |Form 2441 |Form 8863 |

| Schedule EIC |Form 4255 |Form 9465 |

| Schedule F |Form 4562 |Form W-2 |

| Schedule H |Form 4797 |Form 1099-INT |

| Schedule J |Form 4835 |Form 1099-DIV |

| Schedule R |Form 8606 |Form 1099-MISC |

| Schedule SE |Form 8615 |Form 1099-R |

The Forms 1099-INT, 1099-DIV, and 1099-MISC are included above, even though these forms are not currently included in the paper filing or electronic submission of forms to the IRS. The reason for this is to show an example of what the future might hold where financial institutions and employers would also use this schema to transmit their information to the government, and then on to the taxpayer for inclusion in the individual’s income tax return. The seamless integration of data transfer in such a system shows the extended possibilities of Tax XML beyond just the filing of income tax returns.

Corporate Forms

|Form 1120 |Form 1120 (continued) |Form 4255 |

| Schedule A | Schedule K |Form 4562 |

| Schedule C | Schedule L |Form 4797 |

| Schedule E | Schedule M-1 | |

| Schedule J | Schedule M-2 | |

Form 1120 covers much of the data included so far. The additional forms listed are available from the work done on individual income tax. Because the goal of Tax XML is to define a data structure that crosses different types of tax returns, the tag names and data organization are consistent and reusable whether the tax type is corporate, individual, or another type.

Data Hierarchy for Tax XML

The beginning of the hierarchy is the same for all types of tax. This list shows the relationship of the individual and corporate tax branches to the entire tree.

|TaxML |

| Authentication |

| Identification |

| KeyID |

| TaxYear |

|+ Version |

|+ IndividualTax |

|+ CorporateTax |

The TaxML element serves as the top level, or root, of the hierarchy. Below TaxML are lower levels of the tree. The ‘+’ sign preceding an element indicates that there are levels below this item. If there is no ‘+’ sign, the element contains data rather than child elements.

The Authentication and Identification elements are shown only as placeholders. Security and verification of identity are beyond the current scope of Tax XML.
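As a minimal sketch of the top of this hierarchy, the following Python builds a TaxML instance with the standard library's xml.etree.ElementTree module. The placement of KeyID and TaxYear directly under TaxML, the function name build_tax_ml, and the sample values are assumptions made for illustration; the table above does not fix the exact nesting of those items.

```python
import xml.etree.ElementTree as ET


def build_tax_ml(key_id: str, tax_year: str) -> ET.Element:
    """Build the top of the hierarchy; nesting and values are illustrative."""
    root = ET.Element("TaxML")
    # Placeholders only: security and identity verification are out of scope.
    ET.SubElement(root, "Authentication")
    ET.SubElement(root, "Identification")
    # Elements without a '+' sign hold data rather than child elements.
    ET.SubElement(root, "KeyID").text = key_id
    ET.SubElement(root, "TaxYear").text = tax_year
    # Elements marked '+' are branches that would hold further levels.
    for branch in ("Version", "IndividualTax", "CorporateTax"):
        ET.SubElement(root, branch)
    return root


tax_ml = build_tax_ml("example-key", "2000")
print(ET.tostring(tax_ml, encoding="unicode"))
```

The individual and corporate branches are left empty here; the sections that follow describe what they would contain.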

Building a Data Hierarchy for Individual Taxes

The first part of the Tax XML project was the work on individual income taxes. A data hierarchy with many levels was developed, and the data elements were given descriptive names designed to clearly identify the information they held. Reusable element structures, such as Name, were developed to hold a set of related elements. Groups of elements that could occur more than once were labeled as lists for easy identification. Some of these structures were not included in the best practices that are shown below, but they still represent a viable approach and are included for that reason.

The elements that comprise the IndividualTax element are shown below.

|- IndividualTax |- IndividualTax (continued) |

| + Taxpayer | + FarmIncomeAveraging |

| + Spouse | Tax |

| + Address | + ChildCare |

| + FilingStatusInformation | RetiredDisabledCredit |

| + DependentList | ChildTaxCredit |

| + Wages | + Adoption |

| + InvestmentIncome | ForeignTaxCredit |

| StateLocalIncomeTaxRefund | + OtherCreditsList |

| AlimonyReceived | TotalCredits |

| + ActivityList | + SelfEmploymentTax |

| CapitalGainLoss | + AMT |

| + SaleOfBusinessProperty | SocialSecurityAndMedicareTaxOnTips |

| + RetirementOrIRA | TaxOnRetirementPlans |

| + MiscellaneousIncome | + HouseholdEmploymentTaxes |

| TotalIncome | + ITCRecaptureList |

| + StudentLoanInterestList | + OtherTaxesList |

| MedicalSavingsAccountDeduction | TotalTax |

| MovingExpenseDeduction | + TaxPayments |

| SEHealthInsuranceDeduction | AddChildTaxCredit |

| KeoghSepSimpleDeductions | Refund |

| PenaltyOnEarlyWithdrawal | + Bank |

| + AlimonyPaidList | AppliedtoNextYearES |

| AGI | + BalanceDue |

| ExemptionAmount | UnderpaymentPenalty |

| StandardDeduction | + UnderpaymentPenaltyFarmers |

| + ItemizedDeductions | EstimatedTaxPenalty |

| TaxableIncome | + Preparer |

| EducationCreditRecapAmount | + PriorYearInformation |

| + KiddieTax | + StateIndividualTax |

Arrange Data into Logical Groups

The Taxpayer element is a good example of data being grouped together logically. The element is composed of the data fields shown in the following table. This information was drawn from various tax forms and combined here. Before this rearrangement of data, these pieces were scattered across five separate forms.

|- Taxpayer |- Taxpayer (continued) |

| IDNumber | PresidentFund |

| + Name | Exemption |

| DateOfBirth | PINTPSignature |

| Blind | Occupation |

| MilitaryIndicator | Disabled |

| HomePhone | PriorYearStatementIndicator |

| BestTimeToCallHome | QualifiedCurrentYearExpensesForHopeCredit |

| WorkPhone | QualifiedExpensesForLifetimeLearningCredit |

| BestTimeToCallWork | |

Reuse Element Groups

Two methods were employed in the individual tax branch to make elements reusable. The first is demonstrated by the Name element. The Name element is composed of the data fields shown in the following table. Once the Name element is defined under the Taxpayer element as having these parts, it can be referred to simply as Name with the sub-elements implicitly included. For example, when the Name element is used later in the hierarchy under Wages, it only needs to be identified as Name, without reference to its parts (FirstName, LastName, and so on).

|- Name |

| FirstName |

| MiddleInitial |

| LastName |

| Suffix |

| CompleteName |

| NameControl |
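The reuse of the Name group can be sketched in Python as a single helper that is called wherever a Name is needed. The helper name make_name, the sample values, and the assumption that the Name under Wages identifies an employer are illustrative only; they are not part of the Tax XML work itself.

```python
import xml.etree.ElementTree as ET

# The parts of the Name group, in the order listed in the table above.
NAME_PARTS = ("FirstName", "MiddleInitial", "LastName",
              "Suffix", "CompleteName", "NameControl")


def make_name(**parts: str) -> ET.Element:
    """Build a Name element from any subset of its defined parts."""
    unknown = set(parts) - set(NAME_PARTS)
    if unknown:
        raise ValueError(f"not part of Name: {sorted(unknown)}")
    name = ET.Element("Name")
    # Emit parts in the defined order, skipping those not supplied.
    for tag in NAME_PARTS:
        if tag in parts:
            ET.SubElement(name, tag).text = parts[tag]
    return name


# The same definition serves both locations in the hierarchy.
taxpayer = ET.Element("Taxpayer")
taxpayer.append(make_name(FirstName="Pat", LastName="Example"))

wages = ET.Element("Wages")
wages.append(make_name(CompleteName="Example Employer Inc"))

print(ET.tostring(taxpayer, encoding="unicode"))
print(ET.tostring(wages, encoding="unicode"))
```

Defining the group once means a change to the parts of Name is made in one place rather than at every point of use.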

The second method involves the Activity element. This element combines tax information for businesses, rentals, farms, and farm rentals into a single branch rather than in four branches as would result from following the associated tax forms. Since the information on each of these four forms is very similar, a type code was implemented to identify what type of activity was being reported. The table below is abbreviated. There are 245 elements and sub-elements under ActivityList, so being able to reuse an element group in this way is significant.

|- ActivityList |

| BusinessIncomeLoss |

| FarmIncomeLoss |

| RentalIncomeLoss |

| FarmRentalIncomeLoss |

| Activity |

| Type |

Identify Repeating Groups

Some data structures occur more than once when reporting tax data. An example of a repeating structure is shown in the preceding table. The ActivityList element indicates that a repeating structure follows. In this case, the repeating element is Activity. ActivityList may occur only once and is the parent of the Activity element, which may repeat an unlimited number of times.
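A rough Python sketch of this pattern follows: one ActivityList parent holds any number of Activity children, and each Activity carries a Type code in place of a separate branch per form. The code values in ACTIVITY_TYPES and the helper add_activity are hypothetical; the paper does not list the actual type codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical type codes standing in for the four kinds of activity.
ACTIVITY_TYPES = {"Business", "Rental", "Farm", "FarmRental"}


def add_activity(activity_list: ET.Element, activity_type: str) -> ET.Element:
    """Append one Activity, tagged with its Type, to the single ActivityList."""
    if activity_type not in ACTIVITY_TYPES:
        raise ValueError(f"unknown activity type: {activity_type}")
    activity = ET.SubElement(activity_list, "Activity")
    ET.SubElement(activity, "Type").text = activity_type
    return activity


individual_tax = ET.Element("IndividualTax")
# ActivityList occurs once; Activity repeats beneath it.
activity_list = ET.SubElement(individual_tax, "ActivityList")
add_activity(activity_list, "Business")
add_activity(activity_list, "Rental")
add_activity(activity_list, "Rental")

# A consumer selects activities by type code rather than by form.
rental_count = sum(
    1 for activity in activity_list.findall("Activity")
    if activity.findtext("Type") == "Rental"
)
print(rental_count)
```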

Data Hierarchy for Corporate Taxes

The CorporateTax branch of the tree represents data used in corporate income tax returns. A sample of the elements that make up the CorporateTax element is shown below.

|- CorporateTax |- CorporateTax (continued) |

| + entityInformation | + balanceSheets |

| + income | + incomeReconciliation |

| + deductions | + analysisOfRetainedEarnings |

| + taxAndPayments | + preparer |

Data Organization Best Practices

These best practices were developed during work on the corporate tax branch of the taxonomy.

• Organize data to follow the logical flow of tax information.

Tax data follows a predictable sequence composed of the following common concepts. These concepts underlie tax transactions regardless of locale. Dealing with data at this level makes it possible to define a hierarchy that can be used globally.

Common Concepts

Identifying data – names, location, other information about the taxpayer

Income

Deductions

Tax

Payments

Credits

Balance – tax due or refund

Preparer information

• Insert data into its functional location.

This is a corollary of the item above. In the current IRS forms model, supporting data is gathered on separate pages that follow the main form (1040, 1120, etc.). In the Tax XML organization, the data is included in the flow where the summary total would appear on the IRS main form. For example, in a corporate 1120 return, all of the information for Cost of Goods Sold would be included after Gross Receipts or Sales, as opposed to its location on the tax form (Form 1120, page 2).

• Gather information into logical blocks.

For example, all the information for a taxpayer was gathered and stored together. This data includes the taxpayer’s name, identification, and other information about the taxpayer.

• Capture data at its source and only once.

Including the same piece of data in more than one place often leads to errors because the data changes in one location and not the other. By capturing data at its source, and only once, these problems are eliminated. An example of this is that many forms used to calculate tax credits include data from elsewhere on the return. In this model, that information would not be included again. Only new, unique data would be included in the data related to the credit.

• Avoid including unnecessary data.

Do not include data that can be inferred reliably from other data. For example, many forms ask a Yes/No question that if answered “yes” requires the next field to be completed. The data in the field following the Yes/No question effectively provides the Yes/No answer. If the data exists, the answer is yes. If the field is blank, the answer is no.

• Include only key totals.

In general, the goal is to include the source data needed for computations, not the computed amounts. This reduces the file size for transmission and for storage. The computations would then be done by the government agency and the results checked against the taxpayer’s results using key totals. These key totals are crucial to ensuring the correct transmission of a tax return and would be included. Examples of such items are taxable income and tax.

• Name the data with descriptive tags.

Use complete meaningful names for the tags. Avoid abbreviations or jargon.

• Data names should use the camel case convention, be 40 characters or fewer, and not include spaces or special characters.

Camel case is a convention that begins a data name with a lowercase letter and capitalizes the first letter of each subsequent word. For example: corporateTax.

• Combinations of parent-child names must be unique.

The reason for creating unique parent-child combinations is that it provides an absolute reference for the location of this combination within the hierarchy and the namespace. The advantage of this is the ability to send subsets of data without the entire schema, or large parts of the schema, needing to be included. For example, suppose that in order to gain loan approval, a financial institution requires a company’s income and deductions to be submitted. This feature allows this data to be transmitted without all the grandparent data up to and including the root of the schema.
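Two of the practices above lend themselves to automated checks: the naming rules (leading lowercase letter, 40 characters or fewer, no spaces or special characters) and the uniqueness of parent-child combinations. The Python below is a sketch under stated assumptions: the function names, the nested-dictionary representation of a hierarchy, and the sample hierarchy with its "other" and "detail" elements are all hypothetical and are not drawn from the Tax XML taxonomy.

```python
import re
from collections import Counter

MAX_NAME_LENGTH = 40
# Lowercase first letter, then letters or digits only (no spaces or symbols).
CAMEL_CASE = re.compile(r"[a-z][A-Za-z0-9]*")


def is_valid_name(name: str) -> bool:
    """Check a data name against the naming rules listed above."""
    return len(name) <= MAX_NAME_LENGTH and CAMEL_CASE.fullmatch(name) is not None


def parent_child_pairs(parent: str, children: dict):
    """Yield every (parent, child) combination found in a nested hierarchy."""
    for child, grandchildren in children.items():
        yield (parent, child)
        yield from parent_child_pairs(child, grandchildren)


def repeated_pairs(root: str, children: dict) -> list:
    """Return the combinations that occur at more than one place in the tree."""
    counts = Counter(parent_child_pairs(root, children))
    return sorted(pair for pair, count in counts.items() if count > 1)


# Hypothetical hierarchy: "other" -> "detail" appears under two branches,
# so that combination no longer identifies a single location.
HIERARCHY = {
    "income": {"other": {"detail": {}}},
    "deductions": {"other": {"detail": {}}},
}

print(is_valid_name("corporateTax"))
print(is_valid_name("CorporateTax"))
print(repeated_pairs("corporateTax", HIERARCHY))
```

A pattern match of this kind cannot tell whether a name is descriptive or free of jargon; those practices still need human review.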

The Schema

Schema Best Practices

• Use separate namespaces for different tax entities.

Tax XML is intended to include all tax transactions. To accomplish this, it is necessary to create separate namespaces for entities such as individual, corporate, partnership, etc. This need arises from two factors: size and maintainability. Combining all entities into one schema or taxonomy without employing namespaces results in a document of unmanageable size. By separating the information into namespaces, the creation and maintenance of the various parts can be done by different parties. This allows the work to be done by different tax agencies, and then combined into a whole.

• To indicate the parent of a list of items, append an “s” to the end of the parent element name.

The child element that follows is then named with the singular version of the same name. Parent elements hold data that can be calculated, i.e., the total of the detail items.

Example of Multiple Items

|Parent Element |Child Element |maxOccurs |DataType |

|income |otherIncomes |1 |monetary |

|otherIncomes |otherIncome |* |monetary |

|otherIncome |description |1 |string |

• No fields are set as required (i.e., minOccurs = 1) in the schema.

Because every element is optional, a piece of the return can be communicated on its own.
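The plural-parent convention and the absence of required fields can be illustrated together. The Python sketch below parses a fragment that contains only the income branch, with no root element or other parts of a return, and checks the total held by otherIncomes against its otherIncome detail items. Carrying each amount as element text ahead of the child elements, the sample amounts and descriptions, and the helper detail_total are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A partial return: only the income branch is sent. Values are illustrative.
FRAGMENT = """
<income>
  <otherIncomes>150.25
    <otherIncome>100.00<description>Royalties</description></otherIncome>
    <otherIncome>50.25<description>Prizes</description></otherIncome>
  </otherIncomes>
</income>
"""


def detail_total(parent: ET.Element) -> Decimal:
    """Sum the singular-named children of a plural-named parent."""
    singular = parent.tag[:-1]  # "otherIncomes" -> "otherIncome"
    return sum(
        (Decimal(child.text.strip()) for child in parent.findall(singular)),
        Decimal("0"),
    )


income = ET.fromstring(FRAGMENT.strip())
other_incomes = income.find("otherIncomes")
reported = Decimal(other_incomes.text.strip())  # total held by the parent
computed = detail_total(other_incomes)          # total of the detail items
print(reported == computed)
```

Decimal is used rather than float so that monetary amounts compare exactly.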

Schema in Two Different Formats

TaxML was presented in normal XML schema format and in eXtensible Business Reporting Language (XBRL) format.

• Normal XML Schema Format

The normal XML format produces a structured hierarchical schema. Data is contained within an element named with a description of its contents.

Advantages

• The highly structured, hierarchical format makes the structure easy to view. XSLT applies easily to a hierarchical document instance.

• A normal XML parser is able to validate the structure, required data and data format of a document instance.

• eXtensible Business Reporting Language (XBRL) Format

The XBRL XML format produces a flat schema without order requirements. Data is contained within an item element with a name attribute that describes its contents.

Advantages

• Without order constraints, the XML document instance can easily be split apart with each piece being valid XML.

• XBRL can be stored in and extracted from a database using normal XML queries. Since tax data items are numerous and change with tax law changes, it is impractical to store each type of data in its own column. Since XBRL data is stored in the item element instead of its own distinctly named element, data can be stored in a database in one column named item.
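To make the contrast between the two formats concrete, the Python sketch below converts a small hierarchical instance into a flat list of item elements, each identified by a name attribute. The element names grossReceipts and salaries, the dotted-path form of the name attribute, and the function flatten are assumptions for illustration; this is not the actual XBRL specification or the Tax XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchical instance with distinctly named elements.
HIERARCHICAL = (
    "<corporateTax>"
    "<income><grossReceipts>1000.00</grossReceipts></income>"
    "<deductions><salaries>400.00</salaries></deductions>"
    "</corporateTax>"
)


def flatten(element: ET.Element, prefix: str = "") -> list:
    """Turn each data-holding element into <item name="path">value</item>."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        # A leaf holds data: emit one flat item named by its full path.
        item = ET.Element("item", {"name": path})
        item.text = element.text
        return [item]
    items = []
    for child in children:
        items.extend(flatten(child, path))
    return items


items = flatten(ET.fromstring(HIERARCHICAL))
# Each serialized item could be stored as one row in a single "item" column.
rows = [ET.tostring(item, encoding="unicode") for item in items]
for row in rows:
    print(row)
```

Each row is well-formed XML on its own, which is the property that lets a flat instance be split apart or stored one item at a time.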

Data Types

|Data Type |Description |

|monetary (decimal) |Numeric. Used for money fields. Example: 5123.13 indicates $5,123.13. |

|float |Numeric. Format specified on a field basis to show placement of decimal. Used for percentages. Example: 50.32 indicates 50.32%. |

|boolean |Used for Yes/No situations. Example: 0 indicates “No” and 1 indicates “Yes”. |

|date |Used for dates unless a word such as “various” is allowed. Example: 2000-11-02 indicates November 2, 2000. |

|int |Used for numeric fields other than percentages or money. Example: 5123 indicates 5,123. |

|string |Used for anything that is not included in the above. Text is probably obvious, but also used for ID numbers, calendar years, dates that allow words as well as numbers, and anywhere else that you want to be able to control the format of the information. |
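One possible reading of these data types in Python is sketched below, mapping each type name to a standard-library parser. The mapping itself, the choice of Decimal for monetary values, and the names PARSERS, parse_boolean, and parse_value are assumptions; the paper does not prescribe how an application should represent the values.

```python
from datetime import date
from decimal import Decimal


def parse_boolean(text: str) -> bool:
    """Interpret 0 as No and 1 as Yes; reject anything else."""
    if text not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {text!r}")
    return text == "1"


# One parser per data type named in the table above.
PARSERS = {
    "monetary": Decimal,         # exact decimal arithmetic for money
    "float": float,              # percentages such as 50.32
    "boolean": parse_boolean,
    "date": date.fromisoformat,  # dates such as 2000-11-02
    "int": int,
    "string": str,               # IDs, calendar years, free-form dates
}


def parse_value(data_type: str, text: str):
    """Convert the text of an element according to its declared data type."""
    return PARSERS[data_type](text)


print(parse_value("monetary", "5123.13"))
print(parse_value("date", "2000-11-02"))
print(parse_value("boolean", "1"))
```

A date field that allows a word such as “various” would be declared as string, so it bypasses the date parser by design.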

