Accounting

Which generative AI model did best on the CPA exam? Depends on the section

ChatGPT is no longer the only large language model to pass the CPA exam.

ChatGPT 3.5 initially bombed the CPA exam before version 4 passed it, and ChatGPT remains the top performer overall. However, like any human accountant, it has its strengths and weaknesses.

These were among the findings of a recent paper from Case Western Reserve University and accounting automation solutions provider AIgency. The researchers systematically evaluated the performance of Google Gemini, ChatGPT-4, Claude, Mixtral and Llama 2 on multiple-choice questions from CPA test preparation tools.

Overall, they found that ChatGPT-4 scored the best, with Claude 3 Opus a close second, followed by Google Gemini Advanced and then Mixtral-8x7b-32768. Llama2-70b-4096 did the worst.

Source: William Zacher Jr. & Sanmukh Kuppannagari

However, as the results show, not every model did uniformly well on every section. ChatGPT, a strong performer overall, was especially good on BAR, the business analysis and reporting section. Its weakest section was REG, the regulation section, which is mostly devoted to tax, yet it still outscored every other model there. Claude was the best performer on AUD, the auditing and attestation section. Its weakest section was FAR, financial accounting and reporting, but even there its performance was second only to ChatGPT's.

Gemini was the second-strongest performer on BAR but did not do as well on REG. Mixtral, overall, had decent enough scores compared to a human but would pass only BAR, making it a mediocre player compared to its peers. Llama was the only model that would not pass any section, and it did especially poorly on REG. It was also the only model to score worse than a human: according to the paper, the average score for human test takers on REG was 59.19%.

“The study revealed that while some LLMs have made significant advances in mimicking the complex decision-making skills required for CPA exams, there remains variability in performance across different sections of the test,” said the paper. “This variability underlines the importance of tailored training and specialization in developing LLMs for professional applications such as the CPA exams.”

To perform the test, the researchers drew their multiple-choice questions from the Becker CPA test preparation suite. Google Gemini, Claude and ChatGPT-4 were accessed via their online platforms, while the Mixtral and Llama 2 models were accessed through the Groq platform, an advanced computational infrastructure for high-speed AI processing. The questions were copied directly from Becker’s test preparation material and pasted into the AI platforms without any additional prompting or modification, so that each model received them in their original form, as they would appear on the CPA exam.

Becker’s platform randomized the questions in batches of 15, which the researchers said further mitigated potential selection bias. The tester responsible for entering the questions into the AI models deliberately refrained from reading or evaluating them beforehand to prevent any unconscious bias in the prompting process. For each question, the tester recorded the answer that the model’s first response marked as “correct,” regardless of any variations in the explanations or outputs provided by different models.

Each AI model took each multiple-choice section of the CPA exam three times, allowing for a comprehensive assessment of its performance across multiple attempts. The criterion for success in this study was a passing score on a given section, defined as an average score of 75 or higher.
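The passing criterion amounts to a simple average-and-compare check. This is a minimal sketch that assumes each attempt is scored on a 0 to 100 scale and that the three attempts are weighted equally in a plain mean; the function name `section_passes` and the sample scores are hypothetical, not taken from the paper.

```python
from statistics import mean

PASSING_AVERAGE = 75.0  # passing threshold described in the study


def section_passes(attempt_scores: list[float], threshold: float = PASSING_AVERAGE) -> bool:
    """Return True if the average of a model's attempt scores on one section meets the threshold."""
    if not attempt_scores:
        raise ValueError("at least one attempt score is required")
    return mean(attempt_scores) >= threshold


# Hypothetical scores for three attempts at one section (not data from the paper).
print(section_passes([78.0, 71.0, 77.0]))  # average is about 75.3, so this prints True
print(section_passes([74.0, 76.0, 74.0]))  # average is about 74.7, so this prints False
```

Because the test is on the average, a single strong attempt does not guarantee a pass: the second made-up set fails even though one attempt scored 76.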

The researchers said the data indicates there is no single universal model for all tasks, so it is important to match the model to the application. For example, the paper concluded that ChatGPT is “the only real option for zero-shot BAR automation,” as “no other model came close to its performance, and it had a relatively narrow variance,” meaning that ChatGPT-4 could be used to help automate financial statement preparation or additional forecasting. On the other hand, the researchers said Claude was probably better suited to auditing-related tasks; according to the paper, its AUD performance “is a solid indication that it can be used for fraud detection and internal control validation.”

“It is apparent from the results that there is no clear-cut winner,” the researchers concluded. “Most companies utilizing AI to perform financial administration functions should use a software infrastructure that allows them to use multiple task-dependent AI models.”

However, the researchers did recommend that “model selection for AI in an applied accounting setting should avoid Llama-2B, which performed worse than any other model in every section.”
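One way to picture the recommended setup of “multiple task-dependent AI models” is a simple routing table. The sketch below is a hypothetical illustration, not the researchers’ software: it maps each CPA exam section to the best-scoring model as summarized earlier in this article (Claude for AUD; ChatGPT-4 for BAR, FAR and REG), falls back to the top overall performer for anything else, and leaves Llama out entirely. The section labels, model name strings and the `choose_model` function are assumptions made for illustration.

```python
# Hypothetical routing table: CPA exam section -> best-scoring model per the article's summary.
# Llama is deliberately absent, following the researchers' recommendation to avoid it.
PREFERRED_MODEL_BY_SECTION: dict[str, str] = {
    "AUD": "Claude 3 Opus",  # best on auditing and attestation
    "BAR": "ChatGPT-4",      # best on business analysis and reporting
    "FAR": "ChatGPT-4",      # Claude was second only to ChatGPT here
    "REG": "ChatGPT-4",      # ChatGPT's weakest section, but still the top score
}

# Top overall performer, used when a task does not map to a known section.
DEFAULT_MODEL = "ChatGPT-4"


def choose_model(section: str) -> str:
    """Return the model name to route a task to, based on its CPA section label."""
    return PREFERRED_MODEL_BY_SECTION.get(section.strip().upper(), DEFAULT_MODEL)


if __name__ == "__main__":
    for task_section in ["aud", "BAR", "payroll"]:
        print(task_section, "->", choose_model(task_section))
```

In a real system each name would point to an API client; keeping the per-task choice in one table makes it easy to update as models are re-tested.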

In the blogs: Higher questions

Valuations this year; handling interviewees; AI and accounting ed.; and other highlights from our favorite tax bloggers.

Higher questions

Haunting of the Hill House

  • Eide Bailly (https://www.eidebailly.com/taxblog): The House Ways and Means Committee planned to begin to publicly debate and amend tax legislation on May 13, with the ultimate goal to produce the “one big, beautiful” bill to extend the Tax Cuts and Jobs Act: “This is the stage where seemingly dead and buried ideas mysteriously come back to life to haunt the proceedings.” 
  • Wiss (https://wiss.com/insights/read/): Key highlights of the proposed beauty.
  • Current Federal Tax Developments (https://www.currentfederaltaxdevelopments.com/): And a bulleted summary.
  • Tax Vox (https://www.taxpolicycenter.org/taxvox): If Congress expands the Child Tax Credit with TCJA extension, who might benefit and what might it cost?
  • Tax Foundation (www.taxfoundation.org/blog): Policymakers will also decide the fate of the SALT cap. Debate rages about making the cap more generous, along with possible limits on pass-through workarounds and SALT deductions by corporations. While capping business SALT could raise additional revenue, it would risk slowing economic growth.

Soft skills

Rational decisions

Tidying up

  • Boyum & Barenscheer (https://www.myboyum.com/blog/): Should you vacuum the meeting room? How many times should you talk with a candidate? Keys — some often overlooked — to effective interviewing.
  • The National Association of Tax Professionals (https://blog.natptax.com/): A WISP is the written information security plan that verifies how your firm protects taxpayer information. You can’t ignore them anymore, and here’s how to build a compliant one.
  • Taxing Subjects (https://www.drakesoftware.com/blog): An outstanding guide to SEO for accounting firms. 
  • AICPA & CIMA Insights (https://www.aicpa-cima.com/blog): Where does AI fit into accounting education? Everywhere.

House committee marks up tax reconciliation bill

The House Ways and Means Committee held a hearing Tuesday to mark up the so-called “One Big Beautiful Bill,” which would extend the expiring provisions of the Tax Cuts and Jobs Act while adding new tax breaks for tip income, overtime pay and Social Security income. It would also eliminate renewable energy tax credits from the Inflation Reduction Act, along with the IRS’s Direct File and Free File programs.

“Today, this Committee will move forward on President Trump’s promise of delivering historic tax relief to working families, farmers and small businesses,” said committee chair Jason Smith, R-Missouri, in his opening statement. “The One Big Beautiful Bill is the key to making America great again. This moment has been years in the making. While Democrats were defending IRS audits on the middle class and tax carveouts for the wealthy, Republicans on this Committee got on the road, to hear from real Americans about how the 2017 tax cuts benefited them. This bill wasn’t drafted by special interests or K Street lobbyists. It was drafted by the American people in communities across the country.”

Democrats blasted the bill. “In 2017, Republicans passed a tax law that was supposed to pay for itself, raise wages, and help working families,” said ranking member Richard Neal, D-Massachusetts. “None of that happened. Instead, it exploded the deficit, worsened inequality, and left everyday Americans behind. Now they want to double down on the same failed playbook. One that rigs the system for billionaires and big corporations while everyone else pays the price.”

Among the provisions, the bill would make the expiring TCJA rate and bracket changes permanent and increase the inflation adjustment for all brackets except the 37% threshold, according to a summary from the Tax Foundation. It would also make the expiring standard deduction levels permanent and temporarily increase the standard deduction by $2,000 for joint filers, $1,500 for head of household filers and $1,000 for all other filers from 2025 through the end of 2028. The elimination of the personal exemption would become permanent, as would the $750,000 limit on mortgage debt for the home mortgage interest deduction and the exclusion of home equity loan interest from that deduction. The state and local tax deduction cap, known as the SALT cap, would also become permanent at a higher threshold of $30,000, phasing down to $10,000 at a rate of 20% starting at modified adjusted gross income of $200,000 for single filers and $400,000 for joint filers.
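The SALT cap phase-down is easier to follow as arithmetic. The sketch below is a minimal, hypothetical illustration built only on the figures reported above: a $30,000 cap phasing down to a $10,000 floor, with thresholds of $200,000 (single) and $400,000 (joint). It assumes the 20% rate applies to MAGI in excess of the threshold, treats every non-joint filer like a single filer, and ignores rounding rules and other filing-status details, so it is not a statement of how the final legislation works.

```python
def salt_cap(magi: float, married_filing_jointly: bool) -> float:
    """Hypothetical SALT deduction cap under the phase-down described in the article."""
    base_cap = 30_000.0     # proposed higher cap
    floor_cap = 10_000.0    # cap cannot phase down below this
    phase_down_rate = 0.20  # assumed: 20 cents of cap lost per dollar of MAGI over the threshold
    threshold = 400_000.0 if married_filing_jointly else 200_000.0  # non-joint treated as single

    excess_magi = max(0.0, magi - threshold)
    return max(floor_cap, base_cap - phase_down_rate * excess_magi)


for magi, joint in [(150_000, False), (250_000, False), (450_000, True), (600_000, True)]:
    print(f"MAGI ${magi:,} ({'joint' if joint else 'single'}): cap ${salt_cap(magi, joint):,.0f}")
```

Under these assumptions the cap reaches the $10,000 floor once MAGI exceeds the threshold by $100,000 ($20,000 of reduction divided by 20%), that is, at $300,000 for single filers and $500,000 for joint filers.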

Other changes and limitations to itemized deductions would also be made permanent, including the limits on personal casualty losses and wagering losses and the termination of miscellaneous itemized deductions, the Pease limitation on itemized deductions and the deduction for certain moving expenses.

The bill is likely to go through some changes when it goes to the Senate. “Politically, we’ve been talking about the process for the last couple months,” said Mark Baran, managing director at CBIZ’s national tax office. “Congress is finally able to pass a concurrent resolution to unlock the budget reconciliation process.”

“The House and the Senate have completely different instructions on what they’re going to cut and how they’re going to score,” he added. “Some of that’s very controversial, and that needs to be worked out. But now we’re getting into the actual crafting of provisions and legislation.”

According to a summary on the CBIZ site, the bill would make permanent the Section 199A deduction for pass-through entities, also known as the qualified business income, or QBI, deduction, and increase it from 20% to 23%. The bill also includes provisions that would open the door for pass-through entity owners in specified service industries to use the deduction. It would also extend current deductions for research and experimental expenses through Dec. 31, 2029, and extend 100% bonus depreciation through that same date.

The bill would also allow businesses to include amortization and depreciation when figuring the business interest limitation through Dec. 31, 2029, while making permanent the excess business loss limitation.

In addition, the bill would retroactively terminate the Employee Retention Tax Credit for taxpayers who filed refund claims after Jan. 31, 2024. 

In keeping with Trump campaign promises, the bill would eliminate taxes on tips for employees in certain defined industries where tipping has been a traditional form of compensation. There would be a new $4,000 deduction for seniors that phases out starting at $75,000 of income. The bill would also eliminate taxes on overtime pay.

The bill would give individuals an above-the-line deduction for interest on loans used to purchase American-made cars, capped at $10,000, with income phaseouts starting at $100,000 (single) and $200,000 (married filing jointly).

The bill would also raise the excise tax on the net investment income of certain private colleges, up to a maximum rate of 21% for universities with a student-adjusted endowment above $2 million.

It would also roll back some of the renewable energy provisions from the Inflation Reduction Act, including a phaseout of and restrictions on credits for clean energy facilities starting in 2029, while limiting or eliminating clean home energy and vehicle credits. According to the Tax Foundation, the bill would sunset major IRA clean electricity tax credits: the clean electricity production tax credit (45Y), the clean electricity investment tax credit (48E) and the nuclear electricity production tax credit (45U) would begin phasing out after 2028 and finish phasing out by the end of 2031. It would repeal the hydrogen production credit (45V) for facilities beginning construction after 2025, and phase out the advanced manufacturing production credit (45X) for wind energy components after 2027 and for all other eligible components after 2031.

Across several IRA clean energy credits, the bill would repeal transferability after the end of 2027 and further limit credits based on the involvement of foreign entities of concern. On the other hand, it would expand the clean fuel production credit (45Z). It would also tighten rules on the Section 162(m) limitation on deductions for executive compensation.

The bill would terminate the current Direct File program at the Internal Revenue Service and establish a public-private partnership between the IRS and private sector tax preparation services to offer free tax filing, replacing both the existing Direct File and Free File programs.  

FASAB mulls accounting impact of federal reorganization

The Federal Accounting Standards Advisory Board is asking for input on emerging accounting issues and questions related to reporting entity reorganizations and abolishments as the federal government endures wide-ranging layoffs and reductions in force, including the elimination of entire agencies by the Elon Musk-led Department of Government Efficiency.

“Federal agencies and their functions, from time to time, have been reorganized and abolished,” said FASAB in its request for information and comment.

Reorganization refers to a transfer, consolidation, coordination, authorization or abolition of one or more agencies or a part of their functions. Abolition is a type of reorganization and refers to the whole or part of an agency that, as of the effective date of the reorganization, no longer has any functions.

The Trump administration has recently moved to all but eliminate parts of the federal government such as the U.S. Agency for International Development and the Consumer Financial Protection Bureau, and earlier this month, Republicans on the House Financial Services Committee passed a bill that would transfer the responsibilities of the Public Company Accounting Oversight Board to the Securities and Exchange Commission. 

FASAB issues federal financial accounting standards and provides timely guidance. Practitioner responses to the request for information will support its efforts to identify, research and respond to emerging accounting and reporting issues related to reorganization and abolishment activities, such as transfers of assets and liabilities among federal reporting entities. The input will be used to help inform any potential staff recommendations and alternatives for FASAB to consider regarding short- and long-term actions and updates to federal accounting standards and guidance in this area.

The questions include:

  1. Have any recent or ongoing reorganization activities or events affected the scope of functions, assets, liabilities, net position, revenues, and expenses assigned to your reporting entity (or, for auditors, your auditees)? If so, please describe.
  2. What accounting issues have you (or your auditees) encountered (or do you anticipate) in connection with recent or potential reorganization activities and events?
  3. Please describe the sources of standards and guidance that you (or your auditees) are applying to recent, ongoing, or pending reorganization activities and events.
  4. Have you experienced any difficulties or identified gaps in the accounting and disclosure standards for reorganization activities and events? What potential improvements would you recommend, if any?

FASAB is asking for responses by July 15, 2025, but acknowledged that late or follow-up submissions may be necessary given the provisional nature of the request. Responses should be emailed to [email protected] with “RERA RFI response” on the subject line.
