A Model for AI-Human Collaboration

The proliferation of artificial intelligence (AI) systems has given rise to many questions. Can AI reason? Will AI develop consciousness? Will AI become too powerful? It’s natural for any of us to wonder—and even fear—what kind of world we might be making for ourselves with these machines. As I see it, AI has its limitations, but with the right kind of human intervention, it can help us accomplish certain tasks with unprecedented speed and accuracy.

The combination of powerful computers and self-correcting, self-teaching algorithms has yielded AI systems that can now compete with and even substantially outperform human experts. One example is IBM’s Deep Blue system, which defeated world chess champion Garry Kasparov under tournament conditions in 1997. With access to data from its previous matches against Kasparov, Deep Blue could abandon weak strategies and avoid past mistakes, whereas Kasparov, like any human, could not.

Go, perhaps the world’s oldest board game, is orders of magnitude more complex than chess. Not until 2016 was an AI system, AlphaGo (developed by Google DeepMind), able to defeat world Go champion Lee Sedol, winning four of the five games played. AlphaGo won by analyzing an enormous library of Go data, including tournament games between human Go masters and every game AlphaGo had played against itself or against human masters.

If it seems computers have advanced rapidly, they have, and for good reasons.

Moore’s Law
Moore’s Law is the observation that the number of transistors in an integrated circuit roughly doubles every two years. Moore’s Law explains why computing power has risen exponentially while the price of computers has dropped exponentially.
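
Expressed as a formula, the idealized version of Moore’s Law multiplies a starting transistor count by 2 for every two years that pass. Here is a minimal sketch; the starting figure is the IBM 7090’s roughly 50,000 transistors mentioned below, and real chips deviate from this idealization.

```python
# Idealized Moore's Law: the transistor count doubles every two years.
def projected_transistors(start_count, years):
    return start_count * 2 ** (years / 2)

# Ten doublings over 20 years is roughly a 1,000-fold increase (2**10 = 1,024).
print(f"{projected_transistors(50_000, 20):,.0f}")  # -> 51,200,000
```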

Back in the early 1960s, during my early teens, I wrote and ran computer programs on an IBM 7090 computer (see figure 1), one of the first fully transistorized computers. It contained about 50,000 transistors and cost over $3 million (over $30 million in today’s dollars). Apple laptop computers today contain up to 67 billion transistors and sell for about $1,000. Supercomputers built since 2016 include over 12 quadrillion transistors.

Figure 1: IBM 7090 Computer, One of the First Fully Transistorized Computers
Credit: NASA

Moore’s Law made AI possible. It enabled the development of computer hardware and software algorithms that can perform certain tasks and calculations with a speed and accuracy far surpassing what any individual human can achieve.

AI Achievements
The amazing successes of AI systems can be attributed solely to associative learning. Associative learning is a data mining and “learning” technique that seeks out important relationships among variables or features in a data set. Examples are AI algorithms that count the frequency of co-occurrences, or associations, across a large collection of items or actions. Retailers use such algorithms to determine which products are best placed together in a store or online catalog to maximize sales.
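
As a toy illustration of the retail example, the sketch below counts how often pairs of items appear together in purchase records. The item names and transactions are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase records; each inner list is one customer's basket.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["coffee", "butter", "bread"],
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# The most frequent pairs suggest which products to place together.
print(pair_counts.most_common(3))
# [(('bread', 'butter'), 3), (('bread', 'jam'), 2), ...]
```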

Other examples include computer programs that learn from their mistakes. Google Translate does this. Four years ago, a physicist in Ukraine began asking me questions on my Facebook page. I used Google Translate to convert his Ukrainian language questions into English. Similarly, he used Google Translate to convert my English language answers into Ukrainian. The first several exchanges were quite frustrating for both of us. However, we observed that Google Translate’s capabilities steadily improved as we continued. It acquired more data and corrected its mistakes. Today, it’s as if we are both speaking the same language.

Deep AI identifies previously unobserved correlations in databases. By applying a measure of “interestingness,” it generates association rules to guide new searches. To be effective, deep AI systems need to be fed large, diverse databases. When deep AI appears to be making a new discovery or generating a new database, the “new discovery” or “new database” is actually derived from analysis of existing databases. Deep AI explores uncharted or previously unanalyzed databases.
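
One common “interestingness” measure is lift: how much more often two items occur together than chance alone would predict. Here is a minimal sketch using made-up counts.

```python
# Made-up counts from a hypothetical database of 1,000 transactions.
total = 1000
count_a = 200   # transactions containing item A
count_b = 300   # transactions containing item B
count_ab = 90   # transactions containing both A and B

support = count_ab / total                 # how common the pair is
confidence = count_ab / count_a            # P(B | A)
lift = confidence / (count_b / total)      # "interestingness": >1 means A and B
                                           # co-occur more often than chance
print(support, confidence, round(lift, 2))  # 0.09 0.45 1.5
```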

AI’s Limitations
AI is 100% dependent on the databases loaded into its systems. It can merge or otherwise “massage” these databases to create what may appear to be a new database, but it can’t actually generate new data, or a new database. 

People have dreamed about creating autonomous AI systems. However, AI is entirely dependent on humans to feed it data or to provide the hardware—such as digital cameras—that feed it data.

AI becomes competent only after humans provide it with relevant data within a constrained range. AI can sort through unevaluated data to find relevant data, but only at great cost. Some of the more remarkable achievements of AI have required more than 10 gigawatt-hours of electricity, roughly the total annual electricity consumption of 1,000 typical U.S. households. For the sake of comparison, the human brain can analyze a substantial body of data while running on only about 20 watts of power.
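
A quick back-of-the-envelope check of that comparison, assuming a typical U.S. household uses roughly 10,000 kilowatt-hours of electricity per year (that figure is an assumption, not taken from the studies cited here):

```python
ai_energy_kwh = 10e6             # 10 gigawatt-hours expressed in kilowatt-hours
household_kwh_per_year = 10_000  # assumed annual electricity use of a typical U.S. household

print(ai_energy_kwh / household_kwh_per_year)  # 1000.0 household-years of electricity
```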

AI can churn out fluent overviews and reviews on almost any topic, and yet it falters miserably on mathematical queries that require reasoning. For example, the celebrated AI darling of today, ChatGPT, scored just 26% on a simple freshman-level high school math exam.

A more powerful AI system, Minerva, did better, scoring 50% on simple high school math problems. However, Minerva is three times ChatGPT’s size, and it was fed a carefully curated diet of data: millions of mathematics papers, books, and web articles that it sorted through to estimate the most likely correct answers. Minerva’s success also depended on human help, namely prefacing math questions with relevant sample questions and correct solutions that included all the reasoning steps.
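
That kind of human help, placing worked examples ahead of the new question, looks roughly like the hypothetical prompt below. The questions and wording are illustrative, not taken from Minerva’s actual training data.

```python
# A hypothetical "few-shot" prompt: solved examples, with their reasoning steps
# spelled out, are placed ahead of the question the model is asked to answer.
prompt = """\
Q: A train travels 120 miles in 2 hours. What is its average speed?
A: Average speed = distance / time = 120 / 2 = 60 miles per hour.

Q: A rectangle is 3 cm wide and 5 cm long. What is its area?
A: Area = width * length = 3 * 5 = 15 square centimeters.

Q: A car uses 4 gallons of fuel to travel 100 miles. How far can it travel on 10 gallons?
A:"""
```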

Minerva only appears to reason; it applies templates it encountered previously. Even its 50% score required a month of training. Minerva, like all other AI systems, displayed no actual reasoning power.

While extremely large AI systems, like Minerva, are getting better at answering questions within the scope of their training data, they still lack the ability to answer entirely new questions. AI cannot make sense of something it has not seen before.

AI-Human Collaboration
Clearly, AI systems can do certain tasks that humans cannot, and they can do things at much greater speed and with fewer errors than humans can. On the other hand, humans can do certain things that AI systems cannot, and we can do certain things much faster, more reliably, and with much less energy consumption than AI. Therefore, it makes sense for humans to collaborate with AI. 

To determine the best and most efficient ways for humans and AI to collaborate, a team of nine computer scientists at the Lam Research Corporation in Fremont, California, conducted a set of experiments on semiconductor process development.1

The Lam research team had computer design engineers compete with AI in a virtual test to develop an efficient process for manufacturing computer chips. The test’s goal was to minimize the cost of producing a computer chip with specific target characteristics. The Lam team first had nine humans compete with one another: three senior engineers, three junior engineers, and three engineers with no relevant experience. As expected, the most experienced and most capable of the three senior engineers won the competition.

The Lam team then created several AI computer chip design players (AI algorithms that simulated human players). These AI players all used Bayesian optimization to search huge databases of possible computer chip processes. In only 13 out of 300 attempts did the AI players do better than the winning senior engineer, and then only slightly better. The Lam team concluded that the AI players’ lack of expert knowledge caused them to waste time and resources exploring huge numbers of candidate processes that had no chance of success.
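
For readers unfamiliar with the technique, here is a minimal, hypothetical sketch of Bayesian optimization using the scikit-optimize library. The cost function is a made-up stand-in for a chip-process simulator, not the one used in the Lam study.

```python
# pip install scikit-optimize
from skopt import gp_minimize

# Made-up stand-in for a process simulator: two process "knobs" (scaled 0-1)
# map to a cost that the optimizer tries to minimize.
def process_cost(params):
    flow, power = params
    return (flow - 0.3) ** 2 + (power - 0.7) ** 2

# Bayesian optimization fits a statistical surrogate model to the evaluations
# made so far and uses it to pick the next, most promising settings to try.
result = gp_minimize(
    process_cost,
    dimensions=[(0.0, 1.0), (0.0, 1.0)],  # search the full parameter range
    n_calls=30,                           # a fixed evaluation budget
    random_state=0,
)
print(result.x, result.fun)  # best settings found and their simulated cost
```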

The team’s conclusion prompted them to try a “human-first, AI-system-second” strategy. In this approach, a human expert constrained data searches before handing over the task to the AI system. Lam’s team achieved the best and least costly result by optimizing the timing of the transfer from the human expert to the AI system.
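
A rough way to picture that “human-first, AI-system-second” handoff, continuing the hypothetical sketch above, is to let the expert shrink the search space before the optimizer takes over. The only change is the narrower bounds.

```python
from skopt import gp_minimize

def process_cost(params):  # same made-up stand-in simulator as above
    flow, power = params
    return (flow - 0.3) ** 2 + (power - 0.7) ** 2

# Expert-constrained bounds: the human has already ruled out regions of the
# parameter space known from experience to produce unusable chips.
expert_bounds = [(0.2, 0.4), (0.6, 0.8)]

result = gp_minimize(process_cost, dimensions=expert_bounds, n_calls=30, random_state=0)
print(result.x, result.fun)  # same budget, now spent only on plausible settings
```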

Human-ChatGPT Collaboration 
After reading the paper published by the Lam Research Corporation team, I did my own experiment in human-AI collaboration. My goal was to take technical content and rewrite it at a junior-high reading level without sacrificing scientific accuracy. The typical strategy brings together a science scholar with an editor or journalist who has a demonstrated talent for writing science material for young readers. Several publications, including a few resources produced by RTB scholars and editors, have shown at least some degree of success with this kind of collaboration.

Last week, I decided to recruit a new collaborator, ChatGPT. I took my 4,000-word article on the possibility of life elsewhere in the universe, deleted the endnote citations, and found that the resulting 3,200-word article scored a Flesch-Kincaid reading level of grade 17. I fed ChatGPT this 3,200-word article and instructed it to rewrite the article for an eighth-grade student.

The result was unusable. AI did achieve the reading level goal, but nearly all the science in the article was mangled. To correct all the scientific errors would have taken me more time than rewriting the entire article myself. Nevertheless, the ChatGPT version showed me what a grade 8 reading level looks like.

So, I returned to the original 3,200-word article and worked on shortening the sentences, replacing technical vocabulary with simpler vocabulary, and injecting some idioms, illustrations, and humor. In less than two hours, I had brought the article down to a grade 12 reading level. However, the word count had bumped up to nearly 3,600 words.

Next, I fed this 3,600-word article to ChatGPT, again instructing it to make it readable at a grade 8 level. The result was an improvement over the initial attempt, but many scientific errors persisted. What I noted, though, was that most of the errors seemed to arise from ChatGPT going off on irrelevant tangents. So, on the next attempt, I fed ChatGPT the 3,600-word article in bite-sized chunks. The smallest chunk was a single paragraph; the largest was four paragraphs. I chose each chunk’s size based on its relevance to a specific subtopic.
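
In my experiment I grouped the paragraphs by subtopic by hand, but the basic mechanics look something like the sketch below, where a simple cap of four paragraphs per chunk stands in for that judgment call.

```python
def chunk_paragraphs(article_text, max_paragraphs=4):
    """Split an article on blank lines and group its paragraphs into small chunks."""
    paragraphs = [p.strip() for p in article_text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for paragraph in paragraphs:
        current.append(paragraph)
        if len(current) == max_paragraphs:
            chunks.append("\n\n".join(current))
            current = []
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Each chunk is then given to ChatGPT with the same instruction:
# "Rewrite this passage for a grade 8 reading level without changing the science."
```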

At this point, I was pleased with the outcome. Only a few scientific errors remained, and they were much less problematic. It became a relatively straightforward and efficient task for me to correct the errors without sacrificing readability. 

While ChatGPT rated the final readability level as grade 8, the Flesch-Kincaid rating came in at grade 9.5. Obviously, differences exist between the ChatGPT and Flesch-Kincaid approaches to scoring readability. ChatGPT focuses more on vocabulary level and how sentences sound when read aloud. Flesch-Kincaid is a fixed formula based on average sentence length (words per sentence) and average word length (syllables per word).
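
For reference, the Flesch-Kincaid grade level is 0.39 x (words per sentence) + 11.8 x (syllables per word) - 15.59. Here is a minimal sketch with a crude syllable estimate; real readability tools count syllables more carefully.

```python
import re

def flesch_kincaid_grade(text):
    """0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    # Crude syllable estimate: count runs of vowels in each word (minimum of one).
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(round(flesch_kincaid_grade("The cat sat on the mat. It purred softly."), 1))
```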

I took my experiment one step further by feeding the grade 8 level article I had corrected to ChatGPT with the instruction to rewrite it for a grade 7 student. The new version came out with only a few easy-to-fix scientific errors. The final result was a readable 4,400-word article containing nearly all the content of the original 3,200-word paper.

Through one day’s experimentation, I was able to develop a significantly beneficial strategy for ongoing collaboration with ChatGPT. The next time I want to transform one of my technically written pieces into something that can be understood at an eighth-grade reading level, I can do so in a way that makes efficient use of both my time and ChatGPT’s capabilities. My experimental result aligns closely with the conclusions of the Lam Research Corporation team on how to optimize human-AI time and resources.

AI and the Case for Human Exceptionalism
The Lam Research Corporation studies and my personal experiment with AI have something to say about human exceptionalism and the dual (body-mind) nature of human beings—two core biblical doctrines. AI can perform calculations and analyze vast, diverse databases at blinding speed. However, AI cannot reason. It has no mind. It is not conscious, much less spiritual. 

Like a parrot, AI can be trained to communicate in a way that, in many respects, mimics human communication. But this resemblance is entirely dependent on human input and training. Even then, what AI produces mimics only some of the features of human communication.

I have nearly 50 books in my office written by scientists and philosophers who have tackled the problem of explaining human consciousness in material terms. The books written by nontheists all insist that there must be some naturalistic explanation for human consciousness. However, to date, researchers have found not the slightest clue or hint as to what the explanation could be. AI seems to drive home the point rather effectively that physics and chemistry alone cannot explain human consciousness. As the Bible declared thousands of years ago when God created humans in his image (Genesis 1:26–27), that image is spiritual, not physical.   

Endnote
1. Keren J. Kanarik et al., “Human-Machine Collaboration for Improving Semiconductor Process Development,” Nature 616 (March 8, 2023): 707–711, doi:10.1038/s41586-023-05773-7.