The Dearth of High-Quality Content Is a Great Opportunity for Building & Training Generative AI Platforms

Moses Kemibaro
2 min read · Apr 7, 2024

Let's not even get started on the fact that training large language models (LLMs) for Generative AI platforms like ChatGPT is hard, in large part because getting high-quality content to train them on is not easy.

There is also the highly questionable practice by companies like OpenAI of scraping copyrighted content from the internet for LLM training, without consent or compensation.

As things stand, it's obvious we are in the Wild Wild West moment of Generative AI, where anything goes and plenty of ambiguity exists around how all of this should work. So, for the time being, many of these activities will continue unabated.

However, the fact that there is still a dearth of high-quality content for LLM training means there is a great opportunity for organisations and individuals who create, curate and publish lots of content to build and train their own LLMs with it.

In fact, because this content will probably be domain-specific and highly structured, it could require much less training and quality assurance (QA) than unverifiable content scraped from the Internet.

As we say in computing parlance, 'GIGO', as in, 'garbage in, garbage out'. In this context, therefore, high-quality content will be a game changer for those who own it and choose either to leverage it for their own Generative AI initiatives or to license it to others who need it.
https://www.theverge.com/2024/4/6/24122915/openai-youtube-transcripts-gpt-4-training-data-google
