Attention, please. The session on the technology and business trends and strategies of AI companies, domestically and abroad, will begin shortly at 11:30 a.m., so please take your seats.
To keep the event running smoothly, please turn off your cell phones or switch them to vibrate mode.
I will now begin the second presentation of CTC20242.
Please welcome Vice President Hwang Min-young, co-founder of Selectstar, the company that collects and processes AI training data.
Please give him a big round of applause.
Can you hear me?
Hello. I'm Hwang Min-young from Selectstar, an AI training data platform. I'm speaking today at a very interesting event, and I'd like to talk briefly about current trends in LLMs and AI training data.
First, a quick introduction to the company, as a bit of an icebreaker. In AI development these days, the models themselves are increasingly something we simply use. There's OpenAI's GPT, Claude, Naver's HyperCLOVA, and all sorts of models are available, so what matters more and more is which data you feed your AI to make it smarter. We handle everything related to that data. We have about 2,300 clients, both domestic and overseas, and we have received 17.4 billion won in cumulative investment.
We're the company responsible for AI training data in Korea. We cover planning, building, and selling data, and even AI operations, that is, how AI is actually going to be introduced, so we have a lot of references. Let's cut to the chase. The arrival of ChatGPT set off a major earthquake in the AI industry. It has been out for more than a year and a half now, and it has been a complete game changer in the AI market.
We have a lot of clients, not just companies like Naver and Samsung but also many startups. Many of the startups we used to call AI companies have given up on AI development. Some are closing down because they defaulted on their debts, and some that received more than ten billion won in investment have decided not to develop AI models anymore. Some startups have started declaring that they will just use existing models.
So every time OpenAI announces something, the AI market changes dramatically, and stock prices swing up and down because AI is treated as a theme stock. Some even say that every time OpenAI announces something, AI startups die. They're facing rough waves from outside. And more and more, it's getting easier to develop AI. Companies like ours have been helping clients integrate AI, but Google and OpenAI have now introduced platforms where you can develop AI without coding, such as Google's Vertex AI or OpenAI's GPTs.
After my parents retired, I taught them for about 30 minutes, and they made a chatbot. It's now very easy to take your own data and build a chatbot or something like that.
So even the startups that help companies introduce AI are having a hard time. I think this is a moment similar to the iPhone moment, when the iPhone was first introduced.
In the late 2000s and early 2010s, the ability to develop apps was a competitive weapon, but in 2023, if I asked for investment on that basis, people would think I'm a weirdo. In the late 1990s and early 2000s, there was the term "internet company," but it no longer exists. In the late 2000s and early 2010s, there was the term "mobile company," and that no longer exists in 2023 either. Now there's the term "AI company," said to change our lives, but I believe the term "AI company" can also disappear very soon.
Last year, we held an event with Professor Andrew Ng, one of the pioneers of AI. If you're interested in AI, you've probably seen his face a lot. He said that existing AI based on supervised learning has already changed the world a great deal, and that generative AI and LLMs are changing it even more. They've turned the world upside down, and even though it's such a big issue, he said that only a small part of the impact has happened so far. So I'm very excited to see what kind of impact they'll have from now on, but I'm also scared.
So, what am I trying to say?
It's not about AI technology itself; at the end of the day, it's about the business model. When we asked Ha Jung-woo, head of Naver's Future AI Center, how AI startups can survive on AI technology in this day and age, he said that AI technology is something you can simply take and use.
He said it's more important to create a data pipeline for your own unique value. I agree 100 percent. AI technology itself, whether it's OpenAI or any other model, HyperCLOVA or Llama, can just be used, and I think it's time to go back to basics and start focusing on business models again. So...
At the end of the day, this applies not just to startups like us but also to big companies. When Naver announced HyperCLOVA X last year, the focus was on showing how it creates value in areas such as finance, law, and education.
As for Google, I met a team that does development at Google.
What was funny was that OpenAI's ChatGPT had pushed them aside. I looked at what they were doing, and they were creating an AI that could easily build mobile apps, homepages, and other services. So in the end, even the makers of large language models are turning into platforms, like OpenAI. The first round of the game is pretty much over, and now we're moving on to the next step: what value are we going to create? So I want to emphasize again that this is essentially about the business model.
So, you know, AI learns from data. For example, to build an AI that can tell the difference between a dog and a cat, you show the model thousands of photos of dogs and thousands of photos of cats, and then ask whether a new photo is a cat or a puppy. So the AI learns through the data.
If bad data goes in, you get bad AI. One of the big trends right now is that everything is being taken over by the big players. In computer hardware chips, for example, Nvidia, Google, AMD, and Intel are doing all of that. The same goes for cloud computing: Google, Amazon's AWS, Microsoft's Azure. And in AI models, ChatGPT and Claude are taking it all. And now the application layer is open too.
Just as OpenAI opened the GPT Store, they've moved into the application layer as well. So at the end of the day, the only space that's still blank is data.
For example, OpenAI can collect data from all around the world and make a parenting-counseling chatbot.
I'm sure it can make something for my child.
But it can't make a Dr. Oh counseling chatbot. Because
I can.
Yes, Dr. Oh Eun-young. That data is what Dr. Oh has collected over decades. So in the end, you need to build something with your own unique data. My company helps introduce AI and handles all the data for you. If you're running a B2B AI business, or you're one of the various AI startups, what kind of data pipeline could you create for yourself? How could you create a pipeline that produces your own unique value? I think it's important to think about this.
In the case of GPT, there was a quantum jump in the data. GPT-1, GPT-2, and GPT-3 were around long before ChatGPT.
When GPT-3 came out, they just poured in tons of data and improved the performance. Out of about 500 billion tokens, 300 billion tokens were used for training through sampling. So GPT-1, GPT-2, and GPT-3 work on pretty much the same principles. That hasn't changed; performance improved because the amount of data was pushed up really, really far, to the point where it's learning nearly all the data in the world.
So I'd like to briefly explain the trends in AI introduction. Anyway, like I said, artificial intelligence learns from data, how we use that data has become more important, and so has creating your own unique data. In this day and age, you can create your own AI model directly from your data, and you can also use OpenAI's models with fine-tuning, so I think it's important to think about how to leverage the data.
You can also build your own smaller large language model. You can use something like Llama 2 to train your own AI model. There have been a lot of approaches like that. But most of these methods are really expensive. It's still very expensive to use GPT, and it's very expensive to create your own AI model.
The AI startups we all know that are building their own models are spending about two billion won a month on GPU training. It's very expensive to introduce AI. So whether you use OpenAI, Naver, SK Telecom, or whatever, at the end of the day the most efficient way is to leverage these existing LLMs. I'm just going to briefly explain what RAG is.
These LLMs have an input limit. For example, if we wanted a chatbot for our company, say a chatbot that analyzes our company's code, it would be difficult to use an LLM right away, because there's a limit to the amount of text you can put into an LLM at inference time.
So you use a data retriever together with it.
You've all heard of this phenomenon. It's called hallucination, where the model talks nonsense. For example, when I ask an LLM like GPT to explain why King Sejong threw his MacBook Pro in a fit of rage, it happily answers: King Sejong the Great got angry and threw a MacBook Pro at a morning meeting. The reason lies in the LLM's principle itself: given the text so far, it picks whatever has the highest probability of coming next, then the next piece after that, and the next, one after another. If you compare it to a person, it's just a brain producing words.
It is designed to generate a plausible-sounding answer.
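The "most probable next piece of text" principle described above can be sketched in a few lines. This is only a toy illustration: the probability table and the `generate` helper are made up for this example and are not how any real LLM stores its knowledge. The point is that the loop optimizes for likelihood, not truth.

```python
# Toy next-token table. All probabilities are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "King": {"Sejong": 0.9, "Arthur": 0.1},
    "Sejong": {"threw": 0.6, "created": 0.4},
    "threw": {"a": 1.0},
    "a": {"MacBook": 0.7, "stone": 0.3},
    "MacBook": {"<end>": 1.0},
}


def generate(prompt_token: str, max_tokens: int = 10) -> list[str]:
    """Greedy decoding: always append the most probable next token."""
    tokens = [prompt_token]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # nothing known about what follows this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate("King")))
```

Nothing in the loop checks whether the resulting sentence is factually possible, which is one way to picture why a fluent but false answer can come out.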
So, the two problems I mentioned earlier, that there's a limit to how much data you can put in at once and that the model hallucinates, can both be addressed by leveraging an external database. Say a user asks, "Tell me about the incident where King Sejong threw a MacBook Pro," and there's an external database here called the Sejong history database. First you search that database for a MacBook Pro in the Sejong period. Then you tell the LLM: a user asked me to explain the incident where King Sejong threw a MacBook Pro, and here is what the Annals of King Sejong say, so combine these two pieces of information and come up with the most appropriate answer.
The LLM then answers that Sejong the Great never threw a MacBook Pro. It combines all of that information to create an answer. You retrieve from the database with a data retriever, augment the prompt with what you retrieved, and have the LLM generate the answer, so it's called RAG, retrieval-augmented generation. Until recently, companies introducing AI tried to build their own LLM or sLLM.
These days, the most common way is simply to take GPT. So one of the trends in AI these days is structuring documents from various databases so that the data closest to what users are searching for can be selected quickly.
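The retrieve, augment, generate flow described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the two-sentence `DOCUMENTS` list stands in for the "Sejong history database," the keyword-overlap `retrieve` function stands in for a real search system, and the final call to an LLM is left out, so the sketch stops at the augmented prompt.

```python
# Hypothetical stand-in for the "Sejong history database" from the talk.
DOCUMENTS = [
    "King Sejong promulgated the Korean alphabet, Hunminjeongeum, in 1446.",
    "King Sejong reigned over Joseon from 1418 to 1450.",
]


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the user's question with the retrieved records."""
    context = "\n".join(f"- {p}" for p in passages) or "- (no matching records)"
    return (
        "Answer using only the records below. "
        "If the records do not mention the event, say it did not happen.\n"
        f"Records:\n{context}\n"
        f"Question: {question}"
    )


question = "Why did King Sejong throw a MacBook Pro?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

A production system would normally replace the keyword overlap with embedding-based search, but the shape of the pipeline stays the same: search first, then hand the model both the question and the evidence.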
There are many variables to consider in practice: the logic of your search system, how consistent the formats of existing documents are, what data structures and resources are available for training, and how to align the format of documents created in the future so that structuring stays easy. These are among the many points to consider in data structuring. If you don't structure the data, then, as I said before, the LLM itself can only take in text up to a certain size.
It's not going to be able to take in all that material and generate the answers.
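One common way to work within that size limit is to split long documents into smaller overlapping pieces before indexing them for retrieval. The sketch below is a minimal version under stated assumptions: it counts words rather than real model tokens, and the `chunk_text` name and its default sizes are hypothetical choices, not values from the talk.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that overlap, so a thought cut at one
    chunk boundary still appears whole in the neighbouring chunk."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Usage: a 120-word stand-in document split with the default settings.
document = " ".join(f"w{i}" for i in range(120))
for chunk in chunk_text(document):
    print(len(chunk.split()), chunk.split()[0], "...", chunk.split()[-1])
```

Word counts are only a rough proxy for tokens; a real pipeline would measure length with the tokenizer of the model it targets.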
And that's how it does its job. Companies are introducing a lot of AI right now, but the hallucination problem hasn't been completely resolved yet, so projects are starting with internal employees rather than customers. For example, we make chatbots for company regulations, or for bank tellers.
A teller gathers information from a customer, such as income level, monthly income, and assets, and calls headquarters to ask, "What kind of product should I recommend to my client?" That's how it has been done at the bank. The idea is to replace this with a chatbot that employees can use when they're dealing with customers.
The second trend is this. Until last year, companies were considering building their own LLM by fine-tuning an sLLM, and they realized that was very expensive. So now they run a PoC with OpenAI's GPT first, and once it's verified, they move step by step toward building their own LLM. And the third one is structuring the data so that it can be used as the RAG data I mentioned earlier.
How do I use GPT with the database that I have? That's how AI is being used a lot. And the fourth one is evaluation, which we'll talk about a little more now.
It's about evaluating AI. Responsible AI is a keyword that's emerging these days. Responsible AI literally means AI that is accountable.
Until 2023, AI was mostly at the PoC stage, and from late 2023 into 2024 AI is being introduced in earnest. Before, it was enough to work hard on developing AI technology, but now it's a real problem if the AI I introduce talks nonsense, says something racist, or says something sexist. And because we don't know what's going to happen,
there are bodies like the Artificial Intelligence Ethics Association, the White House has created AI principles, Google has created its AI ethics principles, and in Korea Naver has created its own ethical guidelines. But up until now,
ethics principles were an area of self-regulation for companies, and now they're becoming law. The EU created the first AI law, the AI Act. The whole world is clearly focused on this. First, if you want to do global business, you have to follow the EU AI Act, and second, in Korea,
because that regulatory law came out first, Korean legislation is taking a lot of reference from it. If you want to know why we should care about AI laws, this graph shows the hype cycle for AI released that same year. The higher the point on the curve, the higher the expectations for AI and the more money and labor are flowing in. This axis represents time. So, the most...
It's at the peak, obviously, though you may not be able to see it well from there: this is generative AI. This one is smart robots, but that's an area I don't know much about, so I'll skip it, and this is foundation models, the models underlying generative AI. The next most important thing we're looking at here is responsible AI. Can you see it on the slide? It's hard to see, but I think responsible AI is the most important thing.
What are global companies thinking about before they introduce AI? As I said at the beginning of the lecture, AI is not about technology, it's about what value it creates. So I assumed the top priority for global companies before introducing AI would be whether it would be useful, but it turns out their concern is that once they introduce AI, it might cause ethical problems and they would be heavily criticized. And with multiple countries
introducing these laws, they're wondering whether it's going to become a legal problem for them. When similar incidents have happened, companies have been hit hard. I'm sure many of you know about the Iruda incident, and if something like that happens, it can cause a company to tumble.
That's why I'd like to emphasize one more thing. The last point I'd like to emphasize to companies that are already introducing AI into their B2B business is contractual commitments.
The trend is to require, in the contract, a commitment to responsible AI, and to state contractually that you are building responsible AI. In Korea, SK Telecom talks about ESG management and, indirectly, about being responsible with AI. So I
looked at various companies and bodies around the world, such as Samsung, Naver, and the EU AI Act, and put together a set of common principles for developing and using AI. This is no longer something we take care of independently as a matter of ethics; from now on, it moves into regulation and legislation.
To put it simply: AI for humans is a given. The purpose of AI should be to serve people. Then diversity, fairness, and inclusion: it can't discriminate by race, gender, or religion. Then respect for human rights: when we develop AI, people label the data and the AI learns from it, and this has been a hot topic for a while.
It was reported that data workers in South America suffered psychological stress and exploitation. So in the process of developing AI, you have to respect people's rights, and by rights I also mean, more narrowly, things like copyright. If someone has worked hard to write a book, an AI shouldn't study it and print out the exact same book. You can't violate people's rights by using AI. That's one way to look at it. Then it needs to be safe, responsible, and secure. For example,
when the GPT Store first came out, say I uploaded a confidential file to the GPT Store and made a chatbot with it. We need to maintain safety, accountability, and security, and also accuracy, meaning it answers correctly. The bottom three are more interesting. Transparency and explainability mean that AI should be able to explain why it's giving the answers it gives.
As I explained earlier, an LLM is a system that emits the most likely next piece of text, one piece at a time. It's really hard to explain why an AI actually came up with a given answer. About the only explanation I can give is that, because it learned from this data, it's likely to generate this answer. That's about it. The next item goes together with this one: human oversight and substitution.
AI was introduced to reduce labor. But when the AI gives me an answer I can't accept, I want a human to respond, not an AI. That's the interesting part. The best example I can give you is the AI interview.
Say an AI interviewed me and rejected me. It said I was out, that I had failed. Then the AI should explain why I was rejected. And if I can't accept the result, if a person raises their hand and asks for an explanation, then a human should be able to come and explain. So even when AI is introduced, the one who oversees and manages it should be a person.
At the very end, there's sustainability: when we introduce and use AI,
we've got to think about the environment as well. Training models like GPT-4 or GPT-5 uses up a lot of computer chips and so much energy that there was an image of snow melted around a data center. The fact that this heat can be linked to environmental issues is something we need to take into account in the development of AI. So
some companies, when they bring in chips, refer to RE100: they say they will only use chips from suppliers that run on 100 percent renewable energy. These principles for developing and using AI may feel quite far from me. I'm busy just developing an AI right now. But I'd like to tell you that the time has come when regulations and laws actually apply, rather than this being only something to worry about.
In the past, it wasn't a big problem if an AI called a cat a dog. But now I need to verify objectively that my AI is functioning normally, that it doesn't cause ethical problems, and that it doesn't hallucinate, against the AI ethics principles I explained earlier, before I apply it. Our client companies have various needs regarding this too. When they introduce and use AI,
how can they be sure that it won't cause a problem?
According to the leaderboards, everyone says their model is better than GPT. This one is above GPT, that one is above GPT, every model claims to beat GPT. But going by feel, there's not yet an AI model that beats OpenAI's.
So a lot of research is needed on how to quantify and evaluate how well a model really works. And even if a model ranks below GPT, what matters is which AI model suits my business: HyperCLOVA X may be far more important for my business, and I need a benchmark dataset for my business that lets me quantify this. So...
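The idea of a business-specific benchmark can be sketched as a small evaluation loop. All names and data below are hypothetical: `BENCHMARK` is a made-up two-question set, and `model_a` and `model_b` are plain functions standing in for real LLM API calls. Keyword matching is also a deliberately crude scoring rule, chosen only to keep the sketch short.

```python
from typing import Callable

# Hypothetical in-house benchmark: each item pairs a question with a keyword
# that a correct answer is expected to contain. The content is made up.
BENCHMARK = [
    {"question": "How many days of annual leave do new hires get?", "expected": "15"},
    {"question": "Which form is needed for expense claims?", "expected": "form b"},
]


def evaluate(model: Callable[[str], str], benchmark: list[dict]) -> float:
    """Return the fraction of questions whose answer contains the expected keyword."""
    if not benchmark:
        return 0.0
    hits = sum(
        1 for item in benchmark
        if item["expected"].lower() in model(item["question"]).lower()
    )
    return hits / len(benchmark)


# Stand-in "models": simple functions instead of real LLM calls.
def model_a(question: str) -> str:
    return "New hires get 15 days." if "leave" in question else "I am not sure."


def model_b(question: str) -> str:
    return "Please ask HR."


scores = {
    name: evaluate(fn, BENCHMARK)
    for name, fn in [("model_a", model_a), ("model_b", model_b)]
}
print(scores)
```

Running every candidate model through the same domain questions gives a number that reflects this particular business, which a general leaderboard ranking does not.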
We've talked about a lot today, but what's the flow for introducing an LLM? First you review the introduction of the LLM and test it, then decide on the AI model, then structure the data for RAG as I explained earlier, build the LLM-based model, check further down the pipeline that it's really doing what we want it to do, and finally apply it.
I'm just going to briefly go over the LLM introduction examples, starting with the one I mentioned earlier, from a certain financial group. When their staff sell products to customers, they use this as a tool to help decide which products to recommend. So far, B2C, direct interaction with customers, is something companies are still very cautious about.
For the AI model, we just used GPT. The financial company's internal data was poorly managed, so introducing AI was hard: for example, documents had been scanned into images and stored as PDFs, or saved as PPT or Hangul (HWP) files, and we had to structure them into a form that could be searched and passed to the LLM. And we couldn't objectively evaluate how accurate the AI model was with a traditional leaderboard benchmark,
because that isn't about finance. We needed a benchmark suited to the company to evaluate the AI model. We went in and helped a lot with the introduction. There's also a case where a fashion brand introduced GPT. They wanted a chatbot for internal employees that could answer HR questions like, "Hey, how does this work?"
The AI model was GPT, and since the model didn't have enough knowledge about the company, we even had to build a PoC for them.
So now, we think that AI itself has entered a new phase. At the end of the day, it's all about how well you structure your data and how well you evaluate AI models to make good use of them. So the key is data-centric AI. As a company that specializes in data-centric AI, if you have any concerns about introducing AI, we will do our best to help. Yes, we finished on time, in 30 minutes. Thank you.
Thank you, Mr. Hwang.
Let's give him another round of applause.
If you'd like to participate in the King of Review event, scan the QR code on the screen and submit your photo and message on our website.
If you've written reviews for previous sessions, make sure to note it, as that counts as a double entry.
The announcement will begin at 1 p.m.
We would like all participants to feel free to join us here after eating.
We will see you shortly.