Data in AI

Artificial Intelligence and the Law

Maximizing human well-being in the application of artificial intelligence to law.

AI Datasets (ChatGPT, 4-13-2023)

Where do the large datasets used to train AI systems come from?

The large datasets used to train AI systems come from a variety of sources, depending on the specific application. In some cases, the organization or company developing the AI system generates the data itself. For example, a company developing a voice assistant may collect audio recordings of people speaking to use as training data for its system.

In other cases, the data may come from publicly available sources, such as government databases or open data initiatives. For example, a researcher developing an AI system to predict weather patterns may use historical weather data that is publicly available.

Additionally, some organizations and researchers use crowdsourcing to collect large amounts of data. This involves recruiting a large group of people to contribute data, for example by labeling images or providing feedback on text. Crowdsourcing can be cost-effective, but it can also be time-consuming and may require careful quality control to ensure the data is accurate.
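One common form of quality control for crowdsourced labels is to have several people label the same item and compare their answers. The Python sketch below assumes each item was labeled by multiple contributors. It takes a majority vote per item and flags items where agreement is too low for a human to review. The function name `aggregate_labels`, the 0.6 agreement threshold, and the example labels are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def aggregate_labels(labels_by_item, min_agreement=0.6):
    """Combine several contributors' labels for each item by majority vote.

    Hypothetical quality-control rule: an item is accepted only if its most
    common label received at least `min_agreement` of the votes; otherwise
    (or if it has no labels at all) it is flagged for human review.
    """
    accepted, needs_review = {}, []
    for item_id, labels in labels_by_item.items():
        if not labels:
            needs_review.append(item_id)
            continue
        # most_common(1) returns [(label, count)] for the top label
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            needs_review.append(item_id)
    return accepted, needs_review


# Hypothetical example: three images labeled by crowd workers
crowd_labels = {
    "img1": ["cat", "cat", "dog"],  # 2 of 3 agree -> accepted
    "img2": ["cat", "dog"],         # 1 of 2 agree -> flagged
    "img3": ["dog", "dog", "dog"],  # unanimous -> accepted
}
accepted, needs_review = aggregate_labels(crowd_labels)
print(accepted)      # {'img1': 'cat', 'img3': 'dog'}
print(needs_review)  # ['img2']
```

Raising the threshold trades volume for accuracy: more items go to review, but the accepted labels are ones most contributors agreed on.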

Overall, the source of the data used to train AI systems will depend on the specific application and the goals of the project. The key is to use a large and diverse dataset that accurately represents the problem or task that the AI system is being developed to address.
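One rough way to check whether a dataset is diverse is to count how often each group or category appears in it. The Python sketch below assumes each record carries a hypothetical `speaker_region` field, in the spirit of the voice assistant example above. It computes each category's share of the data and lists categories below an arbitrary 10% threshold. Balanced counts alone do not prove that a dataset accurately represents the task, so this is only one narrow check.

```python
from collections import Counter


def category_shares(records, key):
    """Return each category's fraction of the records for the given field."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def underrepresented(shares, threshold=0.1):
    """List categories whose share falls below a (hypothetical) threshold."""
    return sorted(cat for cat, share in shares.items() if share < threshold)


# Hypothetical dataset of 20 voice recordings tagged by speaker region
recordings = (
    [{"speaker_region": "North"}] * 15
    + [{"speaker_region": "South"}] * 4
    + [{"speaker_region": "West"}] * 1
)
shares = category_shares(recordings, "speaker_region")
print(shares)                    # {'North': 0.75, 'South': 0.2, 'West': 0.05}
print(underrepresented(shares))  # ['West']
```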