Make data available (Technical Openness)

Open data needs to be technically open as well as legally open. Specifically, the data needs to be available in bulk in a machine-readable format.

Available
Data should be priced at no more than a reasonable cost of reproduction, preferably as a free download from the Internet. This pricing model is achievable because your agency should not incur any additional cost when it provides data for use.
In bulk
The data should be available as a complete set. If you have a register which is collected under statute, the entire register should be available for download. A web API or similar service may also be very useful, but it is not a substitute for bulk access.
In an open, machine-readable format
Re-use of data held by the public sector should not be subject to patent restrictions. More importantly, providing machine-readable formats allows for the greatest re-use. To illustrate this, consider statistics published as PDF documents, a format often used for high quality printing. While these statistics can be read by humans, they are very hard for a computer to use. This greatly limits the ability of others to re-use that data.
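
To make the contrast concrete, here is a minimal sketch in Python of how little work a machine-readable file demands. The column names and figures are invented for illustration. The same statistics locked inside a PDF would first need error-prone text extraction before any of this could run.

```python
import csv
import io

# Hypothetical statistics, as they might appear in a published CSV file.
SAMPLE_CSV = """region,year,population
North,2010,1200
South,2010,800
North,2011,1250
"""

def total_by_year(csv_text):
    """Sum the 'population' column for each year in the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["year"])
        totals[year] = totals.get(year, 0) + int(row["population"])
    return totals

print(total_by_year(SAMPLE_CSV))  # {2010: 2000, 2011: 1250}
```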

Below are a few approaches that may help:

  • Keep it simple.
  • Move fast.
  • Be pragmatic.

It is much better to release raw data today than perfect data in six months' time.

There are many different ways to make data available to others. The most natural in the Internet age is online publication. There are many variations to this model. At its most basic, agencies make their data available via their websites and a central catalog directs visitors to the appropriate source. However, there are alternatives.

When connectivity is limited or the size of the data extremely large, distribution via other formats can be warranted. This section will also discuss alternatives, which can act to keep prices very low.

Online methods

Via your existing website

The system which will be most familiar to your web content team is to provide files for download from webpages. Just as you currently provide access to discussion documents, data files are perfectly happy to be made available this way.

One difficulty with this approach is that it is very difficult for an outsider to discover where to find updated information. This option places some burden on the people creating tools with your data.

Via 3rd party sites

Many repositories have become hubs of data in particular fields. For example, pachube.com is designed to connect people with sensors to those who wish to access data from them. Sites like Infochimps.com and Talis.com allow public sector agencies to store massive quantities of data for free.

Third party sites can be very useful. The main reason for this is that they have already pooled together a community of interested people and other sets of data. When your data is part of these platforms, a type of positive compound interest is created.

Wholesale data platforms already provide the infrastructure which can support the demand. They often provide analytics and usage information. For public sector agencies, they are generally free.

These platforms can have two costs. The first is independence. Your agency needs to be able to yield control to others. This is often politically, legally or operationally difficult. The second cost may be openness. Ensure that your data platform is agnostic about who can access it. Software developers and scientists use many operating systems, from smart phones to supercomputers. They should all be able to access the data.

Via FTP servers

A less fashionable method for providing access to data is the File Transfer Protocol (FTP). This method is suitable if your audience is mainly technical, such as software developers or scientists. FTP works in place of HTTP, but is specifically suited to transferring files.

FTP has fallen out of favour. Rather than presenting a website, browsing an FTP server is much like looking through folders on a computer. Therefore, even though FTP is fit for purpose, it gives web development firms far less scope to charge for customisation.
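
For a technical audience, retrieving a file over FTP takes only a few lines. The following is a minimal sketch using Python's standard ftplib module. The function names are illustrative, and it assumes a server that accepts anonymous logins.

```python
from ftplib import FTP

def list_files(ftp, directory):
    """Return the names found in a directory on a connected FTP session."""
    return ftp.nlst(directory)

def download(ftp, remote_path, out_file):
    """Stream one remote file into a writable binary file object."""
    ftp.retrbinary(f"RETR {remote_path}", out_file.write)

def fetch_from_server(host, remote_path, local_path):
    """Connect anonymously to host and save one remote file to disk."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        with open(local_path, "wb") as out_file:
            download(ftp, remote_path, out_file)
```

A call such as fetch_from_server("ftp.example.org", "/pub/register.csv", "register.csv") would save one file locally. The host and path are placeholders, not a real service.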

As torrents

BitTorrent is a system which has become familiar to policy makers because of its association with copyright infringement. BitTorrent uses files called torrents, which work by splitting the cost of distributing files between all of the people accessing those files. Instead of servers becoming overloaded, the supply increases as the demand increases. This is the reason that this system is so successful for sharing movies. It is a wonderfully efficient way to distribute very large volumes of data.
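
The splitting works because a torrent describes a file as a sequence of fixed-size pieces, each with its own SHA-1 hash, so a downloader can collect pieces from many peers and check each one independently. Here is a minimal sketch of that idea in Python. The piece size is unrealistically small for illustration, and this is not a full .torrent encoder or a network client.

```python
import hashlib

PIECE_SIZE = 4  # bytes; real torrents use far larger pieces

def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Cut the data into consecutive fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(data, piece_size=PIECE_SIZE):
    """Compute the SHA-1 digest of every piece, in order."""
    return [hashlib.sha1(p).hexdigest() for p in split_into_pieces(data, piece_size)]

def verify_piece(piece, index, hashes):
    """Check a received piece against the published hash for its position."""
    return hashlib.sha1(piece).hexdigest() == hashes[index]

data = b"open data for all"
hashes = piece_hashes(data)          # published alongside the data
pieces = split_into_pieces(data)

# Pieces may arrive from different peers in any order; here, reversed.
received = {i: pieces[i] for i in reversed(range(len(pieces)))}
all_valid = all(verify_piece(p, i, hashes) for i, p in received.items())
rebuilt = b"".join(received[i] for i in range(len(pieces)))
print(all_valid, rebuilt == data)
```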

As an API

Data can be published via an Application Programming Interface (API). These interfaces have become very popular. They allow programmers to select specific portions of the data, rather than providing all of the data in bulk as a large file. APIs are typically connected to a database which is being updated in real-time. This means that making information available via an API can ensure that it is up to date.
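
The difference between selecting a portion and taking everything can be sketched in a few lines of Python. The register, its field names and the query format are all hypothetical. A real API would sit behind a web server and a live database rather than an in-memory list.

```python
from urllib.parse import parse_qs

# Hypothetical register; in practice this would be a live database.
REGISTER = [
    {"id": "1", "region": "North", "status": "active"},
    {"id": "2", "region": "South", "status": "active"},
    {"id": "3", "region": "North", "status": "closed"},
]

def handle_query(query_string, records=REGISTER):
    """Return only the records matching every field=value pair in the query."""
    filters = {key: values[0] for key, values in parse_qs(query_string).items()}
    return [
        record for record in records
        if all(record.get(key) == value for key, value in filters.items())
    ]

def bulk_download(records=REGISTER):
    """Return the complete set, as a bulk file would."""
    return list(records)

print(handle_query("region=North&status=active"))
print(len(bulk_download()))
```

The first call returns a single matching record, while the bulk function always returns the whole register.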

Publishing raw data in bulk should be the primary concern of every open data initiative. Providing an API carries a number of costs:

  1. The price. APIs require much more development and maintenance work than providing files.
  2. The expectations. To foster a community of users around the system, you need to provide certainty. When things go wrong, you will be expected to find the resources to fix them.

Access to bulk data ensures that:

  1. there is no dependency on the original provider of the data, meaning that if a restructure or budget cycle changes the situation, the data are still available.
  2. anyone can obtain a copy and redistribute it. This reduces the cost of distribution for the source agency and means that there is no single point of failure.
  3. others can develop their own services using the data, because they can be certain that the data will not be taken away from them.

Providing data in bulk allows others to use the data beyond its original purposes. For example, it allows it to be converted into a new format, linked with other resources, or versioned and archived in multiple places. While the latest version of the data may be made available via an API, raw data should be made available in bulk at regular intervals.
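
One way to support versioning and archiving is to publish each bulk release under a dated file name together with a checksum, so that copies held elsewhere can be verified. The sketch below is an assumption about how this might be done, not a prescribed practice. The dataset name, naming scheme and manifest fields are all hypothetical.

```python
import hashlib
import json
from datetime import date

def snapshot_name(dataset, day):
    """Build a dated file name such as 'register-2012-01-31.csv'."""
    return f"{dataset}-{day.isoformat()}.csv"

def manifest_entry(dataset, day, content):
    """Describe one bulk snapshot so that mirrors can verify their copy."""
    return {
        "file": snapshot_name(dataset, day),
        "bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }

# Hypothetical snapshot content for illustration.
content = b"id,name\n1,Example Ltd\n"
entry = manifest_entry("register", date(2012, 1, 31), content)
print(json.dumps(entry, indent=2))
```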

For example, the Eurostat statistical service has a bulk download facility offering over 4000 data files. It is updated twice a day, offers data in Tab-separated values (TSV) format, and includes documentation about the download facility as well as about the data files.

Another example is the District of Columbia Data Catalog, which allows data to be downloaded in CSV and XLS format in addition to live feeds of the data.