Dealing with publishers that cover multiple funders


#1

Currently, the standard doesn’t distinguish between a publisher and a funder, but there can be situations where the publisher is not the funder. For example, the Cabinet Office can publish grants data for the Ministry of Justice.

Edafe came up with the following suggestions:

Publisher:Identifier
The identifier for the publisher of this grant. Where the publisher is not the same organisation as the funder, it is important to provide this information to users of your data. This promotes trust and confidence in the data and provides an essential link to its provenance, i.e. where the data came from. The Organisation Identifier Standard guidance explains how to create this ID, based either on a known company or charity number, or on identifiers held in the grant-maker’s internal systems.

Publisher:Name
The name of the publisher of this grant. When you provide a Publisher:Identifier, also provide a publisher name to show who the publisher of the grant is. Where the publisher is not the same organisation as the funder, it is important to provide this information to users of your data. This promotes trust and confidence in the data and provides an essential link to its provenance, i.e. where the data came from.

Advice to publishers could be to provide the above two and the data source.
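
To make the proposal concrete, here is a rough Python sketch of how a flat grant row might carry the two fields, and how a data user could check them. The `Publisher:Identifier` and `Publisher:Name` headings come from the suggestion above. The `Funding Org:Identifier` heading, the helper names, and every identifier value are assumptions for illustration only; none of them are real organisation identifiers.

```python
def _clean(row, field):
    """Return the field's value as a stripped string ('' if absent or None)."""
    return (row.get(field) or "").strip()


def publisher_differs_from_funder(row):
    """True when the row names a publisher that is not the funder."""
    publisher_id = _clean(row, "Publisher:Identifier")
    funder_id = _clean(row, "Funding Org:Identifier")  # assumed heading
    return bool(publisher_id) and publisher_id != funder_id


def missing_publisher_name(row):
    """True when a publisher identifier is given without a publisher name."""
    has_id = bool(_clean(row, "Publisher:Identifier"))
    has_name = bool(_clean(row, "Publisher:Name"))
    return has_id and not has_name


# Hypothetical row: the identifiers below are placeholders, not real IDs.
grant_row = {
    "Identifier": "EXAMPLE-GRANT-0001",
    "Funding Org:Identifier": "EXAMPLE-ORG-MOJ",
    "Funding Org:Name": "Ministry of Justice",
    "Publisher:Identifier": "EXAMPLE-ORG-CO",
    "Publisher:Name": "Cabinet Office",
}

if publisher_differs_from_funder(grant_row):
    print("Published by", grant_row["Publisher:Name"],
          "on behalf of", grant_row["Funding Org:Name"])
```

A check like `missing_publisher_name` mirrors the guidance that a name should accompany the identifier; whether that should be a hard validation rule is still an open question.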

Do we want to add this to the schema?
See the original GitHub issue here


#2

I assume this issue does not come up very often, but when it does we need to have a practical way to address it. I also think it promotes trust and confidence in the data, as Mor says. I think Edafe’s suggestion is sensible.


#3

“Publisher” may not be sufficiently precise IMHO. We (SmartyGrants) are an Australian-based platform for grantmaking agencies, and we’re developing an open data feed, which we’ll try to align with the 360Giving standard as much as possible. Are we a “publisher”? What if someone takes data from our platform, combines it with other stuff, then publishes that - who’s the “publisher” now?

I think what we’re discussing here is tracking provenance, which is really important in some fields (like scientific data, evidence in legal cases etc), but maybe less so here. But if we want to capture the whole chain, then perhaps a field with some kind of comma-separated list might be necessary.
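
As a rough illustration of the comma-separated idea, a consumer could split such a field into an ordered chain. This is only a sketch: the function name is made up, the ordering (original source first, latest re-publisher last) is an assumption, and nothing like this field exists in the standard today.

```python
def parse_provenance_chain(value):
    """Split a comma-separated provenance field into an ordered list.

    Assumes the original source comes first and each later entry
    re-published the data from the one before it.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical field value, using the organisations from this thread.
chain = parse_provenance_chain("Ministry of Justice, Cabinet Office, SmartyGrants")
original_source = chain[0]
latest_publisher = chain[-1]
```

One obvious weakness is that organisation names containing commas would break a naive split, so identifiers rather than names would probably be a safer thing to list.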

A question I’d ask is what people are going to do with that information? If I’m consuming a feed of data about grants from the Ministry of Justice, what difference does it make to me that the Cabinet Office served as an intermediary? What does their role change about my understanding of the data?


#4

I think we define a publisher as the original publisher of the data, not a re-user or re-publisher. I think it does make a difference when, for example, the Cabinet Office is publishing other offices’ data: the contact person for the data itself is different, and users should be aware of that.

To be fair, we have not encountered a case like this before, where one entity published in the name of another, so this is all theoretical at the moment.


#5

This seems like a good idea, and would be fairly easy to back-fill from existing data by using the publisher info from the registry.
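
A minimal sketch of what that back-fill could look like, assuming the publisher details for a dataset have already been looked up from the registry. The shape of the `publisher` dictionary, the function name, and the identifier values are all hypothetical; the real registry format may differ.

```python
def backfill_publisher(rows, publisher):
    """Return copies of the grant rows with empty publisher fields filled in.

    `publisher` is assumed to be a dict with "identifier" and "name" keys,
    taken from the registry entry for the dataset the rows came from.
    Values already present on a row are left untouched.
    """
    defaults = {
        "Publisher:Identifier": publisher["identifier"],
        "Publisher:Name": publisher["name"],
    }
    filled = []
    for row in rows:
        updated = dict(row)  # copy, so the input rows are not mutated
        for field, value in defaults.items():
            if not updated.get(field):
                updated[field] = value
        filled.append(updated)
    return filled


# Hypothetical registry entry and rows; the identifiers are placeholders.
registry_publisher = {"identifier": "EXAMPLE-ORG-CO", "name": "Cabinet Office"}
existing_rows = [
    {"Identifier": "EXAMPLE-GRANT-0001"},
    {"Identifier": "EXAMPLE-GRANT-0002", "Publisher:Name": "Already Set"},
]
backfilled = backfill_publisher(existing_rows, registry_publisher)
```

Leaving existing values alone means a publisher who has already supplied the fields would not be overwritten by the registry.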


#6

Doing this might risk some confusion for data re-users. In citing the source (compliant with CC BY 4.0) who should they attribute - the publisher or the funder? That should be made clear if it isn’t already.

How would Cabinet Office be publishing the MoJ data? If it’s through e.g. data.gov.uk, then the usual arrangement is that MoJ is still the data owner and publisher; data.gov.uk would just be the intermediary/publishing mechanism.

However, if Cabinet Office is publishing aggregated data that they have collected and processed, I’d personally consider that as Cabinet Office published data, with MoJ as funder.


#7

I would say that under normal circumstances the publisher would be cited as the data creator, unless there is some other indication that a funder (or any other entity) should be attributed.

The 360 Data Registry is a list of datasets by publisher. The publisher provides the appropriate licence and would usually be credited in attribution (e.g. with a CC-BY licence). When a publisher provides information to 360 about a dataset to be added, it goes into this registry. This is effectively where the licensing arrangements are made clear.

This is separate from the actual 360 Standard / grants data. There’s a dataset describing the datasets included in the Registry here: http://threesixtygiving.github.io/getdata/, which provides a machine-readable way of including licence statements.

If a publisher weren’t the funder then we would be presuming that the funder has given the publisher permission to use their data and apply the licence. A licence could also contain an attribution statement that would make it clear if a funder also needs to be credited.

In this case, the Cabinet Office data was published here. It has an Open Government Licence (OGL) attached and is “From: Cabinet Office”. If there were an arrangement that the MoJ should be attributed, this would have to be made clear somewhere, e.g. on that page or in the licence; it’s the responsibility of the publisher to do this. As they haven’t, I would assume Cabinet Office receives the attribution as per the OGL.