Planned Dates:Duration (months) field

Hello!

After working on a question related to the duration field, I found out that its type is String. I assume this was done for programs that have estimated durations (e.g. 12-16 months). However, from working with the data, it seems that some funders use it to write free text, for example, “12 months” or “A year”. This makes the data harder to process and requires a lot of data cleaning.
On the other hand, the cases of estimated durations are rare (I saw only two cases in the whole corpus).
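
To give a sense of the cleaning involved, here is a minimal Python sketch. The function name `parse_duration_months` and the `WORD_DURATIONS` lookup are made up for illustration and are not part of the Standard or any existing tool. It accepts bare numbers and values like “12 months”, maps a couple of spelled-out values, and deliberately refuses to guess at ranges.

```python
import re
from typing import Optional

# Hypothetical lookup for spelled-out durations; extend as the data requires.
WORD_DURATIONS = {"a year": 12.0, "one year": 12.0}


def parse_duration_months(raw: str) -> Optional[float]:
    """Return a duration in months, or None if the value needs manual review."""
    text = raw.strip().lower()
    if text in WORD_DURATIONS:
        return WORD_DURATIONS[text]
    # A bare number, optionally followed by "month"/"months", e.g. "12" or "12 months".
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:months?)?", text)
    if match:
        return float(match.group(1))
    # Ranges such as "12-16 months" are ambiguous, so they are not guessed at.
    return None
```

Anything that comes back as `None` would still need a human to look at it, which is the cost a numeric type would remove.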

My suggestion -

  • Change the type from String to Number.

Implications:

  • Three publishers will have to update their files to fit the schema, but their datasets are relatively small and can easily be re-published.

I assume we can’t fix it before version 1.0, but I think this is a minor change to the standard.

Thoughts?

The description of the field is:

  • Planned Dates:Duration (months) - Events or activities lasting more than one day should have either a duration (in months) or an end date.

As we clearly specify that the duration should be in months, I think the use of string could be considered a bug.

As the change from string->number would cause validation of existing data to fail this would not strictly be a backwards compatible change - but, as you note, it would be possible to work with the existing publishers, so that no-one is affected in practice.

If we can treat this as a bug rather than a minor change, that would be great.
Anyone see it as an issue? @davidkane @stevieflow @BobHarper @lindah

Small but important point: it’s not just Planned Dates:Duration (months) that is affected, but also the much less used Actual Dates:Duration (months) field (the field reference being from the Events object in the JSON Schema). So I suggest changing the title of this thread, @mor.rubinstein.

Agree, it should be a number as the description already specifies months (I could theoretically enter “three” and argue I’ve satisfied the Standard!).

It probably is a minor change, but I’m not sure how “minor” it is to implement, given that some publishers are using values that will fail validation against a number.

Here’s an example of its use as a string in GrantNav. 6 publishers are using “12 months” alone, so I suspect there will be many more.

If the startDate and endDate date fields are used by those publishers, however, it becomes a bit easier as we can explain how to calculate the value in spreadsheets.

Hi all, thought I’d add some detail on what we do here, in case it helps. It works for us because we have start and end dates for our awards (written in YYYY-MM-DD format).

We have the start date of the award in one column (J) and the end date of the award in another (K).
I then have the ‘Duration’ column, with the following formula:

=IFERROR(IF(OR(J2="",K2=""),"",IF(AND(J2>=1,K2>=1),DATEDIF(J2,K2+28,"m"),"")),"")

This then provides me with the number of months as a numeric figure (rounded up for part-months) and if either the start or end dates are blank (as they sometimes are with the very old awards due to the poor nature of some historic data we have), it is simply blank.
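
For anyone working outside a spreadsheet, a rough Python equivalent of the formula might look like the sketch below. The function name `duration_months` is invented for illustration, and it assumes DATEDIF with "m" counts complete months between the two dates. Spreadsheet behaviour around month-ends may differ slightly, so treat this as an approximation rather than an exact port.

```python
from datetime import date, timedelta
from typing import Optional


def duration_months(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Complete months between start and (end + 28 days); None if it cannot be computed."""
    # Mirror the blank result when either date is missing.
    if start is None or end is None:
        return None
    # The formula pads the end date by 28 days so that part-months round up.
    padded_end = end + timedelta(days=28)
    # Mirror IFERROR: DATEDIF fails when the start is after the end.
    if padded_end < start:
        return None
    months = (padded_end.year - start.year) * 12 + (padded_end.month - start.month)
    # Only complete months count, so drop the last one if its day has not been reached.
    if padded_end.day < start.day:
        months -= 1
    return months
```

For example, `duration_months(date(2020, 1, 1), date(2020, 12, 31))` gives 12.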

Do some funders not have actual start and end dates for their awards? Even if it is best-guess?

Interesting you bring best-guesses up! @NSmithSportEngland

There are proposals (changes to the Standard) for allowing the plannedDates start and end date values to be formatted as either:

  • YYYY-MM (year and month only, day unknown)
    or
  • YYYY (year only, month and day unknown)

as well as the currently accepted full-date (YYYY-MM-DD) and full date + time formats. The reason is that these dates are quite often “uncertain”, and in many cases no level of certainty beyond year or month can be expected.
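
As a rough illustration of what accepting these formats could mean for anyone consuming the data, the Python sketch below classifies a date string by its precision. The name `date_precision` is hypothetical, the full date + time form is left out for brevity, and the patterns only check the shape of the string, not whether the day exists in the calendar.

```python
import re
from typing import Optional

# Shape-only patterns: a value like "2020-02-31" would still be classed as "day".
_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
    "day": re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}


def date_precision(value: str) -> Optional[str]:
    """Return "year", "month" or "day" for a recognised format, else None."""
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None
```

A consumer could then decide how to treat "year" and "month" values, for instance by skipping them when calculating a duration.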

See this thread for more discussion of that and other date format proposals.

It feels like more than a bug because it has an impact on the data and its validation across a number of duration fields.
I’d be happy with it being handled as a minor change if the work to ready the existing data was well understood/in train before the change was made.

I think we’re fairly well agreed here that this change should be made.

Therefore I’ve staged a branch here which I’ve tagged as a bug fix for a future update to the Standard.

I’ve simply changed Event.duration type to number and haven’t edited the description - which @mor.rubinstein I think you are looking at as part of your review of documentation language (just so that changes don’t get mixed up).

We can work through the implications on publishers’ data in due course (we’ve followed up on some where we had the opportunity to already).

P.S. there is an outstanding issue (raised by @davidkane) here which I’ve linked this to.

@lindah, this is actually only relevant to two fields: Planned Dates:Duration (months) and Actual Dates:Duration (months). None of the current publishers publish the Actual Dates:Duration (months) field, and we know who publishes Planned Dates:Duration (months) (very few publishers).

I believe we should still treat it as a bug, given that it is easy to fix and affects very few datasets.

To update on this, the 360 Stewardship Committee also decided this should be a positive number (no negative durations), which is reflected in the update to the branch.
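
For publishers who want to check their values ahead of the change, a minimal Python sketch of the rule might look like this. The function `is_valid_duration` is a hypothetical helper, not the schema itself. Whether zero should pass isn't stated above, so it is left as a parameter, with zero rejected by default as an assumption.

```python
from numbers import Real


def is_valid_duration(value: object, allow_zero: bool = False) -> bool:
    """Check that a duration value is a number and not negative."""
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    # Assumption: zero is rejected unless the caller opts in.
    return value >= 0 if allow_zero else value > 0
```

With this, `12` and `1.5` pass, while `"12 months"` and `-3` do not.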