You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Basic Requirement
I want to create a link that refers to the same hub twice. For the link itself this basically works well via different qualifiers for the Hub-Hashkey that is derived in the staging view. The problem is that the referenced Hashkeys need to be inserted in the Hub as well but as far as I see it this is not possible for different columns from one source at the same time.
Example
Let's assume we have one source src_customer_connection with the following columns:
ID
CustomerNumber_ParentRecord
CustomerNumber_ChildRecord
(representing any relation between the two customers)
When we load a link CustomerToCustomer_L it is easy to derive hashkeys for both customers and get something like Customer_HK_ParentRecord and Customer_HK_ChildRecord in the link. But these hashkeys should also be present in the Hub Customer_H itself.
Let's say we declare a hub Customer_H with just CustomerNumber as natural business key. Then we can derive the Customer_H_HK either from CustomerNumber_ParentRecord or from CustomerNumber_ChildRecord in the staging model for src_customer_connection but not from both.
If additional data (from additional source files) for the parent and child customer records are not present at the moment the src_customer_connection is loaded we get referential problems since the Customer_HK_*** in the link might reference to non existing hub-records.
Describe the solution you'd like
The best solution would be to insert the necessary records from the link into the hub automatically in the background but I understand that this is not how automate_dv is designed.
What would fit into the framework from my point of view would be the possibility to somehow declare that one row of a source file could contain multiple hub entries. I think the precise metadata-definition is subject to discussion but this option would solve the problem implicitly.
Describe alternatives you've considered
A workaround that I see right now is the declaration of separate staging views for the same source. This gives the possibility to declare the exact names for the hub based on different source column names.
Additional context
This is a problem that came up in one of my projects and I didn't find it in any other issue. We are using automate_dv on sql-server.
Basic Requirement
I want to create a link that refers to the same hub twice. For the link itself this basically works well via different qualifiers for the Hub-Hashkey that is derived in the staging view. The problem is that the referenced Hashkeys need to be inserted in the Hub as well but as far as I see it this is not possible for different columns from one source at the same time.
Example
Let's assume we have one source src_customer_connection with the following columns:
(representing any relation between the two customers)
When we load a link CustomerToCustomer_L it is easy to derive hashkeys for both customers and get something like Customer_HK_ParentRecord and Customer_HK_ChildRecord in the link. But these hashkeys should also be present in the Hub Customer_H itself.
Let's say we declare a hub Customer_H with just CustomerNumber as natural business key. Then we can derive the Customer_H_HK either from CustomerNumber_ParentRecord or from CustomerNumber_ChildRecord in the staging model for src_customer_connection but not from both.
If additional data (from additional source files) for the parent and child customer records are not present at the moment the src_customer_connection is loaded we get referential problems since the Customer_HK_*** in the link might reference to non existing hub-records.
Describe the solution you'd like
The best solution would be to insert the necessary records from the link into the hub automatically in the background but I understand that this is not how automate_dv is designed.
What would fit into the framework from my point of view would be the possibility to somehow declare that one row of a source file could contain multiple hub entries. I think the precise metadata-definition is subject to discussion but this option would solve the problem implicitly.
Describe alternatives you've considered
A workaround that I see right now is the declaration of separate staging views for the same source. This gives the possibility to declare the exact names for the hub based on different source column names.
Additional context
This is a problem that came up in one of my projects and I didn't find it in any other issue. We are using automate_dv on sql-server.
AB#5439
The text was updated successfully, but these errors were encountered: