Allow for subdirectory structure within data directory #569
- Recursively read the data directory.
- @TODO/SUGGEST: it may be worth using the subdirectory information to enrich the index and/or for access control (e.g., subdirectories could be named after the Azure AD group that should have access).
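The two points above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `handle_file` callback and the `access_group_for` helper are hypothetical names introduced here, and treating the top-level subdirectory name as an access group is only the suggestion from the TODO, not something the PR implements.

```python
import glob
import os


def read_files(path_pattern, handle_file, use_vectors=False):
    """Recursively visit everything matching `path_pattern`.

    `handle_file` is a hypothetical callback standing in for the
    per-file indexing work; it is not part of prepdocs.py.
    """
    for filename in sorted(glob.glob(path_pattern)):
        if os.path.isdir(filename):
            # Descend into the subdirectory, forwarding use_vectors unchanged.
            read_files(os.path.join(filename, "*"), handle_file, use_vectors)
            continue
        handle_file(filename, use_vectors)


def access_group_for(filename, data_root):
    """Hypothetical helper: return the top-level subdirectory name under
    `data_root`, or None for files that sit directly in `data_root`."""
    relative = os.path.relpath(filename, data_root)
    parts = relative.split(os.sep)
    return parts[0] if len(parts) > 1 else None
```

Note that a `*` glob pattern skips dot-files, so hidden files and directories would not be picked up by this sketch.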
Thanks, this seems useful!
Please check the CI results and see CONTRIBUTING.md for instructions on running linters locally.
* Convert the comment for the read_files function to a docstring
* Pass the `use_vectors` variable through in the recursive call
- Removed an unreachable condition within the read_files function
    if args.remove:
        remove_blobs(filename)
        remove_from_index(filename)
    elif args.removeall:
It's in the main body. It was already there before, which is why you don't see it in the review.
azure-search-openai-demo/scripts/prepdocs.py
Lines 397 to 405 in 36c14b6
    if args.removeall:
        remove_blobs(None)
        remove_from_index(None)
    else:
        if not args.remove:
            create_search_index()
        print("Processing files...")
        read_files(args.files, use_vectors)
CI/CD is still failing - https://github.com/Azure-Samples/azure-search-openai-demo/actions/runs/6037247433/job/16381542344?pr=569
* Fixing syntax error - filename is no longer global now that the recursive function has been introduced, so it has to be passed explicitly to the split_text function
I forgot to pass the filename to the split_text function. split_text was set up to use the global variable filename, which no longer exists after introducing the read_files function.
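The fix described here can be illustrated with a small sketch. It assumes a simplified `split_text` that cuts text into fixed-length sections; the `section_length` parameter and the `process_file` wrapper are hypothetical and only serve to show the filename being passed explicitly instead of being read from a module-level variable.

```python
def split_text(pages, filename, section_length=1000):
    """Yield (section_text, source_filename) pairs.

    `filename` is a parameter here; relying on a global of the same name
    would fail once the loop variable lives inside read_files.
    """
    text = "".join(pages)
    for start in range(0, len(text), section_length):
        yield text[start:start + section_length], filename


def process_file(filename, pages):
    # filename is local to this call, so it is handed to split_text explicitly.
    return list(split_text(pages, filename))
```

With the old layout, a reference to an undefined global `filename` inside `split_text` would typically be reported by a linter as an undefined name, or raise a `NameError` when the function runs.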
Merged, thanks for the changes!