Files Connector
Overview
The Files Connector lets you upload documents in a range of formats and extract their content. PDF manuals, Word documents, spreadsheets, and plain text files can all be added to your knowledge base this way. Uploaded files become searchable and accessible through ACE Search and Chat.
Supported File Types
The Files Connector supports the following document formats:
File Type | Extension | Description |
---|---|---|
Excel Spreadsheets | .xlsx | Microsoft Excel workbooks |
PDF Documents | .pdf | Adobe Portable Document Format files |
Word Documents | .docx, .doc | Microsoft Word documents |
PowerPoint Presentations | .pptx, .ppt | Microsoft PowerPoint slides |
Text Files | .txt | Plain text documents |
HTML Files | .html, .htm | Web page documents |
JSON Files | .json | JavaScript Object Notation files |
CSV Files | .csv | Comma-separated values files |
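Before uploading a batch, it can help to filter out files the connector is unlikely to accept. The following is a minimal sketch, not part of the product: it assumes the table above is the complete list of accepted extensions (with `.pdf` as the standard PDF extension), and the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_by_support` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Extensions taken from the table above; the real connector may accept more.
SUPPORTED_EXTENSIONS = {
    ".xlsx", ".pdf", ".docx", ".doc", ".pptx", ".ppt",
    ".txt", ".html", ".htm", ".json", ".csv",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the table (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition names into (supported, unsupported) lists, keeping input order."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported
```

For example, `split_by_support(["a.csv", "b.sh", "c.docx"])` separates the shell script from the two uploadable files, so it can be set aside before the upload starts.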
Using the Files Connector
Basic Upload Process
- Navigate to Knowledge Management > Add Knowledge
- Find and click on the "Files" option in the organization connector section
- Click the plus button to expand the options
- Provide a descriptive knowledge name
- Upload files using one of these methods:
  - Drag and drop files onto the upload area
  - Click the browse button to select files from your computer
- Wait for the green progress bar to reach 100% for each file
- Set group permissions by clicking "Group Permissions" and selecting the appropriate groups
- Click "Save" to finalize and process the files
File Management Options
During the upload process, you have several options:
- Click the delete button to remove a file before saving
- Click "Attach More Documents" to add additional files
- Monitor the upload progress via the green progress bar
File Size Limitations
- Each file can be up to 100MB in size
- Minimum of two files can be uploaded
- The maximum number of files can be configured in the application settings
  - Admins and super admins can adjust this limit
  - The setting is found in Application Settings > File Upload Limits
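The limits above can be checked locally before an upload is attempted. This is a rough sketch under stated assumptions: "100MB" is read as 100 × 1024 × 1024 bytes (the product may count decimal megabytes instead), the configured file-count limit is passed in by the caller because its value depends on your application settings, and `check_batch` is a hypothetical helper, not a product API.

```python
import os

# Assumption: "100MB" means 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_batch(paths, max_files, max_bytes=MAX_FILE_BYTES):
    """Return a list of problems found; an empty list means the batch looks fine.

    max_files is the value an admin configured under
    Application Settings > File Upload Limits.
    """
    problems = []
    if len(paths) > max_files:
        problems.append(f"{len(paths)} files selected, limit is {max_files}")
    for path in paths:
        size = os.path.getsize(path)
        if size > max_bytes:
            problems.append(f"{path}: {size} bytes exceeds the per-file limit")
    return problems
```

Returning a list of messages rather than raising on the first problem lets you report every oversized file in one pass.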
Working with Added Files
After saving your file knowledge:
- You'll be redirected to the "Existing Files" section
- Here you can view:
  - Knowledge name
  - Status (Enabled/Success)
  - Action options (Delete)
  - Review button
Review Process
When you click the "Review" button:
- You'll be redirected to the Manage Knowledge section for that specific file knowledge
- Here you can see:
  - Knowledge name
  - Last indexed date
  - Configuration details (file paths)
  - Creator information (who added the knowledge)
  - Group permissions (which groups the knowledge is shared with)
  - Indexing attempts with timestamps
Monitoring Processing Status
To check the processing status:
- In the Indexing Attempts section, look for "Time Started" information
- Click "View Logs" to see detailed backend processing information
- A popup will appear showing all processing details
- When the status changes from "Enabled" to "Success," your knowledge is ready to use
Knowledge Objects
After successful processing:
- All extracted file content is available in the Knowledge Objects section
- Each file has its own entry with its extracted data
- The content is searchable through the search functionality
Reindexing Using Vision
The Files Connector offers an advanced feature called "Reindex using Vision" that leverages multimodal AI to extract more detailed information from uploaded documents by processing them as images.
Accessing the Reindex Using Vision Feature
- Navigate to Knowledge Management > Manage Knowledge
- Click on the knowledge name for your file knowledge
- Go to the Knowledge Objects tab
- In the Knowledge Objects table, locate the Actions column
- Click the Preview button for the document you want to reindex
- In the Preview Data popup, you'll find a button labeled Reindex using Vision
- Click this button to initiate the vision-based reindexing process
How Reindex Using Vision Works
When you activate this feature, the system performs the following steps:
- Initiates the vision loader and loads the embedding model
- Converts your document (e.g., PDF) into a series of images
- Processes these images through a multimodal AI model
- Extracts more detailed information from visual elements in the document
- Creates a new, enhanced index of the document content
Monitoring the Reindexing Process
After initiating vision-based reindexing:
- A new job will appear in the Indexing Attempts section
- You can monitor progress through the logs by clicking the View Logs button
- The logs will show steps like:
  - "Initiating vision loader"
  - "Loading embedding model"
  - "Starting convert PDF to images"
  - "Complete conversion of PDF to images"
  - "Extracting using vision"
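If you copy the text from the View Logs popup, a small script can tell you how far a vision reindexing job has progressed. The sketch below is illustrative only: it assumes the log contains the messages quoted above verbatim (the actual wording, timestamps, and layout may differ), and `VISION_STAGES` and `latest_stage` are hypothetical names. No log-retrieval API is assumed; the log text is supplied by hand.

```python
# Log messages quoted in the list above, in the order the guide lists them.
VISION_STAGES = [
    "Initiating vision loader",
    "Loading embedding model",
    "Starting convert PDF to images",
    "Complete conversion of PDF to images",
    "Extracting using vision",
]


def latest_stage(log_text: str):
    """Return the furthest listed stage whose message appears in the log, or None."""
    lowered = log_text.lower()
    reached = None
    for stage in VISION_STAGES:
        # Case-insensitive substring match; later stages overwrite earlier ones.
        if stage.lower() in lowered:
            reached = stage
    return reached
```

A result of `None` means none of the listed messages were found, which may simply indicate that the job has not started or that the log wording differs from the examples.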
Benefits of Vision-Based Reindexing
This feature provides several advantages:
- Enhanced Content Extraction: Captures information that text-only processing might miss
- Improved Visual Element Processing: Better handles documents with charts, diagrams, and tables
- More Comprehensive Indexing: Creates a more complete representation of document content
- Better Search Results: Enables more accurate responses to queries about the document
When to Use Vision-Based Reindexing
Consider using this feature for:
- Documents with complex layouts or formatting
- Content with significant visual elements like charts and diagrams
- Scanned documents where text extraction is suboptimal
- Technical documentation where visual precision is important
Note that vision-based reindexing is more computationally intensive and may take longer than standard indexing, especially for large documents with many pages.
Permission Management
Control who can access your uploaded files:
- During upload, select the user groups that should have access
- Multiple groups can be selected for broader access
- Only users in the selected groups will be able to search and view this content
- Permissions can be edited later through the knowledge management interface
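The access rule described above amounts to a set intersection: a user can see a knowledge source when they belong to at least one of the groups it is shared with. The following sketch models that rule only; it is not the product's implementation, the function names are hypothetical, and it does not account for any special handling of admins or super admins, which this guide does not describe.

```python
def can_access(user_groups, knowledge_groups) -> bool:
    """True if the user belongs to at least one group the knowledge is shared with."""
    return bool(set(user_groups) & set(knowledge_groups))


def visible_knowledge(user_groups, knowledge_permissions):
    """Filter a {knowledge_name: groups} mapping down to the names the user can see."""
    return [
        name
        for name, groups in knowledge_permissions.items()
        if can_access(user_groups, groups)
    ]
```

For instance, with `{"HR Handbook": ["HR"], "API Docs": ["Engineering", "Support"]}`, a user in the `Support` group would see only `API Docs`.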
Best Practices
- Organize Related Files: Upload related documents together as a single knowledge source
- Use Descriptive Names: Name your knowledge sources clearly for easy identification
- Check File Quality: Ensure documents are properly formatted before uploading
- Text Recognition: For scanned PDFs, use OCR (Optical Character Recognition) before uploading
- Regular Updates: Replace outdated documents with new versions as needed
Troubleshooting
Issue | Solution |
---|---|
Upload fails | Check the file size and format, and make sure the file is within the limits |
Content not extracted correctly | Verify the file isn't corrupted or password-protected |
Text appears garbled | Ensure the document uses standard encoding |
Tables not properly processed | Consider converting complex tables to simpler formats |
Images missing | Note that the connector primarily extracts text content |
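For the garbled-text case, plain text files (`.txt`, `.csv`, `.json`, `.html`) can be checked and converted before uploading. This sketch assumes that "standard encoding" means UTF-8, which this guide does not state explicitly, and both function names are hypothetical. The source encoding must be known by the caller; the code does not try to detect it.

```python
def decodes_as_utf8(path: str) -> bool:
    """Return True if the whole file is valid UTF-8 (the assumed 'standard' encoding)."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Rewrite a text file as UTF-8 when its real encoding is known."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

This applies only to text-based formats; binary formats such as PDF or Word documents carry their own internal encoding and should not be rewritten this way.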
Content Processing Details
Understanding how different file types are processed can help you prepare optimal documents:
PDF Documents
- Text content is extracted page by page
- Basic formatting information is preserved when possible
- Scanned PDFs without OCR may not yield searchable text
- Complex layouts may be simplified during extraction
Word Documents
- Text, tables, lists, and basic formatting are preserved
- Embedded images are noted but not fully processed
- Comments and tracked changes can be included or excluded
- Complex features like macros are not processed
Excel Spreadsheets
- Cell values and formulas are extracted
- Sheet names and structure are preserved
- Charts are noted but not fully rendered
- Cell formatting may be simplified
PowerPoint Presentations
- Slide text and basic structure are extracted
- Speaker notes can be included
- Slide titles are used as section headings
- Animations and transitions are not processed
FAQ
Q: Can I upload password-protected documents?
A: No, files with password protection cannot be processed. Remove protection before uploading.
Q: How are file updates handled?
A: You'll need to create a new knowledge source with updated files. The system doesn't automatically track versions.
Q: Can I delete files after they've been processed?
A: Yes, but you should delete the entire knowledge source through the management interface.
Q: How are very large documents handled?
A: Large documents are broken into smaller chunks for processing but will appear as a single document in search results.
Q: Can I upload executable files or scripts?
A: No, for security reasons, executable files (.exe, .bat, .sh, etc.) are not supported.
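The answer above about very large documents can be illustrated with a simple chunking scheme. The connector's actual chunk size, overlap, and splitting strategy are not documented here, so the values and names below (`chunk_text`, `index_document`, the 1000-character window) are assumptions chosen purely for illustration. The point of the sketch is that every chunk keeps a reference to its parent document, which is what allows results to be grouped back into a single document.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split text into overlapping character windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def index_document(doc_id: str, text: str, **kwargs):
    """Tag every chunk with its parent doc_id so hits can be grouped per document."""
    return [
        {"doc_id": doc_id, "chunk_no": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, **kwargs))
    ]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.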
Related Features
To get the most out of your uploaded files, consider using these related features:
- Knowledge Sets: Group related file uploads into unified collections
- FAQ Management: Create frequently asked questions based on document content
- Result Ranking: Adjust search relevance for important documents
- Knowledge Objects: View and manage individual content pieces extracted from documents
By following this guide, you can effectively use the Files Connector to incorporate your important documents into the knowledge management system, making their content searchable and accessible.