Ability to extend supported extension - file parser mapping #3690#3683
Conversation
|
I moved it to "draft" to add specific logging checks in tests. |
jmix-search/search/search.gradle
Outdated
| testImplementation 'org.junit.vintage:junit-vintage-engine' | ||
| testImplementation 'org.spockframework:spock-core' | ||
| testImplementation 'org.mockito:mockito-core' | ||
| testImplementation 'org.spockframework:spock-core' |
|
|
||
| package io.jmix.search.exception; | ||
|
|
||
| public abstract class ParserResolvingException extends Exception { |
| public class EmptyFileExtensionException extends ParserResolvingException { | ||
|
|
||
| public class UnsupportedFileFormatException extends Exception { | ||
| public static final String MESSAGE = "Extension of the file %s is empty"; |
There was a problem hiding this comment.
Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.
|
|
||
| public class UnsupportedFileExtensionException extends ParserResolvingException { | ||
|
|
||
| public static final String MESSAGE = "The file %s with '%s' extension is not supported. " + |
There was a problem hiding this comment.
Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.
| public class UnsupportedFileExtensionException extends ParserResolvingException { | ||
|
|
||
| public static final String MESSAGE = "The file %s with '%s' extension is not supported. " + | ||
| "Only following file extensions are supported %s."; |
There was a problem hiding this comment.
I thought we decided not to specify which extensions are allowed
There was a problem hiding this comment.
I left it so because the problem with "fogotten extensions" had been solved.
| this.parserSupplier = parserSupplier; | ||
| } | ||
|
|
||
| public String getSymbols() { |
There was a problem hiding this comment.
I did so to make this field name be different from the class name that countains a "extension" word.
| import java.util.Optional; | ||
|
|
||
| @Component | ||
| @Component("search_FileProcessor") |
There was a problem hiding this comment.
We don't change public API. Bean name is public API
| private static final Logger log = LoggerFactory.getLogger(FileProcessor.class); | ||
|
|
||
| protected FileStorageLocator fileStorageLocator; | ||
| private final FileParserResolver fileParserResolver; |
| return stringWriter.toString(); | ||
| } | ||
|
|
||
| protected Parser getParser(FileRef fileRef) throws UnsupportedFileFormatException { |
There was a problem hiding this comment.
Revert both methods and call fileParserResolver from them.
There was a problem hiding this comment.
Fixed. But this class is in the package is annotated with an "Internal" annotation.
|
|
||
| import java.util.function.Supplier; | ||
|
|
||
| public enum SupportedFileExtensions { |
There was a problem hiding this comment.
-
What is the point of this enum? It still cannot be extended if needed + you've created
FileParserResolverbean to iterate over available constants which can be easily done in the enum itself (see any enum for entity attribute). -
Remember that the issue that you fix is a bug, not a feature. But all these changes to much for a bug fix.
There was a problem hiding this comment.
- The main goal of using enums is much more safe using them as string constansts in the "switch" expressions. If the new item is added to a "enum" and the item is not added to the correspondant "swith" constructions the compile time error will occured. Another point is to incapsulate the string constants and the parsers into the "enum" constants. This makes code more clean and reliable.
The FileParserResolver have been created to icapsulate getting a parser into a "enum" counstant. If we should add any new extention we will have to point the correspondant parser in the same time. And there are no an ability to do forget to do it. Such incupsulating makes string link between the string constant of the extension(such as "txt" or "doc") and the parser. If we do so a proggamer can't add any new extension and not add a parser for it. As a result of adding such "enum" item a programmer doesn't need to add any changes to algoritms. The case of using the new file format will be supported automatically.
- Except the small change in the io.jmix.search.utils.FileProcessor#getParser method signature there are no any other breaking changes in the public API. Additionaly the changing type of the exception of this method was approved by Ivan Gavrilov and this change is already commited to the release 2.3 and to the "master" branches. The other changes are not breaking.
There was a problem hiding this comment.
And the io.jmix.search.utils.FileProcessor class is in the "internal" marked package. And with "The boy scout rule".
There was a problem hiding this comment.
Additionally I found at least two places in code with incorrect behaviour(excepting the main problem).
|
|
||
| @Component("search_PDFParserResolver") | ||
| @Order(100) | ||
| public class PDFParserResolver implements FileParserResolver { |
There was a problem hiding this comment.
Name and extension do not match
There was a problem hiding this comment.
Fixed and covered by the unit test
|
|
||
| @Component("search_RTFParserResolver") | ||
| @Order(100) | ||
| public class RTFParserResolver implements FileParserResolver { |
There was a problem hiding this comment.
Name and extension do not match
jmix-search/search/search.gradle
Outdated
| testImplementation 'org.junit.vintage:junit-vintage-engine' | ||
| testImplementation 'org.spockframework:spock-core' | ||
| testImplementation 'org.mockito:mockito-core' | ||
| testImplementation 'org.spockframework:spock-core' |
| * Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "DOCX", "DOCX"]. | ||
| * @return collection of supported extensions | ||
| */ | ||
| List<String> getExtension(); |
There was a problem hiding this comment.
getSupportedExtensions()- Usually, when we have a collection of beans that may or may not be applied, they either have a method that returns
true/falsewhether this bean can be used, e.g.boolean supports(...)(ConditionGenerator,AuthenticationLocaleResolver, etc. just try to search for such method). Another option is to have a@Nullablemethod and the first bean which return a value is used, e.g.ComponentGenerationStrategy.
The point is that the calling code should not check applicability itself. And one of the reasons is that the supports method can be implemented differently for each bean.
In this case, the option with the supports method is better then @Nullable return value.
Ok, you can leave getExtension() method to collect all available extensions for an exception, just rename it to getSupportedExtensions()
There was a problem hiding this comment.
The more flexible engine was implemented. This method was moved to the abstract class. The more general method "getCriteriaDescription()" was described instead of it.
| public interface FileParserResolver { | ||
|
|
||
| /** | ||
| * Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "DOCX", "DOCX"]. |
There was a problem hiding this comment.
None of FileParserResolver support uppercase extensions
There was a problem hiding this comment.
It is not an implementation. The interface gives the ability to provide both lowercase and uppercase extensions.
| def parserResolver = new FileParserResolverManager(List.of(resolver, resolver2)) | ||
|
|
||
| expect: | ||
| parserResolver.getParser(createFileRefMock("docx")) == parser1 |
There was a problem hiding this comment.
make tests for uppercase extensions
| "rtf" | RTFParser | ||
| "odt" | OpenDocumentParser | ||
| "ods" | OpenDocumentParser | ||
| "doc" | OfficeParser |
| this.fileParserResolvers = fileParserResolvers; | ||
| } | ||
|
|
||
| public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException { |
There was a problem hiding this comment.
The class is in the internal package. Is it neccessary to add JavaDoc for all its methods?
|
|
||
| public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException { | ||
| if (fileParserResolvers.isEmpty()) { | ||
| throw new EmptyFileParserResolversList(); |
There was a problem hiding this comment.
- Exception classes must be named in form of
FooException - You don't need a custom exception class until you explicitly handle it. Throw
IllegalStateException
| * An exception that is thrown when a user added some file of the type that is not supported | ||
| * and there are no any known parser for. | ||
| */ | ||
| public class UnsupportedFileTypeException extends Exception { |
There was a problem hiding this comment.
For 2.3 you have a different exception class name. You cannot rename public API, so either revert prev name or rename it in 2.3 until we published a next bug fix release.
| package io.jmix.search.exception; | ||
|
|
||
| import org.apache.commons.io.FilenameUtils; | ||
| public class EmptyFileParserResolversList extends RuntimeException { |
There was a problem hiding this comment.
- Exception class must be named in form
FooException - You don't need a custom exception class until you explicitly handle it.
There was a problem hiding this comment.
This exception was deleted. Fixed.
| @Component("search_OldOfficeDocumentsParserResolver") | ||
| @Order(100) | ||
| public class OldMSOfficeDocumentsParserResolver extends AbstractExtensionBasedFileParserResolver { | ||
| @Override |
There was a problem hiding this comment.
According to https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Java
Empty line after class or interface definition
P.S. fix all classes
| import java.util.List; | ||
|
|
||
| /** | ||
| * Is a part of the extendable engine the gives an ability to implement custom file parser resolvers and to support |
There was a problem hiding this comment.
JavaDoc is written for API users, not for developers who will implement it. For example:
Interface to be implemented by a custom file parser resolver
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
| public interface FileParserResolver { | ||
|
|
||
| /** | ||
| * This method should return the description that describes the constraints or the constraint for the files |
There was a problem hiding this comment.
JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
| Parser getParser(); | ||
|
|
||
| /** | ||
| * This method should implement the logic for checking |
There was a problem hiding this comment.
JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.
Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs
| "Only the following file parsing criteria are supported:\n -%s"; | ||
|
|
||
| /** | ||
| * @param fileName - the name of the file which type is not supported |
| public abstract class AbstractExtensionBasedFileParserResolver implements FileParserResolver { | ||
|
|
||
| /** | ||
| * Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "docx", "DOCX"]. |
There was a problem hiding this comment.
Do we have to explicitly specify both lowercase and uppercase extensions now?
Lets require (and mention it in javadoc) lowercase only and convert the actual extension to lowercase within support method.
There was a problem hiding this comment.
We discussed with Gleb such option but we desided not to implement it within this issue. It will be implemented in a separate issue. #3696
There was a problem hiding this comment.
The JavaDoc was corrected
| * | ||
| * @return collection of supported extensions | ||
| */ | ||
| public abstract List<String> getSupportedExtensions(); |
There was a problem hiding this comment.
Set is better for contains operation than List.
| public interface FileParserResolver { | ||
|
|
||
| /** | ||
| * This method should return the description that describes the constraints or the constraint for the files |
There was a problem hiding this comment.
Method name is getCriteriaDescription but javadoc tells about "constraints".
| * | ||
| * @return criteria description | ||
| */ | ||
| String getCriteriaDescription(); |
There was a problem hiding this comment.
Do we actually need this public API method?
Isn't it better to delegate this logic to the final consumer? The only purpose of this is to generate message like 'The file extension should be one of the following: ...'.
There was a problem hiding this comment.
The consumer doesn't know anything about the file checking criteria that are implemented in the resolvers. The aim was to give to the user ability to get comprehensive information what is going wrong. If we remove this method we just could say that "A resolver(and parser) for the file couldn't be found".
There was a problem hiding this comment.
The message of the AbstractExtensionBasedFileParserResolver was corrected.
There was a problem hiding this comment.
The method was left without changes as it was discussed.
| * A search principle is based on the sequential applying FileParserResolver objects' checks for the given file. | ||
| */ | ||
| @Component("search_FileParserResolverManager") | ||
| public class FileParserResolverManager { |
There was a problem hiding this comment.
This class consumes existing FileParserResolvers internally, but doesn't provide outside anything about them.
It provides the final Parsers.
So maybe this class should be named as FileParserProvider?
| if (resolver.supports(fileRef)) { | ||
| return resolver.getParser(); | ||
| } | ||
| messages.add(resolver.getCriteriaDescription()); |
There was a problem hiding this comment.
If we try to process completely unsupported file it will collect "criteria descriptions" from all existing resolvers.
As a result the final exception message will contain multiple "The file extension should be one of the following: ..." expressions.
That is the point in favor of the FileParserResolver#getCriteriaDescription method irrelevance.
This loop should collect extensions supported by unsuitable resolvers and provide them to exception if non of existing resolvers is suitable. And exception generates the final message based on provided extensions.
There was a problem hiding this comment.
Discussed. Left without changes.
| * @return an instance of a file parser | ||
| */ | ||
| Parser getParser(); | ||
| FileParsingBundle getParsingBundle(); |
There was a problem hiding this comment.
Maybe FileParserKit?
|
|
||
| public record FileParsingBundle( | ||
| @NotNull Parser parser, | ||
| @NotNull Function<StringWriter, BodyContentHandler> bodyContentHandlerGenerator, |
There was a problem hiding this comment.
Use ContentHandler.
BodyContentHandler is a specific implementation.
57e81d9 to
59c6a88
Compare




#3690