Recent work has demonstrated impressive parallels between human visual representations and those found in deep neural networks. A new study by Wang et al. (2023) highlights what factors may determine this similarity. (commentary)
Object recognition is an important human ability that relies on distinguishing between similar objects, for example, deciding which kitchen utensil(s) to use at different stages of meal preparation. Recent work describes the fine-grained organization of knowledge about manipulable objects by studying which constituent dimensions are most relevant to behavior: vision-, manipulation-, and function-based object properties. A logical extension of this work concerns whether or not these dimensions are uniquely human, or can be approximated by deep learning. Here, we show that behavioral judgments are well predicted by CLIP-ViT, a state-of-the-art multimodal network trained on a large and diverse set of image-text pairs, and that, in part, it can also generate good predictions of behavior toward previously unseen objects. Moreover, this model vastly outperforms comparison networks pre-trained on smaller, image-only training datasets. These results demonstrate the impressive capacity of deep learning to approximate human object knowledge. We discuss possible sources of this benefit relative to the other tested models (e.g., image-text vs. image-only pre-training, dataset size, architecture).
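The core analysis, predicting behavioral dimension scores from network embeddings and testing generalization to held-out objects, can be sketched as a cross-validated ridge regression. A minimal sketch in which random vectors stand in for the CLIP-ViT embeddings and the human behavioral scores (the real analysis would use actual model features and judgment data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are objects, columns are embedding features
# (real analysis: CLIP-ViT image embeddings) and behavioral dimensions
# (real analysis: vision-, manipulation-, and function-based scores).
n_objects, n_features, n_dims = 200, 50, 4
X = rng.standard_normal((n_objects, n_features))
W_true = rng.standard_normal((n_features, n_dims))
Y = X @ W_true + 0.1 * rng.standard_normal((n_objects, n_dims))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Generalization test: fit on some objects, predict previously unseen ones.
train, test = np.arange(150), np.arange(150, 200)
W = ridge_fit(X[train], Y[train])
pred = X[test] @ W

# Per-dimension correlation between predicted and observed scores.
r = [float(np.corrcoef(pred[:, j], Y[test][:, j])[0, 1]) for j in range(n_dims)]
print([round(v, 2) for v in r])
```

The held-out split mirrors the paper's test on previously unseen objects: a model whose embeddings carry the relevant object properties should transfer to objects absent from the fitting set.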
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown. Published: May 14, 2024
ABSTRACT
What underlies the emergence of cortex-aligned representations in deep neural network models of vision? The success of widely varied architectures has motivated the prevailing hypothesis that large-scale pre-training is the primary factor underlying the similarities between brains and networks. Here, we challenge this view by revealing the role of architectural inductive biases in networks with minimal training. We examined networks with varied architectures but no pre-training and quantified their ability to predict image-evoked responses in the visual cortices of both monkeys and humans. We found that cortex-aligned representations emerge in convolutional networks that combine two key manipulations of dimensionality: compression in the spatial domain and expansion in the feature domain. We further show that both manipulations are critical for obtaining these performance gains: dimensionality expansion was relatively ineffective under other targeted lesions of the architecture. Our findings suggest that the architectural constraints of convolutional networks are sufficiently close to those of biological vision to allow many aspects of cortical representation to emerge even before synaptic connections have been tuned through experience.
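The two dimensionality manipulations can be illustrated with a single untrained convolutional layer: a stride greater than one compresses the spatial domain, while a larger output-channel count expands the feature domain. A minimal numpy sketch with random, untrained weights (the layer sizes are illustrative, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_random(x, out_channels, kernel=3, stride=2):
    """One untrained conv layer: random weights, stride-2 spatial
    compression, channel (feature-domain) expansion, ReLU."""
    in_channels, h, w = x.shape
    W = rng.standard_normal((out_channels, in_channels, kernel, kernel))
    W /= np.sqrt(in_channels * kernel * kernel)  # keep activations scaled
    oh = (h - kernel) // stride + 1
    ow = (w - kernel) // stride + 1
    out = np.zeros((out_channels, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+kernel, j*stride:j*stride+kernel]
            out[:, i, j] = np.tensordot(W, patch, axes=3)
    return np.maximum(out, 0.0)  # ReLU

x = rng.standard_normal((3, 32, 32))      # toy RGB image
h1 = conv2d_random(x, out_channels=64)    # 3 -> 64 features, 32x32 -> 15x15
h2 = conv2d_random(h1, out_channels=256)  # 64 -> 256 features, 15x15 -> 7x7
print(h1.shape, h2.shape)
```

At each stage the spatial grid shrinks while the feature dimension grows, so responses such as the printed shapes (64, 15, 15) and (256, 7, 7) can be read off directly; in the paper's analyses, such untrained activations would then be regressed against cortical responses.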
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2023, issue: unknown. Published: Aug. 24, 2023
Abstract
Object vision is commonly thought to involve a hierarchy of brain regions processing increasingly complex image features, with high-level visual cortex supporting object recognition and categorization. However, object vision supports diverse behavioral goals, suggesting basic limitations of this category-centric framework. To address these limitations, we mapped a series of dimensions derived from a large-scale analysis of human similarity judgments directly onto the brain. Our results reveal broadly distributed representations of behaviorally-relevant information, demonstrating selectivity to a wide variety of novel dimensions while capturing known selectivities for visual features and categories. Behavior-derived dimensions were superior to categories at predicting brain responses, yielding mixed selectivity in much of visual cortex alongside sparse category-selective clusters. This framework reconciles seemingly disparate findings regarding regional specialization, explaining category selectivity as a special case of sparse response profiles among representational dimensions, and suggests a more expansive view on visual processing.
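The claim that category selectivity is a special case of sparse response profiles can be made concrete with a sparseness index over a region's loadings on the behavior-derived dimensions. A toy sketch using the Hoyer sparseness measure (an illustrative choice, not necessarily the paper's metric), where a category-selective profile concentrates its response on one dimension:

```python
import numpy as np

def hoyer_sparseness(v):
    """Hoyer sparseness: ~0 for uniform profiles, ~1 for one-hot profiles."""
    n = v.size
    l1, l2 = np.abs(v).sum(), np.sqrt((v * v).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# Toy response profiles over 10 hypothetical behavior-derived dimensions.
mixed = np.array([0.9, 0.7, 0.8, 0.6, 0.9, 0.7, 0.8, 0.5, 0.6, 0.7])
category_like = np.array([0.05, 0.02, 0.90, 0.03, 0.01,
                          0.04, 0.02, 0.03, 0.01, 0.02])

print(round(hoyer_sparseness(mixed), 2))          # low: mixed selectivity
print(round(hoyer_sparseness(category_like), 2))  # high: category-like tuning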
bioRxiv (Cold Spring Harbor Laboratory), journal year: 2023, issue: unknown. Published: April 28, 2023
Abstract
A challenging goal of neural coding is to characterize the representations underlying visual perception. To this end, multi-unit activity (MUA) in macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and the latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., the z- and w-latents of StyleGAN, respectively) and the language-contrastive representations of latent diffusion models (i.e., the CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis showed that w-latents outperform both z- and CLIP-latents in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient, which indicates that they capture visual information relevant to high-level neural activity. Subsequently, multivariate neural decoding resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature disentanglement in shaping the neural representations underlying perception but also serve as an important benchmark for future neural coding research.
Author summary
Neural coding seeks to understand how the brain represents the world by modeling the relationship between stimuli and the brain's internal representations thereof. This field focuses on predicting brain responses to stimuli (neural encoding) and deciphering information about stimuli from brain responses (neural decoding). Recent advances in generative adversarial networks (GANs; a type of machine learning model) have enabled the creation of photorealistic images. Like the brain, GANs also carry internal representations of the images they create, referred to as “latents”. More recently, a new type of feature-disentangled “w-latent” has been developed that more effectively separates different image features (e.g., color; shape; texture). In this study, we presented such GAN-generated pictures to a macaque with cortical implants and found that the latents were accurate predictors of its neural activity. We then used these latents to reconstruct the perceived images with high fidelity. The remarkable similarities between our predictions and the actual targets indicate an alignment in how the GAN and the brain represent the same stimulus, even though the GAN was never optimized on neural data. This implies a general principle of shared encoding of visual phenomena, emphasizing the importance of feature disentanglement in deeper visual areas.
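A mass univariate encoding analysis of this kind fits one regularized regression per recording site and scores it on held-out stimuli. A minimal sketch with synthetic stand-ins for the latents and the MUA (the real analysis would use StyleGAN w-latents and the recorded responses, and would compare latent spaces by these per-site scores):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: latent codes of the presented images (real analysis:
# e.g. StyleGAN w-latents) and MUA responses at each recording site.
n_stim, n_latent, n_sites = 300, 32, 50
Z = rng.standard_normal((n_stim, n_latent))
B = rng.standard_normal((n_latent, n_sites))
mua = Z @ B + 0.5 * rng.standard_normal((n_stim, n_sites))

def site_encoding_scores(latents, responses, alpha=1.0, n_train=200):
    """Mass univariate encoding: one ridge model per site, scored by the
    correlation between predicted and held-out responses."""
    Xtr, Xte = latents[:n_train], latents[n_train:]
    d = Xtr.shape[1]
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(d),
                        Xtr.T @ responses[:n_train])
    pred = Xte @ W
    return np.array([np.corrcoef(pred[:, s], responses[n_train:, s])[0, 1]
                     for s in range(responses.shape[1])])

scores = site_encoding_scores(Z, mua)
print(round(float(scores.mean()), 2))
```

Running the same procedure with z-, w-, and CLIP-latents as the predictors and comparing the per-site score distributions is the shape of the comparison the abstract describes.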